Bibliographic API – HathiTrust Digital Library

This API returns bibliographic, rights, and volume information when given one or more standard identifiers (ISBN, LCCN, OCLC, etc.). It is intended for retrieving information about a small number of items at a time. The HathiTrust Bibliographic API is not a search API (i.e., one where you search across the collection by keyword).

Bulk retrieval should be done using OAI or the HathiTrust tab-delimited inventory files (hathifiles). Note that use of the data may be subject to third-party agreements, such as OCLC’s Record Use policy. Permission must be sought for bulk retrieval of OCLC records by non-OCLC members.

HathiTrust Metadata Sharing: Under HathiTrust Digital Library’s (HTDL) Metadata Sharing Policy, independent users, member institutions, and other third parties are free to harvest (for example, through our OAI feed or the HathiFiles), modify and/or otherwise make use of any metadata contained in HTDL unless restricted by contractual obligations residing with the parties that have contributed the metadata (“Depositing Institutions”) to HTDL. Furthermore, HTDL provides no warranties on the data made available through any sharing mechanisms. Use of the data is undertaken at the user’s own risk. Any contributions made by HTDL to the metadata in the repository have been placed into the public domain by HTDL via a CC0 Public Domain Dedication.

Definitions:

For the purposes of this specification

Simple, single-identifier API

In the simplest case, to retrieve volume information based on a single identifier, the following syntax would be used:

https://catalog.hathitrust.org/api/volumes/brief/<id type>/<id value>.json
https://catalog.hathitrust.org/api/volumes/full/<id type>/<id value>.json

The difference between a brief and a full API request is that a full response also includes the complete MARC-XML for each record.

For example, to get information about any item(s) associated with records that have the OCLC number 424023 (Infinite Series), the request would be:

https://catalog.hathitrust.org/api/volumes/brief/oclc/424023.json

or

https://catalog.hathitrust.org/api/volumes/full/oclc/424023.json

Valid id types are:

Identifier values should be URL-encoded where necessary; LCCNs in particular often contain spaces and forward slashes.
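As a rough illustration of the simple URL pattern, the Python sketch below builds a brief or full request URL and URL-encodes the identifier value. The helper name `volume_url` is hypothetical, and percent-encoding every reserved character (including `/`) with `urllib.parse.quote(..., safe="")` is an assumption based on the encoding note above, not a documented requirement.

```python
from urllib.parse import quote

BASE = "https://catalog.hathitrust.org/api/volumes"


def volume_url(id_type: str, id_value: str, full: bool = False) -> str:
    """Build a single-identifier request URL (hypothetical helper).

    full=True requests the full response, which adds MARC-XML to each record.
    """
    level = "full" if full else "brief"
    # safe="" percent-encodes spaces as %20 and forward slashes as %2F,
    # which matters mostly for LCCNs.
    return f"{BASE}/{level}/{quote(id_type, safe='')}/{quote(id_value, safe='')}.json"


if __name__ == "__main__":
    print(volume_url("oclc", "424023"))
    print(volume_url("oclc", "424023", full=True))
```

Run as a script, it prints the two OCLC example URLs given above.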

The Basic JSON return structure

The return value will look like this:

{
  "records": {
    "000578050": {
      "recordURL": "https:\/\/catalog.hathitrust.org\/Record\/000578050",
      "titles": ["Infinite series."],
      "isbns": ["0030110408"],
      "issns": [],
      "oclcs": ["00424023"],
      "lccns": ["62009520"],
      "marc-xml": "the marc-xml, only if requested via a full URL"
    }
  },
  "items": [
    {
      "orig": "University of California",
      "fromRecord": "000578050",
      "htid": "uc1.b4405602",
      "itemURL": "https:\/\/hdl.handle.net\/2027\/uc1.b4405602",
      "rightsCode": "ic",
      "lastUpdate": "20090903",
      "enumcron": false,
      "usRightsString": "Limited (search-only)"
    },
    {
      "orig": "University of Michigan",
      "fromRecord": "000578050",
      "htid": "mdp.39015025315527",
      "itemURL": "https:\/\/hdl.handle.net\/2027\/mdp.39015025315527",
      "rightsCode": "ic",
      "lastUpdate": "20090612",
      "enumcron": false,
      "usRightsString": "Limited (search-only)"
    }
  ]
}

There are two sections: records, which holds basic metadata about the set of records that match the query, and items, which lists the complete set of individual HathiTrust items (volumes) associated with those records.
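To make the relationship between the two sections concrete, here is a minimal Python sketch that groups the items array by each item's fromRecord value, so every record in records can be displayed with its own volumes. It works on an already-parsed response (the dict returned by `json.loads`); the function name `items_by_record` is illustrative only, and the sample is a trimmed copy of the response above.

```python
import json


def items_by_record(response: dict) -> dict[str, list[dict]]:
    """Map each record number in response["records"] to the items it owns."""
    # Start with every matched record so records without items still appear.
    grouped: dict[str, list[dict]] = {rec_id: [] for rec_id in response.get("records", {})}
    for item in response.get("items", []):
        # fromRecord holds the record number the item belongs to.
        grouped.setdefault(item["fromRecord"], []).append(item)
    return grouped


# Trimmed copy of the sample response (most fields omitted).
SAMPLE = json.loads("""
{"records": {"000578050": {"titles": ["Infinite series."]}},
 "items": [
   {"fromRecord": "000578050", "htid": "uc1.b4405602"},
   {"fromRecord": "000578050", "htid": "mdp.39015025315527"}
 ]}
""")

if __name__ == "__main__":
    for rec_id, items in items_by_record(SAMPLE).items():
        titles = SAMPLE["records"].get(rec_id, {}).get("titles", [])
        print(rec_id, titles, [item["htid"] for item in items])
```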

The records section

The records structure is a hash keyed on the nine-digit record number of each matched record. It may contain more than one record, since duplicate records, while uncommon, do occur.

For each record, we list:

The items section

The items structure is an array of hashes describing all the available items associated with the matched records. There may be multiple items, either because the record(s) in question describe a serial or multi-volume set, or because identical volumes were digitized at more than one contributing institution.
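One way a client might tell those two cases apart is to group items by their volume designation. The Python sketch below collects htids per enumcron value; several htids under the same key suggest one volume digitized at more than one institution. This is a client-side heuristic, not something the API does, `copies_per_volume` is a hypothetical name, and exact string matching on enumcron is a simplification because different institutions may describe the same volume differently.

```python
from collections import defaultdict


def copies_per_volume(items: list[dict]) -> dict[str, list[str]]:
    """Group item htids by their enumcron (volume designation).

    Items whose enumcron is False (no volume designation) share the key "".
    """
    groups = defaultdict(list)
    for item in items:
        groups[item.get("enumcron") or ""].append(item["htid"])
    return dict(groups)


# The two items from the sample response: same work, two contributors.
items = [
    {"orig": "University of California", "htid": "uc1.b4405602", "enumcron": False},
    {"orig": "University of Michigan", "htid": "mdp.39015025315527", "enumcron": False},
]

if __name__ == "__main__":
    print(copies_per_volume(items))
    # -> {'': ['uc1.b4405602', 'mdp.39015025315527']}
```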

For each item, we list:

As noted, a reasonably sophisticated attempt is made to sort items by their enumcron (when present), which often lists the items correctly by volume/number. Because these data were entered differently at different institutions and at different times, the order cannot be guaranteed, but it is usually correct.
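If a client needs to re-sort items itself (for example, after merging results from several requests), a natural sort on enumcron is one simple approach. The Python sketch below is not HathiTrust's own sorting algorithm, which is not described here; it only splits each enumcron string into text and number runs so that "v.10" sorts after "v.2", and it places items without an enumcron first. `enumcron_key` is a hypothetical helper name.

```python
import re


def enumcron_key(item: dict) -> tuple:
    """Natural-sort key for an item's enumcron (hypothetical helper)."""
    enumcron = item.get("enumcron") or ""  # enumcron is False when absent
    # re.split with a capture group alternates text and digit runs:
    # "v.10 1999" -> ["v.", "10", " ", "1999", ""]; odd positions are digits.
    parts = re.split(r"(\d+)", enumcron)
    natural = [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    # Items without an enumcron (False sorts before True) come first.
    return (enumcron != "", natural)


if __name__ == "__main__":
    items = [{"enumcron": "v.10"}, {"enumcron": "v.2"}, {"enumcron": False}, {"enumcron": "v.1"}]
    print([item["enumcron"] for item in sorted(items, key=enumcron_key)])
    # -> [False, 'v.1', 'v.2', 'v.10']
```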

The multi-id request format

The multi-id request format allows two extensions to the simple API described above:

The basic URL structure for these requests is

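https://catalog.hathitrust.org/api/volumes/brief/<return type>/<search specification>|<search specification>|...|<search specification>
https://catalog.hathitrust.org/api/volumes/full/<return type>/<search specification>|<search specification>|...|<search specification>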

A simple example to get items associated with a single record based on multiple identifiers would be

https://catalog.hathitrust.org/api/volumes/brief/json/id:BJD1;oclc:424023;isbn:0030110408

or

https://catalog.hathitrust.org/api/volumes/full/json/id:BJD1;oclc:424023;isbn:0030110408

This example requests JSON results describing records (and the items attached to those records) that have the given OCLC number and/or the given ISBN. ‘BJD1’ is the requester’s local ID (the local record from which the OCLC number and ISBN were taken), and the returned data will be keyed on that string so the HathiTrust data can easily be added to a local display.
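A client that keyed its request on a local ID can pull that entry straight out of the response. The Python sketch below uses only the standard library; `fetch_json` and `rights_for_local_id` are hypothetical names. The network call is included for completeness but is not exercised here: the example runs against a hand-built response shaped like the basic JSON return structure, keyed on ‘BJD1’.

```python
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a Bibliographic API URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def rights_for_local_id(response: dict, local_id: str) -> list[tuple[str, str]]:
    """Return (htid, usRightsString) pairs for the entry keyed on local_id."""
    entry = response.get(local_id, {})
    return [(item["htid"], item["usRightsString"]) for item in entry.get("items", [])]


if __name__ == "__main__":
    # Offline stand-in for fetch_json(
    #   "https://catalog.hathitrust.org/api/volumes/brief/json/id:BJD1;oclc:424023;isbn:0030110408")
    response = {
        "BJD1": {
            "records": {"000578050": {"titles": ["Infinite series."]}},
            "items": [
                {"htid": "uc1.b4405602", "usRightsString": "Limited (search-only)"},
                {"htid": "mdp.39015025315527", "usRightsString": "Limited (search-only)"},
            ],
        }
    }
    for htid, rights in rights_for_local_id(response, "BJD1"):
        print(htid, rights)
```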

The return type

…is always json at the moment. No other return types are offered (but please ask if you need something else).

The search specification and how a match is determined

A search specification is a set of <id type>:<id value> pairs separated by semicolons. The id types and values are exactly as described for the simple API.

It is possible to include a special “id:MyID” pair in the search specification. If this is done, the basic JSON return structure for this search will be keyed on “MyID”. If not, it will be keyed on the whole search specification string, colons, semicolons, and all.
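The Python sketch below assembles a search specification from (id type, value) pairs, putting the optional id pair first, and computes the key under which that search's result should come back. `build_spec` and `result_key` are hypothetical helpers; values are inserted as given, so any URL-encoding they need is left to the caller.

```python
from typing import Optional


def build_spec(pairs: list[tuple[str, str]], local_id: Optional[str] = None) -> str:
    """Join (id type, value) pairs into "type:value;type:value" form."""
    parts = [f"id:{local_id}"] if local_id is not None else []
    parts += [f"{id_type}:{value}" for id_type, value in pairs]
    return ";".join(parts)


def result_key(spec: str) -> str:
    """Key used in the response: the id value if present, else the whole spec."""
    for part in spec.split(";"):
        id_type, _, value = part.partition(":")
        if id_type == "id":
            return value
    return spec


if __name__ == "__main__":
    spec = build_spec([("oclc", "424023"), ("isbn", "0030110408")], local_id="BJD1")
    print(spec, "->", result_key(spec))
```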

A record matches if every <id type>:<id value> pair provided either matches the record or refers to an identifier type the record does not have. As an example, suppose an OCLC number and an LCCN are provided (along with the id, which is ignored for matching):

id:1;oclc:45678;lccn:70628581
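Read literally, the rule means each non-id pair must either match one of the record's identifiers of that type or refer to a type the record lacks. The Python sketch below expresses that reading; it is an interpretation for illustration, not HathiTrust's code. Several details are assumptions: the record layout (lists named oclcs, lccns, isbns, issns, as in the records section of the sample response), the `TYPE_TO_FIELD` mapping, skipping unknown id types, stripping leading zeros from OCLC numbers (the sample record stores 424023 as "00424023"), and requiring at least one pair to match, on the assumption that a record sharing none of the identifiers would never be retrieved as a candidate.

```python
# Assumed mapping from spec id types to record fields in the sample response.
TYPE_TO_FIELD = {"oclc": "oclcs", "lccn": "lccns", "isbn": "isbns", "issn": "issns"}


def _norm(id_type: str, value: str) -> str:
    # Assumption: OCLC numbers compare without leading zeros.
    return value.lstrip("0") if id_type == "oclc" else value


def matches(record: dict, spec: str) -> bool:
    """True if each non-id pair matches or names a type the record lacks."""
    matched_any = False
    for part in spec.split(";"):
        id_type, _, value = part.partition(":")
        if id_type == "id" or id_type not in TYPE_TO_FIELD:
            continue  # the id pair (and, in this sketch, unknown types) is ignored
        present = [_norm(id_type, v) for v in record.get(TYPE_TO_FIELD[id_type], [])]
        if not present:
            continue  # record has no identifier of this type
        if _norm(id_type, value) not in present:
            return False  # record has this type, but with a different value
        matched_any = True
    return matched_any


if __name__ == "__main__":
    spec = "id:1;oclc:45678;lccn:70628581"
    print(matches({"oclcs": ["45678"], "lccns": ["70628581"]}, spec))  # both match
    print(matches({"oclcs": ["45678"], "lccns": []}, spec))            # LCCN absent
    print(matches({"oclcs": ["45678"], "lccns": ["99999999"]}, spec))  # LCCN conflicts
```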

Requesting several records at once

Up to 20 records may be requested at once by providing multiple search specifications separated by the pipe ( | ) character.

Regardless of whether a request is made for one record or many, what is returned is a set of basic JSON return structures keyed on the provided ids (or on the whole search specification if the request does not include an id).

So, the URL

https://catalog.hathitrust.org/api/volumes/brief/json/id:552;lccn:70628581|isbn:0030110408

…will return a hash with two elements. One is keyed on ‘552’ (the provided id) and the other on ‘isbn:0030110408’ (since no id was provided).
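Putting the multi-id pieces together, the Python sketch below joins up to 20 search specifications with | into a single brief or full request and rejects longer lists before any request is sent. `multi_url` is a hypothetical helper; the specifications are inserted verbatim, so encoding their values is left to the caller.

```python
BASE = "https://catalog.hathitrust.org/api/volumes"
MAX_SPECS = 20  # documented limit on search specifications per request


def multi_url(specs: list[str], full: bool = False, return_type: str = "json") -> str:
    """Build a multi-id request URL from pre-built search specifications."""
    if not specs:
        raise ValueError("at least one search specification is required")
    if len(specs) > MAX_SPECS:
        raise ValueError(f"at most {MAX_SPECS} specifications per request, got {len(specs)}")
    level = "full" if full else "brief"
    return f"{BASE}/{level}/{return_type}/" + "|".join(specs)


if __name__ == "__main__":
    print(multi_url(["id:552;lccn:70628581", "isbn:0030110408"]))
```

For longer lists, one option is to send batches of 20, e.g. `[specs[i:i + 20] for i in range(0, len(specs), 20)]`.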

JSON-P requests supported

JSON-P is a convention for fetching JSON data from another domain in AJAX calls, working around the browser’s same-origin policy. The Bibliographic API supports JSON-P requests: just add ‘&callback=’ followed by the name of your callback function to the end of your URL.