cataloging « The LibraryThing Blog

Archive for the ‘cataloging’ Category

Five million more Library of Congress records


We’ve recently imported 5,148,400 new records from the Library of Congress into OverCat, LibraryThing’s data repository. This brings the total number of records in OverCat to nearly 78 million! That’s 78 million high-quality library (MARC) records you can use in cataloging on LibraryThing.

This new dataset was produced by the Library of Congress from records in the 2014 Retrospective file sets—the most recent currently available. The Library of Congress provides these MARC records as part of its MARC Open Access program. Although LibraryThing adds MARC records to OverCat as members search for them, this is our first mass update from Library of Congress data since OverCat debuted in 2010.

Thanks go to developer CCatalfo for making this happen. Notice anything different in OverCat? Join the discussion and tell us about it on Talk.


Labels: cataloging, library of congress

BOOM! Add Books Adds 749 Library Sources, 38 New Countries

UPDATE: As of today (May 19th), we’ve reached a grand total of 2,160 working library sources, covering 110 countries! See the updated map at right reflecting our latest stats. New countries include: Ethiopia, Egypt, Bahrain, Nepal, Belarus, Luxembourg, (Northern) Cyprus, and the US Virgin Islands.


Last week we announced six new data sources: Amazon in India, Brazil, Italy, Mexico, Spain and China.

Today we’re announcing a far larger advance in sources—a leap from 426 working library sources last week to 1,175 working library sources today! For this, as we will explain, we have LT members to thank.

All told, we’ve gone from sources in 40 countries before, to sources in 78 countries now, covering many new regions and languages.

Entirely new sources total 668, but another 81 were fixed—sources that had died sometime in recent years. Other “working” sources were tweaked, fixing search and character-set problems.

Dead sources accumulated because LibraryThing didn’t have the staff resources, or a good system to monitor and edit existing sources. We now have a new, interactive system for adding, editing and testing library sources. And we have also opened this up to members, starting with a hand-picked set of librarians and library workers with experience handling these systems (z39.50 servers).

We expected we’d get help, but we were astounded by how much. Top honors go to davidgn, who added more than 500 new libraries, and fixed many as well. Members lesmel and bnielsen also contributed considerably, together with LT staffer Chris Catalfo, who wrote the code for the new system. A round of applause for all!

New Sources, New Countries, New Languages

At the top of this post is an animation demonstrating the growth of the sources—initial sources, new countries (red), and finally, where we are today. You can see the individual frames here, here, and here.

You can see big advances in Central and South America, which went from one source in one country to 35 sources in nine countries. Africa went from 0 countries to six, and many were added in Eastern Europe, the Middle East, and East Asia. The countries that already had many sources also grew—the UK went from 44 to 60, Canada from 42 to 106 and the USA from 261 to 544! (The generosity and public-spiritedness of American public and academic libraries in providing open z39.50 connections is truly remarkable.)

Some of the most useful and important new sources are:

North America: Brooklyn Public Library, California State Library, Massachusetts Historical Society (USA), National Library Service for the Blind and Physically Handicapped (USA), Maine State Library (Maine), Vancouver Public Library (Canada), University of Toronto (Canada), University of Waterloo (Canada), University of Ottawa (Canada), Instituto Politécnico Nacional (Mexico).

South America: Pontificia Universidad Javeriana (Colombia), Biblioteca Nacional Mariano Moreno (Argentina), Universidade de São Paulo (Brazil), Pontificia Universidad Católica del Perú (Peru).

Europe: London School of Economics (UK), University of Warwick (UK), University of Cyprus (Cyprus), Armenian Libraries Union Catalog (Armenia), FENNICA and VIOLA, the national bibliography and discography of Finland, Latvian Academic Union Catalog, Biblioteca Nacional de Portugal (Portugal), Universidade de Coimbra (Portugal), Universitat Politècnica de Catalunya (Spain/Catalonia), Universidad de Sevilla (Spain).

Africa and the Middle East: University of Ghana, American University of Kuwait, American University of Beirut, University of Lagos (Nigeria), Qatar Faculty of Islamic Studies, Sultan Qaboos University (Oman), National University of Lesotho, Ege Üniversitesi (Turkey).

Asia and Oceania: University of Melbourne (Australia), Okayama University (Japan), National Taiwan University, University of Macao, Africa University (Zimbabwe).

A New, User-Editable Sources System

As mentioned above, the updates were made possible by a new system which allows select LibraryThing members to edit and add library sources. Those members are able to change any out-of-date connection parameters, which have been a perennial problem as libraries change systems and settings over time.

See the screenshots on the right for how it works.
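For readers who think in code, here is a rough Python sketch of what "connection parameters" for a Z39.50 source might look like and how an edit could be recorded. It is only an illustration: the class, the field names, and the example host and database are all made up, and this is not how LibraryThing stores its sources. The one real-world detail is port 210, the port registered for Z39.50.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LibrarySource:
    """Hypothetical record of one Z39.50 library source."""
    name: str
    host: str          # server hostname
    port: int          # 210 is the registered Z39.50 port
    database: str      # database name on the server
    charset: str = "utf-8"
    working: bool = True


def update_source(source: LibrarySource, **changes) -> LibrarySource:
    """Return a copy of the source with some parameters changed."""
    return replace(source, **changes)


# A dead source whose library moved to a new server (made-up values).
old = LibrarySource("Example University Library", "z3950.old.example.edu",
                    210, "catalog", working=False)
fixed = update_source(old, host="z3950.new.example.edu", working=True)
```

Keeping the record immutable and producing a new copy on each edit makes it easy to keep the old settings around in case a fix turns out to be wrong.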

How can you help?

Post your feedback and questions on Talk. If you have a library you’d like to be able to use in cataloging your books here on LibraryThing, post it on that same Talk thread! Going forward, you can post about it in the Recommended Site Improvements group at any time.

If you’re a librarian or library professional who’d like to help with updating and adding new sources, get in touch with our developer Chris Catalfo (ccatalfo) and we’ll add you to the group Library Add Books Sources Maintenance, which opens up source editing. Because the details are so technical, and there’s some danger of messing things up, we’re making group membership by request only.

Labels: cataloging, new features

Six New Sources: Amazon India, Italy, Brazil, Spain, Mexico, and China

We’re pleased to announce the addition of six new Amazon sites to LibraryThing’s cataloging sources. They are:

This is big news, because although we’ve had academic library sources for these countries and languages, Amazon has far more books for most readers, and is always faster.

UPDATE: Books, Music, and Movies

Initially these sources were available for books only. However, we’ve now added movies and music data from all but one of them. Amazon Brazil only has data for books available. Amazon India, Italy, Spain, Mexico, and China all have the option to search their books, music, and movies data.

To use them, go to Add Books, look under “Search where?” on the left-hand side of the page, and click “Add from 1077 sources.”

If you run into any issues, or have other feedback or questions, post them on Talk.

LibraryThing in Not-English?

Many members don’t know, but LibraryThing is available in more than a dozen languages, including ones for the new sources:

All translations have been done by members—an amazing amount of love and effort. Other sites include French, German, and our best-maintained translation, Catalan. See all of them.

Labels: cataloging, new features

Music and movie cataloging (but we’re still a book site)

Short version: LibraryThing is and will remain a book site. But we never stopped people from cataloging other media, like movies and music. We’re now making it much easier to do. Check it out and add your non-book library at https://www.librarything.com/addbooks.

Medium version: LibraryThing is a book site, and will remain so. But many members, especially our small libraries, have always cataloged other media, such as movies and music. We allowed it, but didn’t support it well at all. In particular, we disabled non-book searching on Amazon, allowing it only on our library sources.

A few months ago we introduced a robust concept of media format. We’ve now opened up cataloging other media on the Amazon sources, which are far easier and better for the purpose.

Check it out at https://www.librarything.com/addbooks


Long version:

Why Are We Doing This? Adding other media has been planned for years. The main driver has been small libraries—churches, community centers, small museums, etc.—a major constituent of LibraryThing’s success. Although small libraries mostly collect books, they don’t limit themselves to books any more than public and academic libraries do. Our failings in the area really hurt us.

This change means that LibraryThing is now a “complete” cataloging system. This lets us reach small libraries as we never could before—something we plan to do even more strongly when TinyCat debuts.

We are also conscious that many “regular” members wanted to catalog their non-book libraries. I want to, anyway, and I know I’m not alone.

Worried? We are conscious of some members’ worries, for example that LibraryThing is “turning into” a movie site. These are valid concerns. Here’s how we responded and will respond:


Movies have been on LibraryThing for a long time.

New Features. The following features have been added, or changed, in order of importance.

Cataloging Non-Books Media. Movies and music aren’t books, but libraries catalog them with some of the same basic structure and concepts. Movies and music have titles, publication dates, subjects, Dewey classifications, etc. “Authors” is more complex. Library records generally mix directors, actors, producers and screenwriters into one set of contributors, with their roles not always marked. Amazon records are better here, clearly delineating the various roles. But they don’t have the name-control libraries have.

We’ve solved this as follows:

Let Us Know. Let us know what you think on Talk.

Labels: cataloging, new feature, new features

Edit and reorder sources in Add Books

Good news: We’ve improved the sources system within Add Books a lot.

Bad news: We had to transition to an entirely new sources system. Most members kept their sources, but some members and some sources couldn’t go into the new system easily. If you lost sources, you may need to choose them again. Fortunately, the new system’s a lot better at that.

You can find the new options on Add Books:

Everything now happens in a light box. The “Your Sources” tab allows you to reorder and delete sources.

You can browse and choose sources, divided into “Featured” and “All Sources” on the other two tabs.

As you’ll notice, a fair number of our sources are currently down. We’re working to get as many up again as possible, and add new ones. If you’d like to help and know something about Z39.50 connections, you’ll find we give our current connection details when you click the yellow warning marker.

You’ll also see other, very significant new stuff. But that’s a matter for another blog post!

Three cheers to our developer Ammar for the add-books changes!

Labels: cataloging, new features

New Feature: MARC Import

This is not a bobcat

MARC is the library standard for bibliographic records. We’ve always parsed MARC records behind the scenes, when members searched one of our 700 library sources, or our OverCat collection. A few years ago, we introduced the ability to export your LibraryThing collections as MARC records, even if your records didn’t start out in MARC.

Now, we’re adding the last piece: MARC importing, for all the small but professionally-cataloged libraries that use LibraryThing.

Try it Out. Check it out on Import, or go directly to MARC Import.

How it works. To use MARC import, you’ll need to have your library data in a .marc file format. Depending on how large a file you’ve got, the import process may take a few minutes. The good news is, you’ll receive a notification from LibraryThing once it’s ready. From there, you’ll be able to review your import options—just like you would with any other import—and select the collections, tags, etc. you’d like to apply to the items you’re importing.

What is MARC? MARC stands for Machine-Readable Cataloging. It represents a set of digital formats for describing items held by libraries: books, maps, CDs/DVDs, etc. You name it, if it’s in a library, MARC can handle it. Libraries the world over use MARC to standardize their item records in such a way that information about different types of items can all be fed into (and retrieved from) cataloging systems uniformly.

MARC fields are denoted by numerical tags that indicate what type of information is contained in that field. For example, the title of a given work is always in field 245.
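To make the tag idea concrete, here is a minimal Python sketch that pulls the title out of a binary MARC record. It assumes the usual ISO 2709 layout (a 24-byte leader, a directory of 12-byte entries, then the field data) and UTF-8 text. The function names are invented for this example and this is not LibraryThing's parser; for real work an established MARC library is the safer choice.

```python
FIELD_END = b"\x1e"    # field terminator
SUBFIELD = b"\x1f"     # subfield delimiter
RECORD_END = b"\x1d"   # record terminator


def build_record(fields):
    """Build a tiny ISO 2709 record from (tag, data) pairs, for demo use."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_END
    base = 24 + len(directory)            # where the field data begins
    total = base + len(body) + 1          # +1 for the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_END


def parse_fields(record):
    """Return a list of (tag, raw_bytes) for every field in one record."""
    base = int(record[12:17].decode("ascii"))
    directory = record[24:base - 1]       # drop the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field and strip its trailing terminator byte.
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields


def title_of(record):
    """Join the subfields of field 245, or return None if there is none."""
    for tag, raw in parse_fields(record):
        if tag == "245":
            # Skip the two indicator bytes, then split into subfields.
            parts = raw[2:].split(SUBFIELD)[1:]
            # Each part starts with a one-letter subfield code.
            return " ".join(p[1:].decode("utf-8") for p in parts)
    return None


demo = build_record([
    ("001", b"demo0001"),
    ("245", b"10" + SUBFIELD + b"aWalden"),
])
title = title_of(demo)
```

The directory is what makes the numeric tags useful: a program can jump straight to field 245 without reading anything else in the record.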

Don’t Upload The New York Public Library! This is for small—or, better, the _tiny_—libraries that use MARC records and LibraryThing. Uploads are capped at 10,000 records total, so don’t try to upload 100,000 records. “Regular” libraries, big and small, should check out LibraryThing for Libraries, a remarkable suite of catalog enhancements.
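If you want to check a file against the cap before uploading, a rough count is easy to get. The sketch below, in Python, assumes a binary MARC file in which every record ends with the standard record terminator byte (0x1D). The function names are made up for illustration, and the count is only as good as that assumption.

```python
RECORD_TERMINATOR = b"\x1d"   # byte that ends each binary MARC record
UPLOAD_CAP = 10_000           # cap stated in this post


def count_marc_records(data: bytes) -> int:
    """Estimate the number of records by counting record terminators."""
    return data.count(RECORD_TERMINATOR)


def within_cap(data: bytes, cap: int = UPLOAD_CAP) -> bool:
    """True if the file holds no more records than the cap allows."""
    return count_marc_records(data) <= cap


# Three fake "records" standing in for the contents of a .marc file.
sample = b"record-one\x1drecord-two\x1drecord-three\x1d"
sample_count = count_marc_records(sample)
```

For a real file you would read it in binary mode, for example with `open(path, "rb").read()`, and pass the bytes to `within_cap`.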

Questions? Comments? Let us know what you think on Talk.

Labels: cataloging, new features, small libraries

Subjects and the Ship of Theseus

I thought I might take a break to post an amusing photo of something I wrote out today:


The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.

About eight of the tables do what a good cataloging system would do:

Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:

The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)

Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.

Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.

When that future arrives, we’ve got the schema!


1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.

Labels: cataloging, subjects

Harvard University’s 12 million records now in LibraryThing

Short version. Our “OverCat” search now includes 12.3 million records from Harvard University!

Long version. On April 24 the Harvard Library announced that more than 12 million MARC records from across its 73 libraries would be made available under the library’s Open Metadata policy and a Creative Commons 0 public domain license. The announcement stunned the library world, because Harvard went against the wishes of the shared-cataloging company OCLC, who have long sought to prevent libraries from releasing records in this way. (For background on OCLC’s efforts see past blog posts.)

It took a while to process, but we’ve finally completed adding all 12.3 million MARC records (3.1GB of bibliographic goodness!) to LibraryThing. They’ve gone into OverCat, our giant index of library records from around the world—now numbering more than 51 million records! As a result, when searching OverCat under “Add books,” you’ll now see results “from Harvard OpenMetadata.”

This release (“big data for books,” as David Weinberger calls it) is, to put it mildly, a Very Big Deal. Harvard’s collections are both deep and broad, covering a wide variety of languages, fields, and formats. The addition of these 12 million records to OverCat has significantly improved our capacity for the cataloging of scholarly and rare books, and greatly enhanced our coverage generally.

Kudos to Harvard for making this metadata available, and we hope that other libraries will follow suit.

For more on the metadata release, see Quentin Hardy’s New York Times blog post, the Dataset description, or the Open Metadata FAQ. And happy cataloging!

Come discuss here.


Harvard requests and we’re happy to add: The “Harvard University Open Metadata” records in OverCat contain information from the Harvard Library Bibliographic Dataset, which is provided by the Harvard Library under its Bibliographic Dataset Use Terms and includes data made available by, among others, OCLC Online Computer Library Center, Inc. and the Library of Congress.

Labels: cataloging, open data

Occupy Libraries!

It’s been fascinating to watch the rise of libraries at the various Occupy sites around the world, particularly the impressively-large collection at Occupy Wall Street known as the People’s Library. We reached out and suggested a LibraryThing account for the collection, and the volunteer librarians in Zuccotti Park responded enthusiastically.

The OWSLibrary catalog now includes more than 3,300 titles, and it’s quite a rich and varied collection (check out the tag mirror). We’ve got a Talk thread where members are posting the books they share with the library; as of this morning, I share 100 titles with them, everything from E.O. Wilson to Annie Dillard to Strunk & White. If you’re signed into LibraryThing, you can see what you share with the OWS Library here.

The OWSLibrary folks also have an active blog, Twitter, and Flickr presence (they’ve even got library stamps!). Many authors have visited to speak, lend support, and sign books, and there’s now even an Occupy Wall Street Poetry Anthology.

More than 1,300 writers have signed the Occupy Writers petition in support of the Occupy movement, including Margaret Atwood, Neil Gaiman, Junot Díaz and more.

You can read some good coverage of the Occupy library movement in American Libraries, the Chronicle of Higher Education, and the Wall Street Journal.

On Friday, local librarian JustinTheLibrarian, Tim and I went downtown on our lunch break and cataloged the Occupy Maine library, a small collection housed at Portland’s Spartan Grill restaurant (which also serves a very tasty gyro).

Occupy Sacramento’s library is also up on LibraryThing, and we’ve been in touch with various other Occupy libraries; if your city’s library joins up, we’d love to know about it!

While you may agree or disagree with the Occupy movement as a whole, we think what they’re doing with books and libraries is simply awesome. And we’re very happy to be a part of it.

Labels: cataloging, flash-mob cataloging, libraries

VIAF, OCLC and open data

Yesterday I released a service called “LC AuthoritiesThing.” The service solved a problem many have had with the LC Authorities website. Although a fine searchable resource, LC Authorities does not have stable URLs. Links die after a short period and are tied to sessions in a way that prevents sharing URLs during that period. LC AuthoritiesThing provides a window into the LC Authorities site which allows hard, reliable links. Various catalogers have thanked us for making the service, as it will allow them to refer to authority records more easily.

As an update to the post I took notice of VIAF, the Virtual International Authority File, recommended to me as a substitute by a cataloger on Twitter. I assumed (apparently wrongly) that VIAF would at some point supersede LC Authorities. And I wrote that VIAF wasn’t a good substitute because it is an OCLC project, and encumbered by licensing restrictions.

Since then, I have received a variety of messages telling me that I was wrong. Although its data is hosted by and its services were developed and served by OCLC, VIAF is not an OCLC project, and the project has no access terms. Thomas Hickey from OCLC even wrote on this blog that full dumps are also available, although they must be approved somehow by project leaders.

This is welcome news. LibraryThing will be submitting a request for a full VIAF dump, and we’ll see where that goes. We will also look into automated harvesting of the website, or at least the LC portion of the data.

So far, so good. But the situation is illustrative. Select people within the library community may believe that VIAF is free. But every public indication is that it is not free.

These indications include:

  1. OCLC copyright notices on every single VIAF.org page, and all VIAF-related pages on OCLC.org.
  2. Links to the OCLC Terms and Conditions from multiple VIAF.org pages, including the Privacy page.
  3. A robots.txt file that prohibits automated access to result pages.
  4. The “About VIAF” project page prominently states “Use of our prototypes is subject to OCLC’s terms and conditions. By continuing past this point, you agree to abide by these terms.”
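Anyone who wants to check the robots.txt point for a site can do so with Python's standard library. The sketch below uses a made-up rule set and made-up example.org URLs. It is not a copy of VIAF's robots.txt, only an illustration of how a crawler decides whether a result page is off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block every crawler from search result pages.
HYPOTHETICAL_ROBOTS = """\
User-agent: *
Disallow: /search
"""


def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


results_ok = allowed(HYPOTHETICAL_ROBOTS, "https://example.org/search?q=twain")
about_ok = allowed(HYPOTHETICAL_ROBOTS, "https://example.org/about")
```

With these rules the search URL is refused and the about page is permitted, which is the pattern a well-behaved harvester would have to respect.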

As all catalogers surely know, the OCLC Terms and Conditions are lengthy and explicit. Among other things they prohibit commercial use, automated use, storage of data, and use of the data for cataloging (!). They state that OCLC has sole and arbitrary discretion to discontinue access to anyone for any reason. They state that exceptions to the terms require permission in writing from OCLC.

Meanwhile, apart from a blog comment from Thom Hickey, I can find no assertions that OCLC terms don’t apply to VIAF, no mention of dumps or of a process to get them.

VIAF is to be commended for its openness and lack of terms. This is a great move forward for open bibliographic data. But it needs to make greater efforts to make others aware of this state of affairs, and define the level and character of openness. (It’s still unclear to me whether VIAF asserts any ownership, or whether it is all in the public domain.) And VIAF should make efforts to remove multiple statements asserting that OCLC terms apply to VIAF data.

Labels: cataloging, oclc

Library catalogs are notorious for their URL structure. More than a decade after the rest of the web decided on solid, permanent links, most library systems continue to generate ephemeral, usually session-based ones. Sometimes catalogs have a syntax for permanent links, but they’re a special, added feature.

The problem is at its worst with the Library of Congress Authorities system, used by catalogers and librarians the world over. The core of authority control is a stable identifier, in this case the LCCN, but the LC Authorities catalog can neither be searched by nor linked to by that identifier. No matter what URL you find, it dies when the session dies. You can’t even link to searches. What ought to be a rock is a puff of smoke.

The problem was solved for Subject Authority files when the Library of Congress released the Authorities and Vocabularies website, which allows linking to subjects by their LCCN (e.g., sh85026719). But name-authority files (i.e., authors) have received no similar treatment.

LC AuthoritiesThing is a partial and tentative solution to that problem, a window into the Library of Congress Authorities catalog that allows permanent linking. Search for a name (or subject) and, when you find it, the page will have a tiny link icon, which serves as the permalink for the page.

Example: http://www.librarything.com/LCAT/LCCN=no2010139263
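If you want to generate such links yourself, the pattern in the example is simple enough to sketch in Python. The function name is made up, the URL shape is inferred only from the one example above, and stripping whitespace from the LCCN is my own assumption about how identifiers might be typed.

```python
BASE_URL = "http://www.librarything.com/LCAT/LCCN="


def lccn_permalink(lccn: str) -> str:
    """Build a permalink from an LCCN, following the example URL's pattern."""
    normalized = "".join(lccn.split())   # drop any spaces inside the LCCN
    if not normalized:
        raise ValueError("an LCCN is required")
    return BASE_URL + normalized


link = lccn_permalink("no2010139263")
```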

It took a little magic to get it to work, but it does.* For now at least, you can’t link to records you haven’t found. If there’s interest, I will inject Simon Spero’s ingenious screen-scrape dump of LC Authority files, which will give me the necessary link between 001 and 035 fields.

For now, it’s just an experiment. Will anyone find it useful? Is it worth putting on its own domain? What would make it better? I know, anyway, that it can be of some use to LibraryThing. In the near future I plan to bolt it to LibraryThing itself, so members can link authors to their LC Authority number, when the link will help clarify things.

If you have any thoughts, discuss them here.

Update: It’s been objected that LC Authorities has been or will be superseded by VIAF, the Virtual International Authority File, an aggregate of authority files from libraries around the world. Unfortunately, VIAF is another OCLC project, studded on every side by copyright assertions, EULAs, use restrictions and licensing terms. As with most everything else OCLC does, the core information was created at taxpayer expense, and is legally impossible to copyright. The rest was created by libraries with no intention of creating a proprietary resource. And the result is another proprietary, restricted and nigh-inescapable data monopoly.


*Behind the scenes it’s doing both proxied requests and stepping through pages as if it were. If anyone can come up with a better way, I’m all ears.

Labels: cataloging

LibraryThing gets work-to-work relationships!

Today we’ve launched some new ways to display relationships between works.

The concept covers works that contain other works, or are contained by them. It also covers retellings, abridgments, parodies, commentaries, and so forth.

Thus, LibraryThing members will be able to add relationships that show:

A core concept here is that this is only for work-level relationships. Therefore, we are not doing “translation of,” “facsimile edition of,” etc. Members are asked to connect only existing works, not make up new, so-far uncataloged works.
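As a rough illustration of the idea, and not of how LibraryThing actually stores it, here is a small Python sketch of work-level relationships in which adding "contains" in one direction records "contained in" in the other. The class, the relation labels, and the example works are all chosen for this sketch.

```python
from collections import defaultdict

# Relations that imply a mirror-image relation on the other work.
INVERSE = {"contains": "contained in", "contained in": "contains"}


class WorkRelationships:
    """Hypothetical in-memory store of work-to-work relationships."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, work, relation, other):
        """Record a relationship and, where one is defined, its inverse."""
        self._links[work].add((relation, other))
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._links[other].add((inverse, work))

    def for_work(self, work):
        """Return the sorted (relation, other work) pairs for one work."""
        return sorted(self._links.get(work, set()))


rels = WorkRelationships()
rels.add("The Lord of the Rings", "contains", "The Fellowship of the Ring")
rels.add("The Lord of the Rings", "contains", "The Two Towers")
```

Storing both directions up front means either work page can list its relationships without searching the whole collection.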

Come discuss rules, concepts and ideas in the Talk topic.

We’ve got a lot more coming that builds and expands on these capabilities, so stay tuned!

Many thanks to the members of the Board for Extreme Thing Advances group, who’ve been helping us develop and refine this feature. They have already added some 4,500 contains/contained-in relationships across LibraryThing.

Labels: cataloging, work pages, works