NCBI Bookshelf: books and documents in life sciences and health care

Nucleic Acids Res. 2013 Jan;41(Database issue):D1251-60.

doi: 10.1093/nar/gks1279. Epub 2012 Nov 29.

Marilu A Hoeppner. Nucleic Acids Res. 2013 Jan.

Abstract

Bookshelf (http://www.ncbi.nlm.nih.gov/books/) is a full-text electronic literature resource of books and documents in life sciences and health care at the National Center for Biotechnology Information (NCBI). Created in 1999 with a single book as an encyclopedic reference for resources such as PubMed and GenBank, it has grown to its current size of >1300 titles. Unlike other NCBI databases, such as GenBank and Gene, which have a strict data structure, books come in all forms; they are diverse in publication types, formats, sizes and authoring models. The Bookshelf data format is XML tagged in the NCBI Book DTD (Document Type Definition), modeled after the National Library of Medicine journal article DTDs. The book DTD has been used for systematically tagging the diverse data formats of books, a move that has set the foundation for the growth of this resource. Books at NCBI followed the route of journal articles in the PubMed Central project, using the PubMed Central architectural framework, workflows and processes. Through integration with other NCBI molecular databases, books at NCBI can be used to provide reference information for biological data and facilitate its discovery. This article describes Bookshelf at NCBI: its growth, data handling and retrieval and integration with molecular databases.

Figures

Figure 1.

Growth of Bookshelf. The term ‘title’ represents a book in the database. The spike in the book title count in 2010 was due to data restructuring of the Health Services/Technology Assessment Texts (HSTAT; http://www.ncbi.nlm.nih.gov/books/NBK16710/) database content. Previously, reports in this database were not counted as individual titles.

Figure 2.

Data processing workflow. XML stored in the Bookshelf CMS goes through several stages of data processing before loading into the book database. Ingest, the first stage of data conversion, results in the creation of a tar file package containing XML, PDF, image and supplementary files. During the ‘chop-it-up’ process, the single book XML document is chunked into multiple article-like book-part XML documents, each with its own root element. Text and image conversion processing occurs concurrently. XML files loaded into the book database are dynamically rendered to HTML for viewing in the book viewer application and are indexed in Entrez for search and retrieval.
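The ‘chop-it-up’ step in this caption can be illustrated with a minimal Python sketch. It is not the Bookshelf implementation: the element names `book`, `book-part` and `title`, the `id` attribute, the sample document and the function name `chop_book` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element names `book`, `book-part` and `title`
# are assumptions for illustration, not taken from the NCBI Book DTD.
SAMPLE_BOOK = """<book>
  <title>Example Book</title>
  <book-part id="ch1"><title>Chapter 1</title><p>First chapter.</p></book-part>
  <book-part id="ch2"><title>Chapter 2</title><p>Second chapter.</p></book-part>
</book>"""


def chop_book(book_xml: str) -> dict[str, str]:
    """Split one book document into standalone XML strings, one per part."""
    root = ET.fromstring(book_xml)
    chunks = {}
    for index, part in enumerate(root.findall("book-part"), start=1):
        # Fall back to a positional key when a part carries no id.
        part_id = part.get("id", f"part{index}")
        # Drop the whitespace that followed the element in the parent.
        part.tail = None
        chunks[part_id] = ET.tostring(part, encoding="unicode")
    return chunks


chunks = chop_book(SAMPLE_BOOK)
for part_id, xml_text in chunks.items():
    # Each chunk parses on its own, with the part as its root element.
    print(part_id, ET.fromstring(xml_text).tag)
```

Each chunk is a well-formed document in its own right, so it could be rendered or indexed independently of the rest of the book, which is the property the caption describes.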

Figure 3.

Reciprocal linking between Bookshelf and PubChem. (A) A search in PubChem for the substance AOI987 reveals links to book content under ‘Find related data’. (B) The relevant record in Bookshelf shows the chemical structure for the PubChem substance. This structure is rendered dynamically in the Bookshelf page from the PubChem database. Clicking on the image links to the record in PubChem.

Figure 4.

Searching, browsing and reading books. (A) Search in Entrez across Bookshelf. Clicking on ‘See all results for this book’ leads to the view shown in B. (B) Search within a book. (C) Browse tool. Titles can be viewed using term filters or by selection of categories (see highlighted areas). (D) Content can be read in the viewer, which displays the citation (orange) and allows for navigation (red) and alternate views (blue).
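The Entrez search in panel A also has a programmatic counterpart. The sketch below only builds a query URL for the public NCBI E-utilities ESearch endpoint against the `books` database and makes no network request. Using E-utilities here is an addition of this sketch rather than something the article describes, and the function name `build_bookshelf_search_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public E-utilities ESearch endpoint (not mentioned in the article itself).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_bookshelf_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that queries the Entrez `books` database."""
    params = {
        "db": "books",      # Entrez database name for Bookshelf
        "term": term,       # free-text query, as typed in the search box
        "retmax": retmax,   # maximum number of record IDs to return
        "retmode": "json",  # ask for a JSON response instead of XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_bookshelf_search_url("AOI987")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return matching record identifiers. That step is left out so the sketch runs without network access.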

Figure 5.

Integration of molecular genetic information in books. Data for Tables A and B in the Molecular Genetics section of a GeneReviews article are computed based on OMIM entries in a specialized database. The information is dynamically pulled into the text at run time.
