A MOD(ern) perspective on literature curation

Jodi Hirschman et al. Mol Genet Genomics. 2010 May.

Abstract

Curation of biological data is a multi-faceted task whose goal is to create a structured, comprehensive, integrated, and accurate resource of current biological knowledge. These structured data facilitate the work of the scientific community by providing knowledge about genes or genomes and by generating validated connections between the data that yield new information and stimulate new research approaches. For the model organism databases (MODs), an important source of data is research publications. Every published paper containing experimental information about a particular model organism is a candidate for curation. All such papers are examined carefully by curators for relevant information. Here, four curators from different MODs describe the literature curation process and highlight approaches taken by the four MODs to address: (1) the decision process by which papers are selected, and (2) the identification and prioritization of the data contained in the paper. We will highlight some of the challenges that MOD biocurators face, and point to ways in which researchers and publishers can support the work of biocurators and the value of such support.

Figures

Fig. 1

A typical curation workflow, exemplified by the process at ZFIN. Curation workflows are unique because each MOD strives to best serve its own research community. For example, at some MODs different members of the curation team enter different types of data, whereas at other MODs a single curator enters all of the data types from a paper. Additional differences in workflow stem mainly from staffing and other budgetary constraints at each database. However, the workflows have much in common, because the questions that must be answered to complete curation of a paper are similar regardless of the MOD. Here, the curation workflow at ZFIN illustrates the order in which certain tasks take place and many of the questions that must be answered at each step. Papers that lack key details can prevent curators from answering questions critical to the curation process, reducing the amount or the detail of the curated data.

Fig. 2

A TAIR web query form using controlled vocabularies to ask “Find a gene whose symbol begins with At1g and has GO function annotations based on direct assays, and codes for a protein that has literature associated with it.”
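
The query in this caption combines several independent conditions, which can be sketched as a filter over gene records. The sketch below is illustrative only: the `GeneRecord` structure, its field names, and the sample records are hypothetical and do not reflect TAIR's actual schema or data. The one external fact it relies on is that `IDA` is the GO evidence code for "Inferred from Direct Assay".

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical gene record; not TAIR's real data model."""
    symbol: str
    # (GO aspect, evidence code) pairs, e.g. ("function", "IDA")
    go_annotations: list = field(default_factory=list)
    encodes_protein: bool = False
    # Identifiers of papers associated with the encoded protein
    protein_publications: list = field(default_factory=list)


def matches_query(gene: GeneRecord) -> bool:
    """Apply the three conditions from the Fig. 2 example query."""
    # Condition 1: symbol begins with "At1g" (compared case-insensitively)
    symbol_ok = gene.symbol.lower().startswith("at1g")
    # Condition 2: at least one GO function annotation backed by a direct assay
    function_ok = any(
        aspect == "function" and evidence == "IDA"
        for aspect, evidence in gene.go_annotations
    )
    # Condition 3: codes for a protein that has associated literature
    protein_ok = gene.encodes_protein and len(gene.protein_publications) > 0
    return symbol_ok and function_ok and protein_ok


# Made-up records, each failing a different condition except the first
genes = [
    GeneRecord("At1g00001", [("function", "IDA")], True, ["paper-1"]),
    GeneRecord("At1g00002", [("function", "IEA")], True, ["paper-2"]),
    GeneRecord("At2g00003", [("function", "IDA")], True, ["paper-3"]),
    GeneRecord("At1g00004", [("process", "IDA")], True, []),
]

hits = [g.symbol for g in genes if matches_query(g)]
print(hits)
```

Because each condition draws on a controlled vocabulary (GO aspect, evidence code), the filter can use exact matches rather than free-text search, which is the point the query form illustrates.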

Fig. 3

Snapshot of the Gene Detail page for Fech (upper left), and snapshots of MGI editorial interfaces for entering data on symbol, name, and synonyms (upper right); phenotypic alleles (bottom right), including an expanded window showing a controlled pick list; and mammalian homology (bottom left). Arrows point to the section of the Gene Detail page that each editorial interface addresses.

Fig. 4

The global flow of biological data, as presented from a MOD perspective. Curators read the published literature, then identify and enter the data that can be extracted for the database. Other sources of data may also be incorporated and in some cases can be used to identify inconsistencies with the literature-derived data. The curation process organizes and integrates data into a relational database format so that users can easily view what is and is not known about their favorite genes or proteins.
