Biotea Dataset (Vr. July 2012)

2012

Abstract

<strong>Background</strong> Information reported in the scientific literature remains locked up in discrete documents that are not always interconnected or machine-readable. The Semantic Web, together with approaches such as the Resource Description Framework (RDF) and the Linked Open Data (LOD) initiative, offers a connective tissue that can support the generation of self-describing, machine-readable documents.
<strong>Results</strong> Biotea is an approach to generating RDF from scholarly documents. Our RDF model makes extensive use of existing ontologies and semantic enrichment services. Our dataset comprises 270,834 articles from the PubMed Central open-access subset, serialized as RDF/XML and distributed in 404 zipped files. The RDFization process covers metadata (e.g., title, authors, and journal) as well as semantic annotations of biological entities throughout the full text. Biological entities are extracted using the NCBO Annotator and Whatizit. We use the Bibliographic Ontology (BIBO), Dublin Core Metadata Initiative Terms (DCMI-terms), and the Provenance Ontology (PROV-O) to model the bibliographic metadata. Links to related pages such as PubMed HTML articles are provided via rdfs:seeAlso, while links to other semantic representations such as Bio2RDF PubMed articles are provided via owl:sameAs. The NCBO Annotator is used to extract entities covering ChEBI for chemicals; Pathway and Functional Genomics Data Society (MGED) for genes and proteins; Master Drug Data Base (MDDB), NDDF, and NDFRT for drugs; SNOMED, SYMP, MedDRA, MeSH, MedlinePlus Health Topics (MedlinePlus), Online Mendelian Inheritance in Man (OMIM), FMA, ICD10, and Ontology for Biomedical Investigations (OBI) for diseases and medical terms; PO for plants; and MeSH, SNOMED, and NCIt for general terms. Whatizit is used for GO, UniProt proteins, UniProt Taxonomy, and diseases mapped to the UMLS; UniProt taxa are also mapped to the NCBI Taxon vocabulary.
<strong>Conclusions</strong> Biotea delivers models and tools for metadata enrichment and semantic pro [...]
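
The metadata and linking pattern described in the Results can be illustrated with a minimal sketch. It is not Biotea's actual code or URI scheme: the article base URI, the function names, the sample PMID, and the choice of N-Triples (rather than the RDF/XML used by the dataset) are all assumptions made for illustration. Only the vocabulary namespaces and the roles of rdfs:seeAlso and owl:sameAs follow the text above.

```python
# Namespaces of the vocabularies named in the abstract.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical base URI for article resources (an assumption, not Biotea's scheme).
ARTICLE_BASE = "http://example.org/article/pmid/"


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def article_triples(pmid, title):
    """Return N-Triples lines describing one article's metadata and links."""
    subject = f"<{ARTICLE_BASE}{pmid}>"
    return [
        # Bibliographic typing and title (BIBO and DCMI-terms).
        f"{subject} <{RDF}type> <{BIBO}AcademicArticle> .",
        f'{subject} <{DCTERMS}title> "{escape_literal(title)}" .',
        # Related web page: linked, but not asserted to be the same resource.
        # The URL patterns below are assumed for illustration.
        f"{subject} <{RDFS}seeAlso> <https://pubmed.ncbi.nlm.nih.gov/{pmid}/> .",
        # Another semantic representation of the same article.
        f"{subject} <{OWL}sameAs> <http://bio2rdf.org/pubmed:{pmid}> .",
    ]


if __name__ == "__main__":
    # Sample values are made up; they do not refer to an article in the dataset.
    for line in article_triples("12345678", 'A "sample" title'):
        print(line)
```

The distinction the sketch tries to preserve is that rdfs:seeAlso points to a page about the article, whereas owl:sameAs states that two URIs identify the same article.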
