Wikidata as a semantic framework for the Gene Wiki initiative - PubMed

Wikidata as a semantic framework for the Gene Wiki initiative

Sebastian Burgstaller-Muehlbacher et al. Database (Oxford). 2016.

Abstract

Open biological data are distributed over many resources, making them challenging to integrate, to update and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia. To improve the state of biological data and to facilitate data management and dissemination, we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes have been imported from NCBI, and 27,306 human proteins and 16,728 mouse proteins have been imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike. The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. Although Gene Wiki pages currently exist only on the English language version of Wikipedia, the multilingual nature of Wikidata allows the imported data to be used in all 280 language versions of Wikipedia. Apart from the Gene Wiki infobox use case, a SPARQL endpoint and export functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists. In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web. Database URL: https://www.wikidata.org/.


Figures

Figure 1

Wikidata item and data organization. Wikidata items can be added or edited manually by anyone. A Wikidata item consists of: (1) a language-specific label, (2) its unique identifier, (3) language-specific aliases, (4) interwiki links to the different language Wikipedia articles or other Wikimedia projects and (5) a list of statements. For this specific example, the human protein Reelin was used (https://www.wikidata.org/wiki/Q13569356).
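The five parts of an item listed in the Figure 1 caption can be sketched as a small data structure. This is only an illustration, not code from the paper: the names `WikidataItem`, `parse_item` and `SAMPLE_ENTITY` are hypothetical, the nested layout is assumed to follow the JSON export of a Wikidata entity (`labels`, `aliases`, `sitelinks`, `claims`), and the sample values are placeholders rather than a copy of the live Reelin item.

```python
from dataclasses import dataclass, field


@dataclass
class WikidataItem:
    qid: str                                        # (2) unique identifier
    labels: dict = field(default_factory=dict)      # (1) language -> label
    aliases: dict = field(default_factory=dict)     # (3) language -> list of aliases
    sitelinks: dict = field(default_factory=dict)   # (4) site -> page title
    statements: dict = field(default_factory=dict)  # (5) property -> list of values


def parse_item(entity: dict) -> WikidataItem:
    """Flatten an entity dict (assumed Wikidata JSON layout) into a WikidataItem."""
    labels = {lang: v["value"] for lang, v in entity.get("labels", {}).items()}
    aliases = {
        lang: [a["value"] for a in entries]
        for lang, entries in entity.get("aliases", {}).items()
    }
    sitelinks = {site: s["title"] for site, s in entity.get("sitelinks", {}).items()}
    statements = {}
    for prop, claims in entity.get("claims", {}).items():
        values = []
        for claim in claims:
            snak = claim.get("mainsnak", {})
            # Only keep statements that carry an actual value.
            if snak.get("snaktype") == "value":
                values.append(snak["datavalue"]["value"])
        statements[prop] = values
    return WikidataItem(entity["id"], labels, aliases, sitelinks, statements)


# Hand-written placeholder entity; the values are illustrative only.
SAMPLE_ENTITY = {
    "id": "Q13569356",  # identifier taken from the Figure 1 caption
    "labels": {"en": {"language": "en", "value": "Reelin"}},
    "aliases": {"en": [{"language": "en", "value": "RELN"}]},
    "sitelinks": {"enwiki": {"site": "enwiki", "title": "Reelin"}},
    "claims": {
        "P352": [  # property ID used here only as an example key
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P352",
                    "datavalue": {"value": "PLACEHOLDER", "type": "string"},
                }
            }
        ]
    },
}

item = parse_item(SAMPLE_ENTITY)
print(item.qid, item.labels["en"], item.aliases["en"], item.sitelinks["enwiki"])
```

Keeping the statements as a mapping from property to a list of values reflects that an item may carry several statements for the same property.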

Figure 2

Gene Wiki data model in Wikidata. Each entity (human gene, human protein, mouse gene, mouse protein) is represented as a separate Wikidata item. Arrows represent direct links between Wikidata statements. The English language interwiki link on the human gene item points to the corresponding Gene Wiki article on the English Wikipedia.

Figure 3

Gene Wiki infobox populated with data from Wikidata, using data from Wikidata items Q414043 for the human gene, Q13561329 for the human protein, Q14331135 for the mouse gene and Q14331165 for the mouse protein. Three dots indicate that there is more information in the real Gene Wiki infobox for Reelin (https://en.wikipedia.org/wiki/Reelin).

Figure 4

An example SPARQL query, using the Wikidata SPARQL endpoint (query.wikidata.org). It retrieves all Wikidata (WD) items that are subclasses of protein-coding gene (Q840604), have a chromosomal start position (P644) according to human genome build GRCh38, reside on human chromosome (P659) 9 (Q20966585) and have a chromosomal end position (P645) that is also on chromosome 9. Furthermore, the region of interest is restricted to a chromosomal start position between 21 and 30 megabase pairs. Colors: red indicates SPARQL commands, blue represents variable names, green represents URIs and brown represents strings. Arrows point to the source code the description applies to.
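Since the figure itself is not reproduced here, the query described in the caption can be approximated as follows. This is a sketch under stated assumptions, not the authors' original query: the function name `build_region_query` and its parameters are hypothetical, the identifiers are copied from the caption as given and should be checked against Wikidata before use, and the chromosome is assumed to be attached as a qualifier on the start and end position statements. The caption gives no identifier for the GRCh38 build, so that constraint is left out.

```python
def build_region_query(
    gene_class: str = "Q840604",     # protein-coding gene, as given in the caption
    start_prop: str = "P644",        # chromosomal start position
    end_prop: str = "P645",          # chromosomal end position
    chrom_prop: str = "P659",        # chromosome property, as given in the caption
    chrom_item: str = "Q20966585",   # chromosome 9, as given in the caption
    min_start: int = 21_000_000,     # 21 megabase pairs
    max_start: int = 30_000_000,     # 30 megabase pairs
) -> str:
    """Return a SPARQL query string for genes in a chromosomal region.

    Assumption: the chromosome is stored as a qualifier on both the
    start and the end position statements.
    """
    return f"""
SELECT ?gene ?start ?end WHERE {{
  ?gene wdt:P279 wd:{gene_class} ;
        p:{start_prop} ?startStmt ;
        p:{end_prop} ?endStmt .
  ?startStmt ps:{start_prop} ?start ;
             pq:{chrom_prop} wd:{chrom_item} .
  ?endStmt ps:{end_prop} ?end ;
           pq:{chrom_prop} wd:{chrom_item} .
  FILTER(xsd:integer(?start) >= {min_start} && xsd:integer(?start) <= {max_start})
}}
""".strip()


query = build_region_query()
print(query)
```

The resulting string can be pasted into the query editor at query.wikidata.org, where the `wd:`, `wdt:`, `p:`, `ps:`, `pq:` and `xsd:` prefixes are predefined. The positions are cast with `xsd:integer` on the assumption that they are stored as strings.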

Cited by

Collective intelligence defines biological functions in Wikipedia as communities in the hidden protein connection network.
Zinovyev A, Czerwinska U, Cantini L, Barillot E, Frahm KM, Shepelyansky DL. PLoS Comput Biol. 2020 Feb 18;16(2):e1007652. doi: 10.1371/journal.pcbi.1007652. PMID: 32069277.
Intestinal microbiota alterations by dietary exposure to chemicals from food cooking and processing. Application of data science for risk prediction.
Ruiz-Saavedra S, García-González H, Arboleya S, Salazar N, Emilio Labra-Gayo J, Díaz I, Gueimonde M, González S, de Los Reyes-Gavilán CG. Comput Struct Biotechnol J. 2021 Jan 29;19:1081-1091. doi: 10.1016/j.csbj.2021.01.037. PMID: 33680352. Review.
Human Disease Ontology 2018 update: classification, content and workflow expansion.
Schriml LM, Mitraka E, Munro J, Tauber B, Schor M, Nickle L, Felix V, Jeng L, Bearer C, Lichenstein R, Bisordi K, Campion N, Hyman B, Kurland D, Oates CP, Kibbey S, Sreekumar P, Le C, Giglio M, Greene C. Nucleic Acids Res. 2019 Jan 8;47(D1):D955-D962. doi: 10.1093/nar/gky1032. PMID: 30407550.
Ten quick tips for editing Wikidata.
Shafee T, Mietchen D, Lubiana T, Jemielniak D, Waagmeester A. PLoS Comput Biol. 2023 Jul 20;19(7):e1011235. doi: 10.1371/journal.pcbi.1011235. PMID: 37471307.
ChlamBase: a curated model organism database for the Chlamydia research community.
Putman T, Hybiske K, Jow D, Afrasiabi C, Lelong S, Cano MA, Wu C, Su AI. Database (Oxford). 2019 Jan 1;2019:baz041. doi: 10.1093/database/baz041. PMID: 30985891.

