Formalization of taxon-based constraints to detect inconsistencies in annotation and ontology development
Jennifer I Deegan née Clark et al. BMC Bioinformatics. 2010.
Abstract
Background: The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation.
Results: We have formalized the taxonomic constraints implicit in some GO classes, and specified these at various levels in the ontology. We have also developed an inference system that can be used to check for violations of these constraints in annotations. Using the constraints in conjunction with the inference system, we have detected and removed errors in annotations and improved the structure of the ontology.
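The abstract does not show how a constraint check works. The sketch below is a minimal illustration, not the authors' inference system. It makes several assumptions: a toy taxonomy stored as a child-to-parent map with intermediate ranks omitted, a toy GO fragment stored as a class-to-superclasses map, and each constraint as a single required taxon. An annotation of a GO class to a species violates a constraint if the class, or any of its superclasses, carries an only_in_taxon restriction that is not among the species' ancestors. This is how 'lactation' inherits the Mammalia restriction from 'mammary gland development'. All names (`TAXON_PARENT`, `GO_PARENTS`, `ONLY_IN_TAXON`, `violations`) are hypothetical.

```python
# Minimal sketch of an only_in_taxon annotation check on toy data (not the GO release).

# Hypothetical, simplified taxonomy: taxon -> parent taxon (intermediate ranks omitted).
TAXON_PARENT = {
    "Homo sapiens": "Mammalia",
    "Mus musculus": "Mammalia",
    "Gallus gallus": "Aves",
    "Mammalia": "Vertebrata",
    "Aves": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Saccharomyces cerevisiae": "Eukaryota",
    "Escherichia coli": "Bacteria",
}

# Hypothetical GO fragment: class -> direct superclasses.
GO_PARENTS = {
    "lactation": ["mammary gland development"],
    "mammary gland development": [],
    "mitochondrion": [],
}

# only_in_taxon constraints asserted directly on a GO class (one taxon each here).
ONLY_IN_TAXON = {
    "mammary gland development": "Mammalia",
    "mitochondrion": "Eukaryota",
}


def taxon_lineage(taxon: str) -> set[str]:
    """Return the taxon together with all of its ancestors."""
    lineage = set()
    current = taxon
    while current is not None:
        lineage.add(current)
        current = TAXON_PARENT.get(current)  # None once the root is reached
    return lineage


def go_ancestors(go_class: str) -> set[str]:
    """Return the class together with all of its superclasses."""
    seen: set[str] = set()
    stack = [go_class]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GO_PARENTS.get(current, []))
    return seen


def violations(go_class: str, taxon: str) -> list[tuple[str, str]]:
    """List (constrained class, required taxon) pairs broken by annotating go_class in taxon."""
    lineage = taxon_lineage(taxon)
    return sorted(
        (cls, ONLY_IN_TAXON[cls])
        for cls in go_ancestors(go_class)
        if cls in ONLY_IN_TAXON and ONLY_IN_TAXON[cls] not in lineage
    )


if __name__ == "__main__":
    for go_class, taxon in [("lactation", "Mus musculus"),
                            ("lactation", "Gallus gallus"),
                            ("mitochondrion", "Escherichia coli")]:
        print(go_class, "/", taxon, "->", violations(go_class, taxon) or "OK")
```

Walking up both hierarchies means a restriction only has to be stated once, at the most general class it applies to. Every descendant class is then checked against it automatically.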
Conclusions: Detection of inconsistencies in taxon-specificity enables gradual improvement of the ontologies, the annotations, and the formalized constraints. This is progressively improving the quality of our data. The full system is available for download, and new constraints or proposed changes to constraints can be submitted online at https://sourceforge.net/tracker/?atid=605890&group_id=36855.
Figures
Figure 1
Lactation. The GO class 'lactation' is restricted for use with gene products from species of the taxonomic grouping Mammalia. The class inherits this restriction from the superclass 'mammary gland development'. In this figure, the GO classes are shown in blue, and the taxonomic classes are shown in yellow. The relationship types are labeled in the diagram.
Figure 2
C4 photosynthesis. The GO class 'C4 photosynthesis' is restricted for use with gene products from species of the taxonomic grouping Viridiplantae. This is a narrower taxonomic group than the one to which the GO superclass 'photosynthesis' is restricted. The GO class 'photosynthesis' is restricted for use with gene products from any subtype of the Viridiplantae, Euglenozoa, Archaea or Bacteria. The relationship between 'photosynthesis' and these four taxonomic groups is shown by the relationship only_in_taxon from 'photosynthesis' to the union term 'Viridiplantae or Euglenozoa or Archaea or Bacteria', and by the relationships between this union term and the four individual taxonomic groups. These latter relationships are shown as union_of relationships (marked 'un'). In this figure, the GO classes are shown in blue, and the taxonomic classes are shown in yellow. The relationship types are labeled in the diagram.
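The union term in Figure 2 can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's implementation, and it uses plain Python sets instead of an ontology reasoner. A union-valued only_in_taxon constraint is modelled as a set of member taxa, and a species satisfies it if any of its ancestors is a member. A second, hypothetical check, `is_narrowing`, tests the property the caption describes: a subclass's taxon restriction should fall within its superclass's restriction, as Viridiplantae falls within 'Viridiplantae or Euglenozoa or Archaea or Bacteria'. The taxonomy is simplified, with intermediate ranks omitted. All identifiers are illustrative.

```python
# Minimal sketch: union-valued only_in_taxon constraints and a nesting check (toy data).

# Hypothetical, simplified taxonomy: taxon -> parent (intermediate ranks omitted).
TAXON_PARENT = {
    "Zea mays": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
    "Euglena gracilis": "Euglenozoa",
    "Euglenozoa": "Eukaryota",
    "Homo sapiens": "Eukaryota",
}

# A union term such as 'Viridiplantae or Euglenozoa or Archaea or Bacteria'
# is modelled as the set of its member taxa.
ONLY_IN_TAXON = {
    "photosynthesis": frozenset({"Viridiplantae", "Euglenozoa", "Archaea", "Bacteria"}),
    "C4 photosynthesis": frozenset({"Viridiplantae"}),
}

# Direct superclasses of each GO class in this fragment.
GO_PARENTS = {
    "C4 photosynthesis": ["photosynthesis"],
    "photosynthesis": [],
}


def lineage(taxon: str) -> set[str]:
    """Return the taxon together with all of its ancestors."""
    result = set()
    current = taxon
    while current is not None:
        result.add(current)
        current = TAXON_PARENT.get(current)
    return result


def satisfies(taxon: str, allowed: frozenset[str]) -> bool:
    """True if the taxon falls under at least one member of the union."""
    return not lineage(taxon).isdisjoint(allowed)


def is_narrowing(sub_allowed: frozenset[str], super_allowed: frozenset[str]) -> bool:
    """True if every taxon permitted by the subclass is also permitted by the superclass."""
    return all(satisfies(t, super_allowed) for t in sub_allowed)


def structural_conflicts() -> list[tuple[str, str]]:
    """(class, direct superclass) pairs whose only_in_taxon constraints are not nested."""
    conflicts = []
    for cls, parents in GO_PARENTS.items():
        for parent in parents:
            if cls in ONLY_IN_TAXON and parent in ONLY_IN_TAXON:
                if not is_narrowing(ONLY_IN_TAXON[cls], ONLY_IN_TAXON[parent]):
                    conflicts.append((cls, parent))
    return conflicts


if __name__ == "__main__":
    print(satisfies("Euglena gracilis", ONLY_IN_TAXON["photosynthesis"]))
    print(satisfies("Euglena gracilis", ONLY_IN_TAXON["C4 photosynthesis"]))
    print(structural_conflicts())
```

Checking only direct superclasses keeps the sketch short. In a tree-shaped taxonomy, nesting is transitive, so checking each direct pair is enough to keep a whole chain consistent.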
Similar articles
- GOChase-II: correcting semantic inconsistencies from Gene Ontology-based annotations for gene products.
  Park YR, Kim J, Lee HW, Yoon YJ, Kim JH. BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S40. doi: 10.1186/1471-2105-12-S1-S40. PMID: 21342572. Free PMC article.
- Mining GO annotations for improving annotation consistency.
  Faria D, Schlicker A, Pesquita C, Bastos H, Ferreira AE, Albrecht M, Falcão AO. PLoS One. 2012;7(7):e40519. doi: 10.1371/journal.pone.0040519. Epub 2012 Jul 25. PMID: 22848383. Free PMC article.
- A relation based measure of semantic similarity for Gene Ontology annotations.
  Sheehan B, Quigley A, Gaudin B, Dobson S. BMC Bioinformatics. 2008 Nov 4;9:468. doi: 10.1186/1471-2105-9-468. PMID: 18983678. Free PMC article.
- The Foundational Model of Anatomy in OWL 2 and its use.
  Golbreich C, Grosjean J, Darmoni SJ. Artif Intell Med. 2013 Feb;57(2):119-32. doi: 10.1016/j.artmed.2012.11.002. Epub 2012 Dec 28. PMID: 23273493.
- Gene Ontology annotation of the rice blast fungus, Magnaporthe oryzae.
  Meng S, Brown DE, Ebbole DJ, Torto-Alalibo T, Oh YY, Deng J, Mitchell TK, Dean RA. BMC Microbiol. 2009 Feb 19;9 Suppl 1(Suppl 1):S8. doi: 10.1186/1471-2180-9-S1-S8. PMID: 19278556. Free PMC article. Review.
Cited by
- Canto: an online tool for community literature curation.
  Rutherford KM, Harris MA, Lock A, Oliver SG, Wood V. Bioinformatics. 2014 Jun 15;30(12):1791-2. doi: 10.1093/bioinformatics/btu103. Epub 2014 Feb 25. PMID: 24574118. Free PMC article.
- The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments.
  Roncaglia P, Martone ME, Hill DP, Berardini TZ, Foulger RE, Imam FT, Drabkin H, Mungall CJ, Lomax J. J Biomed Semantics. 2013 Oct 7;4(1):20. doi: 10.1186/2041-1480-4-20. PMID: 24093723. Free PMC article.
- A compendium of human gene functions derived from evolutionary modelling.
  Feuermann M, Mi H, Gaudet P, Muruganujan A, Lewis SE, Ebert D, Mushayahama T; Gene Ontology Consortium; Thomas PD. Nature. 2025 Apr;640(8057):146-154. doi: 10.1038/s41586-025-08592-0. Epub 2025 Feb 26. PMID: 40011791. Free PMC article.
- Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation.
  Chen J, Goudey B, Geard N, Verspoor K. Bioinformatics. 2024 Jun 28;40(Suppl 1):i390-i400. doi: 10.1093/bioinformatics/btae246. PMID: 38940182. Free PMC article.
- Uberon, an integrative multi-species anatomy ontology.
  Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Genome Biol. 2012 Jan 31;13(1):R5. doi: 10.1186/gb-2012-13-1-r5. PMID: 22293552. Free PMC article.