The InterPro BioMart: federated query and web service access to the InterPro Resource

Philip Jones et al. Database (Oxford). 2011.

Abstract

The InterPro BioMart provides users with query-optimized access to predictions of family classification, protein domains and functional sites, based on a broad spectrum of integrated computational models ('signatures') that are generated by the InterPro member databases: Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. These predictions are provided for all protein sequences from both the UniProt Knowledge Base and the UniParc protein sequence archive. The InterPro BioMart is supplementary to the primary InterPro web interface (http://www.ebi.ac.uk/interpro), providing a web service and the ability to build complex, custom queries that can efficiently return thousands of rows of data in a variety of formats. This article describes the information available from the InterPro BioMart and illustrates its utility with examples of how to build queries that return useful biological information. Database URL: http://www.ebi.ac.uk/interpro/biomart/martview.

Figures

Figure 1.

An example human-curated InterPro entry, illustrating the detailed description provided for the entry and cross references to the GO and the member database signatures from which the entry is composed.

Figure 2.

A protein for which matches have been calculated by InterPro. For this sequence, InterPro provides a prediction of protein family membership, an overview of the domain organization and the details of matches to member database signatures. At the foot of the view can be seen associated GO terms, based upon the calculated matches to InterPro entries.

Figure 3.

Selecting a dataset in the InterPro BioMart.

Figure 4.

Building a filter with two components: include results for ‘Family’ entry types that comprise signatures from Pfam.

Figure 5.

Selecting the attributes to be included in the BioMart output (equivalent to the columns of a spreadsheet). The ordering of the columns is determined by the order in which the attributes are selected.

Figure 6.

Clicking the ‘Results’ button at the top of the interface provides the first 10 results matching the query, to allow the query to be modified or improved.
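
The steps shown in Figures 3 to 6 (choose a dataset, add filters, select attributes, fetch results) can also be expressed as a BioMart XML query for the web service. The following is a minimal sketch under stated assumptions, not the article's own code: the `martservice` endpoint path is inferred from the `martview` URL in the abstract, and the dataset, filter and attribute names (`entry`, `entry_type`, `signature_database`, `entry_id`, `entry_name`, `method_id`) are hypothetical placeholders that would need to be checked against the names the mart actually exposes. The sketch only builds the request URL and does not contact the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint: the martview URL from the abstract with the conventional
# BioMart "martservice" path swapped in. Not confirmed by the source.
MARTSERVICE_URL = "http://www.ebi.ac.uk/interpro/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Return a BioMart query XML string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated output
        "header": "1",        # include column names in the first row
        "uniqueRows": "1",    # collapse duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned (Figure 4).
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes become output columns, in the order given (Figure 5).
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Hypothetical names mirroring the Figure 4 filter:
# 'Family' entries that include Pfam signatures.
xml_query = build_query(
    dataset="entry",
    filters={"entry_type": "Family", "signature_database": "Pfam"},
    attributes=["entry_id", "entry_name", "method_id"],
)
url = MARTSERVICE_URL + "?" + urlencode({"query": xml_query})
print(url)
```

The resulting URL could be passed to any HTTP client. Keeping the attribute list ordered matters because, as the Figure 5 caption notes, column order follows selection order.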
