The Pfam protein families database
Nucleic Acids Res. 2010 Jan;38(Database issue):D211-22.
doi: 10.1093/nar/gkp985. Epub 2009 Nov 17.
Robert D Finn, Jaina Mistry, John Tate, Penny Coggill, Andreas Heger, Joanne E Pollington, O Luke Gavin, Prasad Gunasekaran, Goran Ceric, Kristoffer Forslund, Liisa Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman
- PMID: 19920124
- PMCID: PMC2808889
- DOI: 10.1093/nar/gkp985
Robert D Finn et al. Nucleic Acids Res. 2010 Jan.
Abstract
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
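The abstract attributes part of HMMER3's extra sensitivity to the forward algorithm. A minimal sketch of the distinction, assuming a toy two-state HMM with made-up parameters (not a profile HMM, and not HMMER3's implementation, which works with log-odds scores over match/insert/delete states): the forward algorithm sums the probability of the sequence over all state paths, whereas Viterbi keeps only the single best path, so the forward value can never be smaller.

```python
import math

# Toy two-state HMM; all parameters are hypothetical and chosen only for illustration.
STATES = ("A", "B")
START = {"A": 0.6, "B": 0.4}
TRANS = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
EMIT = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}


def forward(obs: str) -> float:
    """Total probability of obs, summed over every possible state path."""
    f = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        f = {s: sum(f[p] * TRANS[p][s] for p in STATES) * EMIT[s][o] for s in STATES}
    return sum(f.values())


def viterbi(obs: str) -> float:
    """Probability of the single most likely state path only."""
    v = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        v = {s: max(v[p] * TRANS[p][s] for p in STATES) * EMIT[s][o] for s in STATES}
    return max(v.values())


seq = "xyy"
p_fwd, p_vit = forward(seq), viterbi(seq)
print(f"forward={p_fwd:.4f} viterbi={p_vit:.4f}")
# Extra evidence (in bits) that summing over paths adds beyond the best path alone.
print(f"gain in bits: {math.log2(p_fwd / p_vit):.3f}")
```

Because the forward score pools evidence from many suboptimal alignments, a weak but genuine homologue can score higher under forward than under Viterbi, which is the intuition behind the sensitivity claim.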
Figures
Figure 1.
Sequence search results page. Results page for a single sequence search. At the top is a graphic of the domains matched along the length of the query sequence, with any active-site or metal-binding residues marked up where present. Beneath it come, first, the significant matches to Pfam-A families, then the insignificant matches to Pfam-A families, followed by the significant matches to Pfam-B families. At the bottom are the expanded match results: in the #HMM line, residues identical to those in the query are coloured cyan and similar residues dark blue, and a #PP (posterior probability) line gives the posterior probability at each position, with the #SEQ (query) line colour-coded accordingly.
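A minimal sketch of how a #PP line might be read programmatically, assuming HMMER3's single-character encoding of posterior probabilities ('0' for 0-5%, '1' for 5-15%, and so on up to '9' for 85-95%, with '*' for 95-100%) and treating '.' as a gap column with no value. The helper names and the 0.5 threshold are hypothetical.

```python
def pp_range(ch: str):
    """Return the (low, high) posterior-probability interval encoded by one #PP character.

    Assumes HMMER3's convention: '0' = 0-0.05, digit d = (d-0.5)/10 to (d+0.5)/10
    for d = 1..9, and '*' = 0.95-1.0. A '.' (gap column) returns None.
    """
    if ch == ".":
        return None
    if ch == "*":
        return (0.95, 1.0)
    if len(ch) == 1 and ch in "0123456789":
        d = int(ch)
        return (0.0, 0.05) if d == 0 else ((d - 0.5) / 10, (d + 0.5) / 10)
    raise ValueError(f"unexpected #PP character: {ch!r}")


def low_confidence_positions(pp_line: str, threshold: float = 0.5) -> list[int]:
    """Hypothetical helper: indices whose whole PP interval lies at or below threshold."""
    out = []
    for i, ch in enumerate(pp_line):
        r = pp_range(ch)
        if r is not None and r[1] <= threshold:
            out.append(i)
    return out


pp_line = "**9*75.3**"  # hypothetical #PP annotation for a short aligned stretch
print(low_confidence_positions(pp_line))
```

Using the upper bound of each interval keeps the filter conservative: a '5' (45-55%) is not flagged, because its interval straddles the threshold.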
Figure 2.
New Pfam display of a protein domain architecture. Pfam-A families of type ‘family’ and ‘domain’ are represented by a lozenge shape, and families of type ‘repeat’ or ‘motif’ by rectangles. The alignment co-ordinates are depicted in a solid colour, and the envelope co-ordinates in a lighter shade of the same colour. Where the profile HMM match for a domain or family covers only part of the model, the curved end of the lozenge/rectangle is replaced by a jagged edge. Active-site residues are marked with a lollipop with a diamond-shaped head. An example tooltip showing the domain description, co-ordinates and source is shown for the fourth domain. Note the overlapping envelopes of the fourth and fifth domains.
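The drawing rules in Figure 2 can be sketched as a small function over a match record. This is a minimal illustration, not Pfam's rendering code: the `DomainMatch` fields, the example co-ordinates, and the rule that an end is jagged whenever the match stops short of the first or last model position are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class DomainMatch:
    """Hypothetical record for one Pfam-A match; field names are illustrative."""
    name: str
    pfam_type: str     # 'family', 'domain', 'repeat' or 'motif'
    ali_start: int     # alignment co-ordinates on the sequence (drawn in solid colour)
    ali_end: int
    env_start: int     # envelope co-ordinates on the sequence (drawn in a lighter shade)
    env_end: int
    hmm_from: int      # first profile HMM position covered by the match
    hmm_to: int        # last profile HMM position covered by the match
    model_length: int  # number of match positions in the profile HMM


def glyph(m: DomainMatch) -> tuple[str, str, str]:
    """Return (shape, left end, right end) following the Figure 2 conventions.

    Assumption: an end is 'jagged' when the match stops short of that end of the model.
    """
    shape = "lozenge" if m.pfam_type in ("family", "domain") else "rectangle"
    left = "jagged" if m.hmm_from > 1 else "curved"
    right = "jagged" if m.hmm_to < m.model_length else "curved"
    return shape, left, right


def envelopes_overlap(a: DomainMatch, b: DomainMatch) -> bool:
    """True if the two envelopes share at least one residue (inclusive co-ordinates)."""
    return a.env_start <= b.env_end and b.env_start <= a.env_end


# Hypothetical fourth and fifth matches: the alignments are disjoint, the envelopes overlap.
d4 = DomainMatch("dom4", "domain", 120, 200, 115, 210, 1, 90, 90)
d5 = DomainMatch("dom5", "repeat", 212, 250, 205, 260, 3, 40, 40)
print(glyph(d4), glyph(d5), envelopes_overlap(d4, d5))
```

Keeping alignment and envelope co-ordinates as separate fields mirrors the two shades in the figure and makes the situation the caption points out explicit: two matches whose alignments do not touch can still have overlapping envelopes.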
Figure 3.
New alignment confidence display. The colour of each residue reflects the alignment uncertainty and is based on the posterior probability calculated by HMMER3. A green residue indicates a high posterior probability, meaning that the alignment of that amino acid to the match/insert state in the profile HMM is very likely to be correct. As the posterior probability decreases, and with it the alignment certainty, the colour shifts towards red. This allows users to quickly identify regions of the alignment where some sequences are aligned with less certainty.
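A minimal sketch of a green-to-red confidence colouring like the one in Figure 3, assuming a simple linear interpolation between pure red (posterior probability 0) and pure green (posterior probability 1). The palette, function names and example values are hypothetical; Pfam's actual colour scale may differ.

```python
def pp_to_rgb(pp: float) -> tuple[int, int, int]:
    """Map a posterior probability in [0, 1] to an RGB triple from red (0) to green (1).

    Linear red-to-green interpolation is an assumption for illustration only.
    """
    if not 0.0 <= pp <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    red = round(255 * (1.0 - pp))
    green = round(255 * pp)
    return (red, green, 0)


def to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an RGB triple as a CSS-style hex colour string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


column_pps = [0.98, 0.72, 0.31]  # hypothetical per-residue posterior probabilities
print([to_hex(pp_to_rgb(p)) for p in column_pps])
```

A linear ramp is the simplest choice; a perceptually uniform palette would make mid-range values easier to tell apart, at the cost of a less obvious mapping.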
Figure 4.
New BioLit/TOPSAN views. Left: using the web services provided by BioLit, we display the abstract, figures and figure legends from the publication associated with a particular PDB entry (only for articles published in open access journals). In this case, we have retrieved open access articles that reference PDB entry 1dan. Right: using the web services provided by TOPSAN, we display images and text from the TOPSAN wiki, together with a link that lets users contribute to the wiki. In this example, we show the TOPSAN information describing PDB entry 1kq3.
Similar articles
- The Pfam protein families database. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A. Nucleic Acids Res. 2008 Jan;36(Database issue):D281-8. doi: 10.1093/nar/gkm960. Epub 2007 Nov 26. PMID: 18039703. Free PMC article.
- The Pfam protein families database. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD. Nucleic Acids Res. 2012 Jan;40(Database issue):D290-301. doi: 10.1093/nar/gkr1065. Epub 2011 Nov 29. PMID: 22127870. Free PMC article.
- The Pfam protein families database. Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer EL. Nucleic Acids Res. 2002 Jan 1;30(1):276-80. doi: 10.1093/nar/30.1.276. PMID: 11752314. Free PMC article.
- Pfam 3.1: 1313 multiple alignments and profile HMMs match the majority of proteins. Bateman A, Birney E, Durbin R, Eddy SR, Finn RD, Sonnhammer EL. Nucleic Acids Res. 1999 Jan 1;27(1):260-2. doi: 10.1093/nar/27.1.260. PMID: 9847196. Free PMC article.
- Pfam 10 years on: 10,000 families and still growing. Sammut SJ, Finn RD, Bateman A. Brief Bioinform. 2008 May;9(3):210-9. doi: 10.1093/bib/bbn010. Epub 2008 Mar 15. PMID: 18344544. Review.
Cited by
- Genome guided, organ-specific transcriptome assembly of the European flounder (P. flesus) from the Baltic Sea. Pomianowski K, Kulczykowska E, Burzyński A. Sci Data. 2024 Oct 30;11(1):1184. doi: 10.1038/s41597-024-04004-6. PMID: 39477936. Free PMC article.
- Organellar proteomics reveals hundreds of novel nuclear proteins in the malaria parasite Plasmodium falciparum. Oehring SC, Woodcroft BJ, Moes S, Wetzel J, Dietz O, Pulfer A, Dekiwadia C, Maeser P, Flueck C, Witmer K, Brancucci NM, Niederwieser I, Jenoe P, Ralph SA, Voss TS. Genome Biol. 2012 Nov 26;13(11):R108. doi: 10.1186/gb-2012-13-11-r108. PMID: 23181666. Free PMC article.
- Genome-wide comparative analysis of Mg transporter gene family between Triticum turgidum and Camelina sativa. Faraji S, Ahmadizadeh M, Heidari P. Biometals. 2021 Jun;34(3):639-660. doi: 10.1007/s10534-021-00301-4. Epub 2021 Mar 30. PMID: 33783656.
- Developmentally regulated HEART STOPPER, a mitochondrially targeted L18 ribosomal protein gene, is required for cell division, differentiation, and seed development in Arabidopsis. Zhang H, Luo M, Day RC, Talbot MJ, Ivanova A, Ashton AR, Chaudhury AM, Macknight RC, Hrmova M, Koltunow AM. J Exp Bot. 2015 Sep;66(19):5867-80. doi: 10.1093/jxb/erv296. Epub 2015 Jun 23. PMID: 26105995. Free PMC article.
- Transcriptome and Regional Association Analyses Reveal the Effects of Oleosin Genes on the Accumulation of Oil Content in Brassica napus. Jia Y, Yao M, He X, Xiong X, Guan M, Liu Z, Guan C, Qian L. Plants (Basel). 2022 Nov 16;11(22):3140. doi: 10.3390/plants11223140. PMID: 36432869. Free PMC article.
Grants and funding
- MC_U137761446/MRC_/Medical Research Council/United Kingdom
- HHMI/Howard Hughes Medical Institute/United States
- 087656/WT_/Wellcome Trust/United Kingdom
- BB/F010435/1/BB_/Biotechnology and Biological Sciences Research Council/United Kingdom
- WT077044/Z/05/Z/WT_/Wellcome Trust/United Kingdom