The Pfam protein families database

Nucleic Acids Res. 2010 Jan;38(Database issue):D211-22.

doi: 10.1093/nar/gkp985. Epub 2009 Nov 17.

Robert D Finn, Jaina Mistry, John Tate, Penny Coggill, Andreas Heger, Joanne E Pollington, O Luke Gavin, Prasad Gunasekaran, Goran Ceric, Kristoffer Forslund, Liisa Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman

Robert D Finn et al. Nucleic Acids Res. 2010 Jan.

Abstract

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
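The sensitivity gain attributed to the forward algorithm can be illustrated with a minimal sketch. The forward algorithm sums the probability of a sequence over every possible path through the model, whereas a Viterbi-style score keeps only the single best path. The two-state model below, its state names (`M`, `I`) and all of its probabilities are invented for illustration; it is a generic toy HMM, not a Pfam profile HMM and not HMMER3's implementation.

```python
def forward_likelihood(obs, start, trans, emit):
    """Probability of `obs` summed over every hidden-state path."""
    states = list(start)
    # alpha[s] = P(symbols seen so far, current state is s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi_likelihood(obs, start, trans, emit):
    """Probability of `obs` along the single most probable path only."""
    states = list(start)
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        best = {
            s: emit[s][symbol] * max(best[p] * trans[p][s] for p in states)
            for s in states
        }
    return max(best.values())


# Hypothetical toy model: two states emitting the symbols "A" and "G".
start = {"M": 0.6, "I": 0.4}
trans = {"M": {"M": 0.7, "I": 0.3}, "I": {"M": 0.4, "I": 0.6}}
emit = {"M": {"A": 0.8, "G": 0.2}, "I": {"A": 0.3, "G": 0.7}}
obs = "AGA"

total = forward_likelihood(obs, start, trans, emit)
best_path = viterbi_likelihood(obs, start, trans, emit)
print(f"forward (all paths): {total:.6f}")
print(f"viterbi (best path): {best_path:.6f}")
```

Because the forward value includes the best path plus every alternative, it is never smaller than the best-path value; the extra probability mass from alternative alignments is what a best-path score ignores.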

Figures

Figure 1.

Sequence search results page. The results page for a single-sequence search shows, at the top, a graphic of the domains matched along the length of the query sequence, with any active-site or metal-binding residues marked. Below it come the significant matches to Pfam-A families, then the insignificant matches to Pfam-A families, followed by the significant matches to Pfam-B families. At the bottom are the expanded match results: in the #HMM line, residues identical to those in the query are coloured cyan and similar residues dark blue; the #PP (posterior probability) line gives the posterior probability at each position, and the #SEQ (query) line is colour-coded accordingly.

Figure 2.

New Pfam display of a protein domain architecture. Pfam-A families of type ‘family’ or ‘domain’ are represented by a lozenge shape, and families of type ‘repeat’ or ‘motif’ by rectangles. The alignment co-ordinates are depicted in a solid colour, and the envelope co-ordinates in a lighter shade of the same colour. Where the profile HMM match for a domain or family is only of partial length, the curved end of the lozenge/rectangle is replaced by a jagged edge. Active-site residues are marked with a lollipop with a diamond-shaped head. An example tooltip showing the domain description, co-ordinates and source is shown for the fourth domain. Note the overlapping envelopes between the fourth and fifth domains.
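The distinction between alignment and envelope co-ordinates, and the overlapping envelopes noted in the caption, can be sketched as a small data structure. Everything here is hypothetical: the class and field names, the family labels and the co-ordinates are invented for illustration, and 1-based inclusive co-ordinates are assumed. It is not Pfam's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainMatch:
    """One domain hit on a sequence (hypothetical structure).

    Co-ordinates are assumed to be 1-based and inclusive.
    """
    family: str
    ali_start: int  # alignment co-ordinates (solid colour in the figure)
    ali_end: int
    env_start: int  # envelope co-ordinates (lighter shade in the figure)
    env_end: int


def intervals_overlap(start_a, end_a, start_b, end_b):
    """True if two inclusive intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a


def envelopes_overlap(a, b):
    return intervals_overlap(a.env_start, a.env_end, b.env_start, b.env_end)


def alignments_overlap(a, b):
    return intervals_overlap(a.ali_start, a.ali_end, b.ali_start, b.ali_end)


# Invented example: two neighbouring domains whose envelopes overlap
# while their aligned regions do not.
fourth = DomainMatch("Dom4", ali_start=310, ali_end=395, env_start=305, env_end=402)
fifth = DomainMatch("Dom5", ali_start=404, ali_end=480, env_start=398, env_end=485)

print("envelopes overlap: ", envelopes_overlap(fourth, fifth))
print("alignments overlap:", alignments_overlap(fourth, fifth))
```

Keeping both pairs of co-ordinates on each match lets a display draw the two shades separately and flag cases like this one, where only the envelopes touch.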

Figure 3.

New alignment confidence display. The colour of each residue reflects the alignment uncertainty and is based on the posterior probability calculated by HMMER3. A green residue indicates a high posterior probability, meaning that the alignment of the amino acid to the match/insert state in the profile HMM is very likely to be correct. Where the posterior probability is lower, and the alignment is therefore less certain, the colour becomes closer to red. This allows users to quickly identify regions of the alignment where some sequences are aligned with less certainty.
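A colour scale of this kind can be sketched as a mapping from posterior probability to an RGB value. The caption specifies only the two ends of the scale (green for high, red for low), so the linear interpolation below is an assumption, as are the function names `pp_to_rgb` and `pp_to_hex`; the actual Pfam colour scheme may differ.

```python
def pp_to_rgb(pp):
    """Map a posterior probability in [0, 1] to an (R, G, B) tuple.

    Assumed scheme: linear blend from pure red (pp = 0) to pure green (pp = 1).
    """
    if not 0.0 <= pp <= 1.0:
        raise ValueError(f"posterior probability out of range: {pp}")
    red = round(255 * (1.0 - pp))
    green = round(255 * pp)
    return (red, green, 0)


def pp_to_hex(pp):
    """Same mapping, formatted as a CSS-style hex colour."""
    red, green, blue = pp_to_rgb(pp)
    return f"#{red:02x}{green:02x}{blue:02x}"


# Invented per-residue posterior probabilities for a short aligned segment.
for residue, pp in zip("MKVL", [1.0, 0.9, 0.4, 0.0]):
    print(residue, pp, pp_to_hex(pp))
```

A continuous blend keeps small differences in confidence visible; a display could equally bin the probabilities into a few discrete colours.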

Figure 4.

New BioLit/TOPSAN views. Left: using the web services provided by BioLit, we display the abstract, figures and figure legends from the publication associated with a particular PDB entry (only where the article is published in an open access journal). In this case, we have retrieved open access articles that reference the PDB entry 1dan. Right: using the web services provided by TOPSAN, we display images and text from the TOPSAN wiki, together with a link so that users can contribute to the wiki. In this example, we show the information contained in TOPSAN describing PDB entry 1kq3.
