CDD: a Conserved Domain Database for protein classification

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D192-6.

doi: 10.1093/nar/gki069.

Aron Marchler-Bauer, John B Anderson, Praveen F Cherukuri, Carol DeWeese-Scott, Lewis Y Geer, Marc Gwadz, Siqian He, David I Hurwitz, John D Jackson, Zhaoxi Ke, Christopher J Lanczycki, Cynthia A Liebert, Chunlei Liu, Fu Lu, Gabriele H Marchler, Mikhail Mullokandov, Benjamin A Shoemaker, Vahan Simonyan, James S Song, Paul A Thiessen, Roxanne A Yamashita, Jodie J Yin, Dachuan Zhang, Stephen H Bryant

Aron Marchler-Bauer et al. Nucleic Acids Res. 2005.

Abstract

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models, and computational annotation visible upon request. Protein-protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for the presence of conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.

Figures

Figure 1

Pre-calculated or live CD-Search results are readily available for protein sequences in Entrez. Clicking on the colored bars will launch alignment displays that merge the query into the domain alignment model, for further analysis. Domain annotation bars with identical colors have been grouped into sets of ‘related’ domains, indicating that they share many of the sequence intervals hit with significant _E_-values. Annotation bars colored in gray have been classified as putative multi-domain models and are excluded from domain–domain neighboring. The lower half of the figure displays a graphical representation of a domain family hierarchy, giving the summary for one particular member (cd01366).

Figure 2

Subfamily hierarchy of the Myosin/Kinesin motor domains, the corresponding sequence tree and taxonomy display. One subfamily has been highlighted (KISc_C_terminal), and the highlights are reflected in both the sequence tree and taxonomy view. It is evident that members of this subfamily form a distinct node in this tree calculated by the neighbor-joining algorithm. It is also evident that members of this subfamily span a variety of taxa, suggesting that this particular type of domain was already present in their common ancestor's genome.

References

    1. Bateman A., Coin,L., Durbin,R., Finn,R.D., Hollich,V., Griffiths-Jones,S., Khanna,A., Marshall,M., Moxon,S., Sonnhammer,E.L., Studholme,D.J., Yeats,C. and Eddy,S.R. (2004) The Pfam protein families database. Nucleic Acids Res., 32, 138–141.
    2. Letunic I., Copley,R.R., Schmidt,S., Ciccarelli,F.D., Doerks,T., Schultz,J., Ponting,C.P. and Bork,P. (2004) SMART 4.0: towards genomic data integration. Nucleic Acids Res., 32, 142–144.
    3. Tatusov R.L., Fedorova,N.D., Jackson,J.D., Jacobs,A.R., Kiryutin,B., Koonin,E.V., Krylov,D.M., Mazumder,R., Mekhedov,S.L., Nikolskaya,A.N., Rao,B.S., Smirnov,S., Sverdlov,A.V., Vasudevan,S., Wolf,Y.I., Yin,J.J. and Natale,D.A. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4, 41.
    4. Marchler-Bauer A. and Bryant,S.H. (2004) CD-Search: protein domain annotations on the fly. Nucleic Acids Res., 32, W327–W331.
    5. Marchler-Bauer A., Panchenko,A.R., Ariel,N. and Bryant,S.H. (2002) Comparison of sequence and structure alignments for protein domains. Proteins, 48, 439–446.
