Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

Nucleic Acids Res. 2018 Jan 4;46(D1):D221-D228.

doi: 10.1093/nar/gkx1031.

Nuala A O'Leary 1, Catherine M Farrell 1, Jane E Loveland 2, Jonathan M Mudge 2, Craig Wallin 1, Carlos G Girón 2, Mark Diekhans 3, If Barnes 2, Ruth Bennett 2, Andrew E Berry 2, Eric Cox 1, Claire Davidson 2, Tamara Goldfarb 1, Jose M Gonzalez 2, Toby Hunt 2, John Jackson 1, Vinita Joardar 1, Mike P Kay 2, Vamsi K Kodali 1, Fergal J Martin 2, Monica McAndrews 4, Kelly M McGarvey 1, Michael Murphy 1, Bhanu Rajput 1, Sanjida H Rangwala 1, Lillian D Riddick 1, Ruth L Seal 5, Marie-Marthe Suner 2, David Webb 1, Sophia Zhu 4, Bronwen L Aken 2, Elspeth A Bruford 5, Carol J Bult 4, Adam Frankish 2, Terence Murphy 1, Kim D Pruitt 1

PMID: 29126148
PMCID: PMC5753299

Shashikant Pujar et al. Nucleic Acids Res. 2018.

Abstract

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.

Published by Oxford University Press on behalf of Nucleic Acids Research 2017.

Figures

Figure 1.

Number of CCDS IDs and genes represented in the human (A) and mouse (B) CCDS releases. The _X_-axis indicates the year in which a CCDS dataset was made public. Details about CCDS releases are available on the CCDS Releases and Statistics web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS).

Figure 2.

Fraction of all genes in a CCDS release that are represented by at least two current CCDS IDs.
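
The quantity plotted in Figure 2 can be illustrated with a minimal sketch. It assumes a release has already been loaded as (gene, CCDS ID) pairs; the function name `fraction_multi_ccds`, the gene symbols and the accessions below are hypothetical placeholders, not values from a CCDS release or part of any CCDS tool.

```python
from collections import defaultdict


def fraction_multi_ccds(gene_ccds_pairs):
    """Return the fraction of genes represented by at least two CCDS IDs.

    gene_ccds_pairs: iterable of (gene, ccds_accession) tuples, where the
    accession may carry a version suffix such as "CCDS3542.1".
    """
    ids_by_gene = defaultdict(set)
    for gene, accession in gene_ccds_pairs:
        # Drop the version so two versions of one CCDS ID count only once.
        ids_by_gene[gene].add(accession.split(".")[0])
    if not ids_by_gene:
        return 0.0
    multi = sum(1 for ids in ids_by_gene.values() if len(ids) >= 2)
    return multi / len(ids_by_gene)


# Made-up records for illustration only (assumption, not real CCDS data).
example_pairs = [
    ("GENE_A", "CCDS1.1"),
    ("GENE_A", "CCDS2.1"),
    ("GENE_B", "CCDS3.1"),
    ("GENE_B", "CCDS3.2"),
]
print(fraction_multi_ccds(example_pairs))
```

Grouping by gene before counting keeps the denominator equal to the number of genes in the release, which is what the caption describes.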

Figure 3.

Changes in the human (A) and mouse (B) datasets with every new CCDS release. ‘New’ = new CCDS IDs added; ‘dropped’ = CCDS ID present in the previous release but withdrawn in the subsequent release; ‘updated’ = CCDS IDs that have an incremented accession version compared to the previous release, indicating a sequence update in the coding region.
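
The three categories in Figure 3 can be sketched as a comparison of two releases. This is an illustration under stated assumptions, not the pipeline used by the CCDS collaboration: each release is given as a list of versioned accessions of the form `CCDS<number>.<version>` containing only the IDs current in that release, and the function names and accessions are hypothetical.

```python
def parse_accession(accession):
    """Split a versioned accession such as "CCDS3542.1" into (ID, version).

    Raises ValueError if the version suffix is missing or not an integer.
    """
    base, _, version = accession.partition(".")
    return base, int(version)


def classify_changes(previous, current):
    """Classify CCDS IDs as new, dropped or updated between two releases."""
    prev = dict(parse_accession(a) for a in previous)
    curr = dict(parse_accession(a) for a in current)
    shared = set(prev) & set(curr)
    return {
        # In the current release but absent from the previous one.
        "new": sorted(set(curr) - set(prev)),
        # In the previous release but absent from the current one.
        "dropped": sorted(set(prev) - set(curr)),
        # In both releases, with an incremented accession version.
        "updated": sorted(b for b in shared if curr[b] > prev[b]),
    }


# Made-up accessions for illustration only (assumption, not real data).
previous_release = ["CCDS1.1", "CCDS2.1", "CCDS3.1"]
current_release = ["CCDS1.1", "CCDS2.2", "CCDS4.1"]
print(classify_changes(previous_release, current_release))
```

Comparing on the unversioned ID first and the version second keeps an 'updated' CCDS ID from being miscounted as one 'dropped' and one 'new' entry.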

Figure 4.

A view of the graphical display accessed from the report page of CCDS3542.1 (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=ALLFIELDS&DATA=CCDS3542&ORGANISM=0&BUILDS=CURRENTBUILDS) using the purple ‘S’ icon. (A) Transcripts and proteins from NCBI Annotation Release 108. (B) Transcripts and proteins from Ensembl Release 85. The green bar indicates the gene; transcripts are shown in purple and proteins in red. Positioning the cursor over any of these objects (gene, transcript or protein) opens a tool tip that includes additional information and links. Proteins in the NCBI annotation display that are in the CCDS set include a link to the CCDS ID in the tool tip. The gray box to the right (indicated by the vertical arrow) is the tool tip corresponding to the protein accession NP_002514.1. Differences between any two objects can also be revealed as vertical lines (indicated by horizontal arrows) when the objects (NM_002523.2 and ENST00000265634 in the figure) are selected using the ‘Control’ or ‘Command’ key on the keyboard.

Figure 5.

Distribution of human and mouse CCDS IDs by their ‘Review status’ in the current human (Release 20) and mouse (Release 21) CCDS releases at the time of data freeze. Details of the review status categories and sub-categories are provided in Table 1. Reviewed 1 = CCDS IDs reviewed ‘by RefSeq and HAVANA’, Reviewed 2 = CCDS IDs reviewed ‘by CCDS collaboration’, Reviewed 3 = CCDS IDs reviewed ‘by RefSeq, HAVANA and CCDS collaboration’.
