Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation
Nucleic Acids Res. 2018 Jan 4;46(D1):D221-D228.
doi: 10.1093/nar/gkx1031.
Shashikant Pujar 1, Nuala A O'Leary 1, Catherine M Farrell 1, Jane E Loveland 2, Jonathan M Mudge 2, Craig Wallin 1, Carlos G Girón 2, Mark Diekhans 3, If Barnes 2, Ruth Bennett 2, Andrew E Berry 2, Eric Cox 1, Claire Davidson 2, Tamara Goldfarb 1, Jose M Gonzalez 2, Toby Hunt 2, John Jackson 1, Vinita Joardar 1, Mike P Kay 2, Vamsi K Kodali 1, Fergal J Martin 2, Monica McAndrews 4, Kelly M McGarvey 1, Michael Murphy 1, Bhanu Rajput 1, Sanjida H Rangwala 1, Lillian D Riddick 1, Ruth L Seal 5, Marie-Marthe Suner 2, David Webb 1, Sophia Zhu 4, Bronwen L Aken 2, Elspeth A Bruford 5, Carol J Bult 4, Adam Frankish 2, Terence Murphy 1, Kim D Pruitt 1
- PMID: 29126148
- PMCID: PMC5753299
Shashikant Pujar et al. Nucleic Acids Res. 2018.
Abstract
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.
Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
Figures
Figure 1.
Number of CCDS IDs and genes represented in the human (A) and mouse (B) CCDS releases. The X-axis indicates the year in which a CCDS dataset was made public. Details about CCDS releases are available on the CCDS Releases and Statistics web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS).
Figure 2.
Fraction of all genes in a CCDS release that are represented by at least two current CCDS IDs.
Figure 3.
Changes in the human (A) and mouse (B) datasets with every new CCDS release. ‘New’ = new CCDS IDs added; ‘dropped’ = CCDS ID present in the previous release but withdrawn in the subsequent release; ‘updated’ = CCDS IDs that have an incremented accession version compared to the previous release, indicating a sequence update in the coding region.
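The three categories in this caption can be illustrated with a minimal Python sketch. It assumes each release is available as a plain list of accession strings of the form `CCDS<number>.<version>`; the function names and the example accessions are hypothetical and are not taken from the CCDS pipeline.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ReleaseDiff(NamedTuple):
    new: List[str]      # IDs present only in the current release
    dropped: List[str]  # IDs present only in the previous release
    updated: List[str]  # IDs in both releases with a higher version now


def split_accession(accession: str) -> Tuple[str, int]:
    """Split 'CCDS123.2' into ('CCDS123', 2)."""
    ccds_id, _, version = accession.partition(".")
    return ccds_id, int(version)


def diff_releases(previous: Iterable[str], current: Iterable[str]) -> ReleaseDiff:
    """Classify CCDS IDs as new, dropped or updated between two releases."""
    prev: Dict[str, int] = dict(split_accession(a) for a in previous)
    curr: Dict[str, int] = dict(split_accession(a) for a in current)
    new = sorted(curr.keys() - prev.keys())
    dropped = sorted(prev.keys() - curr.keys())
    # An incremented accession version indicates a coding-sequence update.
    updated = sorted(i for i in curr.keys() & prev.keys() if curr[i] > prev[i])
    return ReleaseDiff(new, dropped, updated)


# Hypothetical accessions, for illustration only.
previous_release = ["CCDS1.1", "CCDS2.1", "CCDS3.2"]
current_release = ["CCDS1.2", "CCDS3.2", "CCDS4.1"]
print(diff_releases(previous_release, current_release))
```

The sketch treats an ID that is absent from the later list as dropped. A real release file may instead keep withdrawn IDs with a status flag, in which case the input lists would need to be filtered first.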
Figure 4.
A view of the graphical display accessed from the report page of CCDS3542.1 using the purple ‘S’ icon. (A) Transcripts and proteins from NCBI Annotation Release 108. (B) Transcripts and proteins from Ensembl Release 85. The green bar indicates the gene; transcripts are shown in purple and proteins in red. Positioning the cursor over any of these objects (gene, transcript or protein) opens a tool tip that includes additional information and links. Proteins in the NCBI annotation display that are in the CCDS set include a link to the CCDS ID in the tool tip. The gray box to the right (indicated by the vertical arrow) is the tool tip corresponding to the protein accession NP_002514.1. Differences between any two objects can also be revealed as vertical lines (indicated by horizontal arrows) when the objects (NM_002523.2 and ENST00000265634 in the figure) are selected while holding the ‘Control’ or ‘Command’ key on the keyboard.
Figure 5.
Distribution of human and mouse CCDS IDs by their ‘Review status’ in the current human (Release 20) and mouse (Release 21) CCDS releases at the time of data freeze. Details of the review status categories and sub-categories are provided in Table 1. Reviewed 1 = CCDS IDs reviewed ‘by RefSeq and HAVANA’, Reviewed 2 = CCDS IDs reviewed ‘by CCDS collaboration’, Reviewed 3 = CCDS IDs reviewed ‘by RefSeq, HAVANA and CCDS collaboration’.