AceView: a comprehensive cDNA-supported gene and transcripts annotation - PubMed

AceView: a comprehensive cDNA-supported gene and transcripts annotation

Danielle Thierry-Mieg et al. Genome Biol. 2006.

Abstract

Background: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants.

Results: We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode.
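The idea of treating each mRNA as a chain of introns and pairing models so that each is used only once can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 1-based inclusive coordinates, and the greedy ranking (identical chains first, then pairs sharing the most introns) are all assumptions made for the example.

```python
from itertools import product


def intron_chain(exons):
    """Return a transcript's intron chain as a tuple of (start, end) pairs.

    `exons` is a list of (start, end) intervals on one strand of one
    chromosome (assumed 1-based, inclusive). Each intron spans the gap
    between two consecutive exons.
    """
    ordered = sorted(exons)
    return tuple(
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    )


def match_one_to_one(reference, predicted):
    """Greedily pair transcripts by shared introns, using each model once.

    Both arguments map a transcript name to its exon list. Returns a list
    of (ref_name, pred_name, shared_introns, identical) tuples. Single-exon
    models have an empty chain and are never paired in this sketch.
    """
    ref_chains = {name: intron_chain(ex) for name, ex in reference.items()}
    pred_chains = {name: intron_chain(ex) for name, ex in predicted.items()}

    candidates = []
    for (r, rc), (p, pc) in product(ref_chains.items(), pred_chains.items()):
        shared = len(set(rc) & set(pc))
        if shared:
            candidates.append((rc == pc, shared, r, p))

    # Assumed ranking: identical chains first, then the most shared introns.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)

    used_ref, used_pred, pairs = set(), set(), []
    for identical, shared, r, p in candidates:
        if r in used_ref or p in used_pred:
            continue  # each mRNA model is paired at most once
        used_ref.add(r)
        used_pred.add(p)
        pairs.append((r, p, shared, identical))
    return pairs
```

Because only intron coordinates enter the signature, two models that differ in the outer ends of their first or last exon still count as identical here, which is one plausible reading of comparing transcripts "as chains of introns".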

Conclusion: Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).
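As a rough illustration of how such figures can be computed, the sketch below scores one track against a reference catalog of transcript signatures. It assumes the convention common in gene-prediction evaluation, where sensitivity is the fraction of reference models found and specificity is the fraction of predicted models that are in the reference; the abstract does not spell out its definitions, so treat both the convention and the function name as assumptions. The inputs in the usage note are toy values, not data from the paper.

```python
def sensitivity_specificity(reference, predicted):
    """Score predicted transcript signatures against a reference catalog.

    Both arguments are iterables of hashable signatures (for example intron
    chains). Returns (sensitivity, specificity), where specificity follows
    the assumed gene-prediction convention TP / (TP + FP).
    """
    ref, pred = set(reference), set(predicted)
    if not ref or not pred:
        raise ValueError("both catalogs must contain at least one model")
    true_positives = len(ref & pred)
    return true_positives / len(ref), true_positives / len(pred)
```

For example, with a toy reference of four signatures `{"a", "b", "c", "d"}` and a toy track predicting `{"a", "b", "e"}`, two of four reference models are found and two of three predictions are correct.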


Figures

Figure 1

Comparison of introns between the Gencode reference and the 24 tracks, ordered by decreasing sensitivity, over the 31 test regions. Gencode validates 3,618 unique introns and a total of 9,693 introns in its alternative transcripts. (a) Projected measure: each intron is counted only once per method. Introns with the same coordinates as Gencode introns are shown in green and novel introns in red. The Gencode introns missed in each track (false negative) correspond to the distance between the 'true positive' bar and the Gencode reference, but are not explicitly represented. (b) Quantitative measure: all alternative variants are counted separately. Introns identical to Gencode introns, but over-used relative to Gencode are counted (in yellow) separately from novel introns that are not known to Gencode.
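The difference between the projected and quantitative measures can be made concrete with a small sketch. It is only one plausible reading of the caption: here an intron is "over-used" when a track uses a Gencode intron in more transcripts than Gencode does, and the function name and return layout are hypothetical.

```python
from collections import Counter


def intron_counts(reference_transcripts, track_transcripts):
    """Count introns under the projected and quantitative measures.

    Each transcript is a list of (start, end) introns. Returns two dicts:
    projected counts (each distinct intron once per method) and
    quantitative counts (every use in every alternative variant).
    """
    ref_usage = Counter(i for t in reference_transcripts for i in t)
    trk_usage = Counter(i for t in track_transcripts for i in t)

    projected = {
        "shared": len(ref_usage.keys() & trk_usage.keys()),
        "novel": len(trk_usage.keys() - ref_usage.keys()),
        "missed": len(ref_usage.keys() - trk_usage.keys()),
    }

    quantitative = {"shared": 0, "over_used": 0, "novel": 0}
    for intron, n_track in trk_usage.items():
        n_ref = ref_usage.get(intron, 0)
        if n_ref == 0:
            quantitative["novel"] += n_track
        else:
            # Uses up to the reference count are shared; the excess is
            # treated as over-use (an assumption of this sketch).
            quantitative["shared"] += min(n_track, n_ref)
            quantitative["over_used"] += max(n_track - n_ref, 0)
    return projected, quantitative
```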

Figure 2

Comparison of whole transcripts. (a) Strategy for selecting the best one to one matching pairs. (b) Comparison of whole transcripts through their intron signatures. The number of transcripts identical to Gencode, best-matching but different from Gencode, new transcripts in Gencode genes and new transcripts in new genes are represented.

Figure 3

Consensus analysis. (a) Sensitivity and specificity at identifying 1,556 consensus transcripts from the pool of the following evidence-based tracks: RefSeq, Known Gene, Ensembl, Gencode, AceView, ECgene and ExonWalk. The sensitivity and specificity of all tracks at identifying these consensus models is plotted and listed in Table 2. (b) Closest neighbor consensus, evaluated by switching the track of reference. This figure shows the number of evidence-based models from CCDS, RefSeq, UCSC Known Genes, Gencode, AceView, ExonWalk and Ensembl whose intron-exon structure is exactly matched by the 25 tracks. Tracks are arranged in decreasing order of averaged detection sensitivity, defined here as the sum of all evidence-based models from these seven reference tracks detected exactly.

Cited by

The uniqueness of ABCB5 as a full transporter ABCB5FL and a half-transporter-like ABCB5β.
Gerard L, Gillet JP. Cancer Drug Resist. 2024 Aug 7;7:29. doi: 10.20517/cdr.2024.56. eCollection 2024. PMID: 39267923 Free PMC article. Review.
Discovery of NRG1-VII: the myeloid-derived class of NRG1.
Berrocal-Rubio MA, Pawer YDJ, Dinevska M, De Paoli-Iseppi R, Widodo SS, Gleeson J, Rajab N, De Nardo W, Hallab J, Li A, Mantamadiotis T, Clark MB, Wells CA. BMC Genomics. 2024 Aug 29;25(1):814. doi: 10.1186/s12864-024-10723-2. PMID: 39210279 Free PMC article.
Adaptive Evolution and Functional Differentiation of Testis-Expressed Genes in Theria.
Katsura Y, Shigenobu S, Satta Y. Animals (Basel). 2024 Aug 9;14(16):2316. doi: 10.3390/ani14162316. PMID: 39199849 Free PMC article.
Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing.
Gong B, Li D, Łabaj PP, Pan B, Novoradovskaya N, Thierry-Mieg D, Thierry-Mieg J, Chen G, Bergstrom Lucas A, LoCoco JS, Richmond TA, Tseng E, Kusko R, Happe S, Mercer TR, Pabón-Peña C, Salmans M, Tilgner HU, Xiao W, Johann DJ Jr, Jones W, Tong W, Mason CE, Kreil DP, Xu J. Sci Data. 2024 Aug 16;11(1):892. doi: 10.1038/s41597-024-03741-y. PMID: 39152166 Free PMC article.
IL-2Rα KO mice exhibit maternal microchimerism and reveal nuclear localization of IL-2Rα in lymphoid and non-lymphoid cells.
Wong VA, Dinh KN, Chen G, Wrenshall LE. Front Immunol. 2024 May 15;15:1369818. doi: 10.3389/fimmu.2024.1369818. eCollection 2024. PMID: 38812502 Free PMC article.

