Toward the development of a gene index to the human genome: an assessment of the nature of high-throughput EST sequence data
Case Reports
doi: 10.1101/gr.6.9.829.
- PMID: 8889550
Free article
J S Aaronson et al. Genome Res. 1996 Sep.
Abstract
A rigorous analysis of the Merck-sponsored EST data with respect to known gene sequences increases the utility of the data set and helps refine methods for building a gene index. A highly curated human transcript data base was used as a reference data set of known genes. A detailed analysis of EST sequences derived from known genes was performed to assess the accuracy of EST sequence annotation. The EST data was screened to remove low-quality and low-complexity sequences. A set of high-quality ESTs similar to the transcript data base was identified using BLAST; this subset of ESTs was compared with the set of known genes using the Smith-Waterman algorithm. Error rates of several types were assessed based on a flexible match criterion defining sequence identity. The rate of lane-tracking errors is very low, approximately 0.5%. Insert size data is accurate within approximately 20%. Reversed clone and internal priming error rates are approximately 5% and 2.5%, respectively, contributing to the incorrect identification of reads as 3' ends of genes. Follow-up investigation reveals that a significant number of clones, miscategorized as reversed, represent overlapping genes on the opposite strand of entries in the transcript data base. Relevance of these results to the creation of a high-quality index to the human genome capable of supporting diverse genomic investigations is discussed.
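The screening and alignment steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state how low-quality or low-complexity reads were detected, nor which scoring parameters were used, so the function names (`shannon_entropy`, `passes_screen`, `smith_waterman_score`), the thresholds, and the match/mismatch/gap values below are all assumptions chosen for illustration. The alignment function is a plain Smith-Waterman local alignment with a linear gap penalty.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Per-base Shannon entropy in bits (0 for a homopolymer, 2 at most for A/C/G/T)."""
    counts = Counter(seq.upper())
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_screen(seq: str, max_n_fraction: float = 0.05, min_entropy: float = 1.5) -> bool:
    """Hypothetical screen: reject empty reads, reads with many ambiguous
    bases (N), and reads of low sequence complexity. Thresholds are assumed."""
    if not seq:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return n_fraction <= max_n_fraction and shannon_entropy(seq) >= min_entropy


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (Smith-Waterman, linear gaps).
    Scoring values are illustrative assumptions, not taken from the paper."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

In a workflow of the kind the abstract describes, a read would first be checked with something like `passes_screen`, then compared against candidate transcripts found by a faster search, with the local alignment score (or an identity measure derived from the alignment) feeding a match criterion. How that criterion was actually defined is not given in the abstract.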
Similar articles
- A comparison of expressed sequence tags (ESTs) to human genomic sequences.
Wolfsberg TG, Landsman D. Nucleic Acids Res. 1997 Apr 15;25(8):1626-32. doi: 10.1093/nar/25.8.1626. PMID: 9092672. Free PMC article.
- Analysis of EST-driven gene annotation in human genomic sequence.
Bailey LC Jr, Searls DB, Overton GC. Genome Res. 1998 Apr;8(4):362-76. doi: 10.1101/gr.8.4.362. PMID: 9548972.
- It's the genes! EST access to human genome content.
Gerhold D, Caskey CT. Bioessays. 1996 Dec;18(12):973-81. doi: 10.1002/bies.950181207. PMID: 8976154. Review.
- [Anatomy of EST data].
Mizuno K, Okubo K. Tanpakushitsu Kakusan Koso. 1997 Dec;42(17 Suppl):2814-21. PMID: 9455198. Review. Japanese. No abstract available.
Cited by
- Template-switching artifacts resemble alternative polyadenylation.
Balázs Z, Tombácz D, Csabai Z, Moldován N, Snyder M, Boldogkői Z. BMC Genomics. 2019 Nov 8;20(1):824. doi: 10.1186/s12864-019-6199-7. PMID: 31703623. Free PMC article.
- Overview of DNA microarrays: types, applications, and their future.
Bumgarner R. Curr Protoc Mol Biol. 2013 Jan;Chapter 22:Unit 22.1. doi: 10.1002/0471142727.mb2201s101. PMID: 23288464. Free PMC article.
- The utility of mass spectrometry-based proteomic data for validation of novel alternative splice forms reconstructed from RNA-Seq data: a preliminary assessment.
Ning K, Nesvizhskii AI. BMC Bioinformatics. 2010 Dec 14;11 Suppl 11(Suppl 11):S14. doi: 10.1186/1471-2105-11-S11-S14. PMID: 21172049. Free PMC article.
- A multiway analysis for identifying high integrity bovine BACs.
Ratnakumar A, Barris W, McWilliam S, Brauning R, McEwan JC, Snelling WM, Dalrymple BP. BMC Genomics. 2009 Jan 23;10:46. doi: 10.1186/1471-2164-10-46. PMID: 19166603. Free PMC article.
- Exploring the transcriptome of the burrowing nematode Radopholus similis.
Jacob J, Mitreva M, Vanholme B, Gheysen G. Mol Genet Genomics. 2008 Jul;280(1):1-17. doi: 10.1007/s00438-008-0340-7. Epub 2008 Apr 2. PMID: 18386064.