A beginner's guide to eukaryotic genome annotation
Adams, M. D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
Celniker, S. E. et al. Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 3, research0079 (2002).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5, 621–628 (2008).
Wang, E. T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008). This paper provides one of the most extensively documented surveys of alternatively spliced transcripts. It is a key publication for understanding how extensive alternative splicing is in human tissues, for understanding how powerful RNA-seq data are as a tool for discovering new transcripts and for quantifying their abundance and differential expression patterns.
Chain, P. S. et al. Genomics. Genome project standards in a new era of sequencing. Science 326, 236–237 (2009).
Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res. 18, 188–196 (2008).
Ye, L. et al. A vertebrate case study of the quality of assemblies derived from next-generation sequences. Genome Biol. 12, R31 (2011).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Tsai, I. J., Otto, T. D. & Berriman, M. Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps. Genome Biol. 11, R41 (2010).
Assefa, S., Keane, T. M., Otto, T. D., Newbold, C. & Berriman, M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics 25, 1968–1969 (2009).
Husemann, P. & Stoye, J. r2cat: synteny plots and comparative assembly. Bioinformatics 26, 570–571 (2010).
Kapitonov, V. V. & Jurka, J. A novel class of SINE elements derived from 5S rRNA. Mol. Biol. Evol. 20, 694–702 (2003).
Kapitonov, V. V. & Jurka, J. A universal classification of eukaryotic transposable elements implemented in Repbase. Nature Rev. Genet. 9, 411–412; author reply 414 (2008).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Buisine, N., Quesneville, H. & Colot, V. Improved detection and annotation of transposable elements in sequenced genomes using multiple reference sequence sets. Genomics 91, 467–475 (2008).
Han, Y. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res. 38, e199 (2010).
McClure, M. A. et al. Automated characterization of potentially active retroid agents in the human genome. Genomics 85, 512–523 (2005).
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl. 1), i351–i358 (2005).
Morgulis, A., Gertz, E. M., Schaffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).
Treangen, T. J. & Salzberg, S. L. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Rev. Genet. 13, 36–46 (2012).
Bergman, C. M. & Quesneville, H. Discovering and detecting transposable elements in genome sequences. Brief. Bioinform. 8, 382–392 (2007).
Cordaux, R. & Batzer, M. A. The impact of retrotransposons on human genome evolution. Nature Rev. Genet. 10, 691–703 (2009).
Smit, A. F., Hubley, R. & Green, P. RepeatMasker 3.0. repeatmasker.org [online] (1996–2010).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Korf, I., Yandell, M. & Bedell, J. BLAST: an Essential Guide to the Basic Local Alignment Search Tool 339 (O'Reilly & Associates, 2003). Everyone involved with a genome project should be familiar with BLAST. Altschul et al. (1990) is the original paper describing this tool; this reference is an entire book describing BLAST and how it is used.
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Green, P. Crossmatch. A general purpose utility for comparing any two sets of DNA sequences. PHRAP [online] (1993–1996).
Majoros, W. H. Methods for Computational Gene Prediction 2 (Cambridge Univ. Press, 2007).
Bairoch, A., Boeckmann, B., Ferro, S. & Gasteiger, E. Swiss-Prot: juggling between evolution and stability. Brief. Bioinform. 5, 39–55 (2004).
Boeckmann, B. et al. Protein variety and functional diversity: Swiss-Prot annotation in its biological context. C. R. Biol. 328, 882–899 (2005).
The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 39, D214–D219 (2011).
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 37, D26–D31 (2009).
Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 37, D5–D15 (2009).
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Kapustin, Y., Souvorov, A., Tatusova, T. & Lipman, D. Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct 3, 20 (2008).
Wheelan, S. J., Church, D. M. & Ostell, J. M. Spidey: a tool for mRNA-to-genomic alignments. Genome Res. 11, 1952–1957 (2001).
Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M. & Miller, W. A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res. 8, 967–974 (1998).
Garber, M., Grabherr, M. G., Guttman, M. & Trapnell, C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nature Methods 8, 469–477 (2011).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nature Biotech. 29, 644–652 (2011). This paper describes Trinity, a transcriptome assembler that was specifically designed for next-generation sequence data. It is required reading for anyone trying to use RNA-seq data for genome annotation.
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotech. 28, 503–510 (2010).
Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotech. 28, 511–515 (2010).
Trapnell, C. et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protoc. 7, 562–578 (2012). This paper describes best practice approaches for combining TopHat and Cufflinks when using RNA-seq data for genome annotation.
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Guigo, R., Knudsen, S., Drake, N. & Smith, T. Prediction of gene structure. J. Mol. Biol. 226, 141–157 (1992).
Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames. Proc. Int. Conf. Intell. Syst. Mol. Biol. 2, 354–362 (1994).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997). This study describes the ab initio gene predictor GenScan. It is a classic paper that is full of informative explanations of the problems associated with eukaryotic gene prediction.
Reese, M. G., Kulp, D., Tammana, H. & Haussler, D. Genie – gene finding in Drosophila melanogaster. Genome Res. 10, 529–538 (2000).
Brent, M. R. Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res. 15, 1777–1786 (2005).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004). This paper describes a gene predictor, SNAP, that is easy to use and to configure. It also clearly explains the pitfalls that are associated with using a poorly trained gene finder or one that has been trained on a different genome from the one that is being annotated.
Reese, M. G. & Guigo, R. EGASP: Introduction. Genome Biol. 7 (Suppl. 1), 1–3 (2006). This is the introduction to an entire issue of Genome Biology that is dedicated to benchmarking an entire host of eukaryotic gene finders and annotation pipelines. Anyone involved with a genome annotation project should have a look at every paper in this special supplement.
Coghlan, A. et al. nGASP – the nematode genome annotation assessment project. BMC Bioinformatics 9, 549 (2008).
Guigo, R. & Reese, M. G. EGASP: collaboration through competition to find human genes. Nature Methods 2, 575–577 (2005).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19 (Suppl. 2), ii215–ii225 (2003).
Stanke, M., Schoffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006).
Lukashin, A. V. & Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26, 1107–1115 (1998).
Ter-Hovhannisyan, V., Lomsadze, A., Chernoff, Y. O. & Borodovsky, M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 18, 1979–1990 (2008).
Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 38, e132 (2010).
Korf, I., Flicek, P., Duan, D. & Brent, M. R. Integrating genomic homology into gene structure prediction. Bioinformatics 17, S140–S148 (2001).
Salamov, A. A. & Solovyev, V. V. Ab initio gene finding in Drosophila genomic DNA. Genome Res. 10, 516–522 (2000).
Souvorov, A. et al. Gnomon – the NCBI eukaryotic gene prediction tool. National Center for Biotechnology Information [online] (2010).
Howe, K. L., Chothia, T. & Durbin, R. GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Res. 12, 1418–1427 (2002).
Mungall, C. J. et al. An integrated computational pipeline and database to support whole-genome sequence annotation. Genome Biol. 3, research0081 (2002).
Misra, S. et al. Annotation of the Drosophila melanogaster euchromatic genome: a systematic review. Genome Biol. 3, research0083 (2002).
Yandell, M. et al. A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome. Proc. Natl Acad. Sci. USA 102, 1566–1571 (2005).
Allen, J. E. & Salzberg, S. L. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21, 3596–3603 (2005).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Liu, Q., Mackey, A. J., Roos, D. S. & Pereira, F. C. Evigan: a hidden variable model for integrating gene evidence for eukaryotic gene prediction. Bioinformatics 24, 597–605 (2008).
Haas, B. J., Zeng, Q., Pearson, M. D., Cuomo, C. A. & Wortman, J. R. Approaches to fungal genome annotation. Mycology 2, 118–141 (2011). This paper provides an excellent description of the process used by the Broad Institute for fungal annotation. It is also a good resource for those seeking to learn more about PASA; for more information about PASA, see Haas et al. (2003).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011). This study describes the database management and annotation quality-control tools for the MAKER2 genome annotation pipeline. It also explains many of the challenges that are associated with annotating novel genomes and how to overcome them.
Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).
Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).
Donlin, M. J. in Current Protocols in Bioinformatics Ch. 9, Unit 9.9 (2007).
Skinner, M. E., Uzilov, A. V., Stein, L. D., Mungall, C. J. & Holmes, I. H. JBrowse: a next-generation genome browser. Genome Res. 19, 1630–1638 (2009).
Zhou, P., Emmert, D. & Zhang, P. in Current Protocols in Bioinformatics Ch. 9, Unit 9.6 (2006).
Klimke, W. et al. Solving the problem: genome annotation standards before the data deluge. Stand. Genomic Sci. 5, 168–193 (2011).
Brister, J. R. et al. Towards viral genome annotation standards, report from the 2010 NCBI annotation workshop. Viruses 2, 2258–2268 (2010).
Madupu, R. et al. Meeting report: a workshop on best practices in genome annotation. Database 2010, baq001 (2010).
Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. 396, 59–70 (2007).
Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211–D222 (2010).
Holt, C. Tools and Techniques for Genome Annotation Analysis. Ph.D. thesis, Univ. Utah (2011).
Eilbeck, K., Moore, B., Holt, C. & Yandell, M. Quantitative measures for the management and comparison of annotated genomes. BMC Bioinformatics 10, 67 (2009). This paper describes a number of annotation quality-control measures, including annotation edit distance (AED). It also provides some interesting meta-analyses describing the impact of curation efforts on the gene annotations of several model organism databases over a period of several years.
Engels, R. Argo Genome Browser version 1.0.31. Broad Institute [online] (2010).
Rutherford, K. et al. Artemis: sequence visualization and annotation. Bioinformatics 16, 944–945 (2000).
Hartl, D. L. Fly meets shotgun: shotgun wins. Nature Genet. 24, 327–328 (2000).
Desk, B. H. Introduction to the standalone WWW Blast server. National Center for Biotechnology Information [online] (2002). This page explains how to use a suite of programs to set up a local Blast server for your local database.
Stein, L. D. et al. The generic genome browser: a building block for a model organism system database. Genome Res. 12, 1599–1610 (2002).
Munoz-Torres, M. C. et al. Hymenoptera Genome Database: integrated community resources for insect species of the order Hymenoptera. Nucleic Acids Res. 39, D658–D662 (2011).
Smith, C. D. et al. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). Proc. Natl Acad. Sci. USA 108, 5673–5678 (2011).
Suen, G. et al. The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet. 7, e1002007 (2011).
Nygaard, S. et al. The genome of the leaf-cutting ant Acromyrmex echinatior suggests key adaptations to advanced social life and fungus farming. Genome Res. 21, 1339–1348 (2011).
Curwen, V. et al. The Ensembl automatic gene annotation system. Genome Res. 14, 942–950 (2004). This paper describes the Ensembl genome annotation pipeline; although the article is now several years old, it is still a good place to start. We would recommend reading this paper and then browsing the extensive Ensembl web site for more information.
Youens-Clark, K. et al. Gramene database in 2010: updates and extensions. Nucleic Acids Res. 39, D1085–D1094 (2011).
Duvick, J. et al. PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res. 36, D959–D965 (2008).
Goodstein, D. M. et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 40, D1178–D1186 (2012).
Lawson, D. et al. VectorBase: a data resource for invertebrate vector genomics. Nucleic Acids Res. 37, D583–D587 (2009).
Karro, J. E. et al. Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res. 35, D55–D60 (2007).
Zheng, D. et al. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J. Mol. Biol. 349, 27–45 (2005).
Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Res. 31, 439–441 (2003).
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108 (2007).
Dolezel, J. & Bartos, J. Plant DNA flow cytometry and estimation of nuclear genome size. Ann. Botany 95, 99–110 (2005).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Schattner, P., Brooks, A. N. & Lowe, T. M. The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Res. 33, W686–W689 (2005).
Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P. & Burge, C. B. Prediction of mammalian microRNA targets. Cell 115, 787–798 (2003).
Eddy, S. R. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure. BMC Bioinformatics 3, 18 (2002).
Holmes, I. & Rubin, G. M. Pairwise RNA structure comparison with stochastic context-free grammars. Pac. Symp. Biocomput. 7, 163–174 (2002).
QIAGEN. Quick-Start Protocol miRNAeasy Mini Kit. QIAGEN [online] (2011).
Chen, C. et al. Real-time quantification of microRNAs by stem–loop RT-PCR. Nucleic Acids Res. 33, e179 (2005).
van Leeuwen, S. & Mikkers, H. Long non-coding RNAs: guardians of development. Differentiation 80, 175–183 (2010).
Hung, T. & Chang, H. Y. Long noncoding RNA in genome regulation: prospects and mechanisms. RNA Biol. 7, 582–585 (2010).
Tam, O. H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008).
Zhang, Z., Carriero, N. & Gerstein, M. Comparative analysis of processed pseudogenes in the mouse and human genomes. Trends Genet. 20, 62–67 (2004).
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Burset, M. & Guigo, R. Evaluation of gene structure prediction programs. Genomics 34, 353–367 (1996). This paper provides an excellent explanation of how sensitivity and specificity measures can be used to evaluate gene finder performance. This is a classic paper in the field and should be read by anyone involved in gene annotation.
Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A. & Nielsen, H. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 412–424 (2000).
Guigo, R. et al. EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biol. 7 (Suppl. 1), 1–31 (2006).
Schweikert, G. et al. mGene: accurate SVM-based gene finding with an application to nematode genomes. Genome Res. 19, 2133–2143 (2009).
Yeh, R. F., Lim, L. P. & Burge, C. B. Computational inference of homologous gene structures in the human genome. Genome Res. 11, 803–816 (2001).
Gross, S. S., Do, C. B., Sirota, M. & Batzoglou, S. CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biol. 8, R269 (2007).
Bernal, A., Crammer, K., Hatzigeorgiou, A. & Pereira, F. Global discriminative learning for higher-accuracy computational gene prediction. PLoS Comput. Biol. 3, e54 (2007).
Usuka, J., Zhu, W. & Brendel, V. Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16, 203–211 (2000).
Kiryutin, B. ProSplign. National Center for Biotechnology Information [online] (2011).
Wang, K. et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 38, e178 (2010).
Kitts, P. in The NCBI Handbook (eds McEntyre, J. & Ostell, J.) (National Center for Biotechnology Information, 2003).
Robinson, J. T. et al. Integrative genomics viewer. Nature Biotech. 29, 24–26 (2011).