AntiHunter 2.0: increased speed and sensitivity in searching BLAST output for EST antisense transcripts

Giovanni Lavorgna et al. Nucleic Acids Res. 2005.

Abstract

An increasing number of eukaryotic and prokaryotic genes are being found to have natural antisense transcripts (NATs). There is also growing evidence that antisense transcription could play a key role in many human diseases. Consequently, there have been several recent attempts to develop computational procedures for identifying novel NATs. Our group has developed the AntiHunter program to identify expressed sequence tag (EST) antisense transcripts from BLAST output. To perform an analysis, the program requires a genomic sequence plus an associated list of transcript names and their coordinates within the genomic region. After masking repeated regions, the program runs a BLASTN search of this sequence against the selected EST database and reports by email the EST entries that reveal an antisense transcript according to the user-supplied list. Here, we present version 2.0 of the AntiHunter tool, which includes several improvements aimed at detecting a larger number of antisense ESTs. As a result, AntiHunter can now detect, on average, >45% more antisense ESTs with little or no increase in the percentage of false positives. We have also raised the maximum query size to 3 Mb (previously 1 Mb). Moreover, we found that querying the database with MEGABLAST rather than BLASTN offers a reasonable trade-off between search sensitivity and the maximum allowed size of the input query sequence. We now offer this option to users: with MEGABLAST, users can submit a query sequence up to 30 Mb long, considerably extending the length of the regions that can be analyzed. The AntiHunter tool is freely available at http://bioinfo.crs4.it/AH2.0.
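The final reporting step described above, flagging ESTs that are antisense to a transcript in the user-supplied list, can be illustrated with a minimal sketch. It assumes each EST alignment has already been mapped to query coordinates and assigned a transcribed strand (for example from splice-site evidence), and it uses the common cis-NAT criterion: same genomic region, opposite strand. The record types `Transcript` and `EstHit` and the function `find_antisense` are hypothetical names for illustration, not AntiHunter's actual code or input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical entry of the user-supplied transcript list."""
    name: str
    start: int   # 1-based, inclusive, in query-sequence coordinates
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class EstHit:
    """Hypothetical EST alignment whose transcribed strand is assumed known."""
    est_id: str
    start: int
    end: int
    strand: str


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def find_antisense(hits, transcripts):
    """Return (est_id, transcript_name) pairs in which the EST overlaps a
    listed transcript and lies on the opposite strand."""
    pairs = []
    for hit in hits:
        for tx in transcripts:
            if hit.strand != tx.strand and overlaps(hit.start, hit.end, tx.start, tx.end):
                pairs.append((hit.est_id, tx.name))
    return pairs


if __name__ == "__main__":
    transcripts = [Transcript("GENE_A", 1000, 5000, "+")]
    hits = [
        EstHit("EST1", 4500, 5200, "-"),  # overlaps GENE_A, opposite strand
        EstHit("EST2", 2000, 2600, "+"),  # overlaps GENE_A, same strand
        EstHit("EST3", 6000, 6500, "-"),  # opposite strand, no overlap
    ]
    print(find_antisense(hits, transcripts))  # prints [('EST1', 'GENE_A')]
```

The nested loop is quadratic, which is acceptable for the handful of transcripts in a single query region; an interval tree would be the usual choice for genome-wide lists.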

Figures

Figure 1

Parameterizing the value of the constant ‘Bases_Searched_For_Splicing_Consensi’ in AntiHunter. The constant ‘Bases_Searched_For_Splicing_Consensi’ determines how many bases upstream and downstream of the edge of a BLAST alignment between a genomic and an EST sequence are searched for splicing consensi. In AntiHunter it was hard-coded to 5. This low value made it impossible to detect alignments like the one shown in (A), where up to 11 spurious bases (shown in boldface uppercase) are added at the edge of the alignment between a query genomic sequence from the MYCN locus (coordinates: chr2:16024168-16039977 from the hg17 release of the UCSC genome browser) and the EST AA609982. The specialized program SIM4 correctly detects the boundaries of this alignment, as shown in (B). In AntiHunter 2.0, this hard-coded constant has been parameterized so that users can experiment with it: AntiHunter correctly identifies the splicing consensi when the value is set greater than 11.
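The effect of the search window can be illustrated with a minimal sketch. Here `window` plays the role of ‘Bases_Searched_For_Splicing_Consensi’, and the hypothetical function `find_intron_shift` slides the putative intron between two consecutive alignment blocks by up to `window` bases, looking for the canonical GT...AG dinucleotides (plus strand) or their reverse complement CT...AC (minus strand) on the genomic sequence. It assumes the EST is contiguous across the gap, so both edges shift by the same amount, and it ignores non-canonical introns; it is not AntiHunter's actual implementation, whose window arithmetic may differ.

```python
def find_intron_shift(genome: str, block1_end: int, block2_start: int, window: int):
    """Search shifts d in [-window, window] for an intron at
    genome[block1_end + d : block2_start + d] that starts with GT and ends
    with AG (plus strand) or starts with CT and ends with AC (minus strand).

    Coordinates are 0-based: block1_end is the first base after the first
    aligned block, block2_start is the first base of the second block.
    Returns (shift, strand) for the smallest |shift| found, or None.
    """
    # Try small corrections first; for equal |d|, the negative shift comes first.
    for d in sorted(range(-window, window + 1), key=abs):
        i, j = block1_end + d, block2_start + d
        # Skip shifts that fall off the sequence or leave no room for both dinucleotides.
        if i < 0 or j > len(genome) or j - i < 4:
            continue
        donor, acceptor = genome[i:i + 2], genome[j - 2:j]
        if donor == "GT" and acceptor == "AG":
            return d, "+"
        if donor == "CT" and acceptor == "AC":
            return d, "-"
    return None


if __name__ == "__main__":
    # Toy genome: exon (20 nt) + intron GT...AG (24 nt) + exon (20 nt).
    genome = "A" * 20 + "GT" + "C" * 20 + "AG" + "A" * 20
    # Suppose the alignment edges were extended by 11 spurious bases.
    print(find_intron_shift(genome, 31, 55, window=5))   # prints None
    print(find_intron_shift(genome, 31, 55, window=12))  # prints (-11, '+')
```

With a window of 5 the consensus lies outside the searched range and the splice junction is missed; widening the window recovers it, mirroring the situation described for the MYCN/AA609982 alignment.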

Figure 2

Benchmarking the performance of AntiHunter 2.0. The ability of AntiHunter 2.0 to detect EST antisense transcripts was compared with that of the original AntiHunter on a test set of 15 genomic regions containing overlapping transcriptional units previously described in the literature in mammalian genomes. AntiHunter 2.0 detected significantly more antisense ESTs than the previous version of the program (272 versus 186). The newly detected ESTs belonged to six different genomic loci (ASE-1, RFPL3S, RFPL1, MYCN, FGF2 and THRA).
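As a quick consistency check, the pooled counts in this benchmark can be compared with the abstract's claim of >45% more antisense ESTs. Note that this pools all 15 regions, which is not necessarily the same as the per-region average the abstract refers to; `percent_increase` is a hypothetical helper.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100


gain = percent_increase(272, 186)  # AntiHunter 2.0 vs AntiHunter, pooled counts
print(f"{gain:.1f}% more antisense ESTs")  # prints "46.2% more antisense ESTs"
```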
