Biases in Illumina transcriptome sequencing caused by random hexamer priming
Kasper D Hansen et al. Nucleic Acids Res. 2010 Jul.
Abstract
Generation of cDNA using random hexamer priming induces biases in the nucleotide composition at the beginning of transcriptome sequencing reads from the Illumina Genome Analyzer. The bias is independent of organism and laboratory and impacts the uniformity of the reads along the transcriptome. We provide a read count reweighting scheme, based on the nucleotide frequencies of the reads, that mitigates the impact of the bias.
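The abstract describes the reweighting only in outline. The sketch below shows one way the idea could work, assuming reads are given as strings in read orientation. Each read is weighted by how common its first k-mer is further into the reads, divided by how common that k-mer is at the read start. Start k-mers that random hexamer priming over-represents are therefore down-weighted. The names `kmer_frequencies` and `read_weights`, the choice k = 6, the background window at positions 25–30 (borrowed from Figure 2b), and the fallback weight of 1.0 are illustrative assumptions, not the paper's exact parameters.

```python
from collections import Counter


def kmer_frequencies(reads, start, k):
    """Relative frequency of the k-mer at 1-based positions start..start+k-1 of each read."""
    end = start - 1 + k
    counts = Counter(r[start - 1:end] for r in reads if len(r) >= end)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def read_weights(reads, k=6, background_start=25):
    """One weight per read: background frequency / start frequency of its first k-mer.

    Assumptions (illustrative, not from the paper): k = 6, background k-mers
    taken at positions 25-30, and reads whose first k-mer never occurs in the
    background window (or that are shorter than k) keep weight 1.0.
    """
    start_freq = kmer_frequencies(reads, 1, k)
    background_freq = kmer_frequencies(reads, background_start, k)
    weights = []
    for r in reads:
        h = r[:k]
        if len(h) < k or h not in background_freq:
            weights.append(1.0)  # no usable estimate: leave the read unadjusted
        else:
            weights.append(background_freq[h] / start_freq[h])
    return weights
```

A reweighted coverage track would then add each read's weight, rather than 1, at every base the read covers.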
Figures
Figure 1.
Nucleotide frequencies versus position for stringently mapped reads. For each experiment, mapped reads were extended upstream of the 5′-start position, such that the first position of the actual read is 1 and positions 0 to −20 are obtained from the genome. The first hexamer of the read is shaded. Brief experimental protocols are indicated in the key. (a) RNA-Seq experiments conducted using priming with random hexamers, with and without RNA fragmentation. (b) DNA resequencing and ChIP-Seq experiments. (c) RNA-Seq experiments with alternative library preparation protocols, including priming with random hexamers followed by fragmentation using DNase I and priming with oligo(dT) followed by fragmentation using either DNase I, nebulization or sonication.
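A minimal sketch of the per-position tally behind a Figure 1-style plot is shown below. It assumes forward-strand alignments given as (read sequence, 0-based genome start) pairs. Each read is prefixed with the `upstream` genome bases that precede it, so the read's first base becomes position 1 and the genome-derived bases become positions 0, −1, …. The function name and input layout are hypothetical. Minus-strand reads would additionally need reverse-complementing, which is omitted here.

```python
from collections import Counter


def position_frequencies(genome, alignments, upstream=21):
    """Nucleotide frequencies by position for reads extended upstream with genome sequence.

    alignments: (read_seq, start) pairs, start = 0-based genome offset of the
    read's first base; forward strand only (an assumption of this sketch).
    With upstream=21, positions -20..0 come from the genome and 1.. from the read.
    """
    counts = {}
    for read, start in alignments:
        if start < upstream:
            continue  # too close to the sequence start to extend
        extended = genome[start - upstream:start] + read
        for i, base in enumerate(extended):
            position = i - upstream + 1  # first read base -> 1
            counts.setdefault(position, Counter())[base] += 1
    return {
        position: {base: c[base] / sum(c.values()) for base in "ACGT"}
        for position, c in sorted(counts.items())
    }
```

Plotting `freqs[p][base]` against `p` for each base gives one curve per nucleotide, as in each panel of the figure.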
Figure 2.
Hexamer frequencies. (a) The logarithm (base 2) of all (4096) observed hexamer frequencies computed using positions 1–6 of the aligned reads for an experiment in H. sapiens (8) versus an experiment in S. cerevisiae (9). The two distributions have a correlation of . (b) As in (a), but the hexamers correspond to positions 25–30 of the aligned reads, with a correlation of .
Figure 3.
Nucleotide frequencies versus position for stringently mapped stranded reads for the A nucleotide. (a and b) As in Figure 1a, but split according to whether reads map to the sense or antisense strand. (c) Difference between the frequencies in (a) and (b).
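Panel (c) amounts to a per-position subtraction. The sketch below assumes the sense and antisense reads have already been extended and aligned to a common position index, as in the Figure 1 procedure. It computes the fraction of sequences carrying an A at each position for each group and subtracts antisense from sense. The function names and the sign convention (sense minus antisense) are assumptions.

```python
def a_frequency(seqs):
    """Fraction of (pre-aligned) sequences carrying an A at each position."""
    length = min(len(s) for s in seqs)
    return [sum(s[i] == "A" for s in seqs) / len(seqs) for i in range(length)]


def sense_minus_antisense(sense_seqs, antisense_seqs):
    """Per-position difference in A frequency, sense minus antisense."""
    return [s - a for s, a in zip(a_frequency(sense_seqs), a_frequency(antisense_seqs))]
```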
Figure 4.
Evaluation of the reweighting scheme. (a and b) Unadjusted and reweighted base-level counts for reads from the WT experiment mapped to the sense strand of a 1-kb coding region in S. cerevisiae (YOL086C). The grey bars near the x-axis indicate unmappable genomic locations. (c) The goodness-of-fit statistics based on unadjusted and reweighted counts for 552 highly expressed regions of constant expression. (d) Smoothed histograms of the reduction in goodness-of-fit statistics when using the reweighting scheme, evaluated in five different experiments. Values greater than zero indicate that the reweighting scheme improves the uniformity of the read distribution.
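The caption does not say which goodness-of-fit statistic was used. The sketch below assumes Pearson's chi-square statistic of per-base counts against a uniform profile over the mappable positions of a region. The reduction is then the unadjusted statistic minus the reweighted one, so positive values mean the reweighted counts are more uniform, matching the sign convention in panel (d). The chi-square statistic grows with the total count, so the reweighted counts are rescaled to the unadjusted total before comparison. This rescaling, like the function names, is an assumption of the sketch.

```python
def _keep_mappable(counts, mappable):
    """Drop positions flagged as unmappable (mappable=None keeps everything)."""
    if mappable is None:
        return list(counts)
    return [c for c, keep in zip(counts, mappable) if keep]


def chi_square_uniform(values):
    """Pearson chi-square statistic of counts against their own mean (a flat profile)."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((v - mean) ** 2 / mean for v in values)


def gof_reduction(unadjusted, reweighted, mappable=None):
    """Statistic(unadjusted) - statistic(reweighted); > 0 means reweighting helped."""
    raw = _keep_mappable(unadjusted, mappable)
    adj = _keep_mappable(reweighted, mappable)
    scale = sum(raw) / sum(adj)  # put both profiles on the same total count
    adj = [v * scale for v in adj]
    return chi_square_uniform(raw) - chi_square_uniform(adj)
```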
Similar articles
- RNA sequencing and quantitation using the Helicos Genetic Analysis System.
Raz T, Causey M, Jones DR, Kieu A, Letovsky S, Lipson D, Thayer E, Thompson JF, Milos PM. Methods Mol Biol. 2011;733:37-49. doi: 10.1007/978-1-61779-089-8_3. PMID: 21431761.
- Consistent errors in first strand cDNA due to random hexamer mispriming.
van Gurp TP, McIntyre LM, Verhoeven KJ. PLoS One. 2013 Dec 30;8(12):e85583. doi: 10.1371/journal.pone.0085583. eCollection 2013. PMID: 24386481. Free PMC article.
- Characterizing the mouse ES cell transcriptome with Illumina sequencing.
Rosenkranz R, Borodina T, Lehrach H, Himmelbauer H. Genomics. 2008 Oct;92(4):187-94. doi: 10.1016/j.ygeno.2008.05.011. Epub 2008 Aug 3. PMID: 18602984.
- Substantial biases in ultra-short read data sets from high-throughput DNA sequencing.
Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Nucleic Acids Res. 2008 Sep;36(16):e105. doi: 10.1093/nar/gkn425. Epub 2008 Jul 26. PMID: 18660515. Free PMC article.
- De novo assembly of short sequence reads.
Paszkiewicz K, Studholme DJ. Brief Bioinform. 2010 Sep;11(5):457-72. doi: 10.1093/bib/bbq020. Epub 2010 Aug 19. PMID: 20724458. Review.
Cited by
- Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner.
Ji HJ, Pertea M. bioRxiv [Preprint]. 2024 Aug 17:2024.04.13.589356. doi: 10.1101/2024.04.13.589356. PMID: 39185147. Free PMC article. Preprint.
- Sequencing accuracy and systematic errors of nanopore direct RNA sequencing.
Liu-Wei W, van der Toorn W, Bohn P, Hölzer M, Smyth RP, von Kleist M. BMC Genomics. 2024 May 28;25(1):528. doi: 10.1186/s12864-024-10440-w. PMID: 38807060. Free PMC article.
- BEERS2: RNA-Seq simulation through high fidelity in silico modeling.
Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. Brief Bioinform. 2024 Mar 27;25(3):bbae164. doi: 10.1093/bib/bbae164. PMID: 38605641. Free PMC article.
- Identification of two major QTLs for pod shell thickness in peanut (Arachis hypogaea L.) using BSA-seq analysis.
Liu H, Zheng Z, Sun Z, Qi F, Wang J, Wang M, Dong W, Cui K, Zhao M, Wang X, Zhang M, Wu X, Wu Y, Luo D, Huang B, Zhang Z, Cao G, Zhang X. BMC Genomics. 2024 Jan 16;25(1):65. doi: 10.1186/s12864-024-10005-x. PMID: 38229017. Free PMC article.
- Differentially expressed platelet activation-related genes in dogs with stage B2 myxomatous mitral valve disease.
Zhou Q, Cui X, Zhou H, Guo S, Wu Z, Li L, Zhang J, Feng W, Guo Y, Ma X, Chen Y, Qiu C, Xu M, Deng G. BMC Vet Res. 2023 Dec 13;19(1):271. doi: 10.1186/s12917-023-03789-9. PMID: 38087280. Free PMC article.
References
- Mortazavi A, Williams B, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods. 2008;5:621–628.
- Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–277.