Phasing amplicon sequencing on Illumina Miseq for robust environmental microbial community analysis

Microbial Community Composition and Diversity via 16S rRNA Gene Amplicons: Evaluating the Illumina Platform

PLOS ONE, 2015

As new sequencing technologies become cheaper and older ones disappear, laboratories switch vendors and platforms. Validating the new setups is a crucial part of conducting rigorous scientific research. Here we report on the reliability and biases of performing bacterial 16S rRNA gene amplicon paired-end sequencing on the MiSeq Illumina platform. We designed a protocol using 50 barcode pairs to run samples in parallel and coded a pipeline to process the data. Sequencing the same sediment sample in 248 replicates as well as 70 samples from alkaline soda lakes, we evaluated the performance of the method with regard to estimates of alpha and beta diversity. Using different purification and DNA quantification procedures we always found up to 5-fold differences in the yield of sequences between individually barcoded samples. Using either a one-step or a two-step PCR preparation resulted in significantly different estimates in both alpha and beta diversity. Comparing with a previous method based on 454 pyrosequencing, we found that our Illumina protocol performed in a similar manner, with the exception of evenness estimates, where correspondence between the methods was low. We further quantified the data loss at every processing step, eventually accumulating to 50% of the raw reads. When evaluating different OTU clustering methods, we observed a stark contrast between the results of QIIME with default settings and the more recent UPARSE algorithm when it comes to the number of OTUs generated. Still, overall trends in alpha and beta diversity corresponded highly using both clustering methods. Our procedure performed well considering the precision of alpha and beta diversity estimates, with insignificant effects of individual barcodes. Comparative analyses suggest that 454 and Illumina sequence data can be combined if the same PCR protocol and bioinformatic workflows are used for describing patterns in richness, beta-diversity and taxonomic composition.

Species Identification and Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences

PLoS ONE, 2013

The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization.

Development of a Dual-Index Sequencing Strategy and Curation Pipeline for Analyzing Amplicon Sequence Data on the MiSeq Illumina Sequencing Platform

Applied and Environmental Microbiology, 2013

Analysis, Optimization and Verification of Illumina-Generated 16S rRNA Gene Amplicon Surveys

PLoS ONE, 2014

The exploration of microbial communities by sequencing 16S rRNA genes has expanded with low-cost, high-throughput sequencing instruments. Illumina-based 16S rRNA gene sequencing has recently gained popularity over 454 pyrosequencing due to its lower costs, higher accuracy and greater throughput. Although recent reports suggest that Illumina and 454 pyrosequencing provide similar beta diversity measures, it remains to be demonstrated that pre-existing 454 pyrosequencing workflows can transfer directly from 454 to Illumina MiSeq sequencing by simply changing the sequencing adapters of the primers. In this study, we modified 454 pyrosequencing primers targeting the V4-V5 hypervariable regions of the 16S rRNA gene to be compatible with Illumina sequencers. Microbial communities from cows, humans, leeches, mice, sewage, and termites and a mock community were analyzed by 454 and MiSeq sequencing of the V4-V5 region and MiSeq sequencing of the V4 region. Our analysis revealed that reference-based OTU clustering alone introduced biases compared to de novo clustering, preventing certain taxa from being observed in some samples. Based on this we devised and recommend an analysis pipeline that includes read merging, contaminant filtering, and reference-based clustering followed by de novo OTU clustering, which produces diversity measures consistent with de novo OTU clustering analysis. Low levels of dataset contamination with Illumina sequencing were discovered that could affect analyses that require highly sensitive approaches. While moving to Illumina-based sequencing platforms promises to provide deeper insights into the breadth and function of microbial diversity, our results show that care must be taken to ensure that sequencing and processing artifacts do not obscure true microbial diversity.

Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities

Environmental Microbiology, 2013

Here we show that 16S rDNA fragments derived from Illumina-sequenced environmental metagenomes (mitags) are a powerful alternative to 16S rDNA amplicons for investigating the taxonomic diversity and structure of prokaryotic communities. As part of the Tara Oceans global expedition, marine plankton was sampled in three locations, resulting in 29 subsamples for which metagenomes were produced by shotgun Illumina sequencing (ca. 700 Gb). For comparative analyses, a subset of samples was also selected for Roche-454 sequencing using both shotgun (m454tags; 13 metagenomes, ca. 2.4 Gb) and 16S rDNA amplicon (454tags; ca. 0.075 Gb) approaches. Our results indicate that by overcoming PCR biases related to amplification and primer mismatch, mitags may provide more realistic estimates of community richness and evenness than amplicon 454tags. In addition, mitags can capture expected beta diversity patterns. Using mitags is now economically feasible given the dramatic reduction in high-throughput sequencing costs, having the advantage of retrieving simultaneously both taxonomic (Bacteria, Archaea and Eukarya) and functional information from the same microbial community.

The Bias Associated with Amplicon Sequencing Does Not Affect the Quantitative Assessment of Bacterial Community Dynamics

PLoS ONE, 2014

The performance of two sets of primers targeting variable regions of the 16S rRNA gene V1-V3 and V4 was compared in their ability to describe changes of bacterial diversity and temporal turnover in full-scale activated sludge. Duplicate sets of high-throughput amplicon sequencing data of the two 16S rRNA regions shared a collection of core taxa that were observed across a series of twelve monthly samples, although the relative abundance of each taxon was substantially different between regions. A case in point was the changes in the relative abundance of filamentous bacteria Thiothrix, which caused a large effect on diversity indices, but only in the V1-V3 data set. Yet the relative abundance of Thiothrix in the amplicon sequencing data from both regions correlated with the estimation of its abundance determined using fluorescence in situ hybridization. In nonmetric multidimensional analysis samples were distributed along the first ordination axis according to the sequenced region rather than according to sample identities. The dynamics of microbial communities indicated that V1-V3 and the V4 regions of the 16S rRNA gene yielded comparable patterns of: 1) the changes occurring within the communities along fixed time intervals, 2) the slow turnover of activated sludge communities and 3) the rate of species replacement calculated from the taxa-time relationships. The temperature was the only operational variable that showed significant correlation with the composition of bacterial communities over time for the sets of data obtained with both pairs of primers. In conclusion, we show that despite the bias introduced by amplicon sequencing, the variable regions V1-V3 and V4 can be confidently used for the quantitative assessment of bacterial community dynamics, and provide a proper qualitative account of general taxa in the community, especially when the data are obtained over a convenient time window rather than at a single time point.

To rarefy or not to rarefy: Enhancing diversity analysis of microbial communities through next-generation sequencing and rarefying repeatedly

2020

Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library ...
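
As a rough illustration of the rarefying idea described in this abstract, the following Python sketch subsamples a read-count table to a fixed depth without replacement and repeats the draw several times, averaging the observed richness. It is only a minimal sketch under stated assumptions, not the procedure from the paper: the function names `rarefy` and `mean_richness`, the toy counts, the depth, and the number of repeats are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a taxon -> read-count mapping.

    Hypothetical helper: `counts` maps taxon labels to integer read counts.
    """
    library_size = sum(counts.values())
    if depth > library_size:
        raise ValueError("rarefaction depth exceeds library size")
    # Expand the counts into one label per read, then subsample at random.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


def mean_richness(counts, depth, repeats=100, seed=0):
    """Average number of observed taxa over repeated rarefaction draws."""
    rng = random.Random(seed)
    richness = [len(rarefy(counts, depth, rng)) for _ in range(repeats)]
    return sum(richness) / repeats


# Toy libraries of different sizes (made-up numbers), compared at a common depth.
sample_a = {"otu1": 500, "otu2": 300, "otu3": 5}
sample_b = {"otu1": 60, "otu2": 30, "otu4": 10}
common_depth = min(sum(sample_a.values()), sum(sample_b.values()))
richness_a = mean_richness(sample_a, common_depth)
richness_b = mean_richness(sample_b, common_depth)
```

Expanding every read into a list keeps the sketch short but uses memory proportional to library size; for real data sets a multivariate hypergeometric draw on the count vector would be a more economical way to get the same subsample.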

Advantages of meta-total RNA sequencing (MeTRS) over shotgun metagenomics and amplicon-based sequencing in the profiling of complex microbial communities

NPJ biofilms and microbiomes, 2018

Sequencing-based microbiome profiling aims at detecting and quantifying individual members of a microbial community in a culture-independent manner. While amplicon-based sequencing (ABS) of bacterial or fungal ribosomal DNA is the most widely used technology due to its low cost, it suffers from PCR amplification biases that hinder accurate representation of microbial population structures. Shotgun metagenomics (SMG) conversely allows unbiased microbiome profiling but requires high sequencing depth. Here we report the development of a meta-total RNA sequencing (MeTRS) method based on shotgun sequencing of total RNA and benchmark it on a human stool sample spiked in with known abundances of bacterial and fungal cells. MeTRS displayed the highest overall sensitivity and linearity for both bacteria and fungi, the greatest reproducibility compared to SMG and ABS, while requiring a ~20-fold lower sequencing depth than SMG. We therefore present MeTRS as a valuable alternative to existing t...

Generation of Multimillion-Sequence 16S rRNA Gene Libraries from Complex Microbial Communities by Assembling Paired-End Illumina Reads

Applied and Environmental Microbiology, 2011

Microbial communities host unparalleled taxonomic diversity. Adequate characterization of environmental and host-associated samples remains a challenge for microbiologists, despite the advent of 16S rRNA gene sequencing. In order to increase the depth of sampling for diverse bacterial communities, we developed a method for sequencing and assembling millions of paired-end reads from the 16S rRNA gene (spanning the V3 region; ~200 nucleotides) by using an Illumina genome analyzer. To confirm reproducibility and to identify a suitable computational pipeline for data analysis, sequence libraries were prepared in duplicate for both a defined mixture of DNAs from known cultured bacterial isolates (>1 million postassembly sequences) and an Arctic tundra soil sample (>6 million postassembly sequences). The Illumina 16S rRNA gene libraries represent a substantial increase in number of sequences over all extant next-generation sequencing approaches (e.g., 454 pyrosequencing), while the assembly of paired-end 125-base reads offers a methodological advantage by incorporating an initial quality control step for each 16S rRNA gene sequence. This method incorporates indexed primers to enable the characterization of multiple microbial communities in a single flow cell lane, may be modified readily to target other variable regions or genes, and demonstrates unprecedented and economical access to DNAs from organisms that exist at low relative abundances.