UCHIME improves sensitivity and speed of chimera detection

Robert C Edgar et al. Bioinformatics. 2011.

Abstract

Motivation: Chimeric DNA sequences often form during polymerase chain reaction amplification, especially when sequencing single regions (e.g. 16S rRNA or fungal Internal Transcribed Spacer) to assess diversity or compare populations. Undetected chimeras may be misinterpreted as novel species, causing inflated estimates of diversity and spurious inferences of differences between populations. Detection and removal of chimeras is therefore of critical importance in such experiments.

Results: We describe UCHIME, a new program that detects chimeric sequences with two or more segments. UCHIME either uses a database of chimera-free sequences or detects chimeras de novo by exploiting abundance data. UCHIME has better sensitivity than ChimeraSlayer (previously the most sensitive database method), especially with short, noisy sequences. In testing on artificial bacterial communities with known composition, UCHIME de novo sensitivity is shown to be comparable to Perseus. UCHIME is >100× faster than Perseus and >1000× faster than ChimeraSlayer.

Contact: robert@drive5.com

Availability: Source, binaries and data: http://drive5.com/uchime.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

UCHIME schematic. The query sequence is divided into four chunks, each of which is used to search the reference database. The best few hits to each chunk are saved, and the closest two sequences are found by calculating smoothed identity with the query. A three-way chimeric alignment is constructed, and a chimera is reported if its score [Equation (2)] exceeds a preset threshold.
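The first step in the schematic, dividing the query into four chunks, can be illustrated with a minimal Python sketch. It assumes equal-length, non-overlapping chunks, which the caption does not specify; the function name `split_into_chunks` is hypothetical and not part of UCHIME.

```python
def split_into_chunks(query, n_chunks=4):
    """Split a sequence into n_chunks contiguous, non-overlapping pieces.

    Assumption: chunks are as equal in length as possible; the caption
    only states that the query is divided into four chunks.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(len(query), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks absorb one leftover base each.
        size = base + (1 if i < extra else 0)
        chunks.append(query[start:start + size])
        start += size
    return chunks


chunks = split_into_chunks("ACGTACGTAC")
print(chunks)
```

In the workflow described by the caption, each chunk would then be searched against the reference database separately; that search step is not shown here.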

Fig. 2.

Chimeric alignments. We identify three types of chimeric alignment between a query sequence Q and two candidate parents A and B: local, local-X and global-X. A chimeric alignment has two non-overlapping segments of Q, one of which is closer to A than to B by some measure of evolutionary distance while the other is closer to B than to A. In a local chimeric alignment, these two segments can be non-contiguous and may only cover a part of Q. In a local-X alignment, the segments are contiguous with an intervening crossover segment (X) which is identical in Q, A and B. A global-X alignment is a special case of a local-X alignment that covers all of Q, but not necessarily all of A or B.
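The distinction between local-X and global-X alignments can be sketched in Python as follows. This is an illustration under stated assumptions, not UCHIME's implementation: the three sequences are given as equal-length, ungapped aligned strings, mismatch count stands in for the unspecified "measure of evolutionary distance", and the function names and coordinate parameters are hypothetical.

```python
def mismatches(s, t):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(s, t) if x != y)


def classify_x_alignment(q, a, b, start, x_start, x_end, end):
    """Classify a candidate model as 'global-X', 'local-X' or None.

    Assumed layout (half-open column ranges of the alignment):
      A segment: [start, x_start)
      crossover: [x_start, x_end)
      B segment: [x_end, end)
    """
    left = slice(start, x_start)
    cross = slice(x_start, x_end)
    right = slice(x_end, end)

    # The crossover segment must be identical in Q, A and B.
    crossover_identical = q[cross] == a[cross] == b[cross]
    # Q must be closer to A on the left and closer to B on the right.
    left_closer_to_a = mismatches(q[left], a[left]) < mismatches(q[left], b[left])
    right_closer_to_b = mismatches(q[right], b[right]) < mismatches(q[right], a[right])

    if not (crossover_identical and left_closer_to_a and right_closer_to_b):
        return None
    # Global-X is the special case that covers all of Q.
    covers_all_of_q = start == 0 and end == len(q)
    return "global-X" if covers_all_of_q else "local-X"


q = "AAAAGGCCCC"
a = "AAAAGGTTTT"
b = "TTTTGGCCCC"
print(classify_x_alignment(q, a, b, 0, 4, 6, 10))
print(classify_x_alignment(q, a, b, 1, 4, 6, 9))
```

A plain local chimeric alignment, where the two segments may be non-contiguous, is not covered by this sketch.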

Fig. 3.

Chimeric alignment showing diffs and votes. This figure shows a region from an alignment generated by UCHIME. Diffs and votes are annotated. The ‘Model’ row indicates the three segments of the alignment which are closer to A, the crossover (X) and closer to B, respectively. Diffs are ‘A’ = diff with Q closer to A in the A segment, ‘a’ = diff with Q closer to A in the B segment, and similarly for ‘B’ and ‘b’. A ‘p’ diff indicates that the parents agree but are different from Q. Votes are ‘+’ (yes), ‘!’ (no) and ‘0’ (abstain), indicating whether the corresponding diff supports or contradicts the model.
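The diff and vote annotation described in this caption can be sketched in Python. The sketch makes several assumptions not stated in the caption: the alignment is given as three equal-length strings, the crossover is collapsed to a single column index `xover` (columns before it belong to the A segment, the rest to the B segment), and columns where all three sequences differ are labelled with a hypothetical `?` diff and counted as abstentions. The function name `annotate` is also hypothetical.

```python
def annotate(q, a, b, xover):
    """Return (diffs, votes) strings for an aligned query and two parents.

    Diff symbols follow the caption: 'A'/'B' when Q matches the parent
    the model expects in that segment, 'a'/'b' when it matches the other
    parent, 'p' when the parents agree but differ from Q.
    Assumption: '?' marks columns where all three sequences differ.
    """
    diffs, votes = [], []
    for i, (cq, ca, cb) in enumerate(zip(q, a, b)):
        in_a_segment = i < xover
        if cq == ca == cb:
            diff, vote = " ", " "            # no difference in this column
        elif ca == cb:
            diff, vote = "p", "0"            # parents agree, Q differs: abstain
        elif cq == ca:
            # Q is closer to A: supports the model only in the A segment.
            diff, vote = ("A", "+") if in_a_segment else ("a", "!")
        elif cq == cb:
            # Q is closer to B: supports the model only in the B segment.
            diff, vote = ("B", "+") if not in_a_segment else ("b", "!")
        else:
            diff, vote = "?", "0"            # assumed: abstain
        diffs.append(diff)
        votes.append(vote)
    return "".join(diffs), "".join(votes)


q = "AAAACCCC"
a = "AATTGCCG"
b = "TAATCGCT"
diffs, votes = annotate(q, a, b, xover=4)
print(diffs)
print(votes)
```

Counting the `+`, `!` and `0` symbols in each segment gives the yes, no and abstain tallies on which a score such as Equation (2) would be based; the score itself is not reproduced here because the equation does not appear in this excerpt.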

Fig. 4.

Performance of UCHIME and ChimeraSlayer on length 200 tests in SIM2. These results show that UCHIME has higher sensitivity than ChimeraSlayer on all length 200 sets, with increasing improvement at higher mutation rates, especially when substitutions are present. The UCHIME error rate is <1% on all sets.

Fig. 5.

Sensitivity on the SIMM set with 1% substitutions. UCHIME has higher sensitivity than ChimeraSlayer on all subsets, especially to chimeras with small divergence and larger numbers of segments. In the 3×3 grid shown in the figure, columns indicate the number of segments (m) in an m-mera and rows correspond to divergence ranges.
