Swarm v2: highly-scalable and high-resolution amplicon clustering

Frédéric Mahé et al. PeerJ. 2015.

Abstract

We previously presented Swarm v1, a novel and open-source amplicon clustering program that produced fine-scale molecular operational taxonomic units (OTUs), free of arbitrary global clustering thresholds and input-order dependency. Swarm v1 worked in two phases: an initial phase of iterative single-linkage clustering with a local clustering threshold (d), followed by a phase that used the internal abundance structure of clusters to break chained OTUs. Here we present Swarm v2, which has two important novel features: (1) a new algorithm for d = 1 that allows the computation time of the program to scale linearly with increasing amounts of data; and (2) the new fastidious option, which reduces under-grouping by grafting low-abundance OTUs (e.g., singletons and doubletons) onto larger ones. Swarm v2 also directly integrates the clustering and breaking phases, dereplicates sequencing reads with d = 0, outputs OTU representatives in fasta format, and plots individual OTUs as two-dimensional networks.
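The d = 0 dereplication and d = 1 single-linkage steps described above can be illustrated with a minimal Python sketch. It is not Swarm's implementation: the abstract does not say how linear scaling is achieved, so the sketch assumes one plausible approach, which is to enumerate every sequence at one difference from an amplicon and look each one up in a hash-based set. All function names (`dereplicate`, `microvariants`, `cluster_d1`) are hypothetical, and the abundance-based breaking phase is left out.

```python
from collections import Counter

ALPHABET = "ACGT"


def dereplicate(reads):
    """Merge identical reads (d = 0) and record each amplicon's abundance."""
    return Counter(reads)


def microvariants(seq):
    """Yield every sequence at exactly one difference from seq
    (substitution, deletion or insertion). Duplicates may be yielded."""
    for i, base in enumerate(seq):
        yield seq[:i] + seq[i + 1:]                # deletion
        for nuc in ALPHABET:
            if nuc != base:
                yield seq[:i] + nuc + seq[i + 1:]  # substitution
    for i in range(len(seq) + 1):
        for nuc in ALPHABET:
            yield seq[:i] + nuc + seq[i:]          # insertion


def cluster_d1(reads):
    """Iterative single-linkage clustering with a local threshold d = 1.

    Assumption (not from the abstract): neighbours are found by hashing
    one-difference variants instead of comparing all pairs of amplicons.
    """
    abundances = dereplicate(reads)
    unassigned = set(abundances)
    otus = []
    # Seeds are visited by decreasing abundance, ties broken alphabetically,
    # so the result does not depend on the order of the input reads.
    for seed in sorted(abundances, key=lambda s: (-abundances[s], s)):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        otu, frontier = [seed], [seed]
        while frontier:  # grow the OTU until no amplicon can be added
            current = frontier.pop()
            for variant in microvariants(current):
                if variant in unassigned:
                    unassigned.discard(variant)
                    otu.append(variant)
                    frontier.append(variant)
        otus.append(otu)
    return otus


reads = ["ACGT"] * 5 + ["ACGA"] * 2 + ["ACGAA"] + ["TTTT"] * 3
otus = cluster_d1(reads)
```

In this toy input, `ACGAA` is two differences away from the seed `ACGT` but still joins its OTU through `ACGA`, which is the iterative growth the abstract describes. Under the stated assumption, each amplicon triggers a number of lookups that depends only on its length, so the total work grows roughly in proportion to the number of unique amplicons.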

Keywords: Barcoding; Environmental diversity; Molecular operational taxonomic units.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Schematic view of Swarm’s clustering and refinement approach.

(A) Swarm clusters amplicons iteratively using a small, user-chosen local threshold, d, allowing OTUs to grow to their natural limits, where no other amplicons can be added. (B) Swarm takes the abundance of each amplicon into account to produce higher-resolution clusters by preventing the formation of amplicon chains. The darker the red, the higher the abundance. (C) The fastidious option avoids under-grouping (e.g., the production of small OTUs such as singletons and doubletons) by postulating the existence of virtual linking amplicons to graft smaller OTUs onto larger ones.
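Panel (C) can be read as follows: a small OTU is grafted onto a larger one when a virtual amplicon, absent from the data, lies one difference away from a member of each. The Python sketch below illustrates that reading only and is not Swarm's code. The function names, the `small_max` cut-off separating small from large OTUs, and the rule of grafting onto the first matching large OTU are all assumptions made for the example.

```python
ALPHABET = "ACGT"


def one_difference(seq):
    """Return the set of sequences one substitution, deletion or
    insertion away from seq."""
    variants = set()
    for i in range(len(seq)):
        variants.add(seq[:i] + seq[i + 1:])                # deletion
        for nuc in ALPHABET:
            if nuc != seq[i]:
                variants.add(seq[:i] + nuc + seq[i + 1:])  # substitution
    for i in range(len(seq) + 1):
        for nuc in ALPHABET:
            variants.add(seq[:i] + nuc + seq[i:])          # insertion
    return variants


def graft_small_otus(otus, abundances, small_max=2):
    """Graft small OTUs onto large ones through virtual linking amplicons.

    otus: list of lists of sequences; abundances: dict sequence -> count.
    small_max is a hypothetical cut-off: OTUs whose total abundance is at
    most small_max (e.g., singletons and doubletons) count as small.
    """
    def mass(otu):
        return sum(abundances[s] for s in otu)

    large = [list(o) for o in otus if mass(o) > small_max]
    small = [list(o) for o in otus if mass(o) <= small_max]

    # Index every potential linker (one difference from a large-OTU member).
    # If several large OTUs share a linker, the first one listed keeps it.
    linkers = {}
    for idx, otu in enumerate(large):
        for seq in otu:
            for variant in one_difference(seq):
                linkers.setdefault(variant, idx)

    ungrafted = []
    for otu in small:
        targets = {linkers[v] for seq in otu
                   for v in one_difference(seq) if v in linkers}
        if targets:
            large[min(targets)].extend(otu)  # graft onto a large OTU
        else:
            ungrafted.append(otu)            # no linker found: keep as is
    return large + ungrafted


abundances = {"ACGT": 10, "ACGA": 3, "ACCC": 1, "TTTTTT": 1}
otus = [["ACGT", "ACGA"], ["ACCC"], ["TTTTTT"]]
result = graft_small_otus(otus, abundances)
```

Here the singleton `ACCC` is two substitutions away from `ACGT`; the virtual amplicon `ACGC` links them, so `ACCC` is grafted. The singleton `TTTTTT` shares no linker with the large OTU and stays on its own.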

Figure 2. Graphical representation of an OTU produced by Swarm (breaking and grafting phases deactivated) when clustering the BioMarKs 18S rRNA V9 dataset (amplicons are approximately 129 bp in length).

Nodes represent amplicons. Node size, color and text annotations represent the abundance of each amplicon. Edges represent one difference (substitution, deletion or insertion); the length of the edges carries no information. The red-colored edge indicates where Swarm’s breaking phase cuts when it is not deactivated, resulting in two highly abundant OTUs, each assigned to a different genus of Collodaria (Radiolaria).

Figure 3. Graphical representation of an OTU produced by Swarm (breaking and grafting phases deactivated) when clustering the BioMarKs 18S rRNA V4 dataset (amplicons are approximately 380 bp in length).

Nodes represent amplicons. Node size, color and text annotations represent the abundance of each amplicon. Edges represent one difference (substitution, deletion or insertion); the length of the edges carries no information. The red-colored edges indicate where Swarm’s breaking phase cuts when it is not deactivated, resulting in three highly abundant OTUs, each assigned to a different taxon of Cnidaria (Metazoa).

Grants and funding

FM and MD were supported by the Deutsche Forschungsgemeinschaft (grant #DU1319/1-1). CQ is funded by an EPSRC Career Acceleration Fellowship (EP/H003851/1). CdeV was supported by the EU EraNet BiodivErsA program BioMarKs (grant #2008-6530), the French government “Investissements d’Avenir” project OCEANOMICS (ANR-11-BTBR-0008), and the EU FP7 program MicroB3 (contract number 287589). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
