A likelihood ratio test for species membership based on DNA sequence data

Comparative Study

Mikhail V Matz et al. Philos Trans R Soc Lond B Biol Sci. 2005.

Abstract

DNA barcoding as an approach for species identification is rapidly increasing in popularity. However, it remains unclear which statistical procedures should accompany the technique to provide a measure of uncertainty. Here we describe a likelihood ratio test that can be used to test whether a sampled sequence is a member of an a priori specified species. We investigate the performance of the test using coalescence simulations, as well as real data from butterflies and frogs representing two kinds of challenge for DNA barcoding: extremely low and extremely high levels of sequence variability.

Figures

Figure 1

(a) Frequencies of likelihood ratios in the test applied to sequence data simulated using different values of θ per locus, for different numbers of database sequences. Horizontal axis, value of the likelihood ratio test statistic; vertical axis, number of replicates out of 100. (b) Summary of the results from panel (a).

Figure 2

Consensus maximum parsimony trees for cox1 sequences from the two real data sets: (a) the skipper butterfly Astraptes fulgerator species complex, and (b) four species of tree frogs of the genus Litoria. Scale bars: 10 nucleotide changes. The number of individual sequences per species is indicated next to the species names. (c–f) Frequency distributions of the likelihood ratio test statistic in simulations with these datasets. The number of sequences used to represent a true or sister species in the test was either 3 (c, d) or 10 (e, f). Filled bars, test with the correct species to assess the type I error rate; open bars, test with the sister species to assess the type II error rate.

Figure 3

Summary of error rates obtained for the Astraptes (a) and Litoria (b) datasets with different numbers of sequences per species in the database, assuming a critical value of 2.7. Open bars, type I error rate; filled bars, type II error rate.
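
The error rates summarized in this caption can be illustrated with a minimal sketch. It is not the authors' code. It assumes that membership in the specified species is the null hypothesis and that the test rejects membership when the statistic exceeds the critical value of 2.7 quoted above. The function name `error_rates` and the input values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

CRITICAL_VALUE = 2.7  # critical value quoted in the Figure 3 caption


def error_rates(
    true_species_stats: Sequence[float],
    sister_species_stats: Sequence[float],
    critical_value: float = CRITICAL_VALUE,
) -> Tuple[float, float]:
    """Return (type I rate, type II rate) for a set of test statistics.

    Assumption: the null hypothesis (membership) is rejected when the
    statistic is strictly greater than the critical value.
    """
    if not true_species_stats or not sister_species_stats:
        raise ValueError("both lists of statistics must be non-empty")

    # Type I error: query drawn from the correct species, membership rejected.
    type_i = sum(s > critical_value for s in true_species_stats) / len(true_species_stats)

    # Type II error: query drawn from the sister species, membership not rejected.
    type_ii = sum(s <= critical_value for s in sister_species_stats) / len(sister_species_stats)

    return type_i, type_ii


# Made-up statistics, not data from the paper.
same_species = [0.0, 1.0, 3.0, 0.5]
sister_species = [5.0, 2.0, 10.0, 8.0]
print(error_rates(same_species, sister_species))
```

Under these assumptions, raising the critical value lowers the type I rate and raises the type II rate, so the two rates have to be read together.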
