Uneven Missing Data Skew Phylogenomic Relationships within the Lories and Lorikeets

Brian Tilston Smith et al. Genome Biol Evol. 2020.

Abstract

The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens; these groupings were not observed when more stringent filtering was applied. To assess whether the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach in which we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
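The data reduction step described in the abstract (excluding sites based on data completeness) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the set of characters treated as missing, and the use of an inclusive ">=" cutoff are all assumptions made for illustration. Only the 70% figure comes from the abstract.

```python
from typing import Dict, List

# Assumption: gaps, '?', and 'N' all count as missing data.
MISSING = set("-?Nn")


def site_completeness(alignment: Dict[str, str]) -> List[float]:
    """Return, for each alignment column, the fraction of samples with a called base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [
        sum(seq[i] not in MISSING for seq in seqs) / len(seqs)
        for i in range(length)
    ]


def filter_sites(alignment: Dict[str, str], min_completeness: float = 0.7) -> Dict[str, str]:
    """Keep only columns whose completeness is at least min_completeness (hypothetical cutoff rule)."""
    completeness = site_completeness(alignment)
    keep = [i for i, c in enumerate(completeness) if c >= min_completeness]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment: two "modern" and two "historical" samples (invented data).
toy = {
    "modern_1": "ACGT",
    "modern_2": "ACGA",
    "hist_1": "A-NT",
    "hist_2": "AC-?",
}
filtered = filter_sites(toy, min_completeness=0.7)
```

In the toy alignment, the third column is called in only two of four samples (50%) and is dropped, while the other three columns are retained.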

Keywords: bird; likelihood; museum DNA; museum specimen; parrot; phylogeny.

© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Figures

Fig. 1

Modern samples have more parsimony informative sites (PIS), less missing data at PIS, and less variation in number of samples among loci. Shown are histograms of the number of samples per locus in the Low Coverage (A) and Filtered (B) alignments. (C, D) Boxplots showing the number of parsimony informative sites (C) and number of missing characters at parsimony informative sites (D) in the ingroup samples. The data are partitioned into the modern versus historical samples, and Low Coverage versus Filtered alignments. In all plots, modern samples are shown in red and historical samples in blue.
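The two quantities plotted in panels C and D can be illustrated with a short sketch. It uses the standard definition of a parsimony informative site (at least two character states, each present in at least two samples); the function names and the set of missing-data characters are assumptions, and this is not the authors' code.

```python
from collections import Counter
from typing import Dict, List


def is_parsimony_informative(column: str, valid: str = "ACGT") -> bool:
    """True if at least two states each occur in at least two samples."""
    counts = Counter(ch.upper() for ch in column if ch.upper() in valid)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def pis_columns(alignment: Dict[str, str]) -> List[int]:
    """Indices of parsimony informative columns in an alignment."""
    seqs = list(alignment.values())
    return [
        i for i in range(len(seqs[0]))
        if is_parsimony_informative("".join(s[i] for s in seqs))
    ]


def missing_at_pis(alignment: Dict[str, str], missing: str = "-?Nn") -> Dict[str, int]:
    """Per-sample count of missing characters at parsimony informative columns."""
    cols = pis_columns(alignment)
    return {
        name: sum(seq[i] in missing for i in cols)
        for name, seq in alignment.items()
    }


# Toy alignment with invented sample names and data.
toy_pis = {
    "modern_1": "AAG",
    "modern_2": "AAG",
    "modern_3": "CAT",
    "modern_4": "CAT",
    "hist_1": "-AN",
}
```

Here columns 0 and 2 are parsimony informative, column 1 is invariant, and the hypothetical historical sample is missing data at both informative columns.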

Fig. 2

Alternative topologies for the subclade that differs the most among filtering schemes. Shown is the subclade containing Trichoglossus/Eos/Psitteuteles iris/Glossopsitta from trees estimated without (A: Filtered Tree) and with low coverage characters (B: Low Coverage Tree). The Low Coverage tree contains clades composed mostly of either historical or modern samples. Bootstrap nodes are colored on a gradient from 100% (black) to <70% (gray). Taxon names are colored according to whether their DNA came from modern tissues (red) or historical specimens (blue).

Fig. 3

Outlier sites have high missing data in historical samples. (A) Outlier site plot showing Δ site-wise log-likelihoods (Δ s-lk) for topologies estimated with and without low coverage sites. The y axis is the Δ s-lk score and the x axis represents individual sites in the concatenated alignment, where K and M represent thousand and million, respectively. Points are colored by the magnitude of the Δ s-lk scores on a gradient reflecting the different likelihood thresholds (>2, >10, >20, <−2, <−10, and <−20). (B) Boxplot of historical (blue) and modern (red) samples showing the amount of missing data in the 3,084 outlier sites (Δ s-lk > 2) identified in plot A.
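The thresholding behind panel A can be sketched as follows, assuming per-site log-likelihoods for the two competing topologies have already been computed by a phylogenetics program. The sign convention (first tree minus second tree), the function names, and the "neutral" label are assumptions; only the threshold values come from the caption.

```python
from typing import List, Sequence


def delta_site_lnl(lnl_tree_a: Sequence[float], lnl_tree_b: Sequence[float]) -> List[float]:
    """Per-site difference in log-likelihood between two topologies (a minus b, by assumption)."""
    if len(lnl_tree_a) != len(lnl_tree_b):
        raise ValueError("both trees must be scored on the same sites")
    return [a - b for a, b in zip(lnl_tree_a, lnl_tree_b)]


def classify(delta: float) -> str:
    """Assign a site to the most extreme threshold bin it exceeds."""
    for t in (20, 10, 2):
        if delta > t:
            return f">{t}"
        if delta < -t:
            return f"<-{t}"
    return "neutral"


# Invented per-site log-likelihoods for four sites under two topologies.
deltas = delta_site_lnl([-10.0, -5.0, -30.0, -4.0], [-12.5, -5.5, -5.0, -15.0])
bins = [classify(d) for d in deltas]
```

Sites in the non-neutral bins are the candidates whose missing data can then be compared between historical and modern samples, as in panel B.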

Fig. 4

Likelihood plots showing Δ locus-wise log-likelihood (Δ l-lk) for topologies estimated with and without missing data for the Low Coverage data set. The y axis is the Δ l-lk and the x axis represents individual loci across the full alignment. Shown are the results for six subclades assessed within Loriini: (A) Parvipsitta and Psitteuteles, (B) Chalcopsitta and Pseudeos, (C) Neopsittacus, (D) Charmosyna, Vini, and Phigys, (E) Eos, Trichoglossus, Glossopsitta concinna, and Psitteuteles iris, and (F) Lorius. Points are colored by the magnitude of the Δ l-lk scores on a gradient ranging from >20 (blue) through <−10 (orange).

Fig. 5

Multidimensional scaling of Robinson–Foulds distances among 100 bootstrap trees with differing levels of outlier sites or loci excluded. (A) Compares distances among Filtered and Low Coverage trees where outlier sites have been removed at different increments. Outlier sites were excluded in the Low Coverage alignment using Δ site-wise log-likelihood (Δ s-lk) thresholds of >20, >10, >2, <−2, <−10, and <−20. (B) The distances among trees produced from the subclade outlier analyses. Shown is a comparison of the Low Coverage and Filtered trees with topologies estimated with outlier loci excluded using Δ locus-wise log-likelihood (Δ l-lk) thresholds of >2 and <−2.
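For readers unfamiliar with the distance used here, the Robinson–Foulds distance between two trees on the same taxa is the number of nontrivial bipartitions present in one tree but not the other. The sketch below computes it for trees written as nested tuples of taxon names; that representation and the function names are assumptions for illustration, and the multidimensional scaling step is not shown.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial splits, each stored as the side that excludes a fixed reference taxon."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)  # arbitrary but consistent reference taxon
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = all_taxa - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return clade

    walk(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Size of the symmetric difference between the two trees' split sets."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(t1) ^ bipartitions(t2))


# Toy five-taxon trees with placeholder taxon names.
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = (("A", "B"), "C", ("D", "E"))
```

A matrix of such pairwise distances among bootstrap trees is the kind of input that a multidimensional scaling routine would then project into two dimensions.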

Fig. 6

Maximum likelihood tree containing unique taxa in Loriini. The tree was inferred from a concatenated alignment from which loci with Δ locus-wise log-likelihood (Δ l-lk) values of >10 in the locus likelihood analysis were excluded. Rapid bootstrap values are shown on each node, and taxon names are colored according to whether their DNA came from modern tissues (red) or historical specimens (blue). Bootstrap nodes are colored on a gradient from 100% (black) to <70% (gray).
