Computational challenges in the analysis of ancient DNA

Kay Prüfer et al. Genome Biol. 2010.

Abstract

High-throughput sequencing technologies have opened up a new avenue for studying extinct organisms. Here we identify and quantify biases introduced by particular characteristics of ancient DNA samples. These analyses demonstrate the importance of closely related genomic sequence for correctly identifying and classifying bona fide endogenous DNA fragments. We show that more accurate genome divergence estimates from ancient DNA sequence can be attained using at least two outgroup genomes and appropriate filtering.

Figures

Figure 1

Number of aligned ancient DNA fragments and average sequence length. Properties of Mega BLAST alignments of ancient DNA sequences from a Neandertal fossil to genome sequences of increasing divergence. Left panel: number of reads with a best hit to the genome sequence and not to the GenBank nonredundant and environmental databases (yellow). Subset of reads with one unique best hit to the reference genome (light green). Subset of reads with one unique best hit to the reference genome that can be fully aligned with a positive alignment score (dark green). Right panel: Average length of best local alignments (yellow), average length of fragments with a unique best local alignment (red), average length of fragments with a positive score when fully aligned to reference genome (brown).

Figure 2

Differences per site in alignments of ancient DNA fragments. All nucleotide differences (top) and transversion differences (bottom) in different alignments to reference genomes of increasing divergence. Each read is required to have one uniquely best Mega BLAST alignment to the reference genome (estimate shown as the black line). The semiglobal alignment forces the full sequence to align to the genomic region identified by the local alignment (estimate shown as red line). These full alignments are further filtered for having a positive alignment score (blue line). The green crosses show the differences between human and the reference species in the ENCODE multiple sequence alignments. The divergence times on the x-axis are from [52] and [35], except for human for which we choose an arbitrary divergence time of 1 million years to Neandertal.
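
The two quantities plotted in Figure 2 (all nucleotide differences and transversion-only differences per site) can be illustrated with a minimal Python sketch. It assumes a pairwise alignment is already available as two equal-length strings with `-` for gaps; the function name `differences_per_site` and the choice to skip gap and non-ACGT columns are illustrative assumptions, not the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(a: str, b: str) -> bool:
    """True if a and b are one purine and one pyrimidine (either order)."""
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def differences_per_site(read: str, ref: str) -> tuple[float, float]:
    """Return (all differences per site, transversions per site).

    Assumption: columns containing a gap or an ambiguous base are ignored.
    """
    if len(read) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    sites = diffs = transversions = 0
    for a, b in zip(read.upper(), ref.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a != b:
            diffs += 1
            if is_transversion(a, b):
                transversions += 1
    if sites == 0:
        return 0.0, 0.0
    return diffs / sites, transversions / sites


# One transition (C/T) and one transversion (T/A) over 8 compared sites.
print(differences_per_site("ACGTACGT", "ACGTATGA"))  # (0.25, 0.125)
```

Restricting the count to transversions is useful for ancient DNA because the typical damage-induced substitutions are transitions, so the transversion-only curve is less affected by damage.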

Figure 3

Schematic description of divergence triangulation. (a) A phylogenetic tree depicting the necessary topology for the application of the divergence triangulation method. (b) The ancient DNA sequences are used like an outgroup to the two genomic sequences in an unrooted tree. (c) Alignments between genomic sequences and ancient DNA fragments are used to assign changes to the lineages (numbers on the right-hand side). In this process, coinciding changes often caused by ancient DNA damage (shown in red in the alignments) can lead to misassignments of differences (in red in the summary of tables). (d) The assigned differences can be used to calculate a divergence relative to the divergence between the two genome sequences.

Figure 4

Divergence estimates by triangulation on simulated datasets. (a) 3DP divergence estimates in comparison to the expected values. Four bars are drawn for different filters: raw estimate without filtering on all unique alignments (brown); filtered alignments with verified human and chimpanzee genomic location using a whole genome alignment and a distance of at least 6 points between best and second best local alignments' bitscores (red); alignments of fragments with a size >35 bp (orange); and all filters applied (yellow). (b) Estimates are derived solely from transversion differences, otherwise identical to (a).
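
The filters named in panel (a) can be expressed as a simple predicate, sketched here in Python. The `Hit` record and its field names are hypothetical; in particular, `location_verified` stands in for the check against a whole genome alignment, which is assumed to have been computed elsewhere. Treating a read with no second-best alignment as passing the bitscore filter is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Summary of one fragment's local alignments (hypothetical record)."""
    read_id: str
    length: int                        # fragment size in bp
    best_bitscore: float
    second_bitscore: Optional[float]   # None if there is no second-best hit
    location_verified: bool            # human/chimpanzee locations agree


def passes_filters(hit: Hit, min_gap: float = 6.0, min_length: int = 35) -> bool:
    """Apply all three filters from the caption at once."""
    gap_ok = (
        hit.second_bitscore is None
        or hit.best_bitscore - hit.second_bitscore >= min_gap
    )
    # The caption specifies a size >35 bp, so the comparison is strict.
    return hit.location_verified and gap_ok and hit.length > min_length


hits = [
    Hit("r1", 52, 80.0, 70.0, True),   # passes every filter
    Hit("r2", 35, 60.0, None, True),   # too short (not > 35 bp)
    Hit("r3", 60, 90.0, 86.0, True),   # bitscore gap of 4 is below 6
    Hit("r4", 48, 75.0, 60.0, False),  # genomic location not verified
]
print([h.read_id for h in hits if passes_filters(h)])  # ['r1']
```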

References

1. Krause J, Dear PH, Pollack JL, Slatkin M, Spriggs H, Barnes I, Lister AM, Ebersberger I, Pääbo S, Hofreiter M. Multiplex amplification of the mammoth mitochondrial genome and the evolution of Elephantidae. Nature. 2006;439:724–727. doi: 10.1038/nature04432.
2. Serre D, Langaney A, Chech M, Teschler-Nicola M, Paunovic M, Mennecier P, Hofreiter M, Possnert G, Pääbo S. No evidence of Neandertal mtDNA contribution to early modern humans. PLoS Biol. 2004;2:E57. doi: 10.1371/journal.pbio.0020057.
3. Cooper A, Lalueza-Fox C, Anderson S, Rambaut A, Austin J, Ward R. Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution. Nature. 2001;409:704–707. doi: 10.1038/35055536.
4. Höss M, Dilling A, Currant A, Pääbo S. Molecular phylogeny of the extinct ground sloth Mylodon darwinii. Proc Natl Acad Sci USA. 1996;93:181–185. doi: 10.1073/pnas.93.1.181.
5. Krajewski C, Buckley L, Westerman M. DNA phylogeny of the marsupial wolf resolved. Proc Biol Sci. 1997;264:911–917. doi: 10.1098/rspb.1997.0126.
