Norman Warthmann | The Australian National University
Papers by Norman Warthmann
Springer eBooks, Dec 31, 2022
Exome Capture is a molecular biology technique that, in combination with Next Generation DNA sequencing technologies (NGS), allows for selectively sequencing the predicted genes of an organism. Such capture sequencing provides a compromise between genome coverage and sequencing cost. The capture reaction is an additional step in an otherwise standard sequencing protocol, and exome capture effectively enriches the sequencing library for DNA molecules that overlap with predicted genes (the exome). This enables genome-wide assessments while focusing on the gene space. Capture sequencing is particularly attractive in species with large genomes, where whole genome sequencing in larger numbers of samples would be cost-prohibitive at present prices. The Plant Breeding and Genetics Laboratory (PBGL) developed an Exome Capture Kit for Coffea arabica in collaboration with Daicel Arbor Biosciences (Ann Arbor, MI, USA). Use of the kit achieves eightfold enrichment, and hence an approximately eightfold reduction in sequencing cost for a whole genome assessment of Coffea arabica plants. The kit is available as a regular product from Daicel Arbor Biosciences, and this protocol describes the kit and gives detailed instructions on how to perform the capture reaction.
bioRxiv (Cold Spring Harbor Laboratory), Jun 13, 2016
Most studies of aquatic plankton focus on either macroscopic or microbial communities, and on either eukaryotes or prokaryotes. This separation is primarily for methodological reasons, but can overlook potential interactions among groups. We tested whether DNA-metabarcoding of unfractionated water samples with universal primers could be used to qualitatively and quantitatively study the temporal dynamics of the total plankton community in a shallow temperate lake. We found significant changes in the relative proportions of normalized sequence reads of eukaryotic and prokaryotic plankton communities over a three-month period in spring. Patterns followed the same trend as plankton estimates using traditional microscopic methods. We characterized the bloom of a conditionally rare bacterial taxon belonging to Arcicella, which rapidly came to dominate the whole lake ecosystem and would have remained unnoticed without metabarcoding. Our data demonstrate the potential of universal DNA-metabarcoding applied to unfractionated samples for providing a more holistic view of plankton communities.
Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.
F1000Research, Nov 2, 2016
The development of model systems requires a detailed assessment of standing genetic variation across natural populations. The Brachypodium species complex has been promoted as a plant model for grass genomics with translational relevance to small grain and biomass crops. To capture the genetic diversity within this species complex, thousands of Brachypodium accessions from around the globe were collected and sequenced using genotyping by sequencing (GBS). Overall, 1,897 samples were classified into two diploid or allopolyploid species and then further grouped into distinct inbred genotypes. A core set of diverse B. distachyon diploid lines was selected for whole genome sequencing and high resolution phenotyping. Genome-wide association studies across simulated seasonal environments were used to identify candidate genes and pathways tied to key life history and agronomic traits under current and future climatic conditions. A total of 8, 22 and 47 QTLs were identified for flowering time, early vigour and energy traits, respectively. Overall, the results highlight the genomic structure of the Brachypodium species complex and allow powerful complex trait dissection within this new grass model species.
Humana Press eBooks, Oct 11, 2012
Artificial microRNAs (amiRNAs) have been shown to facilitate efficient gene silencing with high specificity to the intended target gene(s). For the plant breeder, gene silencing by artificial miRNAs will certainly accelerate gene discovery, because it allows targeting of all genes in a mapping interval, independent of the genetic background. In addition, beneficial knockout phenotypes can easily be transferred between varieties and across incompatibility barriers. This chapter describes the generation and application of amiRNAs as a gene silencing tool in rice.
Proceedings of the National Academy of Sciences of the United States of America, Jun 6, 2011
New Phytologist, Jan 17, 2018
Mutants without root hairs show reduced inorganic orthophosphate (Pi) uptake and compromised growth on soils when Pi availability is restricted. What is less clear is whether root hairs that are longer than wild-type provide an additional benefit to phosphorus (P) nutrition. This was tested using transgenic Brachypodium lines with longer root hairs. The lines were transformed with the endogenous BdRSL2 and BdRSL3 genes using either a constitutive promoter or a root hair-specific promoter. Plants were grown for 32 d in soil amended with various Pi concentrations. Plant biomass and P uptake were measured and genotypes were compared on the basis of critical Pi values and P uptake per unit root length. Ectopic expression of RSL2 and RSL3 increased root hair length threefold but decreased plant biomass. Constitutive expression of BdRSL2, but not expression of BdRSL3, consistently improved P nutrition as measured by lowering the critical Pi values and increasing Pi uptake per unit root length. Increasing root hair length through breeding or biotechnology can improve P uptake efficiency if the pleiotropic effects on plant biomass are avoided. Long root hairs, alone, appear to be insufficient to improve Pi uptake and need to be combined with other traits to benefit P nutrition.
Food Security, Mar 14, 2015
Land use management is a central challenge for the 21st century with unprecedented and competing demands to produce food, feed/fodder, fibre, fuel, and essential ecosystem services which sustain life. Global change requires rapid adaptation in current and emerging crops as well as in the foundation species of natural ecosystems. Revolutions in genomics and high throughput experimentation are transforming breeding so that adaptive traits in new environments can be predicted and selected more directly from germplasm collections of crops and wild species. This genomic breeding is now feasible in almost any species and has promise to help meet the need to feed and nourish over 9 billion people by 2050. Genomic techniques can accelerate our response to food security challenges of yield, quality and resilience and also address environmental security challenges. To achieve its potential there will need to be widespread and ongoing investments in the human capital to promote genomic breeding.
Science, 2010
To take complete advantage of information on within-species polymorphism and divergence from close relatives, one needs to know the rate and the molecular spectrum of spontaneous mutations. To this end, we have searched for de novo spontaneous mutations in the complete nuclear genomes of five Arabidopsis thaliana mutation accumulation lines that had been maintained by single-seed descent for 30 generations. We identified and validated 99 base substitutions and 17 small and large insertions and deletions. Our results imply a spontaneous mutation rate of 7 × 10⁻⁹ base substitutions per site per generation, the majority of which are G:C→A:T transitions. We explain this very biased spectrum of base substitution mutations as a result of two main processes: deamination of methylated cytosines and ultraviolet light-induced mutagenesis. Most of what we know about molecular evolution comes from the comparison of biological sequences that have survived many cycles of natural selection. In order to infer the properties of the original source of variation and to detect the signature of natural selection from such data sets, we need to assume that variants affecting certain types of sites, such as the last base of fourfold redundant codons or pseudogenes, are not subject to natural selection. This pervasive assumption is very rarely tested and difficult to avoid, because of the slow pace of spontaneous mutagenesis. However, with the advent of high-throughput sequencing technologies, some estimates of the rate of spontaneous mutations have begun to appear (1-3). Here, we report a direct estimate of the spontaneous base substitution rate in Arabidopsis thaliana, a plant species with extensive DNA methylation. As a result, we reduce the uncertainty associated with key aspects of the evolutionary history of this species,
Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals or samples in an unbiased manner, preferably de novo. The rapid and unbiased estimation of genetic relatedness has the potential to overcome reference genome bias, to detect mix-ups early, and to verify that biological replicates belong to the same genetic lineage before conclusions are drawn using mislabelled or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly- and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes, and applications include detecting sample identity and mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
(A) k-mers are counted into sketches (using khmer [28]). Columns represent the “bins” in each sketch. The frequencies of non-zero counts across a set of sketches are computed, forming the population frequency sketch (denoted F). We calculate the Shannon entropy of this frequency sketch as the weight vector for the WIP metric (denoted H, see Eq 2). (B) Illustration of Shannon entropy as used in kWIP: the relationship between the population frequency (F) and the weight (H).
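The entropy weighting described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the kWIP implementation (which is written in C++). The function names are hypothetical, the weight is assumed to be the binary Shannon entropy of each bin's population frequency, and the normalisation of the inner product into a distance is an assumption made for the example.

```python
import math


def population_frequency(sketches):
    """Fraction of samples with a non-zero count in each bin (F)."""
    n_samples = len(sketches)
    n_bins = len(sketches[0])
    return [
        sum(1 for sketch in sketches if sketch[i] > 0) / n_samples
        for i in range(n_bins)
    ]


def entropy_weights(freqs):
    """Assumed weight H: binary Shannon entropy of each bin's frequency.

    Bins seen in no sample or in every sample carry no information
    about differences between samples and receive weight 0.
    """
    weights = []
    for f in freqs:
        if f <= 0.0 or f >= 1.0:
            weights.append(0.0)
        else:
            weights.append(-(f * math.log2(f) + (1 - f) * math.log2(1 - f)))
    return weights


def weighted_inner_product(a, b, weights):
    """Inner product of two count sketches, weighted bin by bin."""
    return sum(w * x * y for w, x, y in zip(weights, a, b))


def wip_distance(a, b, weights):
    """Turn the weighted inner product into a distance.

    Assumption for this sketch: normalise by the self-products, then
    use sqrt(2 - 2 * similarity).
    """
    kaa = weighted_inner_product(a, a, weights)
    kbb = weighted_inner_product(b, b, weights)
    if kaa == 0 or kbb == 0:
        raise ValueError("sketch has no weighted signal")
    similarity = weighted_inner_product(a, b, weights) / math.sqrt(kaa * kbb)
    return math.sqrt(max(0.0, 2.0 - 2.0 * similarity))
```

Calling `population_frequency` on a list of equal-length count lists, passing the result to `entropy_weights`, and then calling `wip_distance` for each pair of samples yields the entries of a pairwise distance matrix. Bins present in every sample receive zero weight, so ubiquitous k-mers do not contribute to the comparison.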
We used kWIP to examine 16S rDNA amplicon sequencing data of Edwards et al. [35] and compared our kWIP result (“kWIP”) with the results as presented by Edwards et al. (“Weighted UniFrac” and “UniFrac”). We find that kWIP replicates their observations of stratification of root-associated microbiomes by rhizo-compartment (PC1) and experiment site (PC2).
Summary: Most studies of aquatic plankton focus on either macroscopic or microbial communities, and on either eukaryotes or prokaryotes. This separation is primarily for methodological reasons, but can overlook potential interactions among groups. We tested whether DNA-metabarcoding of unfractionated water samples with universal primers could be used to qualitatively and quantitatively study the temporal dynamics of the total plankton community in a shallow temperate lake. We found significant changes in the relative proportions of normalized sequence reads of eukaryotic and prokaryotic plankton communities over a three-month period in spring. Patterns followed the same trend as plankton estimates using traditional microscopic methods. We characterized the bloom of a conditionally rare bacterial taxon belonging to Arcicella, which rapidly came to dominate the whole lake ecosystem and would have remained unnoticed without metabarcoding. Our data demonstrate the potential of universal DN...
PLOS ONE, 2023
Massively parallel, second-generation short-read DNA sequencing has become an integral tool in biology for genomic studies. Offering highly accurate base-pair resolution at the most competitive price, the technology has become widespread. However, high-throughput generation of multiplexed DNA libraries can be costly and cumbersome. Here, we present a cost-conscious protocol for generating multiplexed short-read DNA libraries using a bead-linked transposome from Illumina. We prepare libraries in high-throughput with small reaction volumes that use 1/50th the amount of transposome compared to Illumina DNA Prep tagmentation protocols. By reducing transposome usage and optimising the protocol to circumvent magnetic bead-based clean-ups between steps, we reduce costs, labour time and DNA input requirements. Developing our own dual index primers further reduces costs and enables up to nine 96-well microplate combinations. This facilitates efficient usage of large-scale sequencing platforms, such as the Illumina NovaSeq 6000, which offers up to three terabases of sequencing per S4 flow cell. The protocol presented substantially reduces the cost per library by approximately 1/20th compared to conventional Illumina methods.
protocols.io, 2018
Illumina® short-read DNA sequencing has become an integral tool in biology for genome-wide studies. Offering accurate base-pair resolution at the most competitive price, the technology has become widespread. However, the generation of multiplexed DNA libraries remains costly and cumbersome. Here, we present a streamlined cost-conscious protocol for generating multiplexed short-read DNA libraries using a transposase from Illumina®. By implementing small volumes that use 1/25th the amount of transposase compared to Illumina® Nextera™ protocols, the cost of library preparation can be significantly reduced, by 1/10th or more. Furthermore, we optimised the protocol to minimise carboxylate bead-based cleanups between steps, further reducing cost, time and DNA input. By developing our own indices to multiplex nine 96-well plates, up to 864 samples can be placed on a single flow cell. This enables efficient usage of monolithic sequencing platforms that can offer over three terabases of se...
Springer eBooks, Dec 31, 2022
Exome Capture is a molecular biology technique that, in combination with Next Generation DNA sequ... more Exome Capture is a molecular biology technique that, in combination with Next Generation DNA sequencing technologies (NGS), allows for selectively sequencing the predicted genes of an organism. Such capture sequencing provides a compromise between genome coverage and sequencing cost. The capture reaction is an additional step in an otherwise standard sequencing protocol and exome capture effectively enriches the sequencing library for DNA molecules that overlap with predicted genes (the exome). This enables genome-wide assessments while focusing on the gene space. Capture sequencing is particularly attractive in species with large genomes, where whole genome sequencing in larger numbers of samples would be cost-prohibitive at present prices. Plant Breeding and Genetics Laboratory (PBGL) developed an Exome Capture Kit for Coffea arabica in collaboration with Daicel Arbor Biosciences (Ann Arbor, MI, USA). Use of the kit achieves eightfold enrichment, and hence approx. eightfold reduction in sequencing cost for a whole genome assessment of Coffee arabica plants. The kit is available as a regular product from Daicel Arbor Biosciences and this protocol describes the kit and gives detailed instructions on how to perform the capture reaction.
bioRxiv (Cold Spring Harbor Laboratory), Jun 13, 2016
Most studies of aquatic plankton focus on either macroscopic or microbial communities, and on eit... more Most studies of aquatic plankton focus on either macroscopic or microbial communities, and on either eukaryotes or prokaryotes. This separation is primarily for methodological reasons, but can overlook potential interactions among groups. We tested whether DNA-metabarcoding of unfractionated water samples with universal primers could be used to qualitatively and quantitatively study the temporal dynamics of the total plankton community in a shallow temperate lake. We found significant changes in the relative proportions of normalized sequence reads of eukaryotic and prokaryotic plankton communities over a three-month period in spring. Patterns followed the same trend as plankton estimates using traditional microscopic methods. We characterized the bloom of a conditionally rare bacterial taxon belonging to Arcicella, which rapidly came to dominate the whole lake ecosystem and would have remained unnoticed without metabarcoding. Our data demonstrate the potential of universal DNA-metabarcoding applied to unfractionated samples for providing a more holistic view of plankton communities.
Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritan... more Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.
Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritan... more Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.
Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritan... more Background: The guppy, Poecilia reticulata, is a well-known model organism for studying inheritance and variation of male ornamental traits as well as adaptation to different river habitats. However, genomic resources for studying this important model were not previously widely available. Results: With the aim of generating molecular markers for genetic mapping of the guppy, cDNA libraries were constructed from embryos and different adult organs to generate expressed sequence tags (ESTs). About 18,000 ESTs were annotated according to BLASTN and BLASTX results and the sequence information from the 3' UTRs was exploited to generate PCR primers for re-sequencing of genomic DNA from different wild type strains. By comparison of EST-linked genomic sequences from at least four different ecotypes, about 1,700 polymorphisms were identified, representing about 400 distinct genes. Two interconnected MySQL databases were built to organize the ESTs and markers, respectively. A robust phylogeny of the guppy was reconstructed, based on 10 different nuclear genes. Conclusion: Our EST and marker databases provide useful tools for genetic mapping and phylogenetic studies of the guppy.
F1000Research, Nov 2, 2016
The development of model systems requires a detailed assessment of standing genetic variation acr... more The development of model systems requires a detailed assessment of standing genetic variation across natural populations. The Brachypodium species complex has been promoted as a plant model for grass genomics with translational to small grain and biomass crops. To capture the genetic diversity within this species complex, thousands of Brachypodium accessions from around the globe were collected and sequenced using genotyping by sequencing (GBS). Overall, 1,897 samples were classified into two diploid or allopolyploid species and then further grouped into distinct inbred genotypes. A core set of diverse B. distachyon diploid lines were selected for whole genome sequencing and high resolution phenotyping. Genome-wide association studies across simulated seasonal environments was used to identify candidate genes and pathways tied to key life history and agronomic traits under current and future climatic conditions. A total of 8, 22 and 47 QTLs were identified for flowering time, early vigour and energy traits, respectively. Overall, the results highlight the genomic structure of the Brachypodium species complex and allow powerful complex trait dissection within this new grass model species. .
Humana Press eBooks, Oct 11, 2012
Artificial microRNAs (amiRNAs) have been shown to facilitate efficient gene silencing with high s... more Artificial microRNAs (amiRNAs) have been shown to facilitate efficient gene silencing with high specificity to the intended target gene(s). For the plant breeder, gene silencing by artificial miRNAs will certainly accelerate gene discovery, because it allows targeting of all genes in a mapping interval, independent of the genetic background. In addition, beneficial knockout phenotypes can easily be transferred between varieties and across incompatibility barriers. This chapter describes the generation and application of amiRNAs as a gene silencing tool in rice.
Proceedings of the National Academy of Sciences of the United States of America, Jun 6, 2011
New Phytologist, Jan 17, 2018
Mutants without root hairs show reduced inorganic orthophosphate (Pi) uptake and compromised grow... more Mutants without root hairs show reduced inorganic orthophosphate (Pi) uptake and compromised growth on soils when Pi availability is restricted. What is less clear is whether root hairs that are longer than wild-type provide an additional benefit to phosphorus (P) nutrition. This was tested using transgenic Brachypodium lines with longer root hairs. The lines were transformed with the endogenous BdRSL2 and BdRSL3 genes using either a constitutive promoter or a root hair-specific promoter. Plants were grown for 32 d in soil amended with various Pi concentrations. Plant biomass and P uptake were measured and genotypes were compared on the basis of critical Pi values and P uptake per unit root length. Ectopic expression of RSL2 and RSL3 increased root hair length threefold but decreased plant biomass. Constitutive expression of BdRSL2, but not expression of BdRSL3, consistently improved P nutrition as measured by lowering the critical Pi values and increasing Pi uptake per unit root length. Increasing root hair length through breeding or biotechnology can improve P uptake efficiency if the pleotropic effects on plant biomass are avoided. Long root hairs, alone, appear to be insufficient to improve Pi uptake and need to be combined with other traits to benefit P nutrition.
Food Security, Mar 14, 2015
Land use management is a central challenge for the 21st century with unprecedented and competing ... more Land use management is a central challenge for the 21st century with unprecedented and competing demands to produce food, feed/fodder, fibre, fuel, and essential ecosystem services which sustain life. Global change requires rapid adaptation in current and emerging crops as well as in the foundation species of natural ecosystems. Revolutions in genomics and high throughput experimentation are transforming breeding so that adaptive traits in new environments can be predicted and selected more directly from germplasm collections of crops and wild species. This genomic breeding is now feasible in almost any species and has promise to help meet the need to feed and nourish over 9 billion people by 2050. Genomic techniques can accelerate our response to food security challenges of yield, quality and resilience and also address environmental security challenges. To achieve its potential there will need to be widespread and ongoing investments in the human capital to promote genomic breeding.
Science, 2010
To take complete advantage of information on within-species polymorphism and divergence from clos... more To take complete advantage of information on within-species polymorphism and divergence from close relatives, one needs to know the rate and the molecular spectrum of spontaneous mutations. To this end, we have searched for de novo spontaneous mutations in the complete nuclear genomes of five Arabidopsis thaliana mutation accumulation lines that had been maintained by single-seed descent for 30 generations. We identified and validated 99 base substitutions and 17 small and large insertions and deletions. Our results imply a spontaneous mutation rate of 7 × 10 −9 base substitutions per site per generation, the majority of which are G:C→A:T transitions. We explain this very biased spectrum of base substitution mutations as a result of two main processes: deamination of methylated cytosines and ultraviolet light-induced mutagenesis. Most of what we know about molecular evolution comes from the comparison of biological sequences that have survived many cycles of natural selection. In order to infer the properties of the original source of variation and to detect the signature of natural selection from such data sets, we need to assume that variants affecting certain types of sites, such as the last base of fourfold redundant codons or pseudogenes, are not subject to natural selection. This pervasive assumption is very rarely tested and difficult to avoid, because of the slow pace of spontaneous mutagenesis. However, with the advent of high-throughput sequencing technologies, some estimates of the rate of spontaneous mutations have begun to appear (1-3). Here, we report a direct estimate of the spontaneous base substitution rate in Arabidopsis thaliana, a plant species with extensive DNA methylation. As a result, we reduce the uncertainty associated with key aspects of the evolutionary history of this species,
Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals or samples in an unbiased manner, preferably de novo. The rapid and unbiased estimation of genetic relatedness has the potential to overcome reference genome bias, to detect mix-ups early, and to verify that biological replicates belong to the same genetic lineage before conclusions are drawn using mislabelled or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly- and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes, and applications include detecting sample identity and mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
<p>(A) <i>k</i>-mers are counted into sketches (using khmer [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.ref028" target="_blank">28</a>]). Columns represent the “bins” in each sketch. The frequencies of non-zero counts across a set of sketches are computed, forming the population frequency sketch (denoted <i>F</i>). We calculate Shannon entropy of this frequency sketch as the weight vector for the WIP metric (denoted <i>H</i>, see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.e009" target="_blank">Eq 2</a>). (B) Illustration of Shannon Entropy as used in kWIP: the relationship between the population frequency (<i>F</i>) and the weight (<i>H</i>).</p>
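The weighting scheme described in this caption can be illustrated with a minimal Python sketch. It is not the kWIP implementation (which is written in C++ and uses khmer sketches). Several parts are assumptions made for illustration: a single row of bins filled via `zlib.crc32` stands in for the khmer sketch, the weight is taken to be the binary Shannon entropy of the fraction of samples with a non-zero count in each bin (the exact form of Eq 2 is not reproduced here), and the distance is derived from the normalised inner product in one common way that may differ from the paper's normalisation. All function names are hypothetical.

```python
import math
import zlib


def count_kmers(seq, k, n_bins):
    """Count the k-mers of seq into a fixed number of bins (toy one-row sketch)."""
    bins = [0] * n_bins
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: crc32 modulo n_bins stands in for the sketch's hash function.
        bins[zlib.crc32(kmer.encode()) % n_bins] += 1
    return bins


def entropy_weights(sketches):
    """Per-bin weight H: binary Shannon entropy of the population frequency F.

    F is the fraction of sketches with a non-zero count in that bin.
    Bins present in no sample or in every sample get weight 0.
    """
    n = len(sketches)
    weights = []
    for column in zip(*sketches):
        f = sum(1 for c in column if c > 0) / n
        if f == 0.0 or f == 1.0:
            weights.append(0.0)
        else:
            weights.append(-(f * math.log2(f) + (1 - f) * math.log2(1 - f)))
    return weights


def wip(a, b, weights):
    """Weighted inner product of two count vectors."""
    return sum(w * x * y for w, x, y in zip(weights, a, b))


def wip_distance(a, b, weights):
    """Distance from the normalised weighted inner product (assumed form)."""
    kaa, kbb = wip(a, a, weights), wip(b, b, weights)
    if kaa == 0 or kbb == 0:
        raise ValueError("sample has no counts in any informative bin")
    similarity = wip(a, b, weights) / math.sqrt(kaa * kbb)
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * similarity))


if __name__ == "__main__":
    # Three hand-made sketches with four bins each: two identical samples and one distinct.
    sketches = [[2, 0, 1, 3], [2, 0, 1, 3], [0, 4, 1, 0]]
    H = entropy_weights(sketches)
    for i in range(len(sketches)):
        print([round(wip_distance(sketches[i], s, H), 3) for s in sketches])
```

In this toy input, the third bin is non-zero in every sample, so it receives weight 0 and does not contribute to any pairwise comparison; bins seen in only some samples carry the signal.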
<p>We used kWIP to examine 16S rDNA amplicon sequencing data of Edwards, <i>et al.</i> [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005727#pcbi.1005727.ref035" target="_blank">35</a>] and compare our kWIP result (“kWIP”) with the results as presented by Edwards, <i>et al.</i> (“Weighted UniFrac” and “UniFrac”). We find that kWIP replicates their observations of stratification of root-associated microbiomes by rhizo-compartment (PC1) and experiment site (PC2).</p>
Summary: Most studies of aquatic plankton focus on either macroscopic or microbial communities, and on either eukaryotes or prokaryotes. This separation is primarily for methodological reasons, but can overlook potential interactions among groups. We tested whether DNA-metabarcoding of unfractionated water samples with universal primers could be used to qualitatively and quantitatively study the temporal dynamics of the total plankton community in a shallow temperate lake. We found significant changes in the relative proportions of normalized sequence reads of eukaryotic and prokaryotic plankton communities over a three-month period in spring. Patterns followed the same trend as plankton estimates using traditional microscopic methods. We characterized the bloom of a conditionally rare bacterial taxon belonging to Arcicella, which rapidly came to dominate the whole lake ecosystem and would have remained unnoticed without metabarcoding. Our data demonstrate the potential of universal DN...
PLOS ONE, 2023
Massively parallel, second-generation short-read DNA sequencing has become an integral tool in biology for genomic studies. Offering highly accurate base-pair resolution at the most competitive price, the technology has become widespread. However, high-throughput generation of multiplexed DNA libraries can be costly and cumbersome. Here, we present a cost-conscious protocol for generating multiplexed short-read DNA libraries using a bead-linked transposome from Illumina. We prepare libraries in high-throughput with small reaction volumes that use 1/50th the amount of transposome compared to Illumina DNA Prep tagmentation protocols. By reducing transposome usage and optimising the protocol to circumvent magnetic bead-based clean-ups between steps, we reduce costs, labour time and DNA input requirements. Developing our own dual index primers further reduces costs and enables up to nine 96-well microplate combinations. This facilitates efficient usage of large-scale sequencing platforms, such as the Illumina NovaSeq 6000, which offers up to three terabases of sequencing per S4 flow cell. The protocol presented substantially reduces the cost per library by approximately 1/20th compared to conventional Illumina methods.
protocols.io, 2018
Illumina® short-read DNA sequencing has become an integral tool in biology for genome-wide studies. Offering accurate base-pair resolution at the most competitive price, the technology has become widespread. However, the generation of multiplexed DNA libraries remains costly and cumbersome. Here, we present a streamlined cost-conscious protocol for generating multiplexed short-read DNA libraries using a transposase from Illumina®. By implementing small volumes that use 1/25th the amount of transposase compared to Illumina® Nextera™ protocols, the cost of library preparation can be significantly reduced, by 1/10th or more. Furthermore, we optimised the protocol to minimise carboxylate bead-based cleanups between steps, further reducing cost, time and DNA input. By developing our own indices to multiplex nine 96-well plates, up to 864 samples can be placed on a single flow cell. This enables efficient usage of monolithic sequencing platforms that can offer over three terabases of se...