Genomic basis of European ash tree resistance to ash dieback fungus
Jonathan J Stocks et al. Nat Ecol Evol. 2019 Dec.
Abstract
Populations of European ash trees (Fraxinus excelsior) are being devastated by the invasive alien fungus Hymenoscyphus fraxineus, which causes ash dieback. We sequenced whole genomic DNA from 1,250 ash trees in 31 DNA pools, each pool containing trees with the same ash dieback damage status in a screening trial and from the same seed-source zone. A genome-wide association study identified 3,149 single nucleotide polymorphisms (SNPs) associated with low versus high ash dieback damage. Sixty-one of the 192 most significant SNPs were in, or close to, genes with putative homologues already known to be involved in pathogen responses in other plant species. We also used the pooled sequence data to train a genomic prediction model, cross-validated using individual whole genome sequence data generated for 75 healthy and 75 damaged trees from a single seed source. The model's genomic estimated breeding values (GEBVs) allocated these 150 trees to their observed health statuses with 67% accuracy using 10,000 SNPs. Using the top 20% of GEBVs from just 200 SNPs, we could predict observed tree health with over 90% accuracy. We infer that ash dieback resistance in F. excelsior is a polygenic trait that should respond well to both natural selection and breeding, which could be accelerated using genomic prediction.
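The genomic prediction step described in the abstract can be illustrated with a minimal ridge-regression sketch, which is the core idea behind rrBLUP-style models. This is not the authors' pipeline: the function names, the penalty `lam`, the median cutoff, and all data below are assumptions made for illustration, and the simulated marker matrix stands in for real allele frequencies or genotypes.

```python
import numpy as np


def fit_marker_effects(Z, y, lam):
    """Ridge-regression marker effects: solve (Zc'Zc + lam*I) u = Zc'(y - mean(y))."""
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                        # centre each marker column
    A = Zc.T @ Zc + lam * np.eye(Z.shape[1])  # penalised normal equations
    u = np.linalg.solve(A, Zc.T @ (y - mu))
    return mu, col_means, u


def predict_gebv(Z_new, mu, col_means, u):
    """Score new samples with the fitted marker effects (a GEBV-like value)."""
    return mu + (np.asarray(Z_new, dtype=float) - col_means) @ u


def assignment_accuracy(scores, observed, cutoff):
    """Fraction of samples whose thresholded score matches the observed 0/1 status."""
    predicted = (np.asarray(scores) > cutoff).astype(int)
    return float(np.mean(predicted == np.asarray(observed)))


# Synthetic demonstration: every number below is made up, not taken from the paper.
rng = np.random.default_rng(0)
n_train, n_test, n_snps = 300, 150, 200
true_effects = rng.normal(0.0, 0.1, n_snps)   # many small effects, i.e. a polygenic trait
Z_train = rng.integers(0, 3, (n_train, n_snps)) / 2.0  # allele dosage scaled to 0, 0.5, 1
Z_test = rng.integers(0, 3, (n_test, n_snps)) / 2.0


def simulate_status(Z):
    """Hypothetical phenotype: 1 = healthy, 0 = damaged, split at the median liability."""
    liability = Z @ true_effects + rng.normal(0.0, 0.5, Z.shape[0])
    return (liability > np.median(liability)).astype(int)


y_train, y_test = simulate_status(Z_train), simulate_status(Z_test)
mu, col_means, u = fit_marker_effects(Z_train, y_train, lam=50.0)
scores = predict_gebv(Z_test, mu, col_means, u)
print("accuracy:", assignment_accuracy(scores, y_test, cutoff=np.median(scores)))
```

In the study the training rows were DNA pools rather than individuals; the same algebra applies if each row of `Z` holds pool allele frequencies and `y` holds the pool's damage status, although the published model may differ in details not given here.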
Conflict of interest statement
Declaration of Interests
The authors declare no competing financial interests.
Figures
Extended Data Fig. 1. Schematic overview of the study design.
Showing sampling and pooling strategies and dependencies of analyses for genome-wide association study and genomic prediction.
Extended Data Fig. 2. Circle plot of major allele frequency correlation values between all 31 pools in the Pool-seq dataset.
Numbers after the seed source code correspond to health status (1 - healthy or 2 - damaged by ADB). Pool NSZ204:1 (low ADB damage) was technically replicated (NSZ204:1R) using the same set of trees. The high- and low-damage pools from NSZ106 and NSZ107 were each biologically replicated using different sets of trees. Correlation is high for both the technical replicate (NSZ204:1R) and the biological replicates (NSZ106 and NSZ107).
Extended Data Fig. 3. Detection of contamination in the F. excelsior reference genome (BATG0.5).
Blobtools plot showing taxonomic affiliation at the phylum rank level, with contigs distributed according to GC content and base coverage. Contigs that were not classified as Streptophyta corresponded to 0.5% of the genome assembly and 0.24% of all mapped reads.
Extended Data Fig. 4. Pool-seq GWAS p-value density histogram with line plots of the q-values and local False Discovery Rate (FDR) values versus p-values.
The π0 estimate is also displayed.
Extended Data Fig. 5. Predicted protein structures for genes containing amino acid changes associated with tree health status under ADB pressure.
The protein structures to the left were more common in damaged trees, and those to the right were more common in healthy trees. Variant amino acids are coloured in magenta and indicated with a black arrowhead. (a) Gene FRAEX38873_v2_000003260, a BED finger-NBS-LRR resistance protein, where position 157 is a leucine (left) versus tryptophan (right) variant. Two ATP molecules are shown in orange to indicate the location of nucleotide binding sites. (b) Gene FRAEX38873_v2_000164520, an F-box/kelch-repeat protein, where position 13 is a glutamine (left) versus arginine (right) variant. (c) Gene FRAEX38873_v2_000180950, a Protein DAMAGED DNA-BINDING, where position 99 is a proline (left) versus leucine (right) variant. DNA molecules are shown in orange docked at the proteins' DNA binding sites. (d) Gene FRAEX38873_v2_000116110, a 60S ribosomal protein L4-1, where position 251 is an arginine (left) versus glycine (right) variant, position 285 is a methionine (left) versus arginine (right) variant, position 287 is an asparagine (left) versus lysine (right) variant and position 297 is a threonine (left) versus alanine (right) variant.
Extended Data Fig. 6. Genomic prediction results using the 150 individually genotyped samples as both training and testing set, showing little difference in accuracy between GWAS SNPs and random SNPs.
(A) GWAS candidate SNPs with all data filters applied (mapping quality, indel and repeat removal); (B) GWAS candidate SNPs filtered only by mapping quality and indel removal; (C) random selection of SNPs using all data filters (mean and standard error shown for N=10 runs, each of 500 iterations); (D) GP allocation accuracy calculated using data with all filters applied. The scale on the left-hand vertical axis is for correlation, and the scale on the right-hand vertical axis is for accuracy. Between 100 and 5 million SNPs were used to train and test the rrBLUP model.
Extended Data Fig. 7. Genomic prediction using Pool-seq data for training and 150 NSZ 204 individuals for testing.
Dashed lines show results excluding Pool-seq data from NSZ 204 (the test seed source) from the training dataset, whereas solid lines show results with NSZ 204 included. The left column shows correlation of observed phenotype and GEBV and the right column shows accuracy of phenotypic assignment from GEBV.
Figure 1. Summary of variation among the sequenced DNA pools using Correspondence Analysis (CA).
Major allele frequencies were used for all 31 seed source populations (including replicate). Numbers after seed source code correspond to health status (1 - healthy or 2 - infected by ADB). The vertical axis represents Principal Coordinate 1, which accounts for 10% of the variation and the horizontal axis represents Principal Coordinate 2, which accounts for 9% of the variation.
Figure 2. Manhattan plot for pool-seq genome-wide association study of tree health under natural ash dieback inoculation.
For each SNP, a -log10(p) value is shown. The green line represents the p = 1 x 10^-13 threshold. Loci are ordered by position in the F. excelsior reference genome (BATG0.5).
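The quantity plotted in a Manhattan plot, and the comparison against the threshold line, can be sketched as follows. The function names are hypothetical, the p-values are invented, and treating the threshold as a strict inequality is an assumption rather than something stated in the caption.

```python
import math


def neg_log10_p(p_values):
    """Transform p-values to the -log10 scale used on the vertical axis."""
    return [-math.log10(p) for p in p_values]


def passes_threshold(p_values, threshold=1e-13):
    """Flag SNPs whose p-value lies strictly below the threshold (assumed strict)."""
    return [p < threshold for p in p_values]


# Invented p-values for three hypothetical SNPs.
example_p = [1e-14, 1e-13, 0.05]
print(neg_log10_p(example_p))
print(passes_threshold(example_p))
```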
Figure 3. Manhattan plots for contigs containing genes with missense variants associated with tree health under natural ash dieback inoculations.
Points representing SNPs within genes are coloured, and genes containing missense SNPs are named above the plot in the same colour as the points representing SNPs within them. The red line represents the p = 1 x 10^-13 threshold.
Figure 4. Performance of genomic prediction models for health under ash dieback pressure.
Results are shown for 150 individual ash trees, with models trained on pooled sequencing of 1,250 trees, using varying numbers of SNPs in the training and test sets. Solid lines show results for SNPs selected using the pool-seq GWAS; dashed lines show mean results for repeated runs (n=10) of randomly selected SNPs, with bars indicating standard error. Left column: correlation of genomic estimated breeding value (GEBV) with observed health status. Right column: accuracy of health status assignment from GEBV.
Figure 5. Performance of genomic prediction models for selection.
Genomic prediction accuracy of health status assignment for the (left) top 20% and (right) top 30% of test population trees ranked by GEBV, using 1,000 to 50,000 SNPs identified by GWAS in the training set and 10 to 250 SNPs in the testing set.
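One plausible reading of accuracy among the top-ranked trees is the share of trees in the top fraction by GEBV that were observed to be healthy. The sketch below follows that reading; the function name, the rounding rule for the group size, and the example values are assumptions, and the paper may define the measure differently.

```python
def top_fraction_accuracy(gebvs, healthy, fraction):
    """Share of observed-healthy trees among the top `fraction` ranked by GEBV.

    gebvs    : list of predicted breeding values, one per tree
    healthy  : list of 0/1 observed statuses (1 = healthy), same order as gebvs
    fraction : proportion of trees to select, in (0, 1]
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(gebvs) != len(healthy) or not gebvs:
        raise ValueError("gebvs and healthy must be non-empty and the same length")
    n = len(gebvs)
    k = max(1, round(fraction * n))  # size of the selected group (assumed rounding rule)
    ranked = sorted(range(n), key=lambda i: gebvs[i], reverse=True)
    return sum(healthy[i] for i in ranked[:k]) / k


# Invented values for ten hypothetical trees.
example_gebvs = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.5, 0.0]
example_healthy = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(top_fraction_accuracy(example_gebvs, example_healthy, 0.2))
print(top_fraction_accuracy(example_gebvs, example_healthy, 0.3))
```

Restricting attention to the highest-scoring trees mirrors how a breeding programme would use the model: only the best candidates are selected, so accuracy within that group matters more than accuracy across the whole test set.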