Dysregulation of expression correlates with rare-allele burden and fitness loss in maize

Accession codes

Primary accessions

BioProject

Sequence Read Archive

References

  1. Kimura, M., Maruyama, T. & Crow, J. F. The mutation load in small populations. Genetics 48, 1303–1312 (1963)
  2. Marth, G. T. et al. The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011)
  3. Henn, B. M., Botigué, L. R., Bustamante, C. D., Clark, A. G. & Gravel, S. Estimating the mutation load in human genomes. Nat. Rev. Genet. 16, 333–343 (2015)
  4. Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2012)
  5. Troyer, A. F. A retrospective view of corn genetic resources. J. Hered. 81, 17–24 (1990)
  6. Remington, D. L. et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl Acad. Sci. USA 98, 11479–11484 (2001)
  7. Kono, T. J. Y. et al. The role of deleterious substitutions in crop genomes. Mol. Biol. Evol. 33, 2307–2317 (2016)
  8. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009)
  9. Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014)
  10. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016)
  11. Jiao, Y. et al. Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815 (2012)
  12. Gore, M. A. et al. A first-generation haplotype map of maize. Science 326, 1115–1117 (2009)
  13. Tenaillon, M. I. et al. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl Acad. Sci. USA 98, 9161–9166 (2001)
  14. Vigouroux, Y. et al. Rate and pattern of mutation at microsatellite loci in maize. Mol. Biol. Evol. 19, 1251–1260 (2002)
  15. Beissinger, T. M. et al. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2, 16084 (2016)
  16. Duvick, D. N. The contribution of breeding to yield advances in maize (Zea mays L.). Adv. Agron. 86, 83–145 (2005)
  17. Troyer, A. F. & Wellin, E. J. Heterosis decreasing in hybrids: yield test inbreds. Crop Sci. 49, 1969–1976 (2009)
  18. Flint-Garcia, S. A. et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 44, 1054–1064 (2005)
  19. Eveland, A. L., McCarty, D. R. & Koch, K. E. Transcript profiling by 3′-untranslated region sequencing resolves expression of gene families. Plant Physiol. 146, 32–44 (2008)
  20. Lohman, B. K., Weber, J. N. & Bolnick, D. I. Evaluation of TagSeq, a reliable low-cost alternative for RNAseq. Mol. Ecol. Resour. 16, 1315–1321 (2016)
  21. Bukowski, R. et al. Construction of the third generation Zea mays haplotype map. GigaScience https://doi.org/10.1093/gigascience/gix134 (2017)
  22. Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013)
  23. Yao, H., Dogra Gray, A., Auger, D. L. & Birchler, J. A. Genomic dosage effects on heterosis in triploid maize. Proc. Natl Acad. Sci. USA 110, 2665–2669 (2013)
  24. Josephs, E. B., Lee, Y. W., Stinchcombe, J. R. & Wright, S. I. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Natl Acad. Sci. USA 112, 15390–15395 (2015)
  25. Gout, J.-F., Kahn, D., Duret, L. & Paramecium Post-Genomics Consortium. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 6, e1000944 (2010)
  26. Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012)
  27. Hung, H.-Y. et al. The relationship between parental genetic or phenotypic divergence and progeny variation in the maize nested association mapping population. Heredity 108, 490–499 (2012)
  28. Rodgers-Melnick, E. et al. Recombination in diverse maize is stable, predictable, and associated with genetic load. Proc. Natl Acad. Sci. USA 112, 3823–3828 (2015)
  29. Wan, C. Y. & Wilkins, T. A. A modified hot borate method significantly enhances the yield of high-quality RNA from cotton (Gossypium hirsutum L.). Anal. Biochem. 223, 7–12 (1994)
  30. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)
  31. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
  32. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015)
  33. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)
  34. Money, D. et al. LinkImpute: fast and accurate genotype imputation for nonmodel organisms. G3 5, 2383–2390 (2015)
  35. Bradbury, P. J. et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007)
  36. Swarts, K. et al. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome 7, https://doi.org/10.3835/plantgenome2014.05.0023 (2014)
  37. Ramu, P. et al. Cassava haplotype map highlights fixation of deleterious mutations during clonal propagation. Nat. Genet. 49, 959–963 (2017)
  38. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010)
  39. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012)
  40. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
  41. Kiesselbach, T. A. The Structure and Reproduction of Corn (Cold Spring Harbor Laboratory, 1999)

Acknowledgements

We thank J. Pardo, J. Wallace, R. Punna, K. Shirasawa and S. Miller for assistance with tissue collection; J. Budka and G. Inzinna for field and greenhouse assistance; R. Bukowski for running the maize HapMap genotyping pipeline; L. Johnson and Z. Miller for database curation; G. Gibson, M. Wolfe, J.-L. Jannink, M. Hufford and J. Ross-Ibarra for discussions; P. Schweitzer, J. Mosher, A. Tate, J. Mattison, M. Magallanes-Lundback, I. Holländer and D. Daujotyte for guidance on RNA extraction, library preparation automation and sequencing; and S. Miller for copy-editing. This work was supported by the US Department of Agriculture–Agricultural Research Service and the National Science Foundation grants IOS-0922493 and IOS-1238014 to E.S.B. The National Science Foundation Graduate Research Fellowship Program grant DGE-1650441 and the Section of Plant Breeding and Genetics at Cornell University provided support to K.A.G.K. The Taiwanese Ministry of Science and Technology Overseas Project for Post Graduate Research grant 104-2917-I-564-015 supported S.-Y.C.

Author information

Authors and Affiliations

  1. Section of Plant Breeding and Genetics, 175 Biotechnology Building, Cornell University, Ithaca, 14853, New York, USA
    Karl A. G. Kremling, Kelly L. Swarts & Edward S. Buckler
  2. Institute for Genomic Diversity, 175 Biotechnology Building, Cornell University, Ithaca, 14853, New York, USA
    Shu-Yun Chen, Mei-Hsiu Su, M. Cinta Romay, Fei Lu & Edward S. Buckler
  3. Institute of Plant and Microbial Biology, Academia Sinica 128, Sec 2nd, Academia road, Taipei, 11529, Taiwan
    Shu-Yun Chen
  4. USDA-ARS, R. W. Holley Center, Cornell University, Ithaca, 14853, New York, USA
    Nicholas K. Lepak, Peter J. Bradbury & Edward S. Buckler
  5. Department of Molecular Biology, Research Group for Ancient Genomics and Evolution, Max Planck Institute for Developmental Biology, Spemannstr. 35, Tübingen, 72076, Germany
    Kelly L. Swarts
  6. The State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
    Fei Lu
  7. Department of Plant Sciences, University of California Davis, Davis, 95616, California, USA
    Anne Lorant

Authors

  1. Karl A. G. Kremling
  2. Shu-Yun Chen
  3. Mei-Hsiu Su
  4. Nicholas K. Lepak
  5. M. Cinta Romay
  6. Kelly L. Swarts
  7. Fei Lu
  8. Anne Lorant
  9. Peter J. Bradbury
  10. Edward S. Buckler

Contributions

K.A.G.K. and E.S.B. designed the experiments and wrote the manuscript. K.A.G.K. performed the analyses and made the RNA-seq libraries. K.A.G.K., S.-Y.C. and M.-H.S. extracted RNA. N.K.L. managed germplasm and plants with K.A.G.K. M.C.R., K.L.S. and A.L. produced and imputed HapMap genotypic data. P.J.B. implemented matrixEQTL in Java/TASSEL. F.L. implemented SNP calling from RNA-seq data.

Corresponding authors

Correspondence to Karl A. G. Kremling or Edward S. Buckler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks N. Springer and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Tissues that were expression profiled by 3′ RNA-seq.

See additional details regarding tissue collection in Methods. Illustrations inspired by ref. 41.

Extended Data Figure 2 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the most highly expressed genes.

Quadratic regression of the expression rank of each line, for each of the top 5,000 most-expressed genes versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).
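The rank-based regression described in this legend can be sketched as follows. This is a minimal illustration on simulated data, not the authors' code: the function names, the simulated effect of rare alleles on expression, and the choice to average rare-allele counts over genes at each rank before fitting are all assumptions made for the example.

```python
import numpy as np

def simulate_u_shape(n_genes=500, n_lines=100, seed=0):
    """Hypothetical data: each rare allele pushes expression up or down at random."""
    rng = np.random.default_rng(seed)
    rare = rng.poisson(2.0, size=(n_genes, n_lines)).astype(float)
    sign = rng.choice([-1.0, 1.0], size=(n_genes, n_lines))
    expr = rng.normal(size=(n_genes, n_lines)) + rare * sign
    return expr, rare

def rank_profile_fit(expr, rare_counts):
    """expr and rare_counts have shape (n_genes, n_lines).

    Lines are ranked by expression within each gene, the upstream rare-allele
    count is averaged over genes at each rank, and a quadratic is fitted to
    that mean count as a function of rank.
    """
    order = np.argsort(expr, axis=1)                      # lowest to highest expression
    sorted_rare = np.take_along_axis(rare_counts, order, axis=1)
    mean_rare = sorted_rare.mean(axis=0)                  # one value per rank
    ranks = np.arange(expr.shape[1])
    coeffs = np.polyfit(ranks, mean_rare, deg=2)          # [quadratic, linear, intercept]
    return mean_rare, coeffs

expr, rare = simulate_u_shape()
mean_rare, coeffs = rank_profile_fit(expr, rare)
print("quadratic coefficient:", coeffs[0])
```

A positive quadratic coefficient corresponds to the U-shaped pattern named in the figure title, with more rare alleles at both expression extremes.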

Extended Data Figure 3 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the medium-expressed genes.

Quadratic regression of the expression rank of each line, for each of the genes ranked 5,001–10,000 by expression, versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).

Extended Data Figure 4 Comparison of the number of rare cis alleles near genes with differing expression levels.

The 10,000 most-expressed genes in each tissue are divided into groups of 1,000 on the basis of expression level. Plots in each panel show genes ranked 1–1,000, 1,001–2,000, …, 9,001–10,000 from left to right. Each of the individuals represented in each tissue is ranked for expression for each of the 1,000 genes in each group. Individuals in the bottom five expression ranks (fuchsia) are compared with those in the middle two quartiles (yellow) and in the top five expression ranks (blue); values are mean ± s.e.m. Y axes show the mean upstream (within 5 kb) rare-allele count. a, Roots of germinating seedling (n = 273 unique inbred samples). b, Shoots of germinating seedling (n = 278 unique inbred samples). c, Kernels at 350-growing-degree days (n = 229 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Tip of leaf three (n = 265 unique inbred samples). f, Adult leaves collected during the day (n = 204 unique inbred samples). g, Adult leaves collected at night (n = 260 unique inbred samples).
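The three-way comparison in this legend can be illustrated with a short sketch. It runs on simulated data, and several details are assumptions rather than the published procedure: the function names are hypothetical, and the s.e.m. is taken across genes within a group, which the legend does not specify.

```python
import numpy as np

def simulate_group(n_genes=1000, n_lines=100, seed=0):
    """Hypothetical group of genes in which rare alleles perturb expression."""
    rng = np.random.default_rng(seed)
    rare = rng.poisson(2.0, size=(n_genes, n_lines)).astype(float)
    sign = rng.choice([-1.0, 1.0], size=(n_genes, n_lines))
    expr = rng.normal(size=(n_genes, n_lines)) + rare * sign
    return expr, rare

def rare_alleles_by_expression_class(expr, rare_counts, n_extreme=5):
    """Mean and s.e.m. of rare-allele counts for bottom, middle and top ranks.

    expr and rare_counts have shape (n_genes, n_lines). The s.e.m. is computed
    across genes (an assumption made for this sketch).
    """
    n_genes, n_lines = expr.shape
    order = np.argsort(expr, axis=1)
    sorted_rare = np.take_along_axis(rare_counts, order, axis=1)
    q1, q3 = n_lines // 4, (3 * n_lines) // 4
    classes = {
        "bottom": sorted_rare[:, :n_extreme],             # five lowest ranks
        "middle": sorted_rare[:, q1:q3],                  # middle two quartiles
        "top": sorted_rare[:, n_lines - n_extreme:],      # five highest ranks
    }
    summary = {}
    for name, block in classes.items():
        per_gene = block.mean(axis=1)                     # one mean per gene
        sem = per_gene.std(ddof=1) / np.sqrt(n_genes)
        summary[name] = (per_gene.mean(), sem)
    return summary

expr, rare = simulate_group()
for name, (mean, sem) in rare_alleles_by_expression_class(expr, rare).items():
    print(f"{name}: {mean:.3f} ± {sem:.3f}")
```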

Extended Data Figure 5 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.1–0.2 (RNA-set MAF) versus 0.1–0.2 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 6 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.4–0.5 (RNA-set MAF) versus 0.4–0.5 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 7 Expression value and dysregulation of 5,000 most-expressed genes are both predictive of fitness.

Orange boxes represent correlations between predicted and true seed weight when using expression values. Yellow boxes represent correlations between predicted and true seed weight when using absolute deviation in expression from the population mean. Range of correlations between predicted and true seed weight is displayed from ten repetitions of nested tenfold cross validation (ten inner and ten outer) using ridge regression. In the box plots, the middle horizontal lines represent the median, hinges represent the 25th and 75th percentiles (the interquartile range), the upper and lower whiskers extend to maximum and minimum points no more than 1.5× interquartile range beyond the hinges, and individual dots are outliers beyond the whiskers. Sample sizes: 2-cm root tips of germinating seedlings (unique n = 181) and whole shoots of germinating seedlings (unique n = 183); the 2-cm base (unique n = 181) and tip (unique n = 182) of leaf 3; leaves collected in the field during the day (unique n = 135) and night (unique n = 187); and 350-growing-degree-day kernels (unique n = 171), post sexual maturity (anthesis).
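The nested cross-validation scheme named in this legend (ten outer folds for prediction, ten inner folds for choosing the ridge penalty) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the closed-form ridge solver, the penalty grid, the function names and the simulated data are all hypothetical choices made for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression on centred data; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form, convenient when genes outnumber lines.
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), yc)
    w = Xc.T @ alpha
    return w, y_mean - x_mean @ w

def kfold_indices(n, k, rng):
    """Split a random permutation of 0..n-1 into k test folds."""
    return np.array_split(rng.permutation(n), k)

def cv_predict(X, y, lam, k, rng):
    """Out-of-fold predictions for a fixed penalty."""
    pred = np.empty(len(y))
    for test in kfold_indices(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        w, b = ridge_fit(X[train], y[train], lam)
        pred[test] = X[test] @ w + b
    return pred

def nested_cv_correlation(X, y, lambdas, k=10, seed=0):
    """Correlation between true values and nested-CV predictions."""
    rng = np.random.default_rng(seed)
    pred = np.empty(len(y))
    for test in kfold_indices(len(y), k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        # Inner loop: choose the penalty using the training lines only.
        errs = [np.mean((cv_predict(X[train], y[train], lam, k, rng) - y[train]) ** 2)
                for lam in lambdas]
        best = lambdas[int(np.argmin(errs))]
        w, b = ridge_fit(X[train], y[train], best)
        pred[test] = X[test] @ w + b
    return np.corrcoef(pred, y)[0, 1]

# Simulated stand-in for an expression matrix (lines x genes) and seed weight.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 40))
y = X @ rng.normal(size=40) + rng.normal(size=100)
lambdas = [0.1, 1.0, 10.0, 100.0]   # hypothetical penalty grid
correlations = [nested_cv_correlation(X, y, lambdas, seed=s) for s in range(10)]
print(min(correlations), max(correlations))
```

Passing the absolute deviation of each gene from its population mean instead of the raw expression values would give the second predictor set described in the legend; repeating the call with ten seeds mirrors the ten repetitions summarized in each box.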

Extended Data Figure 8 Cumulative expression dysregulation of the 5,000 most-expressed genes in each tissue versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples). Regression statistics in Extended Data Table 1. Sweet corn and popcorn lines were excluded from these regressions.
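A regression of this kind can be sketched as follows. The definition used here, the sum over genes of each line's absolute deviation from the population mean, is an assumption based on the description of dysregulation in the Extended Data Figure 7 legend; the function names and simulated data are hypothetical.

```python
import numpy as np

def cumulative_dysregulation(expr):
    """expr has shape (n_lines, n_genes).

    Returns, for each line, the sum over genes of the absolute deviation
    from that gene's population mean (assumed definition for this sketch).
    """
    return np.abs(expr - expr.mean(axis=0)).sum(axis=1)

def simple_regression(x, y):
    """Least-squares line of y on x; returns slope, intercept and R squared."""
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

def simulate_lines(n_lines=150, n_genes=200, seed=0):
    """Hypothetical lines whose 'load' widens expression and lowers seed weight."""
    rng = np.random.default_rng(seed)
    load = rng.uniform(0.5, 2.0, size=n_lines)
    expr = rng.normal(size=(n_lines, n_genes)) * load[:, None]
    seed_weight = 10.0 - 2.0 * load + rng.normal(scale=0.5, size=n_lines)
    return expr, seed_weight

expr, seed_weight = simulate_lines()
slope, intercept, r2 = simple_regression(cumulative_dysregulation(expr), seed_weight)
print(f"slope={slope:.4f}, R^2={r2:.3f}")
```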

Extended Data Figure 9 Mean upstream rare-allele count from the 5,000 most highly expressed genes versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples).

Extended Data Table 1 Regression statistics for cumulative expression dysregulation in each tissue against seed-weight fitness

Cite this article

Kremling, K., Chen, SY., Su, MH. et al. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555, 520–523 (2018). https://doi.org/10.1038/nature25966
