Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes
doi: 10.1371/journal.pgen.0010060. Epub 2005 Nov 11.
María José Aranzana, Sung Kim, Keyan Zhao, Erica Bakker, Matthew Horton, Katrin Jakob, Clare Lister, John Molitor, Chikako Shindo, Chunlao Tang, Christopher Toomajian, Brian Traw, Honggang Zheng, Joy Bergelson, Caroline Dean, Paul Marjoram, Magnus Nordborg
- PMID: 16292355
- PMCID: PMC1283159
- DOI: 10.1371/journal.pgen.0010060
María José Aranzana et al. PLoS Genet. 2005 Nov.
Abstract
There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.
Conflict of interest statement
Competing interests. The authors have declared that no competing interests exist.
Figures
Figure 1. Summary of the Data Used in the Study
The columns on the left give the genotype and associated phenotype for four loci, for each of the 95 accessions. The four loci are the flowering time locus FRI (+, wild-type; 1, Ler null allele; 2, Col null allele [9]), for which the associated phenotype is flowering time in long-day conditions without vernalization (late flowering is indicated by height and color of bar), and the three pathogen resistance loci Rps5, Rpm1, and Rps2 (+, wild-type; −, null allele [10,11,12]), for which the associated phenotypes are hypersensitive response to the appropriate bacterial avr gene (red indicates resistance, black indicates susceptibility, and missing data are indicated by missing bar). The tree on the right illustrates the genetic relationships between the accessions [8]. It is clear that phenotypes and genotypes are correlated, genome-wide.
Figure 2. The Genome-Wide Distribution of _p_-Values under Different Scenarios
(A) Cumulative distribution of _p_-values for association tests across approximately 850 loci. The sequenced haplotypes at each locus were treated as alleles (after eliminating singleton polymorphisms), and the significance of genotype–phenotype associations was tested using Kruskal–Wallis tests in the case of flowering time (a continuous trait), and using χ² tests in the case of resistance (a binary trait). Under the null hypothesis of no association, the cumulative distribution should be a straight line: the observed distributions are all heavily skewed towards zero. (B) The cumulative distribution of _p_-values for association with pathogen resistance, with and without correction for population structure using the program STRAT [13]. The false positive rate is decreased for avrPph3, but is unaffected for the other two phenotypes. (C) The cumulative distribution of _p_-values for association with flowering time, with and without correction for population structure. ANOVA was used instead of the nonparametric Kruskal–Wallis test to make it possible to use population structure as a cofactor (cf. [14]). The distribution for ANOVA with accessions from Finland and northern Sweden removed is also shown (“ANOVA − northern”). The false positive rate is decreased using both approaches.
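The logic of panel (A) can be illustrated with a minimal sketch. It assumes the simplest possible case, a binary trait and only two haplotype classes at a locus, so that the χ² test has one degree of freedom; the function names and toy counts are hypothetical and this is not the authors' pipeline (the STRAT correction is not reproduced). If there were no association, roughly a fraction x of the loci would have _p_-values at or below x, so the empirical cumulative distribution would follow the diagonal.

```python
import math


def chi2_p_2x2(a, b, c, d):
    """p-value of a Pearson chi-square test (1 df, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Rows are two haplotype classes, columns are resistant/susceptible.
    This two-class layout is a simplifying assumption for illustration.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))


def empirical_cdf(pvalues, x):
    """Fraction of p-values that are <= x."""
    return sum(p <= x for p in pvalues) / len(pvalues)


# Toy p-values for four hypothetical loci.
toy_pvalues = [0.01, 0.02, 0.5, 0.9]
# Under the null about 5% of loci should fall at or below 0.05;
# here half of them do, i.e. the distribution is skewed towards zero.
excess = empirical_cdf(toy_pvalues, 0.05) - 0.05
```

A positive `excess` at small thresholds corresponds to the curves in the figure lying above the straight line expected under the null hypothesis.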
Figure 3. Genome-Wide Scans for Association with Flowering Time and Pathogen Resistance
For flowering time (A), four different statistical methods were used (described in Materials and Methods): Voronoi focusing on “late” alleles (magenta line), Voronoi focusing on “early” alleles (blue line), CLASS (green line), and fragment-based Kruskal–Wallis tests (red line; see also Figure 2). For pathogen resistance (avrRpm1 [B], avrRpt2 [C], and avrPph3 [D]), only the last two tests were used. Higher peaks indicate stronger association (the _y_-axes are proportional to the negative log _p_-values, but have been normalized to the highest value within each test). The dotted lines correspond to the 95th percentile and are mainly intended to facilitate comparison between figures. Yellow vertical lines indicate the positions of the appropriate candidate loci. Peaks occur at these loci for all methods, but are otherwise distributed throughout the genome.
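The _y_-axis scaling described in this caption can be sketched as follows. The helper names are hypothetical, base 10 is an arbitrary choice (the base cancels once scores are divided by the maximum), and the nearest-rank percentile is one of several common definitions, chosen here only for simplicity; the caption does not say which definition was used.

```python
import math


def normalized_scores(pvalues):
    """Return -log10(p) for each p, divided by the largest such value,
    so the strongest association within a test is scaled to 1."""
    scores = [-math.log10(p) for p in pvalues]  # assumes every p > 0
    top = max(scores)
    if top == 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


# Toy p-values for three hypothetical markers.
scaled = normalized_scores([0.1, 0.01, 0.001])
threshold = percentile_nearest_rank(scaled, 95)
```

Because each test is scaled to its own maximum, peak heights are comparable in shape across methods but not in absolute significance.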
Figure 4. Haplotypes Significantly Associated with Flowering Time Clustered by Haplotype Membership
To help determine which associations were real and which were due to population structure, the most significantly associated haplotypes (based on fragment-wise Kruskal–Wallis; see Materials and Methods) were clustered based on similarity in the list of accessions that carry each haplotype. (A) The tree shows the resulting cluster with tips colored according to average flowering time among the accessions that carry the haplotype corresponding to each tip (the scale is given on the right along with a histogram showing the distribution of flowering time across the 95 accessions). (B) The matrix shows the membership list for each haplotype. Each column corresponds to the haplotype (tip) in the tree above it; accessions highlighted in red carry the haplotype significantly associated with flowering time. The tree thus illustrates the clustering of the columns of the matrix: clustering was done based on pairwise distance as measured by the absolute value of the correlation in membership between columns. Phenotypes of the accessions are given on the right, and the rows of the matrix (i.e., the accessions) have been clustered based on pairwise Hamming distance. It is evident that most of the significant haplotypes, regardless of position in the genome, share similar membership lists that include the accessions from Finland and northern Sweden. On the other hand, the clusters corresponding to the known major alleles of FRI are unique, indicating that these are indeed true positives.
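The two distance measures named in this caption can be sketched in a few lines. The conversion of absolute correlation into a distance as 1 − |r| is an assumption made here for illustration (the caption states only that the absolute value of the correlation was used), and the membership matrix is toy data, not the study's.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no membership signal
    return sxy / math.sqrt(sxx * syy)


def membership_distance(col_a, col_b):
    """Distance between two haplotypes (columns): 1 - |correlation|.
    Assumed form; identical or complementary membership gives 0."""
    return 1.0 - abs(pearson(col_a, col_b))


def hamming(row_a, row_b):
    """Distance between two accessions (rows): number of haplotypes
    at which their membership differs."""
    return sum(a != b for a, b in zip(row_a, row_b))


# Toy matrix: rows are accessions, columns are haplotypes,
# 1 means the accession carries the haplotype.
membership = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
]
columns = [list(col) for col in zip(*membership)]
d01 = membership_distance(columns[0], columns[1])
d02 = membership_distance(columns[0], columns[2])
```

Haplotypes whose membership lists largely coincide end up at small distances from one another, which is the pattern the caption attributes to population structure; either distance matrix could then be passed to any standard hierarchical clustering routine.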
Figure 5. The Strength of Association (Using CLASS) around the Four Candidate Loci for Various Marker Densities
For each locus (FRI [A], Rpm1 [B], Rps2 [C], and Rps5 [D]), the bottom panel shows the pattern of association using all available fragment markers around the locus (the position of which is given by a grey vertical line), and the panels above show the effect of successively reducing the marker density so that no markers are within 10, 25, 50, and 100 kb (FRI only) of the causative polymorphisms. The dotted grey line represents the 95th percentile of all associations across the genome. Because we used an association statistic that utilizes the pattern of haplotype sharing across multiple fragments, the relative significance of any particular fragment may change depending on the presence or absence of other fragments. The FRI region (A) remains strongly associated with flowering time even for the lowest marker density, while the signal of association around the R genes (B–D) disappears as one goes from 10- to 25-kb spacing.
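The marker-thinning step described here can be sketched as a simple filter. The function name, marker coordinates, and locus position below are hypothetical, and treating "within" as a strict inequality is an assumption; the association statistic itself (CLASS) is not reproduced.

```python
def thin_markers(positions_bp, locus_bp, exclusion_kb):
    """Keep only markers at least `exclusion_kb` kilobases away from
    the causative locus (all coordinates in base pairs)."""
    limit = exclusion_kb * 1000
    return [p for p in positions_bp if abs(p - locus_bp) >= limit]


# Hypothetical marker positions around a hypothetical locus.
markers = [100_000, 265_000, 270_000, 285_000, 330_000, 400_000]
locus = 269_000

# Successively lower marker density around the locus.
thinned = {kb: thin_markers(markers, locus, kb) for kb in (10, 25, 50, 100)}
```

Re-running the association scan on each thinned set shows how far from the causative polymorphism a signal remains detectable, which is the comparison the panels make.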
Similar articles
- Linkage and association mapping of Arabidopsis thaliana flowering time in nature.
Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M, Bergelson J, Cuguen J, Roux F. PLoS Genet. 2010 May 6;6(5):e1000940. doi: 10.1371/journal.pgen.1000940. PMID: 20463887. Free PMC article.
- Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, Jiang R, Muliyati NW, Zhang X, Amer MA, Baxter I, Brachi B, Chory J, Dean C, Debieu M, de Meaux J, Ecker JR, Faure N, Kniskern JM, Jones JD, Michael T, Nemri A, Roux F, Salt DE, Tang C, Todesco M, Traw MB, Weigel D, Marjoram P, Borevitz JO, Bergelson J, Nordborg M. Nature. 2010 Jun 3;465(7298):627-31. doi: 10.1038/nature08800. Epub 2010 Mar 24. PMID: 20336072. Free PMC article.
- Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping.
Chan EK, Rowe HC, Kliebenstein DJ. Genetics. 2010 Jul;185(3):991-1007. doi: 10.1534/genetics.109.108522. Epub 2009 Sep 7. PMID: 19737743. Free PMC article.
- Natural genetic variation in Arabidopsis: tools, traits and prospects for evolutionary ecology.
Shindo C, Bernasconi G, Hardtke CS. Ann Bot. 2007 Jun;99(6):1043-54. doi: 10.1093/aob/mcl281. Epub 2007 Jan 26. PMID: 17259228. Free PMC article. Review.
- Arabidopsis in Madison: genes and phenotypes spread like weeds.
Chasan R. Plant Cell. 1995 Nov;7(11):1737-48. doi: 10.1105/tpc.7.11.1737. PMID: 8535131. Free PMC article. Review. No abstract available.
Cited by
- Genome-wide association mapping of salinity tolerance in rice (Oryza sativa).
Kumar V, Singh A, Mithra SV, Krishnamurthy SL, Parida SK, Jain S, Tiwari KK, Kumar P, Rao AR, Sharma SK, Khurana JP, Singh NK, Mohapatra T. DNA Res. 2015 Apr;22(2):133-45. doi: 10.1093/dnares/dsu046. Epub 2015 Jan 27. PMID: 25627243. Free PMC article.
- Structured patterns in geographic variability of metabolic phenotypes in Arabidopsis thaliana.
Kleessen S, Antonio C, Sulpice R, Laitinen R, Fernie AR, Stitt M, Nikoloski Z. Nat Commun. 2012;3:1319. doi: 10.1038/ncomms2333. PMID: 23271653.
- Association genetics in Pinus taeda L. I. Wood property traits.
González-Martínez SC, Wheeler NC, Ersoz E, Nelson CD, Neale DB. Genetics. 2007 Jan;175(1):399-409. doi: 10.1534/genetics.106.061127. Epub 2006 Nov 16. PMID: 17110498. Free PMC article.
- High-throughput marker discovery in melon using a self-designed oligo microarray.
Ophir R, Eshed R, Harel-Beja R, Tzuri G, Portnoy V, Burger Y, Uliel S, Katzir N, Sherman A. BMC Genomics. 2010 Apr 28;11:269. doi: 10.1186/1471-2164-11-269. PMID: 20426811. Free PMC article.
- The genetic architecture of shoot branching in Arabidopsis thaliana: a comparative assessment of candidate gene associations vs. quantitative trait locus mapping.
Ehrenreich IM, Stafford PA, Purugganan MD. Genetics. 2007 Jun;176(2):1223-36. doi: 10.1534/genetics.107.071928. Epub 2007 Apr 15. PMID: 17435248. Free PMC article.
References
- Nordborg M, Tavaré S. Linkage disequilibrium: What history has to tell us. Trends Genet. 2002;18:83–90.
- Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nature Genet. 2000;26:151–157.
- Zondervan KT, Cardon LR. The complex interplay among factors that influence allelic association. Nature Rev Genet. 2004;5:89–100.
- Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
- Grupe A, Germer S, Usuka J, Aud D, Belknap JK, et al. In silico mapping of complex disease-related traits in mice. Science. 2001;292:1915–1918.