Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes

doi: 10.1371/journal.pgen.0010060. Epub 2005 Nov 11.

Sung Kim, Keyan Zhao, Erica Bakker, Matthew Horton, Katrin Jakob, Clare Lister, John Molitor, Chikako Shindo, Chunlao Tang, Christopher Toomajian, Brian Traw, Honggang Zheng, Joy Bergelson, Caroline Dean, Paul Marjoram, Magnus Nordborg

María José Aranzana et al. PLoS Genet. 2005 Nov.

Abstract

There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. Summary of the Data Used in the Study

The columns on the left give the genotype and associated phenotype for four loci, for each of the 95 accessions. The four loci are the flowering time locus FRI (+, wild-type; 1, Ler null allele; 2, Col null allele [9]), for which the associated phenotype is flowering time in long-day conditions without vernalization (late flowering is indicated by height and color of bar), and the three pathogen resistance loci Rps5, Rpm1, and Rps2 (+, wild-type; −, null allele [10,11,12]), for which the associated phenotypes are hypersensitive response to the appropriate bacterial avr gene (red indicates resistance, black indicates susceptibility, and missing data are indicated by missing bar). The tree on the right illustrates the genetic relationships between the accessions [8]. It is clear that phenotypes and genotypes are correlated, genome-wide.

Figure 2. The Genome-Wide Distribution of _p_-Values under Different Scenarios

(A) Cumulative distribution of _p_-values for association tests across approximately 850 loci. The sequenced haplotypes at each locus were treated as alleles (after eliminating singleton polymorphisms), and the significance of genotype–phenotype associations was tested using Kruskal–Wallis tests in the case of flowering time (a continuous trait), and using χ² tests in the case of resistance (a binary trait). Under the null hypothesis of no association, the cumulative distribution should be a straight line: the observed distributions are all heavily skewed towards zero. (B) The cumulative distribution of _p_-values for association with pathogen resistance, with and without correction for population structure using the program STRAT [13]. The false positive rate is decreased for avrPph3, but is unaffected for the other two phenotypes. (C) The cumulative distribution of _p_-values for association with flowering time, with and without correction for population structure. ANOVA was used instead of the nonparametric Kruskal–Wallis test to make it possible to use population structure as cofactor (cf. [14]). The distribution for ANOVA with accessions from Finland and northern Sweden removed is also shown (“ANOVA − northern”). The false positive rate is decreased using both approaches.
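The straight-line expectation in panel A follows from the fact that _p_-values are uniformly distributed under the null hypothesis, so the fraction of tests with _p_ ≤ t should be close to t. The following is a minimal Python sketch of that check. It uses made-up _p_-values, not the study's data, and the name `empirical_cdf` is illustrative rather than taken from the paper.

```python
def empirical_cdf(p_values, thresholds):
    """Return, for each threshold t, the fraction of p-values <= t."""
    n = len(p_values)
    return [sum(p <= t for p in p_values) / n for t in thresholds]


# Hypothetical p-values: evenly spread (null-like) versus pushed towards zero.
null_like = [(i + 0.5) / 100 for i in range(100)]
skewed = [p ** 3 for p in null_like]

thresholds = [0.05, 0.25, 0.5]
print(empirical_cdf(null_like, thresholds))  # tracks the thresholds (the diagonal)
print(empirical_cdf(skewed, thresholds))     # lies above the diagonal
```

A curve that rises above the diagonal near zero, as the second list does, is the pattern the caption describes as skewed towards zero.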

Figure 3. Genome-Wide Scans for Association with Flowering Time and Pathogen Resistance

For flowering time (A), four different statistical methods were used (described in Materials and Methods): Voronoi focusing on “late” alleles (magenta line), Voronoi focusing on “early” alleles (blue line), CLASS (green line), and fragment-based Kruskal–Wallis tests (red line; see also Figure 2). For pathogen resistance (avrRpm1 [B], avrRpt2 [C], and avrPph3 [D]), only the last two tests were used. Higher peaks indicate stronger association (the _y_-axes are proportional to the negative log _p_-values, but have been normalized to the highest value within each test). The dotted lines correspond to the 95th percentile and are mainly intended to facilitate comparison between figures. Yellow vertical lines indicate the positions of the appropriate candidate loci. Peaks occur at these loci for all methods, but are otherwise distributed throughout the genome.
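The _y_-axis scaling described in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the caption does not give the log base (base 10 is used here, and the choice cancels out after dividing by the maximum), the percentile uses the nearest-rank definition, and both function names are hypothetical.

```python
import math


def normalized_neg_log(p_values):
    """Scale -log10(p) so the strongest association within one test equals 1.

    Assumes every p-value is > 0 and at least one is < 1.
    """
    scores = [-math.log10(p) for p in p_values]
    top = max(scores)
    return [s / top for s in scores]


def percentile_nearest_rank(values, percent=95):
    """Nearest-rank percentile: the value at rank ceil(percent * n / 100)."""
    ordered = sorted(values)
    rank = (percent * len(ordered) + 99) // 100  # integer ceiling
    return ordered[max(rank, 1) - 1]


# Hypothetical p-values for one test across a handful of fragments.
p_values = [0.5, 0.1, 0.01, 0.001]
heights = normalized_neg_log(p_values)
print(heights)                           # the last (smallest p) entry is 1.0
print(percentile_nearest_rank(heights))  # height of the dotted reference line
```

Normalizing within each test makes peak heights comparable in shape across methods, but not in absolute significance.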

Figure 4. Haplotypes Significantly Associated with Flowering Time Clustered by Haplotype Membership

To help determine which associations were real and which were due to population structure, the most significantly associated haplotypes (based on fragment-wise Kruskal–Wallis; see Materials and Methods) were clustered based on similarity in the list of accessions that carry each haplotype. (A) The tree shows the resulting cluster with tips colored according to average flowering time among the accessions that carry the haplotype corresponding to each tip (the scale is given on the right along with a histogram showing the distribution of flowering time across the 95 accessions). (B) The matrix shows the membership list for each haplotype. Each column corresponds to the haplotype (tip) in the tree above it; accessions highlighted in red carry the haplotype significantly associated with flowering time. The tree thus illustrates the clustering of the columns of the matrix: clustering was done based on pairwise distance as measured by the absolute value of the correlation in membership between columns. Phenotypes of the accessions are given on the right, and the rows of the matrix (i.e., the accessions) have been clustered based on pairwise Hamming distance. It is evident that most of the significant haplotypes, regardless of position in the genome, share similar membership lists that include the accessions from Finland and northern Sweden. On the other hand, the clusters corresponding to the known major alleles of FRI are unique, indicating that these are indeed true positives.
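The two distance measures named in this caption can be sketched as follows. This is a minimal Python illustration with invented membership vectors. Converting the absolute correlation into a distance as 1 − |r| is an assumption (the caption states only that the absolute value of the correlation was used), as is returning the maximum distance when a column has no variation.

```python
import math


def membership_distance(col_a, col_b):
    """Distance between two haplotype membership columns (0/1 per accession).

    Assumption: distance = 1 - |Pearson r|, so columns shared by the same
    accessions (or by exactly complementary sets) are at distance 0.
    """
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 1.0  # correlation undefined for a constant column (assumption)
    r = cov / math.sqrt(var_a * var_b)
    return max(0.0, 1.0 - abs(r))


def hamming_distance(row_a, row_b):
    """Number of positions at which two accessions (matrix rows) differ."""
    return sum(a != b for a, b in zip(row_a, row_b))


# Hypothetical membership lists over six accessions.
hap_a = [1, 1, 0, 0, 0, 0]
hap_b = [0, 0, 1, 1, 1, 1]  # exact complement of hap_a
hap_c = [1, 0, 1, 0, 1, 0]  # unrelated to hap_a

print(membership_distance(hap_a, hap_b))  # near 0: perfectly anti-correlated
print(membership_distance(hap_a, hap_c))  # near 1: uncorrelated
print(hamming_distance(hap_a, hap_c))
```

Under this measure, haplotypes from different genomic regions cluster together whenever they are carried by the same accessions, which is how associations driven by population structure become visible as one large cluster.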

Figure 5. The Strength of Association (Using CLASS) around the Four Candidate Loci for Various Marker Densities

For each locus (FRI [A], Rpm1 [B], Rps2 [C], and Rps5 [D]), the bottom panel shows the pattern of association using all available fragment markers around the locus (the position of which is given by a grey vertical line), and the panels above show the effect of successively reducing the marker density so that no markers are within 10, 25, 50, and 100 kb (FRI only) of the causative polymorphisms. The dotted grey line represents the 95th percentile of all associations across the genome. Because we used an association statistic that utilizes the pattern of haplotype sharing across multiple fragments, the relative significance of any particular fragment may change depending on the presence or absence of other fragments. The FRI region (A) remains strongly associated with flowering time even for the lowest marker density, while the signal of association around the R genes (B–D) disappears as one goes from 10- to 25-kb spacing.
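The marker-thinning step described in this caption amounts to dropping every marker that lies within a given distance of the candidate locus. Below is a minimal Python sketch with invented positions. The function name `thin_markers` and the treatment of a marker at exactly the cutoff distance (it is dropped) are assumptions, not details from the paper.

```python
def thin_markers(positions_bp, locus_bp, min_distance_kb):
    """Keep only markers farther than min_distance_kb from the locus.

    Assumption: a marker at exactly the cutoff distance counts as
    "within" the window and is removed.
    """
    min_bp = min_distance_kb * 1000
    return [pos for pos in positions_bp if abs(pos - locus_bp) > min_bp]


# Hypothetical fragment positions (bp) around a locus at 200 kb.
markers = [100_000, 195_000, 205_000, 240_000, 400_000]
locus = 200_000

for cutoff_kb in (10, 25, 50, 100):
    print(cutoff_kb, thin_markers(markers, locus, cutoff_kb))
```

Rerunning the association scan on each thinned set shows how far from the causative polymorphism a marker can be while still carrying signal.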

References

    1. Nordborg M, Tavaré S. Linkage disequilibrium: What history has to tell us. Trends Genet. 2002;18:83–90.
    2. Weiss KM, Terwilliger JD. How many diseases does it take to map a gene with SNPs? Nature Genet. 2000;26:151–157.
    3. Zondervan KT, Cardon LR. The complex interplay among factors that influence allelic association. Nature Rev Genet. 2004;5:89–100.
    4. Lander ES, Schork NJ. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
    5. Grupe A, Germer S, Usuka J, Aud D, Belknap JK, et al. In silico mapping of complex disease-related traits in mice. Science. 2001;292:1915–1918.
