A multivariate test of association
Journal Article
1Department of Psychiatry, Massachusetts General Hospital, 2Department of Psychiatry, Harvard Medical School, 3Center for Human Genetic Research, Massachusetts General Hospital, Boston, 4Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA, 5Genetic Epidemiology, Queensland Institute of Medical Research, QLD, Australia, 6Broad Institute of Harvard and MIT, Cambridge and 7Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
*To whom correspondence should be addressed.
Revision received:
22 October 2008
Accepted:
24 October 2008
Published:
19 November 2008
Abstract
Summary: Although genetic association studies often test multiple, related phenotypes, few formal multivariate tests of association are available. We describe a test of association that can be efficiently applied to large population-based designs.
Availability: A C++ implementation can be obtained from the authors.
Contact: manuel.ferreira@qimr.edu.au
Supplementary information: Supplementary figures are available at Bioinformatics online.
Genetic association studies often test multiple traits. For example, for many diseases, such as asthma or attention deficit hyperactivity disorder (ADHD), investigators routinely measure multiple endo-phenotypes that are thought to be more proximal to the biological etiology of the clinical disorder. The expectation that often underlies the analysis of multiple traits is that this strategy can identify not only trait-specific quantitative trait loci (QTL), but also those shared between correlated traits.
Analysis of such multivariate datasets typically consists of testing each trait individually and then informally comparing the evidence for association at a particular locus across traits. This approach, however, has two major caveats: first, if unaccounted for, multiple trait testing increases the experiment-wise false-positive rate. Second, it ignores the extra information provided by the cross-trait covariance intrinsic to multivariate datasets.
Although these limitations can be alleviated to some extent by pre- or post-analytic strategies, such as principal components analysis or permutation testing, these can be inefficient and/or computationally intensive when a large number of traits or loci are under investigation. Here, we describe an efficient multivariate test of association for population-based designs that we have implemented in PLINK (Purcell et al., 2007).
Consider a sample of n unrelated individuals with data for two sets of variables, a bi-allelic marker (set 1, with one variable) and k traits (set 2). We use canonical correlation analysis (CCA), which is a multivariate generalization of the Pearson product-moment correlation (Hotelling, 1936), to measure the association between the two sets of variables. Specifically, CCA extracts the linear combination of traits that explains the largest possible amount of the covariation between the marker and all traits. Although this approach is most appropriate for the analysis of normally distributed traits, as we show below, it performs well even when considering non-normal or disease traits.
Prior to the actual CCA, the marker is coded according to an allelic dosage scheme that can incorporate dominance; our approach can also be extended to the analysis of multiple markers by expanding the first set of variables to include more than one marker. Missing phenotype data are handled either by case-wise deletion (if data are missing above a pre-defined per-individual missingness threshold) or mean imputation (i.e. a missing phenotype is replaced by the sample mean). The test is based on Wilks' lambda ($\lambda$) and the corresponding $F$-approximation, both simplified to the situation where one of the sets contains only one variable (the marker). Specifically, $\lambda = 1 - \rho^2$, where $\rho$ is the canonical correlation between the marker and the k traits, calculated as the square root of the eigenvalue of $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$. In the latter, $S_{11}$ is the marker variance, $S_{22}$ the k × k trait covariance matrix, while $S_{12}$ and $S_{21}$ are the 1 × k (or k × 1) covariance matrices between the marker and the k traits. The simplified $F$-approximation is:

$$F_{(k,\; n-k-1)} = \frac{1-\lambda}{\lambda} \times \frac{n-k-1}{k}$$
The interpretation of a significant multivariate test is aided by the inspection of the weights attributed by the CCA to each phenotype.
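The single-marker test described above can be sketched in a few lines. This is a minimal illustration, not the authors' C++ or PLINK implementation. It assumes complete data (no missing phenotypes), a marker already coded as allelic dosage, and the standard single-variable simplifications λ = 1 − ρ² and F = ((1 − λ)/λ) × ((n − k − 1)/k) on (k, n − k − 1) degrees of freedom. The function names `cov`, `solve` and `cca_single_marker` are hypothetical, and only the Python standard library is used.

```python
def cov(x, y):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def cca_single_marker(marker, traits):
    """marker: n dosages; traits: k lists of n trait values each.

    Returns (rho2, wilks_lambda, f_stat, df1, df2, weights), where weights
    are the unnormalised trait weights S22^{-1} S21.
    """
    n, k = len(marker), len(traits)
    s11 = cov(marker, marker)                                # marker variance
    s12 = [cov(marker, t) for t in traits]                   # 1 x k covariances
    s22 = [[cov(ti, tj) for tj in traits] for ti in traits]  # k x k matrix
    weights = solve(s22, s12)                                # S22^{-1} S21
    # With one marker, S11^{-1} S12 S22^{-1} S21 is a scalar, so it is its
    # own (only) eigenvalue: the squared canonical correlation.
    rho2 = sum(a * b for a, b in zip(s12, weights)) / s11
    wilks_lambda = 1.0 - rho2
    f_stat = (1.0 - wilks_lambda) / wilks_lambda * (n - k - 1) / k
    return rho2, wilks_lambda, f_stat, k, n - k - 1, weights


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    marker = [0, 1, 2, 1, 0, 2, 1, 1]
    trait_a = [0.1, 1.2, 1.9, 0.8, 0.3, 2.2, 1.1, 0.7]
    trait_b = [1.0, 0.5, 0.2, 0.9, 1.3, 0.1, 0.4, 1.1]
    print(cca_single_marker(marker, [trait_a, trait_b]))
```

The P-value is the upper tail of the F distribution with the returned degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` is one way to obtain it. The returned weights correspond, up to scaling, to the per-phenotype weights whose inspection is suggested above.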
To investigate the performance of the proposed method, we simulated data for five quantitative traits (60% heritability each) and one bi-allelic locus (20% minor allele frequency). The QTL explained 0.5% of the total variance of 0 (to assess type-I error rate), 1, 2, 3, 4 or all 5 traits. We considered residual cross-trait correlations (i.e. excluding the QTL effect) of 0.2, 0.4 or 0.6 (with the same sign as the QTL-induced correlation), and simulated data for 600 unrelated individuals. We compared the power of the multivariate approach against two univariate strategies. For the first strategy, we tested each trait individually using linear regression, selected the most significant test and corrected it for multiple testing through the analysis of 100 permuted datasets. Each dataset was generated by randomly permuting the genotypes between individuals, thus preserving the original cross-trait correlations. For the second strategy, a simple Bonferroni correction based on the number of traits analyzed was applied to the most significant univariate test. Power and type-I error (nominal α = 0.05) for each model were based on the analysis of 1000 and 5000 replicates, respectively. This same procedure was used to test the performance when analyzing 10 and 20 traits.
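The two univariate correction strategies described above can be sketched as follows. This is an illustrative sketch only, not the code used for the simulations. It assumes complete data, so that ranking traits by the absolute Pearson correlation with the genotype is equivalent to ranking them by the significance of a simple linear regression. The empirical P-value convention (hits + 1)/(permutations + 1) is also an assumption, and all function names are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_abs_r(genotypes, traits):
    """Strength of the most significant univariate test across traits."""
    return max(abs(pearson_r(genotypes, t)) for t in traits)


def permutation_corrected_p(genotypes, traits, n_perm=100, seed=0):
    """Empirical P-value for the best univariate test.

    Genotypes are shuffled between individuals while the trait rows stay
    fixed, so the cross-trait correlations are preserved in every permuted
    dataset.
    """
    rng = random.Random(seed)
    observed = max_abs_r(genotypes, traits)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_abs_r(shuffled, traits) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_min, n_traits):
    """Bonferroni-adjusted P-value for the most significant of n_traits tests."""
    return min(1.0, p_min * n_traits)
```

With 100 permutations the smallest attainable empirical P-value under this convention is 1/101, which is sufficient for a nominal α of 0.05 but would need more permutations for stricter thresholds.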
The simulation results are shown in Figure 1. When considering five traits with a modest residual cross-trait correlation (r = 0.2), the power of the multivariate test was comparable to that of both univariate strategies considered, with the advantage that no permutation testing or Bonferroni adjustment was required to correct for multiple testing. As the number of traits, or the residual correlation between traits, increased, the power of the multivariate test improved, consistently outperforming the univariate approaches. The exception was for the extreme models where all traits were associated with the QTL; in this case, the power of the multivariate test decreased as the residual correlation between traits increased. This observation is consistent with previous reports (Allison et al., 1998; Amos et al., 2001; Evans and Duffy, 2004; Ferreira et al., 2006) and was specific to the situation where the QTL-induced trait correlation was of the same sign as the correlation induced by residual shared factors (Supplementary Fig. 1).
Fig. 1.
Performance of the multivariate test of association. Type-I error and power are also shown for two univariate strategies, which correct for multiple testing through permutation or a simple Bonferroni correction. r indicates the residual trait correlation. See text for details.
The multivariate test maintained appropriate type-I error when some or all traits tested were continuous but not normally distributed (Supplementary Fig. 2) or were measured on a discrete scale (Supplementary Fig. 3).
Finally, we also extended this approach to the analysis of family-based data. Briefly, prior to CCA, each individual's genotype is partitioned into the orthogonal between- (B) and within-family (W) components (Fulker et al., 1999). We then perform CCA using the k traits and either the B (between-family association test), W (within-family association test, which is robust to population stratification effects) or the B + W (total association test) genotype scores, and use an adaptive permutation procedure to account for family structure. Simulations show that when applied to family data, this approach also has appropriate type-I error and improved power when compared to the univariate strategy (Supplementary Fig. 4).
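The genotype partition used for family data can be illustrated with a short sketch. It assumes the simplest case, in which the between-family component B is the mean genotype of the genotyped members of a family and the within-family component W is each individual's deviation from that mean; the function name `partition_genotypes` is hypothetical, and the adaptive permutation step is not shown.

```python
from collections import defaultdict


def partition_genotypes(family_ids, genotypes):
    """Split genotype scores into between- and within-family components.

    Returns (between, within), two lists aligned with the input order.
    Assumption: B is the family mean genotype and W = genotype - B.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fam, g in zip(family_ids, genotypes):
        totals[fam] += g
        counts[fam] += 1
    between = [totals[fam] / counts[fam] for fam in family_ids]
    within = [g - b for g, b in zip(genotypes, between)]
    return between, within


if __name__ == "__main__":
    # Toy pedigree labels and dosages, invented for illustration.
    fams = ["f1", "f1", "f2", "f2", "f2"]
    dosages = [0, 2, 1, 1, 2]
    print(partition_genotypes(fams, dosages))
```

Under this definition W sums to zero within each family, so B and W are orthogonal across the sample and B + W recovers the original genotype score used for the total association test.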
In conclusion, we propose a robust and powerful test of association that can accommodate multiple phenotypes and different study designs. As such, it can be relevant to many genetic association studies of complex traits or diseases.
Funding: Sidney Sax Fellowship from the National Health and Medical Research Council of Australia (to M.A.R.F.).
Conflict of Interest: none declared.
References
Allison et al. Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages, Am. J. Hum. Genet., 1998, vol. 63 (pg. 1190-1201)
Amos et al. Comparison of multivariate tests for genetic linkage, Hum. Hered., 2001, vol. 51 (pg. 133-144)
Evans and Duffy. A simulation study concerning the effect of varying the residual phenotypic correlation on the power of bivariate quantitative trait loci linkage analysis, Behav. Genet., 2004, vol. 34 (pg. 135-141)
Ferreira et al. A simple method to localise pleiotropic susceptibility loci using univariate linkage analyses of correlated traits, Eur. J. Hum. Genet., 2006, vol. 14 (pg. 953-962)
Fulker et al. Combined linkage and association sib-pair analysis for quantitative traits, Am. J. Hum. Genet., 1999, vol. 64 (pg. 259-267)
Hotelling. Relations between two sets of variables, Biometrika, 1936, vol. 28 (pg. 321-377)
Purcell et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet., 2007, vol. 81 (pg. 559-575)
Author notes
Associate Editor: Alex Bateman
© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org