A multivariate test of association

Journal Article

1Department of Psychiatry, Massachusetts General Hospital, 2Department of Psychiatry, Harvard Medical School, 3Center for Human Genetic Research, Massachusetts General Hospital, Boston, 4Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA, 5Genetic Epidemiology, Queensland Institute of Medical Research, QLD, Australia, 6Broad Institute of Harvard and MIT, Cambridge and 7Psychiatric & Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA

*To whom correspondence should be addressed.

Revision received: 22 October 2008

Accepted: 24 October 2008

Published: 19 November 2008

Abstract

Summary: Although genetic association studies often test multiple, related phenotypes, few formal multivariate tests of association are available. We describe a test of association that can be efficiently applied to large population-based designs.

Availability: A C++ implementation can be obtained from the authors.

Contact: manuel.ferreira@qimr.edu.au

Supplementary information: Supplementary figures are available at Bioinformatics online.

Genetic association studies often test multiple traits. For example, for many diseases, such as asthma or attention deficit hyperactivity disorder (ADHD), investigators routinely measure multiple endo-phenotypes that are thought to be more proximal to the biological etiology of the clinical disorder. The expectation that often underlies the analysis of multiple traits is that this strategy can identify not only trait-specific quantitative trait loci (QTL), but also those shared between correlated traits.

Analysis of such multivariate datasets typically consists of testing each trait individually and then informally comparing the evidence for association at a particular locus across traits. This approach, however, has two major limitations. First, if unaccounted for, multiple trait testing increases the experiment-wise false-positive rate. Second, it ignores the extra information provided by the cross-trait covariance intrinsic to multivariate datasets.
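To make the first point concrete, the sketch below computes the experiment-wise false-positive rate when k traits are each tested at a nominal α. It assumes the k tests are independent, which is a simplification of ours: with positively correlated traits the inflation is smaller. The function names are illustrative only.

```python
def experimentwise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the experiment-wise rate at or below alpha."""
    return alpha / k


if __name__ == "__main__":
    # Trait counts of 5, 10 and 20 mirror those used in the simulations.
    for k in (1, 5, 10, 20):
        rate = experimentwise_error(0.05, k)
        print(k, round(rate, 3), bonferroni_threshold(0.05, k))
```

With five independent traits tested at α = 0.05, the chance of at least one false positive is already about 0.23.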

Although these limitations can be alleviated to some extent by pre- or post-analytic strategies, such as principal components analysis or permutation testing, these can be inefficient and/or computationally intensive when a large number of traits or loci are under investigation. Here, we describe an efficient multivariate test of association for population-based designs that we have implemented in PLINK (Purcell et al., 2007).

Consider a sample of n unrelated individuals with data for two sets of variables, a bi-allelic marker (set 1, with one variable) and k traits (set 2). We use canonical correlation analysis (CCA), which is a multivariate generalization of the Pearson product-moment correlation (Hotelling, 1936), to measure the association between the two sets of variables. Specifically, CCA extracts the linear combination of traits that explains the largest possible amount of the covariation between the marker and all traits. Although this approach is most appropriate for the analysis of normally distributed traits, as we show below, it shows good performance even when considering non-normal or disease traits.

Prior to the actual CCA, the marker is coded according to an allelic dosage scheme that can incorporate dominance; our approach can also be extended to the analysis of multiple markers by expanding the first set of variables to include more than one marker. Missing phenotype data are handled either by case-wise deletion (if data are missing above a pre-defined per-individual missingness threshold) or mean imputation (i.e. a missing phenotype is replaced by the sample mean). The test is based on Wilks' lambda (λ) and the corresponding _F_-approximation, both simplified to the situation where one of the sets contains only one variable (the marker). Specifically, $\lambda = 1 - \rho^2$, where $\rho$ is the canonical correlation between the marker and the k traits, calculated as the square root of the eigenvalue of $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$. In the latter, $S_{11}$ is the marker variance, $S_{22}$ the k × k trait covariance matrix, while $S_{12}$ and $S_{21}$ are the 1 × k and k × 1 covariance matrices between the marker and the k traits. The simplified _F_-approximation is:

$$F_{(k,\,n-k-1)} = \frac{1-\lambda}{\lambda} \times \frac{n-k-1}{k}$$
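Because the first set holds a single marker, the matrix whose eigenvalue is taken is 1 × 1, so no eigen-decomposition is needed. The block below restates that standard reduction; it is our reading of the simplification, not an equation taken from the article.

```latex
\begin{align*}
\rho^2 &= \frac{S_{12}\, S_{22}^{-1}\, S_{21}}{S_{11}}, \\
\frac{1-\lambda}{\lambda} &= \frac{\rho^2}{1-\rho^2}
  \quad\text{with } \lambda = 1-\rho^2, \\
F &= \frac{\rho^2 / k}{(1-\rho^2)/(n-k-1)}
  \quad\text{on } (k,\; n-k-1) \text{ degrees of freedom.}
\end{align*}
```

Here $\rho^2$ equals the $R^2$ obtained by regressing the marker dosage on the k traits, so the statistic coincides with the overall F-test of that regression.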

The interpretation of a significant multivariate test is aided by the inspection of the weights attributed by the CCA to each phenotype.
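The calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the PLINK implementation: it assumes complete data (no missing phenotypes), a single marker coded as an additive allele count, and it reports unnormalized trait weights proportional to $S_{22}^{-1} S_{21}$, which may be scaled differently from the weights a given software package prints. All function names are hypothetical.

```python
from statistics import fmean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = fmean(u), fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def cca_single_marker(dosage, traits):
    """dosage: n allele counts; traits: k lists, each of length n.

    Returns (rho2, wilks_lambda, f_stat, (df1, df2), weights).
    """
    n, k = len(dosage), len(traits)
    s11 = cov(dosage, dosage)                       # marker variance
    s21 = [cov(t, dosage) for t in traits]          # k marker-trait covariances
    s22 = [[cov(ti, tj) for tj in traits] for ti in traits]
    weights = solve(s22, s21)                       # S22^{-1} S21, unnormalized
    rho2 = sum(a * b for a, b in zip(s21, weights)) / s11
    lam = 1.0 - rho2                                # Wilks' lambda
    f_stat = (1.0 - lam) / lam * (n - k - 1) / k
    return rho2, lam, f_stat, (k, n - k - 1), weights
```

A P-value would follow from the upper tail of the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available; that step is left out here to keep the sketch dependency-free. Traits with weights near zero contribute little to the linear combination that drives the association.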

To investigate the performance of the proposed method, we simulated data for five quantitative traits (60% heritability each) and one bi-allelic locus (20% minor allele frequency). The QTL explained 0.5% of the total variance of 0 (to assess type-I error rate), 1, 2, 3, 4 or all 5 traits. We considered residual cross-trait correlations (i.e. excluding the QTL effect) of 0.2, 0.4 or 0.6 (with the same sign as the QTL-induced correlation), and simulated data for 600 unrelated individuals. We compared the power of the multivariate approach against two univariate strategies. For the first strategy, we tested each trait individually using linear regression, selected the most significant test and corrected this for multiple testing through the analysis of 100 permuted datasets. Each dataset was generated by randomly permuting the genotypes between individuals, thus preserving the original cross-trait correlations. For the second strategy, a simple Bonferroni correction based on the number of traits analyzed was applied to the most significant univariate test. Power and type-I error (nominal α = 0.05) for each model were based on the analysis of 1000 and 5000 replicates, respectively. This same procedure was used to test the performance when analyzing 10 and 20 traits.
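The two univariate corrections can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used for the simulations: it uses the largest squared marker-trait correlation as the test statistic, which ranks datasets in the same order as the smallest univariate regression P-value when all traits have complete data on the same n individuals, and it computes the empirical P-value as (hits + 1)/(permutations + 1), a common convention that the text does not specify. Function names and the `seed` parameter are hypothetical.

```python
import random
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def max_r2(genotypes, traits):
    """Best single-trait association statistic across all traits."""
    return max(r_squared(genotypes, t) for t in traits)


def permutation_corrected_p(genotypes, traits, n_perm=100, seed=0):
    """Empirical P-value for the best univariate test.

    Genotypes are shuffled between individuals while trait rows stay fixed,
    so the cross-trait correlations are preserved in every permuted dataset.
    """
    rng = random.Random(seed)
    observed = max_r2(genotypes, traits)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_r2(shuffled, traits) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_min, k):
    """Bonferroni-adjusted P-value for the smallest of k univariate P-values."""
    return min(1.0, p_min * k)
```

With 100 permutations the smallest attainable empirical P-value under this convention is 1/101, which is adequate for a nominal α of 0.05 but not for much stricter thresholds.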

The simulation results are shown in Figure 1. When considering five traits with a modest residual cross-trait correlation (r = 0.2), the power of the multivariate test was comparable to both univariate strategies considered, with the advantage that no permutation testing or Bonferroni adjustment was required to correct for multiple testing. As the number of traits, or the residual correlation between traits, increased, the power of the multivariate test improved, consistently outperforming the univariate approaches. The exception was for the extreme models where all traits were associated with the QTL; in this case, the power of the multivariate test decreased as the residual correlation between traits increased. This observation is consistent with previous reports (Allison et al., 1998; Amos et al., 2001; Evans and Duffy, 2004; Ferreira et al., 2006) and was specific to the situation where the QTL-induced trait correlation was of the same sign as the correlation induced by residual shared factors (Supplementary Fig. 1).

Fig. 1. Performance of the multivariate test of association. Type-I error and power are also shown for two univariate strategies that correct for multiple testing through permutation or simple Bonferroni correction. r indicates the residual trait correlation. See text for details.

The multivariate test maintained appropriate type-I error when some or all traits tested were continuous but not normally distributed (Supplementary Fig. 2) or were measured on a discrete scale (Supplementary Fig. 3).

Finally, we also extended this approach to the analysis of family-based data. Briefly, prior to CCA, each individual's genotype is partitioned into the orthogonal between- (B) and within-family (W) components (Fulker et al., 1999). We then perform CCA using the k traits and either the B (between-family association test), W (within-family association test, which is robust to population stratification effects) or the B + W (total association test) genotype scores, and use an adaptive permutation procedure to account for family structure. Simulations show that when applied to family data, this approach also has appropriate type-I error and improved power when compared to the univariate strategy (Supplementary Fig. 4).
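The genotype partition can be illustrated with a short sketch. It assumes sibships without parental genotypes, in which case the between-family component is taken as the sibship mean dosage and the within-family component as each individual's deviation from that mean, following the sib-pair formulation; designs with genotyped parents commonly define B differently, and the adaptive permutation step is not shown. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def partition_genotypes(family_ids, dosages):
    """Split each dosage into between-family (B) and within-family (W) parts.

    B is the mean dosage of the individual's family; W is the deviation
    from that mean, so B + W recovers the original dosage.
    """
    by_family = defaultdict(list)
    for fam, g in zip(family_ids, dosages):
        by_family[fam].append(g)
    family_mean = {fam: fmean(gs) for fam, gs in by_family.items()}
    between = [family_mean[fam] for fam in family_ids]
    within = [g - b for g, b in zip(dosages, between)]
    return between, within


if __name__ == "__main__":
    b, w = partition_genotypes(["a", "a", "b", "b", "b"], [0, 2, 1, 1, 2])
    print(b)
    print(w)
```

Because W sums to zero within every family, it is uncorrelated with any family-level confounder, which is why the within-family test is robust to population stratification.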

In conclusion, we propose a robust and powerful test of association that can accommodate multiple phenotypes and different study designs. As such, it can be relevant to many genetic association studies of complex traits or diseases.

Funding: Sidney Sax Fellowship from the National Health and Medical Research Council of Australia (to M.A.R.F.).

Conflict of Interest: none declared.

References

Allison et al. Multiple phenotype modeling in gene-mapping studies of quantitative traits: power advantages, Am. J. Hum. Genet., 1998, vol. 63 (pg. 1190-1201)

Amos et al. Comparison of multivariate tests for genetic linkage, Hum. Hered., 2001, vol. 51 (pg. 133-144)

Evans and Duffy. A simulation study concerning the effect of varying the residual phenotypic correlation on the power of bivariate quantitative trait loci linkage analysis, Behav. Genet., 2004, vol. 34 (pg. 135-141)

Ferreira et al. A simple method to localise pleiotropic susceptibility loci using univariate linkage analyses of correlated traits, Eur. J. Hum. Genet., 2006, vol. 14 (pg. 953-962)

Fulker et al. Combined linkage and association sib-pair analysis for quantitative traits, Am. J. Hum. Genet., 1999, vol. 64 (pg. 259-267)

Hotelling. Relations between two sets of variables, Biometrika, 1936, vol. 28 (pg. 321-377)

Purcell et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet., 2007, vol. 81 (pg. 559-575)

Author notes

Associate Editor: Alex Bateman

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
