Comparative Study
Detecting heritable phenotypes without a model using fast permutation testing for heritability and set-tests
Regev Schweiger et al. Nat Commun. 2018.
Abstract
Testing for association between a set of genetic markers and a phenotype is a fundamental task in genetic studies. Standard approaches for heritability and set testing rely strongly on parametric models that make specific assumptions about phenotypic variability. Here, we show that the resulting p-values may be inflated by up to 15 orders of magnitude in a heritability study of methylation measurements and in a heritability and expression quantitative trait loci (eQTL) analysis of gene expression profiles. We propose FEATHER, a method for fast permutation-based testing of marker sets and of heritability, which properly controls for false-positive results. FEATHER eliminated 47% of the methylation sites found to be heritable by the parametric test, suggesting a substantial inflation of false-positive findings by alternative methods. Our approach can rapidly identify heritable phenotypes among millions of phenotypes acquired via high-throughput technologies, does not suffer from model misspecification, and is highly efficient.
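The abstract does not spell out FEATHER's test statistic or its speed-ups, so the following is only a minimal sketch of the general idea behind permutation-based set and heritability testing, not FEATHER itself. It assumes a simple variance-component score-type statistic, yᵀKy / yᵀy, in place of a heritability estimate ĥ², builds the kinship matrix K from standardized genotypes, and ignores covariates. The function names `grm`, `score_stat` and `permutation_pvalue` are hypothetical. Because the null distribution is built by shuffling the phenotype itself, the p-value relies only on the exchangeability of individuals under the null, not on a parametric model of phenotypic variability.

```python
import numpy as np


def grm(genotypes):
    """Genetic relationship matrix from an n x m genotype matrix (0/1/2 coding)."""
    g = np.asarray(genotypes, dtype=float)
    mu = g.mean(axis=0)
    sd = g.std(axis=0)
    keep = sd > 0  # drop monomorphic markers to avoid division by zero
    z = (g[:, keep] - mu[keep]) / sd[keep]
    return z @ z.T / z.shape[1]


def score_stat(y, K):
    """Score-type statistic r'Kr / r'r, where r is the mean-centred phenotype."""
    r = y - y.mean()
    return float(r @ K @ r / (r @ r))


def permutation_pvalue(y, K, n_perm=10_000, seed=0):
    """Permutation p-value for association between phenotype y and kinship K.

    Hypothetical helper: assumes no covariates, so individuals are exchangeable
    under the null and shuffling y yields draws from the null distribution.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = score_stat(y, K)
    # Each permutation breaks any phenotype-genotype link while keeping the
    # phenotype's marginal distribution exactly.
    null = np.array([score_stat(rng.permutation(y), K) for _ in range(n_perm)])
    # Add-one correction: the estimate is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

With covariates, naively shuffling y is no longer valid in general, so a full implementation has to be more careful than this sketch. Each permutation here also costs a full quadratic form, which is exactly the kind of cost a fast method must reduce when millions of phenotypes are tested.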
Conflict of interest statement
R.S. is an employee of MyHeritage Ltd. The remaining authors declare no competing interests.
Figures
Fig. 1
Discrepancy in _p_-values in a methylation study. _p_-values from 10,000 permutations compared with GCTA _p_-values based on asymptotic approximations (log scale). Evaluated on 431,366 methylation sites on all autosomal chromosomes from the KORA dataset, with 1,799 individuals and with sex, age, and smoking status as covariates. Sites with ĥ² = 0 or with a parametric _p_ < 10⁻²⁰ are omitted for clarity of presentation; 99.995% confidence intervals (CIs) are shown. Parametric _p_-values are often smaller than the exact _p_-values obtained by the permutation test, frequently by several orders of magnitude, resulting in many false positives.
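A p-value estimated from 10,000 permutations is itself a binomial proportion, which is why the figures show confidence intervals around it. The captions do not say how these intervals were computed. The sketch below assumes a Wilson score interval and uses only the Python standard library; the function name `permutation_p_ci` is hypothetical. It also shows why 10,000 permutations cannot separate very small p-values: with zero exceedances, the upper limit stays on the order of 10⁻³.

```python
from math import sqrt
from statistics import NormalDist


def permutation_p_ci(exceed, n_perm, confidence=0.99995):
    """Wilson score interval for a permutation p-value estimated as exceed / n_perm.

    exceed: number of permuted statistics at least as extreme as the observed one.
    The interval method is an assumption, not necessarily the one used in the figures.
    """
    if n_perm <= 0 or not 0 <= exceed <= n_perm:
        raise ValueError("need n_perm > 0 and 0 <= exceed <= n_perm")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    p_hat = exceed / n_perm
    denom = 1 + z * z / n_perm
    centre = (p_hat + z * z / (2 * n_perm)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n_perm + z * z / (4 * n_perm**2))
    # Clamp to [0, 1] to absorb floating-point round-off at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, `permutation_p_ci(0, 10_000)` gives a lower limit of essentially 0 and an upper limit of about 0.0016. Any parametric p-value far below that range cannot be checked by this many permutations alone.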
Fig. 2
Discrepancy in _p_-values in a cis-eQTL study. _p_-values from 10,000 permutations compared with GCTA _p_-values based on asymptotic approximations (log scale). Evaluated on 22,171 gene expression profiles in whole-blood samples from the GTEx dataset, with 338 individuals. Profiles with ĥ² = 0 (8,604 profiles) are omitted for clarity of presentation; 99.995% CIs are shown. Parametric _p_-values are often smaller than the exact _p_-values obtained by the permutation test, frequently by several orders of magnitude, resulting in many false positives.
Fig. 3
Discrepancy in _p_-values with quantile normalization. _p_-values from 10,000 permutations after quantile normalization, compared with GCTA _p_-values based on asymptotic approximations (log scale). Parametric _p_-values deviate substantially from the exact _p_-values obtained by the permutation test, frequently by several orders of magnitude, resulting in many false positives and false negatives.
Fig. 4
Performance of SAMC. _p_-values from 10,000 permutations compared with SAMC _p_-values with _t_₀ = 1,000 and 1,000,000 permutations (log scale). Evaluated on 7,989 methylation sites on chromosome 22 from the KORA dataset. Sites with ĥ² = 0 (3,779 sites) are omitted for clarity of presentation, leaving 4,210 sites; 99.95% CIs are shown. SAMC is well calibrated.