Statistical power of expression quantitative trait loci for mapping of complex trait loci in natural populations

Paul Schliekelman. Genetics. 2008 Apr.

Abstract

A number of recent genomewide surveys have found numerous QTL for gene expression, often with intermediate to high heritability values. As a result, there is currently a great deal of interest in genetical genomics, that is, the combination of genomewide expression data and molecular marker data to elucidate the genetics of complex traits. To date, most genetical genomics studies have focused on generating candidate genes for previously known trait loci or have otherwise leveraged existing knowledge about trait-related genes. The purpose of this study is to explore the potential for genetical genomics approaches in the context of genomewide scans for complex trait loci. I explore the expected strength of association between expression-level traits and a clinical trait, as a function of the underlying genetic model in natural populations. I give calculations of statistical power for detecting differential expression between affected and unaffected individuals. I model both reactive and causative expression-level traits with both additive and multiplicative multilocus models for the relationship between phenotype and genotype and explore a variety of assumptions about dominance, number of segregating loci, and other parameters. There are two key results. If a transcript is causative for the disease (in the sense that disease risk depends directly on transcript level), then the power to detect association between transcript and disease is quite good. Sample sizes on the order of 100 are sufficient for 80% power. On the other hand, if the transcript is reactive to a disease locus, then the correlation between expression-level traits and disease is low unless the expression-level trait shares several causative loci with the disease, that is, unless the expression-level trait is itself a complex trait.

Thus, there is a trade-off between the power to show association between a reactive expression-level trait and the clinical trait of interest and the power to map expression-level QTL (eQTL) for that expression-level trait. Gene expression-level traits that are most strongly correlated with the clinical trait will themselves be complex traits and therefore often hard to map. Likewise, the expression-level traits that are easiest to map will tend to have a low correlation with the clinical trait. These results show some fundamental principles for understanding power in eQTL-based mapping studies.

Figures

Figure 1.—

Plot of power vs. L (number of disease loci) and h (the dominance coefficient). The power is the probability of rejecting the null hypothesis of no difference in expression between the treatments. Parameter values are π = 0, δ = 1, K = 0.01, and [formula not recovered], and the sample size is 100 for each treatment. In both plots [two expressions not recovered], where [symbol not recovered] is the mean expression level for the genotype with i disease alleles at the TCL. The dominance coefficient for expression, h_E, is 0 in a and 0.5 in b. The number of genes tested on the microarray was assumed to be 20,000.
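
To make the quantity plotted here concrete, the following is a minimal sketch of a power calculation for one transcript tested among 20,000. It is not the paper's genetic model: it assumes a two-sided, two-sample z-test with known unit variance, a Bonferroni-adjusted per-test level, and a standardized effect size supplied by the user. The function name and all of its parameters are hypothetical.

```python
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, n_tests=20_000, family_alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on one transcript.

    effect_size: difference in mean expression between affecteds and
        unaffecteds, in units of the within-group standard deviation
        (a hypothetical input, not a parameter of the paper's model).
    n_tests: number of transcripts tested; the per-test level is the
        Bonferroni value family_alpha / n_tests (an assumption).
    """
    std_normal = NormalDist()
    per_test_alpha = family_alpha / n_tests
    z_crit = std_normal.inv_cdf(1 - per_test_alpha / 2)
    # Mean of the z statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for d in (0.25, 0.5, 0.75, 1.0):
        print(f"effect size {d:.2f}: power {two_sample_power(d, 100):.3f}")
```

Under these assumptions, the stringent per-test threshold implied by 20,000 tests means that only fairly large standardized differences are detectable with 100 individuals per group.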

Figure 2.—

Plot of power vs. c (number of loci controlling expression-level trait). The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds. The different curves correspond to the value of L as shown in a (the order is the same for b–d). The power is the probability of rejecting the null hypothesis of no difference in expression between the treatments. In a and b the expression-level trait has an additive dependence on genotype, while there is a multiplicative dependence in c and d. The number of genes tested on the microarray was assumed to be 20,000. Parameter values are π = 0, δ = 1, L = 9, K = 0.01, and three further settings [formulas not recovered]. The parameter h = 0.5 in a and c and h = 0.9 in b and d. The sample size is 100 for each treatment.

Figure 3.—

Effect of M. The _z_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds for at least one of the M expression-level traits plotted against the parameters c and M. Parameter values are otherwise the same as those in Figure 2. The power calculations are based on simulations with 500 repetitions.
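
As a rough guide to why power rises with M, the sketch below computes the probability that at least one of M traits reaches significance. It assumes the M tests are independent and share the same per-trait power, which is a simplification: the values in this figure come from simulation, and expression-level traits that share loci would generally be correlated. The function name is hypothetical.

```python
def power_at_least_one(per_trait_power, m_traits):
    """Probability that at least one of m_traits independent tests rejects.

    Assumes independence and equal power across traits (an illustrative
    simplification, not the simulation procedure used for the figure).
    """
    if not 0.0 <= per_trait_power <= 1.0:
        raise ValueError("per_trait_power must lie in [0, 1]")
    if m_traits < 1:
        raise ValueError("m_traits must be at least 1")
    # Complement of the event that every one of the tests fails to reject.
    return 1.0 - (1.0 - per_trait_power) ** m_traits


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, round(power_at_least_one(0.2, m), 3))
```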

Figure 4.—

Plot of power vs. sample size. The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds. The different curves in a correspond to different values of c as shown. The curves in b correspond to values of M (first number) and c (second number). Parameter values are otherwise the same as those in Figure 2. The power curves in b are based on simulations with 100 repetitions. The waviness in the curves occurs because of the resulting sampling error in the simulations.
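
The size of the sampling error behind the waviness can be estimated with the usual binomial standard error of a simulated proportion. This is a small worked sketch, assuming each repetition is an independent accept-or-reject outcome; the function name is hypothetical.

```python
from math import sqrt


def mc_power_standard_error(true_power, n_replicates):
    """Standard error of a power estimate from n_replicates independent runs."""
    if n_replicates < 1:
        raise ValueError("n_replicates must be at least 1")
    # Each run rejects with probability true_power, so the estimated power
    # is a binomial proportion with variance p * (1 - p) / n.
    return sqrt(true_power * (1.0 - true_power) / n_replicates)


if __name__ == "__main__":
    for reps in (100, 500):
        se = mc_power_standard_error(0.5, reps)
        print(f"{reps} repetitions: standard error {se:.4f}")
```

With 100 repetitions and a true power near 0.5, the standard error is 0.05, so neighbouring points on a curve can differ by several percentage points through chance alone.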

Figure 5.—

The effect of penetrance parameters. The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds. The curves correspond to different values of the penetrance parameters π and δ as shown. π = 0 and δ = 1 except when otherwise indicated. Parameter values are otherwise the same as those in Figure 2.

Figure 6.—

The effect of the heritability of expression level. The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds. The _x_-axis is the value of the parameter [formula not recovered]. The curves correspond to different values of L and c as shown. Parameter values are otherwise the same as those in Figure 2.

Figure 7.—

Power for the additive penetrance model. The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds with an additive penetrance model. The _x_-axis in a is the sample size and the _x_-axis in b is the parameter M. The value of L is 4 in b and as shown in a. The sample size in b is 100 in each group. Parameter values are otherwise the same as those in Figure 2. The power curves in b are based on simulations and are approximate. The number of repetitions was 500. The waviness in the curves is due to the sampling error in the simulations.

Figure 8.—

Power for causative ELT. The _y_-axis is the probability of detecting a significant expression-level difference between affecteds and unaffecteds for a causative ELT as described in the text. The _x_-axis in a is the sample size and the _x_-axis in b is the threshold Q. The sample size in b is 100 in each group. Parameter values are otherwise the same as those in Figure 2.

Figure 9.—

Power to map eQTL using Haseman–Elston regression. Power was calculated using a Monte Carlo simulation with 500 replicates. The marker being tested is assumed to be completely linked to one of the c controlling loci for the expression-level trait. The _y_-axis shows the power to detect this marker as being linked to the expression-level trait. In a, the parameter M = 1 and the sample size is shown for each curve. In b, the sample size is 500 sib pairs and the value of M is shown for each curve. A significance threshold of α = 0.05/50 = 0.001 was applied. Parameter values are otherwise the same as those in Figure 2.
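
The procedure described in this caption can be sketched as a small Monte Carlo experiment. The code below is an illustration under stated assumptions rather than the paper's simulation: it uses a single additive biallelic locus, treats identity by descent (IBD) at the marker as known and identical to IBD at the trait locus (complete linkage), and applies a one-sided large-sample test of a negative regression slope. All function names, parameters, and default values other than the threshold 0.05/50 and the 500 replicates are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_sib_pairs(n_pairs, allele_effect, allele_freq=0.5, noise_sd=1.0, seed=0):
    """Return (IBD proportion, squared trait difference) for each sib pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Each parent carries two alleles; 1 marks the trait-increasing allele.
        father = [int(rng.random() < allele_freq) for _ in range(2)]
        mother = [int(rng.random() < allele_freq) for _ in range(2)]
        # For each sib: which paternal and which maternal allele is transmitted.
        picks = [(rng.randrange(2), rng.randrange(2)) for _ in range(2)]
        traits = [allele_effect * (father[f] + mother[m]) + rng.gauss(0.0, noise_sd)
                  for f, m in picks]
        shared = int(picks[0][0] == picks[1][0]) + int(picks[0][1] == picks[1][1])
        pairs.append((shared / 2.0, (traits[0] - traits[1]) ** 2))
    return pairs


def haseman_elston(pairs):
    """Least-squares slope of squared difference on IBD proportion, with t statistic."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    std_err = sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def he_power(n_pairs, allele_effect, n_replicates=500, alpha=0.05 / 50, seed=0):
    """Fraction of simulated data sets in which linkage is declared."""
    # Linkage predicts a negative slope, so the test is one-sided (assumption).
    t_crit = NormalDist().inv_cdf(alpha)
    hits = 0
    for rep in range(n_replicates):
        _, t_stat = haseman_elston(simulate_sib_pairs(n_pairs, allele_effect, seed=seed + rep))
        hits += int(t_stat < t_crit)
    return hits / n_replicates


if __name__ == "__main__":
    print(he_power(n_pairs=500, allele_effect=0.5, n_replicates=100))
```

Under this additive model the expected slope is minus twice the additive variance contributed by the locus, so a locus that explains only a small share of the expression variance gives a shallow slope and low power. That matches the trade-off described in the abstract: the more loci control an expression-level trait, the harder each one is to map.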

Cited by

Functional characterization of human genomic variation linked to polygenic diseases.
Fabo T, Khavari P. Trends Genet. 2023 Jun;39(6):462-490. doi: 10.1016/j.tig.2023.02.014. Epub 2023 Mar 28. PMID: 36997428. Review.
Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease.
Castaldi PJ, Abood A, Farber CR, Sheynkman GM. Hum Mol Genet. 2022 Oct 20;31(R1):R123-R136. doi: 10.1093/hmg/ddac196. PMID: 35960994. Review.
ASEP: Gene-based detection of allele-specific expression across individuals in a population by RNA sequencing.
Fan J, Hu J, Xue C, Zhang H, Susztak K, Reilly MP, Xiao R, Li M. PLoS Genet. 2020 May 11;16(5):e1008786. doi: 10.1371/journal.pgen.1008786. eCollection 2020 May. PMID: 32392242.
Alzheimer Disease Pathology-Associated Polymorphism in a Complex Variable Number of Tandem Repeat Region Within the MUC6 Gene, Near the AP2A2 Gene.
Katsumata Y, Fardo DW, Bachstetter AD, Artiushin SC, Wang WX, Wei A, Brzezinski LJ, Nelson BG, Huang Q, Abner EL, Anderson S, Patel I, Shaw BC, Price DA, Niedowicz DM, Wilcock DW, Jicha GA, Neltner JH, Van Eldik LJ, Estus S, Nelson PT. J Neuropathol Exp Neurol. 2020 Jan 1;79(1):3-21. doi: 10.1093/jnen/nlz116. PMID: 31748784.
Postmortem brain tissue as an underutilized resource to study the molecular pathology of neuropsychiatric disorders across different ethnic populations.
Vornholt E, Luo D, Qiu W, McMichael GO, Liu Y, Gillespie N, Ma C, Vladimirov VI. Neurosci Biobehav Rev. 2019 Jul;102:195-207. doi: 10.1016/j.neubiorev.2019.04.015. Epub 2019 Apr 24. PMID: 31028758. Review.
