Affecteds-only linkage methods are not a panacea (original) (raw)
Related papers
The Power to Detect Linkage in Complex Disease by Means of Simple LOD-Score Analyses
The American Journal of Human Genetics, 1998
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are x80% of the ELOD under the true model when the ELOD for the true model is x3. Similarly, the power to reach a given LOD score was usually x80% that of the true model, when the power under the true model was x60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.
The American Journal of Human Genetics, 1999
Several methods have been proposed for linkage analysis of complex traits with unknown mode of inheritance. These methods include the LOD score maximized over disease models (MMLS) and the "nonparametric" linkage (NPL) statistic. In previous work, we evaluated the increase of type I error when maximizing over two or more genetic models, and we compared the power of MMLS to detect linkage, in a number of complex modes of inheritance, with analysis assuming the true model. In the present study, we compare MMLS and NPL directly. We simulated 100 data sets with 20 families each, using 26 generating models: (1) 4 intermediate models (penetrance of heterozygote between that of the two homozygotes); (2) 6 two-locus additive models; and (3) 16 two-locus heterogeneity models (admixture a = 1.0, .7, .5, and .3; a = 1.0 replicates simple Mendelian models). For LOD scores, we assumed dominant and recessive inheritance with 50% penetrance. We took the higher of the two maximum LOD scores and subtracted 0.3 to correct for multiple tests (MMLS-C). We compared expected maximum LOD scores and power, using MMLS-C and NPL as well as the true model. Since NPL uses only the affected family members, we also performed an affecteds-only analysis using MMLS-C. The MMLS-C was both uniformly more powerful than NPL for most cases we examined, except when linkage information was low, and close to the results for the true model under locus heterogeneity. We still found better power for the MMLS-C compared with NPL in affecteds-only analysis. The results show that use of two simple modes of inheritance at a fixed penetrance can have more power than NPL when the trait mode of inheritance is complex and when there is heterogeneity in the data set.
A New Method for Assessing How Sensitivity and Specificity of Linkage Studies Affects Estimation
PLoS ONE, 2014
Background: While the importance of record linkage is widely recognised, few studies have attempted to quantify how linkage errors may have impacted on their own findings and outcomes. Even where authors of linkage studies have attempted to estimate sensitivity and specificity based on subjects with known status, the effects of false negatives and positives on event rates and estimates of effect are not often described.
Approaches based on linear mixed models (LMMs) have recently gained popularity for modelling population substructure and relatedness in genome-wide association studies. In the last few years, a bewildering variety of different LMM methods/software packages have been developed, but it is not always clear how (or indeed whether) any newly-proposed method differs from previously-proposed implementations. Here we compare the performance of several LMM approaches (and software implementations, including EMMAX, GenABEL, FaST-LMM, Mendel, GEMMA and MMM) via their application to a genome-wide association study of visceral leishmaniasis in 348 Brazilian families comprising 3626 individuals (1972 genotyped). The implementations differ in precise details of methodology implemented and through various user-chosen options such as the method and number of SNPs used to estimate the kinship (relatedness) matrix. We investigate sensitivity to these choices and the success (or otherwise) of the approaches in controlling the overall genome-wide error-rate for both real and simulated phenotypes. We compare the LMM results to those obtained using traditional family-based association tests (based on transmission of alleles within pedigrees) and to alternative approaches implemented in the software packages MQLS, ROADTRIPS and MASTOR. We find strong concordance between the results from different LMM approaches, and all are successful in controlling the genome-wide error rate (except for some approaches when applied naively to longitudinal data with many repeated measures). We also find high correlation between LMMs and alternative approaches (apart from transmission-based approaches when applied to SNPs with small or non-existent effects). We conclude that LMM approaches perform well in comparison to competing approaches. Given their strong concordance, in most applications, the choice of precise LMM implementation cannot be based on power/type I error considerations but must instead be based on considerations such as speed and ease-of-use.
Human heredity, 2011
Multipoint (MP) linkage analysis represents a valuable tool for whole-genome studies but suffers from the disadvantage that its probability distribution is unknown and varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [Xing and Elston: Genet Epidemiol 2006;30:447-458; Hodge et al.: Genet Epidemiol 2008;32:800-815]. This implies that the MP significance criterion can differ for each marker and each dataset, and this fact makes planning and evaluation of MP linkage studies difficult. One way to circumvent this difficulty is to use simulations or permutation testing. Another approach is to use an alternative statistical paradigm to assess the statistical evidence for linkage, one that does not require computation of a p value. Here we show how to use the evidential statistical paradigm for planning, conducting, and interpreting MP linkage studies when the disease model is known (lod analysis) ...
Genetic association tests: a method for the joint analysis of family and case-control data
Human Genomics, 2009
With the trend in molecular epidemiology towards both genome-wide association studies and complex modelling, the need for large sample sizes to detect small effects and to allow for the estimation of many parameters within a model continues to increase. Unfortunately, most methods of association analysis have been restricted to either a family-based or a case-control design, resulting in the lack of synthesis of data from multiple studies. Transmission disequilibrium-type methods for detecting linkage disequilibrium from family data were developed as an effective way of preventing the detection of association due to population stratification. Because these methods condition on parental genotype, however, they have precluded the joint analysis of family and casecontrol data, although methods for case-control data may not protect against population stratification and do not allow for familial correlations. We present here an extension of a family-based association analysis method for continuous traits that will simultaneously test for, and if necessary control for, population stratification. We further extend this method to analyse binary traits (and therefore family and case-control data together) and accurately to estimate genetic effects in the population, even when using an ascertained family sample. Finally, we present the power of this binary extension for both family-only and joint family and case-control data, and demonstrate the accuracy of the association parameter and variance components in an ascertained family sample.
Optimal Ascertainment Strategies to Detect Linkage to Common Disease Alleles. Authors' Reply
The American Journal of Human Genetics, 1999
Traditionally, extended pedigrees with many affected individuals have been studied for the purpose of detection of linkage. For traits caused by a rare susceptibility allele, this is a productive strategy. However, this sampling strategy may not work well for traits determined by multiple loci in which one or more have common susceptibility alleles. We simulated three single-additive-locus models of inheritance and two-locus models with additive or multiplicative interactions, all with rare or common susceptibility alleles. A trait locus was linked, with no recombination, to a marker locus with four equally frequent alleles. Family structure varied, but the total number of affected individuals was held constant. Two generations of individuals were genotyped. We used three nonparametric affected-sib-pair programs and two nonparametric pedigree-analysis programs to perform linkage analysis. For single-locus, additive, and multiplicative models, we found that, when the susceptibility allele was rare, (frequency .0025), extended pedigrees with first or second cousins had the most power for detection of linkage. However, when the susceptibility allele was common in the single-locus, additive, and multiplicative two-locus models (frequency .25), extended pedigrees were no more powerful than nuclear families. There was also a decrease in power when the pedigrees had a greater number of affected individuals, more so for the single-locus and multiplicative models than for the additive model. We conclude that for single-locus, additive, and multiplicative models of qualitative traits with common alleles, there is no benefit to the collection of extended pedigrees, and there may be a loss of power in the collection of pedigrees with many affected individuals.
Genetic Epidemiology, 2008
We investigate the behavior of type I error rates in model-based multipoint (MP) linkage analysis, as a function of sample size (N). We consider both MP lods (i.e., MP linkage analysis that uses the correct genetic model) and MP mods (maximizing MP lods over 18 dominant and recessive models). Following Xing & Elston [2006], we first consider MP linkage analysis limited to a single position; then we enlarge the scope and maximize the lods and mods over a span of positions. In all situations we examined, type I error rates decrease with increasing sample size, apparently approaching zero. We show: (a) For MP lods analyzed only at a single position, wellknown statistical theory predicts that type I error rates approach zero. (b) For MP lods and mods maximized over position, this result has a different explanation, related to the fact that one maximizes the scores over only a finite portion of the parameter range.