Covariance components models for longitudinal family data
Related papers
Twin Research and Human Genetics, 2019
Large multigenerational cohort studies offer powerful ways to study the hereditary effects on various health outcomes. However, accounting for complex kinship relations in big data structures can be methodologically challenging. The traditional kinship model is computationally infeasible when considering thousands of individuals. In this article, we propose a computationally efficient alternative that employs fractional relatedness of family members through a series of founding members. The primary goal of this study is to investigate whether the effect of determinants on health outcome variables differs with and without accounting for family structure. We compare a fixed-effects model without familial effects with several variance components models that account for heritability and shared environment structure. Our secondary goal is to apply the fractional relatedness model in a realistic setting. Lifelines is a three-generation cohort study investigating the biological, behavioral...
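The variance components models mentioned above are usually built on a kinship (relationship) matrix. The Python below is a minimal sketch of that traditional approach, not the authors' fractional-relatedness model. It computes kinship coefficients recursively from a small pedigree and assembles a covariance matrix V = σ²_a·2Φ + σ²_c·H + σ²_e·I, where H marks members of the same household as a stand-in for shared environment. The pedigree, household labels, function names, and variance values are all hypothetical. The all-pairs loops also show why the dense kinship approach grows quadratically in time and memory as the cohort grows.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients phi[i][j] via the standard recursive algorithm.

    pedigree maps id -> (father, mother); founders use (None, None).
    Parents must be listed before their children.
    """
    ids = list(pedigree)
    index = {pid: k for k, pid in enumerate(ids)}
    n = len(ids)
    phi = [[0.0] * n for _ in range(n)]
    for i, pid in enumerate(ids):
        father, mother = pedigree[pid]
        parents = [index[p] for p in (father, mother) if p is not None]
        # Self-kinship is 1/2, plus half the parents' kinship if both are known.
        phi[i][i] = 0.5
        if len(parents) == 2:
            phi[i][i] += 0.5 * phi[parents[0]][parents[1]]
        # Kinship with each earlier individual averages over i's parents;
        # an unknown parent is treated as an unrelated founder (contributes 0).
        for j in range(i):
            value = sum(0.5 * phi[p][j] for p in parents)
            phi[i][j] = phi[j][i] = value
    return ids, phi


def covariance_matrix(ids, phi, household, var_a, var_c, var_e):
    """V = var_a * 2*Phi + var_c * H + var_e * I, with H = same household."""
    n = len(ids)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            v = var_a * 2.0 * phi[i][j]          # additive genetic part
            if household[ids[i]] == household[ids[j]]:
                v += var_c                       # shared-environment part
            if i == j:
                v += var_e                       # individual residual
            V[i][j] = v
    return V


# Hypothetical three-generation pedigree (parents listed before children).
pedigree = {
    "grandfather": (None, None),
    "grandmother": (None, None),
    "father": ("grandfather", "grandmother"),
    "mother": (None, None),
    "child1": ("father", "mother"),
    "child2": ("father", "mother"),
}
household = {"grandfather": 1, "grandmother": 1, "father": 2,
             "mother": 2, "child1": 2, "child2": 2}

ids, phi = kinship_matrix(pedigree)
V = covariance_matrix(ids, phi, household, var_a=0.4, var_c=0.2, var_e=0.4)
```

Full siblings get kinship 1/4 and a grandchild-grandparent pair gets 1/8, so in V the siblings' covariance combines half the additive variance with the shared household term.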
Gazi University Journal of Science, 2018
In genetic epidemiology, many diseases are multifactorial, with both environmental and inherited genetic components. The relationship between genetic variability and individual phenotypes is usually investigated by genetic association studies. In such studies, longitudinal measures are very important for detecting disease variants because they make it possible to observe both factors as the disease progresses. Generalized Linear Modelling (GLM) techniques offer a flexible approach for testing and quantifying genetic associations under different types of phenotype distributions. This study aims to adapt the Generalized Estimating Equations (GEE) method to genetic association studies in the presence of both familial and serial correlation. For this purpose, a real genotyped data set with pedigree information and a continuous trait measured over time is used to model the association between the disease and the genotype by analyzing several variant...
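In GEE, each family's stacked repeated measurements need a working correlation matrix. One simple way to combine familial and serial correlation, assumed here for illustration and not necessarily what the cited study used, is a separable structure: a familial correlation matrix combined with an AR(1) correlation over visits through a Kronecker product. The sibling correlation (0.3), the AR(1) parameter (0.5), and the function names are hypothetical.

```python
def ar1_correlation(n_times, rho):
    """Serial AR(1) correlation between visits s and t: rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]


def kronecker(A, B):
    """Kronecker product of two dense matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# Hypothetical familial correlation for two siblings, each measured at three visits.
family_corr = [[1.0, 0.3],
               [0.3, 1.0]]
serial_corr = ar1_correlation(3, 0.5)

# Rows and columns are ordered (subject, visit): (0,0), (0,1), (0,2), (1,0), ...
working_corr = kronecker(family_corr, serial_corr)
```

Entry ((i, s), (j, t)) equals family_corr[i][j] · 0.5^|s − t|. Separability is a strong assumption: it forces every relative pair to share the same serial decay pattern.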
Covariance component models for multivariate binary traits in family data analysis
Statistics in Medicine, 2008
Family data are used extensively in quantitative genetic studies to disentangle the genetic and environmental contributions to various diseases. Many family studies base their analysis on population-based registers containing a large number of individuals composed of small family units. For binary trait analyses, exact marginal likelihood is a common approach, but, due to the computational demand of the enormous data sets, it allows only a limited number of effects in the model. This makes it particularly difficult to perform joint estimation of variance components for a binary trait and the potential confounders. We have developed a data-reduction method of ascertaining informative families from population-based family registers. We propose a scheme where the ascertained families match the full cohort with respect to some relevant statistics, such as the risk to relatives of an affected individual. The ascertainment-adjusted analysis, which we implement using a pseudo-likelihood approach, is shown to be efficient relative to the analysis of the whole cohort and robust to mis-specification of the random effect distribution.
Keywords: Segregation analysis · Mixed models · Variance components · Probit models
Introduction
Family data have been used extensively for complex genetic modelling such as quantitative-trait linkage (e.g., Amos 1994; Blangero et al. 2001) or segregation analysis to separate genetic and environmental contributions to non-Mendelian diseases (e.g., Falconer 1965; Mather and Jinks 1977; Neale and Cardon 1992). Besides overcoming the sample-size problem associated with twin studies, especially when the disease of interest has a low prevalence, family data potentially provide richer genetic information (e.g., Pawitan et al. 2004). However, this information is likely to be concentrated in 'genetically loaded' families, so it is not efficient to collect data from, or to analyse, all families in a population register.
Non-random ascertainment is commonly used in genetics research to maximize the amount of information in the data for a given sample size (e.g., Elston and Sobel 1979). One of the most common methods of non-random ascertainment is to include families with at least one affected member: for variance component models this has been suggested, for example, in de Andrade and Amos (2000), Epstein et al. (2002), and Burton (2003). However, this sampling scheme may not be optimal; in fact, it has been shown (Glidden and Liang 2002; Noh et al. 2005) that the analysis of the ascertained data is sensitive to mis-specification of the random-effect distribution. In this paper we develop an efficient and robust method of ascertaining informative families from population-based family registers for the purpose of complex genetic modeling involving variance component analysis of a binary trait. In epidemiological analyses, we often need or wish to account for confounding factors. For variance component analysis of a binary trait, the most straightforward way to adjust for potential confounders is to include them as ...
Edited by Stacey Cherny.
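Two quantities in this paper, the risk to relatives of an affected individual and the rule "include families with at least one affected member", can be illustrated under a probit (liability-threshold) model. The sketch below is a hypothetical illustration, not the authors' pseudo-likelihood method. It assumes sibling pairs only, a purely additive genetic liability with heritability h2 (so siblings' liabilities correlate at h2 / 2), no shared environment, and made-up values for prevalence and h2.

```python
import random
from statistics import NormalDist


def simulate_sibships(prevalence, h2, n_pairs=100_000, seed=2024):
    """Liability-threshold (probit) simulation of sibling pairs.

    Liabilities are standard normal; under a purely additive model siblings
    share half the additive variance, so their liability correlation is h2 / 2.
    Returns (sibling recurrence risk, share of pairs with >= 1 affected).
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    r = h2 / 2.0
    shared_sd, own_sd = r ** 0.5, (1.0 - r) ** 0.5
    probands = concordant = ascertained = 0
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, 1.0)  # liability component common to both sibs
        a1 = shared_sd * shared + own_sd * rng.gauss(0.0, 1.0) > threshold
        a2 = shared_sd * shared + own_sd * rng.gauss(0.0, 1.0) > threshold
        # Every affected sibling counts as a proband; check the co-sibling.
        probands += a1 + a2
        concordant += 2 * (a1 and a2)
        # Ascertainment rule: keep the pair if at least one sib is affected.
        ascertained += a1 or a2
    return concordant / probands, ascertained / n_pairs


risk, share = simulate_sibships(prevalence=0.05, h2=0.6)
```

The recurrence risk exceeds the 5% prevalence, while the "at least one affected" rule keeps only a small fraction of pairs, which is why such ascertainment reduces data volume so sharply.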
Disease-specific prospective family study cohorts enriched for familial risk
Epidemiologic Perspectives & Innovations, 2011
Most common diseases demonstrate familial aggregation: the ratio of the risk for relatives of affected people to the risk for relatives of unaffected people (the familial risk ratio) is greater than 1. This implies there are underlying genetic and/or environmental risk factors shared by relatives. The risk gradient across this underlying 'familial risk profile', which can be predicted from family history and measured familial risk factors, is typically strong. Under a multiplicative model, the ratio of the risk for people in the upper 25% of familial risk to the risk for those in the lower 25% (the inter-quartile risk gradient) is an order of magnitude greater than the familial risk ratio. If the familial risk ratio is 2 for first-degree relatives, then in terms of the familial risk profile: (a) people in the upper quartile will be at more than 20 times the risk of those in the lower quartile; and (b) about 90% of disease will occur in people above the median. Historically, therefore, epidemiology ...
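The arithmetic in (a) and (b) can be reproduced under assumptions the abstract does not state. Assume that log-risk X is normally distributed across people (one way to realise a multiplicative model), that first-degree relatives' log-risks have correlation 1/2, and that the disease is rare enough that the familial risk ratio is approximately exp(ρσ²). A ratio of 2 then gives σ² = 2 ln 2. The share of all cases falling in a band of the profile is E[e^X; X in band] / E[e^X]; for the upper quartile this is 1 − Φ(z − σ), with z = Φ⁻¹(0.75). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def familial_risk_profile(familial_risk_ratio, relative_correlation=0.5):
    """Inter-quartile risk gradient and share of cases above the median.

    Assumes log-risk X ~ Normal(mu, sigma^2) (a multiplicative model), that
    relatives' log-risks have correlation `relative_correlation`, and the
    rare-disease approximation familial_risk_ratio ~= exp(corr * sigma^2).
    """
    std = NormalDist()
    sigma = math.sqrt(math.log(familial_risk_ratio) / relative_correlation)
    z = std.inv_cdf(0.75)  # upper quartile of the profile is X > mu + z*sigma
    upper_share = 1.0 - std.cdf(z - sigma)  # share of cases in the top 25%
    lower_share = std.cdf(-z - sigma)       # share of cases in the bottom 25%
    above_median_share = std.cdf(sigma)     # share of cases in the top 50%
    # Both quartiles hold 25% of people, so the ratio of their mean risks
    # equals the ratio of their case shares.
    gradient = upper_share / lower_share
    return gradient, above_median_share


gradient, above_median = familial_risk_profile(2.0)
```

With these inputs the sketch gives a gradient a little above 20 and a share of cases above the median of roughly 88%, consistent with statements (a) and (b).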
Longitudinal data analysis in pedigree studies
Genetic Epidemiology, 2003
Longitudinal family studies provide a valuable resource for investigating genetic and environmental factors that influence long-term averages and changes over time in a complex trait. This paper summarizes 13 contributions to Genetic Analysis Workshop 13, which include a wide range of methods for genetic analysis of longitudinal data in families. The methods can be grouped into two basic approaches: 1) two-step modeling, in which repeated observations are first reduced to one summary statistic per subject (e.g., a mean or slope), after which this statistic is used in a standard genetic analysis, or 2) joint modeling, in which genetic and longitudinal model parameters are estimated simultaneously in a single analysis. In applications to Framingham Heart Study data, contributors collectively reported evidence for genes that affected trait mean on chromosomes 1, 2, 3, 5, 8, 9, 10, 13, and 17, but most did not find genes affecting slope. Applications to simulated data suggested that even for a gene that only affected slope, use of a mean-type statistic could provide greater power than a slope-type statistic for detecting that gene. We report on the results of a small experiment that sheds some light on this apparently paradoxical finding, and indicate how one might form a more powerful test for finding a slope-affecting gene. Several areas for future research are discussed. Genet Epidemiol 25 (Suppl. 1): S18-S28, 2003.
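The first step of two-step modeling (approach 1) is easy to sketch. Each subject's repeated observations are collapsed to a mean-type and a slope-type summary, and these would then serve as the phenotype in a standard genetic analysis (not shown). The subject identifiers, visit times, trait values, and function name below are hypothetical.

```python
from statistics import mean


def subject_summary(times, values):
    """Reduce one subject's repeated measures to a mean-type and a
    slope-type summary (the ordinary least-squares slope over time)."""
    t_bar, y_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx if sxx > 0 else float("nan")  # undefined with one visit
    return y_bar, slope


# Hypothetical repeated measurements: subject id -> (visit times, trait values).
data = {
    "s1": ([0, 2, 4], [120.0, 124.0, 128.0]),
    "s2": ([0, 2, 4], [135.0, 134.0, 136.0]),
}
summaries = {sid: subject_summary(t, y) for sid, (t, y) in data.items()}
```

A per-subject slope from only a few visits is much noisier than a per-subject mean, which is one intuition for why a mean-type statistic can be more powerful even for a slope-affecting gene.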
A Likelihood Ratio Approach to Family-based Association Studies with Covariates
Annals of Human Genetics, 2006
We introduce a procedure for association-based analysis of nuclear families that allows for dichotomous and more general measurements of phenotype and inclusion of covariate information. Standard generalized linear models are used to relate phenotype and its predictors. Our test procedure, based on the likelihood ratio, unifies the estimation of all parameters through the likelihood itself and yields maximum likelihood estimates of the genetic relative risk and interaction parameters. Our method has advantages in modelling the covariate and gene-covariate interaction terms over recently proposed conditional score tests that include covariate information via a two-stage modelling approach. We apply our method in a study of human systemic lupus erythematosus and the C-reactive protein that includes sex as a covariate.
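The likelihood ratio principle behind this procedure can be shown in its simplest form: a dichotomous phenotype and one binary covariate such as sex. The sketch compares a model with a common risk against one with group-specific risks. It is a hypothetical illustration of a likelihood ratio test only. It does not implement the paper's nuclear-family procedure, which also models genotype within families, and the counts are made up.

```python
import math


def bernoulli_loglik(affected, total, p):
    """Binomial log-likelihood (without the constant), taking 0*log(0) as 0."""
    ll = 0.0
    if affected > 0:
        ll += affected * math.log(p)
    if total - affected > 0:
        ll += (total - affected) * math.log(1.0 - p)
    return ll


def likelihood_ratio_test(group_a, group_b):
    """LRT of H0: equal risk in two covariate groups vs H1: group-specific risks.

    Each group is (affected, total). Returns (statistic, p-value); under H0 the
    statistic is asymptotically chi-square with 1 degree of freedom.
    """
    k = group_a[0] + group_b[0]
    n = group_a[1] + group_b[1]
    ll_null = bernoulli_loglik(k, n, k / n)  # MLE under a common risk
    ll_alt = sum(bernoulli_loglik(a, t, a / t) for a, t in (group_a, group_b))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))  # clamp tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value


# Hypothetical counts (affected, total) for females and males.
stat, p = likelihood_ratio_test((30, 100), (10, 100))
```

The chi-square(1) tail uses the identity P(Z² > x) = erfc(√(x/2)) for standard normal Z, which avoids needing a statistics library.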
2021
When quantitative longitudinal traits are risk factors for disease progression and subject to random biological variation, joint model analysis of time-to-event and longitudinal traits can effectively identify direct and/or indirect genetic association of single nucleotide polymorphisms (SNPs) with time-to-event. We present a joint model that integrates: (i) a multivariate linear mixed model describing trajectories of multiple longitudinal traits as a function of time, SNP effects, and subject-specific random effects, and (ii) a frailty Cox survival model that depends on SNPs, longitudinal trajectory effects, and subject-specific frailty accounting for dependence among multiple time-to-event traits. Motivated by the complex genetic architecture of type 1 diabetes complications (T1DC) observed in the Diabetes Control and Complications Trial (DCCT), we implement a two-stage approach to inference with bootstrap joint covariance estimation and develop a hypothesis testing procedure to classify di...
Models of population-based analyses for data collected from large extended families
European Journal of Epidemiology, 2010
Large studies of extended families usually collect phenotypic data that may have scientific value for purposes other than testing genetic hypotheses, provided the families were not selected in a biased manner. These purposes include assessing population-based associations of diseases with risk factors/covariates and estimating population characteristics such as disease prevalence and incidence. Relatedness among participants, however, violates the traditional assumption of independent observations in these classic analyses. The commonly used adjustment for relatedness in population-based analyses is to use marginal models, in which clusters (families) are assumed to be independent (unrelated) with a simple and identical covariance (family) structure, such as the independent, exchangeable, and unstructured covariance structures. However, these simple covariance structures may not be appropriate for outcomes collected from large extended families, and may under- or overestimate the variances of estimators and thus lead to misleading inferences. Moreover, the assumption in a marginal model that families are unrelated and share an identical family structure may not hold for family studies with large extended families. The aim of this paper is to propose models that combine marginal-model approaches with a covariance structure for assessing population-based associations of diseases with their risk factors/covariates and estimating population characteristics in epidemiological studies, while adjusting for the complicated relatedness among outcomes (continuous/categorical, normally/non-normally distributed) collected from large extended families. We also discuss theoretical issues of the proposed models and show that the proposed models and covariance structure are appropriate for and capable of achieving this aim.
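A small example shows why an exchangeable structure can misrepresent relatedness even within one nuclear family. A polygenic (kinship-based) correlation gives unrelated spouses zero correlation, whereas an exchangeable structure forces every pair, spouses included, to share one value. The sketch assumes a purely polygenic trait with hypothetical heritability 0.4 and no shared environment; it is not the covariance structure proposed in the paper. The kinship values themselves are standard: 1/2 for self, 1/4 for parent-child and full-sibling pairs, and 0 for unrelated spouses.

```python
# Kinship coefficients for a nuclear family ordered (father, mother, child1, child2).
KINSHIP = [
    [0.50, 0.00, 0.25, 0.25],
    [0.00, 0.50, 0.25, 0.25],
    [0.25, 0.25, 0.50, 0.25],
    [0.25, 0.25, 0.25, 0.50],
]


def kinship_correlation(kinship, h2):
    """Polygenic correlation: 1 on the diagonal, h2 * 2 * phi_ij elsewhere."""
    n = len(kinship)
    return [[1.0 if i == j else h2 * 2.0 * kinship[i][j] for j in range(n)]
            for i in range(n)]


def exchangeable_correlation(n, rho):
    """Exchangeable structure: the same correlation rho for every pair."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


kin = kinship_correlation(KINSHIP, h2=0.4)
# Give the exchangeable structure the average off-diagonal correlation.
pairs = [kin[i][j] for i in range(4) for j in range(4) if i != j]
exch = exchangeable_correlation(4, sum(pairs) / len(pairs))
```

The exchangeable fit overstates the spouses' correlation and understates the first-degree pairs'. In large extended families the mismatch grows, since relationships there range from first-degree pairs down to first cousins (kinship 1/16) and beyond.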
Bivariate association analysis of longitudinal phenotypes in families
Statistical genetic methods incorporating temporal variation allow for greater understanding of genetic architecture and consistency of biological variation influencing development of complex diseases. This study proposes a bivariate association method jointly testing association of two quantitative phenotypic measures from different time points. Measured genotype association was analyzed for single-nucleotide polymorphisms (SNPs) for systolic blood pressure (SBP) from the first and third visits using 200 simulated Genetic Analysis Workshop 18 (GAW18) replicates. Bivariate association, in which the effect of an SNP on the mean trait values of the two phenotypes is constrained to be equal for both measures and is included as a covariate in the analysis, was compared with a bivariate analysis in which the effect of an SNP was estimated separately for the two measures, and with univariate association analyses, in 9 SNPs that explained more than 0.001% of SBP variance over all 200 GAW18 replicates. The SNP 3_48040283 was significantly associated with SBP in all 200 replicates, with the constrained bivariate method providing increased signal over the unconstrained bivariate method. This method improved signal in all 9 SNPs with simulated effects on SBP for nominal significance (p-value < 0.05). However, this appears to be determined by the effect size of the SNP on the phenotype. This bivariate association method applied to longitudinal data improves genetic signal for quantitative traits when the effect size of the variant is moderate to large.
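The difference between the constrained and unconstrained analyses can be sketched with ordinary least squares on stacked data. The unconstrained fit estimates a separate SNP effect for each visit. The constrained fit estimates one shared effect with visit-specific intercepts, so the joint association test has one degree of freedom instead of two. This is a simplified, hypothetical illustration. It ignores the residual correlation between visits and among relatives that the study's bivariate model accounts for, and the genotypes (coded as hypothetical minor-allele counts) and SBP values are made up.

```python
def centered_cross(x, y):
    """Sum of (x - mean x) * (y - mean y)."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))


def unconstrained_effects(g, y1, y2):
    """Separate OLS SNP effects for the measure at each visit."""
    sxx = centered_cross(g, g)
    return centered_cross(g, y1) / sxx, centered_cross(g, y2) / sxx


def constrained_effect(g, y1, y2):
    """One SNP effect shared by both measures, with measure-specific
    intercepts (pooled least squares over the stacked data)."""
    sxx = centered_cross(g, g)
    return (centered_cross(g, y1) + centered_cross(g, y2)) / (2.0 * sxx)


# Hypothetical genotypes (minor-allele counts 0/1/2) and SBP at visits 1 and 3.
g = [0, 1, 2, 0, 1, 2, 1, 0]
sbp_v1 = [118.0, 121.0, 125.0, 117.0, 123.0, 126.0, 120.0, 119.0]
sbp_v3 = [121.0, 125.0, 128.0, 120.0, 124.0, 131.0, 124.0, 122.0]
b1, b3 = unconstrained_effects(g, sbp_v1, sbp_v3)
b_common = constrained_effect(g, sbp_v1, sbp_v3)
```

Because both measures share the same genotypes, the pooled estimate here is simply the average of the two separate effects. Its advantage lies in spending a single parameter on the SNP, which helps when the true effect is similar at both visits.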