Jinko Graham - Academia.edu
Papers by Jinko Graham
Family-based sequencing studies are increasingly used to find rare genetic variants of high risk for disease traits with familial clustering. In some studies, families with multiple disease subtypes are collected and the exomes of affected relatives are sequenced for shared rare variants. Since different families can harbor different causal variants and each family harbors many rare variants, tests to detect causal variants can have low power in this study design. Our goal is rather to prioritize shared variants for further investigation by, e.g., pathway analyses or functional studies. The transmission-disequilibrium test prioritizes variants based on departures from Mendelian transmission in parent-child trios. Extending this idea to families, we propose methods to prioritize rare variants shared in affected relatives with two disease subtypes, with one subtype more heritable than the other. Global approaches condition on a variant being observed in the study and assume a known pr...
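As background for the transmission-disequilibrium idea mentioned in this abstract, here is a minimal sketch of the classic McNemar-type TDT statistic for a single biallelic variant in parent-child trios. It is not the prioritization method the paper proposes. The function name `tdt_statistic` and its arguments are hypothetical, and the sketch assumes the counts of transmissions and non-transmissions of the variant allele from heterozygous parents have already been tallied.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classic TDT for one variant (illustrative sketch, hypothetical helper).

    transmitted:   times heterozygous parents passed the variant allele
                   to an affected child
    untransmitted: times heterozygous parents passed the other allele
    Returns the chi-square statistic (1 df) and its asymptotic p-value.
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    # Under Mendelian transmission each allele is passed on with probability 1/2,
    # so a large imbalance between the two counts signals a departure.
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(tdt_statistic(30, 10))
```

Only heterozygous parents contribute to the counts, because homozygous parents transmit the same allele regardless of the child's disease status.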
Frontiers in Pediatrics, 2018
Objectives: Chronic primary systemic vasculitides (CPV) are a collection of rare diseases involving inflammation in blood vessels, often in multiple organs. CPV can affect adults and children and may be life- or organ-threatening. Treatments for adult CPV, although effective, have known severe potential toxicities; the safety and efficacy of these drugs in pediatric patients are not fully understood. There is an unmet need for biologic measures to assess the level of disease activity and, in turn, inform treatment choices for stopping, starting, or modifying therapy. This observational study determines whether S100 calcium-binding protein A12 (S100A12) and common inflammatory indicators are sensitive markers of disease activity in children and adolescents with CPV that could be used to inform a minimal effective dose of therapy. Methods: Clinical data and sera were collected from 56 participants with CPV at study visits from diagnosis to remission. Serum concentrations of S100A12, C-reactive protein (CRP) and hemoglobin (Hb) as well as whole blood cell counts and erythrocyte sedimentation rate (ESR) were measured. Disease activity was inferred by physician's global assessment (PGA) and the pediatric vasculitis activity score (PVAS). Results: Serum concentrations of standard markers of inflammation (ESR, CRP, Hb, absolute blood neutrophil count) and S100A12 track with clinically assessed disease activity. These measures, particularly neutrophil counts and serum concentrations of S100A12, had the most significant correlation with clinical scores of disease activity in those children with vasculitis that is associated with anti-neutrophil cytoplasmic antibodies (ANCA) against proteinase 3.
Conclusions: S100A12 and neutrophil counts should be considered in the assessment of disease activity in children with CPV, particularly in the most common forms of the disease that involve proteinase 3 ANCA. Key messages:
- In children with chronic primary systemic vasculitis (CPV), classical measures of inflammation are not formally considered in scoring of disease activity.
- Inflammatory markers, specifically S100A12 and neutrophil count, track preferentially with the most common forms of childhood CPV, which affect small to medium-sized vessels and involve anti-neutrophil cytoplasmic antibodies (ANCA) against proteinase 3.
bioRxiv (Cold Spring Harbor Laboratory), Oct 18, 2022
In genetic epidemiology, log-linear models of population risk may be used to study the effect of genotypes and exposures on the relative risk of a disease. Such models may also include gene-environment interaction terms that allow the genotypes to modify the effect of the exposure, or equivalently, the exposure to modify the effect of genotypes on the relative risk. When a measured test locus is in linkage disequilibrium with an unmeasured causal locus, exposure-related genetic structure in the population can lead to spurious gene-environment interaction; that is, to apparent gene-environment interaction at the test locus in the absence of true gene-environment interaction at the causal locus. Exposure-related genetic structure occurs when the distributions of exposures and of haplotypes at the test and causal locus both differ across population strata. A case-parent trio design can protect inference of genetic main effects from confounding bias due to genetic structure in the population. Unfortunately, when the genetic structure is exposure-related, the protection against confounding bias for the genetic main effect does not extend to the gene-environment interaction term. We show that current methods to reduce the bias in estimated gene-environment interactions from case-parent trio data can only account for simple population structure involving two strata. To fill this gap, we propose to directly accommodate multiple population strata by adjusting for genetic principal components. We evaluate our approach through simulation and illustrate it on data from a study of genetic modifiers of cleft palate.
Frontiers in Genetics, Jan 4, 2023
Introduction: In genetic epidemiology, log-linear models of population risk may be used to study the effect of genotypes and exposures on the relative risk of a disease. Such models may also include gene-environment interaction terms that allow the genotypes to modify the effect of the exposure, or equivalently, the exposure to modify the effect of genotypes on the relative risk. When a measured test locus is in linkage disequilibrium with an unmeasured causal locus, exposure-related genetic structure in the population can lead to spurious gene-environment interaction; that is, to apparent gene-environment interaction at the test locus in the absence of true gene-environment interaction at the causal locus. Exposure-related genetic structure occurs when the distributions of exposures and of haplotypes at the test and causal locus both differ across population strata. A case-parent trio design can protect inference of genetic main effects from confounding bias due to genetic structure in the population. Unfortunately, when the genetic structure is exposure-related, the protection against confounding bias for the genetic main effect does not extend to the gene-environment interaction term. Methods: We show that current methods to reduce the bias in estimated gene-environment interactions from case-parent trio data can only account for simple population structure involving two strata. To fill this gap, we propose to directly accommodate multiple population strata by adjusting for genetic principal components (PCs). Results and Discussion: Through simulations, we show that our PC adjustment maintains the nominal type-1 error rate and has nearly the same power to detect gene-environment interaction as an oracle approach based directly on population strata.
We also apply the PC-adjustment approach to data from a study of genetic modifiers of cleft palate composed primarily of case-parent trios of European and East Asian ancestry. Consistent with earlier analyses, our results suggest that the gene-environment interaction signal in these data is due to the self-reported European trios.
Handbook of Statistical Methods for Case-Control Studies, 2018
Human Heredity, 2015
Access to full text and tables of contents, including tentative ones for forthcoming issues: www.karger.com/hhe_issues
Description: Produces a graphical display, as a heat map, of measures of pairwise linkage disequilibria between single nucleotide polymorphisms (SNPs). Users may optionally include the physical locations or genetic map distances of each SNP on the plot. The methods are described in Shin et al. (2006) doi:10.18637/jss.v016.c03. Users should note that the imported package 'snpStats' and the suggested packages 'rtracklayer', 'GenomicRanges', 'GenomeInfoDb' and 'IRanges' are all Bioconductor packages <https://bioconductor.org>.
Data in Brief
We present simulated exome-sequencing data for 150 families from a North American admixed population, ascertained to contain at least four members affected with lymphoid cancer. These data include information on the ascertained families as well as single-nucleotide variants on the exome of affected family members. We provide a brief overview of the simulation steps and links to the associated software scripts. The resulting data are useful to identify genomic patterns and disease inheritance in families with multiple disease-affected members.
Non UBC. Unreviewed. Author affiliation: Simon Fraser University. Facult
approximate exact conditional inference for logistic regression models. Exact conditional inference is based on the distribution of the sufficient statistics for the parameters of interest given the sufficient statistics for the remaining nuisance parameters. Using model formula notation, users specify a logistic model and model terms of interest for exact inference. License: GPL (>= 2)
This repository contains the code used to simulate an exome-sequencing study of 150 families ascertained to contain at least four members affected with lymphoid cancer. The code is contained in RMarkdown documents that detail how to generate the datasets in the Zenodo repository at https://zenodo.org/record/5797035. The software tools used are R, SLiM and shell scripts.
SimRVPedigree Supplement. This is a PDF file that provides detailed information about the simulation procedure, as well as additional information for the applications discussed in the main text. (PDF 254 kb)
Linkage analysis maps genetic loci for a heritable trait by identifying genomic regions with excess relatedness among individuals with similar trait values. Analysis may be conducted on related individuals from families, or on samples of unrelated individuals from a population. For allelically heterogeneous traits, population-based linkage analysis can be more powerful than genotypic-association analysis. Here, we focus on linkage analysis in a population sample, but use sequences rather than individuals as our unit of observation. Earlier investigations of sequence-based linkage mapping relied on known sequence relatedness, whereas we infer relatedness from the sequence data. We propose two ways to associate similarity in relatedness of sequences with similarity in their trait values and compare the resulting linkage methods to two genotypic-association methods. We also introduce a procedure to label case sequences as potential carriers or non-carriers of causal variants after an a...
The American Statistician
We propose two probability-like measures of individual cluster-membership certainty which can be applied to a hard partition of the sample such as that obtained from the Partitioning Around Medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be used to investigate an individual's tendency to belong to a cluster. We also suggest two possible ways to evaluate the hard partition. We evaluate the performance of both measures in individuals with ambiguous cluster membership, using simulated binary datasets that have been partitioned by the PAM algorithm or continuous datasets that have been partitioned by hierarchical clustering and k-means clustering. For comparison, we also present results from soft clustering algorithms such as fuzzy analysis clustering (FANNY) and two model-based clustering methods. Our proposed measures perform comparably to the posterior-probability estimators from either FANNY or the model-based clustering methods. We also illustrate the proposed measures by applying them to Fisher's classic iris data set.
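For readers unfamiliar with the classic silhouette that this abstract builds on, here is a minimal sketch of the standard individual silhouette widths computed from a pairwise dissimilarity matrix and a hard partition. It illustrates the classic quantity only, not the probability-like measures the paper proposes. The function name `silhouette_widths` and the convention of assigning 0 to singleton clusters are assumptions made for this illustration.

```python
def silhouette_widths(dissim, labels):
    """Classic silhouette width s(i) = (b - a) / max(a, b) for each observation.

    dissim: square list-of-lists of pairwise dissimilarities
    labels: hard cluster label for each observation
    (Illustrative sketch; hypothetical helper, not the paper's measures.)
    """
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of i's own cluster
        a = sum(dissim[i][j] for j in own) / len(own)
        # b: smallest mean dissimilarity to the members of any other cluster
        b = min(
            sum(dissim[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


if __name__ == "__main__":
    points = [0.0, 1.0, 10.0, 11.0]
    d = [[abs(p - q) for q in points] for p in points]
    print(silhouette_widths(d, [0, 0, 1, 1]))
```

Widths lie between -1 and 1, so they rank how well each observation fits its cluster but cannot be read as membership probabilities, which is the gap the abstract describes.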
Genes and Immunity, 2015
The possible interrelations between HLA-DQ, non-HLA single nucleotide polymorphisms (SNPs) and islet autoantibodies were investigated at clinical onset in 1-34-year-old type 1 diabetes (T1D) patients (n=305) and controls (n=203). Among the non-HLA SNPs reported by the Type 1 Diabetes Genetics Consortium, 24% were supported in this Swedish replication set, including that the increased risk of minor PTPN22 allele and high risk HLA was modified by GAD65 autoantibodies. The association between T1D and the minor AA+AC genotype in the ERBB3 gene was stronger among IA-2 autoantibody-positive patients (comparison p=0.047). The association between T1D and the common insulin (AA) genotype was stronger among insulin autoantibody (IAA)-positive patients (comparison p=0.008). In contrast, the association between T1D and unidentified 26471 gene was stronger among IAA-negative (comparison p=0.049) and IA-2 autoantibody-negative (comparison p=0.052) patients. Finally, the association between IL2RA and T1D was stronger among IAA-positive than among IAA-negative patients (comparison p=0.028). These results suggest that the increased risk of T1D by non-HLA genes is often modified by both islet autoantibodies and HLA-DQ. The interactions between non-HLA genes, islet autoantibodies...