Genetics of Kidneys in Diabetes (GoKinD) Study: A Genetics Collection Available for Identifying Genetic Susceptibility Factors for Diabetic Nephropathy in Type 1 Diabetes

Author manuscript; available in PMC: 2009 Oct 30.

Published in final edited form as: J Am Soc Nephrol. 2006 Jun 14;17(7):1782–1790. doi: 10.1681/ASN.2005080822

Abstract

The Genetics of Kidneys in Diabetes (GoKinD) study is an initiative that aims to identify genes that are involved in diabetic nephropathy. A large number of individuals with type 1 diabetes were screened to identify two subsets, one with clear-cut kidney disease and another with normal renal status despite long-term diabetes. Those who met additional entry criteria and consented to participate were enrolled. When possible, both parents also were enrolled to form family trios. As of November 2005, GoKinD included 3075 participants who comprise 671 case singletons, 623 control singletons, 272 case trios, and 323 control trios. Interested investigators may request the DNA collection and corresponding clinical data for GoKinD participants using the instructions and application form that are available at http://www.gokind.org/access. Participating scientists will have access to three data sets, each with distinct advantages. The set of 1294 singletons has adequate power to detect a wide range of genetic effects, even those of modest size. The set of case trios, which has adequate power to detect effects of moderate size, is not susceptible to false-positive results because of population substructure. The set of control trios is critical for excluding certain false-positive results that can occur in case trios and may be particularly useful for testing gene-environment interactions. Integration of the evidence from these three components into a single, unified analysis presents a challenge. This overview of the GoKinD study examines in detail the power of each study component and discusses analytic challenges that investigators will face in using this resource.


Diabetes is the leading cause of treated ESRD, accounting for almost half of the new cases each year (1–3). Among European Americans with type 1 diabetes, approximately one in three develops severe nephropathy that leads to ESRD (4–6). Evidence that genetic susceptibility plays an important role in diabetic nephropathy in type 1 diabetes first was presented more than a decade ago by Seaquist et al. (7) and Borch-Johnsen (8), and subsequent studies by researchers at the Joslin Diabetes Center (9) and The Diabetes Control and Complications Trial Research Group (10) further characterized the nature of the genetic effect.

Despite the strong evidence for genetic susceptibility factors, success in identifying the responsible genetic variants has been limited by the modest data collections that individual research groups have been able to assemble. The Genetics of Kidneys in Diabetes (GoKinD) study, an initiative supported by the Juvenile Diabetes Research Foundation (JDRF) and by the National Institute of Diabetes and Digestive and Kidney Diseases and the Centers for Disease Control and Prevention, was conceived to address this bottleneck by assembling a large DNA collection that is suitable for genetic association studies of nephropathy in type 1 diabetes.

The resulting collection includes nearly 1900 individuals with long-term (10+ yr) type 1 diabetes, half with nephropathy (943 case patients) and half without (946 control subjects). The set of case patients includes two subgroups: 328 patients with persistent proteinuria and 615 with ESRD. The set of control subjects consists only of individuals with normoalbuminuria despite 15 yr of type 1 diabetes. Both sets can be partitioned into two subsets: Those with neither parent enrolled (singletons) and those with both parents enrolled (trios). The totals as of November 2005 included 671 case singletons, 272 case trios, 623 control singletons, and 323 control trios.

The concept of using family trios to detect genetic association was developed more than a decade ago by various researchers who were wary of implicating a genetic variant simply because it happens to occur with greater frequency in a subset of the study participants who also have a relatively high occurrence of disease. To illustrate, consider a study of osteoporosis in individuals of European descent. If, in general, osteoporosis is more common in those of northern European descent compared with southern European descent, then any genetic variant that is more common in the former will tend to exhibit association with case-control analysis. The gold standard that has emerged for addressing such population stratification is the transmission/disequilibrium test (TDT) (11). The TDT procedure evaluates case trios in such a way that only relevant genetic variants are identified. An excellent review of the TDT has been written by two of the pioneers of the field, Ewens and Spielman (12). Recently, Scott and Rogus (13) examined the utility of control trios and found that they are useful in special situations, such as when a disease is highly prevalent or when certain types of gene-environment interaction exist.
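To make the idea behind the TDT concrete, the following is a minimal sketch of the standard McNemar-type statistic for a biallelic marker. The function name and the counts are hypothetical and are not GoKinD data; the p-value shown is the two-sided chi-square tail with 1 degree of freedom, whereas the power calculations in this article assume a one-sided test.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """McNemar-type TDT chi-square (1 df) and its upper-tail p-value.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the candidate allele to the proband.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_sq = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# Hypothetical counts, for illustration only (not GoKinD data).
chi_sq, p_value = tdt_statistic(transmitted=60, untransmitted=40)
print(f"chi-square = {chi_sq:.2f}, p = {p_value:.4f}")
```

Because only transmissions from heterozygous parents within families are counted, differences in allele frequency between subpopulations do not enter the statistic.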

GoKinD uses both case trios and control trios as well as a set of unrelated case and control singletons. The advantage of including singletons is that, in addition to being much easier to identify and ascertain, they offer exceptionally high power to detect genetic association. The tradeoff, of course, is that they are prone to false-positive results if population stratification exists.

The GoKinD Collection of DNA and clinical documentation of case patients and control subjects are available to the research community through an application process that is accessible on the GoKinD web site (https://www.gokind.org/access). Nonrenewable samples also will become available at a later date. Broad distribution of the collection is intended to spark creativity with regard to both the genetic variants studied and the analytic approaches used. These approaches are not limited to those that require the whole collection. The large collection also may be used as a sampling frame for selecting narrowly defined groups for testing very specific hypotheses. Here, we summarize the clinical characteristics of the study groups and provide detailed power calculations for each of the collection’s design components. Finally, we discuss some analytic challenges that await potential users of the collection.

Materials and Methods

Organization of GoKinD

The collaborative effort to build the GoKinD collection was organized through a coordinating center, housed jointly at the Joslin Diabetes Center (JDC) and the George Washington University Biostatistics Center (GWU), a Central Biochemistry Laboratory at the University of Minnesota (CBL), and a genetics laboratory and specimen repository at the Centers for Disease Control and Prevention (CDC).

Recruitment and Study Groups

Patients for this study were recruited through two centers. The Section of Genetics and Epidemiology at the JDC recruited and examined patients of the Joslin Clinic who were already enrolled in the Joslin Kidney Study on Genetics of Diabetic Nephropathy. All these patients resided in New England. In total, 320 case singletons, 180 case trios, 346 control singletons, and 154 control trios were recruited through the JDC. The George Washington Biostatistics Center worked with Matthews Media Group to identify volunteers in the United States and Canada, who subsequently were directed to one of 27 clinical centers around the United States for examination. In total, 351 case singletons, 92 case trios, 277 control singletons, and 169 control trios were recruited through GWU. The principal investigators and recruitment staff who contributed to the collection are listed in the Acknowledgments. All data management was centralized at GWU.
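As a quick consistency check, the recruitment counts reported for the two centers can be summed and compared with the collection totals quoted elsewhere in this article. The dictionary layout and variable names below are illustrative assumptions; only the counts come from the text.

```python
# Counts of probands recruited through each center, as stated in the text.
recruited = {
    "JDC": {"case singletons": 320, "case trios": 180,
            "control singletons": 346, "control trios": 154},
    "GWU": {"case singletons": 351, "case trios": 92,
            "control singletons": 277, "control trios": 169},
}

# Sum each study group across the two centers.
totals = {
    group: sum(center[group] for center in recruited.values())
    for group in recruited["JDC"]
}
probands = sum(totals.values())

print(totals)
print("probands:", probands)
```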

To be eligible as a case patient, a patient had to have type 1 diabetes (minimum 10 yr duration) and severe diabetic nephropathy (ESRD or persistent proteinuria). To be eligible as a control subject, a patient had to have type 1 diabetes for at least 15 yr and have normoalbuminuria despite never having been treated with angiotensin-converting enzyme inhibitors or angiotensin receptor blockers and not receiving current treatment with antihypertensive medication. Persistent proteinuria and normoalbuminuria were defined by the urinary albumin to creatinine ratio (ACR) (≥300 and <20 μg/mg, respectively). Further details of the eligibility criteria are summarized in Figure 1. When both parents of a participant were alive and willing to participate, both were examined to form complete trios (proband and both parents) for TDT analysis. Additional eligibility requirements were age 18 through 59 yr at the time of enrollment. Patients were recruited regardless of gender, race, or ethnic origin. However, patients were excluded when they could not communicate with staff or reported HIV infection or active tuberculosis. Pregnant women were excluded, but they became eligible for screening 3 mo postpartum.

Figure 1.

Definitions of Genetics of Kidneys in Diabetes (GoKinD) study eligibility criteria.

Type 1 diabetes: Diabetes diagnosed before age 31, insulin treatment begun within one year of diagnosis and continued uninterrupted since diagnosis. Tests for GAD antibodies were not performed.
Severe diabetic nephropathy: Persistent proteinuria or ESRD not attributable to a condition other than diabetes and arising after at least 10 years of diabetes duration.
ESRD: Chronic dialysis or kidney transplant. The onset of ESRD is defined as the date of the first dialysis or kidney transplant, whichever occurred first.
Persistent proteinuria: At least two of the last three urine samples positive for albuminuria in specimens taken at least one month apart. One test could be a historical result from the medical record documenting a urinary albumin/creatinine ratio (ACR) exceeding 300 μg albumin/mg creatinine or a 1+ dipstick (e.g., Multistix). All others had to be confirmed by the CBL as a urinary ACR exceeding 300 μg albumin/mg creatinine.
Normoalbuminuria: At least two of the last three ACR measurements in random urine specimens taken at least one month apart being less than 20 μg albumin/mg creatinine. If 3 ACR measurements are needed, the highest must be less than 40 μg albumin/mg creatinine. One could be a historical result from the medical record. All others had to be confirmed by the CBL as a urinary ACR less than 20 μg albumin/mg creatinine.
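The ACR-based part of these definitions can be sketched as a small classifier. This is a simplified illustration, not the study's screening procedure: the function name is hypothetical, the thresholds follow the Methods text (≥300 and <20 μg/mg), and the sketch ignores the dipstick alternative, the one-month spacing between specimens, and the requirement for CBL confirmation.

```python
def classify_acr(acr_values):
    """Classify renal status from up to the last three ACR results (μg/mg).

    Simplified: uses only the ACR thresholds; dipstick results, specimen
    timing, and laboratory confirmation are not modeled.
    """
    recent = list(acr_values)[-3:]
    if len(recent) < 2:
        return "insufficient data"
    n_high = sum(value >= 300 for value in recent)
    n_low = sum(value < 20 for value in recent)
    if n_high >= 2:
        return "persistent proteinuria"
    # When a third measurement is needed, the highest must stay below 40.
    if n_low >= 2 and (len(recent) < 3 or max(recent) < 40):
        return "normoalbuminuria"
    return "neither"


# Hypothetical measurement series, for illustration only.
print(classify_acr([350, 120, 410]))
print(classify_acr([5, 12, 35]))
print(classify_acr([5, 12, 45]))
```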

All participants signed informed consent forms that explained the purpose of the collection and the intention to share their DNA and other biologic samples with investigators who were approved by a scientific review process that was established by JDRF. The project and consent procedures were approved by local Institutional Review Boards of all recruitment centers, the coordinating centers, CBL, and the CDC.

Sample Processing

Detailed descriptions of the methods that were used at the CBL and CDC are available in Supplementary Appendix A (available online). In brief, biologic samples were shipped from each recruitment facility to the CBL for analysis of albumin and creatinine in urine, hemoglobin A1c (HbA1c) in blood, and total cholesterol, HDL cholesterol, cystatin C, and creatinine in serum. The CBL also prepared whole-blood lysates for DNA extraction and transformed peripheral blood lymphocytes to establish cell lines for additional DNA supplies. Cryopreserved cell lines; whole-blood lysates; and saved urine, serum, and plasma samples were shipped to the CDC, which is the repository for all GoKinD biologic samples. The CDC extracted DNA from both whole-blood cell lysates and transformed lymphocyte lysates and genotyped the HLA DQA1, DQB1, and DRB1 loci; the -23 insulin gene single-nucleotide polymorphism (14); and additional microsatellite markers to test for sample mix-ups and verify family relationships.

Tracking of specimens from recruitment facilities to the repository at the CDC was the responsibility of GWU. Distribution of the collection to approved investigators will be handled jointly by the CDC (DNA) and GWU (clinical data).

Quality Control

Replicate samples were collected from 5% of the participants to permit quality control analysis of study procedures from sample collection through DNA genotyping. For the seven clinical measurements at the CBL, the coefficients of reliability ranged from 95 to 99% except for urine albumin (93%) and urine ACR (91%) (15). The lymphocyte cell transformation success rate at the CBL was 99.8% (2354 of 2360 samples). For testing sample mix-ups and nonpaternity, three microsatellites and a gender-specific locus were genotyped at the CDC (in addition to the HLA and insulin loci). All problematic samples subsequently were genotyped for nine additional microsatellites to resolve the issue. Noteworthy is that these microsatellite tests are sensitive enough to detect even slight sample contamination. The CDC genotyped 3302 potentially eligible individuals for the collection. Not one instance of contamination of a blood sample with a second individual’s blood was found. We detected 17 instances of sample labeling errors and, allowing for undetected errors, estimate that labeling errors occurred between five and seven times per 1000 individuals. The 17 detected errors were resolved or removed from the collection. After these corrections, we estimate that the final collection of 3075 individuals may include three to six undetected sample mix-ups in the 1294 singletons and none in the trios (error rate between one and two per 1000 for the whole collection).
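The error rates quoted above follow from simple arithmetic, sketched here. The variable names are illustrative; the collection size of 3075 is the participant count given in the Abstract.

```python
genotyped = 3302        # potentially eligible individuals genotyped at the CDC
detected_errors = 17    # labeling errors that were detected

# Detected labeling errors per 1000 genotyped individuals; this is the
# lower bound of the quoted range, before allowing for undetected errors.
detected_rate = 1000 * detected_errors / genotyped

collection = 3075       # participants in the final collection (Abstract)
# Residual rate per 1000 if three or six undetected mix-ups remain.
residual_rates = [1000 * n / collection for n in (3, 6)]

print(f"detected: {detected_rate:.1f} per 1000")
print("residual:", [f"{rate:.1f} per 1000" for rate in residual_rates])
```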

Statistical Analyses

Case patients and control subjects were compared using Wilcoxon rank-sum tests for quantitative variables and $\chi^2$ or Fisher exact tests for categorical variables.

Power Calculations

For purposes of calculating power, diabetic nephropathy was considered as a dichotomy. Patients with ESRD were not distinguished from patients with proteinuria. For each component of the collection (singletons, case trios, and control trios), power was estimated for a range of scenarios with regard to the underlying genetic model. To do so, we used the first approximation suggested by Knapp (16) for case trios, the extensions of Scott and Rogus (13) for control trios, and Rogus et al. (17) for singletons. The required input parameters include the frequency of the risk allele (P) and the relative risks (RR) for those who are homozygotes ($\psi_2$) or heterozygotes ($\psi_1$) with respect to the risk allele. For control trios and singletons, prevalence also must be specified. Our calculations assumed 35% prevalence of renal disease in type 1 diabetes and risk allele frequencies (P) of 0.1, 0.3, and 0.5. Sensitivity analysis also was performed assuming prevalence of either 30 or 40%. RR were set according to four modes of inheritance: multiplicative ($\psi_2 = \gamma$, $\psi_1 = \gamma^{1/2}$), additive ($\psi_2 = \gamma$, $\psi_1 = [\gamma + 1]/2$), recessive ($\psi_2 = \gamma$, $\psi_1 = 1$), and dominant ($\psi_2 = \gamma$, $\psi_1 = \gamma$). A feature of this parameterization is the consistency of homozygote RR ($\gamma$) across all modes of inheritance. We report power estimates for homozygote RR values of $\gamma$ = 3.0, 2.5, and 2.0, assuming a one-sided test with α = 5%. We also examine models with $\gamma$ = 1.5 in the context of power to detect genes with modest effects.
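The parameterization above can be sketched in a few lines. The function names are hypothetical, and the conversion from relative risks to genotype-specific risks assumes Hardy-Weinberg genotype proportions, an assumption made here for illustration and not stated in the text. The sketch does not reproduce the power approximations of references 13, 16, and 17.

```python
import math


def genotype_relative_risks(mode, gamma):
    """Return (psi1, psi2): heterozygote and homozygote RR for one mode."""
    psi1_by_mode = {
        "multiplicative": math.sqrt(gamma),
        "additive": (gamma + 1.0) / 2.0,
        "recessive": 1.0,
        "dominant": gamma,
    }
    return psi1_by_mode[mode], gamma


def penetrances(p, prevalence, psi1, psi2):
    """Risks (f0, f1, f2) for 0, 1, 2 risk alleles under Hardy-Weinberg."""
    q = 1.0 - p
    # Prevalence is the genotype-frequency-weighted average of the risks.
    f0 = prevalence / (q * q + 2.0 * p * q * psi1 + p * p * psi2)
    return f0, f0 * psi1, f0 * psi2


for mode in ("multiplicative", "additive", "recessive", "dominant"):
    psi1, psi2 = genotype_relative_risks(mode, gamma=2.0)
    f0, f1, f2 = penetrances(p=0.3, prevalence=0.35, psi1=psi1, psi2=psi2)
    print(f"{mode:15s} psi1={psi1:.3f} f0={f0:.3f} f1={f1:.3f} f2={f2:.3f}")
```

Under these assumptions, the recessive model with P = 0.1 and γ = 3.0 gives a baseline risk of about 0.34 for noncarriers of two risk alleles, which pushes the homozygote risk to the ceiling of what a probability allows.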

Results

Characteristics of the Study Groups

The characteristics of probands whose parents were unavailable for completing trios (singletons) were, in general, similar to the characteristics of those with parents (trio probands). Therefore, singleton and trio probands were combined in Tables 1 through 3 to focus attention on the differences between case patients and control subjects. All the characteristics shown here, as well as many additional characteristics that were omitted for brevity, are available according to study group and separately for singletons and trios in Supplementary Appendix B (available online).

Table 1.

Nephropathy status at enrollment according to study group^a

| Characteristic | Case probands: ESRD (n = 615) | Case probands: proteinuria^b (n = 328) | Control probands: normoalbuminuria (n = 946) |
| --- | --- | --- | --- |
| Kidney transplant (%) | 91 | NA | NA |
| ESRD duration (yr) | 8.5 ± 5.3 | NA | NA |
| ACR (μg albumin/mg creatinine), median | NA | 1061 | 5.8 |
| ACR (μg albumin/mg creatinine), interquartile range | NA | 606 to 1966 | 4.0 to 8.5 |
| MDRD GFR (ml/min per 1.73 m²) (18) | NA | 52 ± 26 | 88 ± 17 |
| Estimated GFR <60 ml/min per 1.73 m² (%) | NA | 65 | 5 |

Table 3.

Other characteristics related to diabetes

| Characteristic | Case probands (n = 943) | Control probands (n = 946) | P |
| --- | --- | --- | --- |
| Hypertension | 85% | 6% | <0.0001 |
| Antihypertensive treatment | 83% | NA | |
| Systolic BP (mmHg) | 131 ± 19 | 118 ± 12 | <0.0001 |
| Diastolic BP (mmHg) | 74 ± 11 | 71 ± 8 | <0.0001 |
| Total cholesterol (mg/dl) | 189 ± 46 | 185 ± 32 | 0.1575 |
| HDL cholesterol (mg/dl) | 54 ± 18 | 58 ± 16 | <0.0001 |
| Use of lipid-lowering drugs | 45% | 15% | <0.0001 |
| No. of parents living | | | <0.0001^a |
| 0 | 26% | 13% | |
| 1 | 23% | 20% | |
| 2 | 48% | 64% | |
| unknown | 2% | 3% | |
| Laser therapy for retinopathy^b | 85% | 16% | <0.0001 |
| Cardiovascular disease^b | 89% | 11% | <0.0001 |
| Neuropathy^b | 68% | 11% | <0.0001 |

Nephropathy Status

Case patients included two subgroups: Those with ESRD (65%) and those with proteinuria (35%). For highlighting the differences between these two subgroups, renal characteristics of probands are summarized in Table 1 according to three categories: Case patients with ESRD, case patients with proteinuria, and control subjects with normoalbuminuria. At enrollment, case patients with ESRD had survived 8.5 ± 5.3 yr after the onset of ESRD, and 91% of them had received a kidney transplant; the remainder were on dialysis. Urinary albumin excretion of case patients with proteinuria generally was well above the lower limit for proteinuria (ACR ≥ 300 μg/mg). Median ACR was 1061 μg/mg (interquartile range 602 to 1941). For control subjects, albumin excretion generally was well below the upper limit of normoalbuminuria (ACR < 20 μg/mg). Median ACR was 5.8 μg/mg (interquartile range 4.0 to 8.4). Renal function, as estimated by the Modification of Diet in Renal Disease equation from serum creatinine, was significantly reduced in case patients with proteinuria as compared with control subjects, with 65% having an estimated GFR <60 ml/min as compared with only 5% of control subjects. Alternative estimates of renal function as based on serum creatinine and cystatin C are available in Supplementary Appendix B.

Demographic Characteristics

The GoKinD collection is primarily a white collection: 90% of case patients and 97% of control subjects (Table 2). Most of the study groups are approximately 40% male, with the exception of the case singletons, which is 53% male. On average, case patients were 4 yr older than control subjects, and this difference was due largely to the age of case patients with ESRD (43.9 ± 6.5), which is 3 yr older than the age of case patients with proteinuria (40.3 ± 7.8). Regardless of renal status, the age of trio probands was younger than singletons, presumably because the availability of both parents was age related. A positive smoking history was reported by almost half of the case patients as compared with one third of the control subjects (P < $10^{-4}$).

Table 2.

Characteristics of probands according to study group^a

| Characteristic | Case probands (n = 943) | Control probands (n = 946) | P |
| --- | --- | --- | --- |
| Demographic characteristics | | | |
| white race (%) | 90 | 97 | <0.0001 |
| male gender (%) | 50 | 41 | <0.0001 |
| age at entry (yr) | 42.6 ± 7.2 | 38.1 ± 8.6 | <0.0001 |
| body mass index (kg/m²) | 25.7 ± 5.3 | 26.2 ± 4.4 | <0.0001 |
| ever smoked cigarettes | 48% | 33% | <0.0001 |
| Diabetes history | | | |
| age at diabetes diagnosis (yr) | 11.9 ± 6.7 | 12.9 ± 7.3 | 0.0095 |
| diabetes duration (yr) | 30.7 ± 7.9 | 25.3 ± 7.7 | <0.0001 |
| pancreas transplant (PTX) (%) | 33 | 0 | <0.0001 |
| HbA1c (%) with PTX | 5.8 ± 1.5^b | NA | |
| HbA1c (%) without PTX | 8.3 ± 1.6 | 7.5 ± 1.2 | <0.0001 |
| insulin pump (%) | 23 | 40 | <0.0001 |

Diabetes History

The age at diagnosis of type 1 diabetes was similar in control subjects and case patients, but the duration of diabetes at enrollment was 5 yr longer, on average, for case patients (P < $10^{-4}$). This difference was due partly to the longer diabetes duration of case patients with ESRD (32.0 ± 7.3) as compared with case patients with proteinuria (28.3 ± 8.4). However, the diabetes duration of case patients with ESRD at the onset of ESRD was 23.9 ± 6.7, which was significantly (P < $10^{-4}$) less than the diabetes duration at enrollment for case patients with proteinuria and similar to that for control subjects (P = 0.0022). The level of glycemic control at enrollment was significantly affected (P < $10^{-4}$) by whether the proband had a pancreas transplant, a procedure reported only by case patients. The HbA1c of the 33% of case patients with a pancreas transplant was 5.8 ± 1.5%, whereas it was 8.3 ± 1.6% for case patients without a pancreas transplant as compared with 7.5 ± 1.2% for control subjects (P = 0.0001). Notably, at enrollment, insulin pumps were being used by 40% of control subjects but only 23% of case patients (P < $10^{-4}$).

Hypertension was present in 85% of case patients, and almost all were treated with antihypertensive medication. A history of antihypertensive medication was an exclusion criterion for control subjects; therefore, hypertension was infrequent (6%). The few control subjects who were recruited with hypertension had not yet begun treatment because the hypertension was diagnosed in conjunction with the enrollment examination. Despite treatment, measured systolic and diastolic BP were higher in case patients than in control subjects (P < $10^{-4}$ for both). Total cholesterol and HDL cholesterol were similar in case patients and control subjects despite more frequent use of lipid-lowering drugs by case patients than control subjects (45 and 15%, respectively; P < $10^{-4}$). Parental mortality was higher among case patients than control subjects (P < $10^{-4}$) and was the chief reason that probands (among control subjects as well as case patients) were not available for forming trios. Both parents were living for only 48% of case patients and 64% of control subjects.

Other Complications of Diabetes

A history of laser treatment for retinopathy and diagnosed cardiovascular disease were reported by most case patients but only a few control subjects (P < $10^{-4}$; Table 3). Self-reported neuropathy was less prevalent but reported mainly by case patients. The prevalence of all three was somewhat higher in case patients with ESRD than in case patients with proteinuria.

Power Calculations

The goal of GoKinD is to identify genetic variants that play a role in diabetic nephropathy. A variant may exert an independent effect on nephropathy or an interacting effect that involves other genes or nongenetic factors. In this article, the simplest situation of a single genetic locus is presented in depth. More complex situations are addressed in the Discussion section.

Power calculations were performed separately for each of the three study components assuming a lifetime cumulative risk of 35% for diabetic nephropathy in patients with type 1 diabetes (4). Parameters of a single locus model were varied to include all combinations of four alternative modes of inheritance (dominant, recessive, additive, and multiplicative), three choices for the frequency of the risk allele (0.1, 0.3, and 0.5), and three values for the disease risk for individuals who carry two risk alleles (homozygotes) relative to individuals who carry none. Results are summarized in Table 4. Results also were obtained for models with a RR of 1.5 for the homozygotes. These are not shown in Table 4 but are described in the text.

Table 4.

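Power for each of the GoKinD study design components to detect genetic association^a

| P^b | Multiplicative: case trios (n = 272) | Multiplicative: control trios (n = 323) | Multiplicative: singletons (n = 1294) | Additive: case trios (n = 272) | Additive: control trios (n = 323) | Additive: singletons (n = 1294) | Recessive: case trios (n = 272) | Recessive: control trios (n = 323) | Recessive: singletons (n = 1294) | Dominant: case trios (n = 272) | Dominant: control trios (n = 323) | Dominant: singletons (n = 1294) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| γ^b = 3.0^c | | | | | | | | | | | | |
| 0.10 | 0.91 | 0.69 | 0.99 | 0.98 | 0.85 | 0.99 | 0.23 | 0.13 | 0.70 | 0.99 | 0.99 | 0.99 |
| 0.30 | 0.99 | 0.87 | 0.99 | 0.99 | 0.89 | 0.99 | 0.97 | 0.76 | 0.99 | 0.99 | 0.94 | 0.99 |
| 0.50 | 0.99 | 0.83 | 0.99 | 0.99 | 0.78 | 0.99 | 0.99 | 0.95 | 0.99 | 0.96 | 0.61 | 0.99 |
| γ = 2.5 | | | | | | | | | | | | |
| 0.10 | 0.79 | 0.52 | 0.99 | 0.89 | 0.66 | 0.99 | 0.18 | 0.11 | 0.54 | 0.99 | 0.97 | 0.99 |
| 0.30 | 0.97 | 0.73 | 0.99 | 0.98 | 0.76 | 0.99 | 0.88 | 0.56 | 0.99 | 0.99 | 0.86 | 0.99 |
| 0.50 | 0.98 | 0.71 | 0.99 | 0.97 | 0.67 | 0.99 | 0.99 | 0.85 | 0.99 | 0.91 | 0.52 | 0.99 |
| γ = 2.0 | | | | | | | | | | | | |
| 0.10 | 0.57 | 0.33 | 0.99 | 0.66 | 0.40 | 0.99 | 0.13 | 0.08 | 0.32 | 0.96 | 0.77 | 0.99 |
| 0.30 | 0.85 | 0.51 | 0.99 | 0.88 | 0.54 | 0.99 | 0.64 | 0.34 | 0.99 | 0.96 | 0.68 | 0.99 |
| 0.50 | 0.88 | 0.51 | 0.99 | 0.87 | 0.49 | 0.99 | 0.95 | 0.62 | 0.99 | 0.77 | 0.40 | 0.99 |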

In almost all circumstances, the set of 1294 singletons has excellent power (>99%). The lone exception, the recessive model with 10% allele frequency, has power of only 30 to 70%. This situation, however, represents an unlikely scenario that assigns approximately 34% risk for nephropathy to 99% of the population and almost 100% risk to the remaining 1%. Excluding this unlikely case, good power (>80%) is maintained even for models with RR of 1.5. Therefore, the set of singletons is sufficient to detect genetic effects of even modest size.

The set of case trios has ample power to detect most effects of moderate size (RR for homozygotes for the risk allele ≥2). For example, power ranges from 77 to 99% for the dominant models, 66 to 99% for the additive models, and 57 to 99% for the multiplicative models. For the recessive models, excluding those with a 10% risk allele, power ranges from 64 to 99%. For RR of 1.5, the maximum power for the models considered is only 66%.

For the set of control trios, power was more model dependent. Excluding the rare recessive case, 30% of the models had power that exceeded 80%, another 36% had power between 60 and 80%, and the remaining 33% had power <60%.
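The proportions just quoted can be reproduced by tallying the control-trio entries of Table 4. In the sketch below the values are transcribed from that table (ordered by γ = 3.0, 2.5, 2.0 and, within each, by P = 0.10, 0.30, 0.50), with the recessive P = 0.10 rows left out; the variable names are illustrative.

```python
# Control-trio power values transcribed from Table 4.
control_trio_power = {
    "multiplicative": [0.69, 0.87, 0.83, 0.52, 0.73, 0.71, 0.33, 0.51, 0.51],
    "additive": [0.85, 0.89, 0.78, 0.66, 0.76, 0.67, 0.40, 0.54, 0.49],
    # Recessive models with P = 0.10 are excluded, as in the text.
    "recessive": [0.76, 0.95, 0.56, 0.85, 0.34, 0.62],
    "dominant": [0.99, 0.94, 0.61, 0.97, 0.86, 0.52, 0.77, 0.68, 0.40],
}

values = [v for powers in control_trio_power.values() for v in powers]
high = sum(v > 0.80 for v in values)
middle = sum(0.60 <= v <= 0.80 for v in values)
low = sum(v < 0.60 for v in values)

for label, count in (("> 80%", high), ("60 to 80%", middle), ("< 60%", low)):
    share = 100 * count / len(values)
    print(f"{label:10s} {count:2d} of {len(values)} models ({share:.0f}%)")
```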

European Americans constitute 1757 (93%) of the probands, and the remaining 132 probands represent a collection of small numbers from other ethnic/racial groups. When the analysis is restricted to European Americans, power is consistently reduced by approximately 2 percentage points for any given scenario (e.g., power of 86% in the entire data set would decrease to 84% if only white individuals are considered).

As noted above, power calculations were based on an assumed lifetime risk for diabetic nephropathy of 35%. This figure was based on cohort studies of European American children with type 1 diabetes in New England. The risk for patients with type 1 diabetes may vary geographically or between ethnic/racial groups. Therefore, we conducted a sensitivity analysis by varying the assumed lifetime risk. The results of this analysis are unique for each of the three study design components. For case trios, power does not depend at all on lifetime risk. This property, which has been described previously (13), suggests that the power calculations for case trios in Table 4 apply regardless of the actual lifetime risk. For singletons, when the rare recessive case is excluded, the change in lifetime risk to 30 or to 40% is immaterial, because power exceeds 99% in all circumstances. Even at a lifetime risk of 20%, the power of singletons exceeds 98% for all scenarios. Where lifetime risk matters more, as predicted by Scott and Rogus (13), is for control trios. If risk is 30%, then the actual power of the control trios is approximately 20% less than the values in Table 4 (range 73 to 100% of the tabulated values). If risk is 40%, then the actual power is approximately 20% greater than the values in Table 4 (range 100 to 136% of the tabulated values).

Discussion

Value of Three Study Design Components

The GoKinD collection represents a unique opportunity for scientists to use three complementary study designs to uncover the genetic basis of diabetic nephropathy. Although the probands of the trio families could be combined with the singleton subset, this strategy would sacrifice the independence of the trio components as validation sets. Moreover, the benefit of this strategy would be small because the set of singletons already has excellent power for a wide range of genetic models, even loci with small effects. Because of the vulnerability of singleton analysis to spurious findings as a result of population stratification, confirmation of positive findings must be sought in independent data sets.

The set of case trios, which is immune to population stratification effects, was recruited for just this purpose. However, the usefulness of this remedy is limited to situations in which the hypothesized gene effect is relatively large. In situations in which trio analysis does not have adequate power, an alternative is to test and adjust for population stratification (19). Although straightforward in principle, testing for stratification may be more difficult than anticipated. As demonstrated recently, standard methods for detecting it failed in a study involving European Americans (20). Two alternatives that do not rely on tests for population stratification may be considered. One is to match case patients and control subjects for country of origin of their grandparents. Unfortunately, this information is not available for GoKinD participants. The second alternative provides a more versatile and comprehensive solution. This would entail collection of a panel of DNA that comprises diverse European populations. Then, when a positive association is found with diabetic nephropathy in the singleton case patients and control subjects in GoKinD, the frequency of the risk allele would be examined in the European panel. If the frequency varies little among European populations, then the association is unlikely to be due to stratification in a European American population. Conversely, if the frequency varies widely across European populations, then the association is more plausibly attributed to stratification.

Although the set of control trios has the least power to identify nephropathy genes, it plays important roles in other respects. First, it provides protection from false-positive results that arise in case trios from the phenomenon of segregation distortion, the preferential transmission of an allele irrespective of disease phenotype. For example, any allele related to the phenotype of type 1 diabetes will be preferentially transmitted in GoKinD case trios. Because the same will be true in control trios, the locus can be recognized as a type 1 diabetes locus rather than a nephropathy locus (11). The second important role for control trio analysis is the evaluation of models that involve gene-environment interaction, where control trios can outperform case trios (13). This class of genetic models is of particular relevance to diabetic nephropathy, a phenotype that develops only in the presence of a diabetic milieu, particularly with poorer glycemic control (10).

The availability of three complementary designs is a major strength; however, it also presents challenges in interpreting results that are not consistent across all study designs. When all three components are genotyped (3075 samples), one of eight outcomes will occur (Table 5). Patterns 1 and 8 represent clear-cut scenarios in which all components are in agreement. In pattern 2, only control trios yield statistically nonsignificant results, a scenario that is likely to be common given the generally lower power of this component. This pattern’s mirror image, pattern 7, in which significance is found only in control trios, is consistent with certain patterns of gene-gene or gene-environment interaction (13). Pattern 3 also is consistent with this possibility. The remaining patterns have less clear interpretations. Although the observation of pattern 4 may be expected because of the higher statistical power of the singleton component, it also is consistent with population stratification, an interpretation that would be strengthened if this pattern were seen across many loci. Pattern 5 or 6 would be expected if segregation distortion exists, but this interpretation is tested easily when case and control trios are considered together (11,21).

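Table 5. Possible outcomes of genetic association analysis using the three designs in GoKinD

| Pattern | Case/Control Singletons | Case Trios | Control Trios |
|---------|-------------------------|------------|---------------|
| 1       | +                       | +          | +             |
| 2       | +                       | +          | -             |
| 3       | +                       | -          | +             |
| 4       | +                       | -          | -             |
| 5       | -                       | +          | +             |
| 6       | -                       | +          | -             |
| 7       | -                       | -          | +             |
| 8       | -                       | -          | -             |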
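
The eight rows of Table 5 follow a simple binary ordering, so the pattern number can be derived from three significance flags. The sketch below assumes that "+" denotes a statistically significant result, as the discussion of pattern 2 implies; the function names and the wording of the notes are hypothetical conveniences, paraphrased from the text rather than taken from it.

```python
def table5_pattern(singletons, case_trios, control_trios):
    """Return the Table 5 pattern number (1-8) for three significance flags.

    Each argument is True when that component gives a significant result.
    """
    return 1 + 4 * (not singletons) + 2 * (not case_trios) + (not control_trios)


# Notes paraphrased from the text. Patterns without an entry are among those
# the text describes as having less clear interpretations.
NOTES = {
    1: "all components agree: association supported",
    2: "only control trios nonsignificant; likely common given their lower power",
    3: "consistent with gene-gene or gene-environment interaction",
    4: "may reflect higher singleton power, or population stratification",
    7: "consistent with gene-gene or gene-environment interaction",
    8: "all components agree: no association",
}


def describe(singletons, case_trios, control_trios):
    """Return (pattern number, short note) for a set of three results."""
    pattern = table5_pattern(singletons, case_trios, control_trios)
    return pattern, NOTES.get(pattern, "interpretation less clear; see text")


print(describe(True, False, False))
print(describe(False, True, False))
```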

Duration of Diabetes in Case Patients and Control Subjects

Recent theoretical work has demonstrated the importance of considering duration of diabetes when carrying out either singleton or trio analysis (17). In this context, “duration” refers to duration of diabetes at onset of nephropathy for case patients and duration of diabetes at time of enrollment for control subjects. Because the onset of proteinuria often is undocumented, various approximations or surrogates for this information should be tried. Ignoring duration can result in substantial power loss or even findings that paradoxically implicate non-risk alleles as causative (17). On the basis of simulation studies, the effect of a risk allele is most clearly demonstrable in a comparison of case patients with short diabetes duration with control subjects with long duration. The simplest analytic strategy for addressing the duration issue is subgroup analysis of reasonably defined duration strata. Another option is to use conditional logistic regression with duration as an independent variable (22). In any event, it will be incumbent on investigators to formulate an appropriate analytic model that is based on the hypothesized duration effect (e.g., threshold, linear, quadratic).
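
The subgroup strategy can be illustrated with stratum-specific odds ratios and a Mantel-Haenszel pooled estimate. This is a sketch only: the stratum boundaries and all counts below are invented, the Mantel-Haenszel estimator is a standard technique assumed here rather than one named in the text, and it presumes a common odds ratio across strata, which may not hold if the allele effect depends on duration. Inspecting the stratum-specific estimates first is therefore advisable.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a: case carriers, b: case non-carriers,
    c: control carriers, d: control non-carriers.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical carrier counts by duration stratum, for illustration only.
duration_strata = {
    "under 20 yr": (60, 40, 45, 55),
    "20 to 29 yr": (50, 50, 48, 52),
    "30 yr or more": (30, 45, 40, 50),
}

for label, table in duration_strata.items():
    print(label, round(odds_ratio(*table), 2))

print("pooled:", round(mantel_haenszel_or(list(duration_strata.values())), 2))
```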

High Mortality among Case Patients and Their Parents

Genetic studies of diabetic complications may be vulnerable to survivor effects as a consequence of the very high mortality rates for patients with diabetic nephropathy, especially ESRD. To participate in GoKinD, case patients with ESRD had survived an average of 8 yr of ESRD, and case patients with proteinuria had survived more intense mortality than control subjects (23). Therefore, any genetic factor that is associated with survival will be enriched to some degree in case patients. Moreover, the known clustering of early mortality in parents of patients with type 1 diabetes and nephropathy (24) resulted in only 48% of case patients having two parents available to form a trio as compared with 64% of control subjects having two (Table 3). As a result, the enrichment of a survival factor may be particularly strong in case trios. The likelihood that such mortality effects would result in a spurious association requires further investigation. However, an investigator should consider this alternative among the interpretations of pattern 6 in Table 5, an association that is significant only in case trios.

Limitations of the GoKinD Collection

GoKinD represents a major collaborative effort that promises to speed the discovery of the genetic basis of diabetic nephropathy. Nevertheless, several important issues remain to be addressed. One is the development of novel analytic approaches that will bring together all three design components in a systematic manner. The procedure for doing so will depend profoundly on whether population stratification exists in the collection; therefore, a reasonably large effort is warranted to examine the GoKinD samples for this phenomenon. Moreover, our power calculations considered only single-locus models. However, an appealing feature of GoKinD is that sample sizes are likely to be adequate for testing many hypotheses related to gene—gene interaction. Gauderman (25) recently outlined sample size requirements for various types of gene—gene interaction models. Four study designs were considered, including case trio and case-only analysis, both of which are possible in GoKinD. QUANTO (http://hydra.usc.edu/gxe), the software package that implements these power calculations, has subsequently been extended for unmatched case-control studies that are relevant to the singletons in GoKinD. Similar power calculations for gene—environment interaction models also are possible using QUANTO (26).
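
For orientation, a single-locus power calculation for the singleton component can be approximated with a two-proportion z-test on allele counts. The sketch below is neither QUANTO nor the calculation reported in this article: the function name, the control allele frequency, and the odds ratios are hypothetical, the normal approximation ignores the negligible opposite tail, and chromosomes are treated as independent, which presumes Hardy-Weinberg proportions. Only the sample sizes (671 case and 623 control singletons) come from the study.

```python
from math import sqrt
from statistics import NormalDist


def allelic_power(control_freq, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    # Case allele frequency implied by the allelic odds ratio.
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = odds / (1.0 + odds)
    # Each participant contributes two chromosomes.
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    pooled = ((chrom_cases * case_freq + chrom_controls * control_freq)
              / (chrom_cases + chrom_controls))
    se_null = sqrt(pooled * (1 - pooled) * (1 / chrom_cases + 1 / chrom_controls))
    se_alt = sqrt(case_freq * (1 - case_freq) / chrom_cases
                  + control_freq * (1 - control_freq) / chrom_controls)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = abs(case_freq - control_freq)
    return norm.cdf((diff - z_crit * se_null) / se_alt)


# Hypothetical allele frequency and effect sizes; sample sizes from the study.
for allelic_or in (1.2, 1.3, 1.5):
    print(allelic_or, round(allelic_power(0.20, allelic_or, 671, 623), 2))
```

Interaction models require more than this single-locus approximation, which is why dedicated software such as QUANTO is the appropriate tool for them.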

Supplementary Material

GKManuApendA

GKManuApendB

Acknowledgments

The GoKinD Coordinating Center was funded by the JDRF, and the CDC was funded by PL 105-33, 106-554, and 107-360 administered by the National Institutes of Health.

The GoKinD collaborators acknowledge the contributions to recruitment of the JDC, the Clinical Centers associated with GWU, and Matthews Media Group without which the study would not have been possible. GoKinD investigators from these centers include Stephen A. Brietzke, Debbie Eichelberger, and Christine Hogue, University of Missouri; David Brillon and Juan Cordero, New York Presbyterian Hospital, Cornell University; George A. Burghen and Pam LeNoue, University of Tennessee; George W. Burke and Eva P. Herrada, University of Miami; Debra Counts and Sherry Johnsonbaugh, University of Maryland Medical System; James Desemone and Manjula Salgam, Albany Medical Center; Steven V. Edelman and Gayle Lorenzi, University of California San Diego; Carla Greenbaum and Daxa Sabhaya, Virginia Mason Research Center; Richard A. Guthrie and Ann Brenner, Mid-America Diabetes Associates, P.A.; Irene Hramiak and Judith Harth, St. Joseph’s Health Care, University of Western Ontario; Mark Johnson and Paula McIver, University of North Carolina at Chapel Hill; Lois Jovanovic and Allison Wollitzer, Sansum Medical Research Center; John I. Malone and Jennifer Steinbrueck, University of South Florida; Michael Mauer, Nick Rabe, and Cathy Bagne, University of Minnesota; Michael E. May and Janie Lipps, Vanderbilt University Medical Center; Larry Melton and Jonnie Feller, Baylor University Medical Center; Mark E. Molitch and Daphne Adelman, Northwestern University; Robert E. Ratner and Evelyn Robinson, Med-Star Clinical Research Center; John Rogus, Adam Smiles, James H. Warram, Andrzej S. Krolewski, Amanda Johnson, Andrea Segal, Josh Rubin, Julie Bonner, Katie Georgitis, Kimberly Prudhomme Fader, Kristen Silva, Matt Niemi, Melissa Sugar, Nicole Wilkinson, Sarah Connearney, Scott Tucker, Susan Orsillo, Tom Reynolds, and Kellie Anderson, Joslin Diabetes Center; William L. Sivitz and Meg Bayless, University of Iowa; John A. Colwell, Denise Wood, Maria Szpiech, and Kathy Bradbury, Medical University of South Carolina; Neil H. White and Lucy Levandoski, Washington University School of Medicine; Bernard Zinman and Annette Barnie, Mount Sinai Hospital, University of Toronto; and Therese B. Gibson, Aspen Systems, Inc.

References
