Cognitive Deficits in Unaffected First-Degree Relatives of Schizophrenia Patients: A Meta-analytic Review of Putative Endophenotypes

Abstract

Cognitive deficits may index genetic liability for schizophrenia and are candidate endophenotypes for the illness. In order to compare the degree of sensitivity among cognitive tasks to group differences between healthy relatives and controls and the influence of moderator variables, this review reports mean effect sizes for 43 cognitive test scores from 58 studies of cognitive performance in the unaffected adult relatives of schizophrenia patients. Results indicate reliable relative–control differences, in the small to medium effect size range, over a diverse array of tasks, with the largest effect sizes seen in complex versions of continuous performance tasks, auditory verbal learning, design copy tests, and category fluency. Three study design features were found to have significant effects on overall effect size magnitude: groups unmatched on education, groups unmatched on age, and asymmetric psychiatric exclusion criteria. After excluding studies with the latter 2 design features, reliable performance differences were still observed over a smaller subset of cognitive test variables, with the largest effect sizes seen in Trails B (d = 0.50) and performance measures from both simple (d = 0.56) and complex (d = 0.60–0.66) versions of continuous performance tasks. Four of the 6 largest effect sizes reflect tasks with high executive control demands in common, such as working memory demands, set shifting, and inhibition of prepotent responses. Cognitive deficits, particularly those tapping such executive control functions, should continue to prove valuable as endophenotypes of interest in the search for specific genetic factors related to schizophrenia.

Keywords: executive functions, family studies, genetic risk, neuropsychology, vulnerability indicators

Introduction

Endophenotypes are intermediate phenotypes that provide a more reliable index of liability than the illness itself.1 In recent years, reports of cognitive deficits in schizophrenia patients and in their biological relatives have rapidly increased, including efforts to link endophenotypes to specific genes,2–3 reflecting the research goal of identifying candidate endophenotypes that may index genetic liability to schizophrenia.

Gottesman and Gould summarize 5 criteria for identifying useful endophenotypes in psychiatry: (1) the endophenotype should be associated with illness in the population; (2) the endophenotype should be heritable; (3) the endophenotype should be primarily state independent; (4) within families, the endophenotype and the illness should co-segregate; and (5) the endophenotype should be found in nonaffected family members at a higher rate than in the general population.4 With regard to the first criterion, it is indeed well established that cognitive impairments are pervasive in schizophrenia over a wide array of ability domains,5–7 despite important ongoing research questions about the nature of these impairments (e.g., selective deficits versus generalized impairment, within-group heterogeneity, associations with symptoms, etc.). The meta-analyses conducted by Heinrichs and Zakzanis report moderate to large effect sizes in patient versus control differences in global and selective verbal memory, nonverbal memory, bilateral and unilateral motor performance, visual and auditory attention, general intelligence, spatial ability, executive function, language, and interhemispheric tactile transfer.6 None of the confidence intervals for mean effect sizes in the above domains included zero, which is to say, no behavioral domain tested is spared by the illness.

With regard to the last criterion, studies of cognition in biological relatives of schizophrenia patients are less consistent than those of patients. Generally, findings indicate that relatives are also impaired, albeit to a lesser degree than patients, on a wide array of cognitive tasks (e.g., 8–14), although findings of no impairment have also been reported (e.g., 15–16). The first review of family studies evaluating potential neuropsychological risk indicators appeared over 10 years ago and indicated “promising leads” in sustained attention, perceptual-motor speed, and concept formation and abstraction.17

Recently the first quantitative reviews of cognitive impairments in the relatives of schizophrenia patients have appeared.18–19 Both meta-analyses examined mean performance differences between relatives and controls and reported standardized mean difference effect sizes (i.e., Cohen's d, Hedges' g). The first, from Sitskoorn et al., synthesizes data from 37 studies and yields 9 cognitive variables from the domains of attention, memory, and executive functioning.18 Results indicate small to moderate effect sizes, with the largest group differences found on verbal memory recall (d = 0.54) and Trail Making Test B (d = 0.51). There was no evaluation of the methodological features of studies and their potential influence on effect size magnitude. The second meta-analysis, from Szöke et al., focuses on 4 tests of executive functions (the Wisconsin Card Sorting Test [WCST], Trail Making Test B, the Stroop Test, and verbal fluency), yielding 6 cognitive variables from 25 studies.19 Effect size magnitudes for nearly all of the measures were again in the small to moderate range (d = 0.26–0.65), except for semantic verbal fluency, which was found to have a large effect size (d = 0.87). This review also recorded a number of study characteristics and found that different versions of the WCST used accounted for the heterogeneity of its effect size estimates, while proband diagnosis (i.e., schizoaffective patients included or not) resolved the heterogeneity of semantic verbal fluency effect size estimates. However, other study characteristics of interest, such as the nature of the exclusion criteria for relatives and controls and type of biological relative included, were not systematically examined across cognitive variables.

The goal of the present review is to expand upon the results of existing meta-analyses in 3 important ways: (1) to compare sensitivities to relative–control group differences among a wider range of cognitive measures commonly used in the literature, (2) to report on specific cognitive test scores whenever possible and avoid aggregating tests into broader domains, and (3) to examine the role of a priori identified study characteristics on the magnitude of effect sizes. With regard to the first aim, we culled performance data from a wide array of cognitive tasks from neuropsychological as well as experimental-cognitive literatures. In addition to tasks assessing the most frequently studied cognitive domains—attention, memory, and executive functions—we included cognitive tasks tapping spatial ability (e.g., Block Design), language functions (e.g., tests of reading ability), psychomotor tests (e.g., finger tapping), and general intellectual ability (i.e., IQ estimates). Within attention, memory, and executive functions, we also expanded test coverage to include such tasks as span of apprehension, spatial delayed-response tasks, and antisaccade tasks.

As with the 2 other recent reviews, the present series of meta-analyses attempted to address Gottesman and Gould's fifth criterion for endophenotype validation—that the endophenotype is found at a higher rate among unaffected relatives than in the general population—by comparing the magnitude of group mean differences between relative and control samples. It is essential to note that calculating the standardized mean-difference effect size (e.g., Cohen's d) is not equivalent to evaluating rates of an endophenotype present or absent in family members versus in the general population. If the distribution of a cognitive variable significantly departs from normality, these 2 approaches can yield very different results. However, in most cases the standardized mean-difference effect size is a reasonable proxy for group differences in impairment rate (i.e., relative risk or odds ratio) and is currently the most viable approach to research synthesis in this literature, as the vast majority of data are reported as group mean differences and not as rates of impairment.
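The relationship between a standardized mean difference and an impairment rate can be illustrated with a minimal sketch. It assumes that both groups are normally distributed with equal variance and that "impairment" means scoring below a cutoff expressed in control-group SD units. The cutoff of 1 SD below the control mean and the function name are illustrative assumptions, not values used in this review.

```python
from statistics import NormalDist

def impairment_rates(d, cutoff_sd=-1.0):
    """Return (control_rate, relative_rate) of scores below a cutoff.

    Assumes normal distributions with equal SDs, with the relative mean
    lying d SD units below the control mean. cutoff_sd is a hypothetical
    impairment threshold in control z-score units.
    """
    z = NormalDist()  # standard normal, mean 0 and SD 1
    control_rate = z.cdf(cutoff_sd)
    # Relatives ~ N(-d, 1), so P(X < cutoff) = Phi(cutoff + d)
    relative_rate = z.cdf(cutoff_sd + d)
    return control_rate, relative_rate

control, relative = impairment_rates(0.50)
relative_risk = relative / control
odds_ratio = (relative / (1 - relative)) / (control / (1 - control))
print(f"control rate = {control:.3f}, relative rate = {relative:.3f}")
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

Under these assumptions the mapping from d to a rate ratio depends on where the cutoff is placed, and it breaks down when the score distribution departs from normality, which is the caveat raised above.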

Regarding the second study aim of cognitive specificity, Heinrichs and Zakzanis note in their meta-analysis of cognitive deficits in schizophrenia: “Organizing the myriad of neurocognitive test variables reported in the literature into a coherent classification was a major challenge.”6(p429) As in their study, in order to avoid assumptions about the structure of underlying cognitive processes, and to provide the clearest picture of the average sensitivity to group differences of a given test, we avoided aggregating effect sizes from different tests into broad cognitive constructs (e.g., “executive functioning,” “verbal memory,” “spatial abilities”). Instead, we attempted to calculate independent effect sizes, not only for each test that we could meaningfully distinguish but also for each of the different scores from each test to which we had access (e.g., WCST categories and WCST perseverative errors, Continuous Performance Test [CPT] omission errors, and CPT commission errors). In this regard, if specific measures reflect different core cognitive processes, different genetic contributions, and/or varying effect size ranges or psychometric properties, this information was less likely to be lost by collapsing specific scores within a broader test category. Although this meant that fewer studies were available for each mean effect size calculation, preserving the nature of the original scores was a more important consideration.

The final study aim was to examine the role of 5 experimental design features that varied significantly across studies. The first was the type(s) of family members in the relative sample. We hypothesized an order effect on magnitude of cognitive impairment, from parents (smallest), to siblings, to adult offspring (largest), based upon the degree to which family members have, on average, passed through the age of risk for schizophrenia (approximately 15–45 years old).

Second, we examined whether the presence or absence of schizophrenia spectrum personality disorders and/or schizotypal features in relatives of patients influenced the measured effect sizes. Some studies screen out schizophrenia-related personality disorders from their relative sample (e.g., 15, 20), while others clinically document their presence or that of schizotypal symptoms (e.g., 21–22). Whether there is an association between schizotypy and cognitive impairment in family members remains unresolved; thus, this review provided a useful opportunity to systematically examine whether the inclusion of participants with schizophrenia-like subclinical traits is associated with an overall magnitude of effect sizes of cognitive deficits.

We also examined the role of group matching on age and education, respectively, with regard to cognitive effect sizes. Group matching has been a thorny issue in schizophrenia research since Paul Meehl called attention to the “matching fallacy”—that groups may be overmatched on a variable that is not independent of the illness, per se.23 If schizophrenia is a neurodevelopmental disorder, then matching patients and normal controls on education or IQ may cause mismatching of theoretically expected cognitive ability. It is not clear whether the same reasoning holds for biological relatives of patients, although there is evidence to suggest this may be the case.24 If unexpressed schizophrenia vulnerability genes result in reduced educational attainment, groups unmatched on education may indeed be an appropriate study design feature. Independent of an interpretation as a confound, we predicted that studies with groups unmatched on education will have larger effect sizes than studies with education-matched groups.

Whether groups were matched on age was tracked as a study characteristic because of a common scenario in this literature: if a control group was used as reference for both a patient and a relative group, often the normal controls were age matched to the patients, while siblings and parents typically made up the relative group, resulting in relatives older on average than controls. Since cognitive performance is negatively associated with age, studies with groups unmatched for age were predicted to have larger effect sizes than studies with age-matched groups. Thus, in contrast to the more complex issue of education matching, groups unmatched on age would more clearly represent an important confound in studies of cognitive performance.

Finally, the influence of asymmetric rule-out criteria was examined. This refers to the application of stricter psychopathology exclusion criteria to controls (e.g., excluding all Axis I disorders) than to relatives (e.g., excluding only psychotic disorders). At issue is whether cognitive impairment in the relatives could then be interpretable as compromised by schizophrenia genes per se or psychopathology in general. This is a design feature that varies across studies, but its impact on cognitive performance is unclear and is currently under debate with specific regard to antisaccade deficits in relatives.25–26 We hypothesized that studies with asymmetric rule-out criteria will have larger effect sizes than studies with symmetric exclusion criteria, since the former likely included relatives with more general psychopathology than controls. In sum, the following questions motivated and guided the present meta-analyses and distinguish it from other recent meta-analytic studies:

METHOD

Literature Definition

We defined cognitive endophenotypes of interest as behavioral paradigms from neuropsychological, experimental-cognitive, or information-processing literatures reflecting higher-order cognitive processing. Excluded were variables from psychophysiological paradigms, such as electroencephalograph/event-related potentials markers, measures of skin conductance, or smooth-pursuit eye tracking, which we considered to be separate endophenotype literatures. The majority of measures came from traditional neuropsychological test batteries, since this has been the predominant assessment method in studies of relatives. Thus, for pragmatic reasons, only some experimental-cognitive and information-processing measures could be included (i.e., those that reached the critical mass of n = 3 studies, see below), such as span of apprehension, and not others, such as backward masking.

We also limited the review to studies of adult relatives and excluded studies of high-risk offspring assessed as children. This allowed us to focus on cognitive deficits that were markers of unexpressed liability genes rather than risk factors for developing schizophrenia, since different paradigms may be implicated in each case.

Two strategies were used to locate studies for inclusion in the meta-analysis. The first was a search of PsycINFO and Medline databases using the following keywords: schizophrenia, cognitive, neuropsychology, relatives, genetic, family, endophenotype, twins, siblings, parents, vulnerability marker, and risk factor. The second literature search strategy was a systematic search of references cited in every study located to date to find other studies that may have been missed by the database searches. These efforts yielded 113 studies for potential inclusion, all published before August 2004.

Criteria for Inclusion

Eligibility criteria for study inclusion were as follows: First, the study must have used an operationalized definition of schizophrenia for probands (e.g., DSM-III or later, ICD-9 or later, RDC). Second, the biological relative group must have been made up exclusively or predominantly of first-degree relatives. Third, the relatives were unaffected, meaning no relative had a lifetime history of schizophrenia or schizoaffective disorder. Fourth, the mean age of relatives was 18 or older, with no relatives younger than 16. Fifth, a healthy control group was included. Sixth, at least 1 cognitive, neuropsychological, or information-processing task was administered with results reported as group means and SDs or such that an effect size was calculable via _t_-test values, _F_-ratio values, or exact _p_-values.
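When a study reports only test statistics, a standardized mean difference can still be recovered. The sketch below shows the standard conversions for two independent groups; the function names are hypothetical, and the review does not state which exact formulas were applied. Conversion from an exact p-value additionally requires the inverse t distribution and is not shown.

```python
import math

def d_from_t(t, n_relatives, n_controls):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1.0 / n_relatives + 1.0 / n_controls)

def d_from_f(f, n_relatives, n_controls):
    """Magnitude of Cohen's d from a two-group F ratio (numerator df = 1).

    With two groups F = t**2, so the sign of d is lost and must be set
    from the direction of the group difference reported in the study.
    """
    return d_from_t(math.sqrt(f), n_relatives, n_controls)

# Hypothetical study: t = 2.5 with 40 relatives and 40 controls
print(round(d_from_t(2.5, 40, 40), 3))
print(round(d_from_f(6.25, 40, 40), 3))
```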

In order to maintain the statistical independence of effect sizes, a “study” was defined as a written report with no obvious sample and task redundancy with other reports. When there was obvious sample and task redundancy, the most recent report with the largest _N_s was coded. Several studies from the same research groups had apparently overlapping samples to varying degrees (e.g., 8, 27–28). In these cases, only 1 cognitive task per sample was coded, and only data from the latest published report with the largest sample sizes were included. Some studies broke out the relative sample into those with and without schizophrenia spectrum personality disorders (e.g., 29–30). If these subgroups of relatives shared a control group, only data from the healthier relative subgroup were coded. In a few cases, authors directly provided us with raw means and standard deviations per request when these values were not reported in a published study.

Of the 113 previously identified studies, 58 met the above criteria for inclusion (see the appendix). Reasons for exclusion were as follows: Twelve studies provided neither group means with SDs nor sufficient information for estimating effect sizes, including 6 studies that reported only factor scores. Eleven studies included affected individuals in the relative group. Five studies provided no information about proband diagnoses or used preoperationalized diagnostic criteria for schizophrenia. Three studies lacked a healthy control group. Thirteen studies represented sample and task redundancy with other published reports. The remaining 11 studies were excluded for reasons such as use of idiosyncratic, nonstandardized cognitive tasks, no behavioral data reported, or no group comparison included in data analyses. All of the studies included in the meta-analysis were published except for 1 comprising a relatively large data set31 with which 1 of the present authors (AWM) was familiar.

Recorded Variables

A cognitive variable was tracked for meta-analytic synthesis if it was reported in at least 3 primary studies. In total, 43 cognitive variables were included and grouped into 8 cognitive domains common in neuropsychology (see table 1) for presentation. Of note, some of the cognitive test variables recorded represented control conditions for other test variables (e.g., prosaccade reaction time [RT] for antisaccade RT; Stroop color-naming and word-reading conditions for color–word interference) and thus were not generally considered theoretical endophenotypes with regard to expected group differences. However, they were included for the same reasons control conditions are informative in individual studies: to control for basic performance factors for purposes of discriminative validity.

Table 1.

Cognitive Test Variables Selected for Meta-analyses

| Cognitive Domain | Recorded Test Variable | Number of Studies | Relatives (n) | Controls (n) |
|---|---|---|---|---|
| Attention/Working Memory | Trails A (time) | 11 | 466 | 446 |
| | Trails B (time)a | 16 | 679 | 685 |
| | Visual cancellation tests (accuracy) | 4 | 187 | 231 |
| | Digit span, forward | 11 | 519 | 464 |
| | Digit span, backward | 9 | 422 | 400 |
| | Spatial span, forward | 3 | 112 | 139 |
| | CPT-X hits/omissions | 3 | 165 | 75 |
| | CPT-X false alarms | 3 | 165 | 75 |
| | CPT-X d-prime | 8 | 365 | 322 |
| | CPT-AX/-IP hits/omissions | 5 | 280 | 170 |
| | CPT-AX/-IP false alarms | 5 | 280 | 170 |
| | CPT-AX/-IP d-prime | 8 | 528 | 277 |
| | Stroop Test, color namingb | 3 | 93 | 107 |
| | Stroop Test, word readingb | 4 | 134 | 150 |
| | Stroop Test, color–word conditionbc | 5 | 156 | 170 |
| | Span of apprehension (accuracy) | 6 | 197 | 222 |
| | Antisaccade, percent errors | 6 | 325 | 274 |
| | Antisaccade RT | 6 | 325 | 274 |
| | Prosaccade RT | 6 | 325 | 274 |
| | Spatial delayed-response tasks, accuracyd | 4 | 139 | 97 |
| | Spatial delayed-response tasks, RTd | 4 | 139 | 97 |
| Verbal Memory | WMS(-R) Logical Memory I | 8 | 376 | 351 |
| | WMS(-R) Logical Memory IIe | 8 | 463 | 385 |
| | WMS(-R) Verbal Paired Associates | 4 | 223 | 189 |
| | Auditory verbal learning tests (total words trials I–V)f | 3 | 148 | 155 |
| Visual Memory | WMS(-R) Visual Reproduction I | 7 | 354 | 326 |
| | WMS(-R) Visual Reproduction IIg | 8 | 503 | 405 |
| Executive Function | WCST, categoriesh | 17 | 524 | 590 |
| | WCST, total errorsh | 8 | 247 | 388 |
| | WCST, perseverative errors/responsesh | 19 | 831 | 741 |
| Spatial Ability | WAIS-R Block Design | 7 | 396 | 340 |
| | Design copy tasksi | 4 | 145 | 184 |
| | Line orientation | 3 | 165 | 188 |
| Motor Function | Pegboard tasks, dominant handj | 4 | 254 | 301 |
| | Pegboard tasks, nondominant handj | 4 | 254 | 301 |
| | Finger tapping, dominant hand | 3 | 138 | 183 |
| | Finger tapping, nondominant hand | 3 | 138 | 183 |
| Language Function | WAIS-R Vocabulary | 6 | 315 | 289 |
| | WAIS-R Information | 3 | 82 | 112 |
| | NART/WRAT(-R) Reading | 6 | 304 | 173 |
| | Letter fluency tasks | 7 | 285 | 248 |
| | Category fluency tasks | 6 | 206 | 177 |
| General Intelligence | WAIS(-R) Full-Scale IQk | 9 | 423 | 294 |

In addition to recording information for calculating standardized mean differences for each cognitive variable, the following study characteristics were also coded: diagnostic system used for probands, type(s) of relatives, presence of documented schizotypal symptoms in relatives (diagnosed schizotypal personality disorder [SPD] or schizotypy assessed quantitatively), mean (± SD) age of relative and control groups, mean (± SD) education of relative and control groups, and presence of asymmetric inclusion criteria for relatives and controls (i.e., stricter psychopathology exclusion criteria applied to control groups than to relative groups).

Effect Size Estimation and Aggregation

Effect sizes (ESs) were computed by taking the difference in mean relative and control scores divided by the pooled standard deviation (Cohen's d). Error scores and time were recoded so that positive ESs always indicate the control group as superior to the relative group. ESs were adjusted to correct for bias attributable to small sample size32 and then weighted by the inverse variance when pooled across studies to compute a mean value for each cognitive variable.33 This approach gives greatest weight to the most reliably estimated ESs, those with the smallest standard errors.
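The computation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the small-sample adjustment is Hedges' approximate correction factor, that the variance of d is given by the usual large-sample formula, and that pooling follows a fixed-effect inverse-variance model. All function names and the example numbers are hypothetical.

```python
import math

def cohens_d(mean_control, sd_control, n_control,
             mean_relative, sd_relative, n_relative):
    """Standardized mean difference using the pooled SD.

    Written for scores where higher is better, so a positive d means
    controls outperform relatives. For error or time scores, flip the
    sign of the result to keep the same convention.
    """
    pooled_var = ((n_control - 1) * sd_control ** 2 +
                  (n_relative - 1) * sd_relative ** 2) / (n_control + n_relative - 2)
    return (mean_control - mean_relative) / math.sqrt(pooled_var)

def small_sample_correction(d, n_control, n_relative):
    """Apply Hedges' approximate bias correction (assumed adjustment)."""
    df = n_control + n_relative - 2
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))

def variance_of_d(d, n_control, n_relative):
    """Large-sample approximation to the sampling variance of d."""
    n_total = n_control + n_relative
    return n_total / (n_control * n_relative) + d ** 2 / (2.0 * n_total)

def pooled_effect(effects):
    """Inverse-variance weighted mean of (d, variance) pairs.

    Returns (weighted mean, standard error of the weighted mean).
    """
    weights = [1.0 / var for _, var in effects]
    mean = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

# Hypothetical single study: controls 100 (SD 10), relatives 95 (SD 10), n = 30 each
d = small_sample_correction(cohens_d(100, 10, 30, 95, 10, 30), 30, 30)
print(round(d, 3), round(variance_of_d(d, 30, 30), 4))
```

Because each weight is the reciprocal of the variance, studies with larger samples, and therefore smaller standard errors, contribute more to the mean value for a cognitive variable.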

Results

Descriptive Data

Descriptive statistics for the study set are presented in table 1. The table shows how many studies were used in each of the 43 cognitive variables for which mean ESs were computed, as well as the number of relatives and healthy controls for each cognitive variable. In total, cognitive test results from 2,872 relatives and 2,457 healthy controls were recorded across studies.

Study design characteristics are reported in table 2. Of note, all of the studies in which groups were unmatched for age reported relatives significantly older than controls. Similarly, relatives were less educated than controls in every study with groups unmatched for education. With regard to type of relative, “siblings only” makes up the largest category (n = 17 studies), followed by mixtures of “siblings and parents” (n = 11) and “siblings, parents, and offspring” (n = 10); meanwhile, the number of studies including offspring only was low (n = 3), consequent to limiting the scope of review to studies of adult relatives. Also of note, about an equal number of studies reported the presence (n = 20) or absence (n = 21) of documented SPD/schizotypal symptoms.

Table 2.

Study Design Characteristics and Associated Mean (± SD) Effect Sizes

| Design Characteristic | n of Studies (%) | n of Effect Sizes | Mean Effect Size (SD) |
|---|---|---|---|
| Diagnostic System for Probands | | | |
| Research Diagnostic Criteria | 7 (12) | 37 | 0.28 (0.35) |
| Diagnostic and Statistical Manual of Mental Disorders, 3rd ed. | 2 (3) | 8 | 0.37 (0.43) |
| Diagnostic and Statistical Manual of Mental Disorders, 3rd ed., revised | 31 (54) | 157 | 0.38 (0.35) |
| International Classification of Diseases, 9th revision | 1 (2) | 1 | −0.08 (−) |
| Diagnostic and Statistical Manual of Mental Disorders, 4th ed. | 14 (24) | 74 | 0.45 (0.37) |
| Operational Criteria for Psychotic Illness | 3 (5) | 8 | 0.34 (0.26) |
| Type of First-Degree Relatives | | | |
| Siblings | 17 (29) | 63 | 0.41 (0.38) |
| Parents | 6 (10) | 27 | 0.45 (0.31) |
| Offspring | 3 (5) | 9 | 0.36 (0.26) |
| Monozygotic twins | 1 (2) | 18 | 0.42 (0.30) |
| Parents and siblings | 11 (19) | 69 | 0.39 (0.35) |
| Siblings and offspring | 1 (2) | 20 | 0.35 (0.23) |
| Parents, siblings, and offspring | 10 (17) | 53 | 0.25 (0.35) |
| Not reported | 9 (16) | 26 | 0.53 (0.47) |
| Groups Matched on Age | | | |
| Yes | 45 (78) | 224 | 0.36 (0.35) |
| No | 11 (19) | 55 | 0.48 (0.40) |
| Not reported | 2 (3) | 6 | 0.27 (0.28) |
| Groups Matched on Education | | | |
| Yes | 36 (62) | 167 | 0.34 (0.36) |
| No | 9 (16) | 66 | 0.45 (0.29) |
| Not reported | 13 (22) | 52 | 0.45 (0.41) |
| Presence of Schizotypal Symptomsa | | | |
| Yes | 21 (36) | 102 | 0.39 (0.30) |
| No | 20 (35) | 120 | 0.35 (0.36) |
| Not reported | 17 (29) | 63 | 0.43 (0.42) |
| Asymmetric Inclusion Criteria | | | |
| Yes | 21 (36) | 115 | 0.44 (0.40) |
| No | 30 (52) | 134 | 0.34 (0.32) |
| Not reported | 7 (12) | 36 | 0.36 (0.32) |

Magnitude of Effect Sizes for Cognitive Test Variables

Figure 1 displays the estimated mean effect sizes for all 43 cognitive test variables in order of magnitude, with 95% confidence intervals (CIs). Of note, with the exception of prosaccade RT (mean d = 0.00), all mean ESs were positive, indicating poorer performance for relatives than controls. However, a number of mean ESs had CIs that included zero, statistically equivalent to a failure to reject the null hypothesis of no mean difference between groups at p < .05. From the attention/working memory domain these included prosaccade RT, Stroop Test word-reading condition, CPT–Simple Version (-X) false alarms, CPT–“X” target only following “A” or equivalent/Identical Pairs Version (-AX/-IP) hits/omission errors, spatial delayed-response tasks RT, and spatial span forward. From the spatial ability domain, line orientation was included, and from the language function domain, Wechsler Adult Intelligence Scale(–Revised; WAIS[-R]) Information was also included. The remainder of the mean ESs had 95% CIs that did not include zero, indicating reliable performance differences between relatives and controls.
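The link between a confidence interval and the null hypothesis can be made concrete with a short sketch. It assumes a normal-approximation interval (mean ± 1.96 standard errors); the review does not state how its intervals were constructed, and the numbers below are hypothetical rather than taken from Figure 1.

```python
def confidence_interval(mean_es, standard_error, z=1.96):
    """Normal-approximation 95% CI for a pooled mean effect size."""
    return mean_es - z * standard_error, mean_es + z * standard_error

def includes_zero(interval):
    """True when the interval covers zero, i.e. the null is not rejected."""
    lower, upper = interval
    return lower <= 0.0 <= upper

# Hypothetical pooled estimates with the same standard error
for mean_es in (0.30, 0.15):
    ci = confidence_interval(mean_es, 0.10)
    print(mean_es, tuple(round(x, 3) for x in ci), includes_zero(ci))
```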

Fig. 1.

Mean effect sizes (d) for 43 cognitive test variables in order of magnitude. Error bars represent 95% confidence intervals; n = number of effect sizes included per variable. See table 1 for abbreviations.

The magnitude of effect size ranged from 0.00 (prosaccade RT) to 0.68 (category fluency), with a large proportion falling between 0.20 and 0.50. Those falling below 0.20, the value considered “small” by Cohen's convention for appraising ES magnitude,34 included prosaccade RT, Stroop Test word-reading condition, and antisaccade RT from the attention/working memory domain; dominant-hand pegboard task performance from the motor domain; and line orientation from the spatial ability domain. ESs falling between 0.20 and 0.40 from the attention/working memory domain included CPT-AX/-IP and CPT-X hits/omission errors, CPT-X false alarms, span of apprehension, spatial delayed-response tasks RT, spatial span forward, antisaccade errors, digit span forward and backward, Trails A, Stroop Test color–word interference, and visual cancellation tests. From the memory domains (both verbal and visual) they included Wechsler Memory Scale(–Revised; WMS[-R]) Logical Memory II and Visual Reproduction I and II. All 3 WCST variables from the executive function domain fell into the 0.20–0.40 range, as did those from both nondominant-hand motor tasks and dominant-hand finger tapping from the motor function domain, WAIS(-R) Vocabulary and Information from the language function domain, WAIS(-R) Block Design from the spatial ability domain, and WAIS(-R) IQ.

The remainder of the effect sizes ranged from 0.41 to 0.68, considered “medium” in magnitude by convention (d = 0.50).34 Among the attention/working memory tasks they included Trails B, Stroop color-naming condition, CPT-X d-prime, CPT-AX/-IP d-prime and false alarms, and accuracy of spatial delayed-response tasks. From the language function domain both letter and category fluency and National Adult Reading Test (NART)/Wide Range Achievement Test–Revised (WRAT[-R]) reading were included. Among verbal memory tasks, WMS(-R) Paired Verbal Associates, WMS(-R) Logical Memory I, and auditory verbal learning tasks were included, as well as design copy tasks from the spatial ability domain.

Effect of Study Characteristics on Cognitive Effect Sizes

Study characteristics were examined as independent variables in a series of one-way analyses of variance (ANOVAs), with all effect sizes pooled across cognitive test variables as the dependent measure (n = 285). These analyses provided an omnibus test for the effect of a given study characteristic on the overall magnitude of mean group performance differences. ESs from studies that did not provide enough information for coding a given study characteristic were excluded from that analysis (see table 2).

The diagnostic system used for probands did not have a significant effect on ES magnitude (F [5, 279] = 1.58, p > .17), nor did the type of biological family member making up the relative sample (F [6, 252] = 1.71, p > .12). The documented presence of SPD and/or schizotypal symptoms in the relatives also had no effect (F [1, 220] = 0.79, p > .38). Whether the groups were age matched, however, did have a significant effect (F [1, 276] = 4.30, p < .05), with the mean ES from studies with age-matched samples (d = 0.36) lower than that from studies with non-age-matched samples (d = 0.48). Education matching, as well, had a significant effect (F [1, 231] = 5.15, p < .05), with the mean ES from studies with education-matched samples (d = 0.34) lower than that from studies with non-education-matched samples (d = 0.45). Finally, asymmetric exclusion criteria had a significant effect on ES magnitude (F [1, 246] = 4.85, p < .05), with the mean ES from studies with asymmetric criteria (d = 0.44) larger than that from studies with symmetric exclusion criteria for relatives and controls (d = 0.34). (See table 2.)
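A one-way ANOVA of this kind can be approximated from summary statistics alone. The sketch below computes an F ratio from group sizes, means, and SDs, using the age-matching rows of table 2 as input. Because the table values are rounded, the result only approximates the statistic reported above and is not a reanalysis; the function name is hypothetical.

```python
def one_way_f(groups):
    """F ratio from summary statistics.

    groups: list of (n, mean, sd) tuples, one per level of the factor.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares recovered from each group's SD
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Rounded values from table 2: (number of ESs, mean ES, SD)
age_matched = (224, 0.36, 0.35)
not_age_matched = (55, 0.48, 0.40)
f_ratio, df1, df2 = one_way_f([age_matched, not_age_matched])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

Note that this treats every effect size as an independent observation, which is the limitation addressed by the degrees-of-freedom adjustment described next.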

In the above omnibus ANOVAs, some observations were not statistically independent, since multiple ESs came from the same studies. To correct for this, all ANOVAs were repeated while adjusting the degrees of freedom and corresponding _p_-values to reflect the number of studies rather than the number of individual effect sizes. After this adjustment, the 3 study design features that were previously found to have significant effects on ES magnitude remained significant (at p < .05).

Because of the significant influence of several study characteristics on overall ES magnitude, we recalculated mean effect sizes for cognitive test variables excluding studies with non-age-matched samples or with asymmetric exclusion criteria for relatives and controls. We did not exclude studies with non-education-matched samples, since, as discussed above, it is unclear whether matching on education represents an important control of a nuisance variable or effectively reduces variance of interest between groups. After excluding the respective studies and applying the criterion of requiring at least 3 ESs of a given cognitive test variable, 24 cognitive test variables remained (see figure 2).

Fig. 2.

Mean effect sizes (d) for 24 cognitive test variables in order of magnitude, after excluding studies with non-age-matched groups and/or asymmetric exclusion criteria. Error bars represent 95% confidence intervals; n = number of effect sizes included per variable. See table 1 for abbreviations.

The magnitude of effect size in figure 2 ranges from 0.17 to 0.66, with the 2 smallest ESs showing confidence intervals that include zero (span of apprehension and spatial delayed-response tasks RT). Test variables from the attention/working memory domain in the 0.20 to 0.40 range include digit span forward and backward, cancellation tests, and CPT-AX/-IP hits/omissions. From the verbal and visual memory domains they include WMS(-R) Logical Memory II and Visual Reproduction I and II. From the language function domain they include NART/WRAT(-R) reading and letter fluency. From the executive functions domain they include WCST categories, total errors, and perseverative errors; from the spatial ability domain, Block Design; and from the general ability domain, WAIS(-R) IQ. Effect sizes in the 0.40 to 0.66 range, corresponding to “medium” ESs, include WAIS(-R) Vocabulary from language function, and from attention/working memory, Trails A and B, accuracy of spatial delayed-response tasks, CPT-X d-prime, and CPT-AX/-IP d-prime and false alarms.

Discussion

Study Design Characteristics and Effect Size Magnitudes

Because of their impact on the interpretation of effect size magnitudes, and on comparisons among cognitive test variables and between studies, we turn first to the issue of study design characteristics. Of the 6 study design features examined, 3 had significant effects on overall ES magnitude: groups unmatched for age, groups unmatched for education, and stricter psychiatric exclusion criteria for controls than relatives. Of note, we culled only raw (uncorrected) means when calculating ESs, while investigators in individual studies typically attempt to correct for preexisting group differences, such as age and education, by statistical means (i.e., analysis of covariance [ANCOVA]). Miller and Chapman argue, however, that this application of ANCOVA, so widespread in psychopathology research, is problematic for multiple reasons, 1 of which is the distortion of the grouping variable after variance shared with the covariate is partialled out.35 Simply put (in describing a hypothetical data set in which gender and age are confounded), they maintain that “there is no way to determine what values of (the dependent variable) men younger than those tested or women older than those tested would have provided.”35(p44) While the general issue of controlling for nuisance group differences is complex, in the literature relevant to the present meta-analyses, an age difference in which relatives are significantly older than controls may account for a portion of the group difference in task performance and may obscure the true effect size attributable to genetic status, as age is associated with cognitive decline. This is indicated by our results (overall d = 0.36 for studies with age-matched groups versus d = 0.48 for those with non-age-matched groups). That nearly 20% of the included studies had group age differences in this direction indicates that future investigations of cognitive deficits in relatives would do well to attend to this concern.

The question of how to appropriately screen control groups in psychiatric genetic studies has been debated.36–37 At issue is whether controls should be completely free of any psychiatric disorder (i.e., “supernormal” controls) or studied regardless of psychiatric status (excepting the disorder for which index cases were selected). In studies of cognitive endophenotypes in relatives of schizophrenia patients, the goal is the identification of impairments associated with genetic risk for schizophrenia. In an ideal study design, then, the only difference in psychiatric status between the groups of interest would be first-degree family history of schizophrenia. This may be accomplished either by screening out lifetime psychopathology (e.g., mood, anxiety, substance use disorders, attention deficit/hyperactivity disorder) equally stringently from both control and relative groups (e.g., 20, 38) or by allowing psychiatric history to comparable degrees in both groups (e.g., 8). The latter strategy may be preferable, as it reduces the risk of a priori exclusion of genetically informative relatives (for discussion, see 25). Screening controls more stringently than relatives would only appear justified if there were strong evidence of increased risk for other psychiatric disorders in the relatives of schizophrenia patients (i.e., coaggregation). As this does not appear to be the case, beyond increased risk for schizophrenia spectrum disorders,39–40 asymmetric screening criteria likely only result in a higher probability that groups will perform differently on cognitive tasks for reasons other than genetic relatedness to a schizophrenia patient. Again, this view is supported by our results, which show a significant increase in overall ES magnitude in studies using asymmetric (d = 0.44) versus symmetric (d = 0.34) exclusion criteria.

Also noteworthy are the study characteristics that did not show significant influence on ES magnitude. The type of biological relative, parent, sibling, or offspring, did not have an overall effect. In fact, the means were ordered in the opposite direction to that hypothesized, based on extent to which each relative type had passed through the age of risk for schizophrenia (offspring, d = 0.36; siblings, d = 0.41; parents, d = 0.45). However, as discussed above, only 9 ESs from 3 studies contributed to the offspring mean, and so more informative comparisons may be accomplished with a larger pool of studies.

The documented presence or absence of SPD or schizotypal symptoms also did not show an effect. There is much variability in the literature with regard to associations between cognitive deficits and schizotypy, a candidate endophenotype in its own right.41–42 An important consideration for the present results is the coarse grouping of all cognitive ESs into 1 dependent variable and all measures and symptom factors of SPD as a diagnostic entity and dimensionally assessed schizotypy into 1 independent variable, in an omnibus analysis. Associations between specific dimensions of schizotypy and specific cognitive deficits have been reported in relatives, such as social-interpersonal deviance and CPT performance,11, 21 interpersonal deviance and antisaccade deficits,43 positive schizotypy and verbal memory, and disorganization schizotypy and CPT false alarms.44 As well, there is a significant literature reporting specific cognitive–schizotypal associations in community samples.45–48 Thus, although testing for such specific associations was precluded in the present meta-analysis, the more global approach taken may have obscured their detection. Such questions may be better addressed by a meta-analysis in which correlations, rather than group differences, are coded—a future endeavor that may be useful for determining the degree of overlap between different genetic risk indicators.

Overall Magnitude of Performance Differences and Comparisons Among Cognitive Test Variables

If one considers only effect sizes from studies free from age and exclusion criteria confounds (figure 2), reliable ES magnitudes range from small (span of apprehension, d = 0.17) to medium (CPT-AX/-IP false alarms, d = 0.66). By comparison, Heinrichs and Zakzanis's meta-analyses of schizophrenia versus control differences yielded ESs in the medium (Block Design, d = 0.46) to large (global verbal memory, d = 1.41) range.6 One way to gauge the meaning of effect sizes is to consider the percent overlap (100 minus Cohen's U1)34 of the 2 group distributions of scores: in the present relative meta-analyses, this ranged from approximately 79% to 59%. In Heinrichs and Zakzanis's schizophrenia meta-analyses, percent overlap of group distributions ranged from 57% to 29%. Cognitive impairments in healthy relatives are clearly less severe than in their ill probands, which comes as little surprise. However, the range of ES magnitude for healthy relatives is comparable to that reported for cognitive deficits in some clinical populations, such as people occupationally exposed to mercury,49 symptomatic individuals infected with HIV who have not yet developed AIDS,50 and patients with mild head injury (overall d = 0.41 within 7 days of injury).51
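The percent-overlap figure can be reproduced from d with a short calculation. The sketch below assumes Cohen's textbook definition under two normal distributions with equal variance and equal group sizes, where U2 = Φ(|d|/2) and U1 = (2·U2 − 1)/U2; the function names are illustrative, and published values read from Cohen's tables may differ slightly because those tables are indexed by rounded values of d.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_overlap(d):
    """Percent overlap of two group distributions, i.e., 100 minus Cohen's U1.

    Assumes normal distributions with equal variances and equal group sizes.
    """
    u2 = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * u2 - 1.0) / u2  # proportion of non-overlap
    return 100.0 * (1.0 - u1)


overlap_largest_es = percent_overlap(0.66)  # about 59 for the largest ES in figure 2
```

Under these assumptions, d = 0.66 corresponds to roughly 59% overlap, and d = 0 gives complete (100%) overlap.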

Another useful conceptualization of effect size magnitude is to consider the corresponding percentage of relatives with test scores below the median of the control group (U3)34: in the present meta-analyses, this number ranged from about 62 to 75%. That approximately 75% of relatives performed worse than the control group median on specific scores of CPT-AX or -IP, for example, suggests that there may be sufficient sensitivity in these measures to detect vulnerability genes for schizophrenia. Although we are unaware of any absolute threshold that distinguishes a “useful” cognitive endophenotype, it has been suggested that for linkage analysis any cut point imposed upon a distribution of scores should encompass 10 times more genetically liable participants compared to controls.52 Such a ratio is most likely to be found among particularly poor performers in tests that show the overall largest effect sizes.
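The U3 conversion can likewise be sketched in a few lines. Assuming normally distributed scores with equal variances, U3 is the standard normal cumulative probability at d, so that it gives the proportion of the lower-scoring group (here, relatives) falling below the median of the other group; the function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percent_below_control_median(d):
    """Cohen's U3 as a percentage, assuming normal, equal-variance groups.

    d is the control-minus-relative standardized mean difference, so a
    positive d means relatives perform worse on average.
    """
    return 100.0 * normal_cdf(d)


u3_largest_es = percent_below_control_median(0.66)  # about 75
```

Under these assumptions, d = 0.66 corresponds to approximately 75% of relatives scoring below the control median, and d = 0 corresponds to 50%.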

We found the largest absolute ESs derived from more complex versions of the CPT. Although these meta-analyses cannot address questions of specific versus generalized deficits in relatives,53 the only published study to date demonstrating a differential deficit in the adult relatives of schizophrenia patients examined context processing, an executive control process conceptually related to working memory and inhibition, using a version of the CPT-AX.54 Thus, a cognitive task showing the largest sensitivity to group differences may also provide specificity with regard to cognition, potentially a critically valuable combination in the service of mapping cognitive deficits to brain function, to molecular biology, and to the location of abnormal gene expression.55

We attempted to preserve as much specificity of the individual test scores as possible in the meta-analyses. Consequently, our results expand upon the work of previous meta-analyses in suggesting that WMS(-R) Logical Memory I (immediate recall) may be a more sensitive measure to group differences than Logical Memory II (delayed recall), although caution is warranted here since the 95% CIs of the 2 measures do overlap. Similarly, digit span backward may be more sensitive than digit span forward and CPT-AX/-IP false alarms and d-prime may be more sensitive than omission errors. Hence, although the assigned cognitive domain labels may vary among investigators, when one considers the largest ESs in figure 2, 4 of the 6 largest (accuracy of spatial delayed-response tasks, Trails B, and CPT-AX/-IP d-prime and false alarms) come from variables having in common executive control functions such as working memory demands, set shifting, and the inhibition of prepotent responses. The apparent exceptions to this pattern are WAIS(-R) Vocabulary, reflecting expressive verbal ability, and CPT-X d-prime, a relatively simpler measure of attentional functioning.

Taking a broader look at variables from figure 2 with ESs >0.40, there is some correspondence with previous literature reviews regarding the cognitive tasks/domains found to be most sensitive to group differences. Kremen et al. identify sustained attention (i.e., CPT tasks) and perceptual-motor speed (notably Trails B) as among the most promising risk indicators.17 Sitskoorn et al. also report Trails B as having 1 of the largest ESs, along with verbal memory.18 The Trails B ES estimate from Szöke et al. (d = 0.49) is almost identical to our best estimate (d = 0.50), although the largest ESs in that meta-analysis derived from both phonologic (letter) and semantic (category) verbal fluency.19

The range of ES magnitudes in the current study is also quite similar to that in Sitskoorn et al.,18 where ESs ranged from d = 0.28 to 0.54, although there are some differences in the ordering of cognitive variables by ES magnitude. Sitskoorn et al. found verbal memory to have the largest mean ES (d = 0.54), while aggregation of CPT variables had a comparatively smaller ES (d = 0.33). In fact, our CPT-AX/-IP effect size estimates are above the 95% CI estimated for CPT tasks in Sitskoorn et al., while our Logical Memory I effect size estimate is just below the 95% CI estimated for verbal memory in that study.

Several aspects of our meta-analyses differ from those in Sitskoorn et al. and may account for discrepancies in results. As mentioned above, our approach to ES aggregation is more test score specific than in Sitskoorn et al., where tasks were aggregated at the broader levels of test (e.g., Digit Span), family of tests (e.g., CPT), or cognitive construct (i.e., verbal memory). Furthermore, somewhat different studies contributed effect sizes to the 2 different meta-analyses, as our study inclusion criteria were more restrictive. Unlike Sitskoorn et al., we excluded studies for which it was reported that any relative had a lifetime diagnosis of schizophrenia or schizoaffective disorder (n = 11 in our study pool). As well, we were careful to screen out multiple studies from the same research group reporting sample overlap or indications that overlap could be reasonably assumed (e.g., 27, 56–57) and, in these cases, to record only 1 ES per sample per cognitive variable. Finally, at the second stage of analysis, studies with age and psychiatric exclusion confounds were removed, making our study pool more restrictive.

Comparing the present results with those of Szöke et al., all of the estimated ESs have overlapping 95% CIs with the exception of the 2 verbal fluency tasks, which were reported to have much higher ESs in Szöke et al. (d = 0.65 for letter fluency; d = 0.87 for category fluency). In the present study, as well, category fluency was associated with the largest ES in the first-stage analysis (d = 0.68) but was dropped from the second-stage analysis after the removal of studies with age and psychiatric exclusion confounds. Again, overlapping but different study pools contributed to the ES estimates of verbal fluency tasks of the present study and those of Szöke et al. Of the studies in Szöke et al. that contributed ESs for letter fluency, for example, 1 was excluded from the present review because of the inclusion of individuals with schizophrenia in the relative sample, 1 was not included because it was not published in English, and 2 studies (1 of which had large ns and an ES of d = 0.15) were included in the present meta-analysis31, 58 but not included in Szöke et al. Differences of inclusion such as these can be expected to alter the estimated mean ES to the degree observed between reviews.

Summary and Conclusions

These meta-analyses provide evidence that cognitive deficits are present in the small to medium effect size range in unaffected adult first-degree relatives of schizophrenia patients. Significant group differences were present on tasks from all cognitive domains represented. The largest ESs (d > 0.50) were seen in CPT-AX/-IP performance, accuracy of spatial delayed-response tasks, auditory verbal (list) learning, design copy tests, and category fluency. After excluding studies with significant age differences between groups and asymmetric psychiatric exclusion criteria, reliable group differences were still found on tasks of language and spatial ability, executive functions, and attention/working memory. The largest ESs (d ≥ 0.50) were seen in Trails B, CPT-X d-prime, and CPT-AX/-IP d-prime and false alarms.

The limitations to interpreting the present results include the relatively small number of study ESs contributing to cognitive test variable means. This is a consequence of 2 approaches taken in this study: (1) the decision to examine specific test scores when possible and avoid aggregating tests into broader cognitive domains and (2) a quite conservative approach to study inclusion. With relatively small ns per cell, further meta-analytic investigations such as formal moderator analyses were not possible. We would argue, however, that the advantage of our conservative approach is a high level of confidence in the results: that relatives show cognitive deficits of moderate effect size on some tests that cannot be accounted for by an age confound, by secondary psychiatric history, or by the presence of schizophrenia itself (for discussion of “garbage in, garbage out” critiques of meta-analysis, see 59).

In a final note of caution regarding interpretation of these results, mean difference effect sizes of cognitive test variables are not a proxy for heritability estimates of those variables. Gottesman and Gould's4 second criterion of endophenotype validation, that the endophenotype is heritable, must be evaluated by appropriate genetic modeling of twin data, and this should be a focus of future meta-analyses as the data become available.60–62

In sum, several methodological features of studies were identified that inflate the estimated effect sizes of cognitive impairments in unaffected relatives of schizophrenia patients. Despite these study design confounds, however, the cognitive deficits characterized in these meta-analyses provide support for Gottesman and Gould's last criterion that a candidate endophenotype for an illness is associated with unaffected family members. Cognitive deficits, perhaps those involving executive control, working memory, and inhibition, in particular, may continue to prove valuable in the search for specific genes conferring risk for schizophrenia.

Appendix: Studies Included in Meta-analyses

Acknowledgments

This research was supported by funds received from National Institute of Mental Health MH59883 and by a Burroughs Wellcome Translational Clinical Scientist Award to Dr. Carter. Data from this study were presented at the ninth meeting of the International Congress on Schizophrenia Research, Colorado Springs, March 2003. We thank Mary Amanda Dew for her helpful review and critique of the manuscript.

References