Methods
Topic Development and Refinement
This topic was nominated by a member of the U.S. Preventive Services Task Force (USPSTF), which aims to update its recommendations every 5 years in accordance with criteria for inclusion in the National Guideline Clearinghouse. The most recent USPSTF recommendations for screening and behavioral counseling interventions in primary care to reduce risky/harmful alcohol use were issued in 2004.37
During the topic development and refinement processes, we generated an analytic framework, preliminary Key Questions (KQs), and preliminary inclusion/exclusion criteria in the form of PICOTS (Populations, Interventions, Comparators, Outcomes, Timing, Settings). The processes were guided by the information provided by the topic nominator, a scan of the literature, methods and content experts, and Key Informants. We worked with seven Key Informants during the topic refinement, all of whom were also members of our Technical Expert Panel (TEP) for this report. Key Informants and TEP members participated in conference calls and discussions through email to review the analytic framework, KQs, and PICOTS at the beginning of the project; discuss the preliminary assessment of the literature, including inclusion/exclusion criteria and review of the protocol; and provide input on the information and categories included in evidence tables.
Our KQs were posted for public comment on AHRQ's Effective Health Care Web site from December 14, 2010, through January 11, 2011, and were finalized after review of the comments and discussion with the TEP. Our preliminary KQs included additional questions about pharmacotherapy for alcohol dependence in the primary care setting. After public input and feedback from the TEP, we decided not to include pharmacotherapy in this report. One of the main reasons was that initial literature searching and expert input suggested that there are no studies of pharmacotherapy in the primary care setting that would meet inclusion/exclusion criteria, but that there are numerous studies of pharmacotherapy in other settings. Thus, we determined that to give the pharmacotherapy topic the attention it deserves would require greatly expanding the scope of this report to include many other settings or considering the pharmacotherapy topic for a separate report.
This report adopted nearly all of the KQs identified in the earlier systematic review that informed the USPSTF recommendations, titled Behavioral Counseling Interventions in Primary Care to Reduce Risky/Harmful Alcohol Use.1 In addition, the scope of this report has been expanded to allow the inclusion of screening and behavioral interventions for the full spectrum of alcohol misuse, expanding the review to include subjects with alcohol abuse and dependence, as long as subjects were identified by screening in a primary care or primary care-like setting. We also expanded the eligible settings from traditional primary care to also include settings with primary care-like relationships (e.g., infectious disease clinics for people with HIV), added additional outcomes of interest to our PICOTS and analytic framework, and added referral as an intervention of interest and changed the title to reflect this addition.
Analytic Framework
We developed an analytic framework to guide the systematic review process (Figure 1). KQ 1 addresses the direct evidence of effectiveness of screening for alcohol misuse for improving morbidity, mortality, or other long-term outcomes. KQ 2 examines how specific screening approaches compare with one another for detecting alcohol misuse. KQ 3 and KQ 5 address the potential adverse effects of screening (KQ 3) and behavioral counseling interventions (KQ 5). KQ 4 examines the efficacy and comparative effectiveness of behavioral counseling interventions for improving intermediate outcomes (e.g., rates of alcohol use, heavy drinking episodes). KQ 6 investigates the efficacy and comparative effectiveness of behavioral counseling interventions for improving morbidity, mortality, or other long-term outcomes. KQ 7 addresses the health care system influences that promote or hinder effective screening and intervention for alcohol misuse.
Figure 1
Analytic framework for screening, behavioral counseling, and referral in primary care to reduce alcohol misuse. KQ = Key Question
Literature Search
To identify articles relevant to each KQ, we searched MEDLINE®, Embase®, the Cochrane Library, CINAHL®, PsycINFO®, and the International Pharmaceutical Abstracts. The full search strategy is presented in Appendix A. We used Medical Subject Headings (MeSH or MH) as search terms when available and key words otherwise, focusing on terms to describe the relevant population and the screening and behavioral interventions of interest. We reviewed our search strategy with the TEP and incorporated their input.
We limited the electronic searches to “human” and “English language.” Sources were searched from January 1, 1985, to August 30, 2011. The start date was selected based on the earliest publication date found in previous systematic reviews (which was 1988) and expert opinion about when the earliest literature on this topic was published. We did not simply conduct searches starting from where the 2004 systematic review1 left off because our review has some differences in scope (described above under Topic Development and Refinement). We used the National Library of Medicine (NLM) publication type tags to identify reviews, randomized controlled trials (RCTs), and meta-analyses. Because our scope included pharmacotherapy at the time of the initial searches, the following terms were also included: “naltrexone,” “Revia,” “Vivitrol,” “acamprosate,” “Campral,” “disulfiram,” “Antabuse,” and “Alcohol Deterrents”[MeSH]. After public review of the KQs and discussion with the TEP, studies of pharmacotherapy were removed from the inclusion criteria.
We manually searched reference lists of pertinent reviews, included trials, and background articles on this topic to look for any relevant citations that our searches might have missed. We imported all citations into an EndNote® X4 electronic database.
We searched for unpublished studies relevant to this review using ClinicalTrials.gov and the World Health Organization's International Clinical Trials Registry Platform.
Any literature suggested by Peer Reviewers or from the public was investigated and, if appropriate, incorporated into the final review. Appropriateness was determined by the same methods described throughout this section.
Study Selection
We developed eligibility (inclusion and exclusion) criteria with respect to patient populations, interventions, comparators, outcomes, timing, settings, and study designs and durations for each KQ (Table 2). For KQ 2, we focused on systematic reviews and meta-analyses, and we did not restrict the publication date. We supplemented the findings with information from other sources (TEP members, Peer Reviewers, or the public) to fill in important gaps. For all other KQs, we focused on controlled trials published no earlier than 1985 and systematic reviews/meta-analyses published in the last 5 years that directly address our KQs. We limited systematic reviews and meta-analyses to the last 5 years to ensure that their findings were sufficiently current; we did not need to rely on older ones because we intended to conduct our own meta-analyses that would better reflect the current body of literature. We did not perform separate searches for system influences; evidence from studies included in KQs 1, 3, 4, 5, and 6 was used to address KQ 7.
For this review, results from well-conducted trials provide the strongest evidence to compare interventions with respect to efficacy, effectiveness, and harms. We defined controlled trials as those comparing screening with no screening (KQs 1 and 3) or one type of intervention and/or referral with another and/or with usual care (all other KQs). Studies of at least 6 months' duration were eligible for inclusion, and we did not impose any limits on sample size.
All titles and abstracts identified through searches were independently reviewed for eligibility against our inclusion/exclusion criteria by two trained members of the research team. Studies marked for possible inclusion by either reviewer underwent full-text review. For studies without adequate information to determine inclusion or exclusion, we retrieved the full text and then made the determination. All results were tracked in an EndNote database.
Each full-text article included during title/abstract review was independently reviewed by two trained members of the team for inclusion or exclusion based on the eligibility criteria described above. If both reviewers agreed that a study did not meet the eligibility criteria, the study was excluded. If the reviewers disagreed, conflicts were resolved by discussion and consensus or by consulting a third member of the review team. As described above, all results were tracked in an EndNote database. We recorded the reason that each excluded full-text publication did not satisfy the eligibility criteria and compiled a comprehensive list of such studies (Appendix B).
Data Extraction and Data Management
For studies that met our inclusion criteria, we abstracted important information into evidence tables. We designed and used structured data abstraction forms to gather pertinent information from each article, including characteristics of study populations, settings, interventions, comparators, study designs, methods, and results. Trained reviewers extracted the relevant data from each included article into the evidence tables. All data abstractions were reviewed for completeness and accuracy by a second member of the team. We recorded intention-to-treat (ITT) results if available. All data abstraction was performed using Microsoft Excel® software. Evidence tables containing all abstracted data of included studies are presented in Appendix C.
Quality Assessment
To assess the quality (internal validity) of studies, we used predefined criteria based on those developed by the USPSTF (ratings: good, fair, poor)45 and the University of York Centre for Reviews and Dissemination.46 In general terms, a “good” study has the least risk of bias and its results are considered to be valid. A “fair” study is susceptible to some bias but probably not sufficient to invalidate its results. A “poor” study has significant risk of bias (e.g., stemming from serious errors in design or analysis) that may invalidate its results.
Two independent reviewers assigned quality ratings for each study. For each article, one of the two reviewers was always an experienced/senior investigator (DJ or RH). Disagreements between the two reviewers were resolved by discussion and consensus or by consulting a third member of the team. We gave good quality ratings to studies that met all, or all but one, criteria. We gave poor quality ratings to studies that had a fatal flaw (defined as a methodological shortcoming that leads to a very high risk of bias) in one or more categories, and we excluded them from our analyses. Appendix D details the criteria used for evaluating the quality of all included studies.
Data Synthesis
Prioritization and/or categorization of outcomes were determined by the research team with input from TEP members. We conducted quantitative analyses using meta-analyses of outcomes reported by a sufficient number of studies that were homogeneous enough to justify combining their results. To determine whether quantitative analyses were appropriate, we assessed the clinical and methodological heterogeneity of the studies under consideration following established guidance.47 We did this by qualitatively assessing the PICOTS of the included studies, looking for similarities and differences. We stratified results by population, separating those for adults, older adults, young adults or college students, and pregnant women. When quantitative analyses were not appropriate (e.g., due to clinical heterogeneity, insufficient numbers of similar studies, or insufficiency or variation in outcome reporting), we synthesized the data qualitatively.
For our meta-analyses, our primary outcome was change in alcohol consumption (drinks per week) between baseline and 12 months for intervention groups compared with control groups. Some studies reported alcohol consumption over a different time period (e.g., past 30 days). For those studies, we converted the number of drinks into a weekly rate. In cases in which alcohol consumption was reported in gram units, we used a conversion factor of 13.7 grams as equivalent to a standard drink.48 Many studies did not report a variance measure of the mean change from baseline to endpoint, but included variance information at baseline and 12 months. We assumed a correlation of 0.5 to estimate the mean change variance49, 50 and conducted sensitivity analyses with assumed correlations of 0.3 and 0.7 to confirm that this assumption did not significantly change our results. Separate analyses were run for studies reporting 6-month alcohol consumption outcomes. We also ran meta-analyses for several other intermediate outcomes (e.g., heavy drinking episodes, achievement of recommended drinking limits) with sufficient data and for all-cause mortality. In addition to calculating an overall pooled point estimate, we calculated pooled point estimates for each category of intensity of the interventions. Intervention intensity was categorized as very brief (single contact, 5 minutes or less), brief (single contact, up to 15 minutes), extended (single contact, greater than 15 minutes), brief multicontact (multiple contacts, up to 15 minutes each), or extended multicontact (multiple contacts, one or more of them greater than 15 minutes). We also performed subgroup analyses for men and women to assess whether intervention effects differed by sex. Other subgroups were explored through separate analyses stratifying by each of the following: type of provider conducting the intervention, country, and whether the study included alcohol-dependent subjects.
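The unit conversions and the change-score variance imputation described above can be illustrated with a minimal sketch. It assumes that rescaling to a weekly rate is a simple proportional conversion (amount × 7 / days in the reporting period) and that the standard formula for the standard deviation of a difference between two correlated measurements was used; the report does not spell out either computation. The function names are hypothetical; only the 13.7-gram factor and the correlations of 0.5, 0.3, and 0.7 come from the text.

```python
import math

GRAMS_PER_STANDARD_DRINK = 13.7  # conversion factor stated in the text


def drinks_per_week(amount, period_days, unit="drinks"):
    """Convert consumption reported over `period_days` to standard drinks per week.

    Assumes a simple proportional rescaling, which the report does not spell out.
    """
    if unit == "grams":
        amount = amount / GRAMS_PER_STANDARD_DRINK
    elif unit != "drinks":
        raise ValueError("unit must be 'drinks' or 'grams'")
    return amount * 7.0 / period_days


def sd_of_change(sd_baseline, sd_followup, r=0.5):
    """Impute the SD of the mean change from baseline, given an assumed correlation r.

    Uses Var(change) = SD_b^2 + SD_f^2 - 2 * r * SD_b * SD_f.
    """
    variance = sd_baseline**2 + sd_followup**2 - 2.0 * r * sd_baseline * sd_followup
    return math.sqrt(variance)


# Illustrative (made-up) inputs: 60 drinks in the past 30 days, SDs of 10 drinks/week.
weekly = drinks_per_week(60, 30)
sensitivity = {r: sd_of_change(10, 10, r) for r in (0.3, 0.5, 0.7)}
```

A lower assumed correlation yields a larger imputed standard deviation and therefore a less precise study estimate, which is why the sensitivity analyses at 0.3 and 0.7 bracket the primary assumption of 0.5.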
Random-effects models were used to estimate pooled effects.51 For the primary outcome of alcohol consumption (drinks per week), the effect measure was the mean difference between behavioral counseling intervention and control. For the intermediate outcomes of heavy drinking episodes and achievement of recommended drinking limits, the percentages of patients at 12 months were compared with a risk difference. For all-cause mortality, because the followup period varied between trials, the analysis was based on number of deaths per person-year and the comparison between intervention and control was calculated as a rate ratio. Forest plots graphically summarize results of individual studies and of the pooled analysis (Appendix E).52
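As an illustration of how a random-effects pooled mean difference can be computed, the following sketch uses the DerSimonian-Laird method-of-moments estimator. That choice is an assumption made here for illustration: the report cites a reference for its random-effects models but does not name the estimator in this section, and the actual analyses were run in Stata and Comprehensive Meta Analysis. The function name and the example inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects estimator.

    `effects` are per-study mean differences; `variances` are their squared
    standard errors. Returns (pooled_effect, standard_error, tau_squared).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


# Made-up example: two studies with mean differences of -2 and -4 drinks per week.
pooled, se, tau2 = pool_random_effects([-2.0, -4.0], [1.0, 1.0])
ci_95 = (pooled - 1.96 * se, pooled + 1.96 * se)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect inverse-variance weights and the two models give the same pooled estimate.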
The chi-squared statistic and the I² statistic (the proportion of variation in study estimates due to heterogeneity) were calculated to assess statistical heterogeneity in effects between studies.53, 54 An I² from 0 to 40 percent might not be important, 30 percent to 60 percent may represent moderate heterogeneity, 50 percent to 90 percent may represent substantial heterogeneity, and ≥75 percent represents considerable heterogeneity.55 The importance of the observed value of I² depends on the magnitude and direction of effects and on the strength of evidence for heterogeneity (e.g., p value from the chi-squared test, or a confidence interval for I²). Whenever including a meta-analysis with considerable statistical heterogeneity in this report, we provide an explanation for doing so, considering the magnitude and direction of effects.55 Potential sources of heterogeneity were examined by analysis of subgroups of study design, study quality, patient population, and variation in interventions. Heterogeneity was also explored through sensitivity analyses. We also conducted meta-regression for our primary analysis (change in alcohol consumption at 12 months) to assess the potential impact of geographic location of studies (United States vs. non-United States), severity of alcohol misuse (studies enrolling more than 10 percent of subjects with alcohol dependence), and type of provider delivering the intervention (primary care provider, nurse, researcher). Quantitative analyses were conducted using Stata® version 11.1 (StataCorp LP, College Station, TX) and Comprehensive Meta Analysis® version 2.2.055 (BioStat, Inc., Englewood, NJ).
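The heterogeneity statistics can be sketched as follows, assuming the usual definitions: Cochran's Q as the inverse-variance-weighted sum of squared deviations from the fixed-effect pooled estimate, and I² = max(0, (Q − df) / Q) × 100. The function names and example inputs are hypothetical. Because the interpretive ranges quoted above overlap on purpose, the helper returns every band that contains a given value rather than forcing a single label.

```python
def heterogeneity(effects, variances):
    """Return (Q, degrees_of_freedom, I2_percent) for a set of study estimates."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2 is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


def i2_bands(i2):
    """List every overlapping interpretive band from the text that contains i2."""
    bands = [
        (0, 40, "might not be important"),
        (30, 60, "moderate"),
        (50, 90, "substantial"),
        (75, 100, "considerable"),
    ]
    return [label for low, high, label in bands if low <= i2 <= high]


# Made-up example: two studies with mean differences of -2 and -4, variance 1 each.
q, df, i2 = heterogeneity([-2.0, -4.0], [1.0, 1.0])
labels = i2_bands(i2)
```

A value that falls in two bands still needs judgment, which is consistent with the text's point that the importance of I² depends on the magnitude and direction of effects and on the strength of evidence for heterogeneity.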
Grading Strength of Evidence
We graded the strength of evidence based on the guidance established for the Evidence-based Practice Center Program.56 Developed to grade the overall strength of a body of evidence, this approach incorporates four key domains: risk of bias (includes study design and aggregate quality), consistency, directness, and precision of the evidence. It also considers other optional domains that may be relevant for some scenarios, such as a dose-response association, plausible confounding that would decrease the observed effect, strength of association (magnitude of effect), and publication bias. We considered all evidence from intermediate outcomes to be indirect.
Table 3 describes the grades of evidence that we assigned. We graded the strength of evidence for harms (KQs 3 and 5), the intermediate outcomes analyzed in KQ 4, and for morbidity, mortality, and other long-term health outcomes for KQ 6. Two reviewers assessed each domain for each key outcome, and differences were resolved by consensus. For each assessment, one of the two reviewers was always an experienced/senior investigator (DJ or RH).
Table 3
Definitions of the grades of overall strength of evidence.
Applicability Assessment
We assessed applicability of the evidence following guidance from the Methods Guide for Comparative Effectiveness Reviews.57 We used the PICOTS framework to explore factors that affect applicability. Some factors identified a priori that may limit the applicability of evidence included the following: age of enrolled populations; sex of enrolled populations (e.g., few women may be enrolled in studies); race/ethnicity of enrolled populations; few studies evaluating pregnant women, the elderly, or adolescents; and the use of interventions that may be difficult to incorporate into routine practice for many providers (e.g., because they require substantial resources or time, or because they may be delivered by research staff rather than existing staff in the practice).
Peer Review and Public Commentary
An external peer review was performed on this report. Peer Reviewers were charged with commenting on the content, structure, and format of the evidence report, providing additional relevant citations, and pointing out issues related to how we conceptualized the topic and analyzed the evidence. Our Peer Reviewers (listed in the front matter) gave us permission to acknowledge their review of the draft. We compiled all comments and addressed each one individually, revising the text as appropriate. AHRQ also provided review from its own staff. In addition, the Scientific Resource Center placed the draft report on the AHRQ Web site (effectivehealthcare.ahrq.gov/) for public review.