The Role of Formative Evaluation in Implementation Research and the QUERI Experience

Abstract

This article describes the importance and role of 4 stages of formative evaluation in our growing understanding of how to implement research findings into practice in order to improve the quality of clinical care. It reviews limitations of traditional approaches to implementation research and presents a rationale for new thinking and use of new methods. Developmental, implementation-focused, progress-focused, and interpretive evaluations are then defined and illustrated with examples from Veterans Health Administration Quality Enhancement Research Initiative projects. This article also provides methodologic details and highlights challenges encountered in actualizing formative evaluation within implementation research.

Keywords: process assessment (health care), evaluation methodology, evaluation studies


As health care systems struggle to provide care based on well-founded evidence, there is increasing recognition of the inherent complexity of implementing research into practice. Health care managers and decision makers find they need a better understanding of what it takes to achieve successful implementation, and they look to health care researchers to provide this information. Researchers in turn need to fill this need through collection of new, diverse sets of data to enhance understanding and management of the complex process of implementation.

A measurement approach capable of providing critical information about implementation is formative evaluation (FE). Formative evaluation, used in other social sciences, is herein defined as a rigorous assessment process designed to identify potential and actual influences on the progress and effectiveness of implementation efforts. Formative evaluation enables researchers to explicitly study the complexity of implementation projects and suggests ways to answer questions about context, adaptations, and response to change.

The Department of Veterans Affairs (VA) Quality Enhancement Research Initiative (QUERI) has integrated FE into its implementation program.1–3 This article introduces QUERI and its implementation focus. It then describes research challenges that call for the use of FE in this specialized field of study, reviews FE relative to QUERI implementation research, identifies 4 evaluative stages, and presents challenges to the conduct of FE.

THE VETERANS HEALTH ADMINISTRATION'S QUERI PROGRAM

The Quality Enhancement Research Initiative, begun in 1998, is a comprehensive, data-driven, outcomes-based, and output-oriented improvement initiative.2,3 It focuses on identification and implementation of empirically based practices for high-risk/high-volume conditions among the veteran population and on the evaluation and refinement of these implementation efforts.3 The Quality Enhancement Research Initiative's innovative approach1–4 calls upon researchers to work toward rapid, significant improvements through the systematic application of best clinical practices. It also calls upon researchers to study the implementation process to enhance and continuously refine these quality improvement (QI) efforts.1–4

Classic intervention research methods5,6 provide the means to evaluate targeted outcomes of implementation/QI efforts. From an evaluation perspective, studies using intervention designs, such as a cluster-randomized trial or quasi-experimental approaches, routinely include a summative evaluation. Summative evaluation is a systematic process of collecting data on the impacts, outputs, products, or outcomes hypothesized in a study.7 Resulting data provide information on the degree of success, effectiveness, or goal achievement of an implementation program.

In an action-oriented improvement program, such as QUERI, summative data are essential but insufficient to meet the needs of implementation/QI researchers. Evaluative information is needed beyond clinical impact of the change effort and beyond discovering whether a chosen adoption strategy worked. Implementation researchers need to answer critical questions about the feasibility of implementation strategies, degree of real-time implementation, status and potential influence of contextual factors, response of project participants, and any adaptations necessary to achieve optimal change. Formative evaluation provides techniques for obtaining such information and for overcoming limitations identified in early implementation/QI studies.

NEED FOR FE IN IMPLEMENTATION/QI RESEARCH

The RE-AIM framework of Glasgow and colleagues highlights critical information that is missing from current research publications, i.e., information needed to evaluate a study's potential for translation and public health impact.8,9 Such information includes the efficacy/effectiveness of an intervention, its reach relative to actual/representative subject participation rate, its adoption relative to actual/representative setting participation rate, its implementation or intervention fidelity, and its maintenance over time.

The focus of the RE-AIM framework is the study of health promotion interventions. Similar issues must be addressed during implementation research if potential adopters are to replicate critical implementation processes. In addition, implementation researchers need to capture in-depth information on participant and contextual factors that facilitate or hinder successful implementation. Such factors can be used during the project to optimize implementation and inform post hoc interpretation.

Because implementation efforts can be relatively messy and complex, traditional study designs alone are often inadequate to the task of obtaining evaluative information. For example, randomized clinical trials (RCTs) may leave unanswered questions that are important to system-wide uptake of targeted research. As Stead et al.10,11 suggest, traditional intervention research can fail to “capture the detail and complexity of intervention inputs and tactics” (10, p. 354), thereby missing the true nature of interventions as well as significant organizational factors important for replication.10,11

Another argument for performing FE has been highlighted in guideline/QI literature, i.e., the need to address potential interpretive weaknesses. Such weaknesses relate to a failure to account for key elements of the implementation process and may lead to unexplainable and/or poor results. For example, Ovretveit and Gustafson12 identified implementation assessment failure, explanation failure, and outcome attribution failure. Implementation assessment failure can lead to a “Type III” error, where erroneous study interpretations occur because the intervention was not implemented as planned.12,13 Explanation and outcome attribution failures relate to failures to explore the black box of implementation. Specifically, what actually did/did not happen within the study relative to the implementation plan, and what factors in the implementation setting, anticipated or unanticipated, influenced the actual degree of implementation? When such data are not collected, potential study users have little understanding of a particular implementation strategy. For example, 1 study regarding opinion leadership did not report the concurrent implementation of standing orders.14

Use of a traditional intervention design does not obviate collection of the critical information cited above. Rather, complementary use of FE within an experimental study can create a dual or hybrid style approach for implementation research.15 The experimental design is thus combined with descriptive or observational research that employs a mix of qualitative and quantitative techniques, creating a richer dataset for interpreting study results.

FORMATIVE EVALUATION WITHIN QUERI

As with many methodologic concepts, there is no single definition/approach to FE. In fact, as Dehar et al.16 stated, there is a decided “lack of clarity and some disagreement among evaluation authors as to the meaning and scope” of related concepts (16, p. 204; see Table 1 for a sampling). Variations include differences in terminology, e.g., an author may refer to FE, process evaluation, or formative research.16,17

Table 1.

A Spectrum of Definitions of Formative Evaluation

“Evaluative activities undertaken during the design and pretesting of programs to guide the design process”17
“… a method of judging the worth of a program while the program activities are forming or happening. Formative evaluation focuses on the process”18
An assessment that focuses on “the internal dynamics and actual operations of a program in order to understand its strengths and weaknesses and changes that occur in it over time”19

Given a mission to make rapid, evidence-based improvements to achieve better health outcomes, the authors have defined FE as a rigorous assessment process designed to identify potential and actual influences on the progress and effectiveness of implementation efforts. Related data collection occurs before, during, and after implementation to optimize the potential for success and to better understand the nature of the initiative, need for refinements, and the worth of extending the project to other settings. This approach to FE incorporates aspects of the last 2 definitions in Table 1 and concurs with the view that formative connotes action.16 In QUERI, this action focus differentiates FE from “process” evaluations where data are not intended for concurrent use.

Various uses of FE for implementation research are listed in Table 2. Uses span the timeframe or stages of a project, i.e., development/diagnosis, implementation, progress, and interpretation. Within QUERI, these stages are progressive, integrated components of a single hybrid project. Each stage is described below, in the context of a single project, and illustrated by QUERI examples (Tables 3, 4, and 6–10). Each table provides an example of 1 or more FE stages. However, as indicated in some of the examples, various evaluative activities can serve multiple stages, which then merge in practice. Formative evaluation at any stage requires distinct plans for adequate measurement and analysis.

Table 2.

Potential Uses of Formative Evaluation10,13,16,20–27

Understand the nature of the local implementation setting
Assess whether a program or intervention addresses a significant need
Modify a proposed program or intervention, as needed
Determine the extent, fidelity, and qualities of the implementation of an intervention program … (e.g., to) describe the activities actually implemented. … (and) … explain program operations21
Systematically detect and monitor unanticipated events (and adjust if appropriate)
Optimize/control implementation to improve the potential for success
Obtain ongoing input for short-term adjustments
Document continual progress
Inform future similar implementation efforts, e.g., within other health care sites or a larger system
Avoid type III errors: “Failing to detect differences between the original intervention plan and the ultimate manner of implementation”13; or failure to understand how complex the phenomena of interest really are
Understand the extent/dose, consistency, usefulness, context, and quality of an intervention's implementation
Understand the nature and implications of local adaptation
Assist interpretation of program outcomes or worth in terms of the effort required to achieve a designated level of improvement
Foster an understanding of the causal events leading to change and the specific components of the intervention that most influenced it20
Standardize on-going implementation
Understand the experience of those directly affected by implementation efforts

Table 3.

An Example of Developmental FE

Spinal cord injury (SCI) QUERI: respiratory vaccine initiative
Implementation goal: one evidence-based strategy to improve preventive care delivery rates is a practitioner reminder system. As the VHA had a nationally developed and distributed computerized clinical reminder (CCR) for influenza that did not appear to be in wide use at study sites, SCI QUERI set a goal of having the CCR installed locally and used by practitioners in the SCI centers across the country.
FE activity: electronic questionnaires with follow-up interviews about the influenza CCR were administered to computer application coordinators at VA facilities with SCI centers and to personnel at each SCI center. From these data it was clear that the current version of the reminder was installed at all locations, but that local methods of installing and using the CCR varied widely. This indicated that the initial plan to develop a standard set of instructions was inappropriate. As an alternative, the research team developed general principles and encouraged SCI personnel to seek further help from their local information technology staff.35,36
Value: the information from this activity guided the research team's choice of methods to assist SCI center personnel to move toward the goal of using the CCR.

Table 4.

Implementation-Focused FE

Congestive heart failure (CHF) QUERI
Implementation goals: four clinical strategies were identified from research to improve health status and reduce readmission rate costs for CHF patients, i.e., identifying patients, determining readiness for discharge, patient education, and rigorous outpatient follow-up.41,42 One evidence-based recommendation within the patient education component was weight management. To self-manage weight, patients were asked to weigh themselves every day.
FE activity: following development of individualized implementation intervention plans and timelines for each site, CHF QUERI staff conducted weekly teleconferences with key local contacts. Using structured interview questions, barriers and facilitators were identified and potential strategies to resolve the barriers were discussed. The following week, sites were provided with any modification of procedures or forms (mid-course corrections).
One barrier to accomplishing the weight management goal was uncovered during weekly calls. The Prosthetic Department at 1 site did not interpret current policy in a manner that allowed provision of scales for CHF patients. The research team of CHF QUERI was able to negotiate a policy change, so access to a scale was no longer a barrier to self-monitoring fluid retention when patients were discharged.
Value: the weekly teleconferencing and problem resolutions changed the perceptions of clinical staff that this research was not related to day-to-day activities. The calls also provided a model of problem-solving behaviors for the clinical staff (e.g., Plan, Do, Study, Act) as well as information for the research team that enabled “user friendly” modifications of original project tools. Overall, changes at the system level improved the probability that patient education would lead to better self-monitoring of weight.

Table 6.

Implementation and Progress-Focused FE

Substance use disorders QUERI
Implementation goal: improve opioid agonist clinics' implementation of 4 best-practice recommendations: (1) adequate dosing, (2) adequate counseling support, (3) maintenance orientation, and (4) contingency management interventions.
FE activity50,51:
i. Prototype toolkits for implementing QI and monitoring clinic practices and patient outcomes were distributed to 8 pilot clinics. A minimum of monthly contact between implementation team staff and clinic staff assessed barriers, strategies to overcome barriers, suggestions for improving tools or adding additional tools, and progress. Toolkits were modified to meet the needs of the clinics. Satisfaction with the modifications was evaluated on the next scheduled call.
ii. Pilot clinics also submitted monthly data on practice variables (e.g., mean dose, counseling frequency) and patient outcomes (e.g., percent of urine screens positive for illicit substances).
Value:
i. The original toolkit was modified to improve its usefulness. The exchange of information and responsiveness of the implementation team to clinic suggestions improved working relationships.
ii. Data were used for QI goal setting and modification as initial goals were achieved. Data were also used to monitor progress and provide feedback to sites. Formative evaluation allowed quick identification of stagnation of progress that suggested the need for increased facilitation or introduction of new tools to sustain enthusiasm for the intervention.

Table 10.

An Illustrative, Potential FE

Ischemic heart disease QUERI
Implementation goal: pilot intervention teams from 8 hospitals developed and implemented strategies to increase measurement and management of low-density lipoprotein cholesterol (LDL-c) in coronary heart disease patients. Interventions included audit/feedback, patient education, pharmacist case management, lipid clinics, order templates, and paper point-of-care reminders.
FE activity (an interpretive post hoc evaluation or missed FE opportunity60): researchers conducted structured interviews with participants to identify barriers and facilitators to implementation of pilot interventions. Interviews specifically addressed awareness of and agreement with secondary prevention guidelines, priorities for intervention activities during the patient visit, availability of laboratory and pharmacy data, and effectiveness of intervention planning and implementation. Results were organized and interpreted using theory-based content analysis based on the Promoting Action on Research Implementation in Health Services (PARIHS) framework, which identifies 3 components of successful implementation of evidence-based practice: evidence, context, and facilitation.9,38
Value: the analysis identified barriers to successful implementation that related primarily to the intervention process and secondarily to characteristics of the intervention context. Interview responses indicated that planning, including identification of resources and potential barriers and facilitators, was a critical and universally underutilized step in the intervention process. Data from the interviews suggested tools and guidelines to improve planning and implementation skills.

Developmental Evaluation

Developmental evaluation occurs during the first stage of a project and is termed a diagnostic analysis.1,28 It is focused on enhancing the likelihood of success in the particular setting/s of a project, and involves collection of data on 4 potential influences: (a) actual degree of less-than-best practice; (b) determinants of current practice; (c) potential barriers and facilitators to practice change and to implementation of the adoption strategy; and (d) strategy feasibility, including perceived utility of the project. (Note: studies conducted to obtain generic diagnostic information prior to development of an implementation study are considered formative research, not FE. Even if available, a diagnostic analysis is suggested given the likelihood that generically identified factors will vary across implementation sites.)

Activity at this stage may involve assessment of known prerequisites or other factors related to the targeted uptake of evidence, e.g., perceptions regarding the evidence, attributes of the proposed innovation, and/or administrative commitment.11,21,29–31 Examples of formative diagnostic tools used within QUERI projects include organizational readiness and attitude/belief surveys32,33 (also see Tables 3 and 7). Such developmental data enable researchers to understand potential problems and, where possible, overcome them prior to initiation of interventions in study sites.

Table 7.

Developmental/Implementation/Progress FE

SCI QUERI: respiratory vaccine initiative
Implementation goal: research evidence indicates that persons who have negative attitudes about influenza vaccine (safety, effectiveness, utility) are less likely to seek it out annually. So, SCI QUERI identified both providers and patients to receive educational information to directly and indirectly reduce such negative attitudes.
FE activity: attitude data were collected from both SCI staff52 and patients53 longitudinally over 3 years by questionnaire to both identify initial issues with beliefs about vaccination and to track changes over time.
Value: data regarding prevalent negative attitudes were used to construct targeted questionnaires as well as educational materials to counter these attitudes. For example, staff materials during the second year were changed to include current data regarding their patients in terms of the percent who endorsed specific incorrect statements about vaccines.

In addition to information available from existent databases about current practice or setting characteristics, formative data can be collected from experts and representative clinicians/administrators. For example, negative unintended consequences might be prospectively identified by key informant or focus group interviews. This participatory approach may also facilitate commitment among targeted users.34

Implementation-Focused Evaluation

This type of FE occurs throughout implementation of the project plan. It focuses on analysis of discrepancies between the plan and its operationalization and identifies influences that may not have been anticipated through developmental activity. As Hulscher et al. note in a relevant overview of “process” evaluation, FE allows “researchers and implementers to (a) describe the intervention in detail, (b) evaluate and measure actual exposure to the intervention, and (c) describe the experience of those exposed” (13, p. 40), and to do so concurrently. It also focuses on the dynamic context within which change is taking place, an increasingly recognized element of implementation.37–40

Implementation-focused formative data enable researchers to describe and understand more fully the major barriers to goal achievement and what it actually takes to achieve change, including the timing of project activities. By describing the actuality of implementation, new interventions may be revealed. In terms of timing, formative data can clarify the true length of time needed to complete an intervention, as failure to achieve results could relate to insufficient intervention time.

Implementation-focused formative data are also used to keep the strategies on track and, as a result, to optimize the likelihood of effecting change by resolving actionable barriers, enhancing identified levers of change, and refining components of the implementation interventions. Rather than identify such modifiable components on a post hoc basis, FE provides timely feedback to lessen the likelihood of type III errors (see Tables 4, 6, 7, and 9).

Table 9.

Implementation/Interpretive FE

HIV/AIDS QUERI
Implementation goal: as in many conditions, HIV care processes fall short of best practice recommendations. We sought to implement and evaluate real-time computerized clinical reminders (CR) and a collaborative intensive quality improvement program, based on the Institute for Healthcare Improvement Breakthrough Series (IQS).
FE activity:
i. By conducting ethnographic interviews and observations at the participating CR sites, we assessed existing organizational barriers and facilitators regarding use of reminders.
ii. The degree to which each site implemented the IQS improvement technique was measured at regular intervals using a Site Activation Scale (SAS) developed specifically for this project. Longitudinal tracking of SAS scores allowed calculation of time-to-improvement as well as qualitative comparisons of activation. We also assessed whether specific barriers or characteristics were associated with how well a site scored on activation.
Value:
i. We identified 6 barriers to the effective use of the CR intervention, including workload, lack of time to follow-up, inapplicability, limited training, interruption of patient-provider face time, and use of paper for physician orders at some sites. Seventeen prioritized short- and long-term recommendations were generated to improve the usefulness and usability of the 9 reminders.59
ii. By using the SAS scoring system, we were able to identify that higher performing IQS sites rapidly adopted and applied basic quality improvement concepts, like the PDSA (Plan, Do, Study, Act) cycle. In contrast, we were able to see that lower performing IQS sites were slow to begin applying key concepts and had not completed an entire PDSA cycle well into an action period. Although the contrast between levels of activation was not always marked, the sites showing the most rapid adoption and generalization of quality improvement concepts tended to show significant improvements in care. IQS FE was also used to refine the intervention, to some extent, during the project, and to a major extent for the next “nationwide rollout” phase.

In summary, FE data collected at this stage offer several advantages. They can (a) highlight actual versus planned interventions, (b) enable implementation through identification of modifiable barriers, (c) facilitate any needed refinements in the original implementation intervention, (d) enhance interpretation of project results, and (e) identify critical details and guidance necessary for replication of results in other clinical settings.

Measurement within this stage can be a simple or complex task. Table 5 describes several critical issues that researchers should consider. As with other aspects of FE, both quantitative and qualitative approaches can be used.

Table 5.

Critical Measures of Implementation

1. Integrity of the innovation43–45
Data are needed that specify the fidelity of the implementation intervention, i.e., the extent to which the intervention was actually implemented or delivered. For example, to what extent did the opinion leader “lead”; in what way and to what degree did administration provide “support”; or to what extent did a case manager or educational outreach worker perform role expectations?
Clear operational definitions for each component of the implementation intervention are required. These should relate to the conceptual rationale for the intervention's selection.
When variable degrees of delivery are possible, the dose of the intervention delivery should be measured. If relevant, this quantitative score may be correlated with targeted outcomes.
2. Exposure to the innovation
Data are needed that specify the degree to which delivered products are actually experienced by the targeted users. For example, education may be 1 element of an implementation strategy that is delivered, yet may only reach or be accessed by a limited number of targeted users; likewise, written or web-based materials may be disseminated but not received or read.
Clear operational definitions for each related exposure are required. These too should relate to the conceptual rationale for the intervention's selection.
When variable degrees of experience are possible, the dose of the exposure should be measured. This quantitative score can be correlated with targeted outcomes. Also, it is possible that an intermediate result of exposure should be measured, i.e., a change in attitude or knowledge, or engagement in a targeted process, rather than mere presence/attention.
3. Intensity of implementation
When multifaceted interventions are used, an overall implementation or intensity score can be considered. This overall level of effort may relate to the number of change interventions. However, it is probably more productive to measure intensity both within and across conceptual categories of interventions, e.g., efforts geared to change knowledge versus behavior versus systems.
An overall score per site can be calculated as Boyd and Windsor46 did with their numeric “Program Implementation Index.” Another approach is use of goal attainment scaling.47,48 This enables comparison across sites or units within a site, especially in those studies where “local adaptation” or choice of alternative interventions is a selected strategy.
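The scoring ideas in Table 5 can be illustrated with a minimal sketch. It is not the Boyd and Windsor index itself, whose exact formula is not given here. The function `implementation_index` is a hypothetical weighted share of planned components actually delivered, and `gas_t_score` uses the commonly cited Kiresuk-Sherman T-score form of goal attainment scaling, with attainment levels from -2 to +2 and an assumed inter-goal correlation of 0.3. All names, weights, and site values are illustrative assumptions.

```python
from math import sqrt


def implementation_index(delivered, weights=None):
    """Hypothetical per-site index: weighted share of planned components delivered.

    delivered maps a component name to the fraction delivered (0.0 to 1.0).
    weights maps the same names to relative importance (default: equal weights).
    """
    if weights is None:
        weights = {name: 1.0 for name in delivered}
    total = sum(weights[name] for name in delivered)
    achieved = sum(weights[name] * delivered[name] for name in delivered)
    return achieved / total


def gas_t_score(attainment, weights=None, rho=0.3):
    """Goal attainment scaling T-score (Kiresuk-Sherman form).

    attainment holds one level per goal on the -2..+2 scale (0 = expected level).
    rho is the assumed average correlation between goal scores.
    """
    if weights is None:
        weights = [1.0] * len(attainment)
    weighted_sum = sum(w * x for w, x in zip(weights, attainment))
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    denominator = sqrt((1 - rho) * sum_w2 + rho * sum_w ** 2)
    return 50 + 10 * weighted_sum / denominator


# Illustrative (made-up) site: reminders fully installed, education half delivered.
site_a = {"clinical_reminder": 1.0, "staff_education": 0.5}
print(implementation_index(site_a))   # 0.75
print(gas_t_score([0, 0, 0]))         # 50.0 when every goal is met as expected
```

Because the T-score is centered at 50 regardless of which goals a site chose, it offers one way to compare sites that pursued different locally adapted interventions.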

Progress-Focused Evaluation

This type of FE occurs during implementation of study strategies, but focuses on monitoring impacts and indicators of progress toward goals. The proactive nature of FE is emphasized, as progress data become feedback about the degree of movement toward desired outcomes. Using implementation data on dose, intensity, and barriers, factors blocking progress may be identified. Steps can then be taken to optimize the intervention and/or reinforce progress via positive feedback to key players. As Krumholz and Herrin49 suggest, waiting until implementation is completed to assess results “obscures potentially important information … about trends in practice during the study [that] could demonstrate if an effort is gaining momentum—or that it is not sustainable” (see Tables 6 and 7).
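As a hedged illustration of progress monitoring, the sketch below flags a site whose monthly indicator has stopped improving, so that facilitation can be increased. The function name, the 3-month window, and the minimum monthly gain are assumptions chosen for illustration, not values drawn from any QUERI project.

```python
def flag_stagnation(monthly_values, window=3, min_gain=0.01):
    """Return True when each of the last `window` month-to-month gains is below min_gain.

    monthly_values is a chronological list of a progress indicator,
    e.g. the proportion of patients receiving a targeted practice.
    """
    if len(monthly_values) < window + 1:
        return False  # not enough history to judge a trend
    recent = monthly_values[-(window + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


# Illustrative (made-up) series: early gains followed by a plateau.
plateau_site = [0.40, 0.50, 0.60, 0.60, 0.60, 0.60]
improving_site = [0.40, 0.50, 0.60, 0.70]
print(flag_stagnation(plateau_site))    # True
print(flag_stagnation(improving_site))  # False
```

A flag of this kind is only a prompt for follow-up; the qualitative data on barriers remain necessary to explain why progress has stalled.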

Interpretive Evaluation

This stage is usually not considered a type of FE but deserves separate attention, given its role in the illumination of the black box of implementation/change. Specifically, FE data provide alternative explanations for results, help to clarify the meaning of success in implementation, and enhance understanding of an implementation strategy's impact or “worth.” Such “black box” interpretation occurs through the end point triangulation of qualitative and quantitative FE data, including associational relationships with impacts.

Interpretive FE uses the results of all other FE stages. In addition, interpretive information can be collected at the end of the project about key stakeholder experiences. Stakeholders include individuals expected to put evidence into practice as well as those individuals expected to support that effort. These individuals can be asked about their perceptions of the implementation program, its interventions, and changes required of them and their colleagues.10,13,27,38,46,54 Information can be obtained on stakeholder views regarding (a) usefulness or value of each intervention, (b) satisfaction or dissatisfaction with various aspects of the process, (c) reasons for their own program-related action or inaction, (d) additional barriers and facilitators, and (e) recommendations for further refinements.

Information can also be obtained regarding the degree to which stakeholders believe the implementation project was successful, as well as the overall “worth” of the implementation effort. Statistical significance will be calculated using the summative data. However, as inferential statistical significance does not necessarily equate with clinical significance, it is useful to obtain perceptions of stakeholders relative to the “meaning” of statistical findings. For some stakeholders, this meaning will be placed in the context of the cost of obtaining the change relative to its perceived benefits (see Tables 8–10).

Table 8.

Interpretive FE

Mental health (MH) QUERI
Implementation goal: an antipsychotic treatment improvement program (ATIP) was designed to implement an evidence-based treatment model for patients with schizophrenia. A second program, TIDES/WAVES, was designed to improve care for patients with depression.
FE activity: for both projects a series of formative evaluations continuously assessed the extent and nature of facilitation by the research team, intervention activities conducted by site personnel, and related barriers and facilitators.
Value:
i. In the ATIP project, FE demonstrated a correlation between the amount of external facilitation and improvement in antipsychotic dosing, the extent to which lack of clinician time inhibits dissemination and QI activities, and the acceptability and utility of MH QUERI tools/products.
ii. In the depression project, FE has generated information about factors influencing caseload capacity for nurse “care managers” who maintain regular contact with depressed patients to monitor treatment adherence, assess symptomatic improvement, provide patient education, and facilitate communication among patients, primary care clinicians, and mental health specialists.
Substance use disorder QUERI
Implementation goal: the opioid agonist therapy effectiveness initiative was designed to implement evidence-based treatment for patients with opioid dependence.50,51
FE activity: during the project, the research team took notes on all facilitation-related telephone calls. They also collected monthly, semi-structured interview data from program leaders. As the project moved forward, it became clear that some sites were very successful in applying the treatment plan while others stagnated.
Value: while the FE provided some suggestions about how to modify the intervention and tools, the data were also useful to accurately “tell the story” of the individual clinics and to illustrate differences between clinics that were successful with implementation versus those that were not.

Formative evaluation, as a descriptive assessment activity, does not per se test hypotheses. However, within an experimental study, in-depth data from a concurrent FE can provide working hypotheses to explain successes or failures, particularly when the implementation and evaluation plans are grounded in a conceptual framework.55–57 In this respect, interpretive FE may be considered as case study data that contribute to theory building.58 Overall, FE data may provide evidence regarding targeted components of a conceptual framework, insights into the determinants of behavior or system change, and hypotheses for future testing.

CHALLENGES OF CONDUCTING FE

Formative evaluation is a new concept as applied to health services research and, as such, presents multiple challenges. Some researchers may need help in understanding how FE can be incorporated into a study design. Formative evaluation is also a time-consuming activity, and project leaders may need to be convinced of its utility before committing study resources. In addition, much remains to be learned about effective approaches to the following types of issues:

  1. In the well-controlled RCT, researchers do not typically modify an experimental intervention once approved. However, in improvement-oriented research, critical problems that prevent an optimal test of the planned implementation can be identified and resolved. Such actions may result in alterations to the original plan. The challenge for the researcher is to identify that point at which modifications create a different intervention or add an additional intervention. Likewise, when the researcher builds in “local adaptation,” the challenge is to determine its limits or clarify statistical methods available to control for the differences. An implementation framework and clear identification of the underlying conceptual nature of each intervention can facilitate this process. As Hawe et al.43 suggest, the researcher has to think carefully about the “essence of the intervention” in order to understand the actual nature of implementation and the significance of formative modifications.
  2. Implementation and QI researchers may encounter the erroneous view that FE involves only qualitative research or that it is not rigorous, e.g., that it consists of “just talking to a few people.” However, FE neither lacks rigor nor is it simply a matter of qualitative research or a specific qualitative methodology. Rather, FE involves selecting among rigorous qualitative and quantitative methods to accomplish a specific set of aims, with a plan designed to produce credible data relative to explicit formative questions.61
  3. A critical challenge for measurement planning is selection or development of methods that yield quantitative data for the following types of issues: (a) assessment of associations between outcome findings and the intensity, dose, or exposure to interventions and (b) measurement of the adaptations of a “standard” protocol across diverse implementation settings.62 Whether flexibility is a planned or unplanned component of a study, it should be measured in some consistent, quantifiable fashion that enables cross-site comparisons. Goal attainment scaling is one possibility.47,48
  4. A final issue facing implementation researchers is how to determine the degree to which FE activities influence the results of an implementation project. If FE itself is an explicit intervention, it will need to be incorporated into recommendations for others who wish to replicate the study's results. More specifically, the researcher must systematically reflect upon why formative data were collected, how they were used, by whom they were used, and to what end. For example, to what extent did FE enable refinement of the implementation intervention such that the likelihood of encountering barriers in the future is adequately diminished? Or, in examining implementation issues across study sites, to what extent did FE provide information that led to modifications at individual sites? If the data and subsequent adjustments at individual sites were deemed critical to project success, upon broader dissemination to additional sites, what specific FE activities should be replicated, and by whom?
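
As an illustration of the third issue, the sketch below shows one way goal attainment scaling could turn site-level adaptations into a single comparable number. It is not drawn from the QUERI projects. It assumes the conventional Kiresuk-Sherman T-score, in which each goal is rated from -2 (much less than expected) to +2 (much more than expected), goals carry importance weights, and the inter-goal correlation is set to the customary value of 0.3. The function name `gas_t_score`, the site labels, and all ratings and weights are hypothetical.

```python
from math import sqrt


def gas_t_score(attainment, weights=None, rho=0.3):
    """Return the Kiresuk-Sherman goal attainment T-score.

    attainment: per-goal ratings on the -2..+2 scale (0 = expected level).
    weights:    per-goal importance weights; equal weights if omitted.
    rho:        assumed correlation among goal scores (0.3 by convention).
    """
    if not attainment:
        raise ValueError("at least one goal rating is required")
    if weights is None:
        weights = [1.0] * len(attainment)
    if len(weights) != len(attainment):
        raise ValueError("weights and attainment must have the same length")
    if any(x < -2 or x > 2 for x in attainment):
        raise ValueError("ratings must lie between -2 and +2")

    weighted_sum = sum(w * x for w, x in zip(weights, attainment))
    # Denominator scales the weighted sum so that T has mean 50 and SD 10.
    denominator = sqrt(
        (1 - rho) * sum(w * w for w in weights) + rho * sum(weights) ** 2
    )
    return 50 + 10 * weighted_sum / denominator


# Hypothetical data: three implementation goals rated at two sites,
# with the first goal weighted twice as heavily as the others.
goal_weights = [2, 1, 1]
site_ratings = {
    "site_a": [1, 0, 2],
    "site_b": [-1, 0, 0],
}
scores = {
    site: gas_t_score(ratings, goal_weights)
    for site, ratings in site_ratings.items()
}
for site, score in scores.items():
    print(f"{site}: T = {score:.1f}")
```

A score of 50 means that goals were met at the expected level on balance, so values above or below 50 give a consistent basis for cross-site comparison. The weights and the value of `rho` are analytic choices that a study team would need to justify for its own setting.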

SUMMARY

Formative evaluation is a study approach that is often key to the success, interpretation, and replication of the results of implementation/QI projects. Formative evaluation can save time and frustration as data highlight factors that impede the ability of clinicians to implement best practices. It can also identify at an early stage whether desired outcomes are being achieved so that implementation strategies can be refined as needed; it can make the realities and “black box” nature of implementation more transparent to decision makers; and it can increase the likelihood of obtaining credible summative results about effectiveness and transferability of an implementation strategy. Formative evaluation helps to meet the many challenges to effective implementation and its scientific study, thereby facilitating integration of research findings into practice and improvement of patient care.

Acknowledgments

The work reported here was supported by the Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service. The views expressed in this article are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs.

REFERENCES