How reliable is peer review? An examination of operating grant proposals simultaneously submitted to two similar peer review systems

Peering at peer review revealed high degree of chance associated with funding of grant applications

Journal of Clinical Epidemiology, 2006

Background and Objectives: There is a persistent degree of uncertainty and dissatisfaction with the peer review process, underlining the need to validate current grant-awarding procedures. This study compared the CLassic Structured Scientific In-depth two-reviewer critique (CLASSIC) with an all panel members' independent ranking method (RANKING). Eleven reviewers reviewed 32 applications for a pilot project competition at a major university medical center.
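
To make the comparison of the two systems concrete, the sketch below shows one possible summary: the Spearman rank correlation between the orderings that two review methods give the same 32 applications. The scores are synthetic and the choice of metric is an assumption made for illustration, not the paper's own analysis.

```python
# Minimal sketch (synthetic data): summarizing agreement between two review
# systems as the Spearman rank correlation of the orderings they produce.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n_apps = 32  # as in the pilot project competition

quality = rng.normal(0, 1, size=n_apps)                   # latent application quality
classic_score = quality + rng.normal(0, 1, size=n_apps)   # two-reviewer in-depth critique
ranking_score = quality + rng.normal(0, 1, size=n_apps)   # all-panel independent ranking

rho, p = spearmanr(classic_score, ranking_score)
print(f"Spearman rho between CLASSIC and RANKING orderings: {rho:.2f} (p = {p:.3f})")
```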

Peer Review of Grant Applications: Criteria Used and Qualitative Study of Reviewer Practices

PLoS ONE, 2012

Background: Peer review of grant applications has been criticized as lacking reliability. Studies showing poor agreement among reviewers supported this possibility but usually focused on reviewers' scores and failed to investigate reasons for disagreement. Here, our goal was to determine how reviewers rate applications, by investigating reviewer practices and grant assessment criteria.

Peer review of grant applications in biology and medicine. Reliability, fairness, and validity

Scientometrics, 2009

This paper examines the peer review procedure of a national science funding organization (the Swiss National Science Foundation) by means of the three most frequently studied criteria: reliability, fairness, and validity. The analyzed data consist of 496 applications for project-based funding in biology and medicine from the year 1998. Overall reliability is found to be fair, with an intraclass correlation coefficient of 0.41 and sizeable differences between biology (0.45) and medicine (0.20). Multiple logistic regression models reveal only scientific performance indicators as significant predictors of the funding decision, while all potential sources of bias (gender, age, nationality, and academic status of the applicant, requested amount of funding, and institutional surrounding) are non-significant predictors. Bibliometric analysis provides evidence that the decisions of a public funding organization for basic project-based research are in line with the future publication success of applicants. The paper also argues for an expansion of approaches and methodologies in peer review research by increasingly focusing on process rather than outcome and by including a more diverse set of methods, e.g., content analysis. Such an expansion will be necessary to advance peer review research beyond the abundantly treated questions of reliability, fairness, and validity.
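
For readers unfamiliar with the reliability metric quoted above, the following minimal sketch shows how a one-way random-effects intraclass correlation (ICC) can be estimated from an applications-by-reviewers score matrix. The data are synthetic, with variance components chosen so the expected ICC is roughly 0.4; nothing here reproduces the SNSF dataset.

```python
# Minimal sketch: one-way random-effects ICC from an applications x reviewers
# score matrix, using synthetic scores (not the SNSF data).
import numpy as np

rng = np.random.default_rng(0)
n_apps, n_reviewers = 30, 3

true_quality = rng.normal(0, 1.0, size=(n_apps, 1))
noise = rng.normal(0, 1.2, size=(n_apps, n_reviewers))
scores = true_quality + noise  # expected ICC = 1 / (1 + 1.2**2), i.e. about 0.41

def icc1(x):
    """ICC(1): (MSB - MSW) / (MSB + (k - 1) * MSW) for an n x k score matrix."""
    n, k = x.shape
    row_means = x.mean(axis=1)
    ms_between = k * ((row_means - x.mean()) ** 2).sum() / (n - 1)
    ms_within = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

print(f"ICC(1) = {icc1(scores):.2f}")
```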

Individual versus general structured feedback to improve agreement in grant peer review: a randomized controlled trial

Research Integrity and Peer Review

Background: Vast sums are distributed based on grant peer review, but studies show that interrater reliability is often low. In this study, we tested the effect of receiving two short individual feedback reports, compared to one short general feedback report, on the agreement between reviewers. Methods: A total of 42 reviewers at the Norwegian Foundation Dam were randomly assigned to receive either a general feedback report or an individual feedback report. The general feedback group received one report before the start of the reviews that contained general information about the previous call in which the reviewers participated. In the individual feedback group, the reviewers received two reports, one before the review period (based on the previous call) and one during the period (based on the current call). In the individual feedback group, the reviewers were presented with detailed information on their scoring compared with the review committee as a whole, both before and during the r...
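
As a rough illustration of the kind of outcome such a trial compares, the sketch below randomizes reviewers to two arms and summarizes each reviewer's agreement as the mean absolute deviation of their scores from the committee mean. The arm labels, effect size, and outcome definition are assumptions made for illustration, not the trial's registered endpoints.

```python
# Minimal sketch (synthetic data): comparing reviewer-committee agreement
# between two randomized feedback arms.
import numpy as np

rng = np.random.default_rng(3)
n_reviewers, n_apps = 42, 20

arm = rng.permutation(np.array(["general"] * 21 + ["individual"] * 21))
committee_mean = rng.uniform(2, 5, size=n_apps)  # hypothetical consensus score per application

# Assume, purely for illustration, that individual feedback tightens reviewers'
# scores around the committee mean.
spread = np.where(arm == "individual", 0.8, 1.1)
scores = committee_mean + rng.normal(0, 1, size=(n_reviewers, n_apps)) * spread[:, None]

deviation = np.abs(scores - committee_mean).mean(axis=1)  # one agreement summary per reviewer
for a in ("general", "individual"):
    print(f"{a} feedback: mean absolute deviation = {deviation[arm == a].mean():.2f}")
```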

'Your comments are meaner than your score': score calibration talk influences intra- and inter-panel variability during scientific grant peer review

In scientific grant peer review, groups of expert scientists meet to engage in the collaborative decision-making task of evaluating and scoring grant applications. Prior research on grant peer review has established that inter-reviewer reliability is typically poor. In the current study, experienced reviewers for the National Institutes of Health (NIH) were recruited to participate in one of four constructed peer review panel meetings. Each panel discussed and scored the same pool of recently reviewed NIH grant applications. We examined the degree of intra-panel variability in panels' scores of the applications before versus after collaborative discussion, and the degree of inter-panel variability. We also analyzed videotapes of reviewers' interactions for instances of one particular form of discourse, Score Calibration Talk, as one factor influencing the variability we observe. Results suggest that although reviewers within a single panel agree more following collaborative discussion, different panels agree less after discussion, and Score Calibration Talk plays a pivotal role in scoring variability during peer review. We discuss implications of this variability for the scientific peer review process.

As the primary means by which scientists secure funding for their research programs, grant peer review is a keystone of scientific research. The largest funding agency for biomedical, behavioral, and clinical research in the USA, the National Institutes of Health (NIH), spends more than 80% of its $30.3 billion annual budget on funding research grants evaluated via peer review (NIH 2016). As part of the mechanism by which this money is allocated to scientists, collaborative peer review panels of expert scientists (referred to as 'study sections' at NIH) convene to evaluate grant applications and assign scores that inform later funding decisions by NIH governance. Thus, deepening our understanding of how peer review ostensibly identifies the most promising, innovative research is crucial for the scientific community writ large. The present study builds upon existing work evaluating the reliability of peer review by examining how the discourse practices of reviewers during study section meetings may contribute to low reliability in peer review outcomes. The NIH peer review process is structured around study sections that engender distributed expertise (Brown et al. 1993), as reviewers evaluate applications based on their particular domain(s) of expertise but then share their specialized knowledge with others who have related but distinct expertise. The very structure of study sections...
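
The two quantities contrasted in this study can be made concrete with a small simulation: intra-panel variability (the spread of scores within one panel) and inter-panel variability (the spread of panel-level means across panels), computed before and after discussion. The panel sizes, score scale, and the 'calibration' model of discussion below are illustrative assumptions, not the study's data.

```python
# Minimal sketch (synthetic data): intra-panel vs. inter-panel variability in
# scores, before and after a toy model of panel discussion.
import numpy as np

rng = np.random.default_rng(1)
n_apps, n_panels, n_reviewers = 25, 4, 8

# Preliminary scores on an NIH-style 1 (exceptional) to 9 (poor) scale.
pre = rng.integers(1, 10, size=(n_apps, n_panels, n_reviewers)).astype(float)

# Toy discussion model: each panel pulls its members toward a noisy panel-specific
# anchor, so members converge within a panel while panels drift apart.
anchor = pre.mean(axis=2, keepdims=True) + rng.normal(0, 1.0, size=(n_apps, n_panels, 1))
post = 0.4 * pre + 0.6 * anchor

def intra_panel_sd(s):
    """Average within-panel standard deviation of reviewer scores."""
    return s.std(axis=2, ddof=1).mean()

def inter_panel_sd(s):
    """Average across applications of the spread of panel mean scores."""
    return s.mean(axis=2).std(axis=1, ddof=1).mean()

for label, s in (("before discussion", pre), ("after discussion", post)):
    print(f"{label}: intra-panel SD = {intra_panel_sd(s):.2f}, "
          f"inter-panel SD = {inter_panel_sd(s):.2f}")
```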

Low agreement among reviewers evaluating the same NIH grant applications

Obtaining grant funding from the National Institutes of Health (NIH) is increasingly competitive, as funding success rates have declined over the past decade. To allocate relatively scarce funds, scientific peer reviewers must differentiate the very best applications from comparatively weaker ones. Despite the importance of this determination, little research has explored how reviewers assign ratings to the applications they review and whether there is consistency in the reviewers' evaluation of the same application. Replicating all aspects of the NIH peer-review process, we examined 43 individual reviewers' ratings and written critiques of the same group of 25 NIH grant applications. Results showed no agreement among reviewers regarding the quality of the applications in either their qualitative or quantitative evaluations. Although all reviewers received the same instructions on how to rate applications and format their written critiques, we also found no agreement in how reviewers "translated" a given number of strengths and weaknesses into a numeric rating. It appeared that the outcome of the grant review depended more on the reviewer to whom the grant was assigned than on the research proposed in the grant. This research replicates the NIH peer-review process to examine in detail the qualitative and quantitative judgments of different reviewers examining the same application, and our results have broad relevance for scientific grant peer review.

Keywords: peer review | social sciences | interrater reliability | linear mixed-effects models

In the past decade, funding at the National Institutes of Health (NIH) has increased at a much slower rate (1) than the number of grant applications (2), and consequently, success rates have steadily declined (3). There are more deserving grant applications than there are available funds, so it is critical to ensure that the process responsible for awarding such funds, grant peer review, reliably differentiates the very best applications from the comparatively weaker ones. Research on grant peer review is inconclusive: Some studies suggest that it is unreliable (4-13) and potentially biased (14-17), whereas others show the validity of review systems and final outcomes (18-20). However, even if peer review effectively discriminates the good applications from the bad, it is now imperative to empirically assess whether, in this culture of decreasing funding rates, it can discriminate the good from the excellent within a pool of high-quality applications. As Chubin and Hackett (21) argue, intensified competition for resources harms peer review because funding decisions rely on an evaluation process that is not designed to distinguish among applications of similar quality, a scenario that they argue is most prevalent at the NIH. Indeed, the findings in the present paper suggest that, in fact, reviewers are unable to differentiate excellent applications (i.e., those funded by the NIH in the first round) from good applications (i.e., those unfunded but later funded by the NIH after subsequent revisions). Because the grant peer-review process at NIH is confidential, the only way to systematically examine it is to replicate the process outside of the NIH in a highly realistic manner. This is precisely what we did in the research reported in this paper. We recruited 43 oncology researchers from across the United States to participate in one of four peer-review panels (called "study sections" at NIH), each composed of 8-12 reviewers. Fig. 1 presents a deidentified image from one study section meeting. We solicited 25 oncology grant applications submitted to NIH as R01s, the most competitive and highly sought-after type of grant at NIH, between 1 and 4 y before our study. Sixteen of these were funded in the first round (i.e., the best applications), whereas 9 of these were funded only after subsequent resubmission (i.e., the good applications). The NIH uses a two-stage review process. In the first stage, two to five reviewers individually evaluate each grant application by assigning a preliminary rating using the NIH's reverse 9-point scale (1 = exceptional, 9 = poor) and writing a critique describing the application's strengths and weaknesses. Most typically, three reviewers are assigned to an application: a primary, a secondary, and a tertiary reviewer, ranked in order of the relevance of their expertise. Reviewers then convene in study section meetings, where they discuss the applications that received preliminary ratings in the top half of all applications evaluated. After sharing their preliminary ratings and critiques, the two to five assigned reviewers discuss the application with all other study section members, all of whom assign a final rating to the application. This final rating from all members is averaged into a final "priority score." In the second stage, members of NIH's advisory councils use this priority score and the written critiques to make funding decisions.

Significance: Scientific grant peer reviewers must differentiate the very best applications from comparatively weaker ones. Despite the importance of this determination in allocating funding, little research has explored how reviewers derive their assigned ratings for the applications they review or whether this assessment is consistent when the same application is evaluated by different sets of reviewers. We replicated the NIH peer-review process to examine the qualitative and quantitative judgments of different reviewers examining the same grant application. We found no agreement among reviewers in evaluating the same application. These findings highlight the subjectivity in reviewers' evaluations of grant applications and underscore the difficulty in comparing the evaluations of different applications from different reviewers, which is how peer review actually unfolds.
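
A simple variance decomposition helps make precise the claim that the outcome depended more on the assigned reviewer than on the application. The sketch below uses a fully crossed synthetic design (every reviewer rates every application) purely for convenience; the actual study assigned a few reviewers per application and analyzed ratings with linear mixed-effects models.

```python
# Minimal sketch (synthetic, fully crossed data): how much rating variance is
# attributable to applications vs. reviewers vs. residual noise.
import numpy as np

rng = np.random.default_rng(2)
n_apps, n_reviewers = 25, 43

app_effect = rng.normal(0, 0.4, size=(n_apps, 1))            # true quality differences (small)
reviewer_effect = rng.normal(0, 0.9, size=(1, n_reviewers))  # reviewer leniency/harshness (large)
residual = rng.normal(0, 0.8, size=(n_apps, n_reviewers))
ratings = 5 + app_effect + reviewer_effect + residual        # centered on the 1-9 scale midpoint

def variance_components(x):
    """Application, reviewer, and residual variance components from a crossed
    rating matrix (two-way random-effects ANOVA without replication)."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1, keepdims=True)
    col_means = x.mean(axis=0, keepdims=True)
    ms_app = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_rev = n * ((col_means - grand) ** 2).sum() / (k - 1)
    ms_err = ((x - row_means - col_means + grand) ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_app - ms_err) / k, (ms_rev - ms_err) / n, ms_err

var_app, var_rev, var_err = variance_components(ratings)
total = var_app + var_rev + var_err
print(f"share of rating variance: application {var_app / total:.0%}, "
      f"reviewer {var_rev / total:.0%}, residual {var_err / total:.0%}")
```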

What do we know about grant peer review in the health sciences?

F1000Research, 2017

Background: Peer review decisions award >95% of academic medical research funding, so it is crucial to understand how well they work and if they could be improved. Methods: This paper summarises evidence from 105 relevant papers identified through a literature search on the effectiveness and burden of peer review for grant funding. Results: There is a remarkable paucity of evidence about the overall efficiency of peer review for funding allocation, given its centrality to the modern system of science. From the available evidence, we can identify some conclusions around the effectiveness and burden of peer review. The strongest evidence around effectiveness indicates a bias against innovative research. There is also fairly clear evidence that peer review is, at best, a weak predictor of future research performance, and that ratings vary considerably between reviewers. There is some evidence of age bias and cronyism. Good evidence shows that the burden of peer review is high and th...

What do we know about grant peer review in the health sciences? An updated review of the literature and six case studies

2018

In 2009, RAND Europe conducted a literature review in order to assess the effectiveness and efficiency of peer review for grant funding. This report presents an update to that review to reflect new literature on the topic, and adds case studies exploring peer review practice at six international funders. This report was produced with funding from the Canadian Institutes of Health Research. It will be of interest to government officials dealing with research funding policy, research funders including governmental and charitable funders, research institutions, researchers, and research users. Although the case studies focus on biomedical and health research, the literature review takes a broader scope, and the findings are likely to be relevant to wider research fields.

The Necessity of Commensuration Bias in Grant Peer Review

Ergo, 2022

Peer reviewers at many funding agencies and scientific journals are asked to score submissions both on individual criteria and overall. The overall scores should be some kind of aggregate of the criteria scores. Carole Lee identifies this as a potential locus for bias to enter the peer review process, which she calls commensuration bias. Here I view the aggregation of scores through the lens of social choice theory. I argue that in many situations, especially when reviewing grant proposals, it is impossible to avoid commensuration bias.
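
A toy example makes the social-choice framing tangible: when each criterion is treated as a 'voter', pairwise majorities over proposals can form a cycle, so any single overall score must override at least one criterion-level majority, and which one it overrides depends on how the criteria are weighted. The proposals, criteria, and scores below are invented for illustration and are not drawn from the paper.

```python
# Toy illustration: three hypothetical proposals scored on three criteria
# (lower is better). Treating each criterion as a "voter" yields a Condorcet
# cycle, so no overall score can respect every criterion-level majority.
from itertools import combinations

scores = {
    "A": {"significance": 1, "approach": 3, "innovation": 2},
    "B": {"significance": 2, "approach": 1, "innovation": 3},
    "C": {"significance": 3, "approach": 2, "innovation": 1},
}
criteria = ["significance", "approach", "innovation"]

def majority_prefers(x, y):
    """True if x beats y on a majority of criteria (lower score wins a criterion)."""
    wins = sum(scores[x][c] < scores[y][c] for c in criteria)
    return wins > len(criteria) / 2

for x, y in combinations(scores, 2):
    if majority_prefers(x, y):
        print(f"{x} beats {y} on a majority of criteria")
    elif majority_prefers(y, x):
        print(f"{y} beats {x} on a majority of criteria")

# The pairwise majorities form a cycle (A over B, B over C, C over A), while the
# unweighted mean ties all three proposals at 2.0. Any aggregate must therefore
# break at least one criterion-level majority; which one it breaks depends on the
# weights given to the criteria, which is the opening for commensuration bias.
```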