How reliable is peer review? An examination of operating grant proposals simultaneously submitted to two similar peer review systems
Peering at peer review revealed high degree of chance associated with funding of grant applications
Journal of Clinical Epidemiology, 2006
Background and Objectives: There is a persistent degree of uncertainty and dissatisfaction with the peer review process underlining the need to validate the current grant awarding procedures. This study compared the CLassic Structured Scientific In-depth two-reviewer critique (CLASSIC) with an all panel members' independent ranking method (RANKING). Eleven reviewers reviewed 32 applications for a pilot project competition at a major university medical center.
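The study's own statistics are not reproduced above, but a chance-corrected agreement measure makes the comparison concrete. The sketch below uses hypothetical fund/reject decisions (not the study's data) to compute Cohen's kappa between the outcomes implied by two review methods such as CLASSIC and RANKING.

```python
# Hypothetical sketch: chance-corrected agreement between the funding decisions
# implied by two review systems (e.g., a CLASSIC-style critique vs. a RANKING
# method). The decision vectors below are invented for illustration only.

from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two binary decision lists (1 = fund, 0 = reject)."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = Counter(a), Counter(b)
    expected = sum(pa[k] / n * pb[k] / n for k in (0, 1))
    return (observed - expected) / (1 - expected)

# 32 applications, 8 "funded" under each system (made-up overlap pattern).
classic = [1]*8 + [0]*24
ranking = [1]*3 + [0]*5 + [1]*5 + [0]*19

print(f"kappa = {cohens_kappa(classic, ranking):.2f}")  # low agreement once chance is removed
```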
Peer Review of Grant Applications: Criteria Used and Qualitative Study of Reviewer Practices
PLoS ONE, 2012
Background: Peer review of grant applications has been criticized as lacking reliability. Studies showing poor agreement among reviewers supported this possibility but usually focused on reviewers' scores and failed to investigate reasons for disagreement. Here, our goal was to determine how reviewers rate applications, by investigating reviewer practices and grant assessment criteria.
Peer review of grant applications in biology and medicine. Reliability, fairness, and validity
Scientometrics, 2009
This paper examines the peer review procedure of a national science funding organization (the Swiss National Science Foundation) by means of the three most frequently studied criteria: reliability, fairness, and validity. The analyzed data consist of 496 applications for project-based funding in biology and medicine from the year 1998. Overall reliability is found to be fair, with an intraclass correlation coefficient of 0.41 and sizeable differences between biology (0.45) and medicine (0.20). Multiple logistic regression models reveal only scientific performance indicators as significant predictors of the funding decision, while all potential sources of bias (gender, age, nationality, and academic status of the applicant, requested amount of funding, and institutional surrounding) are non-significant predictors. Bibliometric analysis provides evidence that the decisions of a public funding organization for basic project-based research are in line with the future publication success of applicants. The paper also argues for an expansion of approaches and methodologies in peer review research by increasingly focusing on process rather than outcome and by including a more diverse set of methods, e.g., content analysis. Such an expansion will be necessary to advance peer review research beyond the abundantly treated questions of reliability, fairness, and validity.
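The intraclass correlation coefficient reported above can be illustrated with a minimal one-way ICC computation. The sketch below uses fabricated ratings, not the SNSF data; the noise level is chosen so the coefficient lands near the reported 0.41.

```python
# Minimal sketch (not the SNSF analysis itself): one-way intraclass correlation
# ICC(1) for a ratings matrix with one row per application and one column per
# reviewer. The ratings below are fabricated for illustration.

import numpy as np

def icc_oneway(ratings):
    """ICC(1): between-application variance relative to total variance."""
    n, k = ratings.shape                                  # n applications, k raters
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    ms_between = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_within = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

rng = np.random.default_rng(0)
true_quality = rng.normal(0, 1, size=50)                       # latent proposal quality
ratings = true_quality[:, None] + rng.normal(0, 1.2, (50, 3))  # three noisy reviewers
print(f"ICC(1) = {icc_oneway(ratings):.2f}")                   # modest, e.g. around 0.4
```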
Research Integrity and Peer Review
Background: Vast sums are distributed based on grant peer review, but studies show that interrater reliability is often low. In this study, we tested the effect of receiving two short individual feedback reports compared to one short general feedback report on the agreement between reviewers. Methods: A total of 42 reviewers at the Norwegian Foundation Dam were randomly assigned to receive either a general feedback report or an individual feedback report. The general feedback group received one report before the start of the reviews that contained general information about the previous call in which the reviewers participated. In the individual feedback group, the reviewers received two reports, one before the review period (based on the previous call) and one during the period (based on the current call). In the individual feedback group, the reviewers were presented with detailed information on their scoring compared with the review committee as a whole, both before and during the review period.
In scientific grant peer review, groups of expert scientists meet to engage in the collaborative decision-making task of evaluating and scoring grant applications. Prior research on grant peer review has established that inter-reviewer reliability is typically poor. In the current study, experienced reviewers for the National Institutes of Health (NIH) were recruited to participate in one of four constructed peer review panel meetings. Each panel discussed and scored the same pool of recently reviewed NIH grant applications. We examined the degree of intra-panel variability in panels' scores of the applications before versus after collaborative discussion, and the degree of inter-panel variability. We also analyzed videotapes of reviewers' interactions for instances of one particular form of discourse, Score Calibration Talk, as one factor influencing the variability we observe. Results suggest that although reviewers within a single panel agree more following collaborative discussion, different panels agree less after discussion, and Score Calibration Talk plays a pivotal role in scoring variability during peer review. We discuss implications of this variability for the scientific peer review process. As the primary means by which scientists secure funding for their research programs, grant peer review is a keystone of scientific research. The largest funding agency for biomedical, behavioral, and clinical research in the USA, the National Institutes of Health (NIH), spends more than 80% of its $30.3 billion annual budget on funding research grants evaluated via peer review (NIH 2016). As part of the mechanism by which this money is allocated to scientists, collaborative peer review panels of expert scientists (referred to as 'study sections' at NIH) convene to evaluate grant applications and assign scores that inform later funding decisions by NIH governance. Thus, deepening our understanding of how peer review ostensibly identifies the most promising, innovative research is crucial for the scientific community writ large. The present study builds upon existing work evaluating the reliability of peer review by examining how the discourse practices of reviewers during study section meetings may contribute to low reliability in peer review outcomes. The NIH peer review process is structured around study sections that engender distributed expertise (Brown et al. 1993), as reviewers evaluate applications based on their particular domain(s) of expertise but then share their specialized knowledge with others who have related but distinct expertise. The very structure of study sections
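The intra- versus inter-panel variability contrast can be made concrete with a small numerical sketch. The panels and scores below are invented, chosen only to mimic the reported pattern (panels converge internally while drifting apart from one another); this is not the study's analysis.

```python
# Illustrative sketch only: contrast within-panel and between-panel variability of
# scores for one application, before and after discussion. All scores are invented.

import numpy as np

def variability(panel_scores):
    """Return (mean within-panel SD, SD of panel means) for a list of panels."""
    within = np.mean([np.std(p, ddof=1) for p in panel_scores])
    between = np.std([np.mean(p) for p in panel_scores], ddof=1)
    return within, between

before = [[3, 5, 4, 6], [2, 5, 3, 6], [4, 6, 3, 5], [3, 4, 6, 5]]  # pre-discussion
after  = [[3, 3, 4, 3], [5, 6, 5, 6], [2, 2, 3, 2], [6, 5, 6, 6]]  # post-discussion

for label, scores in [("before", before), ("after", after)]:
    w, b = variability(scores)
    print(f"{label}: within-panel SD = {w:.2f}, between-panel SD = {b:.2f}")
```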
Low agreement among reviewers evaluating the same NIH grant applications
Obtaining grant funding from the National Institutes of Health (NIH) is increasingly competitive, as funding success rates have declined over the past decade. To allocate relatively scarce funds, scientific peer reviewers must differentiate the very best applications from comparatively weaker ones. Despite the importance of this determination, little research has explored how reviewers assign ratings to the applications they review and whether there is consistency in the reviewers' evaluation of the same application. Replicating all aspects of the NIH peer-review process, we examined 43 individual reviewers' ratings and written critiques of the same group of 25 NIH grant applications. Results showed no agreement among reviewers regarding the quality of the applications in either their qualitative or quantitative evaluations. Although all reviewers received the same instructions on how to rate applications and format their written critiques, we also found no agreement in how reviewers "translated" a given number of strengths and weaknesses into a numeric rating. It appeared that the outcome of the grant review depended more on the reviewer to whom the grant was assigned than the research proposed in the grant. This research replicates the NIH peer-review process to examine in detail the qualitative and quantitative judgments of different reviewers examining the same application, and our results have broad relevance for scientific grant peer review. Keywords: peer review, social sciences, interrater reliability, linear mixed-effects models. In the past decade, funding at the National Institutes of Health (NIH) has increased at a much slower rate (1) than the number of grant applications (2), and consequently, success rates have steadily declined (3). There are more deserving grant applications than there are available funds, so it is critical to ensure that the process responsible for awarding such funds, grant peer review, reliably differentiates the very best applications from the comparatively weaker ones. Research on grant peer review is inconclusive: Some studies suggest that it is unreliable (4-13) and potentially biased (14-17), whereas others show the validity of review systems and final outcomes (18-20). However, even if peer review effectively discriminates the good applications from the bad, it is now imperative to empirically assess whether, in this culture of decreasing funding rates, it can discriminate the good from the excellent within a pool of high-quality applications. As Chubin and Hackett (21) argue, intensified competition for resources harms peer review because funding decisions rely on an evaluation process that is not designed to distinguish among applications of similar quality, a scenario that they argue is most prevalent at the NIH. Indeed, the findings in the present paper suggest that, in fact, reviewers are unable to differentiate excellent applications (i.e., those funded by the NIH in the first round) from good applications (i.e., those unfunded but later funded by the NIH after subsequent revisions). Because the grant peer-review process at NIH is confidential, the only way to systematically examine it is to replicate the process outside of the NIH in a highly realistic manner. This is precisely what we did in the research reported in this paper. We recruited 43 oncology researchers from across the United States to participate in one of four peer-review panels (called "study sections" at NIH), each composed of 8-12 reviewers. Fig. 1 presents a deidentified image from one study section meeting. We solicited 25 oncology grant applications submitted to NIH as R01s (the most competitive and highly sought-after type of grant at NIH) between 1 and 4 years before our study. Sixteen of these were funded in the first round (i.e., the best applications), whereas 9 of these were funded only after subsequent resubmission (i.e., the good applications). The NIH uses a two-stage review process. In the first stage, two to five reviewers individually evaluate each grant application by assigning a preliminary rating using the NIH's reverse 9-point scale (1 = exceptional, 9 = poor) and writing a critique describing the application's strengths and weaknesses. Most typically, three reviewers are assigned to an application: a primary, a secondary, and a tertiary reviewer, ranked in order of the relevance of their expertise. Reviewers then convene in study section meetings, where they discuss the applications that received preliminary ratings in the top half of all applications evaluated. After sharing their preliminary ratings and critiques, the two to five assigned reviewers discuss the application with all other study section members, all of whom assign a final rating to the application. This final rating from all members is averaged into a final "priority score." In the second stage, members of NIH's advisory councils use this priority score and the written critiques to make funding decisions.
Significance: Scientific grant peer reviewers must differentiate the very best applications from comparatively weaker ones. Despite the importance of this determination in allocating funding, little research has explored how reviewers derive their assigned ratings for the applications they review or whether this assessment is consistent when the same application is evaluated by different sets of reviewers. We replicated the NIH peer-review process to examine the qualitative and quantitative judgments of different reviewers examining the same grant application. We found no agreement among reviewers in evaluating the same application. These findings highlight the subjectivity in reviewers' evaluations of grant applications and underscore the difficulty in comparing the evaluations of different applications from different reviewers, which is how peer review actually unfolds.
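The two-stage scoring arithmetic described above can be sketched in a few lines. The application IDs and all ratings below are invented, and the priority score is shown simply as the mean of the final ratings, since the text above does not specify any further scaling.

```python
# Rough sketch of the scoring arithmetic described above, with invented numbers.
# Assigned reviewers give preliminary ratings on the reverse 9-point scale
# (1 = exceptional, 9 = poor); applications in the better-scoring half are
# discussed, after which every panel member assigns a final rating, and those
# final ratings are averaged into a priority score.

from statistics import mean

preliminary = {                     # application -> assigned reviewers' ratings
    "A01": [2, 3, 2],
    "A02": [5, 6, 7],
    "A03": [3, 2, 4],
    "A04": [7, 8, 6],
}

# Discuss the better-scoring half (lower mean = better on this scale).
cutoff = sorted(mean(r) for r in preliminary.values())[len(preliminary) // 2 - 1]
discussed = {app: r for app, r in preliminary.items() if mean(r) <= cutoff}

final_ratings = {                   # hypothetical post-discussion ratings from all members
    "A01": [2, 2, 3, 2, 3, 2, 2, 3],
    "A03": [3, 3, 4, 2, 3, 4, 3, 3],
}

for app in discussed:
    print(app, "priority score =", round(mean(final_ratings[app]), 1))
```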
A new approach to grant review assessments: score, then rank
Research integrity and peer review, 2023
Background: In many grant review settings, proposals are selected for funding on the basis of summary statistics of review ratings. Challenges of this approach (including the presence of ties and unclear ordering of funding preference for proposals) could be mitigated if rankings such as top-k preferences or paired comparisons, which are local evaluations that enforce ordering across proposals, were also collected and incorporated in the analysis of review ratings. However, analyzing ratings and rankings simultaneously has not been done until recently. This paper describes a practical method for integrating rankings and scores and demonstrates its usefulness for making funding decisions in real-world applications. We first present the application of our existing joint model for rankings and ratings, the Mallows-Binomial, in obtaining an integrated score for each proposal and generating the induced preference ordering. We then apply this methodology to several theoretical "toy" examples of rating and ranking data, designed to demonstrate specific properties of the model. We then describe an innovative protocol for collecting rankings of the top-six proposals as an add-on to the typical peer review scoring procedures and provide a case study using actual peer review data to exemplify the output and how the model can appropriately resolve judges' evaluations. For the theoretical examples, we show how the model can provide a preference order for equally rated proposals by incorporating rankings, for proposals with ratings and only partial rankings (and how the results differ from a ratings-only approach), and for proposals where judges provide internally inconsistent ratings/rankings or outlier scores. Finally, we discuss how, using real-world panel data, this method can provide information about funding priority with a level of accuracy and in a format well suited to research funding decisions. Conclusions: A methodology is provided to collect and employ both rating and ranking data in peer review assessments of proposal submission quality, highlighting several advantages over methods relying on ratings alone. This method leverages information to most accurately distill reviewer opinion into a useful output to make an informed funding decision and is general enough to be applied to settings such as the NIH panel review process.
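The benefit of collecting rankings alongside ratings can be illustrated without the full Mallows-Binomial machinery. The toy sketch below (hypothetical proposals and judges, not the paper's model) shows mean ratings leaving two proposals tied and judges' top-3 rankings breaking the tie via Borda-style points.

```python
# Toy illustration only, not the Mallows-Binomial model: mean ratings leave two
# proposals tied, and judges' top-3 rankings (Borda-style points) break the tie.
# All names and numbers are hypothetical.

from statistics import mean

ratings = {                       # proposal -> ratings from several judges
    "P1": [2, 3, 2],
    "P2": [2, 2, 3],              # same mean as P1: tied on ratings alone
    "P3": [5, 4, 6],
}
top3_rankings = [                 # each judge's ordered top-3 preference
    ["P2", "P1", "P3"],
    ["P2", "P1", "P3"],
    ["P1", "P2", "P3"],
]

borda = {p: 0 for p in ratings}
for ranking in top3_rankings:
    for position, proposal in enumerate(ranking):
        borda[proposal] += len(ranking) - position   # 3, 2, 1 points per judge

# Order by mean rating first (lower is better here), then by Borda points.
order = sorted(ratings, key=lambda p: (mean(ratings[p]), -borda[p]))
print(order)   # ['P2', 'P1', 'P3'] -- rankings resolve the ratings tie
```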
The Necessity of Commensuration Bias in Grant Peer Review
Ergo, 2022
Peer reviewers at many funding agencies and scientific journals are asked to score submissions both on individual criteria and overall. The overall scores should be some kind of aggregate of the criteria scores. Carole Lee identifies this as a potential locus for bias to enter the peer review process, which she calls commensuration bias. Here I view the aggregation of scores through the lens of social choice theory. I argue that in many situations, especially when reviewing grant proposals, it is impossible to avoid commensuration bias.
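The aggregation problem can be illustrated with a small worked example. The criteria names, scores, and weights below are invented; the point is only that two defensible weightings of the same criteria scores produce different overall rankings, which is the commensuration choice the paper discusses.

```python
# Hypothetical example of the aggregation problem: the same criteria scores under
# two defensible weightings produce different overall rankings.

criteria = ["significance", "innovation", "approach"]
scores = {
    "P1": {"significance": 9, "innovation": 4, "approach": 6},
    "P2": {"significance": 6, "innovation": 8, "approach": 6},
}

def aggregate(weights):
    """Weighted sum of criteria scores for each proposal."""
    return {p: sum(weights[c] * s[c] for c in criteria) for p, s in scores.items()}

weightings = {
    "equal":              {"significance": 1/3, "innovation": 1/3, "approach": 1/3},
    "significance-heavy": {"significance": 0.6, "innovation": 0.2, "approach": 0.2},
}

for name, w in weightings.items():
    totals = aggregate(w)
    ranking = sorted(totals, key=totals.get, reverse=True)
    print(name, {p: round(t, 2) for p, t in totals.items()}, "->", ranking)
```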
Criteria for assessing grant applications: a systematic review
Palgrave Communications, 2020
Criteria are an essential component of any procedure for assessing merit. Yet, little is known about the criteria peers use to assess grant applications. In this systematic review we therefore identify and synthesize studies that examine grant peer review criteria in an empirical and inductive manner. To facilitate the synthesis, we introduce a framework that classifies what is generally referred to as ‘criterion’ into an evaluated entity (i.e., the object of evaluation) and an evaluation criterion (i.e., the dimension along which an entity is evaluated). In total, the synthesis includes 12 studies on grant peer review criteria. Two-thirds of these studies examine criteria in the medical and health sciences, while studies in other fields are scarce. Few studies compare criteria across different fields, and none focus on criteria for interdisciplinary research. We conducted a qualitative content analysis of the 12 studies and thereby identified 15 evaluation criteria and 30 evaluated entities, as well as the relations between them. Based on a network analysis, we determined the following main relations between the identified evaluation criteria and evaluated entities. The aims and outcomes of a proposed project are assessed in terms of the evaluation criteria originality, academic relevance, and extra-academic relevance. The proposed research process is evaluated both on the content level (quality, appropriateness, rigor, coherence/justification), as well as on the level of description (clarity, completeness). The resources needed to implement the research process are evaluated in terms of the evaluation criterion feasibility. Lastly, the person and personality of the applicant are assessed from a ‘psychological’ (motivation, traits) and a ‘sociological’ (diversity) perspective. Furthermore, we find that some of the criteria peers use to evaluate grant applications do not conform to the fairness doctrine and the ideal of impartiality. Grant peer review could therefore be considered unfair and biased. Our findings suggest that future studies on criteria in grant peer review should focus on the applicant, include data from non-Western countries, and examine fields other than the medical and health sciences.
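The entity-criterion relations summarized above lend themselves to a simple bipartite representation. The sketch below paraphrases a few of the relations named in the abstract (it is not the review's full dataset) and prints which criteria attach to which evaluated entity.

```python
# Hedged sketch of the entity-criterion mapping described above: a small bipartite
# structure linking evaluated entities to evaluation criteria. The pairs below
# paraphrase the abstract and are not the study's complete results.

from collections import defaultdict

relations = [
    ("aims and outcomes", "originality"),
    ("aims and outcomes", "academic relevance"),
    ("aims and outcomes", "extra-academic relevance"),
    ("research process (content)", "quality"),
    ("research process (content)", "rigor"),
    ("research process (description)", "clarity"),
    ("resources", "feasibility"),
    ("applicant", "motivation"),
    ("applicant", "diversity"),
]

criteria_by_entity = defaultdict(set)
for entity, criterion in relations:
    criteria_by_entity[entity].add(criterion)

for entity, crits in criteria_by_entity.items():
    print(f"{entity}: evaluated on {', '.join(sorted(crits))}")
```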
Grant Peer Review: Improving Inter-Rater Reliability with Training
PLOS ONE, 2015
This study developed and evaluated a brief training program for grant reviewers that aimed to increase inter-rater reliability, rating scale knowledge, and effort to read the grant review criteria. Enhancing reviewer training may improve the reliability and accuracy of research grant proposal scoring and funding recommendations. Seventy-five Public Health professors from U.S. research universities watched the training video we produced and assigned scores to the National Institutes of Health scoring criteria proposal summary descriptions. For both novice and experienced reviewers, the training video increased scoring accuracy (the percentage of scores that reflect the true rating scale values), inter-rater reliability, and the amount of time reading the review criteria compared to the no-video condition. The increase in reliability for experienced reviewers is notable because it is commonly assumed that reviewers, especially those with experience, have good understanding of the grant review rating scale. The findings suggest that both experienced and novice reviewers who had not received the type of training developed in this study may not have appropriate understanding of the definitions and meaning for each value of the rating scale and that experienced reviewers may overestimate their knowledge of the rating scale. The results underscore the benefits of and need for specialized peer reviewer training.
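The paper's notion of scoring accuracy (the percentage of scores matching the true rating-scale values) is easy to operationalize. The sketch below uses invented scores for a trained and an untrained group; the vignettes, group labels, and numbers are all hypothetical.

```python
# Invented numbers only: compute "scoring accuracy" as defined above (percentage
# of assigned scores matching the intended rating-scale value) for a trained and
# an untrained group of reviewers.

true_values = [2, 5, 7, 3, 8]                     # intended scores for 5 summary descriptions

trained   = [[2, 5, 7, 3, 8], [2, 5, 6, 3, 8], [2, 4, 7, 3, 8]]
untrained = [[1, 3, 5, 3, 6], [2, 6, 7, 5, 9], [3, 5, 8, 2, 7]]

def accuracy(group):
    hits = sum(s == t for scores in group for s, t in zip(scores, true_values))
    return hits / (len(group) * len(true_values))

print(f"trained accuracy:   {accuracy(trained):.0%}")
print(f"untrained accuracy: {accuracy(untrained):.0%}")
```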
2015
One of the cornerstones of the scientific process is securing funding for one's research. A key mechanism by which funding outcomes are determined is the scientific peer review process. Our focus is on biomedical research funded by the U.S. National Institutes of Health (NIH). NIH spends $30.3 billion on medical research each year, and more than 80% of NIH funding is awarded through competitive grants that go through a peer review process (NIH, 2015). Advancing our understanding of this review process by investigating variability among review panels and the efficiency of different meeting formats has enormous potential to improve scientific research throughout the nation. NIH's grant review process is a model for federal research foundations, including the National Science Foundation and the U.S. Department of Education's Institute of Education Sciences. It involves panel meetings in which collaborative decision making is an outgrowth of socially mediated cognitive tasks. These tasks include summarization, argumentation, evaluation, and critical discussion of the perceived scientific merit of proposals with other panel members. Investigating how grant review panels function thus allows us not only to better understand processes of collaborative decision making within a group of distributed experts (Brown et al., 1993) that is within a community of practice (Lave & Wenger, 1991), but also to gain insight into the effect of peer review discussions on outcomes for funding scientific research. Theoretical Framework: A variety of research has investigated how the peer review process influences reviewers' scores, including the degree of inter-rater reliability among reviewers and across panels, and the impact of discussion on changes in reviewers' scores. In addition, educational theories of distributed cognition, communities of practice, and the sociology of science frame the peer review process as a collaborative decision-making task involving multiple, distributed experts. The following sections review each of these bodies of literature.
The journal of research administration, 2015
While Elizabeth Barrett Browning counted 25 ways in which she loves her husband in her poem, "How Do I Love Thee? Let Me Count the Ways," we identified only eight ways to evaluate the potential for success of a federal research grant proposal. This may be surprising, as it seems upon initial glance of the review criteria used by various federal funding agencies that each has its own distinct set of "rules" regarding the review of grant proposals for research and scholarship. Much of the grantsmanship process is dependent upon the review criteria, which represent the funders' desired impact of the research. But since most funders that offer research grants share the overarching goals of supporting research that (1) fits within their mission and (2) will bring a strong return on their financial investment, the review criteria used to evaluate research grant proposals are based on a similar set of fundamental questions. In this article, we compare the review criteria...
Technology & Innovation, 2010
Increasing demands on the part of the public for a demonstrable return on their investment in scientific and technical research have led to the widespread introduction of considerations of societal impacts into the peer review processes at public science and technology funding agencies. This answer to the accountability challenge also introduces a peculiar strain on peer review: expertise in particular areas of scientific and technical research is no guarantee of expertise in addressing the societal impacts of proposed research. Presenting preliminary results of a larger study, this article describes five current models of the peer review of grant proposals and shows that different agencies have very different ways of incorporating societal impacts considerations. The article also elucidates a notion of theoretical adequacy, which will be used to determine whether and how some peer review processes are better than others. The objectives of this article are to lay out the description of the agencies and to offer a preliminary assessment of each model's theoretical adequacy. The objective of our larger study is to determine the best ways to incorporate societal impacts considerations into the peer review of grant proposals, thus helping funding agencies respond to the demand for demonstrable results.
The Success of Peer Review Evaluation in University Research Funding – the Case Study from Slovakia
2018
Public funding mechanisms for excellence are widely used, mostly because they aim to raise the performance of higher education institutions to an excellent level by reallocating funds based on the competitiveness of institutions or researchers. The approaches currently used for research evaluation are either peer review or bibliometric techniques. Peer review is based on the deep expertise of committees and experts. However, its application is questioned to some extent, especially due to its ineffectiveness and inefficiency. In Slovakia, the peer review process is applied to the selection of projects by the Scientific Grant Agency. This paper identifies whether there is a relationship between the peer review score of project proposals and research productivity. The case study focuses on the Scientific Grant Agency and its grant selection in 2009, when the results of the peer review process were for the first time available to the public. Our results show that peer review in most fields failed to pr...
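The relationship the paper tests, between peer review scores and subsequent research productivity, is typically checked with a rank correlation. The sketch below uses invented scores and publication counts, not the Slovak agency's data.

```python
# Hypothetical sketch of the kind of check described above: rank correlation
# between peer review scores of funded projects and a simple productivity proxy
# (e.g., subsequent publication counts). All numbers are invented.

from scipy.stats import spearmanr

review_scores = [92, 88, 85, 81, 79, 75, 70, 66]   # panel scores per project
publications  = [ 4,  9,  2,  7,  3,  8,  5,  6]   # later output per project

rho, p_value = spearmanr(review_scores, publications)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")  # weak or no relationship
```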