Expertise, Domains, and the Consensual Assessment Technique

Furious Activity vs. Understanding: How much expertise is needed to evaluate creative work?

Psychology of Aesthetics, Creativity, and the Arts, Vol. 7, Issue 4, 2013

What is the role of expertise in evaluating creative products? Novices and experts do not assess creativity in the same way, which points to the role of domain-specific knowledge in judging creativity. We describe two studies that examine how quasi-experts (people who have more experience in a domain than novices but lack recognized standing as experts) compare to novices and experts in rating creative work. In Study One, we compared different types of quasi-experts with novices and experts in rating short stories. In Study Two, we compared experts, quasi-experts, and novices in evaluating an engineering product (a mousetrap design). Quasi-experts (regardless of type) appear to be appropriate raters for short stories, yet results were mixed for the engineering quasi-experts. Some domains may require more expertise than others to properly evaluate creative work.


The Consensual Assessment Technique (CAT) holds that the most valid judgments of creativity are the combined opinions of experts in the field. Yet who exactly qualifies as an expert to evaluate a creative product such as a short story? This study examines both novice and expert judgments of student short fiction. Results indicate a need for caution in using non-expert raters. Although there was only a small (but statistically significant) difference between experts' and novices' mean ratings, the correlation between the two sets of ratings was just .71. Experts were also far more consistent in their ratings than novices, whose level of inter-rater reliability was potentially problematic.
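
The two statistics this abstract turns on, the expert-novice correlation and within-group inter-rater reliability, are straightforward to compute. Below is a minimal Python sketch on synthetic data (the rating matrices, group sizes, scale, and noise levels are hypothetical, not the study's data): each group is a raters-by-stories matrix, consistency within a group is summarized with Cronbach's alpha, and the two groups are compared by correlating their mean ratings per story.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a (raters x products) matrix of scores.

    Treats each rater as a 'test item'; higher alpha means the
    raters order the products more consistently.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[0]                         # number of raters
    item_vars = ratings.var(axis=1, ddof=1)      # each rater's score variance
    total_var = ratings.sum(axis=0).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 10 experts and 10 novices each rate 100 stories (1-6 scale).
rng = np.random.default_rng(0)
true_quality = rng.normal(size=100)
experts = np.clip(np.rint(3.5 + true_quality + rng.normal(scale=0.5, size=(10, 100))), 1, 6)
novices = np.clip(np.rint(3.5 + true_quality + rng.normal(scale=1.5, size=(10, 100))), 1, 6)

print("expert alpha:", round(cronbach_alpha(experts), 2))  # high
print("novice alpha:", round(cronbach_alpha(novices), 2))  # lower
# Correlation between the two groups' mean ratings per story:
print("expert-novice r:", round(np.corrcoef(experts.mean(0), novices.mean(0))[0, 1], 2))
```

Giving the simulated novices noisier judgments reproduces the qualitative pattern reported here: a respectable but imperfect cross-group correlation alongside weaker novice inter-rater reliability.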

Extension of the Consensual Assessment Technique to Nonparallel Creative Products

Creativity Research Journal, 2004

The consensual technique for assessing creativity is widely used in research, but its validation has been limited to assessing the creativity of artifacts produced under tightly constrained experimental conditions. Typically, only artifacts produced in response to very similar instructions have been compared. This has allowed researchers to compare such things as the effects of different motivational conditions on creative performance, but it has not allowed many other kinds of comparisons. It has also limited the use of the technique to artifacts gathered for specific experimental purposes, as opposed to already-existing artifacts produced under less controlled conditions. For this study, samples of writings collected by the National Assessment of Educational Progress that were written in response to a very wide variety of assignments and under varying conditions were rated for creativity by 13 expert judges. Judges compared the creativity of 103 stories, 103 personal narratives, and 102 poems, all written by 8th-grade students. Very high levels of interrater reliability were obtained, demonstrating that the consensual method can be validly extended to such samples. New avenues for future research made possible by these findings are then discussed.
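
For concreteness, "very high levels of interrater reliability" in a judges-by-products design like this one is typically summarized with coefficient alpha (equivalently, an average-measures intraclass correlation), computed separately for each product set. A minimal sketch with invented numbers standing in for the 13 judges' ratings of each genre:

```python
import numpy as np

def alpha(judge_by_product):
    """Coefficient alpha across judges (rows) for one set of products (columns)."""
    x = np.asarray(judge_by_product, dtype=float)
    k = x.shape[0]
    return (k / (k - 1)) * (1 - x.var(axis=1, ddof=1).sum() / x.sum(axis=0).var(ddof=1))

rng = np.random.default_rng(1)
# Hypothetical stand-ins for the three nonparallel product sets.
for genre, n in [("stories", 103), ("narratives", 103), ("poems", 102)]:
    quality = rng.normal(size=n)
    ratings = 3 + quality + rng.normal(scale=0.8, size=(13, n))  # 13 judges
    print(genre, round(alpha(ratings), 2))
```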

The Gold Standard for Assessing Creativity

International Journal of Quality Assurance in Engineering and Technology Education, 2014

The most widely used creativity assessments are divergent thinking tests, but these and other popular creativity measures have been shown to have little validity. The Consensual Assessment Technique is a powerful tool used by creativity researchers in which panels of expert judges are asked to rate creative products such as stories, collages, poems, and other artifacts. It is based on the idea that the best measure of the creativity of a work of art, a theory, a research proposal, or any other artifact is the combined assessment of experts in that field. Unlike other measures of creativity, the Consensual Assessment Technique is not based on any particular theory of creativity, which means that its validity (which has been well established empirically) does not depend on the validity of any particular theory of creativity. For these reasons, the Consensual Assessment Technique has been deemed the “gold standard” of creativity assessment.

Rater effects in creativity assessment: A mixed methods investigation

Thinking Skills and Creativity, 2015

Rater effects in assessment are the idiosyncrasies in rater behaviors and cognitive processes. They comprise two aspects: the analysis of raw ratings and rater cognition. This study employed mixed-methods research to examine both aspects of rater effects in creativity assessment, which relies on raters' personal judgment. Quantitative data were collected from 2160 raw ratings made by 45 raters in three groups and were analyzed with generalizability theory. Qualitative data were collected from raters' explanations of their rating rationales, from their answers to questions about the rating process, and from 12 in-depth interviews, and were analyzed with framing analysis. The results indicated that the dependability coefficients were low for all three rater groups, which was further explained by variations and inconsistencies in raters' rating procedures, their use of the rating scales, and their beliefs about creativity.
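
For readers unfamiliar with generalizability theory: in a crossed persons-by-raters design, variance components are estimated from the two-way ANOVA mean squares, and the dependability coefficient for the average of n raters is the person variance divided by the person variance plus (rater variance + residual variance)/n. The sketch below illustrates this on simulated ratings; the design sizes (48 products, 15 raters) and noise levels are assumptions for illustration, not the study's values.

```python
import numpy as np

def g_study(x):
    """Variance components for a crossed persons-x-raters G-study.

    x: (n_persons, n_raters) matrix of ratings. Returns the person,
    rater, and residual variance components via the standard
    expected-mean-square equations of the two-way random model.
    """
    x = np.asarray(x, dtype=float)
    n_p, n_r = x.shape
    grand = x.mean()
    ss_p = n_r * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    return (ms_p - ms_res) / n_r, (ms_r - ms_res) / n_p, ms_res

def dependability(var_p, var_r, var_res, n_raters):
    """Phi coefficient for absolute decisions, averaging over n_raters."""
    return var_p / (var_p + (var_r + var_res) / n_raters)

# Hypothetical: 48 products rated by 15 raters with noisy, idiosyncratic judgments.
rng = np.random.default_rng(2)
x = 3 + rng.normal(size=(48, 1)) + 0.5 * rng.normal(size=(1, 15)) + 1.5 * rng.normal(size=(48, 15))
vp, vr, ve = g_study(x)
print("dependability with 15 raters:", round(dependability(vp, vr, ve, 15), 2))
print("projected with 5 raters:", round(dependability(vp, vr, ve, 5), 2))
```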

Learning to judge creativity: The underlying mechanisms in creativity training for non-expert judges

Learning and Individual Differences, 2014

Evaluating individual creativity is an important challenge in creativity research. We developed a training module for non-expert judges in which participants learned the definitions of components of creativity and received expert feedback in an interactive creativity judgment exercise. We aimed to test whether and how the training module would increase the reliability and validity of non-expert ratings. Study 1 (N = 79) showed that the training had a positive effect on the test-retest reliability and validity of creativity ratings. Study 2 (N = 126) replicated the results on test-retest reliability and validity, but with low absolute values, indicating that trained participants cannot substitute for experts. In addition, Study 2 showed that the effect of the training module on the validity of creativity ratings was mediated by increased validity of ratings of novelty and elaboration. The results are discussed in terms of their theoretical and practical relevance.
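
Both quantities this abstract leans on are simple to compute: test-retest reliability is the correlation between a judge's ratings on two occasions, and the mediation claim is the classic product-of-coefficients estimate (the training-to-mediator path a times the mediator-to-outcome path b, controlling for training). A hedged sketch on simulated data; the effect sizes and variable definitions below are invented, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 126

# Test-retest: correlate each judge's ratings at time 1 and time 2.
skill = rng.normal(size=n)
t1 = skill + rng.normal(scale=0.6, size=n)
t2 = skill + rng.normal(scale=0.6, size=n)
print("test-retest r:", round(np.corrcoef(t1, t2)[0, 1], 2))

# Mediation: training (X) -> novelty-rating validity (M) -> overall validity (Y).
X = rng.integers(0, 2, size=n).astype(float)  # 0 = control, 1 = trained
M = 0.5 * X + rng.normal(scale=1.0, size=n)
Y = 0.7 * M + 0.1 * X + rng.normal(scale=1.0, size=n)
a = np.linalg.lstsq(np.column_stack([np.ones(n), X]), M, rcond=None)[0][1]
b = np.linalg.lstsq(np.column_stack([np.ones(n), X, M]), Y, rcond=None)[0][2]
print("indirect effect a*b:", round(a * b, 2))
```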

Differences in Judgments of Creativity: How Do Academic Domain, Personality, and Self-Reported Creativity Influence Novice Judges’ Evaluations of Creative Productions?

Journal of Intelligence, 2015

Intelligence assessment is often viewed as a narrow and ever-narrowing field, defined (as per IQ) by the measurement of finely distinguished cognitive processes. It is instructive, however, to remember that other, broader conceptions of intelligence exist and might usefully be considered for a comprehensive assessment of intellectual functioning. This article invokes a more holistic, systems theory of intelligence (the theory of successful intelligence) and examines the possibility of including in intelligence assessment a similarly holistic measure of creativity. The time and costs of production-based assessments of creativity are generally considered prohibitive. Such barriers may be mitigated by applying the consensual assessment technique with novice raters. To investigate this possibility further, we explored the question: how much do demographic factors such as age and gender, and psychological factors such as domain-specific expertise, personality, or self-perceived creativity, affect novices' unidimensional ratings of creativity? Fifty-one novice judges from …

Captions, consistency, creativity, and the consensual assessment technique: New evidence of reliability

Thinking Skills and Creativity, 2007

The consensual assessment technique (CAT) is a measurement tool for creativity research in which appropriate experts evaluate creative products [Amabile, T. M. (1996). Creativity in context: Update to the social psychology of creativity. Boulder, CO: Westview]. However, the CAT is hampered by the time-consuming nature of both the products (asking participants to write stories or draw pictures) and the ratings (getting appropriate experts). This study examined the reliability of ratings of sentence captions. Specifically, four raters evaluated 12 captions written by 81 undergraduates. The purpose of the study was to see whether the CAT could provide reliable ratings of captions across raters and across multiple captions and, if so, how many such captions would be required to generate reliable scores and how many judges would be needed. Using generalizability theory, we found that captions appear to be a useful way of measuring creativity with a reasonable level of reliability within the CAT framework.
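
The "how many captions, how many judges" question is a classic decision study. In a fully crossed persons-by-captions-by-judges design, the generalizability coefficient for a given configuration is the person variance over the person variance plus the relative error variance, with each interaction component divided by the number of conditions it is averaged over. A small sketch; the variance components below are illustrative assumptions, not the paper's estimates:

```python
def g_coefficient(var_p, var_pc, var_pj, var_pcj, n_captions, n_judges):
    """Generalizability coefficient for a crossed persons x captions x judges D-study."""
    rel_error = (var_pc / n_captions
                 + var_pj / n_judges
                 + var_pcj / (n_captions * n_judges))
    return var_p / (var_p + rel_error)

# Assumed variance components: person, person x caption, person x judge, residual.
vp, vpc, vpj, vpcj = 0.50, 0.40, 0.10, 0.60

for n_c in (4, 8, 12):
    for n_j in (2, 4):
        print(f"{n_c} captions, {n_j} judges -> G = "
              f"{g_coefficient(vp, vpc, vpj, vpcj, n_c, n_j):.2f}")
```

Scanning the grid makes the trade-off concrete: with these assumed components, adding captions buys more reliability than adding judges once the person-by-caption component dominates the error.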