
Validity of assessment centers for personnel selection

Research and practice in the application of assessment centers (ACs) for personnel selection are reviewed and critiqued. Several examples of the use of ACs for external screening, internal promotion, and certification are described. Several types of evidence of validity of ACs for selection are reviewed, including representativeness of the content of dimensions and exercises in relation to job requirements, relationships among ratings within an AC, relationships of AC ratings and criteria of work effectiveness, and consequences of assessments, including candidates' reactions to assessments and subgroup differences in ratings. Several controversies in research findings and practices of ACs are noted. Further research to address these controversies and new research to study emerging issues are suggested. Conclusions about the validity, fairness, and legal defensibility of ACs for personnel selection are offered.

The assessment center method (ACM) has been used for many purposes in human resource management, including selection, diagnosis, and development, since its introduction over 50 years ago (Thornton & Rupp, 2006). In this chapter, we review research and practice of the method for selection purposes. We use "selection" in a broad sense to mean the use of overall assessment ratings to aid in the selection of:

• external candidates into organizations,
• internal candidates into supervisory and managerial ranks,
• individuals into a pool of high potentials who will receive special training,
• exemplary staff members to receive certification of competence in job skills, or
• employees for retention during a reduction in force or reorganization.

In all of these applications, the overall assessment rating is used as a measure of competence to be successful in some new assignment. We begin with a definition of an assessment center (AC), and then give examples of several applications in selection. Next, we evaluate the literature to provide summaries of what is known about the validity of the method and of controversies over theory, research, and practice. We conclude with a set of suggestions for research, including a new broad proposal for integrating a number of key issues percolating in the field, along with some specific research needs. In this paper, we intend to provide a summary that will be useful both to scholars seeking key issues to investigate and to practitioners facing challenges in applications.

The dynamics of assessment center validity: results of a seven-year study

2001

We investigated temporal trends in the validity of an assessment center, consisting of a group discussion and an analysis/presentation exercise, for predicting career advancement as measured by average salary growth over a 7-year period, for a sample of 679 academic graduates. The validity of the overall assessment rating (OAR) for persons with tenure of 7 years, corrected for initial differences in starting salaries and restriction in range, was .39. There was considerable time variation in the validity of both the OAR and the assessment center dimensions. In accordance with findings from research in managerial effectiveness and development, the dimension interpersonal effectiveness only became valid after a number of years, while the dimension firmness was predictive over the whole period and increased in time. For comparison, validity trends for two types of interviews and a mental test were also studied.

An extensive body of knowledge exists about the predictive validity of assessment centers. Assessment centers have predictive validity for work-related criteria such as career potential or appraisal of overall job performance (Gaugler et al., 1987; Schmitt et al., 1984). However, a general problem for evaluating assessment center validity, especially in the case of career advancement, is criterion contamination, in which later promotion decisions are indirectly and inadvertently based on the initial assessment center rating (Klimoski & Brickner, 1987). Therefore, research into long-term, uncontaminated assessment center validity is needed. However, the variation in validity of assessment centers over time is unclear (Gaugler et al., 1987). In particular, it is unclear whether the generally observed decline in predictive validity of selection instruments (Hulin, Henry & Noon, 1990) also holds for the assessment center (Barrett, Alexander & Doverspike, 1992). Predictive validity of assessment centers has been found to remain more or less unaffected, to decrease, or to increase with time (Tziner, Ronen & Hacohen, 1993). In addition, there exists little research in which temporal trends in assessment center validity are compared to validity patterns of other predictors used in the same sample (Hunter & Hunter, 1984). The present study investigates time-dependent patterns in assessment center validity. In addition, the incremental validity of the assessment center with respect to other predictors, such as the interview and a mental test, is investigated.

LONG-TERM ASSESSMENT CENTER VALIDITY

Assessment centers are particularly predictive of 'advancement criteria' such as career progress, salary advancement, long-term promotion, and potential development (Bray, Campbell & Grant, 1974; Ritchie, 1994; Scholz & Schuler, 1993). An obvious advancement criterion is salary growth (Tziner et al., 1993). Salary level or current job grade (which correlates highly with salary level in most organizations) is used as a criterion of management success by, for example, Hinrichs (1978) and Mitchel (1975), and it is also used in the present study.

The Management Progress Study has produced mixed results (Thornton & Byham, 1982) with respect to the variation of assessment center validity with time. For the sample of 207 college graduates recruited in 1957-1960 as management trainees for a telephone company, the validity of the assessment result for level actually achieved decreased from a maximum of .46 in the early years (personal communication by Howard, 1981, in Thornton & Byham, 1982, p. 254) to .33 in the sixteenth year, so the trend in validity appears to be inverted-U shaped. However, for the non-college group (n = 148) the validity only decreased from .46 to .40, indicating a flattening validity curve with decreasing gains in predictive power. Slivinski, Grant, Bourgeois & Pederson (1977) also found mixed results with respect to the predictive validity over time of the overall assessment rating (OAR) for salary. In the meta-analysis by Gaugler et al. (1987), there was no significant relation between assessment center validities and the time at which criterion measures were taken. In contrast, Tziner et al. (1993), investigating the validity of a managerial assessment center for a yearly rating of potential for upper management, found that the validity of the OAR for potential for upper-level management decreased with time. Finally, an increase in the validity of the OAR for advancement criteria such as rank attained or salary growth was found in the longitudinal studies by Anstey (1977), Hinrichs (1978), McEvoy & Beatty (1989), Mitchel (1975), and Moses (1971; see Huck, 1977). In conclusion, studies on long-term assessment center validity have produced equivocal results; in addition, all were conducted in North American or British organizations (Feltham, 1988). Mitchel (1975) proposed two explanations for the variability of validity with time: the first concentrates on changes in critical work elements as a consequence of organizational and societal developments, the second on job changes during an individual's career.
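A note on the correction reported above: a validity coefficient observed in a selected sample understates validity in the applicant pool, and the standard repair is Thorndike's Case II formula for direct range restriction on the predictor. The following is a minimal sketch in Python; the function name and the input values are illustrative and not taken from the study.

    import math

    def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
        # Thorndike Case II correction for direct range restriction.
        # r_restricted: validity observed in the selected (range-restricted) sample.
        # sd_unrestricted / sd_restricted: predictor SDs in the applicant pool
        # and in the selected group, respectively.
        u = sd_unrestricted / sd_restricted
        r = r_restricted
        return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))

    # Illustrative values only: an observed OAR-criterion correlation of .30 in a
    # selected sample rises to about .41 when corrected back to the applicant pool.
    print(round(correct_range_restriction(0.30, 1.0, 0.7), 2))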

A survey of assessment center practices in organizations in the United States

Personnel Psychology, 1997

Two hundred fifteen organizations in the United States provided information about multiple aspects of their assessment centers, including design, usage, and their adherence to professional guidelines and research-based suggestions for the use of this method. Results reveal that centers are usually conducted for selection, promotion, and development purposes. Supervisor recommendation plays a sizable role in choosing center participants. Most often, line managers act as assessors; they typically arrive at participant ratings through a consensus process. In general, respondents indicate close adherence to recommendations for center design and assessor training. Recommendations involving other practices (e.g., informing participants, evaluating assessors, validating center results) are frequently not followed. Furthermore, methods thought to improve predictive validity of center ratings are underutilized. Variability in center practices according to industry and center purpose was revealed. We encourage practitioners to follow recommendations for center usage, and researchers to work to better understand moderators of center validity.

Two field tests of an explanation of assessment centre validity

Journal of Occupational and …, 1995

Researchers have described two sets of constructs underlying assessment centre ratings. The trait explanation holds that dimensional ratings capture a candidate's personal characteristics, skills and abilities. The performance consistency/role congruency explanation holds that dimensional ratings are predictions of how well the candidate will perform various tasks and/or roles in the target job. While past research has failed to find support for the trait explanation, no studies have explicitly examined the validity of assessment centres designed to make task- or role-based dimensional ratings. We report two field evaluations of this explanation. In Study 1, assessor training was modified to have assessors view traditional assessment dimensions as role requirements. Concurrent validation of assessor evaluations of retail store managers resulted in correlations ranging from .22 to .28 with superiors' performance appraisal ratings and .32 to .35 with store profit. Study 2 evaluated the criterion-related validity of ratings on both job requirements and traits. Findings indicate that task-based ratings demonstrate concurrent validity in a sample of entry-level unit managers while the traditional trait-based ratings do not. Implications for the construct validity and design of assessment centres are drawn.

Klimoski & Brickner (1987) described six alternative explanations for the construct validity of assessment centre ratings. The traditional trait explanation (Holmes, 1977; Standards for Ethical Considerations for Assessment Center Operations, 1977), that assessment centre dimensional ratings capture individual differences in candidates' skills and abilities, has been the subject of numerous empirical investigations. For example, when assessment centres are designed to yield trait ratings of dimensions immediately after each exercise, ratings of the same dimension are expected to be highly correlated with one another regardless of the exercise. Post-exercise dimensional ratings would also be expected to yield low correlations between ratings of different dimensions obtained within the same exercise.
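The convergent/discriminant logic in the last two sentences can be made concrete with a small computation. The sketch below is a hypothetical Python example with simulated ratings (the exercise and dimension names are invented): it averages same-dimension/different-exercise correlations (convergent) against different-dimension/same-exercise correlations (discriminant). The trait explanation predicts the first average to be clearly larger.

    import numpy as np
    import pandas as pd

    # Hypothetical post-exercise dimension ratings: 100 candidates rated on two
    # dimensions in each of two exercises. Real AC data would replace this.
    rng = np.random.default_rng(0)
    cols = pd.MultiIndex.from_product(
        [["group_discussion", "in_basket"], ["problem_solving", "influence"]],
        names=["exercise", "dimension"])
    ratings = pd.DataFrame(rng.normal(size=(100, 4)), columns=cols)

    corr = ratings.corr()

    convergent, discriminant = [], []
    for a in corr.columns:
        for b in corr.columns:
            if a >= b:                           # visit each unordered pair once
                continue
            if a[1] == b[1] and a[0] != b[0]:    # same dimension, different exercise
                convergent.append(corr.loc[a, b])
            elif a[0] == b[0] and a[1] != b[1]:  # same exercise, different dimension
                discriminant.append(corr.loc[a, b])

    # The trait explanation predicts mean(convergent) >> mean(discriminant);
    # much AC research instead finds strong exercise effects (the reverse pattern).
    print(np.mean(convergent), np.mean(discriminant))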

APA handbook of testing and assessment in psychology, Vol. 1: Test theory and testing and assessment in industrial and organizational psychology

American Psychological Association eBooks, 2013

Previous chapters in Part I of this volume have provided comprehensive coverage of issues critical to test quality, including validity, reliability, sensitivity to change, and the consensus standards associated with psychological and educational testing. In-depth analysis of test quality represents a daunting area of scholarship that many test users may understand only partially. The complexity of modern test construction procedures creates ethical challenges for users who are ultimately responsible for the appropriate use and interpretation of tests they administer. Concerns over the ethical use of tests intensify as the high stakes associated with the results grow. Test manuals should provide detailed information regarding test score validity and applicability for specific purposes. However, translating the technical information into usable information may be difficult for test developers, who may be more expert at psychometric research than at making their findings accessible to the general reader. Additionally, the information presented in test manuals can be accurate and understandable but insufficient for appropriate evaluation by users. Fortunately, several sources of review and technical critique are available for most commercially available tests. These critiques are independent of the authors or publishers of the tests. Although test authors and publishers are required by current standards to make vital psychometric information available, experts may be necessary as translators and evaluators of such information. The most well-established source of test reviews is the Buros Institute of Mental Measurements. Established more

Criterion and construct validation of an assessment centre

Journal of Occupational and Organizational Psychology, 1996

This study is an attempt in assessment centre research to apply both criterion and construct validation strategies to a single sample and to examine, simultaneously, a relatively comprehensive set of variables including assessor ratings, psychological test measures, supervisory ratings of job performance and actual promotions, hence allowing more direct comparisons of a variety of validities and exploration of previously unexamined issues. Results showed a lack of both internal construct validity, as demonstrated by multitrait-multimethod analyses and factor analysis, and external construct validity when placed in a nomological network of constructs independent of the centre. Assessment centre ratings were found to be predictive of subsequent promotion (r = .59, p < .001) but not of concurrent supervisory ratings of performance (r = .06, n.s.). Logistic regression analyses showed that assessment centre ratings produced a significant and substantial increment in validity in predicting promotion over and above current supervisory ratings of job performance (Δχ²(1) = 20.06, p < .001), an important relationship that had not been previously examined. Implications of these findings for the nature of constructs in assessment centres and future research are discussed in the context of Klimoski & Brickner's (1987) 'performance consistency' explanation and 'subtle' criterion contamination explanation as to why assessment centres work.
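The incremental-validity result reported here is a likelihood-ratio comparison of nested logistic regressions: a baseline model with the supervisory rating alone against a full model that adds the OAR, with the chi-square increment evaluated on one degree of freedom. A minimal sketch in Python with simulated data (not the study's variables or effect sizes):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    # Simulated stand-ins: a supervisory performance rating, an overall
    # assessment rating (OAR), and a binary promotion outcome.
    rng = np.random.default_rng(1)
    n = 300
    perf = rng.normal(size=n)
    oar = 0.3 * perf + rng.normal(size=n)
    promoted = (0.2 * perf + 0.8 * oar + rng.normal(size=n) > 0).astype(int)

    # Baseline model: promotion predicted from the supervisory rating alone.
    base = sm.Logit(promoted, sm.add_constant(perf)).fit(disp=False)
    # Full model: add the OAR as a second predictor.
    full = sm.Logit(promoted,
                    sm.add_constant(np.column_stack([perf, oar]))).fit(disp=False)

    # Likelihood-ratio chi-square on 1 df, analogous to the reported increment.
    lr = 2 * (full.llf - base.llf)
    print("delta chi2(1) = %.2f, p = %.4f" % (lr, stats.chi2.sf(lr, df=1)))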

The Utility of Assessment Centers for Career Development

1990

During the past 30 years, assessment centers have become an increasingly popular method for evaluating employees and potential employees in work organizations. This study was conducted to evaluate an assessment center which was part of a management development program for state government employees. The study examined: (1) whether assessment center feedback is accepted and acted on by participants; (2) whether this feedback improves managerial performance; and (3) the utility of assessment centers for career development. Participants (N = 102) were middle managers in state government who took part in an assessment center program. The findings revealed that participants appeared to accept and act upon the assessment feedback they received. Poor performers followed the assessment center's recommendations to the same extent as did high performers, and participants believed they could improve their managerial performance by following the recommendations. After receiving feedback, participants were able to improve their performance on a measure of managerial effectiveness. Utility analysis indicated that the gain in productivity from the assessment far outweighed the costs.
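Utility analyses of this kind are typically Brogden-Cronbach-Gleser calculations: dollar gain = number selected x tenure in years x validity x SDy (dollar value of one SD of performance) x mean standardized predictor score of those selected, minus total assessment cost. A minimal sketch in Python; all numbers below are invented for illustration, not figures from the study.

    def bcg_utility(n_selected, years, validity, sd_y, mean_z_selected,
                    cost_per_candidate, n_assessed):
        # Brogden-Cronbach-Gleser utility: dollar gain from selection minus cost.
        gain = n_selected * years * validity * sd_y * mean_z_selected
        return gain - cost_per_candidate * n_assessed

    # Invented inputs: 102 managers assessed and retained for 5 years,
    # validity .39, SDy $10,000, mean standardized predictor score .5,
    # and $2,000 per assessment; the gain comfortably exceeds the cost.
    print(bcg_utility(102, 5, 0.39, 10_000, 0.5, 2_000, 102))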