The Impact of Item Characteristics on Item and Scale Validity
Related papers
Understanding item parameters in personality scales: An explanatory item response modeling approach
Personality and Individual Differences, 2018
Although item parameters are essential in psychometric frameworks, such as item response theory (IRT), few theories are available to guide their psychological interpretation. Moreover, the meaning of item parameters is generally harder to interpret for personality scales than for ability scales. The current study provides a comprehensive way to interpret personality item parameters: Generalized Linear and Nonlinear Mixed Models (GLNMMs). The GLNMMs model the effect of item features on psychometric properties, such as item discrimination and item location. Unlike previous studies, which use a two-step approach, the GLNMMs produce smaller standard errors for the item features' coefficients and allow examination of person covariates. The current study examines four item features: negative wording, subtlety, social desirability, and miscomprehension. In general, item discrimination correlated negatively with item subtlety, social desirability, and miscomprehension; item easiness correlated positively with subtlety and miscomprehension, and negatively with negative wording.
The impact of item characteristics
This study describes the relation between personality items' validities, defined as the items' correlations with acquaintance ratings on the Big 5 personality factors, and other itemmetric properties including ambiguity, syntactic complexity, social desirability, content, and trait indicativity. Five external validity coefficients for each item on the California Psychological Inventory were correlated with a number of itemmetric variables often assumed to affect item validity. Item validity correlated positively with social desirability and trait indicativity and negatively with ambiguity across the five factors. Other characteristics had a more limited influence on item validity. Multiple regression analyses revealed trait indicativity (how obviously an item response indicates a trait) to be the most important determinant of item validity. Scales built from itemmetrically sound versus poor items showed differential validity in two additional samples. Implications for the psychological processes underlying responses to personality items are discussed.
Psychometric Properties of a Revised Version of the Ten Item Personality Inventory
European Journal of Psychological Assessment, 2014
Gosling, Rentfrow, and Swann (2003) developed the Ten-Item Personality Inventory (TIPI) to meet the need for very short measures of the Big Five for time-limited contexts or large survey questionnaires. In this paper we show the inadequacy of the Italian version downloadable from Gosling's website and we report the results of four studies in which the psychometric properties of a revised version (I-TIPI-R) were investigated in student and general population samples. This new version showed adequate factor structure, test-retest reliability, self-observer agreement, and convergent and discriminant validity with the Big Five Inventory (BFI). Moreover, I-TIPI-R and BFI scores did not differ in their correlations with measures of affect, self-esteem, optimism, emotion regulation, and social desirability. Overall, the results suggest that the I-TIPI-R can be considered a valid and reliable alternative to the BFI for the assessment of basic personality traits when very short measures are needed.
Personality test item validity: insights from “self” and “other” research and theory
Personality and Individual Differences, 1998
This research examined the effects of item-evoked cognitive response strategy and test-taking instructional set on the psychometric properties of personality scales. University undergraduate roommate pairs (N = 192) responded to subscales from typical personality inventories under either standard, trait-focused, or behaviour-focused instructions. Results indicated that behavioural-based instructions combined with behavioural-focused items significantly decreased indices of subscale reliability, validity, and a composite measure of goodness. Findings are interpreted with respect to social cognition theories concerning perception of the "self" and the "other". Implications for personality test item writing and item pool selection are discussed.
A Short Version of the Big Five Inventory (BFI-20): Evidence on Construct Validity
Revista Interamericana de Psicología/Interamerican Journal of Psychology
Several measures have been developed in past decades to assess personality, focusing on the Big Five Factor Model (BFFM; Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism). Despite the relevance of their findings in different countries, a shared limitation of such measures is their length: they demand time from researchers and participants, which might cause boredom or fatigue and bias the final results. This research aimed to provide a shorter version of the 44-item Big Five Inventory (BFI) through two studies (total N = 8,119). The structure was assessed using a range of techniques (e.g., PAF analysis, Procrustes rotation). The best 20 items (4 per factor) were chosen to compose the final version of the BFI-20, which presented suitable psychometric evidence across the samples. Thus, due to the growing need for shorter measures that do not lose psychometric quality, our findings indicate the adequacy of the 20-item BFI and its potential applicability in res...
Ego-syntonicity in responses to items in the California Psychological Inventory
Journal of Research in Personality, 2006
Responding to items on a personality questionnaire can evoke a variety of feelings, from discomfort to indifference to pleasure. Harrison Gough reported that when he wrote items for the California Psychological Inventory (CPI; Gough & Bradley, 1996), he tried to make the items as ego-syntonic as possible. Ego-syntonic items are those "which a respondent finds congenial, and on which giving an opinion is a rewarding act" (Gough & Bradley, 1996, p. 10). The present study asked 79 respondents to report how they felt after answering each CPI item. Average affect ratings were above neutral for a majority of items, indicating that Gough had some success in writing ego-syntonic items. Differences in item ego-syntonicity were attributable to other item characteristics. Respondents disliked responding to relatively odd and ambiguous items, items with linguistic negations, and items referring to negative feelings and situations. As predicted by Gough, respondents enjoyed responding to items on the communality scale, items with which most people agree. They also enjoyed items that referred to positive emotions and attitudes and items indicating extraversion, conscientiousness, low neuroticism, and openness to experience. Highly ego-syntonic items were found to be more valid than less ego-syntonic items. Individuals who reported disliking many items were found to be socially anxious. The relations among reports of liking or disliking items, identity, and reputation are discussed, and further research on item response dynamics and validity is proposed.
Fitting Item Response Theory Models to Two Personality Inventories: Issues and Insights
Multivariate Behavioral Research, 2001
The present study compared the fit of several IRT models to two personality assessment instruments. Data from 13,059 individuals responding to the US-English version of the Fifth Edition of the Sixteen Personality Factor Questionnaire (16PF) and 1,770 individuals responding to Goldberg's 50-item Big Five Personality measure were analyzed. Various issues pertaining to the fit of the IRT models to personality data were considered. We examined two of the most popular parametric models designed for dichotomously scored items (i.e., the two- and three-parameter logistic models) and a parametric model for polytomous items (Samejima's graded response model). Also examined were Levine's nonparametric maximum likelihood formula scoring models for dichotomous and polytomous data, which were previously found to provide good fits to several cognitive ability tests. The two- and three-parameter logistic models fit some scales reasonably well but not others; the graded response model generally did not fit well. The nonparametric formula scoring models provided the best fit of the models considered. Several implications of these findings for personality measurement and personnel selection were described.
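For readers unfamiliar with the two parametric models for dichotomous items named in this abstract, the following is a minimal sketch of their standard textbook item response functions. It is not taken from the study itself; the function names and the parameter symbols (`theta` for the latent trait, `a` for discrimination, `b` for location, `c` for the lower asymptote) are conventional labels assumed here for illustration.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic model: probability of a keyed response.

    theta: latent trait level; a: discrimination; b: location (assumed names).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irf_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter logistic model: the 2PL plus a lower asymptote c."""
    return c + (1.0 - c) * irf_2pl(theta, a, b)


# When theta equals the item location b, the 2PL probability is exactly one half,
# and the 3PL probability is halfway between c and 1.
p_2pl = irf_2pl(0.0, 1.0, 0.0)
p_3pl = irf_3pl(0.0, 1.0, 0.0, 0.2)
```

Setting `c` to zero reduces the three-parameter model to the two-parameter one, which is why the two are often compared as nested models.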
PLOS ONE, 2017
The aim of this study was to construct a short, 30-item personality questionnaire that would be, in terms of content and meaning of the scores, as comparable as possible with longer, well-established inventories such as NEO PI-R and its clones. To do this, we shortened the formerly constructed 60-item "Short Five" (S5) by half so that each subscale would be represented by a single item. We compared all possibilities of selecting 30 items (preserving balanced keying within each domain of the five-factor model) in terms of correlations with well-established scales, self-peer correlations, and clarity of meaning, and selected an optimal combination for each domain. The resulting shortened questionnaire, XS5, was compared to the original S5 using data from student samples in 6 different countries (Estonia, Finland, UK, Germany, Spain, and China), and a representative Finnish sample. The correlations between XS5 domain scales and their longer counterparts from well-established scales ranged from 0.74 to 0.84; the difference from the equivalent correlations for the full version of S5 or from meta-analytic short-term dependability coefficients of NEO PI-R was not large. In terms of prediction of external criteria (emotional experience and self-reported behaviours), there were no important differences between XS5, S5, and the longer well-established scales. Controlling for acquiescence did not improve the prediction of criteria, self-peer correlations, or correlations with longer scales, but it did improve internal reliability and, in some analyses, comparability of the principal component structure. XS5 can be recommended as an economical measure of the five-factor model of personality at the level of domain scales; it has reasonable psychometric properties, fair correlations with longer well-established scales, and it can predict emotional experience and self-reported behaviours no worse than S5. When subscales are essential, we would still recommend using the full version of S5.
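The exhaustive comparison of item selections described in this abstract can be pictured with a small enumeration. The sketch below is illustrative only and rests on assumptions not stated in the abstract: that each domain has six subscales, that each subscale offers one positively and one negatively keyed item, and that "balanced keying" means three items of each keying per domain. The function name `balanced_selections` is hypothetical.

```python
from itertools import combinations


def balanced_selections(n_subscales: int = 6):
    """Yield one-item-per-subscale selections with balanced keying.

    Assumption: every subscale has exactly one positively keyed ('+') and one
    negatively keyed ('-') item, so a selection is fully described by which
    subscales contribute their '+' item.
    """
    half = n_subscales // 2
    for plus_subscales in combinations(range(n_subscales), half):
        yield tuple(
            "+" if s in plus_subscales else "-" for s in range(n_subscales)
        )


# All candidate selections for a single domain under the stated assumptions.
selections = list(balanced_selections())
```

Under these assumptions each domain has a small, fully enumerable set of candidates, so every candidate could be scored on whatever criteria are of interest (for example, correlations with longer scales) and the best one kept per domain.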
Dredging the OCEAN.20: an item response theory analysis of a shortened personality scale
2013
Valid and reliable personality assessments are important tools for personnel selection, so long as they are efficient and free from bias (Mount & Barrick, 1995). Undergraduate students (N = 503) completed the OCEAN.20, a brief 20-item self-report measure of the five factors of personality (O'Keefe, Kelloway, & Francis, 2012). Classical test theory methods had already established the scale's reliability and validity, as replicated in the present study, but item response theory analyses identified nine problematic items. Three items displayed differential item functioning, three items had a truncated range of responses, and three more items had low precision. The potential for bias or insufficient information offered by each item is cause for concern, as it could have serious consequences in determining a job applicant's fate, so it is advised that these items either be removed or revised prior to operational use. Limitations and recommendations for future research are discussed.
European Journal of Psychological …, 2007
The five-factor model (FFM) is currently the predominant model in trait psychology. To meet the need for an extremely brief measure of the FFM, Gosling, Rentfrow, and Swann (2003) developed the Ten-Item Personality Inventory (TIPI), which can be administered in about a minute. Here we describe the development and construct validation of a German version of the TIPI (the TIPI-G). Using a multijudge (self and peer), multi-instrument (TIPI-G and the German version of the NEO-PI-R) design, we evaluated the TIPI-G in terms of internal consistency, factor structure, convergent and discriminant validity, and coverage of the NEO-PI-R facets. Together the analyses suggest that the 10 unipolar items of the TIPI-G can provide an efficient approximation for longer measures of the FFM personality constructs. As such, the TIPI-G is recommended for research where time is limited, where the primary theoretical focus is on other constructs, or where it is desirable to reduce the testing burden on participants.