From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation

Cronbach’s α, Revelle’s β, and McDonald’s ωH: their relations with each other and two alternative conceptualizations of reliability

Psychometrika, 2005

We make theoretical comparisons among five coefficients: Cronbach’s α, Revelle’s β, McDonald’s ωh, and two alternative conceptualizations of reliability. Though many end users and psychometricians alike may not distinguish among these five coefficients, we demonstrate formally their nonequivalence. Specifically, whereas there are conditions under which α, β, and ωh are equivalent to each other and to one of the two conceptualizations of reliability considered here, we show that equality with this conceptualization of reliability and between α and ωh holds only under a highly restrictive set of conditions, and that the conditions under which β equals ωh are only somewhat more general. The nonequivalence of α, β, and ωh suggests that important information about the psychometric properties of a scale may be missing when scale developers and users report only α, as is almost always the case.
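For reference, the two coefficients most often contrasted in this literature have the following standard definitions, written here in common textbook notation rather than the paper's own: for a k-item scale with item variances σ_i², total-score variance σ_X², and loadings λ_i on the general factor of a hierarchical (bifactor) model,

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right),
\qquad
\omega_h = \frac{\left(\sum_{i=1}^{k}\lambda_i\right)^{2}}{\sigma_X^{2}}.
\]

The two coincide when every item loads only on a single general factor with equal loadings (essential tau-equivalence with no group factors), which is one way of seeing why the equality of α and ωh holds only under the restrictive conditions the abstract describes.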

A Comparison of Reliability Coefficients for Ordinal Rating Scales

Journal of Classification, 2021

Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of the coefficient matters. We studied to what extent we reach the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients...
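As a rough sketch (not the authors' code), several of the seven coefficients can be computed for two raters scoring the same subjects on a 5-point ordinal scale with standard Python libraries; the rating vectors below are invented for illustration, and ICC(3,1) is omitted because it requires a two-way ANOVA decomposition rather than a single pairwise call.

```python
# Toy comparison of reliability coefficients for two raters on a 5-point ordinal scale.
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([1, 2, 2, 3, 4, 4, 5, 3, 2, 5])  # made-up ratings
rater_b = np.array([1, 2, 3, 3, 4, 5, 5, 3, 2, 4])

kappa      = cohen_kappa_score(rater_a, rater_b)                        # Cohen's kappa
kappa_lin  = cohen_kappa_score(rater_a, rater_b, weights="linear")      # linearly weighted kappa
kappa_quad = cohen_kappa_score(rater_a, rater_b, weights="quadratic")   # quadratically weighted kappa
r, _   = pearsonr(rater_a, rater_b)                                     # Pearson's correlation
rho, _ = spearmanr(rater_a, rater_b)                                    # Spearman's rho
tau, _ = kendalltau(rater_a, rater_b)                                   # Kendall's tau-b (scipy default)

for name, value in [("Cohen's kappa", kappa), ("Linear kappa", kappa_lin),
                    ("Quadratic kappa", kappa_quad), ("Pearson r", r),
                    ("Spearman rho", rho), ("Kendall tau-b", tau)]:
    print(f"{name:>16}: {value:.3f}")
```

Running the different coefficients on the same pair of rating vectors, as above, is essentially the exercise the paper carries out at scale to see whether the choice of coefficient changes the conclusion about inter-rater reliability.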

Estimating ordinal reliability for Likert-type and ordinal item response data: a conceptual, empirical, and practical guide

This paper provides a conceptual, empirical, and practical guide for estimating ordinal reliability coefficients for ordinal item response data (also referred to as Likert, Likert-type, ordered categorical, or rating scale item responses). Conventionally, reliability coefficients, such as Cronbach’s alpha, are calculated using a Pearson correlation matrix. Ordinal reliability coefficients, such as ordinal alpha, use the polychoric correlation matrix (Zumbo, Gadermann, & Zeisser, 2007). This paper presents (i) the theoretical-psychometric rationale for using an ordinal version of coefficient alpha for ordinal data; (ii) a summary of findings from a simulation study indicating that ordinal alpha more accurately estimates reliability than Cronbach’s alpha when data come from items with few response options and/or show skewness; (iii) an empirical example from real data; and (iv) the procedure for calculating polychoric correlation matrices and ordinal alpha in the freely available software program R. We use ordinal alpha as a case study, but also provide the syntax for alternative reliability coefficients (such as beta or omega).
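A minimal sketch of the computational idea, under the paper's definition that ordinal alpha is simply coefficient alpha applied to a polychoric rather than Pearson correlation matrix: estimating the polychoric matrix itself is assumed to be done elsewhere (e.g., with the polychoric() function of the psych package in R, as the paper describes), and the 4-item matrix below is made up for illustration.

```python
# Ordinal alpha (Zumbo, Gadermann, & Zeisser, 2007): Cronbach's alpha computed on a
# polychoric correlation matrix instead of a Pearson correlation matrix.
import numpy as np

def alpha_from_corr(R: np.ndarray) -> float:
    """Coefficient alpha from a correlation (or covariance) matrix R."""
    R = np.asarray(R, dtype=float)
    k = R.shape[0]
    return (k / (k - 1)) * (1.0 - np.trace(R) / R.sum())

# Hypothetical polychoric correlation matrix for a 4-item ordinal scale.
poly_R = np.array([
    [1.00, 0.55, 0.48, 0.52],
    [0.55, 1.00, 0.50, 0.47],
    [0.48, 0.50, 1.00, 0.44],
    [0.52, 0.47, 0.44, 1.00],
])

print("Ordinal alpha:", round(alpha_from_corr(poly_R), 3))
```

The same function applied to the Pearson correlation matrix of the raw item scores gives the conventional (standardized) alpha, which makes the paper's comparison between the two estimates straightforward to reproduce.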

The single-item need for consistency scale

Individual Differences Research, 2014

Despite the applicability of the preference for consistency (PFC) scale to multiple real-world settings, its large number of items limits its use in field studies. To ease this restriction, we constructed and tested a single-item measure (i.e., the single-item need for consistency scale, or SIN-C). Across three studies (N ≈ 1000), we examined the concurrent validity of the SIN-C with the PFC in a student sample (Study 1), the test–retest reliability of the SIN-C across four months (Study 2), and the construct validity of the SIN-C in a diverse international sample (Study 3). Overall, the SIN-C showed good reliability and validity, supporting its use in future research.

Reliability: Arguments for Multiple Perspectives and Potential Problems with Generalization across Studies

Educational and Psychological Measurement, 2002

The present article addresses reliability issues in light of recent studies and debates focused on psychometrics versus datametrics terminology and reliability generalization (RG) introduced by Vacha-Haase. The purpose here was not to moderate arguments presented in these debates but to discuss multiple perspectives on score reliability and how they may affect research practice, editorial policies, and RG across studies. Issues of classical error variance and reliability are discussed across models of classical test theory, generalizability theory, and item response theory. Potential problems with RG across studies are discussed in relation to different types of reliability, different test forms, different numbers of items, misspecifications, and confounding independent variables in a single RG analysis.
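As a reminder of the classical starting point the article builds on (standard classical test theory, not notation taken from the article itself): an observed score decomposes into a true score and error,

\[
X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^{2}}{\sigma_X^{2}} = \frac{\sigma_T^{2}}{\sigma_T^{2} + \sigma_E^{2}},
\]

so anything that shifts σ_T² or σ_E² across samples, test forms, or test lengths changes the reliability of the scores, which is one reason aggregating reliability estimates across heterogeneous studies in a single RG analysis can be problematic.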