Volume 6 Number 5 (2012) - HealthMED Journal
Related papers
Assessing unidimensionality: A comparison of Rasch Modeling, Parallel Analysis, and TETRAD
The evaluation of assessment dimensionality is a necessary stage in gathering evidence to support the validity of interpretations based on a total score, particularly when assessment development and analysis are conducted within an item response theory (IRT) framework. In this study, we use polytomous item responses to compare two methods that have received increased attention in recent years (the Rasch model and parallel analysis) with a method for evaluating assessment structure that is less well known in the educational measurement community (TETRAD). All three methods were found to be reasonably effective. Parallel analysis successfully identified the correct number of factors. The Rasch approach did not show the item misfit that would indicate a deviation from unidimensionality, but the pattern of residuals did suggest the presence of correlated yet distinct factors. TETRAD successfully confirmed one dimension in the single-construct data set and confirmed two dimensions in the combined data set, yet excluded one item from each cluster for no obvious reason. The outcomes of all three approaches support the view that the assessment of dimensionality requires a good deal of judgment.
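As a concrete illustration of one of the methods compared above, the sketch below implements Horn's parallel analysis in Python: a factor is retained when its observed eigenvalue exceeds the mean eigenvalue obtained from random data of the same size. The function name and the simulated two-cluster data are illustrative assumptions, not material from the study itself.

```python
# A minimal sketch of Horn's parallel analysis: retain factors whose observed
# eigenvalues exceed the mean eigenvalues of random data of the same size.
# Function and variable names are illustrative, not from the paper.
import numpy as np

def parallel_analysis(data, n_iter=100, seed=0):
    """Return observed eigenvalues and mean eigenvalues of random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eigs = np.zeros(p)
    for _ in range(n_iter):
        rand = rng.standard_normal((n, p))
        rand_eigs += np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    rand_eigs /= n_iter
    return obs_eigs, rand_eigs

# Toy example: two correlated item clusters, so two eigenvalues should
# exceed their random-data counterparts.
rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal((2, 500))
items = np.column_stack(
    [f1 + rng.standard_normal(500) * 0.8 for _ in range(4)]
    + [f2 + rng.standard_normal(500) * 0.8 for _ in range(4)]
)
obs, rand = parallel_analysis(items)
print("suggested factors:", int(np.sum(obs > rand)))
```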
Whole-person variables may describe behaviours, perceptions, knowledge, or attitudes. These are the outcomes of a wide variety of health-care interventions. Such variables can only be observed through samples of representative behaviours (items in questionnaires). The amount of each variable is usually inferred from counts of the scores arbitrarily assigned to the various items. Rasch analysis (RA) is a novel statistical method allowing the estimation of true linear measures of item difficulty and subject ability subtended by raw counts. It also allows the estimation of the stability of the item hierarchy (which item is more difficult, which is less) across time, raters, and diagnostic or cultural subgroups. If this hierarchy is unstable (differential item functioning, DIF), the same questionnaire actually depicts qualitatively distinct, non-comparable conditions. RA may help either to detect the problem or to re-calibrate the subject measures by accounting for DIF. This may ensure real metric equivalence across medical questionnaires applied to different diagnostic groups and/or different linguistic and cultural contexts, thus fostering multicentre, international trials.
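To make the idea of linear logit measures concrete, here is a minimal sketch of the dichotomous Rasch model together with a crude, first-approximation item calibration (the centred log-odds of an incorrect response) and a naive subgroup comparison in the spirit of a DIF check. A real Rasch calibration would use conditional or joint maximum likelihood; all names and data below are illustrative, not taken from the abstract.

```python
# Dichotomous Rasch model: P(correct) depends only on the difference between
# person ability (theta) and item difficulty (b), both on a logit scale.
import numpy as np

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def crude_item_logits(responses):
    """First-approximation item difficulties: log-odds of an incorrect
    response, centred at zero. A full Rasch calibration would refine these
    (e.g. by conditional or joint maximum likelihood)."""
    p = responses.mean(axis=0)              # proportion correct per item
    p = np.clip(p, 0.01, 0.99)              # avoid infinite logits
    d = np.log((1 - p) / p)
    return d - d.mean()

# Toy DIF check: compare item difficulty estimates in two subgroups.
rng = np.random.default_rng(0)
theta = rng.normal(size=400)
b = np.linspace(-1.5, 1.5, 6)
resp = (rng.random((400, 6)) < rasch_prob(theta[:, None], b[None, :])).astype(int)
group = rng.integers(0, 2, size=400)
d_a = crude_item_logits(resp[group == 0])
d_b = crude_item_logits(resp[group == 1])
print("difficulty shift per item (group B - group A):", np.round(d_b - d_a, 2))
```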
The R Journal, 2019
The unival package is designed to help researchers decide between unidimensional and correlated-factors solutions in the factor analysis of psychometric measures. The novelty of the approach is its use of external information, in which multiple factor scores and general factor scores are related to relevant external variables or criteria. The unival package's implementation follows a series of procedures put forward by Ferrando and Lorenzo-Seva (2019) along with new methodological developments proposed in this article. We assess models fitted using unival by means of a simulation study that extends the results obtained in the original proposal. Its usefulness is also assessed through a real-world data example. Based on these results, we conclude that unival is a valuable tool for applications in which the dimensionality of an item set is to be assessed.
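The sketch below is a purely conceptual illustration of the external-information idea described above: it compares how a single total score and separate subscale scores correlate with an external criterion. It does not reproduce the unival package's API or the actual procedure of Ferrando and Lorenzo-Seva (2019); data and names are hypothetical.

```python
# Conceptual illustration only: compare how a single total score versus
# separate subscale scores relate to an external criterion. This is NOT the
# unival package's procedure or API; names and data are hypothetical.
import numpy as np

rng = np.random.default_rng(42)
n = 300
sub1 = rng.normal(size=n)                     # first group-factor score
sub2 = 0.5 * sub1 + rng.normal(size=n) * 0.9  # correlated second factor
criterion = 0.6 * sub1 + 0.1 * sub2 + rng.normal(size=n) * 0.8

general = (sub1 + sub2) / 2                   # unidimensional (total) score

def r(x, y):
    return np.corrcoef(x, y)[0, 1]

print("criterion vs general score:", round(r(general, criterion), 2))
print("criterion vs subscale 1:   ", round(r(sub1, criterion), 2))
print("criterion vs subscale 2:   ", round(r(sub2, criterion), 2))
# If the subscale scores show clearly different validity coefficients,
# treating the item set as strictly unidimensional may discard information.
```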
Journal of Patient-Reported Outcomes
Background: This paper is part of a series comparing different psychometric approaches to evaluating patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poorly and well performing items; 2) testing whether items perform differently based on demographic characteristics (differential item functioning, DIF); and 3) balancing IRT and content validity considerations to select items for short forms. Methods: Model fit, local dependence, and DIF were examined for 51 items initially considered for the Patient-Reported Outcomes Measurement Information System® (PROMIS®) Depression item bank. Samejima's graded response model was used to examine how well each item measured severity levels of depression and how well it distinguished between individuals with high and low levels of depression. Two short forms were constructed based on psychometric properties and consensus discussions with instrument developers, including psychometricians and content experts. Calibrations presented here are for didactic purposes and are not intended to replace official PROMIS parameters or to be used for research. Results: Of the 51 depression items, 14 exhibited local dependence, 3 exhibited DIF for gender, and 9 exhibited misfit; these items were removed from consideration for the short forms. Short form 1 prioritized content, so items were chosen to meet DSM-V criteria rather than being discarded for lower discrimination parameters. Short form 2 prioritized well performing items, so fewer DSM-V criteria were satisfied. Short forms 1 and 2 performed similarly on model fit statistics, but short form 2 provided greater item precision. Conclusions: IRT is a family of flexible models providing item- and scale-level information, making it a powerful tool for scale construction and refinement. Strengths of IRT models include placing respondents and items on the same metric, testing DIF across demographic or clinical subgroups, and facilitating the creation of targeted short forms. Limitations include the large sample sizes needed to obtain stable item parameters and the familiarity with measurement methods required to interpret results. Combining psychometric data with stakeholder input (including people with lived experience of the health condition and clinicians) is highly recommended for scale development and evaluation.
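As a didactic companion to the description of Samejima's graded response model above, the sketch below computes category probabilities from cumulative logistic boundary curves. The discrimination and threshold values are invented for illustration and are not PROMIS Depression calibrations.

```python
# Sketch of Samejima's graded response model: cumulative boundary curves
# P(X >= k) are logistic in a*(theta - b_k); category probabilities are the
# differences between adjacent boundaries. Parameters below are made up for
# illustration and are not PROMIS calibrations.
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one item with discrimination a and ordered
    thresholds b (length m-1), evaluated at ability values theta."""
    theta = np.atleast_1d(theta)
    # Boundary probabilities P(X >= k), with P(X >= 0) = 1 and P(X >= m) = 0.
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    upper = np.column_stack([np.ones(len(theta)), cum])
    lower = np.column_stack([cum, np.zeros(len(theta))])
    return upper - lower   # shape: (len(theta), m)

# Example: a 5-category depression item at three severity levels.
probs = grm_category_probs(theta=[-1.0, 0.0, 1.5], a=2.0, b=[-1.5, -0.5, 0.5, 1.5])
print(np.round(probs, 2))          # each row sums to 1
```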
PubMed, 2019
Background: Factor-analysis-based dimensional assessment of psychometric measures is a key step in the development of tests. However, current practices for deciding between a multiple-correlated-factors solution and an essentially unidimensional solution are clearly improvable. Method: A series of recent studies is reviewed, and an approach is proposed that combines multiple sources of information and can be used to make an informed judgement about the most appropriate dimensionality for the measure being studied. It uses both internal and external sources of information, and focuses on the properties of the scores derived from each of the solutions compared. Results: The proposal is applied to a re-analysis of a measure of symptoms of psychological distress. The results show that a clear and informed judgement about the most appropriate dimensionality of the measure in the target population can be obtained. Discussion: The proposal is useful and can be put into practice using user-friendly, non-commercial software. We hope that this availability will result in good practice in the future.
Current issues in psychometric assessment of outcome measures
2012
medicina fluminensis 2012, Vol. 48, No. 4, p. 463-470. Abstract. In recent years there has been an increasing use of outcome measures in clinical practice, audit procedures and quality control. The psychometric assessment of these measures is still largely based on classical test theory (CTT), including analysis of internal consistency, reproducibility, and criterion-related validity. But this approach neglects standard criteria and practical attributes that need to be considered when evaluating the fundamental properties of a measurement tool. Conversely, Rasch analysis (RA) is an original item-response theory analysis based on latent-trait modelling, and provides a statistical model that prescribes how data should be in order to comply with theoretical requirements of measurement. RA gives psychometric information not obtainable through CTT, namely: (i) the functioning of rating scale categories; (ii) the measure's validity, e.g. how well an item performs in terms of its releva...
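To illustrate the CTT side of this comparison, the sketch below computes Cronbach's alpha, the usual internal-consistency statistic mentioned above, from a simulated item matrix. Data and names are illustrative only.

```python
# Minimal sketch of a classical-test-theory statistic mentioned above:
# Cronbach's alpha for internal consistency. Data and names are illustrative.
import numpy as np

def cronbach_alpha(items):
    """items: (n_persons, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Toy example: five items sharing a common component.
rng = np.random.default_rng(3)
trait = rng.normal(size=(200, 1))
scores = trait + rng.normal(size=(200, 5)) * 0.7
print("alpha =", round(cronbach_alpha(scores), 2))
```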
Value in Health, 2015
To provide comparisons and a worked example of item- and scale-level evaluations based on three psychometric methods used in patient-reported outcome development (classical test theory (CTT), item response theory (IRT), and Rasch measurement theory (RMT)) in an analysis of the National Eye Institute Visual Functioning Questionnaire (VFQ-25). Methods: Baseline VFQ-25 data from 240 participants with diabetic macular edema from a randomized, double-masked, multicenter clinical trial were used to evaluate the VFQ at the total score level. CTT, RMT, and IRT evaluations were conducted, and results were assessed in a head-to-head comparison. Results: Results were similar across the three methods, with IRT and RMT providing more detailed diagnostic information on how to improve the scale. CTT led to the identification of two problematic items that threaten the validity of the overall scale score, sets of redundant items, and skewed response categories. IRT and RMT additionally identified poor fit for one item, many locally dependent items, poor targeting, and disordering of over half the response categories. Conclusions: Selection of a psychometric approach depends on many factors. Researchers should justify their evaluation method and consider the intended audience. If the instrument is being developed for descriptive purposes and on a restricted budget, a cursory examination of the CTT-based psychometric properties may be all that is possible. In a high-stakes situation, such as the development of a patient-reported outcome instrument for consideration in pharmaceutical labeling, however, a thorough psychometric evaluation including IRT or RMT should be considered, with final item-level decisions made on the basis of both quantitative and qualitative results.
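As a small worked example of the kind of CTT item screening referred to above, the sketch below computes corrected item-total (item-rest) correlations and flags items that relate poorly to the rest of the scale. The simulated data and the 0.20 cut-off are assumptions for illustration; this is not the VFQ-25 analysis.

```python
# Sketch of one routine CTT check: corrected item-total (item-rest)
# correlations, used to flag items that relate poorly to the rest of the
# scale. Data are simulated; this is not the VFQ-25 analysis.
import numpy as np

def item_rest_correlations(items):
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

rng = np.random.default_rng(7)
trait = rng.normal(size=(300, 1))
good = trait + rng.normal(size=(300, 5)) * 0.7   # items driven by the trait
noise = rng.normal(size=(300, 1))                # one unrelated item
scale = np.hstack([good, noise])
r = item_rest_correlations(scale)
print(np.round(r, 2))                            # the last item should stand out
print("flagged:", np.where(r < 0.2)[0])
```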