Item Response Theory Research Papers

The main objective of this study is to develop a self-report instrument to measure pre-service teachers’ perceptions of the extent to which they experience the support and training they need to integrate technology into classroom activities. The questionnaire items were drawn up on the basis of a synthesis of 19 qualitative studies (Authors et al., 2012) and were reviewed by experts in the field. To study the instrument’s reliability and aspects of its validity, data were collected from a sample of 688 pre-service teachers in Flanders (Belgium) and analysed. The resulting scale showed highly satisfactory psychometric properties. Item response theory analysis showed that 22 of the 24 items fit a Rating Scale Model well. The results also indicate that the items differ in their degree of difficulty: helping pre-service teachers to design ICT-rich lessons and providing adequate feedback appear to be more challenging for teacher training institutions. Recommendations are given on how the new scale can help teacher training institutions and schools develop approaches that equip pre-service teachers with the competencies needed to integrate technology into teaching and learning processes.
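
The abstract reports that most items fit a Rating Scale Model (RSM). The sketch below is a minimal illustration that assumes Andrich's formulation, in which every item shares one set of category thresholds. It computes the category probabilities for a single item. It is not the authors' estimation code, and the function names and threshold values are hypothetical.

```python
import math

def rsm_probabilities(theta, delta, taus):
    """Category probabilities under Andrich's Rating Scale Model.

    theta: person location; delta: item location; taus: thresholds
    tau_1..tau_M shared by all items (the defining feature of the RSM).
    Returns [P(X=0), ..., P(X=M)].
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    numerators = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        numerators.append(running)
    # Subtract the largest value before exponentiating for numerical stability.
    top = max(numerators)
    weights = [math.exp(v - top) for v in numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(theta, delta, taus):
    """Model-expected rating for one person on one item."""
    probs = rsm_probabilities(theta, delta, taus)
    return sum(k * p for k, p in enumerate(probs))
```

Consider a four-category item with hypothetical thresholds -1, 0 and 1. Here `rsm_probabilities(0.0, 0.0, [-1.0, 0.0, 1.0])` returns symmetric probabilities, so the expected score at theta = delta is 1.5. Because the thresholds are shared across items, items differ only in their location delta. This is what lets an RSM analysis order items by difficulty.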

Item Response Theory (IRT) is a measurement framework used in the design and analysis of educational and psychological assessments (achievement tests, rating scales, inventories, or other instruments) that measure mental traits. IRT is becoming increasingly popular among educational assessment experts for analysing cognitive measures, but little is known about its use for analysing non-cognitive measures (e.g., personality, attitude, and psychopathology). This paper provides information on the models, assumptions, and applications of IRT in the analysis of cognitive and non-cognitive measurements. The paper concludes that IRT is a better framework that researchers can exploit to analyse cognitive data for assessment and evaluation research, and non-cognitive data for sociological, psychological, and psychopathological assessments. However, all the statistical assumptions must be met, and the test data must fit the IRT model, for the results to be valid, reliable, and credible.
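
Non-cognitive instruments such as attitude or personality scales usually have ordered response categories. Samejima's graded response model is one widely used IRT model for such items. The abstract does not list which models the paper covers, so the following is only an illustrative sketch. The function names and parameter values are hypothetical.

```python
import math

def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def grm_probabilities(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    a: item discrimination; thresholds: strictly increasing b_1 < ... < b_M.
    Returns [P(X=0), ..., P(X=M)].
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary curves P(X >= k), padded with 1 for k = 0 and 0 for k = M + 1.
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category probability is the gap between adjacent boundary curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]
```

For a five-point item with hypothetical parameters a = 1.5 and thresholds [-2, -0.5, 0.5, 2], `grm_probabilities(0.0, 1.5, [-2.0, -0.5, 0.5, 2.0])` gives probabilities that sum to one and are symmetric around the middle category.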

Objective: To date, no measure of social support has been developed specifically for either palliative care or oncology settings. The present study examined the psychometric properties of the Duke-University of North Carolina Functional Social Support Questionnaire (DUFSS) to (1) assess the adequacy of the scale in the context of severe medical illness and (2) evaluate whether a brief subset of items might offer roughly comparable utility. Method: The 14-item DUFSS was administered to 1,362 individuals with advanced cancer or AIDS. Classical test theory (CTT) and item response theory (IRT) analyses were used to develop an abbreviated version of the DUFSS that maintained adequate reliability and validity and might be more feasible to administer in a palliative care setting. The reliability and concurrent validity of the DUFSS-5 were evaluated in a separate validation sample of patients with advanced cancer. Results: Analyses generated a five-item versi...
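
The abstract does not say which reliability statistic was reported for the DUFSS-5. A common classical test theory check when shortening a scale is coefficient (Cronbach's) alpha, computed for the full item set and for the retained subset. The sketch below assumes that choice, and its names are illustrative.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondent rows of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    using sample variances throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose rows into item columns
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Alpha for a short form can be computed as `cronbach_alpha([[row[i] for i in keep] for row in data])`, where `keep` is a hypothetical list of retained item indices. Because alpha tends to fall as items are removed, a short form is usually judged by whether its alpha stays adequate rather than whether it matches the full scale.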

The dimensionality of Raven’s Advanced Progressive Matrices (APM) specifically and, more generally, of g or fluid intelligence has been a long-standing issue. The present article reports two studies examining the dimensionality of both the original Set II of the APM (n = 506) and a short form (n = 644), using principal component analysis and Rasch analysis. Although the results from the principal component analysis were equivocal, results from the Rasch analyses more strongly suggested that both forms of the test are best described as multidimensional. Furthermore, a comparison of the items common to both forms indicated a context effect, which makes adaptive testing versions of this test difficult to construct.
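
One common way to probe dimensionality in a Rasch analysis is a principal component analysis of the standardized residuals that remain after the Rasch model is fitted. A large first residual component is often read as a hint of a secondary dimension. The abstract does not detail the authors' procedure, so this is a minimal sketch. It assumes that person and item parameters have already been estimated, and it treats them as known. The data are simulated and the function names are illustrative.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(X=1) under the dichotomous Rasch model for every person-item pair."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def standardized_residuals(responses, theta, b):
    """(observed - expected) / sqrt(model variance) for each response."""
    p = rasch_prob(theta, b)
    return (responses - p) / np.sqrt(p * (1.0 - p))

def residual_pca(responses, theta, b):
    """Eigenvalues (largest first) of the inter-item correlation matrix
    of standardized Rasch residuals."""
    z = standardized_residuals(responses, theta, b)
    corr = np.corrcoef(z, rowvar=False)  # items are the variables
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

def item_outfit(responses, theta, b):
    """Outfit mean-square per item: the mean squared standardized residual."""
    z = standardized_residuals(responses, theta, b)
    return (z ** 2).mean(axis=0)

# Simulated unidimensional data with hypothetical item difficulties.
rng = np.random.default_rng(0)
theta = rng.standard_normal(500)
b = np.linspace(-1.5, 1.5, 10)
responses = (rng.random((500, 10)) < rasch_prob(theta, b)).astype(float)
eigenvalues = residual_pca(responses, theta, b)
outfit = item_outfit(responses, theta, b)
```

The eigenvalues always sum to the number of items. What matters is how far the first eigenvalue exceeds what purely random residuals would produce, which is usually judged against simulations like the one above.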

The European Organisation for Research and Treatment of Cancer (EORTC) has developed a new multidimensional instrument measuring cancer-related fatigue that can be used in conjunction with the quality of life core questionnaire, the EORTC QLQ-C30. The paper focuses on the development of the phase III module, conducted in collaboration with seven European countries with a sample of 318 patients. The methodology followed the EORTC guidelines for developing phase III modules. Patients completed questionnaires (the EORTC QLQ-C30 with the EORTC Fatigue Module FA15), followed by an interview asking for their opinions on how difficult the items were to understand and how annoying and intrusive they were. The phase II FA15 was revised on the basis of qualitative analyses (patients' comments), quantitative results (descriptive statistics), and the multi-item response theory analyses. The three dimensions (physical, emotional, and cognitive) of the scale were confirmed. As a result, EORTC QLQ-FA13 ...

Past research has shown that Filipino cancer patients report lower levels of quality of life (QoL) than other ethnic groups. One possible explanation is that Filipinos do not define QoL in the same manner as others, which would bias their assessments; if so, Filipinos would not necessarily have lower QoL. Item response theory methods were used to assess differential item functioning (DIF) in the quality of life (measured by the EORTC QLQ-C30) of cancer patients across four ethnic groups (Caucasian, Filipino, Hawaiian, and Japanese). The sample consisted of 359 cancer patients. Results showed DIF on several items, indicating ethnic differences in the assessment of quality of life. Relative to the Caucasian and Japanese groups, items related to physical functioning, cognitive functioning, social functioning, nausea and vomiting, and financial difficulties exhibited DIF for Filipinos. On these items Filipinos exhibited either higher or lower QoL scores, eve...
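
The abstract does not name the specific IRT DIF method. One IRT-based approach compares an item's characteristic curves (ICCs), estimated separately in a reference group and a focal group, through the area between them: Raju's signed and unsigned area measures. The sketch below illustrates that idea under simplifying assumptions:

- It uses a dichotomous 2PL item. Most QLQ-C30 items are polytomous, so this is a simplification.
- It assumes the parameters are already estimated and placed on a common scale.
- The parameter values and function names are hypothetical.

```python
import numpy as np

def icc_2pl(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def dif_areas(a_ref, b_ref, a_focal, b_focal, lo=-20.0, hi=20.0, n=40001):
    """Signed and unsigned areas between reference- and focal-group ICCs,
    integrated numerically with the trapezoid rule over [lo, hi]."""
    theta = np.linspace(lo, hi, n)
    diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_focal, b_focal)
    step = theta[1] - theta[0]

    def trapezoid(y):
        return float(np.sum((y[:-1] + y[1:]) * step / 2.0))

    return trapezoid(diff), trapezoid(np.abs(diff))
```

For 2PL curves the signed area equals b_focal - b_ref whatever the discriminations are. When the discriminations are equal, the unsigned area equals |b_focal - b_ref|. For example, `dif_areas(1.2, 0.0, 1.2, 0.5)` returns values close to (0.5, 0.5). When the two curves cross, the signed area can be near zero while the unsigned area is not, which is why both are worth inspecting.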

In recent years, a growing number of schools, colleges, universities, and other learning institutions have been converting their existing courses into E-Learning applications. To support this, the authors are developing a Macromedia Flash E-Learning Web application that fully supports item data input and adaptive testing based on item response theory. The project’s goal is to maximize the capabilities and reusability of the multimedia content produced by using the IMS-QTI standard. The application’s adaptive testing functionality will be implemented by proposing new IMS-QTI sub-standards for characterizing item parameters and interfacing parameters.
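
The abstract does not say how the application will choose the next item. A common IRT rule in adaptive testing is to administer the unused item with the highest Fisher information at the current ability estimate. The sketch below assumes a 2PL item bank stored as a plain Python list. It is unrelated to IMS-QTI or Flash, and the names are hypothetical.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, item_bank, administered):
    """Index of the unadministered item with maximum information at theta_hat.

    item_bank: list of (a, b) tuples; administered: set of used indices.
    Returns None when every item has been administered.
    """
    best, best_info = None, -1.0
    for idx, (a, b) in enumerate(item_bank):
        if idx in administered:
            continue
        info = info_2pl(theta_hat, a, b)
        if info > best_info:
            best, best_info = idx, info
    return best
```

With a hypothetical bank `[(1.0, -2.0), (1.5, 0.1), (1.0, 0.0), (0.8, 0.0)]` and theta_hat = 0, the function picks item 1, a highly discriminating item located near the current estimate. After each response, the ability estimate would be updated and the selection repeated. Operational systems usually add exposure control and content balancing on top of this rule.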

Purpose: This study presents the construct of Work-Health Balance and the design and validation of the Work-Health balance questionnaire (WHBq). A growing number of workers have a long-standing health problem or a disability (LSHPD). Managing health needs and work demands is crucial for these workers' quality of working life and work retention; however, no existing instrument measures this process. The WHBq assesses key factors in the process of adjusting between health needs and work demands. Method: We tested the reliability and validity of 38 items with cross-sectional data from a sample of 321 Italian workers (mean age = 45 ± 11), using exploratory factor analysis (EFA), Rasch analyses, and correlations with other relevant variables. Results: The final instrument consisted of 17 items that reliably measured three factors: Work-Health incompatibility, Health climate, and External support. These dimensions were associated with well-being in the workplace, dysfunctional behaviors at work, and general psychological health. A higher level on the Work-Health balance index was associated with lower levels of presenteeism, emotional exhaustion, workaholism, and psychological distress and with higher levels of job satisfaction and work engagement, supporting the construct validity of the instrument. Conclusion: The WHBq shows good psychometric characteristics and strong, theoretically consistent relationships with important and well-known variables. These results make the WHBq a promising tool for studying and managing employee health, especially for the work continuation of employees returning to work with an LSHPD.

In contrast to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM-based data with a linear attribute structure. The study uses a procedure that makes the IRM and CDM frameworks comparable and investigates how estimation accuracy is affected by test diagnosticity and by the match between the true and fitted models. The study shows that comparable results can be obtained when CDMs are retrofitted to highly diagnostic IRM data, and vice versa. However, in some conditions retrofitting CDMs to IRM-based data can result in considerable examinee misclassification, and model fit indices provide only limited indication of the accuracy of item parameter estimation and attribute classification.
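
The abstract does not say which CDM was retrofitted. The DINA model is one of the simplest CDMs and shows the "discrete structures" idea concretely:

- An examinee is described by a 0/1 attribute-mastery profile.
- Each item has a Q-matrix row listing the attributes it requires.
- Under a linear attribute hierarchy, only cumulative profiles are allowed.

The sketch below assumes DINA and uses hypothetical slip and guess values.

```python
def linear_profiles(k):
    """Attribute profiles permitted by a linear hierarchy A1 -> A2 -> ... -> Ak."""
    return [[1] * m + [0] * (k - m) for m in range(k + 1)]

def dina_prob(alpha, q_row, slip, guess):
    """P(correct) under the DINA model.

    alpha: 0/1 attribute-mastery profile; q_row: 0/1 attributes the item
    requires; slip, guess: item parameters.
    """
    # eta is True only if every attribute the item requires is mastered.
    eta = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1.0 - slip if eta else guess
```

With three attributes in a linear hierarchy there are only four permissible profiles, from [0, 0, 0] to [1, 1, 1], instead of the eight of an unstructured model. This ordered structure is what makes a comparison with a continuous unidimensional IRM meaningful. For an item requiring the first two attributes, with hypothetical slip 0.1 and guess 0.2, profile [1, 1, 0] answers correctly with probability 0.9 and profile [1, 0, 0] with probability 0.2.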

Narcissism as a psychological construct has had a contentious past, both in its conceptualization and in its measurement. There is an emerging consensus that narcissism consists of grandiose and vulnerable subtypes, which share a common core. In the present research (N = 1,002), we constructed a new measure of unified narcissism that reflects these contemporary understandings, using items from the most widely used measures of grandiose and vulnerable narcissism: the Narcissistic Personality Inventory (NPI; Raskin & Terry, 1988, https://doi.org/10.1037/0022-3514.54.5.890) and the Pathological Narcissism Inventory (PNI; Pincus et al., 2009, https://doi.org/10.1037/a0016530). We used classical test theory and item response theory approaches to devise a 29-item Unified Narcissism Scale. The scale showed good internal consistency and convergent and discriminant validity, as well as evidence of measurement invariance between men and women. This research gives strong support for the structure, reliability, and validity of the unified measure, which offers a promising avenue for further enhancing our knowledge of narcissism.

Testing for measurement invariance can be done within the framework of multigroup latent class analysis. Latent class analysis can model any type of discrete data, which makes it an obvious choice when nominal indicators are used or when the researcher aims to classify respondents into latent classes. The multigroup latent class (LC) model can be specified in three different ways: with a probabilistic, a log-linear, or a logistic parameterization. We define and compare these forms of parameterization. The starting point is the standard LC model, in which indicators and latent variables are defined at the nominal level. Additionally, we focus on LC models with ordinal indicators as well as LC factor models with ordinal indicators. Testing for measurement invariance involves estimating LC models with different degrees of homogeneity. We explain the procedure for investigating measurement invariance at both the scale and the item level. We illustrate the approach with two examples: the first is a multigroup LC analysis with nominal indicators; the second is a multigroup LC factor analysis with ordinal indicators.
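
In the probabilistic parameterization, a multigroup LC model gives the probability of a response pattern y in group g as the sum over classes c of P(c | g) times the product over items j of P(y_j | c, g). Full measurement invariance means the conditional probabilities P(y_j | c, g) do not depend on g, so groups differ only in their class sizes. The sketch below illustrates this with binary indicators, a special case of the nominal indicators in the abstract. All numbers are hypothetical.

```python
from itertools import product

def pattern_probability(pattern, class_sizes, cond_probs):
    """P(y) = sum_c P(c) * prod_j P(y_j | c) for binary indicators.

    class_sizes: P(c) for one group; cond_probs[c][j] = P(y_j = 1 | c).
    """
    total = 0.0
    for pc, probs in zip(class_sizes, cond_probs):
        term = pc
        for y, p in zip(pattern, probs):
            term *= p if y == 1 else 1.0 - p
        total += term
    return total

# Invariant measurement part shared by both groups (hypothetical values).
cond_probs = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]]
# Group-specific class sizes: the only part allowed to differ.
group_sizes = {"group_1": [0.6, 0.4], "group_2": [0.3, 0.7]}

pattern_table = {
    g: {y: pattern_probability(y, sizes, cond_probs)
        for y in product([0, 1], repeat=3)}
    for g, sizes in group_sizes.items()
}
```

Relaxing invariance for one item would mean giving that item group-specific rows in `cond_probs` and comparing the fit of the two models. That comparison is the item-level test the abstract describes, although the authors' exact specification may differ.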

Stress accompanies student life and can significantly affect students' lives, including how they go about their academic work. Globally, three out of every five patient visits to a doctor are for stress-related problems. This study examined stress and its impact on the academic and social life of students at a university in Ghana. A descriptive cross-sectional survey design was employed. Using stratified and simple random (random number) sampling, 500 regular undergraduate students were recruited into the study. Data were gathered with a questionnaire made up of the Perceived Stress Scale and the Students' Life Satisfaction Scale. Frequencies, percentages, means and standard deviations, and structural equation modeling (SEM) with AMOS were used for the analyses. The majority of the students were found to be moderately stressed. The most important stressors were academic stressors, followed by institutional stressors and external stressors. Stress had a significant positive impact on the academic and social life of students. It was concluded that undergraduate students, in one way or another, go through some kind of stress during their studies. It was recommended that the university, through its Students' Affairs and Counselling Sections, continue to empower students to manage and deal with stress in order to enhance their academic life.

This study examined the effect of various sample sizes (200, 500, 1,000, 5,000, 10,000, and 20,000) and test lengths (15, 30, and 60 items) on the accuracy of item response theory item-parameter estimation using real test data. Item parameter estimates were obtained by fitting the three-parameter logistic (3PL) model. The main findings confirmed those of previous studies that used simulated data: longer tests resulted in more accurate estimates of all item parameters across different sample sizes and ability levels, especially at ability levels below zero. The item difficulty parameter appeared to be the most sensitive to fluctuations in sample size and test length, whereas the item guessing parameter appeared to be the least sensitive. On the other hand, different samples yielded comparable accuracy in estimating the three item parameters. Finally, the minimum requirements for accurate parameter estimation appeared to be a sample size of 500 and a test length of 30 items. However, sample sizes as small as 200 can still yield acceptable estimates when combined with tests longer than 15 items.
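
For reference, the 3PL model gives P(θ) = c + (1 - c) / (1 + exp(-a(θ - b))), where a is discrimination, b is difficulty and c is the guessing parameter. The item's Fisher information is a² · (Q/P) · ((P - c)/(1 - c))², with Q = 1 - P. The sketch below omits the optional scaling constant D = 1.7, and the abstract does not say whether the study used it. The function names and values are illustrative.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response (no D = 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: a^2 * (Q / P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2
```

At θ = b the success probability is c + (1 - c)/2. For example, `p_3pl(0.5, 1.2, 0.5, 0.2)` is 0.6. With c = 0 the information reduces to the 2PL value a²PQ, so `info_3pl(0.0, 1.5, 0.0, 0.0)` is 0.5625. Raising c to 0.2 at the same point lowers the information to 0.375. Because guessing removes information mostly at and below an item's difficulty, this offers one plausible reading, not a claim made by the study, of why low-ability regions benefit most from longer tests.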

This study reviews and evaluates the PROC IRT procedure in SAS/STAT 14.1, which provides item response theory analysis of unidimensional and multidimensional data that are dichotomous, polytomous, or a mixture of item types. A brief summary of its features, documentation, and availability is provided. A simulation study was used to evaluate the efficiency and accuracy of the procedure.
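
A parameter-recovery simulation of the kind the abstract mentions has three steps:

1. Generate responses from known item parameters.
2. Estimate the parameters with the software under review.
3. Compare the estimates with the true values.

The abstract does not give the study's conditions or accuracy criteria. The sketch below covers only steps 1 and 3, using a 2PL model with standard normal abilities and bias/RMSE as the summary statistics. It does not call SAS, and the function names are hypothetical.

```python
import numpy as np

def simulate_2pl(n_persons, a, b, rng):
    """Draw a persons x items 0/1 matrix from a 2PL model with N(0, 1) abilities.

    Returns (responses, theta).
    """
    theta = rng.standard_normal(n_persons)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    responses = (rng.random(p.shape) < p).astype(int)
    return responses, theta

def bias_rmse(estimates, truth):
    """Bias and root-mean-square error of parameter estimates."""
    err = np.asarray(estimates, dtype=float) - np.asarray(truth, dtype=float)
    return float(err.mean()), float(np.sqrt((err ** 2).mean()))
```

In a full study, the simulated matrix would be written out, estimated in PROC IRT or other IRT software, and the returned item parameters passed to `bias_rmse` alongside the generating values. The whole process is repeated over replications and conditions.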

The purpose of the study was to revise the Medication Adherence Self-Efficacy Scale (MASES) and examine its validity in an independent sample of 168 hypertensive African Americans (mean age 54 years, SD = 12.36; 86% female; 76% with a high school education or greater). Participants provided demographic information and completed the MASES and self-report and electronic measures of medication adherence at baseline and at three months. Confirmatory (CFA) and exploratory (EFA) factor analyses and classical test theory (CTT) analyses suggested that the MASES is unidimensional and internally reliable. Item response theory (IRT) analyses led to a revised 13-item version of the scale, the MASES-R. The EFA, CTT, and IRT results provide a foundation of support for the reliability and validity of the MASES-R for African Americans with hypertension. Research examining the psychometric properties of the MASES-R in other ethnic groups will improve the generalizability of the findings and the utility of the scale across groups. The MASES-R is brief, quick to administer, and can capture useful data on adherence self-efficacy.

This study examines the effect that individuals’ perceptions of police have on their adoption of crime prevention measures. Unlike past research that conceptualized police perceptions as inversely associated with crime prevention, we introduce a framework that distinguishes between the traditional policing and community policing/procedural justice models. We analyze multilevel data from Canada’s General Social Survey on 13 crime prevention measures (e.g., locking doors, installing burglar alarms) and estimate Item Response Theory models to account for the differing levels of difficulty in implementing these measures. Results show that the effect of police perceptions on the adoption of crime prevention measures varies by policing model. Residents who view the police favorably in the performance of traditional policing duties are less inclined to take measures against crime. In contrast, those who perceive the police favorably as engaging in community policing/procedural justice are more inclined to take such measures.
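
Treating the crime prevention measures as IRT items lets a respondent's overall propensity to take precautions account for the fact that some measures, such as installing an alarm, are harder to adopt than others. The paper's models are multilevel, and their exact specification is not given in the abstract. The sketch below is a single-level simplification:

- It uses a Rasch model with item difficulties treated as known. The difficulties are hypothetical.
- It estimates one respondent's propensity by expected a posteriori (EAP) estimation with a standard normal prior on an equally spaced grid.

```python
import numpy as np

def eap_theta(responses, b, n_nodes=61):
    """EAP propensity estimate for one respondent under the Rasch model.

    responses: 0/1 adoption indicators; b: item difficulties.
    The standard normal prior is evaluated on an equally spaced grid.
    """
    nodes = np.linspace(-4.0, 4.0, n_nodes)
    prior = np.exp(-0.5 * nodes ** 2)  # unnormalized N(0, 1) density
    p = 1.0 / (1.0 + np.exp(-(nodes[:, None] - b[None, :])))
    x = np.asarray(responses)[None, :]
    likelihood = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    posterior = prior * likelihood
    return float(np.sum(nodes * posterior) / np.sum(posterior))

# Thirteen measures with hypothetical difficulties, easiest first.
b = np.linspace(-2.0, 2.0, 13)
```

Under the Rasch model the estimate depends only on how many measures a respondent adopts, because the raw score is a sufficient statistic. Models with item-specific discriminations would instead weight harder or more discriminating measures differently.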

Although the Moral Growth Mindset (MGM) Measure has been tested and validated in general, whether it measures MGM consistently across people with different political perspectives, which are associated with moral foundations, has not been tested. We examined measurement invariance (MI) and differential item functioning (DIF) across political affiliations to test whether the MGM Measure functioned consistently. We also examined the relationships among MGM, moral foundations, and political affiliation with t-tests and regression analyses. The findings showed, first, that the strictest level of MI was achieved at the test level, so the measurement structure was consistent across the political groups. Second, no item showed significant DIF, so the MGM Measure was not biased at the item level. Third, the t-tests and regression analyses showed that MGM and its relationship with moral foundations were not significantly associated with political affiliation.
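
The abstract does not name the DIF procedure used. A classic item-level check is the Mantel-Haenszel (MH) statistic, which works as follows:

- Respondents are matched on total score.
- Within each score stratum, a 2x2 table compares the reference and focal groups on the item.
- The tables are pooled into a common odds ratio, often reported on the ETS delta scale as -2.35 ln(alpha).

The sketch below substitutes this technique for illustration and assumes dichotomously scored items. The abstract does not describe the MGM Measure's response format, and the counts are hypothetical.

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta-scale DIF index.

    strata: list of (A, B, C, D) counts per matched total-score level, where
    A/B = reference group correct/incorrect and C/D = focal group
    correct/incorrect. Raises ZeroDivisionError if no stratum has B*C > 0.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio near 1, and therefore a delta value near 0, indicates no DIF. Negative delta values indicate that the item favors the reference group after matching on total score. A significance test, such as the MH chi-square, would normally accompany the effect size.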

The reported number of individuals affected by Autism Spectrum Disorders has increased rapidly in recent years, and the overwhelming majority of those diagnosed are male. The literature offers many speculations about the increase in diagnosis rates and the gender disparity; however, there is currently no consensus. This study proposes that some of this variance is attributable to the lack of available research, poor sampling, and bias in assessment tools. The Ritvo Autism and Aspergers Diagnostic Scale-Revised (RAADS-R) and the Gilliam Autism Rating Scale (GARS-3) were selected as the two standardized assessment measures. An item analysis will be conducted on the two assessments to identify potential biases in the item wording. Individuals who have been diagnosed with ASD will complete the assessments, and their responses will be compared with the results of the item analysis to determine whether there is a positive correlation between their responses and the assessment items that were rated as biased towards a specific gender. The assessments will also be scored to determine whether overall assessment scores vary between genders. Finally, the mean age at diagnosis will be compared between genders.