Testing Parameter Invariance for Questionnaire Indices Using Confirmatory Factor Analysis and Item Response Theory

One of the most important goals of international studies in educational research is the comparison of learning outcomes across participating countries. To compare results, it is necessary to collect data using comparable measures. Studies such as TIMSS, CIVED, PIRLS or PISA invest considerable effort in developing tests that are appropriately translated into the test languages, culturally unbiased, and suitable for the diverse educational systems of participating countries. Typically, Item Response Theory (IRT) scaling methodology (see Hambleton, Swaminathan and Rogers, 1991) is used to review Differential Item Functioning (DIF) across countries and to detect country-specific item misfit (see examples in Adams, 2002; Schulz and Sibberns, 2004).

Likewise, it is of great importance to achieve similar levels of comparability for measures derived from contextual questionnaires. Data collected from contextual questionnaires are often used to explain variation in student performance. However, many constructs measured in the student questionnaire (for example, self-related cognitions regarding areas of learning, classroom climate, etc.) can often also be regarded as important learning outcomes in their own right. In the OECD PISA study, for example, contextual data are collected through student and school questionnaires. Questionnaire items are treated in three different ways (see OECD, 2005, pp. 271-319):

• They are reported as single items (for example, gender, grade).
• They are converted into "simple indices" through the arithmetical transformation or recoding of one or more items.
• They are scaled. Typically, Item Response Theory (IRT) is used as the scaling methodology in order to obtain individual student scores (Weighted Likelihood Estimates).

Language differences can have a powerful effect on equivalence (or non-equivalence). Typically, source versions (in English or French) are translated into the language used in each country. In most international studies, reviews of national adaptations and thorough translation verifications are implemented in order to ensure a maximum of "linguistic equivalence" (see Grisay, 2002; Chrostowski and Malak, 2004). However, it is well known that even slight deviations in wording (sometimes necessary due to linguistic differences between source and target language) may lead to differences in item responses (