The Effects of Age-based and Grade-based Sampling on the Relative Standing of Countries in International Comparative Studies of Student Achievement
Related papers
Item-by-country interactions in PISA 2003: Country specific profiles of science achievement
The cognitive items covering the domain of scientific literacy in the Programme for International Student Assessment (PISA) are explored through a cluster analysis of the item p-value residuals. Such residuals are often referred to as item-by-country interactions. The analysis clearly indicates distinct clusters of countries with similar profiles. The most stable country clusters have been labelled 'English speaking countries', 'East-Asian countries', 'German speaking countries' and 'South American countries'. The profiles for the Nordic countries are inspected in more detail, and these countries are shown to belong to a larger group labelled 'North-West European countries'. Some detailed features of the profiles are described using item characteristics such as the categories used in the operational definition of scientific literacy given in the framework. In projects like TIMSS and...
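To make the method concrete, the sketch below double-centres a country-by-item matrix of p-values to obtain item-by-country interaction residuals, then clusters countries by the similarity of their residual profiles. This is a minimal sketch on synthetic data: the matrix dimensions, distance metric, linkage method, and cluster count are illustrative choices, not the paper's actual procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Illustrative stand-in for a (countries x items) matrix of p-values.
rng = np.random.default_rng(0)
p = rng.uniform(0.2, 0.9, size=(30, 35))

# Double-centre: the residual for country c and item i is
# p[c, i] - country mean - item mean + grand mean,
# i.e. the item-by-country interaction.
residuals = (p
             - p.mean(axis=1, keepdims=True)
             - p.mean(axis=0, keepdims=True)
             + p.mean())

# Cluster countries by the similarity of their residual profiles.
dist = pdist(residuals, metric='euclidean')
tree = linkage(dist, method='average')
clusters = fcluster(tree, t=4, criterion='maxclust')
print(clusters)  # cluster label per country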
This chapter argues that PISA is more than a driver for policy decisions in many countries. The study also provides unique data with the potential to engage educational researchers across the world in a range of secondary analyses. The first section of the chapter describes how the primary purpose of such studies has gradually evolved, reflecting how the studies have typically related to educational research. This section serves as the general background for the second and major section, which presents a rationale for why educational researchers could or should be motivated to engage in analytical work relating to these studies. This is followed by a provisional framework for how educational researchers may approach and make use of the data from these studies in secondary analyses. The framework is based on six generic analytical approaches derived from the study of a large number of published secondary analyses.
Irish Educational Studies, 2001
Data from the Third International Mathematics and Science Study (TIMSS) were examined to determine the extent to which the rank ordering of countries based on pupil test performance was consistent across three different item formats: multiple-choice, short-answer, and extended-response. Findings from the analysis are used to make the case that international comparative studies are very complex and that the data they generate cannot be taken at face value but need close examination before firm conclusions can be drawn about a country's relative performance. The focus was the science performance of Irish second-year secondary school students (Grade 8) in TIMSS across different item types, compared with the performance of similar cohorts in 11 other countries. Irish student performance was close to the international averages for short-answer and multiple-choice items, but performance on extended-response items was significantly above the international average. An examination showed that the match between these test items and the Irish curriculum was poor, and the Irish curriculum was judged to encourage higher-order thinking less than curricula in other countries. Both of these factors made the good performance on extended-response items surprising. In many respects, these findings confirm the suspicion of W. Cooley and G. Leinhart (1980) that frequent exposure to a test format makes a difference in performance. In Ireland there is a tradition of more open-ended, essay-type tests, and this may account for students' success with extended-response items. These findings also demonstrate the difficulties involved in making international comparisons of academic performance. An appendix contains a table of science averages for TIMSS participants.
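The format-consistency question at the heart of this study can be illustrated with a short sketch: compute each country's mean score separately for each item format and check how well the resulting rank orders agree. The numbers below are randomly generated stand-ins, not TIMSS results, and Spearman's rho is one reasonable agreement measure among several.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

# Illustrative country mean scores by item format (not TIMSS values).
rng = np.random.default_rng(0)
n_countries = 12
formats = {
    'multiple_choice': rng.normal(500, 40, n_countries),
    'short_answer': rng.normal(500, 40, n_countries),
    'extended_response': rng.normal(500, 40, n_countries),
}

# Rank-order agreement between each pair of item formats: a low rho
# means the country "league table" depends on the format used.
for f1, f2 in combinations(formats, 2):
    rho, _ = spearmanr(formats[f1], formats[f2])
    print(f'{f1} vs {f2}: rho = {rho:.2f}')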
Large-scale Assessments in Education, 2013
The PISA 2009 results for Ireland indicated a large decline in reading literacy scores since PISA 2000 (the largest of 38 countries). The decline in mathematics scores since PISA 2003 was the second largest of 39 countries. In contrast, there was no change in science achievement since PISA 2006. These results prompted detailed investigations into possible reasons for the declines, particularly in reading. This paper considers the changes in achievement observed for Ireland in PISA 2009 under two themes: (i) the implementation of PISA in Ireland and changes in the cohort of students participating in PISA, and (ii) response patterns on the PISA test (as measures of student engagement). It is argued that the case of Ireland represents a 'perfect storm', since a range of factors appear to have been in operation to produce the results. The discussion attempts to show how the case of Ireland can be relevant to other countries that may have experienced changes in PISA test scores over time. Some of the findings have relevance to international practice in large-scale surveys of educational achievement more generally.
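One response-pattern indicator of the kind alluded to above, the rate of missing item responses, is straightforward to compute from scored response data. The sketch below assumes a flat table with NaN for omitted items; the layout and column names are hypothetical, not the actual PISA codebook.

```python
import numpy as np
import pandas as pd

# Hypothetical scored responses: 1 = correct, 0 = incorrect, NaN = omitted.
df = pd.DataFrame({
    'country': ['IRL', 'IRL', 'IRL', 'FIN', 'FIN', 'FIN'],
    'item1': [1, np.nan, 0, 1, 1, 0],
    'item2': [0, np.nan, np.nan, 1, 0, 1],
    'item3': [1, np.nan, 1, 1, 1, np.nan],
})
item_cols = ['item1', 'item2', 'item3']

# Share of unanswered items per student, averaged within country:
# a crude engagement indicator for comparing cohorts or cycles.
df['missing_rate'] = df[item_cols].isna().mean(axis=1)
print(df.groupby('country')['missing_rate'].mean())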
International surveys of educational achievement: how robust are the findings?
Journal of the Royal Statistical Society: Series A (Statistics in Society), 2007
International surveys of educational achievement and functional literacy are increasingly common. We consider two aspects of the robustness of their results. First, we compare results from four surveys: the Trends in International Mathematics and Science Study, the Programme for International Student Assessment, the Progress in International Reading Literacy Study and the International Adult Literacy Survey. This contrasts with the standard approach, which is to analyse just one survey in isolation. Second, we investigate whether results are sensitive to the choice of item response model that is used by survey organizers to aggregate respondents' answers into a single score. In both cases we focus on countries' average scores, on within-country differences in scores, and on the association between the two.
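The model-sensitivity question examined in this paper can be illustrated with a small simulation: generate responses under a two-parameter logistic (2PL) model, then score them with an unweighted sum (the statistic consistent with a Rasch model) and with a discrimination-weighted sum (consistent with the 2PL), and compare the resulting country orderings. All parameters below are illustrative; this is a sketch of the general idea, not the survey organizers' actual scaling procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_per, n_items = 10, 200, 40
a = rng.lognormal(0.0, 0.3, n_items)        # item discriminations
b = rng.normal(0.0, 1.0, n_items)           # item difficulties
country_means = rng.normal(0.0, 0.5, n_countries)

mean_rasch, mean_2pl = [], []
for c in range(n_countries):
    # Simulate each student's responses under the 2PL model.
    theta = rng.normal(country_means[c], 1.0, (n_per, 1))
    prob = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    resp = (rng.uniform(size=prob.shape) < prob).astype(float)
    mean_rasch.append(resp.sum(axis=1).mean())       # unweighted sum score
    mean_2pl.append((resp * a).sum(axis=1).mean())   # weighted by a_i

# If the orderings differ, country rankings depend on the model choice.
print(np.argsort(mean_rasch)[::-1])  # ranking under Rasch-style scoring
print(np.argsort(mean_2pl)[::-1])    # ranking under 2PL-style scoring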
When the International Association for the Evaluation of Educational Achievement (IEA) was established in 1959, the basic idea was to use comparative analysis of country differences in achievement to take advantage of the world as an educational laboratory. A large number of comparative studies involving substantial numbers of countries have since been conducted, and much has indeed been learned. However, it has also been learned that causal inference from cross-sectional comparative data is a weak method for gaining knowledge about which factors are conducive to educational achievement, because of the impossibility of controlling for the large number of differences between countries. During the 1990s a new generation of IEA studies was launched. These studies (e.g. TIMSS) were designed to give information about within-country trends in achievement in addition to information about between-country differences. The paper proposes that analysis of within-country differences over time is a powerful method of finding out which educational factors are related to achievement, particularly when the analysis involves several countries. This suggestion is illustrated through analyses of data from the TIMSS study of mathematics achievement in 1995 and 2003 for grades 4 and 8, investigating effects of age and class size on achievement.
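The within-country trend logic proposed here amounts to a first-difference analysis: regress the change in each country's mean achievement between cycles on the change in an input such as class size, so that all stable country-level confounders drop out of the comparison. The sketch below uses fabricated numbers purely to show the computation; the effect size built into the data is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_countries = 25

# Change from one cycle to the next, per country (fabricated values):
d_class_size = rng.normal(0.0, 2.0, n_countries)                 # pupils/class
d_score = -1.5 * d_class_size + rng.normal(0.0, 10.0, n_countries)  # score points

# OLS slope of score change on class-size change; unlike a cross-sectional
# comparison, time-invariant country differences cancel out.
slope, intercept = np.polyfit(d_class_size, d_score, 1)
print(f'estimated effect of one extra pupil per class: {slope:.2f} score points')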
This chapter raises many important questions about the PISA project, focused on two critical arguments with implications for education policymaking. The first argument relates to the PISA project itself: basic structural problems are inherent in the PISA undertaking and hence cannot be fixed. I will argue that it is impossible to construct a test that can be used across countries and cultures to assess the quality of learning in real-life situations with authentic texts. Problems arise when the intentions of the PISA framework are translated into concrete test items to be used in a great variety of languages, cultures, and countries. The requirement of “fair testing” implies by necessity that local, current, and topical issues must be excluded if items are to transfer objectively across cultures, languages, and customs. This runs against most current thinking in science education, where “science in context” and “localized curricula” are ideals promoted by UNESCO and many educators, as well as in national curricula. My second argument relates to some of the rather intriguing results that emerge from analyses of PISA data. It seems that pupils in high-scoring countries also develop the most negative attitudes toward the subjects on which they are tested. It also seems that PISA scores are unrelated to educational resources, funding, class size, and similar factors. PISA scores also seem to be negatively related to the use of active teaching methods, inquiry-based instruction, and computer technology. PISA scores seem to function like a kind of IQ test on school systems: a most complex issue is reduced to simple numbers that can be ranked with high accuracy. But, as with IQ scores, there are serious concerns about the validity of PISA scores. Whether or not one believes in the goals and results of PISA, such issues need to be discussed.