Automatic Scoring of Paper-and-Pencil Figural Responses

A Comparison of Multiple-Choice and Constructed Figural Response Items

Journal of Educational Measurement, 1991

In contrast to multiple-choice test questions, figural response items call for constructed responses and rely upon figural material, such as illustrations and graphs, as the response medium. Figural response questions in various science domains were created and administered to a sample of 4th-, 8th-, and 12th-grade students. Item and test statistics from parallel sets of figural response and multiple-choice questions were compared. Figural response items were generally more difficult, especially for questions that were difficult (p < .5) in their constructed-response forms. Figural response questions were also slightly more discriminating and reliable than their multiple-choice counterparts, but they had higher omit rates. This article addresses the relevance of guessing to figural response items and the diagnostic value of the item type. Plans for future research on figural response items are discussed.
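The difficulty and discrimination statistics compared in this study are standard classical test theory quantities. The abstract does not give the exact formulas used, so the following is only an illustrative sketch: difficulty as the proportion correct (p, with p < .5 marking a hard item) and discrimination as the point-biserial correlation between a 0/1 item score and the total test score.

```python
import math

def item_difficulty(item_scores):
    """Classical difficulty: the proportion of examinees answering
    correctly. Lower p means a harder item (p < .5 flags difficult items)."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Point-biserial discrimination: the correlation between a 0/1
    item score and the examinee's total test score."""
    n = len(item_scores)
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    correct = [t for x, t in zip(item_scores, total_scores) if x == 1]
    p = len(correct) / n
    mean_correct = sum(correct) / len(correct)
    return (mean_correct - mean_t) / sd_t * math.sqrt(p / (1 - p))
```

For example, an item answered correctly by 3 of 5 examinees has p = .6; a positive point-biserial indicates that higher-scoring examinees tend to answer it correctly, which is the sense in which the figural response items were "slightly more discriminating."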

Evaluation of Procedure-Based Scoring for Hands-On Science Assessment

Journal of Educational Measurement, 1992


Format Effects of Empirically Derived Multiple-Choice Versus Free-Response Instruments When Assessing Graphing Abilities

2017

Prior graphing research has demonstrated that clinical interviews and free-response instruments produce very different results than multiple-choice instruments, indicating potential validity problems when using multiple-choice instruments to assess graphing skills (Berg & Smith in Science Education, 78(6), 527–554, 1994). Extending this inquiry, we studied whether empirically derived, participant-generated graphs used as choices on the multiple-choice graphing instrument produced results that corresponded to participants' responses on free-response instruments. The 5–8 choices on the multiple-choice instrument came from graphs drawn by 770 participants in prior research on graphing (Berg, 1989; Berg & Phillips in Journal of Research in Science Teaching, 31(4), 323–344, 1994; Berg & Smith in Science Education, 78(6), 527–554, 1994). Statistical analysis of the 736 7th–12th grade participants indicates that the empirically derived multiple-choice format still produced significantly more picture-of-the-event responses than did the free-response format for all three graphing questions. For two of the questions, participants who drew graphs on the free-response instruments produced significantly more correct responses than those who answered multiple-choice items. In addition, participants with "low classroom performance" were affected more significantly and negatively by the multiple-choice format than participants with "medium" or "high classroom performance." In some cases, prior research indicating the prevalence of "picture-of-the-event" and graphing treatment effects may reflect spurious results, a product of the multiple-choice item format rather than a valid measure of graphing abilities.
We also examined how including a picture of the scenario on the instrument versus only a written description affected responses and whether asking participants to add marker points to their constructed or chosen graph would overcome the short-circuited thinking that multiple-choice items seem to produce.

Exploring visual material in PISA and school-based examination tests

Skhôlé – Cahiers de la Recherche et du Développement

One of the keystones of science education is teaching the communication of science: written, oral, or visual. This study compares the visual material included in PISA science test items related to the field of biological systems with that in Biology test items set in the end-of-year school-based advancement and discharge examinations intended for 7th- and 9th-grade Greek students of the Greek 'Gymnasium'. More particularly, visual material is investigated along the following dimensions: (a) the frequency of its inclusion in evaluation test items; (b) its type (photographs, drawings, flowcharts, cutaway exhibitions, maps, graphs, tables).

The Effects of Images on Multiple-choice Questions in Computer-based Formative Assessment

Digital Education Review, 2015

Current learning and assessment are evolving into digital systems that can be used, stored, and processed online. In this paper, three different types of questionnaires for assessment are presented. All the questionnaires were filled out online in a web-based format. A study was carried out to determine whether the use of images related to each question in the questionnaires affected the selection of the correct answer. Three questionnaires were used: two questionnaires with images (images used during learning and images not used during learning) and another questionnaire with no images, text-only. Ninety-four children between seven and eight years old participated in the study. The comparison of the scores obtained on the pre-test and on the post-test indicates that the children increased their knowledge after the training, which demonstrates that the learning method is effective. When the post-test scores for the three types of questionnaires were compared, statistically significant differences were found in favour of the two questionnaires with images versus the text-only questionnaire. No statistically significant differences were found between the two types of questionnaires with images. Therefore, to a great extent, the use of images in the questionnaires helps students to select the correct answer. Since this encourages students, adding images to the questionnaires could be a good strategy for formative assessment.

Automated Scoring of Constructed-Response Science Items: Prospects and Obstacles

Educational Measurement: Issues and Practice, 2014

Content-based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept-based scoring tool for content-based scoring, c-rater™, for four science items with rubrics aiming to differentiate among multiple levels of understanding. The items showed moderate to good agreement with human scores. The findings suggest that automated scoring has the potential to score constructed-response items with complex scoring rubrics, but in its current design cannot replace human raters. This article discusses sources of disagreement and factors that could potentially improve the accuracy of concept-based automated scoring.
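Human–machine agreement on ordinal rubric scores, as reported in studies like this one, is commonly summarized with quadratically weighted kappa; the abstract does not name the statistic used, so the following is only an illustrative sketch of that common choice.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Quadratically weighted kappa between two score sequences,
    each score an integer rubric level in [0, num_levels - 1].
    1.0 = perfect agreement; 0.0 = chance-level agreement."""
    n = len(rater_a)
    # Observed joint score distribution
    observed = [[0.0] * num_levels for _ in range(num_levels)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    # Marginal distributions give the expected (chance) agreement
    pa = [rater_a.count(k) / n for k in range(num_levels)]
    pb = [rater_b.count(k) / n for k in range(num_levels)]
    # Quadratic weights penalize disagreements by squared distance,
    # so being off by two rubric levels costs more than being off by one
    denom = (num_levels - 1) ** 2
    obs_disagree = exp_disagree = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            w = (i - j) ** 2 / denom
            obs_disagree += w * observed[i][j]
            exp_disagree += w * pa[i] * pb[j]
    return 1 - obs_disagree / exp_disagree
```

With four responses scored on a 0–2 rubric where the machine differs from the human on one adjacent level, the statistic is 0.8, a value that would typically be described as good agreement.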

Performance Assessments in Science: Hands-On Tasks and Scoring Guides

1996

In 1992, RAND received a grant from the National Science Foundation to study the technical quality of performance assessments in science and to evaluate their feasibility for use in large-scale testing programs. The specific goals of the project were to assess the reliability and validity of hands-on science testing and to investigate the cost and practicality of these types of measures for large-scale assessment. RAND collaborated with researchers from the University of California, Santa Barbara; Stanford University; the Far West Laboratory; and the California State Department of Education to develop and administer several science exercises to students in elementary, middle, and high schools in 1993 and 1994. Findings regarding the development, quality, and feasibility of hands-on science assessments have been reported in a number of papers and journal articles (see References). The purpose of this monograph is to make the science tasks and scoring guides developed as part of the project available to other researchers and educational practitioners. This collection of measures is designed to provide researchers with a basic set of tasks they can build upon when studying student performance in science and investigating alternative approaches to science assessment. For this reason, information is reported about the conditions under which the tasks were administered and the reliability of the scoring guides (inter-reader correlations). The tasks should also be useful to practitioners in their discussions about measuring student performance in science, the types of activities that may be used in future state and national assessment systems, and the changes that need to take place in staff development. 
The document contains a complete description of each task used in the study, including the shell (or testing blueprint) from which the task was developed and copies or photos of the task booklet, the materials or apparatus that accompanied the task, the scoring guide, and the form used to record scores. This information should allow the interested reader to reproduce all the tasks used in the project. The task topics studied include incline, force, friction, pendulum, lever, classification of animals, classification of materials, acids and bases-vinegar, acids and bases-alien, radiation, rate of cooling, heat, temperature, erosion and pollution. Contains nine tables.
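The inter-reader correlations reported for the scoring guides are correlations between two readers' scores on the same set of responses; assuming a plain Pearson correlation (the monograph's exact computation may differ), a minimal sketch is:

```python
import math

def inter_reader_correlation(reader1, reader2):
    """Pearson correlation between two readers' scores on the same
    responses -- a simple reliability check for a scoring guide.
    Values near 1.0 mean the readers rank the responses alike."""
    n = len(reader1)
    m1 = sum(reader1) / n
    m2 = sum(reader2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(reader1, reader2))
    v1 = sum((a - m1) ** 2 for a in reader1)
    v2 = sum((b - m2) ** 2 for b in reader2)
    return cov / math.sqrt(v1 * v2)
```

Note that a high correlation shows the readers agree on ordering, not that they assign identical scores: a reader who is uniformly one point more lenient still correlates perfectly with a stricter colleague.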

A computer-aided environment for construction of multiple-choice tests

2005

Multiple-choice tests have proved to be an efficient tool for measuring students' achievement and are used on a daily basis both for assessment and diagnostics worldwide. Statistics suggest that the Question Mark Computing Ltd.'s testing software Perception alone has had more ...

Comparing international and national science assessment: what we learn about the use of visual representations

Educational Journal of the University of Patras UNESCO Chair, 2(1), 96-110.

Within the research community in science education, there has been a tendency to show limited interest in examining PISA results with reference to the national context of participating countries, although this approach can give valuable insight into a country's students' achievement. Since interpretations of PISA results could be based on a thorough analysis of the actual items used in international and national contexts, the main issue addressed in this study is to compare PISA test items with assessment tasks used in the Greek school context. 281 PISA science test items, as well as 947 assessment tasks included in science school textbooks and 4,248 science examination test items in Greece, were analysed with regard to the frequency of inclusion, the type, and the functional role of visual representations within these assessment tasks. The results demonstrate that while PISA test items use visual material to communicate scientific information in everyday-life contexts by means of specialised graphs and photographs of familiar entities, schooling does not familiarise Greek students with visual representations widely used in science and embedded in real-life situations.