Automatic Grading of Portuguese Short Answers Using a Machine Learning Approach
Related papers
Portuguese Automatic Short Answer Grading
Anais do XXIX Simpósio Brasileiro de Informática na Educação (SBIE 2018)
Automatic short answer grading is the field of study that addresses the assessment of students' answers to questions posed in natural language. Besides answer length, it differs from automatic essay grading by focusing on the evaluation of content rather than style. Grading the answers is generally framed as a typical supervised classification task. Many works have been developed recently, but most of them deal with data in the English language. In this paper, we present a new Portuguese dataset and system for automatic short answer grading. The data was collected with the participation of 13 teachers, 12 undergraduate students and 245 elementary school students. Results reached 69% accuracy in four-class classification and 85% in binary classification.
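As a rough illustration of how such grading can be framed as supervised classification over (reference answer, student answer) pairs, consider the minimal sketch below. The TF-IDF features, the toy Portuguese examples and the 0-3 label scale are assumptions made for illustration, not the paper's actual dataset or model.

```python
# A sketch of ASAG as supervised text classification (illustrative only; not the
# paper's features, data or model). Reference and student answers are joined into
# one text so a bag-of-words model can see their overlap.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelled pairs: (reference answer, student answer, score 0-3).
pairs = [
    ("a fotossintese produz oxigenio", "a planta produz oxigenio", 3),
    ("a fotossintese produz oxigenio", "a planta libera um gas", 2),
    ("a fotossintese produz oxigenio", "a planta respira", 1),
    ("a fotossintese produz oxigenio", "nao sei", 0),
]
texts = [ref + " [SEP] " + ans for ref, ans, _ in pairs]
labels = [score for _, _, score in pairs]

# Four-class model; for binary grading, collapse labels first, e.g. int(score >= 2).
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["a fotossintese produz oxigenio [SEP] produz oxigenio"]))
```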
Knowledge and Information Systems
Automatic short answer grading (ASAG), a hot topic in natural language understanding, is a research area within learning analytics. ASAG solutions are conceived to offload teachers and instructors, especially those in higher education, where classes with hundreds of students are the norm and the task of grading (short) answers to open-ended questionnaires becomes tougher. Their outcomes are valuable both for the grading itself and for providing students with "ad hoc" feedback. ASAG proposals have also enabled different intelligent tutoring systems. Over the years, a variety of ASAG solutions have been proposed, yet there is a series of gaps in the literature that we fill in this paper. The present work proposes GradeAid, a framework for ASAG. It is based on the joint analysis of lexical and semantic features of the students' answers through state-of-the-art regressors; differently from any previous work, (i) it copes with non-English datasets, (ii) it has undergone a robust va...
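In the spirit of combining lexical and semantic features for score regression, a minimal sketch follows; it is not GradeAid's actual pipeline, and the embedding model, the Ridge regressor and the toy data are assumptions for illustration only.

```python
# Sketch: one lexical feature (TF-IDF cosine overlap) and one semantic feature
# (sentence-embedding cosine similarity) per answer, fed to a regressor.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import cosine_similarity

def features(reference, answers, embedder, tfidf):
    lex = cosine_similarity(tfidf.transform(answers), tfidf.transform([reference]))
    sem = cosine_similarity(embedder.encode(answers), embedder.encode([reference]))
    return np.hstack([lex, sem])

reference = "Photosynthesis converts light energy into chemical energy."
answers = ["Plants turn sunlight into chemical energy.", "Plants drink water."]
scores = [1.0, 0.2]  # toy teacher scores

tfidf = TfidfVectorizer().fit([reference] + answers)
embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model choice
X = features(reference, answers, embedder, tfidf)
reg = Ridge().fit(X, scores)
print(reg.predict(X))
```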
Exploring Distinct Features for Automatic Short Answer Grading
Anais do XV Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2018)
Automatic short answer grading is the field of study that addresses the assessment of students' answers to questions posed in natural language. Grading the answers is generally framed as a typical supervised classification task. To stimulate research in the field, two datasets were publicly released in the SemEval 2013 competition task "Student Response Analysis". Since then, some works have been developed to improve the results. In this context, the goal of this work is to tackle the task by implementing lessons learned from the literature in an effective way and to report results for both datasets and all of their scenarios. The proposed method obtained better results in most scenarios of the competition task and, therefore, higher overall scores when compared to recent works.
A scoring rubric for automatic short answer grading system
TELKOMNIKA Telecommunication Computing Electronics and Control, 2019
During the past decades, research on automatic grading has become an interesting topic. These studies focus on how machines can help humans assess students' learning outcomes. Automatic grading enables teachers to assess students' answers more objectively, consistently, and quickly. Essay questions in particular come in two types: long essays and short answers. Most previous research developed automatic essay grading (AEG) rather than automatic short answer grading (ASAG). This study aims to assess the similarity of short answers to the questions and reference answers in Indonesian without any semantic language tool. The approach uses pre-processing steps consisting of case folding, tokenization, stemming, and stopword removal. The proposed approach is a scoring rubric obtained by measuring sentence similarity with string-based similarity methods and a keyword matching process. The dataset used in this study consists of 7 questions, 34 alternative reference answers and 224 student answers. The experimental results show that the proposed approach achieves a Pearson correlation between 0.65419 and 0.66383, with a mean absolute error (MAE) between 0.94994 and 1.24295. The proposed approach also raises the correlation and decreases the error for each method.
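The general recipe of rubric-style scoring from string similarity plus keyword matching, evaluated with Pearson correlation and MAE, can be sketched as follows; the specific similarity measure, keyword list, weighting and toy data here are assumptions, not the paper's rubric.

```python
# Sketch: rubric score = weighted mix of string similarity to the reference answer
# and keyword coverage, then compare predicted scores to teacher scores.
from difflib import SequenceMatcher
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error

def score(student, reference, keywords, max_mark=4):
    sim = SequenceMatcher(None, student.lower(), reference.lower()).ratio()
    hit = sum(k in student.lower() for k in keywords) / len(keywords)
    return max_mark * (0.5 * sim + 0.5 * hit)  # assumed equal weighting

reference = "stemming menghilangkan imbuhan kata"
keywords = ["stemming", "imbuhan"]  # assumed keyword list
students = ["stemming menghapus imbuhan pada kata", "stemming memotong imbuhan", "tokenisasi memecah kalimat"]
teacher = [3.5, 3.0, 0.5]  # toy teacher marks

predicted = [score(s, reference, keywords) for s in students]
print("Pearson:", pearsonr(teacher, predicted)[0])
print("MAE:", mean_absolute_error(teacher, predicted))
```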
Automated Short Answer Grading: A Simple Solution for a Difficult Task
2019
The task of short answer grading is aimed at assessing the outcome of an exam by automatically analysing students' answers in natural language and deciding whether they should pass or fail the exam. In this paper, we tackle this task by training an SVM classifier on real data taken from a University statistics exam, showing that simple concatenated sentence embeddings used as features yield results around 0.90 F1, and that adding more complex distance-based features leads only to a slight improvement. We also release the dataset, which to our knowledge is the first freely available dataset of this kind in Italian.
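A minimal sketch of the concatenated-embeddings-plus-SVM idea is given below; the embedding model, kernel and toy Italian examples are assumptions, not the paper's setup or data.

```python
# Sketch: embed question and answer, concatenate the vectors, classify pass/fail
# with an SVM, and report F1 (here on the training items, purely for illustration).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics import f1_score
from sklearn.svm import SVC

embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

questions = ["Cos'e' la media campionaria?", "Cos'e' la media campionaria?"]
answers = ["La somma dei valori divisa per il loro numero.", "Non lo so."]
labels = [1, 0]  # 1 = pass, 0 = fail

X = np.hstack([embedder.encode(questions), embedder.encode(answers)])
clf = SVC(kernel="rbf").fit(X, labels)
print("F1:", f1_score(labels, clf.predict(X)))
```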
Intelligent Short Answer Assessment using Machine Learning
International Journal of Engineering and Advanced Technology, 2020
Education is fundamental to human progress. A student is evaluated by the marks he or she scores. The evaluation of students' work is a central aspect of the teaching profession that can affect students in significant ways. Though teachers use multiple criteria for assessing student work, it is not known whether emotions are a factor in their grading decisions. There are also mistakes that occur on the department's side, such as totaling errors and marking mistakes. So, we are developing software to automate the evaluation of answers using Natural Language Processing and Machine Learning. There are two modules: the first uses Optical Character Recognition to extract handwritten text from the uploaded file, and the second evaluates the answer based on various factors and awards the mark. Every answer entered is evaluated based on word usage, word importance and the grammatical meaning of the sentence. With this approach we can save the cost ...
Indonesian automatic short answer grading system
Bulletin of Electrical Engineering and Informatics , 2022
Short answer questions are one of the methods used to evaluate students' cognitive abilities, including memorizing, designing, and freely expressing answers based on their own thoughts. Unfortunately, grading short answers is more complicated than grading multiple-choice answers. To address this problem, several studies have tried to build an artificial intelligence system called automatic short answer grading (ASAG). We tried to improve the accuracy of the ASAG system at scoring student answers in Indonesian by enhancing earlier state-of-the-art models and methods: the bidirectional encoder representations from transformers (BERT) model with a fine-tuning approach, and ridge regression models utilizing advanced feature extraction. We conducted this study through the stages of literature review, dataset preparation, model development, implementation, and comparison. Using two different ASAG datasets, the best result of this study was 0.9508 in Pearson's correlation and 0.4138 in root-mean-square error (RMSE), achieved by the BERT-based model with the fine-tuning approach. This result outperformed previous studies using the same evaluation metrics, showing that our ASAG system using a fine-tuned BERT model can improve the accuracy of grading short answers.
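Framing ASAG as regression with a fine-tuned BERT encoder can be sketched as below; the checkpoint name, toy sentences and training details are assumptions and do not reproduce the paper's fine-tuning setup.

```python
# Sketch: a BERT sequence-classification head with a single output acts as a
# score regressor; with num_labels=1 and a float label the model uses an MSE
# loss, so gradient steps on (reference, answer, score) triples fine-tune it.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "indobenchmark/indobert-base-p1"  # assumed Indonesian BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

reference = "Fotosintesis mengubah energi cahaya menjadi energi kimia."
answer = "Tumbuhan mengubah cahaya matahari menjadi energi kimia."
inputs = tokenizer(reference, answer, return_tensors="pt", truncation=True)

outputs = model(**inputs, labels=torch.tensor([0.9]))  # 0.9 = toy teacher score
print("loss:", outputs.loss.item(), "predicted score:", outputs.logits.item())
```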
Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating
Lecture Notes in Computer Science, 2020
This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG), as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed-response items. However, evaluating free-response text can be time- and labor-intensive for one rater, let alone multiple raters. In addition, some degree of inconsistency exists within and between raters for assessing a given task. Recent advances in Natural Language Processing have resulted in subsequent improvements for technologies that rely on artificial intelligence and human language. New, state-of-the-art models such as BERT, an open-source, pre-trained language model, have decreased the amount of training data needed for specific tasks and, in turn, have reduced the amount of human annotation necessary for producing a high-quality classification model. After training BERT on expert ratings of constructed responses, we use the subsequent automated grading to calculate Cohen's Kappa as a measure of inter-rater reliability between the automated system and the human rater. For practical application, when the inter-rater reliability metric is unsatisfactory, we suggest that the human rater(s) use the automated model to call attention to ratings where a second opinion might be needed to confirm the rater's correctness and consistency of judgement.
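The assist-the-rater workflow described above can be sketched in a few lines; the rating scale, kappa threshold and toy ratings are assumptions for illustration.

```python
# Sketch: compute Cohen's kappa between model and human ratings; if agreement
# is unsatisfactory, flag the disagreements for a second human opinion.
from sklearn.metrics import cohen_kappa_score

human = [2, 1, 0, 2, 1, 2]  # toy expert ratings
model = [2, 1, 1, 2, 0, 2]  # toy automated ratings

kappa = cohen_kappa_score(human, model)
print("Cohen's kappa:", round(kappa, 3))

if kappa < 0.7:  # assumed "unsatisfactory" threshold
    flagged = [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
    print("Responses needing a second opinion:", flagged)
```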
Automatic short answer grading and feedback using text mining methods
Procedia Computer Science, 2020
Automatic grading is not a new approach, but the need to adapt the latest technology to automatic grading has become very important. As technology has rapidly become more powerful at scoring exams and essays, especially from the 1990s onwards, partially or wholly automated grading systems using computational methods have evolved and become a major area of research. In particular, the demand for scoring natural language responses has created a need for tools that can grade these responses automatically. In this paper, we focus on the automatic grading of short answer questions such as those typical in the UK GCSE system, and on providing students with useful feedback on their answers. We present experimental results on a dataset from the introductory computer science class at the University of North Texas. We first apply standard data mining techniques to the corpus of student answers in order to measure similarity between the student answers and the model answer, based on the number of common words. We then evaluate the relation between these similarities and the marks awarded by scorers. We consider an approach that groups student answers into clusters; each cluster would be awarded the same mark, and the same feedback given to each answer in the cluster. In this manner, we demonstrate that clusters indicate groups of students who are awarded the same or similar scores. Words in each cluster are compared to show that clusters are constructed based on how many, and which, words of the model answer have been used. The main novelty of this paper is a model that predicts marks based on the similarities between the student answers and the model answer. We argue that computational methods should be used to enhance the reliability of human scoring, not replace it. Humans are required to calibrate the system and to deal with challenging situations. Computational methods can provide insight into which student answers will be found challenging and thus where human judgement is required.
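A minimal sketch of the clustering idea (group answers by which model-answer words they use, then assign one mark per cluster) follows; the cluster count, feature representation and toy data are assumptions, not the paper's configuration.

```python
# Sketch: represent each answer only by which model-answer words it contains,
# cluster the answers, and give every member of a cluster the cluster's mean mark.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

model_answer = "a stack is a last in first out data structure"
answers = ["a stack is last in first out",
           "a stack stores elements LIFO",
           "a queue is first in first out",
           "I do not know"]
marks = np.array([5, 5, 2, 0])  # toy teacher marks

vec = CountVectorizer(vocabulary=sorted(set(model_answer.split())), binary=True)
X = vec.transform(answers)

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for c in set(clusters):
    members = clusters == c
    print(f"cluster {c}: predicted mark {marks[members].mean():.1f} for", np.array(answers)[members])
```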
UKARA: A Fast and Simple Automatic Short Answer Scoring System for Bahasa Indonesia
ICEAP Proceeding Book Vol 2
This paper presents UKARA, a fast and simple automatic short-answer scoring system for Bahasa Indonesia. Automatic short-answer scoring plays an important role in speeding up the automatic assessment process. Although this area has been widely explored, only a very limited number of previous works have studied Bahasa Indonesia. One of the major challenges in this field is the different types of questions, which require different assessments. We address this problem by implementing a combination of Natural Language Processing (NLP) and supervised machine learning techniques. Our system works by training a classifier model on human-labeled data. Using three different types of Programme for International Student Assessment (PISA) student responses, our system produced F1-scores above 97% and 70% on dichotomous and polytomous scoring types respectively. Moreover, UKARA provides a user-friendly interface that is simple and easy to use. UKARA offers flexibility for the human grader to re-score answers and retrain the model until optimal performance is obtained.