Investigating neural architectures for short answer scoring
Related papers
A Neural Approach to Automated Essay Scoring
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016
Traditional automated essay scoring systems rely on carefully designed features to evaluate and score essays. The performance of such systems is tightly bound to the quality of the underlying features. However, it is laborious to manually design the most informative features for such a system. In this paper, we develop an approach based on recurrent neural networks to learn the relation between an essay and its assigned score, without any feature engineering. We explore several neural network models for the task of automated essay scoring and perform analyses to gain insights into the models. The results show that our best system, which is based on long short-term memory networks, outperforms a strong baseline by 5.6% in terms of quadratic weighted Kappa, without requiring any feature engineering.
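Quadratic weighted Kappa (QWK), the agreement metric used throughout these papers, can be computed directly from two raters' label sequences. The sketch below is a minimal illustration (not any paper's code), assuming integer scores in the range 0..num_labels-1:

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, num_labels):
    """QWK: chance-corrected agreement between two raters, where
    larger score disagreements are penalized quadratically."""
    # Observed confusion matrix between the two raters
    O = np.zeros((num_labels, num_labels))
    for a, b in zip(rater_a, rater_b):
        O[a, b] += 1
    # Expected matrix under independence, from the marginal distributions
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    # Quadratic weights: cost grows with squared score distance
    idx = np.arange(num_labels)
    W = (idx[:, None] - idx[None, :]) ** 2 / (num_labels - 1) ** 2
    return 1.0 - (W * O).sum() / (W * E).sum()

print(round(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4), 2))  # → 1.0
```

Perfect agreement yields 1.0, chance-level agreement yields 0, and systematic disagreement can go negative.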
Automatic Short Answer Scoring based on Paragraph Embeddings
International Journal of Advanced Computer Science and Applications, 2018
Automatic scoring systems for students' short answers can relieve instructors of the burden of grading large numbers of test questions and facilitate more frequent assessments during lectures, especially when the number of students is large. This paper presents a supervised learning approach for short answer automatic scoring based on paragraph embeddings. We review significant deep learning based models for generating paragraph embeddings and present a detailed empirical study of how the choice of paragraph embedding model influences accuracy in the task of automatic scoring.
Neural Automated Essay Scoring Incorporating Handcrafted Features
2021
Automated essay scoring (AES) is the task of automatically assigning scores to essays as an alternative to grading by human raters. Conventional AES typically relies on handcrafted features, whereas recent studies have proposed AES models based on deep neural networks (DNNs) to obviate the need for feature engineering. Furthermore, hybrid methods that integrate handcrafted features in a DNN-AES model have been recently developed and have achieved state-of-the-art accuracy. One of the most popular hybrid methods is formulated as a DNN-AES model with an additional recurrent neural network (RNN) that processes a sequence of handcrafted sentence-level features. However, this method has the following problems: 1) It cannot incorporate effective essay-level features developed in previous AES research. 2) It greatly increases the numbers of model parameters and tuning parameters, increasing the difficulty of model training. 3) Its additional RNN for processing sentence-level features makes extension to various DNN-AES models complex. To resolve these problems, we propose a new hybrid method that integrates handcrafted essay-level features into a DNN-AES model. Specifically, our method concatenates handcrafted essay-level features to a distributed essay representation vector, which is obtained from an intermediate layer of a DNN-AES model. Our method is a simple DNN-AES extension, but significantly improves scoring accuracy.
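The concatenation step the abstract describes is mechanically simple. The sketch below illustrates it with random vectors; the dimensions (300-d essay representation, 10 handcrafted features) and the linear scoring head are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a 300-d distributed essay representation taken
# from an intermediate layer of a DNN-AES model, plus 10 handcrafted
# essay-level features (e.g. length, error counts).
essay_vec = rng.normal(size=300)
handcrafted = rng.normal(size=10)

# The hybrid method simply concatenates the two before the final
# scoring layer.
combined = np.concatenate([essay_vec, handcrafted])

# A linear scoring head mapping the combined vector to a scalar score.
w = rng.normal(size=combined.shape[0])
score = float(np.dot(w, combined))
print(combined.shape)  # → (310,)
```

Because only the input dimension of the scoring layer changes, this extension adds almost no parameters compared with the RNN-based hybrid it replaces.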
Coherence based automatic short answer scoring using sentence Embedding
Research Square, 2023
Automatic essay scoring is an essential educational application in natural language processing (NLP). This automated process alleviates the grading burden and increases the reliability and consistency of assessment. With advances in text embedding libraries and neural network models, AES systems have achieved good results in terms of accuracy. However, key goals remain unmet: embedding essays into vectors that capture cohesion and coherence, and providing feedback to students, are still challenging. In this paper, we proposed coherence-based embedding of an essay into vectors using sentence-BERT (Bidirectional Encoder Representations from Transformers). We trained these vectors on Long Short-Term Memory (LSTM) and Bi-LSTM (Bidirectional Long Short-Term Memory) networks to capture each sentence's connectivity with the semantics of the other sentences. We used two different datasets; one is the standard ASAP Kaggle dataset, and the other is a domain-specific dataset with almost 2500 responses from 650 students. Our model performed well on both datasets, with an average QWK (Quadratic Weighted Kappa) score of 0.76. Furthermore, we achieved good results compared to other prescribed models, and we also tested our model on adversarial responses from both datasets and observed decent outcomes.
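One simple proxy for the local coherence these sentence-embedding approaches try to capture is the similarity between consecutive sentence vectors. The sketch below is an illustration under that assumption (it is not the paper's model, which learns coherence with an LSTM over sentence-BERT vectors):

```python
import numpy as np

def adjacent_coherence(sentence_vecs):
    """Mean cosine similarity between consecutive sentence embeddings --
    a crude proxy for local coherence. Coherent essays, whose adjacent
    sentences stay on topic, tend to score higher than shuffled or
    adversarially reordered ones."""
    sims = []
    for u, v in zip(sentence_vecs[:-1], sentence_vecs[1:]):
        sims.append(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return float(np.mean(sims))

# Toy 2-d "embeddings": near-identical adjacent vectors (coherent)
# versus orthogonal ones (incoherent).
print(adjacent_coherence([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]))  # → 1.0
```

In practice the vectors would come from a sentence encoder such as sentence-BERT, and the similarity sequence would feed a downstream scorer rather than being averaged directly.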
Neural Automated Essay Scoring and Coherence Modeling for Adversarially Crafted Input
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018
We demonstrate that current state-of-the-art approaches to Automated Essay Scoring (AES) are not well-suited to capturing adversarially crafted input of grammatical but incoherent sequences of sentences. We develop a neural model of local coherence that can effectively learn connectedness features between sentences, and propose a framework for integrating and jointly training the local coherence model with a state-of-the-art AES model. We evaluate our approach against a number of baselines and experimentally demonstrate its effectiveness on both the AES task and the task of flagging adversarial input, further contributing to the development of an approach that strengthens the validity of neural essay scoring models.
Feature Enhanced Capsule Networks for Robust Automatic Essay Scoring
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track
Automatic Essay Scoring (AES) Engines have gained popularity amongst a multitude of institutions for scoring test-takers' responses and have therefore witnessed rising demand in recent times. However, several studies have demonstrated that adversarial attacks severely hamper existing state-of-the-art AES Engines' performance. As a result, we propose a robust architecture for AES systems that leverages Capsule Neural Networks, contextual BERT-based text representation, and key textually extracted features. This end-to-end pipeline captures semantics, coherence, and organizational structure along with fundamental rule-based features such as grammatical and spelling errors. The proposed method is validated by extensive experimentation and comparison with the state-of-the-art baseline models. Our results demonstrate that this approach performs significantly better on 6 out of 8 prompts on the Automated Student Assessment Prize (ASAP) dataset. In addition, it shows an overall best performance with a Quadratic Weighted Kappa (QWK) metric of 81%. Moreover, we empirically demonstrate that it is successful in identifying adversarial responses and scoring them lower.
Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems
5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), 2022
Automatic scoring engines have been used for scoring approximately fifteen million test takers in just the last three years. This number is increasing further due to COVID-19 and the associated automation of education and testing. Despite such wide usage, the literature on testing these 'intelligent' AI-based models is highly lacking. Most of the papers proposing new models rely only on quadratic weighted kappa (QWK) based agreement with human raters to show model efficacy. However, this effectively ignores the highly multi-feature nature of essay scoring. Essay scoring depends on features like coherence, grammar, relevance, sufficiency, vocabulary, etc., and to date, there has been no study testing Automated Essay Scoring (AES) systems holistically on all these features. With this motivation, we propose a model-agnostic adversarial evaluation scheme and associated metrics for AES systems to test their natural language understanding capabilities and overall robustness. We evaluate the current state-of-the-art AES models using the proposed scheme and report the results on five recent models. These models range from feature-engineering based approaches to the latest deep learning algorithms. We find that AES models are highly overstable: even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models. On the other hand, unrelated content, on average, increases the scores, thus showing that the models' evaluation strategy and rubrics should be reconsidered. We also ask 200 human raters to score both an original and adversarial response to see if humans are able to detect differences between the two and whether they agree with the scores assigned by autoscorers.
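The overstability probe the abstract describes can be sketched generically: append unrelated content amounting to some fraction of the response and check whether the score changes. The code below is a minimal, model-agnostic illustration (the function name and the toy length-based scorer are assumptions for demonstration, not the toolkit's API):

```python
def overstability_gap(score_fn, response, unrelated_text, fraction=0.25):
    """Append unrelated content amounting to `fraction` of the response
    length and return the score change. A robust scorer should not
    increase (and ideally should decrease) its score; a positive gap
    flags overstability."""
    n_extra = max(1, int(len(response.split()) * fraction))
    perturbed = response + " " + " ".join(unrelated_text.split()[:n_extra])
    return score_fn(perturbed) - score_fn(response)

# A deliberately naive length-based scorer, to show the flaw the
# toolkit probes for: adding off-topic filler raises the score.
toy_scorer = lambda text: min(10, len(text.split()) // 5)

response = " ".join(["word"] * 20)
filler = " ".join(["filler"] * 10)
print(overstability_gap(toy_scorer, response, filler, fraction=0.25))  # → 1
```

A real evaluation would run this probe across many responses and perturbation types (off-topic sentences, shuffled sentences, repeated content) and aggregate the gaps per model.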
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications, 2018
In this paper we present a qualitatively enhanced deep convolution recurrent neural network for computing the quality of a text in an automatic essay scoring task. The novelty of the work lies in the fact that instead of considering only the word and sentence representations of a text, we augment them with the complex linguistic, cognitive, and psychological features associated with a text document, within a hierarchical convolution recurrent neural network framework. Our preliminary investigation shows that incorporating such qualitative feature vectors along with standard word/sentence embeddings can give us a better understanding of how to improve the overall evaluation of the input essays.
The effects of data size on Automated Essay Scoring engines
ArXiv, 2021
We study the effects of data size and quality on the performance of Automated Essay Scoring (AES) engines that are designed in accordance with three different paradigms: a frequency and hand-crafted feature-based model, a recurrent neural network model, and a pretrained transformer-based language model that is fine-tuned for classification. We expect that each type of model benefits from the size and the quality of the training data in very different ways. Standard practices for developing training data for AES engines were established with feature-based methods in mind; however, since neural networks are increasingly being considered in a production setting, this work seeks to inform us as to how to establish better training data for neural networks that will be used in production.
Empowering Short Answer Grading: Integrating Transformer-Based Embeddings and BI-LSTM Network
Big Data and Cognitive Computing
Automated scoring systems have been revolutionized by natural language processing, enabling the evaluation of students’ diverse answers across various academic disciplines. However, this presents a challenge as students’ responses may vary significantly in terms of length, structure, and content. To tackle this challenge, this research introduces a novel automated model for short answer grading. The proposed model uses pretrained “transformer” models, specifically T5, in conjunction with a BI-LSTM architecture which is effective in processing sequential data by considering the past and future context. This research evaluated several preprocessing techniques and different hyperparameters to identify the most efficient architecture. Experiments were conducted using a standard benchmark dataset named the North Texas Dataset. This research achieved a state-of-the-art correlation value of 92.5 percent. The proposed model’s accuracy has significant implications for education as it has the...