Comparing human versus automatic feature extraction for fine-grained elementary readability assessment
Related papers
A comparison of features for automatic readability assessment
Proceedings of the 23rd International Conference on Computational Linguistics Posters, 2010
Several sets of explanatory variables, including shallow, language modeling, POS, syntactic, and discourse features, are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity-density (a discourse feature) and POS features, in particular nouns, are individually very useful but highly correlated. Average sentence length (a shallow feature) is more useful, and less expensive to compute, than individual syntactic features. A judicious combination of the features examined here results in a significant improvement over the state of the art.
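As a concrete illustration of a shallow feature such as average sentence length feeding a grade-level predictor, here is a minimal Python sketch; the texts, labels, and model choice are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch: shallow features for grade-level prediction.
# Texts and grade labels below are hypothetical, for illustration only.
from sklearn.linear_model import LogisticRegression

def shallow_features(text):
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [avg_sentence_len, avg_word_len]

texts = ["The cat sat. It was warm.",
         "Photosynthesis converts light energy into chemical energy."]
grades = [1, 5]  # hypothetical grade labels

clf = LogisticRegression().fit([shallow_features(t) for t in texts], grades)
print(clf.predict([shallow_features("The dog ran. He was fast.")]))
```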
Ranking-based readability assessment for early primary children’s literature
Determining the reading level of children's literature is an important task for providing educators and parents with an appropriate reading trajectory through a curriculum. Automating this process has been addressed before in the computational linguistics literature, with most studies attempting to predict the particular grade level of a text. However, guided reading levels developed by educators operate at a finer grain, with multiple levels corresponding to each grade. We find that ranking performs much better than classification at the fine-grained leveling task, and that features derived from the visual layout of a book are just as predictive of level as standard text features; combining both feature sets, we can predict the reading level correctly up to 83% of the time on a small corpus of children's books.
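A standard way to recast leveling as ranking is the pairwise reduction: train a binary classifier on feature-vector differences of book pairs ordered by level. A minimal sketch under that assumption (the feature values and the SVM choice are illustrative, not the paper's exact ranker):

```python
# Pairwise-ranking sketch: learn to order books by level from feature differences.
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical (feature_vector, guided_reading_level) pairs for four books.
books = [(np.array([5.0, 0.2]), 1), (np.array([7.5, 0.4]), 3),
         (np.array([9.0, 0.5]), 5), (np.array([6.0, 0.3]), 2)]

pairs, labels = [], []
for xi, yi in books:
    for xj, yj in books:
        if yi != yj:
            pairs.append(xi - xj)            # difference vector for the pair
            labels.append(1 if yi > yj else -1)

ranker = LinearSVC().fit(pairs, labels)
# Score each book with the learned weights; a higher score means a higher level.
scores = [ranker.decision_function([x])[0] for x, _ in books]
print(sorted(zip(scores, [level for _, level in books])))
```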
A multivariate model for classifying texts' readability
We report results from using the multivariate readability model SVIT to classify texts into various levels. We investigate how the language features integrated in the SVIT model can be transformed into values on known criteria such as vocabulary, grammatical fluency and propositional knowledge. Such text criteria, sensitive to content, readability and genre, in combination with the profile of a student's reading ability, form the basis of individually adapted texts. The procedure of levelling texts into different stages of complexity is presented along with results from the first cycle of tests conducted on 8th grade students. The results show that SVIT can be used to classify texts into different complexity levels.
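Since the abstract does not spell out how SVIT maps features onto criterion values, the following sketch is purely an assumption for illustration: z-scoring raw text features against a reference corpus and averaging hypothetical feature groups into named criterion scores.

```python
# Illustrative only: mapping raw features onto named criterion scales by
# z-scoring against a reference corpus and averaging feature groups.
# SVIT's actual transformation is not described here; this is an assumption.
import numpy as np

reference = np.array([[12.0, 0.45, 0.30],   # rows: reference texts
                      [18.0, 0.60, 0.42],   # cols: sent_len, rare_words, nominal_ratio
                      [15.0, 0.50, 0.35]])
mu, sigma = reference.mean(axis=0), reference.std(axis=0)

groups = {"vocabulary": [1], "grammatical_fluency": [0, 2]}  # hypothetical grouping

def criterion_scores(features):
    z = (np.asarray(features) - mu) / sigma
    return {name: z[idx].mean() for name, idx in groups.items()}

print(criterion_scores([16.0, 0.55, 0.40]))
```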
Linguistic Features for Readability Assessment
2020
Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further. This paper combines these two approaches with the goal of improving overall model performance and addressing this question. Evaluating on two large readability corpora, we find that, given sufficient training data, augmenting deep learning models with linguistically motivated features does not improve state-of-the-art performance. Our results provide preliminary evidence for the hypothesis that state-of-the-art deep learning models represent linguistic features of the text related to readability. Future research on the nature of representations formed in these models can shed light on the learned features and their relations to the linguistically motivated ones hypothesized in traditional approaches.
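A hedged sketch of the kind of augmentation being tested: concatenating handcrafted linguistic features with a pretrained transformer's sentence embedding before a downstream classifier. The model name and the two toy features below are placeholder assumptions, not the paper's architecture.

```python
# Sketch: augment a transformer sentence embedding with handcrafted features.
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state[:, 0].squeeze(0).numpy()  # [CLS] vector

def linguistic_features(text):
    # Two toy features standing in for a full linguistic feature set.
    words = text.split()
    return np.array([len(words),
                     sum(len(w) for w in words) / max(len(words), 1)])

text = "The quick brown fox jumps over the lazy dog."
augmented = np.concatenate([embed(text), linguistic_features(text)])
# 'augmented' would feed a downstream readability classifier.
print(augmented.shape)  # (768 + 2,) for bert-base
```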
Automatic assessment of children's reading level
… Annual Conference of …, 2007
In this paper, an automatic system for the assessment of reading in children is described and evaluated. The assessment is based on a reading test with 40 words, presented one by one to the child by means of a computerized reading tutor. The score that expresses the ...
A New Text Readability Measure for Fiction Texts
SSRN Electronic Journal
English teachers often have difficulty matching the complexity of fiction texts with students' reading levels. Texts that seem appropriate for students of a given level can turn out to be too difficult. Furthermore, it is difficult to choose a series of texts that represent a smooth gradation of text difficulty. This paper attempts to address both problems by providing a complexity ranking of a corpus of 200 fiction texts consisting of 100 adults' and 100 children's texts. Using machine learning, several standard readability measures are used as variables to create a classifier which is able to classify the corpus with an accuracy of 84%. A classifier created with linguistic variables is able to classify the corpus with an accuracy of 89%. The latter classifier is then used to provide a linear complexity rank for each text. The resulting ranking instantiates a fine-grained increase in complexity. This can be used by a reading or ESL teacher to select a sequence of texts that represent an increasing challenge to a student without a frustratingly perceptible increase in difficulty.
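One plausible reading of such a pipeline, sketched below with illustrative data: compute standard readability measures as variables, train a binary adults'/children's classifier, and reuse its signed decision score as a linear complexity rank. The Flesch-Kincaid coefficients are the standard published ones; everything else is assumed.

```python
# Sketch: readability measures as classifier variables, then a linear
# complexity rank from the classifier's decision score (illustrative data).
import re
from sklearn.linear_model import LogisticRegression

def count_syllables(word):
    # Naive vowel-group heuristic; real measures use better syllable counters.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def measures(text):
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    asl = len(words) / max(len(sentences), 1)                # avg sentence length
    asw = sum(count_syllables(w) for w in words) / max(len(words), 1)
    fk_grade = 0.39 * asl + 11.8 * asw - 15.59               # Flesch-Kincaid grade
    return [asl, asw, fk_grade]

texts = ["See Spot run. Spot is fast.",
         "The protagonist's disillusionment mirrors postwar economic anxieties."]
labels = [0, 1]  # 0 = children's text, 1 = adults' text (hypothetical)

clf = LogisticRegression().fit([measures(t) for t in texts], labels)
# Signed distance to the decision boundary serves as a linear complexity rank.
print([clf.decision_function([measures(t)])[0] for t in texts])
```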
Reading Level Identification Using Natural Language Processing Techniques
2021
This paper investigates using the Bidirectional Encoder Representations from Transformers (BERT) algorithm and lexical-syntactic features to measure readability. Readability is important in many disciplines, for functions such as selecting passages for school children, assessing the complexity of publications, and writing documentation. Text at an appropriate reading level will help make communication clear and effective. Readability is primarily measured using well-established statistical methods. Recent advances in Natural Language Processing (NLP) have had mixed success incorporating higher-level text features in a way that consistently beats established metrics. This paper contributes a readability method using a modern transformer technique and compares the results to established metrics. This paper finds that the combination of BERT and readability metrics provides a significant improvement in the estimation of readability as defined by Crossley et al. [1]. The BERT+Readability mod...
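A minimal sketch of the kind of comparison described, with made-up numbers throughout: scoring an established-metric baseline and a BERT-augmented model against gold difficulty scores by RMSE and Pearson correlation.

```python
# Sketch: comparing readability estimates against gold scores.
# All numbers below are fabricated for illustration only.
import numpy as np
from scipy.stats import pearsonr

gold = np.array([2.1, 4.8, 7.3, 9.0])        # gold difficulty scores
baseline = np.array([3.0, 4.0, 6.5, 8.0])    # e.g., a classic readability formula
bert_plus = np.array([2.3, 4.9, 7.0, 9.2])   # e.g., a BERT + metrics model

for name, pred in [("baseline", baseline), ("BERT+metrics", bert_plus)]:
    rmse = np.sqrt(np.mean((pred - gold) ** 2))
    r, _ = pearsonr(gold, pred)
    print(f"{name}: RMSE={rmse:.2f}, r={r:.2f}")
```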
SALTeL Journal (Southeast Asia Language Teaching and Learning)
Selecting suitable reading materials is taxing and challenging for many English instructors. Text readability analysis can be used to automate the process of reading material selection as well as the assessment of reading ability for language learners. Readability formulas have been broadly used to determine text difficulty in terms of learners' grade level. Based on mathematical calculations, a readability formula examines certain features of a text in order to provide a rough approximation of its difficulty. This paper reflects on some aspects and issues of readability analysis.
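As a worked example of such a formula, the classic Flesch Reading Ease score is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). A passage averaging 15 words per sentence and 1.5 syllables per word scores 206.835 − 15.225 − 126.9 ≈ 64.7, which falls in the band conventionally read as plain English; the specific passage values here are illustrative.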
Use of a New Set of Linguistic Features to Improve Automatic Assessment of Text Readability
2012
The present paper proposes and evaluates a readability assessment method designed for Japanese learners of EFL (English as a foreign language). The proposed method is constructed with a regression algorithm using a new set of linguistic features that had previously been employed only separately. The results showed that the proposed method, which uses all the linguistic features employed in previous studies, yielded a lower assessment error than methods using only subsets of these features.
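A minimal sketch of the comparison the paper reports, assuming made-up feature values and readability levels: fit a regression on a subset of features and on the full set, then compare assessment error.

```python
# Sketch: regression over a combined linguistic feature set vs. a subset,
# comparing assessment error (feature values and levels are illustrative).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

X_all = np.array([[12.0, 0.30, 1.4], [18.0, 0.55, 1.8],   # cols: sent_len,
                  [9.0, 0.20, 1.2], [22.0, 0.60, 2.0]])   # rare_ratio, syll/word
y = np.array([2.0, 5.0, 1.0, 6.0])                         # readability levels

for name, cols in [("subset", [0]), ("all features", [0, 1, 2])]:
    model = Ridge(alpha=1.0).fit(X_all[:, cols], y)
    pred = model.predict(X_all[:, cols])
    print(name, mean_absolute_error(y, pred))
```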