Readability assessment for text simplification

A comparison of features for automatic readability assessment

Proceedings of the 23rd International Conference on Computational Linguistics Posters, 2010

Several sets of explanatory variables, including shallow, language modeling, POS, syntactic, and discourse features, are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity density (a discourse feature) and POS features, in particular nouns, are individually very useful but highly correlated. Average sentence length (a shallow feature) is more useful, and less expensive to compute, than individual syntactic features. A judicious combination of the features examined here results in a significant improvement over the state of the art.
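
As a concrete illustration of the cheaper end of this feature spectrum, the sketch below computes average sentence length (a shallow feature) and noun density (a POS feature) with NLTK. This is a hypothetical reduction of the paper's feature set for illustration only, not the authors' actual configuration:

```python
# Minimal sketch of two cheap readability features, assuming NLTK.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk

def shallow_and_pos_features(text: str) -> dict:
    sentences = nltk.sent_tokenize(text)
    words = [w for s in sentences for w in nltk.word_tokenize(s)]
    tags = [tag for _, tag in nltk.pos_tag(words)]
    return {
        # shallow feature: words per sentence
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # POS feature: fraction of tokens tagged as nouns (NN, NNS, NNP, ...)
        "noun_density": sum(t.startswith("NN") for t in tags) / max(len(words), 1),
    }
```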

Towards an improved methodology for automated readability prediction

2010

Readability formulas are often employed to automatically predict the readability of an unseen text. In this article, the formulas and the text characteristics they are composed of are evaluated in the context of large corpora. We describe the behaviour of the formulas and the text characteristics by means of correlation matrices, principal component analysis and a collinearity test. We reveal methodological shortcomings in some of the existing readability formulas.
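
The three diagnostics named in the abstract can be sketched in a few lines. The feature matrix `X` below is a hypothetical stand-in for the text characteristics extracted from a corpus; the collinearity check uses the standard identity that variance inflation factors are the diagonal entries of the inverse correlation matrix:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def diagnose(X: pd.DataFrame):
    """X: rows = texts, columns = text characteristics (hypothetical)."""
    corr = X.corr()                                            # correlation matrix
    explained = PCA().fit(X.values).explained_variance_ratio_  # PCA structure
    # Collinearity test: VIF_j = j-th diagonal of the inverse correlation matrix
    vif = pd.Series(np.diag(np.linalg.inv(corr.values)), index=X.columns, name="VIF")
    return corr, explained, vif
```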

Text readability and intuitive simplification: A comparison of readability formulas

Reading in a Foreign Language, 2011

Texts are routinely simplified for language learners with authors relying on a variety of approaches and materials to assist them in making the texts more comprehensible. Readability measures are one such tool that authors can use when evaluating text comprehensibility. This study compares the Coh-Metrix Second Language (L2) Reading Index, a readability formula based on psycholinguistic and cognitive models of reading, to traditional readability formulas on a large corpus of texts intuitively simplified for language learners. The goal of this study is to determine which formula best classifies text level (advanced, intermediate, beginner) with the prediction that text classification relates to the formulas’ capacity to measure text comprehensibility. The results demonstrate that the Coh-Metrix L2 Reading Index performs significantly better than traditional readability formulas, suggesting that the variables used in this index are more closely aligned to the intuitive text processing employed by authors when simplifying texts.
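
The comparison protocol suggested by this abstract can be sketched as follows: treat each formula's score as a single predictor of the three-way level label and compare cross-validated classification accuracy. The function names and data layout here are assumptions for illustration, not the study's actual setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def compare_formulas(texts, levels, formulas):
    """formulas: dict mapping a formula name to a text -> score callable."""
    y = np.array(levels)  # "beginner" / "intermediate" / "advanced"
    for name, score_fn in formulas.items():
        X = np.array([[score_fn(t)] for t in texts])  # single feature: the score
        acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
        print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```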

Read-it: Assessing readability of Italian texts with a view to text simplification

2011

In this paper, we propose a new approach to readability assessment with a specific view to the task of text simplification: the intended audience includes people with low literacy skills and/or with mild cognitive impairment. READ-IT is the first advanced readability assessment tool for Italian; it combines traditional raw-text features with lexical, morpho-syntactic and syntactic information. In READ-IT, readability assessment is carried out with respect to both documents and sentences, where the latter is an important novelty of the proposed approach, creating the prerequisites for aligning the readability assessment step with the text simplification process. READ-IT shows high accuracy in the document classification task and promising results in the sentence classification scenario.
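
A minimal sketch of the document/sentence duality described here: the same kind of trained classifier is run per sentence so that hard sentences can be handed to a simplification step. `featurize` and `clf` are hypothetical stand-ins for READ-IT's actual feature extractor and model:

```python
def flag_hard_sentences(sentences, featurize, clf, threshold=0.5):
    """Return (sentence, P(hard)) pairs exceeding the threshold.
    Assumes a binary scikit-learn-style classifier whose second class is "hard"."""
    flagged = []
    for s in sentences:
        p_hard = clf.predict_proba([featurize(s)])[0][1]  # P(class == "hard")
        if p_hard > threshold:
            flagged.append((s, p_hard))  # candidate for the simplification step
    return flagged
```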

Simple or not Simple? A Readability Question

Recent Advances in Language Production, Cognition and the Lexicon (Eds.: N. Gala, R. Rapp, G. Bel-Enguix).

Text Simplification (TS) has taken off as an important Natural Language Processing (NLP) application which promises significant societal impact, in that it can be employed for the benefit of users with limited language comprehension skills such as children, foreigners who do not have a good command of a language, and readers with language impairments. With the recent emergence of various TS systems, the question we are faced with is how to automatically evaluate their performance given that access to target users might be difficult. This chapter addresses one aspect of this issue by exploring whether existing readability formulae could be applied to assess the level of simplification offered by a TS system. It focuses on three readability indices for Spanish. The indices are first adapted in a way that allows them to be computed automatically and then applied to two corpora of original and manually simplified texts. The first corpus has been compiled as part of the Simplext project targeting people with Down syndrome, and the second corpus as part of the FIRST project, where the users are people with autism spectrum disorder (ASD). The experiments show that there is a significant correlation between each of the readability indices and eighteen linguistically motivated features which might be seen as reading obstacles for various target populations, thus indicating the possibility of using those indices as a measure of the degree of simplification achieved by TS systems. Various ways they can be used in TS are further illustrated by comparing their values when applied to four different corpora.
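
For flavour, one Spanish readability index frequently used in this line of work is Fernández Huerta's adaptation of Flesch Reading Ease. The sketch below follows the commonly implemented formulation (e.g., as in the textstat library); whether it is among the chapter's three adapted indices is not stated here, and the syllable counter is a crude vowel-group heuristic:

```python
import re

def count_syllables_es(word: str) -> int:
    # crude heuristic: count vowel groups (adequate for a sketch only)
    return max(len(re.findall(r"[aeiouáéíóúü]+", word.lower())), 1)

def fernandez_huerta(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    syll_per_word = sum(count_syllables_es(w) for w in words) / max(len(words), 1)
    words_per_sentence = len(words) / max(len(sentences), 1)
    return 206.84 - 60 * syll_per_word - 1.02 * words_per_sentence  # higher = easier
```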

Automatic summarization for text simplification: evaluating text understanding by poor readers

2008

In this paper we present experiments on summarization and text simplification for poor readers, specifically functionally illiterate readers. We test several summarizers and use summaries as the basis of simplification strategies. We show that each simplification approach has different effects on readers of varied levels of literacy, but that all of them improve text understanding at some level.

A Simple Post-Processing Technique for Improving Readability Assessment of Texts using Word Mover's Distance

ArXiv, 2021

Assessing the proper difficulty levels of reading materials or texts in general is the first step towards effective comprehension and learning. In this study, we improve the conventional methodology of automatic readability assessment by incorporating the Word Mover's Distance (WMD) of ranked texts as an additional post-processing technique to further ground the difficulty level given by a model. Results of our experiments on three multilingual datasets in Filipino, German, and English show that the post-processing technique outperforms previous vanilla and ranking-based models using SVM.
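
A plausible reconstruction of the post-processing idea, not the authors' exact code: after a model predicts a difficulty level, compute WMD from the input text to labeled exemplars among the ranked texts and ground the prediction in the WMD-nearest neighbour. The `exemplars` structure is hypothetical; gensim's WMD support requires its optional POT dependency:

```python
from gensim.models import KeyedVectors

def wmd_grounded_level(tokens, exemplars, wv: KeyedVectors):
    """tokens: tokenized input text; exemplars: (token_list, level) pairs
    drawn from already-ranked texts. Returns the level of the WMD-nearest
    exemplar, which can then be reconciled with the model's own prediction."""
    distances = [(wv.wmdistance(tokens, ex_tokens), level)
                 for ex_tokens, level in exemplars]
    return min(distances, key=lambda d: d[0])[1]
```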

Online Readability and Text Complexity Analysis with TextEvaluator

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, 2015

We have developed the TextEvaluator system for providing text complexity and Common Core-aligned readability information. Detailed text complexity information is provided by eight component scores, presented in such a way as to aid in the user's understanding of the overall readability metric, which is provided as a holistic score on a scale of 100 to 2000. The user may select a targeted US grade level and receive additional analysis relative to it. This and other capabilities are accessible via a feature-rich front-end, located at http://texteval-pilot.ets.org/TextEvaluator/.

Text Readability: A Snapshot

SALTeL Journal (Southeast Asia Language Teaching and Learning)

Selecting suitable reading materials is taxing and challenging for many English instructors. Text readability analysis can be used to automate the process of reading material selection and the assessment of reading ability for language learners. Readability formulas have been broadly used to determine text difficulty based on learners' grade level. Based on mathematical calculations, a readability formula examines certain features of a text to provide a rough approximation of its difficulty. This paper reflects on some aspects and issues of readability analysis.
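
For concreteness, here is one of the classic formulas this snapshot refers to, the Flesch-Kincaid Grade Level; the coefficients are the standard published ones, while the syllable counter is a rough vowel-group heuristic:

```python
import re

def count_syllables(word: str) -> int:
    # rough heuristic: count vowel groups
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # FKGL = 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59
    return (0.39 * len(words) / max(len(sentences), 1)
            + 11.8 * syllables / max(len(words), 1) - 15.59)
```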