Assessing the readability of clinical documents in a document engineering environment
Related papers
Automatic Assessment of Health Information Readability
2019
The current digital age increases the dissemination of information to a large number of people, with health being one of the most popular topics on the web. Health information can contain words with specific terminology that negatively impact the readability of medical content. This problem gets worse when a reader has low health literacy. Envisioning a search system that can personalize the readability of medical content to the characteristics of its users, in this dissertation, we built machine learning models to assess the readability of health content in the Portuguese and English languages. As a first step, we evaluated the readability of topics on the web using traditional readability measures. We found that the topic of health is one of the least readable topics according to the metrics used. We also analysed the linguistic differences between the English and Portuguese languages, finding that, in general, Portuguese words have a greater number of syllables. With this, we pro...
Readability Evaluation for Ukrainian Medicine Corpus (UKRMED)
2021
In this work, we demonstrate how different readability formulas perform on our Ukrainian-language corpus of medical texts (UKRMED). UKRMED contains three types of medical-domain texts, grouped by complexity: “Complex texts”, “Moderate texts”, and “Simple texts”. This research aims to (1) demonstrate the use of the most common readability formulas on written health information in Ukrainian, (2) compare and contrast these formulas across the three text types (simple, moderate, and complex), (3) investigate medical text features that can be used for text simplification and for the classification of medical texts, and (4) prepare recommendations for using these formulas to evaluate the readability of medical texts in Ukrainian.
An environment for document engineering of clinical guidelines
AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2005
In this paper, we present the G-DEE system, a document engineering environment aimed at clinical guidelines. This system represents an extension of current visual interfaces for guidelines encoding, in that it supports automatic text processing functions which identify linguistic markers of document structure, such as recommendations, thereby decreasing the complexity of operations required by the user. Such markers are identified by shallow parsing of free text and are automatically marked up as an early step of document structuring. From this first representation, it is possible to identify elements of guidelines contents, such as decision variables, and produce elements of GEM encoding, using rules defined as XSL style sheets. We tested our automatic structuring system on a set of sentences extracted from French clinical guidelines. As a result, 97% of the occurrences of deontic operators and their scopes were correctly marked up. G-DEE can be used for various purposes, from rese...
Populating a framework for Readability Analysis
2009
This paper discusses a computational approach to readability that is expected to lead eventually towards a new and configurable metric for text readability. Our research involves the elaboration, implementation and evaluation of an 8-part framework that requires consideration of both textual and cognitive factors such as language, vocabulary, background knowledge, motivation and cognitive load. We discuss our work to date, which has examined the limitations of current measures of readability and considered how the wider textual and cognitive phenomena may be accounted for. This includes statistical and linguistic approaches to terminology extraction, approaches to lexical and grammatical simplification, and considerations of plain language. Our interest in readability is motivated by the aim of making semantic content more readily available and, as a consequence, improving the quality of documents. There are already indications that this work will contribute to a British Standard for readability.
Health information text characteristics
PubMed, 2006
Millions of people search online for medical text, but these texts are often too complicated to understand. Readability evaluations are mostly based on surface metrics such as character or word counts and sentence syntax, but content is ignored. We compared four types of documents (easy and difficult WebMD documents, patient blogs, and patient educational material) on surface and content-based metrics. The documents differed significantly in reading grade levels and vocabulary used. WebMD pages with high readability also used terminology that was more consumer-friendly. Moreover, difficult documents are harder to understand because of their grammar and word choice and because they discuss more difficult topics. This indicates that many documents can be simplified by focusing on word choice in addition to sentence structure; for difficult documents, however, this may be insufficient.
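The surface metrics mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the method used in the paper: it assumes English text, uses the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, and relies on a rough vowel-group heuristic for syllable counting. The function names `count_syllables` and `surface_metrics` are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def surface_metrics(text: str) -> dict:
    """Compute word/sentence counts and two classic readability scores."""
    # Naive sentence split on terminal punctuation; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "words": len(words),
        "sentences": len(sentences),
        # Flesch Reading Ease: higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade Level: approximate US school grade.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    easy = surface_metrics("The cat sat. The dog ran.")
    hard = surface_metrics(
        "Myocardial infarction necessitates immediate percutaneous intervention."
    )
    print(easy["flesch_kincaid_grade"], hard["flesch_kincaid_grade"])
```

Scores of this kind depend only on word and sentence length, which is exactly the limitation the abstract points out: two texts with the same score can differ widely in how familiar their vocabulary is to patients.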
Readability of Arabic Medicine Information Leaflets: A Machine Learning Approach
Procedia Computer Science, 2016
This paper presents a project that explores the possibility of assessing the readability level of Arabic medicine information leaflets using machine learning techniques. A number of popular readability formulas and tools have been used successfully to assess the readability of health-related information in several languages. However, there is limited work on the readability assessment of health-related information in Arabic, specifically medicine information leaflets. We describe the design of a tool that uses machine learning to assess the readability of medicine information leaflets. We utilize a corpus comprising 1112 medicine information leaflets annotated with three difficulty levels. Based on a study of existing literature, we selected a number of features influencing text difficulty. The tool will help organizations that specialize in producing medicine information leaflets to write them at a reading level appropriate for the majority of leaflet consumers.
A document engineering environment for clinical guidelines
Proceedings of the 2007 ACM symposium on Document engineering - DocEng '07, 2007
In this paper, we present a document engineering environment for Clinical Guidelines (G-DEE), which are standardized medical documents developed to improve the quality of medical care. The computerization of Clinical Guidelines has attracted much interest in recent years, as it could support the knowledge-based process through which they are produced. Early work on guideline computerization has been based on document engineering techniques using mark-up languages to produce structured documents. We propose to extend the document-based approach by introducing some degree of automatic content processing, dedicated to the recognition of linguistic markers that signal recommendations through the use of "deontic operators". Such operators are identified by shallow parsing using Finite-State Transition Networks, and are further used to automatically generate mark-up structuring the documents. We also show that several guideline manipulation tasks can be formalized as XSL-based transformations of the original marked-up document. The automatic processing component, which underlies the marking-up process, has been evaluated using two complete clinical guidelines (corresponding to over 300 recommendations). As a result, precision of marker identification varied between 88 and 98% and recall between 81 and 99%.
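The idea of recognizing deontic operators and wrapping them in mark-up can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the G-DEE implementation: it uses a regular expression (itself a finite-state recognizer) in place of the system's Finite-State Transition Networks, a small hypothetical list of English operators rather than the French ones the system targets, and a made-up `<deontic>` tag name. It does not attempt to detect operator scopes.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical, deliberately small list of English deontic operators.
DEONTIC_PATTERN = re.compile(
    r"\b(?:must|should|ought\s+to|may|(?:is|are)\s+recommended)(?:\s+not)?\b",
    re.IGNORECASE,
)


def mark_deontic(sentence: str) -> str:
    """Return the sentence as XML-escaped text with operators wrapped in <deontic>."""
    parts = []
    last = 0
    for match in DEONTIC_PATTERN.finditer(sentence):
        # Escape the plain text between matches so the output stays well-formed.
        parts.append(escape(sentence[last:match.start()]))
        parts.append(f"<deontic>{escape(match.group(0))}</deontic>")
        last = match.end()
    parts.append(escape(sentence[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(mark_deontic("Patients should not receive aspirin & may be monitored."))
```

Once recommendations are marked up this way, downstream tasks such as extracting or reordering them can operate on the tags rather than on free text, which is the role the abstract assigns to XSL-based transformations.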
Coherence and Cohesion for the Assessment of Text Readability
Text readability depends on a variety of variables. While lexico-semantic and syntactic factors have been widely used in the literature, more high-level discursive and cognitive properties such as cohesion and coherence have received little attention. This paper assesses the efficiency of 41 measures of text cohesion and text coherence as predictors of text readability. We compare results manually obtained on two corpora including texts with different difficulty levels and show that some cohesive features are indeed useful predictors.
Journal of Research Design and Statistics in Linguistics and Communication Science, 2016
This article presents findings on text readability obtained in a research project sponsored by the Spanish Ministerio de Economía y Competitividad. The main objective of the project was to improve the quality of written texts used to convey information to oncological patients in hospitals in Spain. Among other measurement instruments, the project proposed using a readability index to assess the quality of the original texts (written in Spanish) and to evaluate the improvement in readability achieved as a consequence of the research. A literature review of readability indices for Spanish identified three possible candidates. Statistical analysis guided the selection and validation of these indices for patient information leaflets addressed to oncological patients in two Spanish hospitals.
Towards an improved methodology for automated readability prediction
2010
Readability formulas are often employed to automatically predict the readability of an unseen text. In this article, the formulas and the text characteristics they are composed of are evaluated in the context of large corpora. We describe the behaviour of the formulas and the text characteristics by means of correlation matrices, principal component analysis and a collinearity test. We show methodological shortcomings in some of the existing readability formulas.