Adding a medical lexicon to an English Parser

Determining the syntactic structure of medical terms in clinical notes

Proceedings of the Workshop on BioNLP 2007 Biological, Translational, and Clinical Language Processing - BioNLP '07, 2007

This paper demonstrates a method for determining the syntactic structure of medical terms. We use a model-fitting method based on the Log Likelihood Ratio to classify three-word medical terms as right- or left-branching. We validate this method by computing the agreement between the classification produced by the method and manually annotated classifications. The results show an agreement of 75%–83%. This method may be used effectively to enable a wide range of applications that depend on the semantic interpretation of medical terms, including automatic mapping of terms to standardized vocabularies and induction of terminologies from unstructured medical text.
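The branching decision described above can be approximated by comparing bigram association scores: whichever adjacent word pair of a three-word term is more strongly associated is bracketed first. The following is a minimal sketch in Python using Dunning's log-likelihood ratio over hypothetical co-occurrence counts; it illustrates the idea only and does not reproduce the paper's model-fitting procedure.

```python
import math

def _ll(k, n, p):
    # Binomial log-likelihood; clamp p to avoid log(0).
    p = min(max(p, 1e-12), 1 - 1e-12)
    return k * math.log(p) + (n - k) * math.log(1 - p)

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table
    (k11: pair seen together, k12: first word without second, etc.)."""
    n1, n2 = k11 + k12, k21 + k22
    p1, p2 = k11 / n1, k21 / n2
    p = (k11 + k21) / (n1 + n2)
    return 2 * (_ll(k11, n1, p1) + _ll(k21, n2, p2)
                - _ll(k11, n1, p) - _ll(k21, n2, p))

def branching(term, assoc):
    """Classify a three-word term as left- or right-branching by
    comparing the association of (w1, w2) against (w2, w3)."""
    w1, w2, w3 = term
    return "left" if assoc[(w1, w2)] > assoc[(w2, w3)] else "right"
```

With invented counts under which "small cell" is the stronger pair, the term "small cell carcinoma" comes out left-branching, i.e. ((small cell) carcinoma).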

Automatic Mapping Clinical Notes to Medical Terminologies

2000

Automatic mapping of key concepts from clinical notes to a terminology is an important task to achieve for extraction of the clinical information locked in clinical notes and patient reports. The present paper describes a system that automatically maps free text into a medical reference terminology. The algorithm utilises Natural Language Processing (NLP) techniques to enhance

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications

Journal of the American Medical Informatics Association, 2020

Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is foundational to the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provides an underlying resource for many NLP applications. This article reports recent developments of 3 key components in the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide a generic, broad-coverage, and robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.
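The kind of concept mapping the Lexicon supports can be illustrated with a greedy longest-match lookup of multiword spans against a term list. This is a simplified sketch, not the actual Lexical Tools API; the lexicon entries in the usage example are invented.

```python
def map_concepts(tokens, lexicon):
    """Greedy longest-match mapping of token spans to lexicon entries.

    Scans left to right; at each position, tries the longest span first,
    so multiword terms win over their single-word prefixes.
    """
    i, found = 0, []
    while i < len(tokens):
        for j in range(len(tokens), i, -1):
            span = " ".join(tokens[i:j])
            if span in lexicon:
                found.append(span)
                i = j
                break
        else:
            i += 1  # no entry starts here; advance one token
    return found
```

For example, against the toy lexicon {"aspirin", "myocardial infarction"}, the sentence "patient on aspirin after myocardial infarction" yields both concepts, with the two-word term matched as a unit.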

From Linguistic Resources to Medical Entity Recognition: a Supervised Morpho-syntactic Approach

2015

Due to the importance of the information it conveys, Medical Entity Recognition is one of the most investigated tasks in Natural Language Processing. Much research has aimed at solving the problem of text extraction, in part to develop Decision Support Systems in the field of Health Care. In this paper, we propose a Lexicon-grammar method for the automatic extraction from raw texts of the semantic information referring to medical entities and, furthermore, for the identification of the semantic categories that describe the located entities. Our work is grounded on an electronic dictionary of neoclassical formative elements of the medical domain, an electronic dictionary of nouns indicating drugs, body parts and internal body parts, and a grammar network composed of morphological and syntactic rules in the form of Finite-State Automata. The outcome of our research is an Extensible Markup Language (XML) annotated corpus of medical reports with information pertaining to t...
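The dictionary-plus-automata approach can be sketched as a finite-state pattern over neoclassical roots and suffixes, here compiled as a regular expression. The element lists are illustrative placeholders, not the paper's electronic dictionaries.

```python
import re

# Illustrative neoclassical formative elements (not the paper's resources).
ROOTS = ["cardi", "gastr", "hepat", "nephr"]
SUFFIXES = ["itis", "ology", "ectomy", "algia"]

# Root, optional linking vowel "o", then suffix.
PATTERN = re.compile(r"^(%s)o?(%s)$" % ("|".join(ROOTS), "|".join(SUFFIXES)))

def analyze(word):
    """Decompose a word into (root, suffix) if it matches the automaton."""
    m = PATTERN.match(word.lower())
    return (m.group(1), m.group(2)) if m else None
```

For instance, "gastritis" decomposes into the root "gastr" and the suffix "itis", while a word outside the pattern returns None.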

Semi-Automated Extension of a Specialized Medical Lexicon for French

Proceedings of the Seventh conference …, 2010

This paper describes the development of a specialized lexical resource for a specialized domain, namely medicine. Based on the observation of a large collection of terms, we highlight the specificities that such a lexicon should take into account, and we show that general resources lack a large part of the words needed to process specialized language. We describe an experiment to semi-automatically extend a medical lexicon, populating it with inflectional information, which increased its coverage of the target vocabulary from 14.1% to 25.7%.
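Coverage figures of this kind can be computed as the fraction of a target vocabulary present in the lexicon, before and after extension. A trivial sketch, with hypothetical word sets and a naive inflection rule standing in for the paper's inflectional machinery:

```python
def coverage(lexicon, vocabulary):
    """Fraction of the target vocabulary found in the lexicon."""
    return len(vocabulary & lexicon) / len(vocabulary)

def extend_with_inflections(lexicon, inflect):
    """Semi-automatic extension: add a generated inflected form per entry."""
    return lexicon | {inflect(w) for w in lexicon}
```

With a toy vocabulary of four forms and a lexicon containing one lemma, adding a naive plural doubles the coverage from 0.25 to 0.5.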

From syntactic-semantic tagging to knowledge discovery in medical texts

International Journal of Medical Informatics, 1998

In the GALEN project, the syntactic-semantic tagger MultiTALE is upgraded to extract knowledge from natural language surgical procedure expressions. In this paper, we describe the methodology applied and show that out of a randomly selected sample of such expressions coming from the procedure axis of Snomed International, 81% could be analysed correctly. The problems encountered fall into three different categories: unusual grammatical configurations within the Snomed terms, insufficient domain knowledge, and different categorisation of concepts and semantic links in the domain and linguistic models used. It is concluded that the MultiTALE system can be used to attach meaning to words that have not been encountered previously, but that an interface ontology mediating between domain models and linguistic models is needed to arrive at a higher level of independence from both particular languages and particular domains.

A system for industrial-strength linguistic parsing of medical documents

This paper describes SPMED, a system for robust and accurate linguistic parsing of medical documents which is used in several industrial products. The basic design criterion of the system is to provide a set of powerful, robust, and generic linguistic knowledge sources and modules that can easily be customized for different processing tasks in a flexible manner. The main application is the linguistic analysis of medical documents, yet the technology is easily applicable to other domains.

Applying semantic-based probabilistic context-free grammar to medical language processing – A preliminary study on parsing medication sentences

Journal of Biomedical Informatics, 2011

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which could be interpreted by different groups of production rules and consequently result in two or more parse trees. One possible solution, which has not been extensively explored previously, is to augment productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance on reducing parsing ambiguity. Using the existing data set from 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context Free Grammar) for parsing medication sentences and manually created a Treebank of 4,564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (probabilistic Context Free Grammar) for parsing medication sentences. Our evaluation using a 10-fold cross validation showed that the PCFG parser dramatically improved parsing performance when compared to the CFG parser.
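Deriving a PCFG from a treebank amounts to relative-frequency estimation of rule probabilities: P(A → β) = count(A → β) / count(A). A minimal sketch, with invented medication-sublanguage categories rather than the grammar used in the study:

```python
from collections import Counter

def pcfg_from_treebank(rules):
    """Estimate rule probabilities by relative frequency over a list of
    (lhs, rhs) productions read off treebank parse trees."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}
```

A parser can then prefer the tree whose production probabilities multiply to the highest score, which is how the probabilities resolve ambiguity between competing rule groups.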

Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches

BMC Bioinformatics, 2006

We study the adaptation of Link Grammar Parser to the biomedical sublanguage with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and consider combinations of these approaches. In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error. When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.
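The morphological-clue approach to unknown words can be sketched as suffix heuristics that guess a part of speech for out-of-lexicon tokens. The suffix table below is illustrative, not the rule set evaluated in the paper:

```python
# Suffix heuristics for biomedical out-of-lexicon words (illustrative).
SUFFIX_POS = [
    ("ase", "noun"),      # enzymes: kinase, polymerase
    ("itis", "noun"),     # inflammations: hepatitis
    ("ated", "verb"),     # phosphorylated, methylated
    ("al", "adjective"),  # clinical, renal
]

def guess_pos(word, default="noun"):
    """Guess the part of speech of an unknown word from its suffix;
    fall back to the majority class for the domain."""
    w = word.lower()
    for suffix, pos in SUFFIX_POS:
        if w.endswith(suffix):
            return pos
    return default
```

In a parser adaptation setting, such guesses constrain which lexical categories the parser tries for an unknown token, which is what gives the reported coverage when no domain POS tagger is available.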