Medical dictionaries for patient encoding systems: a methodology

Controlled Vocabularies, Indexing and Medical Language Processing: Medical Language Processing for Knowledge Representation …

Proceedings of the …, 1989

Indexing the content of medical reports requires that the data be organized into appropriate data structures capable of recording information at various levels. SNOMED is presented as a data structure suitable for housing medical information and serving as a knowledge base for medical information. Issues related to the classification of data and problems related to the computerized processing and automated indexing of natural language are discussed. Automated natural language systems which accept natural language input, answer questions about a knowledge base, make inferences, and generate natural language responses are complex, and techniques for designing them are in the early stages of development. Much work is being done on different aspects of these systems to develop more complete grammars, lexicons, and knowledge representation schemes that broaden their domain and improve their efficiency and accessibility. Natural language processing by computers requires that language data be organized into appropriate data structures capable of recording information at various levels, including the lexical, morphological, syntactic, and contextual content of a statement to be processed. This paper addresses the processing of natural language data in the restricted domain of diagnostic and therapeutic medical statements. In so doing, it avoids many of the problems encountered in unrestricted natural language processing and analysis. Furthermore, processing is focused on indexing, which is the most important intermediate objective of such systems. The Systematized Nomenclature of Medicine (SNOMED) is offered as an example of a multidimensional nomenclature and classification system for medical terminology. The structure of SNOMED is introduced with emphasis on its capabilities with respect to an artificial medical language.
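The multidimensional character of SNOMED described above can be sketched as a record indexed along several independent axes. The axis letters (T for topography, M for morphology) are real SNOMED axis names, but the codes and the diagnosis shown are purely illustrative:

```python
# Sketch of a multiaxial SNOMED-style entry: one diagnostic statement
# indexed along several independent axes. Axis letters follow SNOMED
# convention; the codes themselves are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class MultiaxialEntry:
    term: str
    axes: dict = field(default_factory=dict)  # axis letter -> code

    def index_keys(self):
        """Return (axis, code) pairs usable as independent retrieval keys."""
        return sorted(self.axes.items())

entry = MultiaxialEntry(
    term="acute appendicitis",
    axes={
        "T": "T-59200",   # topography axis (illustrative code)
        "M": "M-41000",   # morphology axis (illustrative code)
    },
)

for axis, code in entry.index_keys():
    print(f"{axis}: {code}")
```

Because each axis is a separate key, the same record can be retrieved by anatomical site or by morphological finding, which is what makes the structure suitable for indexing rather than only for lookup.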

The Unified Medical Language System SPECIALIST Lexicon and Lexical Tools: Development and applications

Journal of the American Medical Informatics Association, 2020

Natural language processing (NLP) plays a vital role in modern medical informatics. It converts narrative text or unstructured data into knowledge by analyzing and extracting concepts. A comprehensive lexical system is foundational to the success of NLP applications and an essential component at the beginning of the NLP pipeline. The SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine as one of the Unified Medical Language System Knowledge Sources, provide an underlying resource for many NLP applications. This article reports recent developments of 3 key components in the Lexicon. The core NLP operation of Unified Medical Language System concept mapping is used to illustrate the importance of these developments. Our objective is to provide a generic, broad-coverage, and robust lexical system for NLP applications. A novel multiword approach and other planned developments are proposed.
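Concept mapping of the kind described above typically starts by normalizing surface variants of a term to a canonical key. The real Lexical Tools pipeline also uninflects words using the SPECIALIST Lexicon; the toy sketch below only lowercases, strips punctuation, drops a few stop words, and sorts, which is an assumption-level simplification:

```python
# Toy approximation of term normalization before concept mapping.
# The real UMLS Lexical Tools "norm" flow does much more (uninflection,
# Unicode handling); this sketch shows only the basic idea.
import re

STOP = {"of", "the", "a", "an", "and"}

def toy_norm(term: str) -> str:
    words = [w for w in re.findall(r"[a-z0-9]+", term.lower()) if w not in STOP]
    return " ".join(sorted(words))

# Two surface variants collapse to the same normalized key:
print(toy_norm("Cancer of the Lung"))  # → "cancer lung"
print(toy_norm("lung cancer"))         # → "cancer lung"
```

With such a key, both "Cancer of the Lung" and "lung cancer" retrieve the same concept from a normalized-string index.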

Morphological Analysis and Generation of Monolingual and Bilingual Medical Lexicons

Systems and Frameworks for Computational Morphology, 2015

Efficiently extracting and managing extremely large quantities of meaningful data in a delicate sector like healthcare requires sophisticated linguistic strategies and computational solutions. In the research described here we approach the semantic dimension of the formative elements of medical words in monolingual and bilingual environments. The purpose is to automatically build Italian-English medical lexical resources by grounding their analysis and generation on the manipulation of their constituent morphemes. This approach has a significant impact on the automatic analysis of neologisms, which are typical of the medical domain. We created two electronic dictionaries of morphemes and a morphological finite state transducer, which, together, find all possible combinations of prefixes, confixes, and suffixes, and are able to annotate and translate the terms contained in a medical corpus, according to the meaning of the morphemes that compose these words. In order to enable the machine to "understand" medical multiword expressions as well, we designed a syntactic grammar net that includes several paths based on different combinations of nouns, adjectives, and prepositions.
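The morpheme-combination idea can be illustrated with a tiny hand-made morpheme dictionary and a recursive splitter standing in for the finite-state transducer; the inventory and glosses below are invented for the example, not taken from the paper's dictionaries:

```python
# Minimal sketch of morpheme-based analysis of medical terms: segment
# a word into known morphemes (confixes, suffixes, linking vowels),
# trying longer matches first. Inventory and glosses are illustrative.
MORPHEMES = {
    "gastr": "stomach",
    "enter": "intestine",
    "cardi": "heart",
    "itis": "inflammation",
    "logy": "study of",
    "o": "",  # linking vowel
}

def segment(word, acc=()):
    """Return a list of morphemes covering `word`, or None if impossible."""
    if not word:
        return list(acc)
    for i in range(len(word), 0, -1):  # longest match first
        piece = word[:i]
        if piece in MORPHEMES:
            rest = segment(word[i:], acc + (piece,))
            if rest is not None:
                return rest
    return None

print(segment("gastroenteritis"))  # → ['gastr', 'o', 'enter', 'itis']
```

Gluing the glosses of the recovered morphemes together is what lets such a system annotate, and even roughly translate, neologisms it has never seen as whole words.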

Building Croatian Medical Dictionary from Medical Corpus

2019

The overall objective of this project is to define linguistic models at the lexical and syntactic levels that appear in the health domain, depending on the type of corpus. In the first phase of the project, the texts forming the medical corpus A – MedCorA (2,232 pharmaceutical instructions for medicaments available in Croatia) were prepared. The terminology found in this corpus was analyzed, and the semantic subdomains (anatomy, condition, microorganism, chemistry, etc.) within the medical domain were defined and added to the dictionary entries. These dictionary resources were used as the foundation for the second phase, in which NooJ morphological grammars were built to allow annotation of medical terminology in the corpus. These grammars were built to recognize not only Croatian words but also Latinisms, as well as Latin expressions written with Croatian case endings. The prepared resources are made available to a broader scientific community via Sketch Engine for further research in...
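The pairing of dictionary entries with semantic subdomain tags, and their use for corpus annotation, can be sketched as follows; the entries and tags are invented for illustration and are not from the actual MedCorA dictionary:

```python
# Sketch of corpus annotation from a dictionary whose entries carry
# semantic subdomain tags. Entries below are illustrative only.
LEXICON = {
    "paracetamol": "chemistry",
    "jetra": "anatomy",       # Croatian: liver
    "hepatitis": "condition",
}

def annotate(text):
    """Tag each known token with its subdomain, leave others untouched."""
    out = []
    for token in text.lower().split():
        tag = LEXICON.get(token.strip(".,"))
        out.append(f"{token}/{tag}" if tag else token)
    return " ".join(out)

print(annotate("Paracetamol i jetra"))
```

The real project replaces this naive token lookup with NooJ morphological grammars, which is what makes it possible to also match inflected forms and Latin expressions carrying Croatian case endings.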

A Multi-Lingual Architecture for Building a Normalised Conceptual Representation from Medical Language

1995

The overall goal of MENELAS is to provide better access to the information contained in natural language patient discharge summaries (PDSs), through the design and implementation of a prototype able to analyse medical texts. The approach taken by MENELAS is based on the following key principles: (i) to maximise the usefulness of natural language analysis and the usability of its results, the output of natural language analysis must be a normalised conceptual representation of medical information; and (ii) to maximise the reuse of resources, language analysis should be domain-independent and conceptual representation should be language-independent. This paper discusses the results obtained and the issues raised when implementing these principles during the project.

INTRODUCTION

Medical language processing is now a well-developed field of research, and a number of prototypes and systems have been built for various languages and purposes. In these systems, addressing a new language generally requires the development of the corresponding linguistic (lexical, morphological, syntactic) knowledge. Starting from the observation that such general linguistic resources are available for more and more languages, we have studied an architecture which can ease their reuse for medical language processing.
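The separation advocated above can be sketched minimally: per-language lexicons map surface terms to language-independent concept identifiers, so the conceptual layer is shared across languages. The concept identifier and the terms below are invented for the example:

```python
# Sketch of language-independent conceptual representation: each
# language has its own lexicon, but both map to the same concept ID.
# Concept IDs and entries are illustrative only.
LEXICONS = {
    "en": {"myocardial infarction": "C-HEART-ATTACK"},
    "fr": {"infarctus du myocarde": "C-HEART-ATTACK"},
}

def to_concept(lang, term):
    """Map a surface term in a given language to a shared concept ID."""
    return LEXICONS[lang].get(term.lower())

print(to_concept("en", "Myocardial Infarction"))
print(to_concept("fr", "infarctus du myocarde"))
```

Adding a new language then means supplying one new lexicon; everything downstream of the concept identifiers is unchanged, which is the reuse argument the paper makes.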

Towards a Unified Medical Lexicon for French

Medical Informatics has a constant need for basic Medical Language Processing tasks, e.g., for coding into controlled vocabularies, free text indexing and information retrieval. Most of these tasks involve term matching and rely on lexical resources: lists of words with attached information, including inflected forms, derived words, etc. Such resources are publicly available for the English language with the UMLS Specialist Lexicon, but not for other languages. For the French language, several teams have worked on the subject and built local lexical resources. The goal of the present work is to pool and unify these resources, and to add extensively to them by exploiting medical terminologies and corpora, resulting in a unified medical lexicon for French (UMLF). This paper presents the issues raised by such an objective, describes the methods on which the project relies, and illustrates them with experimental results.
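The pooling step can be sketched as a merge of several local lexicons into one unified resource that records which source attests each entry; the real UMLF records are far richer, and all entries below are invented:

```python
# Sketch of pooling local lexicons into a unified one, tracking the
# attesting sources per entry. Entries are illustrative only.
def unify(*sources):
    unified = {}
    for name, lexicon in sources:
        for word, info in lexicon.items():
            # Keep the first source's info; record every attesting source.
            entry = unified.setdefault(word, {"info": info, "sources": []})
            entry["sources"].append(name)
    return unified

team_a = {"douleur": {"pos": "noun"}, "abdominal": {"pos": "adj"}}
team_b = {"douleur": {"pos": "noun"}, "hépatique": {"pos": "adj"}}
u = unify(("A", team_a), ("B", team_b))

print(sorted(u))
print(u["douleur"]["sources"])
```

Entries attested by several independent resources can then be trusted more than singletons, which helps when validating candidates extracted automatically from terminologies and corpora.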

Word segmentation processing: a way to exponentially extend medical dictionaries

Medinfo. MEDINFO, 1995

One of the most critical problems of automatic natural language processing (NLP) is the size of medical lexicons. The wealth of compound medical words and the continual creation of new terms mean that medical lexicons can never be truly exhaustive. The structure of such dictionaries usually consists of two parts: 1) the morphological and sometimes syntactic information necessary to identify, at the grapheme level, a given word in a sentence, and 2) a part, often devoted to the conceptual knowledge associated with the recognized word. It is only when these two prerequisites are fulfilled that an attempt to understand the meaning of a whole expression is possible. The approach developed in this paper is a pragmatic way to rapidly increase the lexico-semantic part of medical dictionaries. We developed a semi-automatic tool, as a prototype, to demonstrate the feasibility of this approach. This tool is able to translate almost any diagnosis expressed in French into its equivalent in the ICD-9-CM ...
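The segmentation idea behind the title can be sketched like this: instead of listing every compound, a word is accepted if it can be split into known base elements, so a small inventory covers a combinatorially large set of compounds. The French surgical elements below are an illustrative toy vocabulary, not the paper's dictionary:

```python
# Sketch of word segmentation as dictionary extension: a compound is
# "covered" if it decomposes into known base elements. The base
# inventory here is a small illustrative sample.
BASES = {"appendic", "ectomie", "gastro", "entero", "stomie"}

def covered(word, vocab=BASES):
    """True if `word` can be fully segmented into known elements."""
    if not word:
        return True
    return any(word.startswith(p) and covered(word[len(p):], vocab)
               for p in vocab)

assert covered("appendicectomie")     # appendic + ectomie
assert covered("gastroenterostomie")  # gastro + entero + stomie
assert not covered("radiographie")    # no element in the toy inventory
```

Five base elements here already license several surgical compounds, which is the "exponential extension" the title refers to: coverage grows with the combinations of elements, not with the number of listed words.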

Semi-Automated Extension of a Specialized Medical Lexicon for French

Proceedings of the Seventh conference …, 2010

This paper describes the development of a specialized lexical resource for a specialized domain, namely medicine. Based on the observation of a large collection of terms, we highlight the specificities that such a lexicon should take into account, and we show that general resources lack a large part of the words needed to process specialized language. We describe an experiment to semi-automatically extend a medical lexicon and populate it with inflectional information, which increased its coverage of the target vocabulary from 14.1% to 25.7%.
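The coverage figures quoted above can be computed with a simple set measure: the fraction of target-vocabulary words present in the lexicon. The word lists below are invented purely to demonstrate the computation:

```python
# Sketch of the coverage measure used to evaluate lexicon extension:
# share of the target vocabulary found in the lexicon. Word lists
# are illustrative only.
def coverage(lexicon, vocabulary):
    vocabulary = set(vocabulary)
    return len(vocabulary & set(lexicon)) / len(vocabulary)

target = ["fièvre", "hépatique", "abdominal", "douleur"]
before = {"fièvre"}
after = before | {"hépatique", "abdominal"}  # entries added semi-automatically

print(coverage(before, target))  # → 0.25
print(coverage(after, target))   # → 0.75
```

Reporting coverage before and after extension, as the paper does with 14.1% and 25.7%, makes the contribution of the semi-automatic step directly measurable.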