UEM-UC3M: An Ontology-based named entity recognition system for biomedical texts

Drug name recognition and classification in biomedical texts

Drug Discovery Today, 2008

This article presents a system for drug name recognition and classification in biomedical texts. The system combines information obtained by the Unified Medical Language System (UMLS) MetaMap Transfer (MMTx) program with nomenclature rules recommended by the World Health Organization (WHO) International Nonproprietary Names (INNs) Program to identify and classify pharmaceutical substances. By applying these rules, the system can also detect candidate drug names that the MMTx program missed, and so achieves broader coverage. This work is the first step in a method for automatic detection of drug interactions from biomedical texts, a specific type of adverse drug event of special interest in patient safety.
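The abstract does not list the nomenclature rules the system applies, so the following is only a minimal sketch of the general idea: INN names carry stems (often suffixes) that signal a pharmacological group, so a token ending in a known stem can be proposed as a drug-name candidate and given a class. The stem table is a small illustrative subset, and the names `INN_STEMS`, `classify_by_stem` and `find_candidates` are hypothetical, not part of the described system.

```python
import re

# Illustrative subset of WHO INN stems and the group each one signals.
# Assumption: the real system uses a much larger rule set than this.
INN_STEMS = {
    "cillin": "penicillin antibiotic",
    "olol": "beta-adrenoreceptor antagonist",
    "pril": "ACE inhibitor",
    "sartan": "angiotensin II receptor antagonist",
    "vastatin": "HMG-CoA reductase inhibitor",
    "azepam": "diazepam derivative",
    "mab": "monoclonal antibody",
}

TOKEN_RE = re.compile(r"[A-Za-z]+")


def classify_by_stem(token):
    """Return the group suggested by the token's INN stem, or None."""
    word = token.lower()
    # Try longer stems first so that a more specific stem wins.
    for stem in sorted(INN_STEMS, key=len, reverse=True):
        # Require at least two characters before the stem to cut down
        # on ordinary words that merely end in the same letters.
        if word.endswith(stem) and len(word) >= len(stem) + 2:
            return INN_STEMS[stem]
    return None


def find_candidates(text):
    """List (token, group) pairs for tokens that end in a known stem."""
    candidates = []
    for match in TOKEN_RE.finditer(text):
        group = classify_by_stem(match.group())
        if group is not None:
            candidates.append((match.group(), group))
    return candidates


if __name__ == "__main__":
    sentence = "Patients received atenolol and losartan with amoxicillin."
    for token, group in find_candidates(sentence):
        print(token, "->", group)
```

Suffix matching of this kind over-generates (any word that happens to end in a stem becomes a candidate), which is presumably why the described system combines it with MMTx output rather than using it alone.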

Chemical named entities recognition: a review on approaches and applications

Journal of Cheminformatics, 2014

The rapid increase in the flow rate of published digital information in all disciplines has resulted in a pressing need for techniques that can simplify the use of this information. The chemistry literature is very rich in information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature, then "text mining" the extracted data to determine contextual relationships, helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is the recognition of chemical entities mentioned in the texts. In this review, we briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We sketch out dictionary-based, rule-based, machine learning and hybrid approaches to chemical named entity recognition, together with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.

Drug name recognition and classification in biomedical texts: A case study outlining approaches underpinning automated systems

Drug Discovery Today, 2008

PharmaCoNER: Pharmacological Substances, Compounds and proteins Named Entity Recognition track

Proceedings of The 5th Workshop on BioNLP Open Shared Tasks

... results with F-measures above 0.91. These results indicate that there is a real interest in promoting biomedical text mining efforts beyond English. We foresee that the PharmaCoNER annotation guidelines, corpus and participant systems will foster the development of new resources for clinical and biomedical text mining systems of Spanish medical data.

Analysis of biomedical text for chemical names: a comparison of three methods

Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 1999

At the National Library of Medicine (NLM), a variety of biomedical vocabularies are found in data pertinent to its mission. In addition to standard medical terminology, there are specialized vocabularies including that of chemical nomenclature. Normal language tools including the lexically based ones used by the Unified Medical Language System (UMLS) to manipulate and normalize text do not work well on chemical nomenclature. In order to improve NLM's capabilities in chemical text processing, two approaches to the problem of recognizing chemical nomenclature were explored. The first approach was a lexical one and consisted of analyzing text for the presence of a fixed set of chemical segments. The approach was extended with general chemical patterns and also with terms from NLM's indexing vocabulary, MeSH, and the NLM SPECIALIST lexicon. The second approach applied Bayesian classification to n-grams of text via two different methods. The single lexical method and two statisti...
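The abstract is cut off before it describes the two Bayesian methods, so the following is only a generic sketch of what Bayesian classification over n-grams can look like: a naive Bayes classifier with add-one smoothing over character trigrams. The class name `NgramNaiveBayes`, the helper `char_ngrams` and the tiny training lists are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(term, n=3):
    """Character n-grams of a term, with ^ and $ marking its boundaries."""
    padded = f"^{term.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Naive Bayes over character n-grams with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # label -> Counter of n-gram frequencies
        self.totals = {}       # label -> total number of n-grams seen
        self.docs = Counter()  # label -> number of training terms
        self.vocab = set()     # all n-grams seen in training

    def fit(self, terms, labels):
        for term, label in zip(terms, labels):
            grams = char_ngrams(term, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_score(self, term, label):
        """Log prior plus summed log likelihoods of the term's n-grams."""
        score = math.log(self.docs[label] / sum(self.docs.values()))
        vocab_size = len(self.vocab)
        for gram in char_ngrams(term, self.n):
            numerator = self.counts[label][gram] + 1
            score += math.log(numerator / (self.totals[label] + vocab_size))
        return score

    def predict(self, term):
        return max(self.counts, key=lambda lab: self.log_score(term, lab))


if __name__ == "__main__":
    # Toy training data, invented for this sketch.
    chemical = ["ethanol", "methanol", "propanol", "benzene",
                "toluene", "acetone", "chloride", "hydroxide"]
    other = ["patient", "hospital", "treatment", "clinical",
             "symptom", "therapy", "disease", "surgery"]
    model = NgramNaiveBayes().fit(
        chemical + other,
        ["chemical"] * len(chemical) + ["other"] * len(other),
    )
    for word in ["butanol", "patients"]:
        print(word, "->", model.predict(word))
```

Character n-grams suit chemical nomenclature because systematic names reuse fragments such as "nol" or "ene" that are rare in ordinary vocabulary, so unseen names can still score well.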

UMCC_DLSI: Semantic and Lexical features for detection and classification Drugs in biomedical texts

In this paper we describe the UMCC_DLSI-(DDI) system, which attempts to detect and classify drug entities in biomedical texts. We discuss the use of the semantic classes and relevant domains of words, extracted with the ISR-WN (Integration of Semantic Resources based on WordNet) resource, to achieve this goal. Following this approach, our system obtained an F-measure of 27.5% in DDIExtraction 2013 (SemEval 2013 Task 9).

CheNER: a tool for the identification of chemical entities and their classes in biomedical literature

Journal of Cheminformatics, 2015

Small chemical molecules regulate biological processes at the molecular level. Those molecules are often involved in causing or treating pathological states. Automatically identifying such molecules in biomedical text is difficult because of both the diverse morphology of chemical names and the alternative types of nomenclature that are used simultaneously to describe them. To address these issues, the last BioCreAtIvE challenge proposed the CHEMDNER task, a Named Entity Recognition (NER) challenge that aims at labelling different types of chemical names in biomedical text. To address this challenge we tested various approaches to recognizing chemical entities in biomedical documents. These approaches range from linear Conditional Random Fields (CRFs) to a combination of CRFs with regular expression and dictionary matching, followed by a post-processing step to tag those chemical names in a corpus of Medline abstracts. We named our best performing systems CheNER. We evaluate the...
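As a rough illustration of how dictionary matching and regular expressions can be combined, the sketch below collects spans from both sources and resolves overlaps greedily. It leaves out the CRF component entirely and is not CheNER's implementation: the dictionary entries, the formula pattern `FORMULA_RE` and every function name are assumptions made for this example.

```python
import re

# Hypothetical mini-dictionary of chemical names (lower-cased).
DICTIONARY = {"aspirin", "acetic acid", "glucose"}

# Crude pattern for molecular formulas: two or more element-like units,
# each an uppercase letter, an optional lowercase letter and optional digits.
# It also fires on all-caps acronyms such as "DNA", so it over-generates.
FORMULA_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")


def dictionary_spans(text):
    """Character spans of dictionary terms, matched case-insensitively."""
    spans = []
    for term in DICTIONARY:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "dictionary"))
    return spans


def regex_spans(text):
    """Character spans that look like molecular formulas."""
    return [(m.start(), m.end(), "pattern") for m in FORMULA_RE.finditer(text)]


def merge_spans(spans):
    """Greedy left-to-right merge: at equal starts the longer span wins,
    and any span overlapping one already kept is dropped."""
    kept = []
    for span in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if not kept or span[0] >= kept[-1][1]:
            kept.append(span)
    return kept


def tag_chemicals(text):
    """Return (mention, source) pairs from both matchers, overlaps resolved."""
    merged = merge_spans(dictionary_spans(text) + regex_spans(text))
    return [(text[start:end], source) for start, end, source in merged]


if __name__ == "__main__":
    sentence = "Aspirin was dissolved in H2O with NaCl and glucose."
    for mention, source in tag_chemicals(sentence):
        print(mention, "<-", source)
```

In a hybrid system of the kind the abstract describes, spans like these would more plausibly be fed to the statistical tagger as features or merged with its output in post-processing, rather than used as final predictions.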

Recognizing Chemical Compounds and Drugs: a Rule-Based Approach Using Semantic Information

2013

This paper presents a system for recognizing chemical compounds and drug names. It is a rule-based system that utilizes semantic information from the ChEBI ontology and the MeSH Metathesaurus. It also integrates the MetaMap tool, the ANNIE PoS tagger, and pharmacological databases such as DrugBank. We used this system for the CHEMDNER 2013 task. One outcome of this work is a set of previously unavailable resources for recognizing chemical entities (e.g. gazetteers and a list of biochemical affixes), which are available to the research community.

CheNER: Chemical Named Entity Recognizer

2013

Motivation: Chemical named entity recognition is used to automatically identify mentions of chemical compounds in text, and is the basis for more elaborate information extraction. However, only a small number of applications are freely available to identify such mentions. Particularly challenging and useful is the identification of IUPAC chemical compounds, which, because of the complex morphology of IUPAC names, requires more advanced techniques than the identification of brand names. Results: We present CheNER, a tool for automated identification of systematic IUPAC chemical mentions. We evaluated different systems using an established literature corpus to show that CheNER has superior performance in identifying IUPAC names specifically, and that it makes better use of computational resources. Availability: http://metres.udl.cat/index.php/9-download/4-chener, http://ubio.bioinfo.cnio.es/biotools/CheNER/ Supplementary information: Both web sites above include the user manual for the software. Supple...

A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

2016

One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text has unique characteristics. In addition, medical text mining poses more challenges, e.g., more unstructured text, the rapid addition of new terms, and a wide range of name variations for the same drug. The mining is even more challenging because of the lack of labeled datasets and external knowledge, as well as the multi-token representations of a single drug name that are more common in real application settings. Although many approaches have been proposed to tackle the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities as a result of word emb...