Frontiers of biomedical text mining: current progress

A survey of current work in biomedical text mining

The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5-10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

Highlights of the BioTM 2010 workshop on advances in bio text mining

BMC Bioinformatics, 2010

This meeting report gives an overview of the keynote lectures, the panel discussion and a selection of the contributed presentations. The workshop was held in Gent, Belgium on May 10-11. It featured a tutorial aimed towards a broad audience of (computational) biologists, (computational) linguists and researchers working purely on text mining. Introduction: Recently, the application of text mining (TM) and natural language processing (NLP) techniques to the biological and medical sciences has received increasing interest. In addition to the many new workshops and conferences arising in this domain, a number of community-wide tasks have recently been conducted to benchmark text mining techniques on specific challenges (e.g. BioCreative, BioNLP Shared Task, ...). By discussing the latest developments and potentially new applications in text mining amongst scientists in both academia and industry, this workshop aimed to provide a broad view on text mining research in biology and biomedicine. We rea ...

Text-mining approaches in molecular biology and biomedicine

Drug Discovery Today, 2005

Biomedical articles provide functional descriptions of bioentities such as chemical compounds and proteins. To extract relevant information using automatic techniques, text-mining and information-extraction approaches have been developed. These technologies have a key role in integrating biomedical information through analysis of scientific literature. In this article, important applications such as the identification of biologically relevant entities in free text and the construction of literature-based networks of protein-protein interactions will be introduced. Also, the use of text mining to aid the interpretation of microarray data and the analysis of pathology reports will be discussed. Finally, we will consider the recent evolution of this field and the efforts for community-based evaluations.

Biomedical Text Mining Applied To Document Retrieval and Semantic Indexing

2009

In biomedical research, the ability to retrieve adequate information from the ever-growing literature is an extremely important asset. This work provides an enhanced, general-purpose approach to document retrieval that enables the filtering of PubMed query results. The system is based on semantic indexing: for each set of retrieved documents, it provides a network that links documents and relevant terms obtained by the annotation of biological entities (e.g. genes or proteins). This network provides distinct user perspectives, allows navigation over documents with similar terms, and is also used to assess document relevance. A network learning procedure, based on previous work on e-mail spam filtering, is proposed; it receives as input a training set of manually classified documents.
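The document-term network described in this abstract can be illustrated with a minimal sketch. This is not the authors' system: the function names, the document identifiers and the gene annotations below are all hypothetical, and the sketch only shows how documents might be linked through shared annotated terms.

```python
from collections import defaultdict
from itertools import combinations


def build_term_index(annotations):
    """Map each annotated term to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, terms in annotations.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def shared_term_links(annotations):
    """Return {(doc_a, doc_b): shared_terms} for every pair of documents
    that have at least one annotated term in common."""
    links = defaultdict(set)
    for term, docs in build_term_index(annotations).items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(term)
    return dict(links)


# Hypothetical annotations: document id -> biological entities found in it.
annotations = {
    "doc1": {"BRCA1", "TP53"},
    "doc2": {"TP53", "EGFR"},
    "doc3": {"EGFR"},
}
links = shared_term_links(annotations)
```

Here `doc1` and `doc2` are linked through `TP53`, and `doc2` and `doc3` through `EGFR`; a navigation or relevance-scoring step could then operate on these links.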

@Note: a workbench for biomedical text mining

Journal of Biomedical Informatics, 2009

Biomedical Text Mining (BioTM) is providing valuable approaches to the automated curation of scientific literature. However, most efforts have addressed the benchmarking of new algorithms rather than user operational needs. Bridging the gap between BioTM researchers and biologists' needs is crucial to solve real-world problems and promote further research. We present @Note, a platform for BioTM that aims at the effective translation of the advances between three distinct classes of users: biologists, text miners and software developers. Its main functional contributions are the ability to process abstracts and full texts; an information retrieval module enabling PubMed search and journal crawling; a pre-processing module with PDF-to-text conversion, tokenisation and stopword removal; a semantic annotation schema; a lexicon-based annotator; a user-friendly annotation view that allows annotations to be corrected; and a Text Mining Module supporting dataset preparation and algorithm evaluation. @Note improves interoperability, modularity and flexibility when integrating in-house and open-source third-party components. Its component-based architecture allows the rapid development of new applications, emphasizing the principles of transparency and simplicity of use. Although development is still ongoing, it has already allowed the development of applications that are currently being used.
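The pre-processing and lexicon-based annotation steps named in this abstract can be sketched as follows. This is an illustrative sketch only and does not use @Note's actual code or API: the stopword list, the lexicon entries and all function names are assumptions made for the example.

```python
import re

# Illustrative stopword list and lexicon; both are hypothetical.
STOPWORDS = {"the", "of", "in", "and", "a", "is", "by"}
LEXICON = {"p53": "protein", "brca1": "gene", "mdm2": "protein"}


def tokenise(text):
    """Split text into alphanumeric tokens, keeping internal hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def remove_stopwords(tokens):
    """Drop tokens whose lowercase form is in the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def annotate(tokens):
    """Tag each token found in the lexicon with its entity class."""
    return [(t, LEXICON[t.lower()]) for t in tokens if t.lower() in LEXICON]


text = "The activity of p53 is regulated by MDM2 in the nucleus."
tokens = remove_stopwords(tokenise(text))
entities = annotate(tokens)
```

With these assumed inputs, `tokens` keeps the content words of the sentence and `entities` holds the two lexicon matches, `p53` and `MDM2`, each tagged as a protein. A curator-facing view could then let users correct such annotations.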

A Survey of State of the Art Biomedical Text Mining Techniques for Semantic Analysis

2008 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC 2008), 2008

In recent years, a range of text-mining applications have been developed to improve access to knowledge for biologists and database curators. This paper surveys text-mining works published from 2006 to 2008, with the emphasis on named entity recognition, biological relation extraction and currently available online biological text mining services.

ABNER [46]: Protein/Gene/DNA/RNA/cell tagger, http://pages.cs.wisc.edu/~bsettles/abner/
AIIAGMT [22]: Gene and protein name tagger, http://140.109.23.113/AIIAGMT/index.html
AliasServer [23]: Protein alias handler, http://cbi.labri.fr/outils/alias/index.php
BANNER [33]: Gene and protein name tagger, http://banner.sourceforge.net/
BioCaster [13]: Health protection roles tagger, http://biocaster.nii.ac.jp/
BCMS: Gene and protein name tagger, http://bcms.bioinfo.cnio.es
GAPSCORE [6]: Protein name tagger

What the papers say: text mining for genomics and systems biology

Human Genomics, 2010

Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining (the automated extraction of information from electronically published sources) could potentially fulfil an important role, but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.

A framework for the development of Biomedical Text Mining software tools

2008 8th IEEE International Conference on BioInformatics and BioEngineering, 2008

Over the last few years, a growing number of techniques have been successfully proposed to tackle diverse challenges in the Biomedical Text Mining (BioTM) arena. However, the set of software tools available to researchers has not grown at a similar pace. This work contributes to closing this gap by proposing a framework that eases the development of user-friendly and interoperable applications in this field, based on a set of available modular components. These modules can be connected in diverse ways to create applications that fit distinct user roles. Developers of new algorithms also gain a framework that allows them to easily integrate their implementations with state-of-the-art BioTM software for related tasks.