Facilitating biomedical systematic reviews using text classification and ranked retrieval

inf.ed.ac.uk

The past decade has seen exponential growth in the amount of information published on the web. This increase tracks the steady growth in the number of data publishing sources. As a result, one of the oldest problems in Information Retrieval (IR), the "vocabulary mismatch problem", has taken center stage. Over the years, many techniques have been proposed to tackle vocabulary mismatch in various domains. In this work we explore the utility of different types of latent semantic models for retrieval in the medical domain. Our work focuses on the process of systematic reviews. A systematic review is a recall-oriented process, i.e., recall takes precedence over precision. An ideal retrieval scenario for systematic reviews is one in which all relevant articles appear at the top of the ranking. The following work is a comparative study of three semantic techniques, namely Latent Dirichlet Allocation,

Text Categorization Models for Retrieval of High Quality Articles in Internal Medicine

2000

The discipline of Evidence Based Medicine (EBM) studies formal and quasi-formal methods for identifying high quality medical information and abstracting it in useful forms so that patients receive the best customized care possible [1]. Current computer-based methods for finding high quality information in PubMed and similar bibliographic resources utilize search tools that employ preconstructed Boolean queries. These clinical queries are derived from a combined application of (a) user interviews, (b) ad-hoc manual document quality review, and (c) search over a constrained space of disjunctive Boolean queries. The present research explores the use of powerful text categorization (machine learning) methods to identify content-specific and high-quality PubMed articles. Our results show that models built with the proposed approach outperform the Boolean based PubMed clinical query filters in discriminatory power.

Benchmarking Fully Automated Scholarly Search for Biomedical Systematic Literature Reviews

IEEE access, 2024

Biomedical Systematic Literature Reviews (SLRs) play a fundamental role in evidence-informed healthcare and can serve as actionable insights for researchers and policy-making organizations in the field. In this paper, we focus on the phase of 'study search' in conducting SLRs, i.e., the process of organizing a comprehensive search via biomedical databases, such as PubMed, in order to obtain all the relevant articles on a certain topic of interest. We introduce FASS-BSLR, a dataset and a benchmark suite to facilitate the development and evaluation of fully automated techniques for study search. We also provide and analyze a set of basic methods along with a number of generative models and report the experimental results over the introduced dataset. We introduce a simple but effective model based on the recent transformer-based generative model, ChatGPT, for generating Boolean queries over PubMed. Through different experiments, we illustrate that this model is more effective than basic search models, keyword search over PubMed, and existing methods for crafting Boolean queries using ChatGPT. We show that the introduced model is even more effective than manual queries in terms of Precision, Recall, NDCG, and MAP at positions 10 and 100, but falls short of the recall that manual queries achieve at position 1000. We also report the retrieval performance of different models when a number of relevant articles have been provided as seed documents. We demonstrate that, when three documents are used as seed articles, the introduced model outperforms manual queries in all metrics except Recall@1000, on which its performance is comparable to the performance attained by manual queries.
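
The rank-based metrics named in this abstract (Precision, Recall, and MAP at a cutoff position) can be illustrated with a minimal sketch. The function names, document identifiers, and toy ranking below are assumptions made for illustration; they are not taken from the FASS-BSLR benchmark or its evaluation code.

```python
# Minimal sketch of rank-cutoff metrics; all names and data are hypothetical.

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document occurs.

    Relevant documents that are never retrieved contribute zero, so the sum
    is divided by the total number of relevant documents.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids)


# Toy ranking: four retrieved documents, three relevant ones (one never retrieved).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
p2 = precision_at_k(ranked, relevant, 2)
r2 = recall_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
```

MAP is the mean of `average_precision` over all topics. For a recall-oriented task such as study search, Recall at a deep cutoff (for example 1000) is typically the figure that shows whether relevant studies were missed.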

Extending PubMed Searches to ClinicalTrials.gov Through a Machine Learning Approach for Systematic Reviews

Journal of clinical epidemiology, 2018

Despite their essential role in collecting and organizing published medical literature, indexed search engines are unable to cover all relevant knowledge. Hence, current literature recommends the inclusion of clinical trial registries in systematic reviews. This study aims to provide an automated approach to extend a search on PubMed to the ClinicalTrials.gov database, relying on text mining and machine learning techniques. The procedure starts from a literature search on PubMed. Next, it considers the training of a classifier that can identify documents with a comparable word characterization in the ClinicalTrials.gov clinical trial repository. Fourteen systematic reviews, covering a broad range of health conditions, are used as case studies for external validation. A cross-validated support-vector machine model was used as the classifier. The sensitivity was 100% in all systematic reviews except one (87.5%), and the specificity ranged from 97.2 to 99.9%. The ability of the instrum...
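
The sensitivity and specificity figures quoted above follow from the counts of a binary confusion matrix. The sketch below shows that arithmetic only; the function name and the toy labels are invented for illustration and are not data from the study.

```python
# Minimal sketch: sensitivity and specificity from binary labels.
# The labels below are hypothetical and not taken from the study.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 means relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of truly relevant records the classifier found.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of irrelevant records the classifier correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy example: 4 relevant records (one missed), 6 irrelevant (one false alarm).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In a systematic review setting, sensitivity corresponds to recall over the relevant trials, while specificity reflects how much irrelevant material reviewers are spared from screening.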

Building Systematic Reviews Using Automatic Text Classification Techniques

International Conference on Computational Linguistics, 2010

The amount of information in medical publications continues to increase at a tremendous rate. Systematic reviews help to process this growing body of information. They are fundamental tools for evidence-based medicine. In this paper, we show that automatic text classification can be useful in building systematic reviews for medical topics to speed up the reviewing process. We propose a per-question classification method that uses an ensemble of classifiers that exploit the particular protocol of a systematic review. We also show that when integrating the classifier in the human workflow of building a review the per-question method is superior to the global method. We test several evaluation measures on a real dataset.

Boolean versus ranked querying for biomedical systematic reviews

2010

Background: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) with the requirement of examining every returned result.

Text Categorization Models for High-Quality Article Retrieval in Internal Medicine

Journal of the American Medical Informatics Association, 2004

Abstract. Objective: Finding the best scientific evidence that applies to a patient problem is becoming exceedingly difficult due to the exponential growth of medical publications. The objective of this study was to apply machine learning techniques to automatically identify high-quality, content-specific articles for one time period in internal medicine and compare their performance with previous Boolean-based PubMed clinical query filters of Haynes et al.

The Impact of Query Refinement on Systematic Review Literature Search

Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval

The creation of high-quality medical systematic reviews requires the development of a complex Boolean query to retrieve medical literature. An effective query in this context is critical, as it determines how many documents are to be assessed for inclusion in the resulting systematic review, as all retrieved documents must be screened. Therefore an effective query must balance a reasonable assessment workload with an estimate for how many relevant documents exist for a given topic. Getting this balance correct is naturally a difficult challenge, and there is a certain level of intuition involved in how a query should be formulated and refined. This paper reveals such intuitions and behaviours by analysing the query logs of a specialised tool developed to assist expert searchers in refining complex Boolean queries. These query logs contain unique information that permits a deeper understanding of user behaviour than previous studies. The approximately 6,000 queries collected over one year are available for further analysis at https://github.com/ielab/searchrefiner-logs-collection. CCS CONCEPTS • Information systems → Query log analysis.
