Natural Language Processing (NLP) in Qualitative Public Health Research: A Proof of Concept Study

Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study (Preprint)

2017

Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study

Journal of Medical Internet Research

Background: Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when analyzing text data. One potential method to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has been done to compare automatic coding using NLP techniques and qualitative coding, which is critical to establish the viability of NLP as a useful, rigorous analysis procedure. Objective: The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods. Methods: We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyze data generated through 2 text (short message service) message survey questions, one about prescription drugs and the other about police interactions, sent to youth aged 14-24 years. We randomly assigned a question to each of the 2 experienced qualitative analysis teams for independent coding and analysis before receiving NLP results. A third team separately conducted NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of findings derived, (2) the quality of inferences generated, and (3) the time spent in analysis. Results: The qualitative-only analysis for the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The qualitative and NLP-augmented analysis was the most comprehensive. 
For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for both the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.
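
The reported analysis times can be compared directly. The minimal Python sketch that follows is not from the study; it only recomputes the minutes and the percentage of time saved for each question from the figures given in the abstract. The percentages are derived here rather than reported by the authors, and the dictionary keys are illustrative labels.

```python
# Analysis times in minutes, copied from the abstract above.
# Keys are illustrative labels, not terminology from the study.
TIMES = {
    "drug": {"nlp_only": 120, "qual_only": 270, "qual_first": 450, "nlp_first": 240},
    "police": {"nlp_only": 40, "qual_only": 270, "qual_first": 390, "nlp_first": 220},
}


def savings(slower: int, faster: int) -> tuple[int, float]:
    """Return minutes saved and percent saved relative to the slower approach."""
    saved = slower - faster
    return saved, round(100 * saved / slower, 1)


def summarize(times: dict) -> dict:
    """Compare NLP-only vs qualitative-only, and NLP-first vs qualitative-first."""
    out = {}
    for question, t in times.items():
        out[question] = {
            "nlp_vs_qual": savings(t["qual_only"], t["nlp_only"]),
            "nlp_first_vs_qual_first": savings(t["qual_first"], t["nlp_first"]),
        }
    return out


if __name__ == "__main__":
    for question, result in summarize(TIMES).items():
        print(question, result)
```

On these figures, the NLP-only analysis saves 150 minutes (55.6%) on the drug question and 230 minutes (85.2%) on the police question, while starting the augmented workflow with NLP saves 210 minutes (46.7%) and 170 minutes (43.6%), respectively.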

Challenges and opportunities for public health made possible by advances in natural language processing

Canada Communicable Disease Report = Relevé des maladies transmissibles au Canada, 2020

Natural language processing (NLP) is a subfield of artificial intelligence devoted to the understanding and generation of language. Recent advances in NLP technologies enable rapid analysis of vast amounts of text, creating opportunities for health research and evidence-informed decision making. Analysis of, and data extraction from, scientific literature, technical reports, health records, social media, surveys, registries and other documents can support core public health functions, including the enhancement of existing surveillance systems (e.g. through faster identification of diseases and risk factors/at-risk populations), disease prevention strategies (e.g. through more efficient evaluation of the safety and effectiveness of interventions) and health promotion efforts (e.g. by providing the ability to obtain expert-level answers to any health-related question). NLP is emerging as an important tool that can assist public health authorities in decreasing the burden o...

A Framework for Applying Natural Language Processing in Digital Health Interventions

Journal of Medical Internet Research, 2020

Background: Digital health interventions (DHIs) are poised to reduce target symptoms in a scalable, affordable, and empirically supported way. DHIs that involve coaching or clinical support often collect text data from 2 sources: (1) open correspondence between users and the trained practitioners supporting them through a messaging system and (2) text data recorded during the intervention by users, such as diary entries. Natural language processing (NLP) offers methods for analyzing text, augmenting the understanding of intervention effects, and informing therapeutic decision making. Objective: This study aimed to present a technical framework that supports the automated analysis of both types of text data often present in DHIs. This framework generates text features and helps to build statistical models to predict target variables, including user engagement, symptom change, and therapeutic outcomes. Methods: We first discussed various NLP techniques and demonstrated how they are implemente...
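
The feature-generation step described in this abstract can be illustrated with a small, stdlib-only Python sketch. It is not the authors' framework: the word lists (`FIRST_PERSON`, `NEGATIVE`, `POSITIVE`) and the functions `text_features` and `user_features` are hypothetical, and a real pipeline would use validated dictionaries or learned representations. The sketch computes simple per-user features separately for the two text sources named above (practitioner messages and diary entries), which could then feed a statistical model of engagement or symptom change.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIVE = {"sad", "tired", "anxious", "worried", "bad"}
POSITIVE = {"happy", "calm", "good", "better", "relaxed"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def text_features(text: str) -> dict[str, float]:
    """Token count plus the share of tokens falling in each word list."""
    tokens = tokenize(text)
    n = len(tokens)
    counts = Counter(tokens)

    def rate(words: set[str]) -> float:
        return sum(counts[w] for w in words) / n if n else 0.0

    return {
        "n_tokens": float(n),
        "first_person_rate": rate(FIRST_PERSON),
        "negative_rate": rate(NEGATIVE),
        "positive_rate": rate(POSITIVE),
    }


def user_features(messages: list[str], diary_entries: list[str]) -> dict[str, float]:
    """Concatenate each text source per user and prefix its features by source."""
    feats: dict[str, float] = {}
    for source, texts in (("msg", messages), ("diary", diary_entries)):
        for name, value in text_features(" ".join(texts)).items():
            feats[f"{source}_{name}"] = value
    return feats
```

Keeping the two sources as separately prefixed features lets a downstream model weigh practitioner correspondence and self-recorded diary text differently.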

Discovering Social Determinants of Health from Case Reports using Natural Language Processing: Algorithmic Development and Validation

Background: Social determinants of health (SDOH) are non-medical factors that influence health outcomes. A wealth of SDOH information is available in electronic health records, clinical reports, and social media data, usually in free-text format. Extracting key information from free text is a significant challenge that requires natural language processing (NLP) techniques. Objective: The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and Data: Case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground-truth labels, and a semi-supervised learning method is used to re-annotate the corpus. Methods: An NLP framework is developed and tested to extract SDOH from the free texts. A two-way evaluation method is used to assess the quantity and quality of the methods. Results: The proposed NER implementa...
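
The study uses a trained named entity recognition (NER) model, which is not reproduced here. As a simpler, swapped-in illustration of what SDOH extraction produces, the Python sketch that follows is a dictionary-based tagger of the kind often used as a baseline or for pre-annotation. The label set and terms in `SDOH_LEXICON` are hypothetical, not the study's annotation scheme.

```python
import re

# Hypothetical SDOH lexicon for illustration; not the study's label set.
SDOH_LEXICON = {
    "housing": ["homeless", "homelessness", "shelter", "eviction"],
    "employment": ["unemployed", "unemployment", "lost his job", "lost her job"],
    "substance_use": ["smoker", "alcohol use", "drug use"],
    "social_support": ["lives alone", "widowed", "no family support"],
}


def build_patterns(lexicon: dict[str, list[str]]) -> list[tuple[str, re.Pattern]]:
    """Compile whole-word, case-insensitive patterns, longest terms first per label."""
    patterns = []
    for label, terms in lexicon.items():
        for term in sorted(terms, key=len, reverse=True):
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            patterns.append((label, pattern))
    return patterns


def extract_sdoh(text: str, patterns: list[tuple[str, re.Pattern]]) -> list[tuple[str, str, int, int]]:
    """Return (label, matched_text, start, end) tuples, skipping overlapping matches."""
    found = []
    taken: set[int] = set()
    for label, pattern in patterns:
        for m in pattern.finditer(text):
            span = range(m.start(), m.end())
            if any(i in taken for i in span):
                continue  # an earlier pattern already claimed these characters
            taken.update(span)
            found.append((label, m.group(0), m.start(), m.end()))
    return sorted(found, key=lambda item: item[2])
```

For example, "A 54-year-old unemployed man who lives alone and is a smoker presented with fever." yields employment, social_support, and substance_use spans. Unlike a learned NER model, a lexicon cannot recognize paraphrases it was not given, which is one reason the study turns to NLP models and expert-annotated ground truth.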

Applying machine-learning to rapidly analyse large qualitative text datasets to inform the COVID-19 pandemic response: Comparing human and machine-assisted topic analysis techniques

Background: Machine-assisted topic analysis (MATA) uses artificial intelligence methods to assist qualitative researchers to analyse large amounts of textual data. This could allow qualitative researchers to inform and update public health interventions 'in real-time', to ensure they remain acceptable and effective during rapidly changing contexts (such as a pandemic). Objective: We aimed to understand the potential for such approaches to support intervention implementation, by directly comparing MATA and 'human-only' thematic analysis techniques when applied to the same dataset (1472 free-text responses from users of the COVID-19 infection control intervention 'Germ Defence'). Methods: In MATA, the analysis process included an unsupervised topic modelling approach to identify latent topics in the text. The human research team then described the topics and identified broad themes. In human-only codebook analysis, an initial codebook was developed by an experi...
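
The abstract says MATA used "an unsupervised topic modelling approach" without naming the model. Assuming latent Dirichlet allocation (LDA), a common choice, the Python sketch that follows fits LDA by collapsed Gibbs sampling on pre-tokenized responses and lists each topic's top words, which is the output the human team would then describe and group into themes. In practice a library implementation (for example, scikit-learn's `LatentDirichletAllocation` or gensim) would be used; the hyperparameters here are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents.

    Returns (vocab, phi), where phi[k][w] is the smoothed probability
    of vocabulary word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]          # n_dk
    topic_word = [[0] * v for _ in range(n_topics)]     # n_kw
    topic_total = [0] * n_topics                        # n_k
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], assignments[d][i]
                # Remove the token's current assignment before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta)/(n_t + V*beta).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + v * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + v * beta) for w in range(v)]
        for k in topics
    ]
    return vocab, phi


def top_words(vocab, phi, n=3):
    """Highest-probability words per topic, for researchers to interpret."""
    return [
        [vocab[w] for w in sorted(range(len(vocab)), key=lambda w: -row[w])[:n]]
        for row in phi
    ]
```

The topics are only word distributions; naming them and grouping them into themes remains the human step the abstract describes.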

Evaluation of a Natural Language Processing Tool for Extracting Gender, Weight, Ethnicity, and Race in the US Food and Drug Administration Adverse Event Reporting System

Frontiers in Drug Safety and Regulation

The US Food and Drug Administration Adverse Event Reporting System (FAERS) contains over 24 million individual case safety reports (ICSRs). In this research project, we evaluated a natural language processing (NLP) tool’s ability to extract four demographic variables (gender, weight, ethnicity, race) from ICSR narratives. Specificity of the NLP algorithm was over 94% for all demographics, while sensitivity varied between the demographics: 98.6% (gender), 45.5% (weight), 100% (ethnicity), and 85.3% (race). Among ICSRs missing weight, ethnicity, and race in the structured field, few cases had this information in the narrative (>95% missing); consequently, the positive predictive value (PPV) for these three demographics had wide 95% confidence intervals. After NLP implementation, the total number of ICSRs missing gender was reduced by 33% (i.e., NLP identified 472 thousand reports having a gender value in the narrative that was not in the structured field), while the total number of...
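
Sensitivity, specificity, and PPV all come from the same 2×2 table of NLP output versus reference labels, and the width of each confidence interval depends on its own denominator. That is why a PPV based on the few narratives containing weight, ethnicity, or race can have a wide interval even when specificity is tight. The Python sketch that follows shows this. The abstract does not state which interval method was used, so the Wilson score interval is an assumption, and the counts in the test are hypothetical, not FAERS data.

```python
from math import sqrt


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and Wilson interval for sensitivity, specificity, and PPV."""

    def prop(num: int, den: int):
        return (num / den if den else float("nan"), wilson_ci(num, den))

    return {
        "sensitivity": prop(tp, tp + fn),  # TP / (TP + FN)
        "specificity": prop(tn, tn + fp),  # TN / (TN + FP)
        "ppv": prop(tp, tp + fp),          # TP / (TP + FP)
    }
```

With hypothetical counts TP=5, FP=1, FN=3, TN=191, the PPV of 5/6 has an interval more than 0.4 wide, while the specificity interval based on 192 negatives is under 0.05 wide.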

Reliability of Qualitative Data Using Text Analysis - A Queensland Health Case Study

2014 3rd International Conference on Eco-friendly Computing and Communication Systems, 2014

This paper reports how reliability can be assured in qualitative data using text analytics principles. The paper demonstrates this with a cohort analytics process that applied text mining to data collected from 64 interviews conducted in Queensland Health wards. Although the interviews focused on implementing a technology, the text analysis was conducted to confirm that the themes matched the focus of the exploration. The analytics also provided a visual representation of the data that indicates the reliability of the themes. We conducted the analytics to provide reliability beyond the standard saturation criterion normally employed in qualitative data analysis.
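
The abstract does not describe the text-mining procedure in detail. One simple check in the same spirit is to count in how many interviews each theme's indicative terms appear, so that a theme rests on more than one or two transcripts. The Python sketch that follows does this. The function `theme_coverage` and the theme keyword lists are hypothetical illustrations, not the paper's method.

```python
import re
from collections import Counter


def theme_coverage(transcripts: list[str], themes: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of transcripts mentioning at least one keyword for each theme."""
    patterns = {
        theme: re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
        for theme, words in themes.items()
    }
    hits: Counter = Counter()
    for text in transcripts:
        for theme, pattern in patterns.items():
            if pattern.search(text):
                hits[theme] += 1  # count each transcript at most once per theme
    n = len(transcripts)
    return {theme: hits[theme] / n if n else 0.0 for theme in themes}
```

Plotting these fractions as a bar chart gives one possible visual view of how broadly each theme is supported across the interviews.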

Practical Considerations for Developing Clinical Natural Language Processing Systems for Population Health Management and Measurement

JMIR Medical Informatics

Experts have noted a concerning gap between clinical natural language processing (NLP) research and real-world applications, such as clinical decision support. To help address this gap, in this viewpoint, we enumerate a set of practical considerations for developing an NLP system to support real-world clinical needs and improve health outcomes. They include determining (1) the readiness of the data and compute resources for NLP, (2) the organizational incentives to use and maintain the NLP systems, and (3) the feasibility of implementation and continued monitoring. These considerations are intended to benefit the design of future clinical NLP projects and can be applied across a variety of settings, including large health systems or smaller clinical practices that have adopted electronic medical records in the United States and globally.

Coding and Categorization of Qualitative Data in the Area of Health

Zenodo (CERN European Organization for Nuclear Research), 2023

The objective of this study was to describe the procedures for coding and categorizing qualitative research data in the health area. It describes the steps for integrating the processes used to construct categories, the criteria for selecting and excluding categories, the transformation of qualitative data into quantitative data, typification, and the adaptability of codes to the data. Conclusion: qualitative research in the health sciences requires categorization and coding during the analysis and interpretation of theoretical content to increase the reliability of the results.
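
One common way to transform coded qualitative data into quantitative form is a participant-by-code count matrix, from which code prevalence and category totals follow. The Python sketch that follows illustrates this under stated assumptions: coded segments are represented as hypothetical `(participant_id, code)` pairs, and the code-to-category mapping is supplied by the analyst. It is not the procedure described in the study.

```python
from collections import Counter


def code_matrix(coded_segments: list[tuple[str, str]]):
    """Turn (participant_id, code) pairs into a participant-by-code count matrix."""
    participants = sorted({p for p, _ in coded_segments})
    codes = sorted({c for _, c in coded_segments})
    counts = Counter(coded_segments)
    matrix = [[counts[(p, c)] for c in codes] for p in participants]
    return participants, codes, matrix


def code_prevalence(participants, codes, matrix) -> dict[str, float]:
    """Share of participants with at least one segment assigned to each code."""
    n = len(participants)
    return {c: sum(1 for row in matrix if row[j] > 0) / n for j, c in enumerate(codes)}


def category_counts(coded_segments, code_to_category: dict[str, str]) -> Counter:
    """Roll segment-level codes up into analyst-defined categories."""
    return Counter(code_to_category[c] for _, c in coded_segments)
```

Counting segments and counting participants answer different questions (how often a code is applied versus how widely it is shared), so reports that quantify qualitative codes usually state which one is meant.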