Tanvi Banerjee - Profile on Academia.edu

Papers by Tanvi Banerjee

2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.

Cornell University - arXiv, Mar 29, 2022

As part of the large number of scientific articles being published every year, the publication rate of biomedical literature has been increasing. Consequently, there has been considerable effort to harness and summarize the massive amount of biomedical research articles. While transformer-based encoder-decoder models in a vanilla source document-to-summary setting have been extensively studied for abstractive summarization in different domains, their major limitations continue to be entity hallucination (a phenomenon where generated summaries contain entities not related to or present in the source article(s)) and factual inconsistency. This problem is exacerbated in a biomedical setting where named entities and their semantics (which can be captured through a knowledge base) constitute the essence of an article. The use of named entities and facts mined from background knowledge bases pertaining to the named entities to guide abstractive summarization has not been studied in biomedical article summarization literature. In this paper, we propose an entity-driven fact-aware framework for training end-to-end transformer-based encoder-decoder models for abstractive summarization of biomedical articles. We call the proposed approach, whose building block is a transformer-based model, EFAS, Entity-driven Fact-aware Abstractive Summarization. We conduct a set of experiments using five state-of-the-art transformer-based encoder-decoder models (two of which are specifically designed for long document summarization) and demonstrate that injecting knowledge into the training/inference phase of these models enables the models to achieve significantly better performance than the standard source document-to-summary setting in terms of entity-level factual accuracy, N-gram novelty, and semantic equivalence while performing comparably on ROUGE metrics.
The proposed approach is evaluated on ICD-11-Summ-1000, a dataset we build for abstractive summarization of biomedical literature, and PubMed-50k, a segment of a large-scale benchmark dataset for abstractive summarization of biomedical literature.

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2020

Sickle Cell Disease (SCD) is a hereditary disorder of red blood cells in humans. Complications such as pain, stroke, and organ failure occur in SCD as malformed, sickled red blood cells passing through small blood vessels get trapped. Particularly, acute pain is known to be the primary symptom of SCD. The insidious and subjective nature of SCD pain leads to challenges in pain assessment among Medical Practitioners (MPs). Thus, accurate identification of markers of pain in patients with SCD is crucial for pain management. Classifying clinical notes of patients with SCD based on their pain level enables MPs to give appropriate treatment. We propose a binary classification model to predict pain relevance of clinical notes and a multiclass classification model to predict pain level. While our four binary machine learning (ML) classifiers are comparable in their performance, Decision Trees had the best performance for the multiclass classification task, achieving 0.70 in F-measure. Our results show the potential that clinical text analysis and machine learning offer for pain management in sickle cell patients.

Amit Sheth, Hong Yung Yip, Utkarshani Jaimini, Dipesh Kadariya, Vaikunth Sridharan, Revathy Venkataramanan, Tanvi Banerjee, Krishnaprasad Thirunarayam, Maninder Kalra. Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton, OH, USA; Dayton Children’s Hospital, Dayton, OH. {amit, joey, utkarshani, dipesh, vaikunth, revathy, tanvi, tkprasad}@knoesis.org, KalraM@childrensdayton.org

Sleep, 2018

Introduction: Sleep disorders are common in children with asthma and have been implicated in poor asthma control. Smart wearables such as the Fitbit wristband allow monitoring of patients' sleep duration and quality in their natural surroundings. However, the utility and efficacy of using wearable devices to monitor sleep quality and sleep duration in pediatric patients with asthma have not been established. Methods: Children aged 5 to 18 years participating in the kHealth Asthma research study at Dayton Children's Hospital were included. The kHealth kit includes an Android tablet with a mobile health application that asks contextually relevant questions and collects personalized data from a Bluetooth-connected Fitbit Charge 2, a peak flow meter, and Foobot, an indoor air quality monitor. Time in bed, sleep time, and time in REM, light, and deep sleep, as calculated by the Fitbit software, were downloaded for all subjects. Sleep efficiency was calculated as sleep time/time in bed, and the proportion of time in each sleep stage was calculated. Results: Preliminary data from 14 children were analyzed. The average time in bed was 534 ± 77 minutes and average sleep time was 476 ± 70 minutes. The average time in REM sleep was 88 ± 25 minutes, in light sleep 287 ± 37 minutes, and in deep sleep 92.5 ± 27.8 minutes. Sleep efficiency was 89%. The proportion of sleep time spent in REM sleep was 20%, in light sleep 60.5%, and in deep sleep 18.5%. These correlate well with polysomnography-based normative data in children. Conclusion: Our findings support the potential of using wrist-worn devices to continuously monitor sleep duration and quality in children with asthma. Future work seeks to evaluate the effect of sleep on asthma outcomes in children.
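The sleep-efficiency formula in the Methods (sleep time divided by time in bed) can be sketched in a few lines of Python. This is only an illustration: the function name `sleep_summary` and the dictionary keys are hypothetical, and the inputs are the group averages quoted in the abstract rather than per-child data. Ratios of group averages need not equal the averages of per-child ratios, so the stage proportions computed this way can differ slightly from the reported ones.

```python
def sleep_summary(time_in_bed_min, sleep_min, stage_min):
    """Return (sleep efficiency, per-stage proportion of sleep time).

    All inputs are in minutes; stage_min maps a stage name to minutes.
    """
    if time_in_bed_min <= 0 or sleep_min <= 0:
        raise ValueError("time in bed and sleep time must be positive")
    # Sleep efficiency as defined in the abstract: sleep time / time in bed.
    efficiency = sleep_min / time_in_bed_min
    # Share of total sleep time spent in each stage.
    proportions = {stage: minutes / sleep_min for stage, minutes in stage_min.items()}
    return efficiency, proportions


# Group averages taken from the abstract (not per-child measurements).
efficiency, proportions = sleep_summary(
    534, 476, {"REM": 88, "light": 287, "deep": 92.5}
)
print(f"sleep efficiency: {efficiency:.0%}")
```

With the quoted averages, 476/534 rounds to the 89% efficiency stated in the Results.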

How do high school students' genetics progression networks change due to genetics instruction and how do they stabilize years after instruction?

Journal of Research in Science Teaching, 2022

While there has been recent progress in abstractive summarization as applied to different domains including news articles, scientific articles, and blog posts, the application of these techniques to clinical text summarization has been limited. This is primarily due to the lack of large-scale training data and the messy/unstructured nature of clinical notes as opposed to other domains where massive training data come in structured or semi-structured form. Further, one of the least explored and critical components of clinical text summarization is factual accuracy of clinical summaries. This is especially crucial in the healthcare domain, cardiology in particular, where an accurate summary generation that preserves the facts in the source notes is critical to the well-being of a patient. In this study, we propose a framework for improving the factual accuracy of abstractive summarization of clinical text using knowledge-guided multi-objective optimization. We propose to jointly optimize three cost functions in our proposed architecture during training (generative loss, entity loss, and knowledge loss) and evaluate the proposed architecture on 1) clinical notes of patients with heart failure (HF), which we collect for this study; and 2) two benchmark datasets, Indiana University Chest X-ray collection (IU X-Ray) and MIMIC-CXR, that are publicly available. We experiment with three transformer encoder-decoder architectures and demonstrate that optimizing different loss functions leads to improved performance in terms of entity-level factual accuracy.
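The abstract does not say how the three cost functions are combined. One common choice for joint optimization, shown here purely as an assumption, is a weighted sum; the symbols and the non-negative weights \lambda_g, \lambda_e, \lambda_k are illustrative and do not come from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_g \, \mathcal{L}_{\text{generative}}
  + \lambda_e \, \mathcal{L}_{\text{entity}}
  + \lambda_k \, \mathcal{L}_{\text{knowledge}},
\qquad \lambda_g, \lambda_e, \lambda_k \ge 0
```

Under this reading, setting \lambda_e = \lambda_k = 0 would recover ordinary training on the generative loss alone.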

Caregiving for an individual with dementia-related illness could result in caregiver health concerns, such as depression, stress, and sleep disturbance. A daily use caregiver sleep survey (DUCSS) was developed using a mixed-method design to assist in evaluating caregiver sleep difficulty, which can contribute to caregiver burden. A focus group

Background: Sickle cell disease (SCD) is the most common inherited blood disorder affecting millions of people worldwide. Most patients with SCD experience repeated, unpredictable episodes of severe pain. These pain episodes are the leading cause of emergency department visits among patients with SCD and may last for several weeks. Arguably, the most challenging aspect of treating pain episodes in SCD is assessing and interpreting a patient's pain intensity level. Objective: This study aims to learn deep feature representations of subjective pain trajectories using objective physiological signals collected from electronic health records. Methods: This study used electronic health record data collected from 496 Duke University Medical Center participants over 5 consecutive years. Each record contained measures for 6 vital signs and the patient's self-reported pain score, with an ordinal range from 0 (no pain) to 10 (severe and unbearable pain). We also extracted 3 features related to medication: medication type, medication status (given or applied, or missed or removed or due), and total medication dosage (mg/mL). We used variational autoencoders for representation learning and designed machine learning classification algorithms to build pain prediction models. We evaluated our results using accuracy and a confusion matrix and visualized the qualitative data representations. Results: We designed a classification model using raw data and deep representational learning to predict subjective pain scores with average accuracies of 82.8%, 70.6%, 49.3%, and 47.4% for 2-point, 4-point, 6-point, and 11-point pain ratings, respectively. We observed that random forest classification models trained on deep represented features outperformed models trained on unrepresented data for all pain rating scales.
We observed that at varying Likert scales, our models performed better when provided with medication data along with vital signs data. We visualized the data representations to understand the underlying latent representations, indicating neighboring representations for similar pain scores with a higher resolution of pain ratings. Conclusions: Our results demonstrate that medication information (the type of medication, total medication dosage, and whether the medication was given or missed) can significantly improve subjective pain prediction modeling compared with modeling with only vital signs. This study shows promise in data-driven estimated pain scores that will help clinicians with additional information about the patient's condition, in addition to the patient's self-reported pain scores.

The global outbreak of coronavirus disease (COVID-19) has infected millions of people worldwide. Vaccination against this disease is one of the most effective methods to curb its spread. Numerous vaccines have been developed and distributed worldwide to control the spread of COVID-19. Social media platforms like Twitter have been used by people to discuss their opinion about these vaccines. In this study, we identified the themes being discussed on Twitter about the COVID-19 vaccine and their evolution every month during the period of the initial rollout of the vaccines from December 1, 2020, to February 28, 2021. Additionally, we studied how these themes evolved temporally with the major events on the COVID-19 vaccine timeline. We found five topics were sufficient to identify the themes in our corpus and these topics evolved every month with the events related to the COVID-19 vaccine timeline.

Detailed flow distributions in vascular systems are the key to identifying hemodynamic risk factors for the development and progression of vascular diseases. Although computational fluid dynamics (CFD) has been widely used in bioengineering research on hemodynamics predictions, high-fidelity CFD simulations are time-consuming and computationally expensive, and they are poorly suited to clinical applications because of the extensive numerical calculations involved. Machine learning (ML) algorithms that estimate the flow field in vascular systems from angiographic images of the blood flow, acquired with existing diagnostic tools, are emerging as a new pathway to facilitate the mapping of hemodynamics. In the present work, dye injection in a water flow was simulated using CFD as an analogy of contrast perfusion in blood flow. In the simulation, light passes through the flow field and generates projective images, as an analogy of X-ray imaging. The simulations provide both the ground truth velocity field and the projective images of the flow with dye patterns. A rough velocity field was estimated using the optical flow method (OFM) based on the projective images. ML algorithms are then trained using the ground truth CFD data and the OFM velocity estimation as the input. Finally, the interpretable (logistic regression) and deep (neural networks, convolutional neural networks, long short-term memory) machine learning models are validated using parallel in vitro experiments on the same flow setup. The validation results showed that the employed ML model significantly reduced the error rate of the velocity estimation from 53.5% to 2.5% on average.

2020 IEEE International Conference on Big Data (Big Data)

Recent advances in natural language processing have enabled automation of a wide range of tasks, including machine translation, named entity recognition, and sentiment analysis. Automated summarization of documents, or groups of documents, however, has remained elusive, with many efforts limited to extraction of keywords, key phrases, or key sentences. Accurate abstractive summarization has yet to be achieved due to the inherent difficulty of the problem, and limited availability of training data. In this paper, we propose a topic-centric unsupervised multi-document summarization framework to generate extractive and abstractive summaries for groups of scientific articles across 20 Fields of Study (FoS) in Microsoft Academic Graph (MAG) and news articles from DUC-2004 Task 2. The proposed algorithm generates an abstractive summary by developing salient language unit selection and text generation techniques. Our approach matches the state-of-the-art when evaluated on automated extractive evaluation metrics and performs better for abstractive summarization on five human evaluation metrics (entailment, coherence, conciseness, readability, and grammar). We achieve a kappa score of 0.68 between two co-author linguists who evaluated our results. We plan to publicly share MAG-20, a human-validated gold standard dataset of topic-clustered research articles and their summaries to promote research in abstractive summarization.
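The abstract reports a kappa score between two evaluators without naming the variant. Assuming it is Cohen's kappa, which is the usual statistic for two raters, the calculation can be sketched as follows. The function name `cohen_kappa` and the toy ratings are illustrative only and are not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy ratings for four summaries (hypothetical).
ratings_a = ["good", "good", "bad", "bad"]
ratings_b = ["good", "bad", "bad", "bad"]
kappa = cohen_kappa(ratings_a, ratings_b)
print(kappa)
```

In the toy example the raters agree on three of four items (0.75) while chance agreement is 0.5, so kappa is 0.5.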

2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

Zika virus has caught the world's attention, and has led people to share their opinions and concerns on social media like Twitter. Using text-based features, extracted with the help of Parts of Speech (POS) taggers and N-grams, a classifier was built to detect Zika-related tweets from Twitter. With a simple logistic classifier, the system was successful in detecting Zika-related tweets from Twitter with 92% accuracy. Moreover, key features were identified that provide deeper insights into the content of tweets relevant to Zika. This system can be leveraged by domain experts to perform sentiment analysis, and understand the temporal and spatial spread of Zika.
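As a rough illustration of the N-gram features mentioned in the abstract, the sketch below counts word N-grams in a single tweet. It assumes naive lowercase whitespace tokenization, omits the POS-tagging step and the classifier entirely, and uses a made-up example tweet; the function name `word_ngrams` is hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of text, using lowercase whitespace tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up tweet; unigram and bigram counts would form its feature vector.
tweet = "Zika virus spreads fast"
features = Counter(word_ngrams(tweet, 1) + word_ngrams(tweet, 2))
print(features)
```

Counts like these, typically computed over a whole corpus and combined with POS-based features, would then be fed to a classifier such as logistic regression.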

Proceedings of the AAAI Conference on Artificial Intelligence

Understanding speed and travel-time dynamics in response to various city-related events is an important and challenging problem. Sensor data (numerical) containing average speed of vehicles passing through a road link can be interpreted in terms of traffic-related incident reports from city authorities and social media data (textual), providing a complementary understanding of traffic dynamics. State-of-the-art research is focused on either analyzing sensor observations or citizen observations; we seek to exploit both in a synergistic manner. We demonstrate the role of domain knowledge in capturing the non-linearity of speed and travel-time dynamics by segmenting speed and travel-time observations into simpler components amenable to description using linear models such as Linear Dynamical System (LDS). Specifically, we propose Restricted Switching Linear Dynamical System (RSLDS) to model normal speed and travel-time dynamics and thereby characterize anomalous dynamics. We utilize th...

Proceedings of the 2017 ACM on Web Science Conference

With the rapid increase in urban development, it is critical to utilize dynamic sensor streams for traffic understanding, especially in larger cities where route planning or infrastructure planning is more critical. This creates a strong need to understand traffic patterns using ubiquitous sensors to allow city officials to be better informed when planning urban construction and to provide an understanding of the traffic dynamics in the city. In this study, we propose our framework ITSKG (Imagery-based Traffic Sensing Knowledge Graph) which utilizes the stationary traffic camera information as sensors to understand the traffic patterns. The proposed system extracts image-based features from traffic camera images, adds a semantic layer to the sensor data for traffic information, and then labels traffic imagery with semantic labels such as congestion. We share a prototype example to highlight the novelty of our system and provide an online demo to enable users to gain a better understanding of our system. This framework adds a new dimension to existing traffic modeling systems by incorporating dynamic image-based features as well as creating a knowledge graph to add a layer of abstraction to understand and interpret concepts like congestion to the traffic event detection system.

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020

Dementia is a group of irreversible, chronic, and progressive neurodegenerative disorders resulting in impaired memory, communication, and thought processes. In recent years, clinical research advances in brain aging have focused on the earliest clinically detectable stage of incipient dementia, commonly known as mild cognitive impairment (MCI). Currently, these disorders are diagnosed using a manual analysis of neuropsychological examinations. We measure the feasibility of using the linguistic characteristics of verbal utterances elicited during neuropsychological exams of elderly subjects to distinguish between elderly control groups, people with MCI, people diagnosed with possible Alzheimer's disease (AD) and probable AD. We investigated the performance of both theory-driven psycholinguistic features and data-driven contextual language embeddings in identifying different clinically diagnosed groups. Our experiments show that a combination of contextual and psycholinguistic features extracted by a Support Vector Machine improved distinguishing the verbal utterances of elderly controls, people with MCI, possible AD, and probable AD. This is the first work to identify four clinical diagnosis groups of dementia in a highly imbalanced dataset. Our work shows that machine learning algorithms built on contextual and psycholinguistic features can learn the linguistic biomarkers from verbal utterances and assist clinical diagnosis of different stages and types of dementia, even with limited data.

Improving Pain Assessment Using Vital Signs and Pain Medication for Patients With Sickle Cell Disease: Retrospective Study (Preprint)

BACKGROUND Sickle cell disease (SCD) is the most common inherited blood disorder affecting millions of people worldwide. Most patients with SCD experience repeated, unpredictable episodes of severe pain. These pain episodes are the leading cause of emergency department visits among patients with SCD and may last for several weeks. Arguably, the most challenging aspect of treating pain episodes in SCD is assessing and interpreting a patient’s pain intensity level. OBJECTIVE This study aims to learn deep feature representations of subjective pain trajectories using objective physiological signals collected from electronic health records. METHODS This study used electronic health record data collected from 496 Duke University Medical Center participants over 5 consecutive years. Each record contained measures for 6 vital signs and the patient’s self-reported pain score, with an ordinal range from 0 (no pain) to 10 (severe and unbearable pain). We also extracted 3 features related to me...

Global organizations like the UNFPA seek real-time data from social media sites to measure public attitude towards societal issues such as gender-based violence (GBV). In this study, we examine social data consisting of microblogs collected from Twitter to analyze public opinion regarding GBV, analyzing tweeting practices by pragmatic function (assertion, belief, etc.). We suggest that the ability to mine actionable insight depends on pragmatic function, where belief tweets specifically reveal attitudes (for example regarding sex trafficking or family-condoned practices of arranged temporary marriage) that can serve as the target of intervention campaigns. Using a data-driven approach that combines both supervised and unsupervised machine learning techniques, we first develop a classifier that distinguishes pragmatic function classes using unigrams. In subsequent processing, we mine topics within each class to demonstrate the presence of content pertinent to intervention within the belief class. This can assist both governmental and non-governmental organizations in shaping the focus of intervention, and the measurement of intervention effectiveness by changes in the prevalence of belief over time.