Automated Bolus Detection in Videofluoroscopic Images of Swallowing Using Mask-RCNN
Related papers
Deep learning-based auto-segmentation of swallowing and chewing structures in CT
Delineating swallowing and chewing structures in Head and Neck (H&N) CT scans is necessary during radiotherapy treatment (RT) planning to limit the incidence of speech dysfunction, dysphagia, and trismus. Automating this process is desirable so as to reduce the amount of manual labor required and to generate reproducible contours. However, this is a challenging problem due to the low soft tissue contrast and morphological complexity of the structures involved. In this work, we developed deep learning-based models using 194 H&N CT scans from our institution to automate the delineation of the masseters (left and right), medial pterygoids (left and right), larynx, and constrictor muscles using the DeepLabV3+ architecture. An ensemble of three models was developed for each group of structures using axial, coronal, and sagittal images to provide useful contextual information from the three different views. Each of these models was trained in 2.5D by populating the channels with three con...
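The "2.5D" training described above can be sketched as follows: each sample stacks three consecutive slices into the channel dimension so a 2D network still sees local through-plane context. This is a pure-Python toy under that assumption, not the paper's DeepLabV3+ pipeline; a real implementation would build NumPy or PyTorch tensors.

```python
# Toy sketch of 2.5D input construction: three consecutive axial slices
# populate the three input channels of each training sample.

def make_25d_samples(volume):
    """volume: list of 2D slices -> list of 3-channel samples."""
    samples = []
    for k in range(1, len(volume) - 1):
        # Channels are the previous, current, and next slice.
        samples.append([volume[k - 1], volume[k], volume[k + 1]])
    return samples

# Five tiny 1x1 "slices" stand in for a CT volume.
vol = [[[float(z)]] for z in range(5)]
samples = make_25d_samples(vol)
print(len(samples))  # 3 samples, each with 3 slice-channels
```

The same construction is repeated per view (axial, coronal, sagittal) to build the three-model ensemble the abstract mentions.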
Scientific Reports
High resolution cervical auscultation is a very promising noninvasive method for dysphagia screening and aspiration detection, as it does not involve harmful ionizing radiation. Automatic extraction of swallowing events in cervical auscultation is a key step for swallowing analysis to be clinically effective. Using time-varying spectral estimation of swallowing signals and deep feed forward neural networks, we propose an automatic segmentation algorithm for swallowing accelerometry and sounds that works directly on the raw swallowing signals in an online fashion. The algorithm was validated qualitatively and quantitatively using the swallowing data collected from 248 patients, yielding over 3000 swallows manually labeled by experienced speech language pathologists. With a detection accuracy that exceeded 95%, the algorithm has shown superior performance in comparison to the existing algorithms and demonstrated its generalizability when tested over 76 completely...
Swallow segmentation with artificial neural networks and multi-sensor fusion
Medical Engineering & Physics, 2009
Swallow segmentation is a critical precursory step to the analysis of swallowing signal characteristics. In an effort to automatically segment swallows, we investigated artificial neural networks (ANN) with information from cervical dual-axis accelerometry, submental MMG, and nasal airflow. Our objectives were (1) to investigate the relationship between segmentation performance and the number of signal sources and (2) to identify the signals or signal combinations most useful for swallow segmentation. Signals were acquired from 17 healthy adults in both discrete and continuous swallowing tasks using five stimuli. Training and test feature vectors were constructed with variances from single or multiple signals, estimated within 200 ms moving windows with 50% overlap. Corresponding binary target labels (swallow or non-swallow) were derived by manual segmentation. A separate 3-layer ANN was trained for each participant-signal combination, and all possible signal combinations were investigated. As more signal sources were included, segmentation performance improved in terms of sensitivity, specificity, accuracy, and adjusted accuracy. The combination of all four signal sources achieved the highest mean accuracy and adjusted accuracy of 88.5% and 89.6%, respectively. A-P accelerometry proved to be the most discriminatory source, while the inclusion of MMG or nasal airflow resulted in the least performance improvement. These findings suggest that an ANN, multi-sensor fusion approach to segmentation is worthy of further investigation in swallowing studies.
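The feature construction in this abstract is concrete enough to sketch: per-signal variances estimated in 200 ms windows with 50% overlap. The sampling rate and synthetic signals below are assumptions for illustration; only the window length and overlap come from the abstract.

```python
# Sketch of the windowed-variance features: one variance per sensor
# channel, per 200 ms window, with windows overlapping by 50%.
import math

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def window_features(signals, fs=1000, win_ms=200, overlap=0.5):
    """Return one feature vector (variance per signal) per window."""
    win = int(fs * win_ms / 1000)     # samples per 200 ms window
    hop = int(win * (1 - overlap))    # 50% overlap -> 100 ms hop
    n = min(len(s) for s in signals)
    feats = []
    for start in range(0, n - win + 1, hop):
        feats.append([variance(s[start:start + win]) for s in signals])
    return feats

# Two toy "sensor channels" of 1 s each (assumed 1 kHz sampling rate).
acc = [math.sin(0.02 * i) for i in range(1000)]
mmg = [math.cos(0.05 * i) for i in range(1000)]
features = window_features([acc, mmg])
print(len(features), len(features[0]))  # 9 windows, 2 variances each
```

Each feature vector would then be paired with a binary swallow/non-swallow label from manual segmentation to train the per-participant ANN.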
IEEE Journal of Biomedical and Health Informatics
Upper esophageal sphincter is an important anatomical landmark of the swallowing process commonly observed through the kinematic analysis of radiographic examinations that are vulnerable to subjectivity and clinical feasibility issues. Acting as the doorway of the esophagus, the upper esophageal sphincter allows the transition of ingested materials from the pharyngeal into the esophageal stage of swallowing, and a reduced duration of opening can lead to penetration/aspiration and/or pharyngeal residue. Therefore, in this study we consider a non-invasive high resolution cervical auscultation-based screening tool to approximate the human ratings of upper esophageal sphincter opening and closure. Swallows were collected from 116 patients and a deep neural network was trained to produce a mask that demarcates the duration of upper esophageal sphincter opening. The proposed method achieved more than 90% accuracy and similar values of sensitivity and specificity when compared to human ratings, even when tested over swallows from an independent clinical experiment. Moreover, the predicted opening and closure moments fell within the inter-rater error of their human-rated counterparts, which demonstrates the clinical significance of high resolution cervical auscultation in replacing ionizing radiation-based evaluation of swallowing kinematics.
Estimation of laryngeal closure duration during swallowing without invasive X-rays
Future Generation Computer Systems
Laryngeal vestibule (LV) closure is a critical physiologic event during swallowing, since it is the first line of defense against food bolus entering the airway. Identifying the laryngeal vestibule status, including closure, reopening and closure duration, provides indispensable references for assessing the risk of dysphagia and neuromuscular function. However, commonly used radiographic examinations, known as videofluoroscopy swallowing studies, are highly constrained by their radiation exposure and cost. Here, we introduce a non-invasive sensor-based system, that acquires high-resolution cervical auscultation signals from neck and accommodates advanced deep learning techniques for the detection of LV behaviors. The deep learning algorithm, which combined convolutional and recurrent neural networks, was developed with a dataset of 588 swallows from 120 patients with suspected dysphagia and further clinically tested on 45 samples from 16 healthy participants. For classifying the LV closure and opening statuses, our method achieved 78.94% and 74.89% accuracies for these two datasets, suggesting the feasibility of implementing sensor signals for LV prediction without traditional videofluoroscopy screening methods. The sensor supported system offers a broadly applicable computational approach for
Automatic Detection of Chewing and Swallowing
Sensors, 2021
A series of eating behaviors, including chewing and swallowing, is considered to be crucial to the maintenance of good health. However, most such behaviors occur within the human body, and highly invasive methods such as X-rays and fiberscopes must be utilized to collect accurate behavioral data. A simpler method of measurement is needed in healthcare and medical fields; hence, the present study concerns the development of a method to automatically recognize a series of eating behaviors from the sounds produced during eating. The automatic detection of left chewing, right chewing, front biting, and swallowing was tested through the deployment of the hybrid CTC/attention model, which uses sound recorded through 2ch microphones under the ear and weakly labeled data as training data to detect the balance of chewing and swallowing. N-gram based data augmentation was first performed using weakly labeled data to generate many weakly labeled eating sounds to augment the training data. The detect...
Esophageal Abnormality detection using DenseNet based Faster R-CNN with Gabor features
IEEE Access
Early detection of esophageal abnormalities can help in preventing the progression of the disease into later stages. During esophagus examination, abnormalities are often overlooked due to their irregular shape, variable size, and complex surrounding area, which require significant effort and experience to assess. In this paper, a novel deep learning model based on the Faster Region-Based Convolutional Neural Network (Faster R-CNN) is presented to automatically detect abnormalities in the esophagus from endoscopic images. The proposed detection system is based on a combination of Gabor handcrafted features with CNN features. The Densely Connected Convolutional Networks (DenseNets) architecture is embraced to extract CNN features, providing strengthened feature propagation between layers and alleviating the vanishing gradient problem. To address the challenges of detecting abnormal complex regions, we propose fusing extracted Gabor features with CNN features through concatenation to enhance texture details in the detection stage. Our newly designed architecture is validated on two datasets (Kvasir and MICCAI 2015). On the Kvasir dataset, the results show outstanding performance, with a recall of 90.2%, a precision of 92.1%, and a mean average precision (mAP) of 75.9%. On the MICCAI 2015 dataset, the model surpasses the state-of-the-art performance with 95% recall and 91% precision, with an mAP value of 84%. Experimental results demonstrate that the system is able to detect abnormalities in endoscopic images with good performance without any human intervention.
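The fusion step described above reduces to concatenating handcrafted filter responses with learned features before the detection head. The sketch below illustrates the idea on a 1-D signal with a small hand-rolled Gabor bank; the kernel frequencies and the stand-in "CNN features" are illustrative assumptions, not the paper's DenseNet/Gabor pipeline.

```python
# Toy illustration of concatenation fusion: Gabor-bank responses are
# joined with (stand-in) CNN features into one descriptor.
import math

def gabor_kernel(freq, sigma=2.0, half=4):
    # Cosine-modulated Gaussian: the real part of a 1-D Gabor filter.
    return [math.exp(-0.5 * (t / sigma) ** 2) * math.cos(2 * math.pi * freq * t)
            for t in range(-half, half + 1)]

def filter_response(signal, kernel):
    # Aggregate absolute filter response over the valid region.
    h = len(kernel) // 2
    out = 0.0
    for i in range(h, len(signal) - h):
        out += abs(sum(signal[i + j - h] * kernel[j] for j in range(len(kernel))))
    return out

signal = [math.sin(0.4 * i) for i in range(64)]          # stand-in image row
gabor_feats = [filter_response(signal, gabor_kernel(f)) for f in (0.05, 0.1, 0.2)]
cnn_feats = [max(signal), sum(signal) / len(signal)]     # stand-in CNN features
fused = gabor_feats + cnn_feats                          # concatenation fusion
print(len(fused))  # 5: three Gabor responses + two CNN features
```

In the actual model the concatenated descriptor would feed the Faster R-CNN detection stage; the point here is only that fusion is a plain concatenation, so the two feature families keep their separate texture and semantic information.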
Automatic hyoid bone detection in fluoroscopic images using deep learning
Scientific Reports
The displacement of the hyoid bone is one of the key components evaluated in the swallow study, as its motion during swallowing is related to overall swallowing integrity. In daily research settings, experts visually detect the hyoid bone in the video frames and manually plot its position frame by frame. This study aims to develop an automatic method to localize the hyoid bone in the video sequence. To automatically detect the hyoid bone in a frame, we proposed a single shot multibox detector, a deep convolutional neural network, which is employed to detect and classify the location of the hyoid bone. We also evaluated the performance of two other state-of-the-art detection methods for comparison. The experimental results clearly showed that the single shot multibox detector can detect the hyoid bone with an average precision of 89.14% and outperform other autodetection algorithms. We conclude that this automatic hyoid bone tracking system is accurate enough to be widely applied as a pre-processing step for image processing in dysphagia research, as well as a promising development that may be useful in the diagnosis of dysphagia.
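Average precision figures like the one above are computed by matching predicted boxes to ground truth via intersection-over-union (IoU). A minimal sketch of that overlap check, assuming corner-format boxes (the box format and any threshold are assumptions, not details from the paper):

```python
# IoU between two axis-aligned boxes, each given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

score = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(round(score, 3))  # 25 / 175 = 0.143
```

A detection is typically counted as a true positive when its IoU with a ground-truth hyoid box exceeds a chosen threshold (commonly 0.5), and average precision summarizes precision over recall under that matching rule.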
2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020
One usage of medical ultrasound imaging is to visualize and characterize human tongue shape and motion during real-time speech to study healthy or impaired speech production. Due to the low contrast and noisy nature of ultrasound images, non-expert users may struggle to recognize tongue gestures in applications such as visual training of a second language. Moreover, quantitative analysis of tongue motion needs the tongue dorsum contour to be extracted, tracked, and visualized. Manual tongue contour extraction is a cumbersome, subjective, and error-prone task. Furthermore, it is not a feasible solution for real-time applications. The growth of deep learning has been vigorously exploited in various computer vision tasks, including ultrasound tongue contour tracking. In the current methods, the process of tongue contour extraction comprises two steps of image segmentation and post-processing. This paper presents a novel approach to automatic and real-time tongue contour tracking using deep neural networks. In the proposed method, instead of the two-step procedure, landmarks of the tongue surface are tracked. This enables researchers in this field to benefit from previously annotated databases and achieve highly accurate results. Our experiments demonstrated the outstanding performance of the proposed technique in terms of generalization, performance, and accuracy.
Non-invasive quantification of human swallowing using a simple motion tracking system
Scientific reports, 2018
The number of patients with dysphagia is rapidly increasing due to the ageing of the population. Therefore, the importance of objectively assessing swallowing function has received increasing attention. Videofluoroscopy and videoendoscopy are the standard clinical examinations for dysphagia, but these techniques are not suitable for daily use because of their invasiveness. Here, we aimed to develop a novel, non-invasive method for measuring swallowing function using a motion tracking system, the Kinect v2 sensor. Five males and five females with normal swallowing function participated in this study. We defined three mouth-related parameters and two larynx-related parameters and recorded data from 2.5 seconds before to 2.5 seconds after swallowing onset. Changes in mouth-related parameters were observed before swallowing and reached peak values at the time of swallowing. In contrast, larynx-related parameters showed little change before swallowing and reached peak values immediately ...