Motif discovery in speech: Application to monitoring Alzheimer's disease

2017, Current Alzheimer Research

INTRODUCTION: One of the most common presenting features of Alzheimer's disease (AD) is perseverative behavior: the tendency to make the same statement, ask the same question, or carry out the same action repeatedly over the course of the day. Since this phenomenon is widespread among patients with dementia, it is generally regarded as a sensitive indicator of a cognitive disorder, and it probably becomes more frequent as the condition progresses. A means of measuring the occurrence of repeated speech episodes as patients go about their daily lives could therefore serve both as a diagnostic aid and as a tool for monitoring the condition. Continuous speech recording in a real-world setting would, however, violate privacy, be prone to contamination by external noise, and be difficult to interpret without manually segmenting the various sources of recorded language (i.e., patient and interlocutors). Recording the energy fluctuations transmitted from the vocal apparatus through the bones of the skull (bone-conducted speech), by contrast, enables the collection of data that derive exclusively from the patient. In this study, a methodology was proposed to record and analyze bone-conducted speech using motif discovery techniques to identify, quantify and assess repeated speech segments in large sets of recorded data. To evaluate the performance of the adopted method for speech pattern detection, pilot data consisting of both air- and bone-conducted speech were collected from actors reading scripted texts in which certain short questions and statements were embedded several times; these data were processed and the preliminary results are reported.

METHOD: A proof-of-concept experiment was undertaken on 5 healthy subjects. The subjects were instructed to read aloud 3 predefined scripts containing short, embedded, repeated questions and statements. Bone-conducted speech was recorded using an accelerometer attached to the skin above the temporomandibular joint, amplified and sampled at a frequency of 16 kHz. Synchronous recording of the air-conducted speech via a conventional headset microphone served as a reference to validate the accelerometer data. The accelerometer data, devoid of any semantic content, were analyzed automatically using signal processing techniques. Speech segments were extracted from the recordings, bandpass filtered, denoised and divided into frames for feature extraction. A set of 95 features, including statistical measures, spectral moments, Mel-frequency cepstral coefficients (MFCCs), perceptual linear predictive cepstral coefficients and prosodic measures, was calculated for each frame.
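The abstract does not give implementation details, so the following is a minimal sketch of the kind of frame-based preprocessing and feature extraction described above, assuming Python with NumPy, SciPy and librosa. The filter band (100-4000 Hz), 25 ms frames with 10 ms hop, and the small feature subset shown (a few statistical measures, two spectral moments and 13 MFCCs, rather than the full 95-feature set) are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from scipy.signal import butter, filtfilt
import librosa

FS = 16000  # sampling rate used in the study (16 kHz)

def preprocess(signal, fs=FS, low=100.0, high=4000.0):
    """Bandpass-filter the raw accelerometer signal.
    The 100-4000 Hz band is an illustrative assumption."""
    b, a = butter(4, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, signal)

def frame_features(signal, fs=FS, frame_len=0.025, hop=0.010):
    """Per-frame features: basic statistics, spectral moments and MFCCs.
    A small illustrative subset of the 95 features the study describes."""
    n = int(frame_len * fs)   # 400 samples per frame
    h = int(hop * fs)         # 160-sample hop
    # 13 MFCCs per frame; center=False keeps frames aligned with the loop below
    mfcc = librosa.feature.mfcc(y=signal.astype(np.float32), sr=fs,
                                n_mfcc=13, n_fft=n, hop_length=h,
                                center=False)
    feats = []
    for i, start in enumerate(range(0, len(signal) - n + 1, h)):
        frame = signal[start:start + n]
        spec = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        p = spec / (spec.sum() + 1e-12)           # normalized magnitude spectrum
        centroid = np.sum(freqs * p)              # 1st spectral moment
        spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))  # 2nd moment
        stats = [frame.mean(), frame.std(), np.abs(frame).max()]
        feats.append(np.concatenate([stats, [centroid, spread], mfcc[:, i]]))
    return np.vstack(feats)  # shape: (num_frames, num_features)
```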
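The abstract names motif discovery as the core analysis but does not specify an algorithm. As one plainly labeled stand-in, not the authors' method, the sketch below uses the matrix profile (computed with the stumpy library) to locate the closest repeated subsequence in a single per-frame feature trajectory; the choice of feature column and the motif length in frames are illustrative.

```python
import numpy as np
import stumpy

def find_top_motif(feature_track, motif_len):
    """Locate the best repeated pattern (motif) in a 1-D feature
    trajectory via the matrix profile. `feature_track` could be one
    MFCC coefficient over time; `motif_len` is the motif length in
    frames. Both choices are illustrative assumptions."""
    mp = stumpy.stump(feature_track.astype(np.float64), m=motif_len)
    profile = mp[:, 0].astype(np.float64)  # z-normalized distances
    i = int(np.argmin(profile))            # first motif occurrence
    j = int(mp[i, 1])                      # its nearest-neighbor occurrence
    return i, j, profile[i]

# Usage sketch, reusing the feature pipeline above:
# feats = frame_features(preprocess(signal))
# start_a, start_b, dist = find_top_motif(feats[:, 5], motif_len=100)
```

A low matrix-profile value at two frame indices indicates two near-identical stretches of speech, which is the kind of repeated-segment evidence the study sets out to quantify.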