Machine-Learning-Based Closed-Set Text-Independent Speaker Identification Using Speech Recorded During 25 Hours of Prolonged Wakefulness

We applied machine learning to text-independent speaker identification using speech recorded during the day, evening, and night from subjects undergoing 25 hours of prolonged wakefulness. Subjects answered casual questions lasting approximately 3 minutes and described pictures presented to them for 0.5 minutes. We extracted 12,515 vocal features using OpenSmile software. To generalize the training scheme, we split the 20 subjects into two sets of 10 and repeated testing four times with different subsets. Specifically, we used one set of 10 subjects to find the best feature sets and the optimal machine-learning method, and the other set of 10 subjects to test the trained model. Models trained on three speech sessions recorded during the day achieved balanced accuracies of 95% and 98.8% on daytime and evening test speech, respectively, but only 84.2% on nighttime test speech. With training data from all times of day (daytime, evening, and nighttime), we obtained balanced accuracies of 97.5%, 98.8%, and 98.1% on daytime, evening, and nighttime test speech, respectively; the overall accuracy was 98.1%. Prolonged wakefulness therefore degrades the performance of machine-learning-based speaker identification. This work suggests that machine-learning-based speaker identification should be trained on speech from both daytime and nighttime sessions for better overall accuracy. Machine learning can potentially be used to identify a speaker's voice even when it is affected by tiredness and fatigue, which are frequently encountered in settings such as emergency rooms and long-duration repetitive task operations.
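The abstract reports balanced accuracy without defining it. As a minimal sketch, assuming the usual definition (the mean of per-speaker recall, so that every speaker counts equally regardless of how many test utterances they contribute), the metric could be computed as follows. The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def balanced_accuracy(true_labels, predicted_labels):
    """Return the mean of per-speaker recall (macro-averaged recall)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("no labels given")

    total = defaultdict(int)    # utterances per true speaker
    correct = defaultdict(int)  # correctly identified utterances per speaker
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    recalls = [correct[speaker] / total[speaker] for speaker in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data (NOT from the study): speaker "s1" has six test
# utterances, speaker "s2" has two, and one "s2" utterance is misidentified.
y_true = ["s1"] * 6 + ["s2"] * 2
y_pred = ["s1"] * 6 + ["s2", "s1"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case plain accuracy is 7/8 while balanced accuracy is (1.0 + 0.5)/2, which illustrates why the two can differ when speakers contribute unequal amounts of test speech.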