ISIS and NISIS: New bilingual dual-channel speech corpora for robust speaker recognition

It is standard practice to use benchmark datasets to compare the performance of competing speaker identification systems in a meaningful way. Such datasets generally consist of speech recordings from different speakers made at a single point in time, typically in a single language. That is, the training and test sets both consist of speech recorded in the same session, in the same language and over the same recording channel. This is generally not the case in real-life applications. In this paper, we introduce a new database consisting of speech recordings of 105 speakers, made over four sessions, in two languages and simultaneously over two channels. This database allows experiments on the loss in performance caused by mismatch in language, channel and recording session between training and test data. Results of experiments with MFCC-based GMM speaker models are presented to highlight the need for such benchmark datasets in identifying robust speaker identification systems.
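To make the three mismatch factors concrete, the following minimal sketch enumerates train/test condition pairs and groups them by which factors differ. It is an illustration under stated assumptions, not the protocol used in the paper: the language and channel labels (`lang_A`, `chan_1`, and so on) and the function names are hypothetical placeholders, and only the counts of sessions, languages and channels come from the abstract.

```python
from itertools import product

# Counts taken from the abstract: four sessions, two languages, two channels.
SESSIONS = (1, 2, 3, 4)
LANGUAGES = ("lang_A", "lang_B")   # hypothetical labels, not named in the abstract
CHANNELS = ("chan_1", "chan_2")    # hypothetical labels, not named in the abstract

FACTORS = ("session", "language", "channel")


def conditions():
    """Return every (session, language, channel) recording condition."""
    return list(product(SESSIONS, LANGUAGES, CHANNELS))


def mismatch(train, test):
    """Return the names of the factors on which train and test differ."""
    return tuple(name for name, a, b in zip(FACTORS, train, test) if a != b)


def trial_table():
    """Group all ordered (train, test) condition pairs by mismatch type."""
    table = {}
    for train, test in product(conditions(), repeat=2):
        table.setdefault(mismatch(train, test), []).append((train, test))
    return table


if __name__ == "__main__":
    for kind, pairs in sorted(trial_table().items(), key=lambda kv: len(kv[0])):
        label = "+".join(kind) if kind else "matched"
        print(f"{label}: {len(pairs)} condition pairs")
```

Grouping trials this way lets a single factor (for example, channel alone) be varied while the other two are held fixed. Note that a fully matched pair in this sketch shares session, language and channel, so a real experiment would still need disjoint utterances for training and testing within that condition.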