Speaker Recognition with Recurrent Neural Networks

We report on the application of recurrent neural nets in an open-set text-dependent speaker identification task. The motivation for applying recurrent neural nets to this domain is to find out whether their ability to take short-term spectral features as input, yet respond to long-term temporal events, is advantageous for speaker identification. We use a feedforward net architecture adapted from that introduced by Robinson et al. We introduce a fully-connected hidden layer between the input and state nodes and the output. We show that this hidden layer makes the learning of complex classification tasks more efficient. Training uses back propagation through time. There is one output unit per speaker, with the training targets corresponding to speaker identity. For 12 speakers (a mixture of male and female) we obtain a true acceptance rate of 100% with a false acceptance rate of 4%. For 16 speakers these figures are 94% and 7% respectively. We also investigate the sensitivity of identification ...
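
The architecture described in the abstract can be illustrated with a minimal sketch of the forward pass: the input frame and the previous state feed a fully-connected hidden layer, which in turn produces one output unit per speaker together with the next state. The names (RecurrentSpeakerNet, identify), the layer sizes, the sigmoid units, the frame-averaged score and the rejection threshold are all assumptions made for illustration and are not taken from the paper. Training by back propagation through time is omitted.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation (an assumed choice of nonlinearity)."""
    return 1.0 / (1.0 + np.exp(-z))


class RecurrentSpeakerNet:
    """Hypothetical sketch: (input, state) -> hidden -> (speaker outputs, next state)."""

    def __init__(self, n_features, n_state, n_hidden, n_speakers, seed=0):
        rng = np.random.default_rng(seed)
        self.n_state = n_state
        self.n_speakers = n_speakers
        # Hidden layer sees the current spectral frame and the previous state.
        self.W_h = rng.normal(0.0, 0.1, (n_hidden, n_features + n_state))
        self.b_h = np.zeros(n_hidden)
        # Output layer produces one unit per speaker plus the next state vector.
        self.W_o = rng.normal(0.0, 0.1, (n_speakers + n_state, n_hidden))
        self.b_o = np.zeros(n_speakers + n_state)

    def forward(self, frames):
        """Run the net over a sequence of frames; return per-frame speaker outputs."""
        state = np.zeros(self.n_state)
        outputs = []
        for x in frames:
            hidden = sigmoid(self.W_h @ np.concatenate([x, state]) + self.b_h)
            y = sigmoid(self.W_o @ hidden + self.b_o)
            outputs.append(y[: self.n_speakers])
            state = y[self.n_speakers:]  # fed back at the next time step
        return np.array(outputs)


def identify(scores, threshold):
    """Open-set decision (assumed rule): average the outputs over frames and
    accept the best-scoring speaker only if it reaches the threshold."""
    mean_scores = scores.mean(axis=0)
    best = int(np.argmax(mean_scores))
    return best if mean_scores[best] >= threshold else None
```

In this sketch, lowering the threshold admits more utterances from unknown speakers (raising false acceptances), while raising it rejects more enrolled speakers (lowering true acceptances); the rates quoted above correspond to one point on such a trade-off.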