Kernel-Based Probabilistic Neural Networks with Integrated Scoring Normalization for Speaker Verification
Related papers
A Comparative Study on Kernel-Based Probabilistic Neural Networks for Speaker Verification
International Journal of Neural Systems, 2002
This paper compares kernel-based probabilistic neural networks for speaker verification based on 138 speakers of the YOHO corpus. Experimental evaluations using probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs) and elliptical basis function networks (EBFNs) as speaker models were conducted. The original training algorithm of PDBNNs was also modified to make PDBNNs appropriate for speaker verification. Results show that the equal error rate obtained by PDBNNs and GMMs is less than that of EBFNs (0.33% vs. 0.48%), suggesting that GMM- and PDBNN-based speaker models outperform the EBFN ones. This work also finds that the globally supervised learning of PDBNNs is able to find decision thresholds that not only keep the false acceptance rates at a low level but also reduce their variation, whereas the ad-hoc threshold-determination approach used by the EBFNs and GMMs causes a large variation in the error rates. This property makes the performance of PDBNN-based systems more predictable.
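Since both versions of this abstract report results as equal error rates, the following minimal sketch shows how an EER can be computed from verification scores. The scores and the threshold sweep are purely synthetic stand-ins, not data from the paper.

```python
# Sketch: computing the equal error rate (EER) from verification scores.
# Synthetic target/impostor scores; real systems would use pooled trial scores.
import numpy as np

def eer(target_scores, impostor_scores):
    """Sweep a threshold and return the operating point where FAR and FRR cross."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, best_eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)   # impostors wrongly accepted
        frr = np.mean(target_scores < t)      # targets wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

rng = np.random.default_rng(0)
print(eer(rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000)))
```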
2002
This paper compares kernel-based probabilistic neural networks for speaker verification. Experimental evaluations based on 138 speakers of the YOHO corpus using probabilistic decision-based neural networks (PDBNNs), Gaussian mixture models (GMMs) and elliptical basis function networks (EBFNs) as speaker models were conducted. The original PDBNN training algorithm was also modified to make PDBNNs appropriate for speaker verification. Results show that the equal error rate obtained by PDBNNs and GMMs is about half of that of EBFNs (1.19% vs. 2.73%), suggesting that GMM- and PDBNN-based speaker models outperform the EBFN ones. This work also finds that the globally supervised learning of PDBNNs is able to find a set of decision thresholds that reduce the variation in the false acceptance rate (FAR), whereas the ad hoc approach used by the EBFNs and GMMs is not able to do so. This property makes the performance of PDBNN-based systems more predictable.
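For readers unfamiliar with GMM speaker models and decision thresholds, here is a minimal sketch of likelihood-ratio scoring against a background model. It assumes scikit-learn and uses random vectors in place of real MFCC frames; the model sizes and the threshold value are illustrative, not the paper's configuration.

```python
# Sketch: GMM speaker-model scoring with a background model and a decision threshold.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
enrol_frames = rng.normal(0.0, 1.0, size=(2000, 12))       # claimed speaker's training frames
background_frames = rng.normal(0.5, 1.2, size=(8000, 12))  # pooled background/impostor frames
test_frames = rng.normal(0.0, 1.0, size=(300, 12))         # frames from the test utterance

speaker_gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(enrol_frames)
background_gmm = GaussianMixture(n_components=32, covariance_type="diag", random_state=0).fit(background_frames)

# Average per-frame log-likelihood ratio between the speaker and background models.
llr = speaker_gmm.score(test_frames) - background_gmm.score(test_frames)

threshold = 0.0   # in practice tuned (often per speaker) to trade off false acceptance and rejection
print(f"LLR = {llr:.3f} -> {'accept' if llr > threshold else 'reject'}")
```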
Evaluation of kernel methods for speaker verification and identification
2002
Support vector machines are evaluated on speaker verification and speaker identification tasks. We compare the polynomial kernel, the Fisher kernel, a likelihood ratio kernel and the pair hidden Markov model kernel with baseline systems based on a discriminative polynomial classifier and generative Gaussian mixture model classifiers. Simulations were carried out on the YOHO database and some promising results were obtained.
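As a rough illustration of one of the kernels compared above, the sketch below trains a polynomial-kernel SVM on fixed-length utterance-level vectors. The feature vectors are synthetic stand-ins; the real systems operate on acoustic features and the kernel degree here is an assumption.

```python
# Sketch: polynomial-kernel SVM verifier on fixed-length utterance vectors (synthetic data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
target_utts = rng.normal(0.0, 1.0, size=(40, 24))     # utterance vectors for the target speaker
impostor_utts = rng.normal(0.8, 1.0, size=(200, 24))  # utterance vectors for impostors

X = np.vstack([target_utts, impostor_utts])
y = np.concatenate([np.ones(len(target_utts)), np.zeros(len(impostor_utts))])

# Polynomial kernel, one of the kernel choices compared in the paper.
svm = SVC(kernel="poly", degree=3).fit(X, y)

test_utt = rng.normal(0.0, 1.0, size=(1, 24))
score = svm.decision_function(test_utt)[0]   # signed distance to the separating hyperplane
print(f"verification score = {score:.3f}")
```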
A new two-stage scoring normalization approach to speaker verification
2002
In speaker verification, the cohort and world models have been separately used for scoring normalization. In this work, we embed the two models in elliptical basis function networks and propose a two-stage decision procedure for improving verification performance. The procedure begins with normalization of an utterance by a world model. If the difference between the resulting score and a world threshold is sufficiently large, the claimant is accepted or rejected immediately. Otherwise, the score will be normalized by a cohort model, and the resulting score will be compared with a cohort threshold to make a final accept/reject decision. Experimental evaluations based on the YOHO corpus suggest that the two-stage method achieves a lower error rate as compared to the case where only one background model is used.
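The two-stage decision procedure described above can be sketched directly in code. The scores, thresholds and the margin parameter below are made-up values for illustration, not quantities taken from the paper.

```python
# Sketch of a two-stage scoring-normalization decision (hypothetical scores and thresholds).
def two_stage_verify(claimant_score, world_score, cohort_score,
                     world_threshold, cohort_threshold, margin):
    """Normalize by the world model first; fall back to the cohort model
    only when the world-normalized score is inconclusive."""
    world_norm = claimant_score - world_score          # stage 1: world-model normalization
    if world_norm - world_threshold > margin:
        return "accept"                                # confidently above the world threshold
    if world_threshold - world_norm > margin:
        return "reject"                                # confidently below the world threshold
    cohort_norm = claimant_score - cohort_score        # stage 2: cohort-model normalization
    return "accept" if cohort_norm > cohort_threshold else "reject"

# Example with made-up log-likelihood scores.
print(two_stage_verify(claimant_score=-41.2, world_score=-43.0, cohort_score=-42.5,
                       world_threshold=1.0, cohort_threshold=0.8, margin=1.5))
```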
Neural network based speaker classification and verification systems with enhanced features
2017 Intelligent Systems Conference (IntelliSys), 2017
This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related tasks in speaker recognition. With optimized features and model training, it achieves a 100% classification rate and less than 6% Equal Error Rate (EER), using only about 1 second and 5 seconds of data respectively. Features extracted with stricter Voice Activity Detection (VAD) than is typical for speech recognition ensure that stronger voiced portions are kept for speaker recognition, and speaker-level mean and variance normalization helps to eliminate the discrepancy between samples from the same speaker; both are shown to improve system performance. In building the neural network speaker classifier, the network structure parameters are optimized with grid search, and dynamically reduced regularization parameters are used to avoid training terminating in a local minimum, allowing training to continue at lower cost. In speaker verification, performance is improved with prediction score normalization, which rewards speaker identity indices with distinct peaks and penalizes weak ones that have high scores but more competitors, and with speaker-specific thresholding, which significantly reduces the EER on the ROC curve. The TIMIT corpus with an 8 kHz sampling rate is used. The first 200 male speakers are used to train and test classification performance; their test files serve as in-domain registered speakers, while data from the remaining 126 male speakers are used as out-of-domain speakers, i.e. impostors, in speaker verification.
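A minimal sketch of the score-normalization and per-speaker-thresholding idea is given below. The normalization formula (claimed score minus the mean of the competitors), the number of speakers and the threshold values are assumptions chosen for illustration; the paper's exact normalization may differ.

```python
# Sketch: prediction score normalization and speaker-specific thresholding (illustrative only).
import numpy as np

def normalized_score(scores, claimed_idx):
    """Reward a distinct peak: claimed score minus the mean of all competing scores."""
    competitors = np.delete(scores, claimed_idx)
    return scores[claimed_idx] - competitors.mean()

rng = np.random.default_rng(3)
raw_scores = rng.random(200)            # network outputs over 200 registered speakers
raw_scores[17] = 0.95                   # the claimed identity shows a distinct peak

speaker_thresholds = np.full(200, 0.4)  # per-speaker thresholds tuned on development data
claimed = 17
score = normalized_score(raw_scores, claimed)
print("accept" if score > speaker_thresholds[claimed] else "reject")
```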
Speaker verification using sequence discriminant support vector machines
IEEE Transactions on Speech and Audio Processing, 2005
This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resulting SVMs have a very high dimensionality, since it is related to the number of parameters in the underlying generative model. To address problems that arise in the resulting optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We have performed speaker verification experiments using the PolyVar database. The SVM system presented here reduces the relative error rates by 34% compared to a GMM likelihood ratio system. Index terms: Fisher kernel, score-space kernel, speaker verification, support vector machine.
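The sketch below shows the basic idea behind mapping a variable-length utterance to a fixed-length score-space (Fisher-style) vector via the gradient of a GMM log-likelihood with respect to the component means. It is a simplified illustration with synthetic data and crude length normalization rather than the paper's full score-space kernel with spherical normalization.

```python
# Sketch: fixed-length Fisher-score vector from a diagonal-covariance GMM (simplified).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
background = rng.normal(size=(5000, 12))
gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0).fit(background)

def fisher_score(frames, gmm):
    post = gmm.predict_proba(frames)                        # (T, K) component posteriors
    grads = []
    for k in range(gmm.n_components):
        diff = (frames - gmm.means_[k]) / gmm.covariances_[k]
        grads.append((post[:, [k]] * diff).sum(axis=0))     # d logL / d mu_k, summed over frames
    v = np.concatenate(grads)
    return v / np.linalg.norm(v)                            # crude length normalization

utterance = rng.normal(size=(300, 12))
print(fisher_score(utterance, gmm).shape)                   # fixed dimension: K * D
```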
TEXT-INDEPENDENT SPEAKER VERIFICATION BASED ON PROBABILISTIC NEURAL NETWORKS
2002
In this paper, a text-independent Probabilistic Neural Network (PNN)-based speaker verification system is presented. A modular structure with a distinct PNN for each enrolled speaker is used. A gender-dependent universal background model is built to represent the impostor speakers. A detailed description of the system, as well as the time required for training and processing all the test trials, is given.
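Since a PNN is essentially a Parzen-window density estimator over the enrolment frames, the following sketch scores a test utterance under a speaker's kernel density and a background density and compares the two. The isotropic kernel width, the synthetic frames and the zero threshold are assumptions for illustration, not the paper's configuration.

```python
# Sketch: PNN-style (Parzen-window) speaker scoring against a background model.
import numpy as np

def pnn_log_score(test_frames, train_frames, sigma=1.0):
    """Average log Parzen density: a sum of isotropic Gaussian kernels
    centred on the enrolment frames, evaluated at each test frame."""
    d = train_frames.shape[1]
    # pairwise squared distances, shape (T_test, T_train)
    dists = ((test_frames[:, None, :] - train_frames[None, :, :]) ** 2).sum(-1)
    log_kernel = -0.5 * dists / sigma**2 - 0.5 * d * np.log(2 * np.pi * sigma**2)
    # log-mean-exp over enrolment frames, then average over test frames
    m = log_kernel.max(axis=1, keepdims=True)
    log_dens = m[:, 0] + np.log(np.exp(log_kernel - m).mean(axis=1))
    return log_dens.mean()

rng = np.random.default_rng(5)
speaker_frames = rng.normal(0.0, 1.0, size=(500, 12))
ubm_frames = rng.normal(0.5, 1.2, size=(2000, 12))
test = rng.normal(0.0, 1.0, size=(200, 12))

llr = pnn_log_score(test, speaker_frames) - pnn_log_score(test, ubm_frames)
print("accept" if llr > 0.0 else "reject")
```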
Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification
Odyssey 2020 The Speaker and Language Recognition Workshop
Speaker verification systems usually suffer from mismatch between training and evaluation data, such as speaker population mismatch and channel and environment variations. Addressing this issue requires the system to have good generalization ability on unseen data. In this work, we incorporate Bayesian neural networks (BNNs) into the deep neural network (DNN) x-vector speaker verification system to improve the system's generalization ability. With the weight uncertainty modeling provided by BNNs, we expect the system to generalize better on the evaluation data and make verification decisions more accurately. Our experimental results indicate that the DNN x-vector system can benefit from BNNs, especially when the mismatch problem is severe for evaluations using out-of-domain data. Specifically, results show that the system benefits from BNNs by a relative EER decrease of 2.66% and 2.32% respectively for short- and long-utterance in-domain evaluations. Additionally, the fusion of DNN x-vector and Bayesian x-vector systems achieves further improvement. Moreover, experiments conducted with out-of-domain evaluations, e.g. models trained on Voxceleb1 while evaluated on the NIST SRE10 core test, suggest that BNNs can bring a larger relative EER decrease of around 4.69%. Index terms: speaker verification, Bayesian neural network, DNN x-vector, uncertainty modelling
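To make the weight-uncertainty idea concrete, the toy sketch below samples the weights of a single Bayesian linear layer several times and averages the resulting embeddings; the spread across samples reflects model uncertainty. The layer sizes, the Gaussian weight posteriors and the tanh non-linearity are illustrative assumptions, not the architecture used in the paper.

```python
# Sketch: Monte Carlo averaging over sampled weights in one Bayesian layer (toy dimensions).
import numpy as np

rng = np.random.default_rng(6)
in_dim, emb_dim = 40, 16

# A Bayesian linear layer keeps a mean and a log-std per weight instead of a point estimate.
w_mu = rng.normal(scale=0.1, size=(in_dim, emb_dim))
w_logstd = np.full((in_dim, emb_dim), -3.0)

def embed(pooled_stats, n_samples=20):
    """Average embeddings over weight samples; the std reflects model uncertainty."""
    outs = []
    for _ in range(n_samples):
        w = w_mu + np.exp(w_logstd) * rng.normal(size=w_mu.shape)
        outs.append(np.tanh(pooled_stats @ w))
    outs = np.stack(outs)
    return outs.mean(axis=0), outs.std(axis=0)

pooled_stats = rng.normal(size=(in_dim,))   # stand-in for pooled frame statistics
mean_emb, std_emb = embed(pooled_stats)
print(mean_emb.shape, float(std_emb.mean()))
```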
A comparison of Gaussian mixture and multiple binary classifier models for speaker verification
Australian New Zealand Conference on Intelligent Information Systems, 1996
A Gaussian mixture model (GMM) is compared to a multiple binary classifier model (MBCM) in two speaker verification experiments conducted on telephone speech. The MBCM consists of 45 Moody-Darken radial basis function neural networks (MD-RBFNs) whose outputs are fused. Furthermore, the model is pruned in order to remove poorly performing MD-RBFNs. In the first experiment, true speakers and impostors are
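The sketch below illustrates the fuse-and-prune idea behind a multiple binary classifier model: drop members that perform poorly on development trials, then average the survivors' scores. The scores, the accuracy-based pruning rule and the thresholds are all made-up stand-ins; the paper's 45 MD-RBFNs and its actual fusion rule are not reproduced here.

```python
# Sketch: fusing several binary classifiers and pruning weak members (toy scores).
import numpy as np

rng = np.random.default_rng(7)
n_classifiers, n_dev_trials = 45, 400

# Per-classifier scores on development trials plus the true labels (1 = target speaker).
dev_scores = rng.random((n_classifiers, n_dev_trials))
dev_labels = rng.integers(0, 2, n_dev_trials)
dev_scores[:, dev_labels == 1] += 0.3          # make target trials score somewhat higher

# Prune: keep classifiers whose individual accuracy at a fixed threshold is above chance.
acc = ((dev_scores > 0.65) == dev_labels).mean(axis=1)
keep = acc > 0.55
print(f"kept {keep.sum()} of {n_classifiers} classifiers")

# Fuse: average the surviving classifiers' scores for a new trial.
test_scores = rng.random(n_classifiers) + 0.3
fused = test_scores[keep].mean()
print("accept" if fused > 0.65 else "reject")
```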
IEEE Transactions on Audio, Speech, and Language Processing, 2000
Speaker verification can be viewed as a task of modeling and testing two hypotheses: the null hypothesis and the alternative hypothesis. Since the alternative hypothesis involves unknown impostors, it is usually hard to characterize a priori. In this paper, we propose improving the characterization of the alternative hypothesis by designing two decision functions based, respectively, on a weighted arithmetic combination and a weighted geometric combination of discriminative information derived from a set of pre-trained background models. The parameters associated with the combinations are then optimized using two kernel discriminant analysis techniques, namely, the Kernel Fisher Discriminant (KFD) and Support Vector Machine (SVM). The proposed approaches have two advantages over existing methods. The first is that they embed a trainable mechanism in the decision functions. The second is that they convert variable-length utterances into fixed-dimension characteristic vectors, which are easily processed by kernel discriminant analysis. The results of speaker-verification experiments conducted on two speech corpora show that the proposed methods outperform conventional likelihood ratio-based approaches.
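The two decision functions described above can be written out directly. The sketch below uses toy likelihoods and fixed weights; in the paper the combination weights are learned with kernel Fisher discriminant analysis or an SVM rather than set by hand.

```python
# Sketch: weighted arithmetic vs. geometric combination of background-model likelihoods
# for the alternative hypothesis (toy likelihoods and hand-picked weights).
import numpy as np

def arithmetic_llr(p_target, p_background, w):
    """log p(X|target) - log( sum_i w_i * p(X|background_i) )"""
    return np.log(p_target) - np.log(np.dot(w, p_background))

def geometric_llr(p_target, p_background, w):
    """log p(X|target) - sum_i w_i * log p(X|background_i)"""
    return np.log(p_target) - np.dot(w, np.log(p_background))

p_target = 2.0e-3                                   # likelihood under the claimed speaker model
p_background = np.array([1.1e-3, 0.6e-3, 1.8e-3])   # likelihoods under pre-trained background models
w = np.array([0.5, 0.2, 0.3])                       # combination weights (sum to one here)

print(arithmetic_llr(p_target, p_background, w), geometric_llr(p_target, p_background, w))
```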