Jahanshah Kabudian | Razi University of Kermanshah, Iran

Papers by Jahanshah Kabudian

Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection and A New Quantum-Inspired Dimension Reduction Method

arXiv (Cornell University), Nov 13, 2021

In today's world, affective computing is very important in the relationship between man and machine. In this paper, a multi-stage system for speech emotion recognition (SER) based on the speech signal is proposed, which uses new techniques in different stages of processing. The system consists of three stages: feature extraction, feature selection/dimension reduction, and finally feature classification. In the first stage, a complex set of long-term statistics features is extracted from both the speech signal and the glottal-waveform signal using a combination of new and diverse features such as prosodic features, spectral features, and spectro-temporal features. One of the challenges of SER systems is to distinguish correlated emotions. These features are good discriminators for speech emotions and increase the SER system's ability to recognize similar and different emotions. A data augmentation technique is also used to increase the number of training samples. This high-dimensional feature vector naturally contains redundancy. In the second stage, the number of feature vector dimensions is reduced using classical feature selection techniques as well as a new quantum-inspired dimension reduction technique proposed by the authors. In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier. The classifier performs classification in three steps: sparse random feature learning, orthogonal random projection using the singular value decomposition (SVD) technique, and discriminative classification in the last step using the generalized Tikhonov regularization technique. Also, many existing emotional datasets suffer from imbalanced data distribution, which in turn increases the classification error and decreases system performance.
In this paper, a new weighting method is also proposed to deal with class imbalance, which is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases: EMODB, SAVEE, and IEMOCAP. To the best of our knowledge, the system proposed in this paper is more accurate in recognizing emotions than the latest state-of-the-art methods.

1-Introduction

Recognition of emotions from the speech signal, or Speech Emotion Recognition (SER), is one of the research fields in affective computing. The purpose of SER is to analyze the human speech signal and to extract the emotional state of the person (fear, joy, sadness, etc.). SER systems usually have three stages: feature extraction, dimension reduction/feature selection, and classification. In the feature extraction stage, prosodic and spectral features are usually used. These features alone are not able to discriminate unstable transitions in the speech signal in different emotions. For this purpose, in this paper, spectro-temporal features such as Gabor filter bank (GBFB) and separate Gabor filter bank (SGBFB) features [1] have been used, which have more discriminative power. New spectral features such as constant-Q cepstral coefficient (CQCC) [2], single frequency cepstral coefficient (SFCC) [3] and IIR-CQT Mel-frequency cepstral coefficient (ICMC) [4] that have not previously been used to identify

A Neural Network-Based Optimal Nonlinear Fusion of Speech Pitch Detection Algorithms

2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI)

Fundamental frequency estimation is one of the most important issues in the field of speech processing. An accurate estimate of the fundamental frequency plays a key role in speech and music analysis. So far, various methods have been proposed in the time and frequency domains. However, the main challenge is strong noise in speech signals. In this paper, to improve the accuracy of fundamental frequency estimation in noisy signals, we propose a method for optimal nonlinear combination of fundamental frequency estimation methods. In this method, to better discriminate voiced frames from unvoiced frames, the Voiced/Unvoiced (V/U) scores of four pitch detection methods are combined by nonlinear fusion. These methods are Autocorrelation (AC), Yin, YAAPT and SWIPE. After identifying the Voiced/Unvoiced label of each frame, the fundamental frequency (F0) of the frame is estimated using the SWIPE method. The optimal function for the nonlinear combination is determined using a Multi-Layer Perceptron (MLP) neural network (NN). To evaluate the proposed method, 10 speech files (5 female and 5 male voices) are selected from the PTDB-TUG standard database, and the results are presented in terms of the GPE, VDE, PTE and FFE standard error criteria. The results indicate that our proposed method achieves relative reductions in these criteria (averaged over various SNRs) of 25.06%, 20.92%, 13.94%, and 25.94% respectively, which demonstrates the effectiveness of the proposed method in comparison to state-of-the-art methods.

A New Dynamic Simulated Annealing Algorithm For Global Optimization

Journal of Mathematics and Computer Science

Many real-world problems in system analysis lead to continuous-domain optimization. The existence of sophisticated, many-variable problems in this field creates a need for efficient optimization methods. One of the optimization algorithms for multi-dimensional functions is simulated annealing (SA). In this paper, a modified simulated annealing algorithm named Dynamic Simulated Annealing (DSA) is proposed, which dynamically switches between two types of generating function along the traversed path of a continuous Markov chain. Our experiments indicate that this approach can improve convergence and stability and avoid deceptive regions in benchmark functions better than SA, without any appreciable extra computational cost.
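
The idea of switching between two generating functions inside a simulated annealing loop can be illustrated with a minimal sketch. The abstract does not say which generating functions DSA uses or when it switches, so everything specific here is an assumption made for illustration: the Gaussian and Cauchy proposals, the rejection-count switching rule (`stall_limit`), the geometric cooling schedule, and the `sphere` benchmark. It should not be read as the authors' algorithm.

```python
import math
import random


def sphere(x):
    """Simple convex benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def anneal_two_generators(f, x0, n_iter=5000, t0=1.0, cooling=0.999,
                          stall_limit=20, seed=0):
    """Simulated annealing that switches between two proposal distributions.

    Assumed rule (hypothetical, not from the paper): use short-range Gaussian
    steps by default, and heavy-tailed Cauchy steps once `stall_limit`
    consecutive proposals have been rejected.
    """
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    best_x, best_f = list(x), fx
    temp = t0
    rejections = 0
    for _ in range(n_iter):
        use_cauchy = rejections >= stall_limit
        cand = []
        for v in x:
            if use_cauchy:
                # Cauchy sample via the inverse CDF, scaled by temperature.
                step = temp * math.tan(math.pi * (rng.random() - 0.5))
            else:
                step = rng.gauss(0.0, math.sqrt(temp))
            cand.append(v + step)
        fc = f(cand)
        delta = fc - fx
        # Metropolis acceptance: always accept improvements, sometimes accept
        # worse points with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x, fx = cand, fc
            rejections = 0
            if fx < best_f:
                best_x, best_f = list(x), fx
        else:
            rejections += 1
        temp = max(temp * cooling, 1e-12)  # geometric cooling with a floor
    return best_x, best_f


start = [3.0, -2.0]
best_x, best_f = anneal_two_generators(sphere, start)
```

The best point found is tracked separately from the current state, so the returned value can never be worse than the starting point, whatever the switching rule does.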

Comparison of MLP NN Approach with PCA and ICA for Extraction of Hidden Regulatory Signals in Biological Networks

Iranian Journal of Chemistry & Chemical Engineering-international English Edition, Dec 1, 2006

Biologists now face masses of high-dimensional datasets generated by various high-throughput technologies. These datasets are outputs of complex interconnected biological networks at different levels, driven by a number of hidden regulatory signals. So far, many computational and statistical methods such as PCA and ICA have been employed for computing low-dimensional or hidden representations of these datasets, but in most cases the results are inconsistent with the underlying real network. In this paper, we have employed and compared three dimensionality reduction techniques, two linear (PCA and ICA) and one non-linear (MLP neural network), to uncover these regulatory signals from the outputs of such networks. The three approaches were verified experimentally using the absorbance spectra of a network of seven hemoglobin solutions, and the results revealed the superiority of the MLP NN over PCA and ICA. This study shows the capability of the MLP NN approach to efficiently determine the regulatory components in biological networked systems.

A Regularized Least Squares-Based Method for Optimal Fusion of Speech Pitch Detection Algorithms

2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 2018

Fundamental frequency estimation is one of the most important issues in the field of speech processing. An accurate estimate of the fundamental frequency plays a key role in speech and music analysis. So far, various methods have been proposed in the time and frequency domains. However, the main challenge is strong noise in speech signals. In this paper, to improve the accuracy of fundamental frequency estimation in noisy signals, we propose a method for optimal combination of fundamental frequency estimation methods. In this method, to better discriminate voiced frames from unvoiced frames, the Voiced/Unvoiced (V/U) scores of four pitch detection methods are combined linearly. These methods are Autocorrelation, Yin, YAAPT and SWIPE. After identifying the Voiced/Unvoiced label of each frame, the fundamental frequency (F0) of the frame is estimated using the SWIPE method. The optimal coefficients for the linear combination are determined using the regularized least squares method with Tikhonov regularization. To evaluate the proposed method, 10 speech files (5 female and 5 male voices) are selected from the PTDB-TUG standard database, and the results are presented in terms of the SDFPE, GPE, VDE, PTE and FFE standard error criteria. The results indicate that our proposed method achieves relative reductions in the aforementioned criteria (averaged over various SNRs) of 27.13%, 22.14%, 17.40%, and 26.74% respectively, which demonstrates the effectiveness of the proposed method in comparison to state-of-the-art methods.
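
A linear fusion of this kind can be sketched with the closed-form Tikhonov (ridge) solution w = (XᵀX + λI)⁻¹Xᵀy. The sketch below is only an illustration under assumptions: the abstract does not give the target encoding, the regularization weight, or whether a bias term is used, so the ±1 labels, `lam`, the appended bias column, and the function names `fit_fusion_weights` and `fuse` are all hypothetical.

```python
import numpy as np


def _with_bias(scores):
    """Append a constant column so the fusion can learn an offset."""
    return np.hstack([scores, np.ones((scores.shape[0], 1))])


def fit_fusion_weights(scores, labels, lam=1e-2):
    """Regularized least squares: w = (X^T X + lam * I)^-1 X^T y.

    scores: (n_frames, n_detectors) V/U scores, one column per detector.
    labels: (n_frames,) targets, assumed here to be +1 (voiced) / -1 (unvoiced).
    Note: for simplicity the bias weight is regularized as well.
    """
    X = _with_bias(scores)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)


def fuse(scores, w):
    """Fused V/U score per frame; a frame is called voiced when it is > 0."""
    return _with_bias(scores) @ w


# Toy data standing in for the scores of four detectors (assumption).
rng = np.random.default_rng(0)
toy_scores = rng.normal(size=(200, 4))
toy_labels = np.sign(toy_scores @ np.array([1.0, 0.5, -0.2, 0.8]))
w = fit_fusion_weights(toy_scores, toy_labels)
voiced = fuse(toy_scores, w) > 0
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit matrix inverse, and a larger `lam` shrinks the weights toward zero when detector scores are strongly correlated.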

A New Quantum-PSO Metaheuristic and Its Application to ARMA Modeling of Speech Spectrum

2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 2018

In speech signal representation models, autoregressive moving average (ARMA) modeling is used in various applications, such as feature extraction, signal coding, speech synthesis, and speech recognition. In this paper, a new method based on quantum-behaved particle swarm optimization (QPSO) is proposed for estimation of ARMA model coefficients. In the proposed algorithm, called PMF-QPSO (probability mass function QPSO), some of the most recent global best particles are stored in memory and, based on their fitness, are given a chance to influence the motion of the next-generation particles, which reduces the risk of stopping in local optima and increases the exploration of the QPSO algorithm. Also, to ensure the stability of the estimated model, line spectral frequencies (LSF) are used as optimization parameters, and, accordingly, the truncated Laplace distribution is considered for the probability distribution of new particle locations. The implementation of the suggested algorithm in high-o...

Accuracy Improvement of All-Pole Spectrum Estimation Using Particle Swarm Optimization and Comparing with Classic Methods

Spectrum estimation has many applications in digital signal processing. Parametric models for spectrum estimation include the AR model, the MA model, and the ARMA model. Normally, classic methods such as the Durbin-Levinson or Burg method are used to calculate the parameters of the AR model. Clearly, there is a distance (error) between the spectrum estimated using these algorithms and the actual signal spectrum. In this paper, the Particle Swarm Optimization (PSO) method is used to reduce this error. Results show at least 40% improvement in decreasing the error of the estimated all-pole spectrum in comparison with classic spectral estimation methods.
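
For orientation, the classic baseline mentioned in this abstract can be sketched as a Levinson-Durbin recursion that fits an AR (all-pole) model to an autocorrelation sequence and evaluates the resulting spectrum. This shows the textbook method only, not the PSO-based refinement the paper proposes. The sign convention A(z) = 1 + a₁z⁻¹ + ... + a_p z⁻ᵖ, the function names, and the toy autocorrelation `r` are assumptions chosen for the example.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r: autocorrelation values r[0..order].
    Returns (a, err), where a = [1, a1, ..., ap] are the coefficients of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p and err is the prediction error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def all_pole_spectrum(a, err, n_fft=512):
    """Power spectrum err / |A(e^{jw})|^2 on n_fft // 2 + 1 frequency bins."""
    A = np.fft.rfft(a, n_fft)
    return err / np.abs(A) ** 2


# Toy autocorrelation of a first-order AR process: r[k] = 0.5 ** k.
r = 0.5 ** np.arange(3)
a, err = levinson_durbin(r, order=2)
spectrum = all_pole_spectrum(a, err)
```

For this toy input the recursion returns a second-order model whose extra coefficient is zero, which is the expected behavior when the data are exactly first-order.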

Bidirectional Neural Network for Feature Compensation of Clean and Telephone Speech Signals

In this paper, we continue our previous work on nonlinear feature compensation of distortions in clean and telephone speech recognition systems. We have shown that a Bidirectional Neural Network (Bidi-NN) can compensate nonlinearly distorted components of feature vectors. In this study, we present a new effort to improve recognition accuracy on clean and telephone speech data by employing a two-stage feature compensation technique for recovering optimal (from a classification point of view) Log-Filter Bank Energies (LFBE). These new features are obtained by training a new Bidi-NN on the compensated features, which serve as its input data. We also obtain MFCC features by applying the discrete cosine transform (DCT) to the compensated LFBE features. HMM phone models are trained on these modified features. By using the two-stage compensated features, we obtained an absolute improvement of 4.73% and 9.29% in phone recognition accuracy c...

Fast estimation of warping factor in the vocal tract length normalization using obtained scores of gender detection modeling

Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm

Multimedia Tools and Applications, 2019

In recent years, Speech Emotion Recognition (SER) has received considerable attention in the field of affective computing. In this paper, an improved system for SER is proposed. In the feature extraction step, a hybrid, high-dimensional, rich feature vector is extracted from both the speech signal and the glottal-waveform signal using techniques such as MFCC, PLPC, and MVDR. The prosodic features derived from the fundamental frequency (f0) contour are also added to this feature vector. The proposed system is based on a holistic approach that employs a modified quantum-behaved particle swarm optimization (QPSO) algorithm (called pQPSO) to estimate both the optimal projection matrix for feature-vector dimension reduction and the Gaussian Mixture Model (GMM) classifier parameters. Since the problem parameters lie in a limited range whereas the standard QPSO algorithm searches an infinite range, the QPSO is modified in this paper to use a truncated probability distribution, which makes the search more efficient. The system works in real time and is evaluated on three standard emotional speech databases: the Berlin database of emotional speech (EMO-DB), Surrey AudioVisual Expressed Emotion (SAVEE) and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The proposed method improves the accuracy of the SER system compared to classical methods such as FA, PCA, PPCA, LDA, standard QPSO, wQPSO, and deep neural networks, and also outperforms many recent state-of-the-art approaches that use the same datasets.

A survey on spectral methods in spoken language identification

Signal and Data Processing, 2017

A Comparison of PCA, ICA and Neural Network-based Approaches for Determination of Regulatory Signals in Biological Systems

Fast communication: Bernoulli versus Markov: Investigation of state transition regime in switching-state acoustic models

Signal Processing, Apr 1, 2009

An improved spectral subtraction speech enhancement system by using an adaptive spectral estimator

Canadian Conference on Electrical and Computer Engineering, 2005

Spectral subtraction is one of the best-known and most commonly used methods for speech enhancement. The main weakness of this method is that it produces an annoying artifact called musical noise. In this paper, we have reduced the musical noise and improved the quality of the enhanced speech by increasing the accuracy of the system's spectral estimator. This method is useful for

Applying continuous action reinforcement learning automata (CARLA) to global training of hidden Markov models

International Conference on Information Technology: Coding and Computing (ITCC 2004), 2004

In this research, we have employed global search and global optimization techniques based on Simulated Annealing (SA) and Continuous Action Reinforcement Learning Automata (CARLA) for global training of Hidden Markov Models. The main goal of this paper is to compare the CARLA method with other continuous global optimization methods such as SA. Experimental results show that CARLA outperforms SA. This is because CARLA is a continuous global optimization method with memory, whereas SA is memoryless.

Two-stage feature compensation of clean and telephone speech signals employing bidirectional neural network

10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010), 2010

Noise and Transmission Channel Degradation Compensation and Score Normalization Using a Robust Hybrid Speaker Verification and Identification System

A robust speaker recognition system combining factor analysis techniques

2014 21th Iranian Conference on Biomedical Engineering (ICBME), 2014

In this paper, we implement state-of-the-art factor-analysis-based methods and fuse their scores to obtain a channel-robust speaker recognition system. The two methods are joint factor analysis (JFA) and i-Vector, which define low-dimensional speaker- and channel-dependent spaces. For score fusion, we propose a simple weight computation without a training step. We evaluate our method under two conditions: 1) the channel-matched condition (telephone in training phase/telephone in test phase) and 2) the channel-mismatched condition (telephone in training phase/microphone, GSM and VOIP in test phase). Our strategies outperform a state-of-the-art GMM-UBM based system. We obtained more than 4% absolute EER improvement for both the channel-dependent and channel-independent conditions compared to the standard GMM-UBM based method. Simulation results also show that the combined i-Vector and JFA based system gives better performance than all the other implemented methods.

A robust keyword spotting system for Persian conversational telephone speech using feature and score normalization and ARMA filter

2011 IEEE GCC Conference and Exhibition (GCC), 2011

Keyword spotting (KWS) refers to the detection of a limited number of given keywords in speech utterances. In this paper, we evaluate a robust keyword spotting system based on hidden Markov models for speaker-independent Persian conversational telephone speech. The performance of the baseline keyword spotter is improved by normalizing features using cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN). Further improvement is gained by applying an auto-regressive moving average (ARMA) filter to the normalized features. Experimental results show that, although all these methods improve keyword spotting performance, CMVN and ARMA (MVA) processing of PLP features works much better on our Persian conversational telephone speech database, and a 41% improvement over the baseline system is achieved at a false alarm (FA) rate of 8.6 FA/KW/Hour.
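
The CMVN and ARMA steps named above can be sketched in a few lines. This is an illustrative sketch under assumptions, not the paper's implementation: the abstract does not give the filter order or the edge handling, so the per-utterance statistics, the `order=2` default, the commonly cited MVA-style recursion (the average of the previous `order` outputs and the current and next `order` inputs), and the untouched edge frames are all assumptions.

```python
import numpy as np


def cmvn(feats, eps=1e-8):
    """Cepstral mean and variance normalization over one utterance.

    feats: (n_frames, n_coeffs). Each coefficient track is shifted to zero
    mean and scaled to unit variance.
    """
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    return (feats - mean) / (std + eps)


def arma_smooth(feats, order=2):
    """ARMA smoothing along time (assumed MVA-style form).

    out[t] = (out[t-order] + ... + out[t-1]
              + feats[t] + ... + feats[t+order]) / (2 * order + 1)
    Frames closer than `order` to either edge are left unchanged.
    """
    n_frames = feats.shape[0]
    out = feats.astype(float).copy()
    for t in range(order, n_frames - order):
        past = out[t - order:t].sum(axis=0)          # already-filtered frames
        future = feats[t:t + order + 1].sum(axis=0)  # current and next inputs
        out[t] = (past + future) / (2 * order + 1)
    return out


def mva(feats, order=2):
    """Mean/variance normalization followed by ARMA filtering."""
    return arma_smooth(cmvn(feats), order=order)


# Toy features standing in for PLP coefficients (assumption).
rng = np.random.default_rng(0)
toy_feats = rng.normal(3.0, 2.0, size=(50, 4))
processed = mva(toy_feats)
```

Normalizing before smoothing means the low-pass filter operates on features that already share a common scale across utterances.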

A new method for language recognition based on improved GMM

2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011

... Faculty of Engineering, Shahed University Persian Gulf Superhighway, Tehran, IR Iran 1iman.mousavian@gmail.com 3 smsadeghi2006@gmail.com 4 kabudian ... as the network input, and the language four bit code as the output are presented to the MLP network and train it how ...

Research paper thumbnail of Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection and A New Quantum-Inspired Dimension Reduction Method

arXiv (Cornell University), Nov 13, 2021

In today's world, affective computing is very important in the relationship between man and machi... more In today's world, affective computing is very important in the relationship between man and machine. In this paper, a multi-stage system for speech emotion recognition (SER) based on speech signal is proposed, which uses new techniques in different stages of processing. The system consists of three stages: feature extraction, feature selection/dimension reduction, and finally feature classification. In the first stage, a complex set of long-termstatistics features is extracted from both the speech signal and the glottal-waveform signal using a combination of new and diverse features such as prosodic features, spectral features, and spectro-temporal features. One of the challenges of the SER systems is to distinguish correlated emotions. These features are good discriminators for speech emotions and increase the SER's ability to recognize similar and different emotions. The data augmentation technique is also used to increase the number of training samples. This feature vector with a large number of dimensions naturally has redundancy. In the second stage, using classical feature selection techniques as well as a new quantum-inspired technique to reduce the feature vector dimensionality (proposed by the authors), the number of feature vector dimensions is reduced. In the third stage, the optimized feature vector is classified by a weighted deep sparse extreme learning machine (ELM) classifier. The classifier performs classification in three steps: sparse random feature learning, orthogonal random projection using the singular value decomposition (SVD) technique, and discriminative classification in the last step using the generalized Tikhonov regularization technique. Also, many existing emotional datasets suffer from the problem of data imbalanced distribution, which in turn increases the classification error and decreases system performance. 
In this paper, a new weighting method has also been proposed to deal with class imbalance, which is more efficient than existing weighting methods. The proposed method is evaluated on three standard emotional databases EMODB, SAVEE, and IEMOCAP. According to our latest information, the system proposed in this paper is more accurate in recognizing emotions than the latest state-of-the-art methods. 1-Introduction Recognition of emotions from speech signal or Speech Emotion Recognition (SER) is one of the research fields in affective computing. The purpose of the SER is to analyze human speech signal and to extract the emotional state of the person (fear, joy, sadness, etc). The SER systems usually have three stages: feature extraction, dimension reduction/feature selection, and classification. In the feature extraction stage, prosodic and spectral features are usually used. These features alone are not able to discriminate unstable transitions in the speech signal in different emotions. For this purpose, in this paper, spectro-temporal features such as Gabor filter bank (GBFB) and separate Gabor filter bank (SGBFB) features [1] have been used, which have more discriminative power. New spectral features such as constant-Q cepstral coefficient (CQCC) [2], Single frequency cepstral coefficient (SFCC) [3] and IIR-CQT Mel-frequency cepstral coefficient (ICMC) [4] that have not previously been used to identify

Research paper thumbnail of A Neural Network-Based Optimal Nonlinear Fusion of Speech Pitch Detection Algorithms

2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI)

Fundamental frequency estimation is one of the most important issues in the field of speech proce... more Fundamental frequency estimation is one of the most important issues in the field of speech processing. An accurate estimate of the fundamental frequency plays a key role in the field of speech and music analysis. So far, various methods have been proposed in the time- and frequency-domain. However, the main challenge is the strong noises in speech signals. In this paper, to improve the accuracy of fundamental frequency estimation, we propose a method for optimal nonlinear combination of fundamental frequency estimation methods, in noisy signals. In this method, to discriminate voiced frames from unvoiced frames in a better way, the Voiced/Unvoiced (V/U) scores of four pitch detection methods are combined with nonlinear fusion. These methods are: Autocorrelation (AC), Yin, YAAPT and SWIPE. After identifying the Voiced/Unvoiced label of each frame, the fundamental frequency (F0) of the frame is estimated using the SWIPE method. The optimal function for nonlinear combination is determined using Multi-Layer Perceptron (MLP) neural network (NN). To evaluate the proposed method, 10 speech files (5 female and 5 male voices) are selected from the PTDB-TUG standard database and the results are presented in terms of GPE, VDE, PTE and FFE standard error criteria. The results indicate that our proposed method relatively reduced the aforementioned criteria (averaged in various SNRs) by 25.06%, 20.92%, 13.94%, and 25.94% respectively, which demonstrate the effectiveness of the proposed method in comparison to state-of-the-art methods.

Research paper thumbnail of A New Dynamic Simulated Annealing Algorithm For Global Optimization

Journal of Mathematics and Computer Science

Many problems in system analysis in real world lead to continuous-domain optimization. Existence ... more Many problems in system analysis in real world lead to continuous-domain optimization. Existence of sophisticated and many-variable problems in this field emerge need of efficient optimization methods. One of the optimization algorithms for multi-dimensional functions is simulated annealing (SA). In this paper, a modified simulated annealing named Dynamic Simulated Annealing (DSA) is proposed which dynamically switch between two types of generating function on traversed path of continuous Markov chain. Our experiments indicate that this approach can improve convergence and stability and avoid delusive areas in benchmark functions better than SA without any extra mentionable computational cost.

Research paper thumbnail of Comparison of MLP NN Approach with PCA and ICA for Extraction of Hidden Regulatory Signals in Biological Networks

Iranian Journal of Chemistry & Chemical Engineering-international English Edition, Dec 1, 2006

The biologists now face with the masses of high dimensional datasets generated from various high-... more The biologists now face with the masses of high dimensional datasets generated from various high-throughput technologies, which are outputs of complex interconnected biological networks at different levels driven by a number of hidden regulatory signals. So far, many computational and statistical methods such as PCA and ICA have been employed for computing low-dimensional or hidden representations of these datasets, but in most cases the results are inconsistent with underlying real network. In this paper we have employed and compared three linear (PCA and ICA) and non-linear (MLP neural network) dimensionality reduction techniques to uncover these regulatory signals, from outputs of such networks. The three approaches were verified experimentally using the absorbance spectra of a network of seven hemoglobin solutions, and the results revealed the superiority of the MLP NN to PCA and ICA. This study shows the capability of the MLP NN approach to efficiently determine the regulatory components in biological networked systems.

Research paper thumbnail of A Regularized Least Squares-Based Method for Optimal Fusion of Speech Pitch Detection Algorithms

2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 2018

Fundamental frequency estimation is one of the most important issues in the field of speech proce... more Fundamental frequency estimation is one of the most important issues in the field of speech processing. An accurate estimate of the fundamental frequency plays a key role in the field of speech and music analysis. So far, various methods have been proposed in the time- and frequency-domain. However, the main challenge is the strong noises in speech signals. In this paper, to improve the accuracy of fundamental frequency estimation, we propose a method for optimal combination of fundamental frequency estimation methods, in noisy signals. In this method, to discriminate voiced frames from unvoiced frames in a better way, the Voiced/Unvoiced (V/U) scores of four pitch detection methods are combined linearly. These methods are: Autocorrelation, Yin, YAAPT and SWIPE. After identifying the Voiced/Unvoiced label of each frame, the fundamental frequency (F0) of the frame is estimated using the SWIPE method. The optimal coefficients for linear combination are determined using the regularized least squares method with Tikhonov regularization. To evaluate the proposed method, 10 speech files (5 female and 5 male voices) are selected from the PTDB-TUG standard database and the results are presented in terms of SDFPE, GPE, VDE, PTE and FFE standard error criteria. The results indicate that our proposed method relatively reduced the aforementioned criteria (averaged in various SNRs) by 27.13%, 22.14%, 17.40%, and 26.74% respectively, which demonstrate the effectiveness of the proposed method in comparison to state-of-the-art methods.

Research paper thumbnail of A New Quantum-PSO Metaheuristic and Its Application to ARMA Modeling of Speech Spectrum

2018 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 2018

In speech signal representation models, autoregressive moving average (ARMA) modeling is used in ... more In speech signal representation models, autoregressive moving average (ARMA) modeling is used in various applications, such as feature extraction, signal coding, speech synthesis, and speech recognition. In this paper, a new method based on quantum-behaved particle swarm optimization (QPSO) is proposed for estimation of ARMA model coefficients. In the proposed algorithm called PMF-QPSO (probability mass function QPSO), by storing some of the last global best particles in memory, based on their fitnesses, they are given a chance to influence the motion of the next generation particles, which reduces the risk of stopping in local optima and increases the exploration of QPSO algorithm. Also, to ensure the stability of the estimated model, line spectral frequencies (LSF) are used as optimization parameters, and, accordingly, the truncated Laplace distribution is considered for the probability distribution of new particle locations. The implementation of the suggested algorithm in high-o...

Research paper thumbnail of Accuracy Improvement of All-Pole Spectrum Estimation Using Particle Swarm Optimization and Comparing with Classic Methods

pectrum estimation has many applications in digital signal processing. Parametric models for spec... more pectrum estimation has many applications in digital signal processing. Parametric models for spectrum estimation include AR model, MA model, and ARMA model. Normally, classic methods such as Durbin-Levinson or Burg method are used to calculate parameters of AR model. Clearly, there is a distance (error) between spectrum estimated using these algorithms and the actual signal spectrum. In this paper, Particle Swarm Optimization (PSO) method is used to reduce this error. Results show at least 40% improvement in decreasing error of estimated all-pole spectrum in comparison with classic spectral estimation methods.

Bidirectional Neural Network for Feature Compensation of Clean and Telephone Speech Signals

In this paper, we continue our previous work on nonlinear feature compensation of distortions in clean and telephone speech recognition systems. We have shown that a Bidirectional Neural Network (Bidi-NN) can compensate nonlinearly distorted components of feature vectors. In this study, we present a new effort to improve recognition accuracy on clean and telephone speech data by employing a two-stage feature compensation technique for recovering optimal (from a classification point of view) Log-Filter Bank Energies (LFBE). These new features are obtained by training a new Bidi-NN with compensated features and using the compensated features as the input data to the Bidi-NN. We also obtain MFCC features by applying the discrete cosine transform (DCT) to the compensated LFBE features. HMM phone models are trained on these modified features. By using the two-stage compensated features, we obtained an absolute improvement of 4.73% and 9.29% in phone recognition accuracy c...

Fast estimation of warping factor in the vocal tract length normalization using obtained scores of gender detection modeling

Speech emotion recognition using discriminative dimension reduction by employing a modified quantum-behaved particle swarm optimization algorithm

Multimedia Tools and Applications, 2019

In recent years, Speech Emotion Recognition (SER) has received considerable attention in the field of affective computing. In this paper, an improved system for SER is proposed. In the feature extraction step, a hybrid, high-dimensional, rich feature vector is extracted from both the speech signal and the glottal-waveform signal using techniques such as MFCC, PLPC, and MVDR. The prosodic features derived from the fundamental frequency (f0) contour are also added to this feature vector. The proposed system is based on a holistic approach that employs a modified quantum-behaved particle swarm optimization (QPSO) algorithm (called pQPSO) to estimate both the optimal projection matrix for feature-vector dimension reduction and the Gaussian Mixture Model (GMM) classifier parameters. Since the problem parameters lie in a limited range while the standard QPSO algorithm searches an infinite range, the QPSO is modified to use a truncated probability distribution, which makes the search more efficient. The system works in real time and is evaluated on three standard emotional speech databases: the Berlin database of emotional speech (EMO-DB), Surrey AudioVisual Expressed Emotion (SAVEE), and Interactive Emotional Dyadic Motion Capture (IEMOCAP). The proposed method improves the accuracy of the SER system compared to classical methods such as FA, PCA, PPCA, LDA, standard QPSO, wQPSO, and a deep neural network, and also outperforms many recent state-of-the-art approaches that use the same datasets.
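To make the search-range issue in the abstract concrete, here is a minimal sketch of the standard QPSO position update, not the paper's pQPSO. The standard update draws new positions from an unbounded distribution around an attractor, so this sketch simply clips each coordinate to the bounds as a stand-in. The paper's truncated probability distribution is a different mechanism and is not reproduced here. The function name `qpso_minimize`, the contraction-expansion value `beta=0.75`, and the sphere test function are assumptions chosen for illustration.

```python
import math
import random


def qpso_minimize(f, lower, upper, n_particles=20, n_iters=100,
                  beta=0.75, seed=0):
    """Standard QPSO with clipping to [lower, upper] (illustrative only)."""
    rng = random.Random(seed)
    dim = len(lower)
    xs = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
          for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        # Mean of all personal bests ("mbest").
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                # Local attractor between personal and global best.
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - xs[i][d]) * math.log(1.0 / u)
                x_new = attractor + step if rng.random() < 0.5 else attractor - step
                # Unbounded draw, clipped back into the feasible range.
                xs[i][d] = min(max(x_new, lower[d]), upper[d])
            val = f(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = qpso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
```

Clipping concentrates probability mass on the boundary whenever a draw lands outside the range, which is one reason a properly truncated sampling distribution can be preferable.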

A survey on spectral methods in spoken language identification

Signal and Data Processing, 2017

A Comparison of PCA, ICA and Neural Network-based Approaches for Determination of Regulatory Signals in Biological Systems

Fast communication: Bernoulli versus Markov: Investigation of state transition regime in switching-state acoustic models

Signal Processing, Apr 1, 2009

An improved spectral subtraction speech enhancement system by using an adaptive spectral estimator

Canadian Conference on Electrical and Computer Engineering, 2005., 2005

Spectral subtraction is one of the best-known and most commonly used methods for speech enhancement. The main weakness of this method is the production of an annoying noise called musical noise. In this paper, we have reduced the musical noise and improved the quality of enhanced speech by increasing the accuracy of the system spectral estimator. This method is useful for

Applying continuous action reinforcement learning automata (CARLA) to global training of hidden Markov models

International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., 2004

In this research, we have employed global search and global optimization techniques based on Simulated Annealing (SA) and Continuous Action Reinforcement Learning Automata (CARLA) for global training of Hidden Markov Models. The main goal of this paper is to compare the CARLA method with other continuous global optimization methods such as SA. Experimental results show that CARLA outperforms SA. This is because CARLA is a continuous global optimization method with memory, whereas SA is memoryless.

Two-stage feature compensation of clean and telephone speech signals employing bidirectional neural network

10th International Conference on Information Science, Signal Processing and their Applications (ISSPA 2010), 2010

Noise and Transmission Channel Degradation Compensation and Score Normalization Using a Robust Hybrid Speaker Verification and Identification System

A robust speaker recognition system combining factor analysis techniques

2014 21th Iranian Conference on Biomedical Engineering (ICBME), 2014

In this paper, we implement state-of-the-art factor-analysis-based methods and fuse their scores to obtain a channel-robust speaker recognition system. The two methods are joint factor analysis (JFA) and i-Vector, which define low-dimensional speaker- and channel-dependent spaces. For score fusion, we propose a simple weight computation that requires no training step. We evaluate our method under two conditions: 1) a channel-matched task (telephone in both the training and test phases) and 2) a channel-mismatched task (telephone in the training phase; microphone, GSM, and VoIP in the test phase). Our strategies outperform a state-of-the-art GMM-UBM based system. We obtained more than 4% absolute EER improvement in both the channel-dependent and channel-independent conditions compared to the standard GMM-UBM based method. Simulation results also show that the combined i-Vector and JFA based system performs better than all of the other implemented methods.

A robust keyword spotting system for Persian conversational telephone speech using feature and score normalization and ARMA filter

2011 IEEE GCC Conference and Exhibition (GCC), 2011

Keyword spotting (KWS) refers to the detection of a limited number of given keywords in speech utterances. In this paper, we evaluate a robust keyword spotting system based on hidden Markov models for speaker-independent Persian conversational telephone speech. The performance of the baseline keyword spotter is improved by normalizing features using cepstral mean and variance normalization (CMVN) and cepstral gain normalization (CGN). Further improvement is gained by applying an auto-regressive moving average (ARMA) filter to the normalized features. Experimental results show that although all these methods improve keyword spotting performance, CMVN and ARMA (MVA) processing of PLP features works much better on our Persian conversational telephone speech database, and a 41% improvement over the baseline system is achieved at a false alarm (FA) rate of 8.6 FA/KW/Hour.
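The MVA processing chain mentioned above can be sketched on a single feature trajectory as follows. This is a minimal illustration under assumptions, not the system evaluated in the paper: the ARMA filter form (each output averages the previous `order` outputs with the current and next `order` inputs), the choice to leave edge frames unfiltered, and the names `cmvn` and `arma_smooth` are all illustrative.

```python
import math


def cmvn(track):
    """Normalize one feature trajectory to zero mean and unit variance."""
    n = len(track)
    mean = sum(track) / n
    var = sum((v - mean) ** 2 for v in track) / n
    std = math.sqrt(var) if var > 0.0 else 1.0  # guard constant tracks
    return [(v - mean) / std for v in track]


def arma_smooth(track, order=2):
    """Low-pass ARMA smoothing along time.

    out[t] = (out[t-1] + ... + out[t-order]
              + track[t] + ... + track[t+order]) / (2 * order + 1)
    The first and last `order` frames are passed through unchanged.
    """
    n = len(track)
    out = list(track)
    for t in range(order, n - order):
        past = sum(out[t - i] for i in range(1, order + 1))
        future = sum(track[t + j] for j in range(0, order + 1))
        out[t] = (past + future) / (2 * order + 1)
    return out


# One cepstral coefficient over eight frames (made-up values).
trajectory = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 2.0]
mva_features = arma_smooth(cmvn(trajectory), order=2)
```

In a full front end the same two steps would be applied independently to every coefficient of the PLP feature vector, typically per utterance.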

A new method for language recognition based on improved GMM

2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), 2011

... Faculty of Engineering, Shahed University Persian Gulf Superhighway, Tehran, IR Iran 1iman.mousavian@gmail.com 3 smsadeghi2006@gmail.com 4 kabudian ... as the network input, and the language four bit code as the output are presented to the MLP network and train it how ...