Hamid Reza Abutalebi | Yazd University
Papers by Hamid Reza Abutalebi
2010 5th International Symposium on Telecommunications, 2010
This paper describes the blind extraction of target speech sources using Independent Component Analysis (ICA) in the frequency domain. The mixtures are convolutive, and the target sources are assumed to be close to the sensors and to be non-Gaussian; no information is available about the position or active time of each source. We determine the number of target sources, solve the permutation problem by basis vector clustering, and finally extract the target sources in the frequency domain.
7th International Symposium on Telecommunications (IST'2014), 2014
Kernel-based methods have been widely used in various machine learning tasks. The performance of these methods relies strongly on the choice of the kernel, which represents the similarity between each pair of data points. Choosing an appropriate kernel function, or tuning its parameter(s), is therefore an important issue in kernel-based methods. Multiple Kernel Learning (MKL) methods have been developed to tackle this problem by learning an optimal combination of a set of predefined kernels. Distance Metric Learning (DML) approaches have also attracted the attention of a number of researchers as a way to find an optimum metric automatically. In this paper, within the framework of the SVM classifier, we present an MKL method based on distance metric learning theory. The method is compared with other popular MKL approaches. We show that the MKL methods generally outperform the best single kernel.
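The kernel combination step that MKL methods build on can be sketched as follows. This is a minimal illustration only: the RBF base kernels, the `gamma` values, and the hand-set weights are assumptions made for the example, and the sketch does not reproduce the metric-learning-based weight estimation of the paper.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gram matrix of the RBF kernel exp(-gamma * ||xi - xj||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negative round-off

def combine_kernels(kernels, weights):
    """Convex combination of Gram matrices (weights are normalised to sum to 1)."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Hypothetical data and weights, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
base_kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
K = combine_kernels(base_kernels, [1.0, 2.0, 1.0])
```

Because a non-negative combination of positive semidefinite Gram matrices is itself positive semidefinite, `K` can be passed to any SVM implementation that accepts a precomputed kernel.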
Signal and Data Processing, 2017
In this paper, a robust distant-talking speech recognition system for a home automation application is presented, based on a microphone array. We use commands such as turning on/off or opening/closing to control different devices in a home or office. By combining different enhancement methods such as sub-array beamforming, spectrum enhancement, and feature-domain enhancement, we increase the word recognition
In this paper we present an optimal estimator of the magnitude spectrum for speech enhancement when the clean speech DFT coefficients are modeled by a Laplacian distribution and the noise DFT coefficients are modeled by a Gaussian distribution. Chen has already introduced a Minimum Mean Square Error (MMSE) estimator of the magnitude spectrum. However, that estimator, named LapMMSE, does not have a closed form and is computationally expensive. We use his formulation of the MMSE estimator, employ some approximations, and propose a computationally efficient estimator of the magnitude spectrum. Experimental studies demonstrate better performance of our proposed estimator, Improved LapMMSE (ImpLapMMSE), compared to LapMMSE and to previous estimators in which Laplacian and Gaussian assumptions were made.
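Proceedings of the International Conference of the Korean Society for Noise and Vibration Engineering (한국소음진동공학회 국제학술발표논문집), Jul 1, 2008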
In recent years, Constant Directivity Beamformers (CDBs) have been extensively employed in speech enhancement applications. In this research, by extending previous work on CDBs and considering multiple desired sources, we have implemented a structure that preserves the system response (beam pattern) in multiple look directions. The system also adaptively minimizes transient noise power at the output of the beamformer and produces controlled nulls (controlled in both amplitude and angle) in the beam pattern. This strengthens the system in removing permanent directional noises and in producing a frequency-invariant beam pattern with multiple main lobes (for capturing multiple desired signals) and controlled nulls in an arbitrary frequency band. We have evaluated the capability of the proposed method in enhancing broadband and telephony speech in the presence of various noise sources. Our extensive experiments demonstrate the efficiency of the proposed method in suppressing environmental noise.
IET Signal Processing, Dec 1, 2014
Time difference of arrival (TDOA)-based techniques are a main category of speaker localisation methods. In a large subcategory of these methods, the generalised cross-correlation (GCC) is employed for TDOA estimation. In this study, the authors propose a subband processing-based method that computes the GCC of the microphone pairs in each subband. The information collected from the different subbands is then combined to estimate the directions of two simultaneous speakers. While conventional methods consider the whole signal spectrum in the localisation procedure, the proposed method takes advantage of the difference in the frequency contents of the speakers. The proposed method computes histograms of the peak positions of the GCC curve for each microphone pair in different subbands. These histograms are then fused using one of three proposed histogram averaging methods, called simple, sectional, and weighted averaging. The proposed method has been evaluated on simulated and real speech data in noisy, reverberant, and noisy-reverberant conditions. The evaluation results demonstrate the superiority of the proposed subband processing-based method over its fullband counterpart. The authors' experiments also show that, among the histogram averaging methods, weighted averaging gives the best performance in estimating the directions of the speakers.
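As a rough illustration of the GCC-based TDOA estimation that the abstract above takes as its starting point, the sketch below estimates the delay between two microphone signals with the widely used PHAT weighting. It is a fullband building block only: the subband decomposition and histogram fusion of the paper are not reproduced, and the function name `gcc_phat`, the sampling rate, and the synthetic test signal are assumptions for the example.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate how much x lags y (in seconds) using GCC with PHAT weighting."""
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)                       # cross-power spectrum
    R /= np.abs(R) + 1e-12                   # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)                # generalised cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs

# Synthetic check: x is y delayed by 5 samples (hypothetical values).
fs = 16000
rng = np.random.default_rng(0)
s = rng.normal(size=1024)
y = s
x = np.concatenate((np.zeros(5), s[:-5]))
tau = gcc_phat(x, y, fs)
```

A positive `tau` means the first signal arrives later than the second. A subband variant would apply the same function to band-limited versions of the two signals and collect the peak positions per band.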
IET Signal Processing, 2011
The authors present a novel method for reducing the late reverberation of speech signals in noisy environments. In this method, the amplitude of the clean signal is obtained by an adaptive estimator that minimises the mean square error (MSE) under signal presence uncertainty. The spectral gain function, which is an adaptive variable-order minimum MSE estimator, is obtained as a weighted geometric mean of the hypothetical gains associated with speech presence and absence. The order of the estimator is estimated for each time frame and each frequency component individually. The authors propose adapting the order of the estimator according to the probability of speech presence, which makes the estimation more accurate. The evaluations confirm the superiority of the proposed method in the dereverberation of speech signals in noisy environments.
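The weighted geometric mean mentioned in the abstract above can be written as G = G_presence^p * G_absence^(1-p), where p is the speech presence probability. The sketch below shows only this combination step; the names `gain_presence`, `gain_absence`, and `p_presence` are assumptions for illustration, and the variable-order estimator that would produce the presence gain is not reproduced.

```python
import numpy as np

def geometric_mean_gain(gain_presence, gain_absence, p_presence):
    """Combine per-bin gains as a geometric mean weighted by speech presence probability.

    All inputs broadcast against each other; gains must be strictly positive.
    """
    p = np.clip(np.asarray(p_presence, dtype=float), 0.0, 1.0)
    g1 = np.asarray(gain_presence, dtype=float)
    g0 = np.asarray(gain_absence, dtype=float)
    return g1 ** p * g0 ** (1.0 - p)

# Hypothetical values for three frequency bins.
gains = geometric_mean_gain(
    gain_presence=np.array([1.0, 1.0, 1.0]),
    gain_absence=0.01,                      # gain floor applied when speech is absent
    p_presence=np.array([1.0, 0.5, 0.0]),
)
```

With these inputs the gain moves from the presence gain (p = 1) through 0.1 (p = 0.5) down to the absence floor (p = 0), which is how the combination attenuates bins where speech is unlikely.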
In this paper, we consider a frequency tracking method based on Extended Kalman Filtering (EKF). The method uses a state-space model to estimate and track the frequency of a harmonic signal embedded in broadband noise. This signal is generally characterized by time-varying frequency and amplitude (a nonstationary noisy harmonic signal). We propose a new state-space model to estimate and track the frequency of such nonstationary signals. Conventional EKF frequency trackers are typically characterized by a vector of three tuning parameters. With some assumptions and without any performance loss, we propose an EKF frequency tracker that uses only one tuning parameter. This simplification allows easier and more transparent tuning of the EKF tracking behaviour.
The problem of speech enhancement using the wavelet thresholding algorithm is considered. Major problems in applying the basic algorithm are discussed, and modifications are proposed to improve the method. First, we propose the use of different thresholds for different wavelet bands. Next, by employing a pause detection algorithm, the noise profile is estimated and the thresholds are adapted. This enables the modified enhancement system to handle colored and nonstationary noises. Finally, a wavelet-based voiced/unvoiced classification is proposed and implemented that can further improve the performance of the enhancement system. To evaluate the system performance, we have used real-life noise types such as multi-talker babble and low-pass noises. Subjective and objective evaluations show that the proposed system improves the performance of the wavelet thresholding algorithm.
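The first modification described above, a separate threshold for each wavelet band, can be sketched with standard soft thresholding. This is a minimal illustration under stated assumptions: the wavelet coefficients are taken as already computed and grouped by band, the threshold values are hypothetical, and the pause-detection-based threshold adaptation of the paper is not reproduced.

```python
import numpy as np

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`; smaller ones become zero."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - threshold, 0.0)

def threshold_bands(bands, thresholds):
    """Apply a separate soft threshold to each wavelet band."""
    if len(bands) != len(thresholds):
        raise ValueError("need exactly one threshold per band")
    return [soft_threshold(b, t) for b, t in zip(bands, thresholds)]

# Hypothetical coefficients for two bands and hand-picked thresholds.
bands = [np.array([3.0, -1.0, 0.5]), np.array([0.2, -0.8, 2.0])]
denoised = threshold_bands(bands, thresholds=[1.0, 0.5])
```

In an adaptive scheme, the per-band thresholds would be updated from noise estimates taken during speech pauses rather than fixed in advance.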
European Signal Processing Conference, Aug 23, 2010
The Steered Response Power-PHAse Transform (SRP-PHAT) method has already been proposed and investigated for sound source localization. Grid search methods can find the global maximum of the SRP, but they are so computationally expensive that they cannot be used in real-time applications. In this paper, we propose an SRP-based localization method that works in cascade with a DOA estimation module: first, the direction of the speaker is estimated by one of the DOA estimation methods; next, the search region is bounded to a fragment of space around the estimated direction; finally, SRP-PHAT computations and volume contraction methods (such as SRC and CFRC) are applied to this restricted region, which reduces the computational cost to a large extent. Using data collected from different speaker scenarios, we demonstrate the accuracy and speed gained by the proposed method.
This paper proposes a new robust adaptive beamformer applicable to microphone arrays. The proposed beamformer is a Generalized Sidelobe Canceller (GSC) with a single-channel noise reduction stage. The single-channel stage can be either an Optimally Modified Log-Spectral Amplitude (OMLSA) estimator or an Adaptive Minimum Mean-Square Error (AMMSE) spectral amplitude estimator. These hybrid structures, named GSC-OMLSA and GSC-AMMSE, improve the noise reduction of the GSC beamformer in environments with strong additive noise, where the GSC alone fails to work properly. The proposed algorithms are evaluated through both subjective and objective measures, such as segmental SNR and log-likelihood ratio distance. The results demonstrate that the GSC-OMLSA and GSC-AMMSE algorithms perform significantly better than the GSC alone.
IEEE Signal Processing Letters, 2004
The performance of Adaptive Noise Cancellation (ANC) degrades severely when uncorrelated noise components are present at the two inputs. Thus, practical diffuse background noises pose a serious problem for ANC systems. In this research, we propose a new hybrid system that integrates Subband Adaptive Filters (SAFs) and a Wiener filter. The hybrid system is implemented on an oversampled DFT filterbank that efficiently integrates the SAF and Wiener filter components in the frequency domain. Performance evaluation of the hybrid system in the presence of diffuse noise interference shows that the proposed system is superior to both the Wiener filter and the SAF subsystems.