Dr. Sujan K U M A R Roy - Profile on Academia.edu

Papers by Dr. Sujan K U M A R Roy

Single channel speech enhancement using subband iterative Kalman filter

2016 IEEE International Symposium on Circuits and Systems (ISCAS), 2016

In this paper, we propose a single-channel speech enhancement algorithm using a subband iterative Kalman filter. A wavelet filterbank is first used to decompose the noise-corrupted speech into a number of subbands. To achieve the best tradeoff among noise reduction, speech intelligibility and computational complexity, a partial reconstruction scheme based on consecutive mean squared error is proposed to synthesize the low-frequency (LF) and high-frequency (HF) subbands. An iterative Kalman filter is then applied to the partially reconstructed HF subband speech. Finally, the enhanced HF subband speech is combined with the partially reconstructed LF subband speech to reconstruct the fullband enhanced speech. Experimental results show that the proposed subband iterative Kalman filter based algorithm reduces adverse environmental noise over a wide range of input SNRs. In terms of segmental SNR, perceptual evaluation of speech quality (PESQ) and computational cost, the overall performance of our method is superior to several existing Kalman filter based algorithms.
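The LF/HF split at the heart of this scheme can be illustrated with a one-level Haar analysis/synthesis pair. This is a minimal sketch under stated assumptions: the abstract does not name the wavelet, the number of decomposition levels, or the exact MSE-based partial reconstruction rule, so the Haar wavelet and the function names `haar_analysis`/`haar_synthesis` are illustrative choices only. The point it shows is that the two subbands reconstruct the frame exactly, so an enhancement stage can modify only the HF part.

```python
import math

def haar_analysis(x):
    """One-level Haar split of an even-length frame into LF and HF subbands (assumed wavelet)."""
    if len(x) % 2:
        raise ValueError("frame length must be even")
    s = 1.0 / math.sqrt(2.0)
    lf = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]  # approximation
    hf = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]  # detail
    return lf, hf

def haar_synthesis(lf, hf):
    """Inverse of haar_analysis: rebuild each pair of samples from (LF, HF)."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(lf, hf):
        x.append(s * (a + d))
        x.append(s * (a - d))
    return x

frame = [0.5, 0.1, -0.3, 0.8, 0.0, -0.6]
lf, hf = haar_analysis(frame)
# An enhancement stage (e.g. a Kalman filter) would replace hf here; it is left unchanged.
rebuilt = haar_synthesis(lf, hf)
print(max(abs(a - b) for a, b in zip(frame, rebuilt)))  # round-off level only
```

Because the Haar pair is orthogonal, any processing applied to `hf` alone leaves the LF content untouched after synthesis, which mirrors the idea of enhancing only the HF subband.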

The quality and intelligibility of speech conversation are generally degraded by surrounding noise. The main objective of speech enhancement (SE) is to eliminate or reduce such disturbing noise in the degraded speech. Various SE methods have been proposed in the literature. Among them, the Kalman filter (KF) is known to be an efficient SE method based on the minimum mean square error (MMSE) criterion. However, most conventional KF-based speech enhancement methods need access to clean speech and additive noise information to obtain the state-space model parameters, namely the linear prediction coefficients (LPCs) and the additive noise variance. This is impractical because only the noisy speech is available in practice. Moreover, it is quite difficult to estimate these model parameters efficiently in the presence of adverse environmental noise. Therefore, the main focus of this thesis is to develop single channel speech enhancement algorithms using Kalman filter...
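To make the role of the LPCs and noise variance concrete, here is a minimal sketch of a standard (non-iterative, fullband) Kalman filter for speech modelled as an AR(p) process in additive white noise. It assumes the parameters are already known, which is exactly what the thesis says is unrealistic; the function name `kalman_ar_denoise`, the sign convention s(n) = Σ a_k s(n-k) + w(n), and the synthetic test signal are illustrative assumptions, not the thesis's method.

```python
import random

def kalman_ar_denoise(y, a, q, r):
    """Filter noisy samples y, assuming s(n) = sum_k a[k-1]*s(n-k) + w(n) with var(w) = q
    and y(n) = s(n) + v(n) with var(v) = r. Returns the filtered estimates of s(n)."""
    p = len(a)
    x = [0.0] * p                      # state [s(n), s(n-1), ..., s(n-p+1)]
    P = [[0.0] * p for _ in range(p)]  # state error covariance
    out = []
    for yn in y:
        # Predict the state: new sample from the LPCs, older samples shift down.
        x = [sum(ai * xi for ai, xi in zip(a, x))] + x[:-1]
        # Predict the covariance P = F P F^T + Q without forming the companion matrix F.
        FP = [[sum(a[k] * P[k][j] for k in range(p)) for j in range(p)]]
        FP += [P[i - 1][:] for i in range(1, p)]
        P = [[sum(FP[i][m] * a[m] for m in range(p))] + FP[i][:-1] for i in range(p)]
        P[0][0] += q
        # Update with the scalar measurement, H = [1, 0, ..., 0].
        s_var = P[0][0] + r
        K = [P[i][0] / s_var for i in range(p)]   # Kalman gain
        innov = yn - x[0]
        x = [xi + ki * innov for xi, ki in zip(x, K)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(p)] for i in range(p)]
        out.append(x[0])
    return out

# Synthetic check: a stable AR(2) "speech" signal in white noise.
random.seed(0)
a_true, q, r = [1.3, -0.6], 1.0, 1.0
s = [0.0, 0.0]
for _ in range(2000):
    s.append(a_true[0] * s[-1] + a_true[1] * s[-2] + random.gauss(0.0, q ** 0.5))
s = s[2:]
y = [sn + random.gauss(0.0, r ** 0.5) for sn in s]
s_hat = kalman_ar_denoise(y, a_true, q, r)
mse_noisy = sum((yn - sn) ** 2 for yn, sn in zip(y, s)) / len(s)
mse_filt = sum((hn - sn) ** 2 for hn, sn in zip(s_hat, s)) / len(s)
print(round(mse_noisy, 3), round(mse_filt, 3))
```

With the true parameters the filtered error should fall below the noisy-input error; feeding it biased LPCs or a wrong `r` is one way to see the biased Kalman gain that the later papers target.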

Robust Pitch Estimation using Ensemble Empirical Mode Decomposition

Speech Prosody 2014, 2014

This paper presents an efficient pitch estimation algorithm for noisy speech signals using time-domain filtering based on ensemble empirical mode decomposition (EEMD). The dominant harmonic of the noisy speech is enhanced to make the pitch period more prominent. The normalized autocorrelation function (NACF) of the modified signal is then decomposed into time-varying subband signals using EEMD. In contrast to ordinary EMD, EEMD does not introduce mode mixing during decomposition. The subbands containing the pitch component are selected and separated, yielding a partially reconstructed signal from which the pitch period is determined. The experimental results show that the proposed algorithm performs better than other recently reported algorithms in noisy environments.

Pitch estimation of noisy speech using ensemble empirical mode decomposition and dominant harmonic modification

2014 IEEE 27th Canadian Conference on Electrical and Computer Engineering (CCECE), 2014

This paper presents an efficient pitch estimation algorithm (PEA) using dominant harmonic modification (DHM) and ensemble empirical mode decomposition (EEMD). The noisy speech is first low-pass filtered within the range of fundamental frequencies (50-500 Hz) to obtain the pre-filtered signal (PFS). The pre-filtered signal is then modified by enhancing its dominant harmonic, followed by computation of the normalized autocorrelation function (NACF). Next, EEMD-based data-adaptive time-domain noise filtering is applied to the NACF. Finally, partial reconstruction is performed in the EEMD domain to determine the pitch period. Experimental evaluation shows that the proposed PEA outperforms some existing PEAs over a wide range of SNRs.
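The NACF stage can be sketched on its own: compute the normalized autocorrelation over the lags corresponding to 50-500 Hz and take the best lag as the pitch period. This is a minimal sketch that deliberately omits the paper's low-pass pre-filtering, dominant harmonic modification and EEMD filtering; the function names and the tie-breaking rule towards shorter lags (to avoid reporting a multiple of the period) are assumptions made for illustration.

```python
import math

def nacf(x, lag):
    """Normalized autocorrelation of frame x at lag (0 < lag < len(x))."""
    a, b = x[:-lag], x[lag:]
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return num / den if den > 0 else 0.0

def estimate_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Return fs / lag for the lag maximising the NACF in the [fmin, fmax] range.
    Near-ties go to the shorter lag so that multiples of the period are not chosen."""
    lag_min = int(math.ceil(fs / fmax))
    lag_max = min(int(fs / fmin), len(x) - 1)
    best_lag, best_val = lag_min, -2.0
    for lag in range(lag_min, lag_max + 1):
        v = nacf(x, lag)
        if v > best_val + 1e-9:
            best_lag, best_val = lag, v
    return fs / best_lag

# A 50 ms voiced-like frame at 8 kHz: 200 Hz fundamental plus a weaker 2nd harmonic.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) + 0.5 * math.sin(2 * math.pi * 400 * n / fs + 0.3)
     for n in range(400)]
print(estimate_pitch(x, fs))  # 200.0
```

On noisy frames the raw NACF peak becomes unreliable, which is the motivation for cleaning up the NACF (here by EEMD-based filtering) before peak picking.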

Adaptive thresholding approach for robust voiced/unvoiced classification

2011 IEEE International Symposium of Circuits and Systems (ISCAS), 2011

This paper presents a robust voiced/unvoiced classification method using a linear model of empirical mode decomposition (EMD) controlled by the Hurst exponent. EMD decomposes any signal into a finite number of band-limited signals called intrinsic mode functions (IMFs). It is assumed that a voiced speech signal is composed of a trend due to vocal cord vibration and some noise. No trend is

International Journal of Speech Technology, 2011

A novel and robust pitch estimation method is presented in this paper. The basic idea is to reshape the speech signal using a combination of dominant harmonic modification (DHM) and data-adaptive time-domain filtering. The noisy speech signal is filtered within the range of fundamental frequencies to obtain the pre-filtered signal (PFS). The dominant harmonic (DH) of the PFS is determined and its amplitude is enhanced. The normalized autocorrelation function (NACF) of the modified signal is then computed, and empirical mode decomposition (EMD) based data-adaptive time-domain filtering is applied to the NACF. Partial reconstruction is performed in the EMD domain, and the pitch period is determined from the partially reconstructed signal. The experimental results show that the proposed method performs better than the other recently devel

Pitch estimation of noisy speech signals using EMD-fourier based hybrid algorithm

Proceedings of 2010 IEEE International Symposium on Circuits and Systems, 2010

This paper focuses on a pitch estimation method for noisy speech signals that combines empirical mode decomposition (EMD) and the discrete Fourier transform (DFT). The noisy speech signal is filtered within the range of the fundamental frequency. Normalized ...

2020 14th International Conference on Signal Processing and Communication Systems (ICSPCS), 2020

Speech enhancement using the augmented Kalman filter (AKF) suffers from biased estimates of the linear prediction coefficients (LPCs) of the speech and noise signals in noisy conditions. The existing AKF was designed specifically to enhance speech corrupted by colored noise. In this paper, a causal convolutional encoder-decoder (CCED)-based method is used to obtain the LPC estimates of the AKF for speech enhancement. Specifically, a CCED network estimates the instantaneous noise spectrum, from which the noise LPCs are computed on a framewise basis. Each noise-corrupted speech frame is pre-whitened by a whitening filter constructed with the noise LPCs, and the speech LPCs are computed from the pre-whitened speech. The improved speech and noise LPCs enable the AKF to minimize both the residual noise and the distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions over a wide range of SNR levels. Index Terms: speech enhancement, augmented Kalman filter, convolutional neural network, LPC, whitening filter.
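The pre-whitening step can be sketched as an FIR inverse filter built from the noise LPCs, A(z) = 1 - Σ a_k z^-k. This minimal sketch assumes the noise LPCs are already available (in the paper they come from the CCED noise estimate), uses the prediction convention v(n) = Σ a_k v(n-k) + e(n), and treats samples before the frame start as zero; the function name `whiten` is hypothetical.

```python
import random

def whiten(frame, noise_lpcs):
    """Apply the FIR whitening filter A(z) = 1 - sum_k a_k z^-k to one frame.
    Samples before the frame start are taken as zero."""
    out = []
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - k] for k, a in enumerate(noise_lpcs, start=1) if n - k >= 0)
        out.append(x - pred)  # prediction residual
    return out

# Demo: AR(1) coloured noise is mapped back to its white driving sequence.
random.seed(1)
a = [0.9]
e = [random.gauss(0.0, 1.0) for _ in range(256)]
v = []
for n, en in enumerate(e):
    v.append(en + (a[0] * v[n - 1] if n > 0 else 0.0))
w = whiten(v, a)
print(max(abs(wi - ei) for wi, ei in zip(w, e)))  # round-off level only
```

Applied to noisy speech, the same filter flattens the noise spectrum so that the speech LPCs computed afterwards are less influenced by the coloured noise.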

A Non-Iterative Kalman Filter for Single Channel Speech Enhancement in Non-Stationary Noise Condition

2018 12th International Conference on Signal Processing and Communication Systems (ICSPCS), 2018

This paper presents a non-iterative Kalman filter (NIT-KF) for single channel speech enhancement in non-stationary noise conditions (NNC). To adapt the NIT-KF to NNC, we address the adjustment of the biased Kalman gain through efficient parameter estimation. We introduce an effective noise spectrum tracking method based on the decision-directed approach (DDA), controlled by the a posteriori SNR and a speech activity detector (SAD). With the estimated noise spectrum, the spectral over-subtraction (SOS) algorithm is applied to the noisy speech, giving a pre-filtered speech (PFS) signal. The noise variance and LPCs are computed from the estimated noise and the PFS, respectively, and are applied to the NIT-KF to produce the enhanced speech. It is shown that the adjusted Kalman gain in the NIT-KF is effective in reducing the additive noise to an acceptable level. Extensive simulation results reveal that the proposed method outperforms other benchmark methods.
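The pre-filtering chain (track the noise spectrum, then over-subtract it) can be sketched per frame on power spectra. This is a minimal sketch with loudly stated substitutions: a simple a-posteriori-SNR-gated recursive average stands in for the paper's decision-directed tracker and SAD, and the over-subtraction factor `alpha`, the spectral `floor`, the `smoothing` constant and the `gamma_thresh` threshold are illustrative values, not the paper's.

```python
def update_noise_psd(noise_psd, noisy_psd, smoothing=0.9, gamma_thresh=2.0):
    """Update the noise PSD only when the frame looks speech-free, judged by the mean
    a posteriori SNR gamma_k = |Y_k|^2 / lambda_k (a crude stand-in for a SAD)."""
    gammas = [y / max(n, 1e-12) for y, n in zip(noisy_psd, noise_psd)]
    if sum(gammas) / len(gammas) < gamma_thresh:      # speech absent: track the noise
        return [smoothing * n + (1.0 - smoothing) * y for n, y in zip(noise_psd, noisy_psd)]
    return list(noise_psd)                            # speech present: freeze the estimate

def spectral_over_subtraction(noisy_psd, noise_psd, alpha=3.0, floor=0.01):
    """Power over-subtraction with a spectral floor:
    |S_k|^2 = max(|Y_k|^2 - alpha * lambda_k, floor * lambda_k)."""
    return [max(y - alpha * n, floor * n) for y, n in zip(noisy_psd, noise_psd)]

noise = [1.0, 1.0, 1.0, 1.0]
noise = update_noise_psd(noise, [1.2, 0.8, 1.1, 0.9])   # quiet frame: estimate updated
noise = update_noise_psd(noise, [10.0, 6.0, 1.0, 0.5])  # speech frame: estimate frozen
pfs = spectral_over_subtraction([10.0, 6.0, 1.0, 0.5], noise)
print([round(p, 4) for p in pfs])
```

Over-subtracting (`alpha` > 1) trades some speech distortion for less residual noise, and the floor keeps bins from going to zero, which limits musical noise in the pre-filtered signal.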

2020 14th International Conference on Signal Processing and Communication Systems (ICSPCS), 2020

The state-of-the-art robustness metric-based tuning of the augmented Kalman filter (AKF) gives an under-estimated Kalman gain, resulting in distortion in the enhanced speech during colored noise suppression. This paper introduces a sensitivity metric-based tuning of the AKF for enhancing speech corrupted by various noise types. Specifically, we observe that sensitivity metric-based tuning of the AKF overcomes the Kalman gain under-estimation issue of the existing method. It is shown that the reduced-bias Kalman gain enables the AKF to restrict the residual noise passed to the enhanced speech while also minimizing distortion. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in colored and non-stationary noise conditions over a wide range of SNR levels.

Interspeech 2020, 2020

The existing Kalman filter (KF) suffers from poor estimates of the noise variance and the linear prediction coefficients (LPCs) in real-world noise conditions, which degrades speech enhancement performance. In this paper, a deep learning approach is used to estimate the noise variance and LPCs more accurately, enabling the KF to enhance speech in various noise conditions. Specifically, a deep learning approach to MMSE-based noise power spectral density (PSD) estimation, called DeepMMSE, is used. The estimated noise PSD is used to compute the noise variance. We also construct a whitening filter with coefficients computed from the estimated noise PSD. It is applied to the noisy speech, yielding pre-whitened speech from which the LPCs are computed. The improved noise variance and LPC estimates enable the KF to minimise the residual noise and distortion in the enhanced speech. Experimental results show that the proposed method produces enhanced speech of higher quality and intelligibility than the benchmark methods in various noise conditions over a wide range of SNR levels.

Current deep learning approaches to linear prediction coefficient (LPC) estimation for the augmented Kalman filter (AKF) produce biased estimates due to the use of a whitening filter. This severely degrades the perceived quality and intelligibility of the enhanced speech produced by the AKF. In this paper, we propose a deep learning framework that produces clean speech and noise LPC estimates with significantly less bias than previous methods by avoiding the use of a whitening filter. The proposed framework, called DeepLPC, jointly estimates the clean speech and noise LPC power spectra. The estimated clean speech and noise LPC power spectra are passed through the inverse Fourier transform to form autocorrelation matrices, which are then solved by the Levinson-Durbin recursion to form the LPCs and prediction error variances of the speech and noise for the AKF. The performance of DeepLPC is evaluated on the NOIZEUS and DEMAND Voice Bank datasets using subjective AB listening tests, as wel...
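The post-processing chain described here (power spectrum, then inverse Fourier transform to an autocorrelation sequence, then Levinson-Durbin) can be sketched in a few lines. In this minimal sketch an analytic AR(1) LPC power spectrum σ²/|1 - a e^{-jω}|² stands in for the network's estimate, the spectrum is assumed to be sampled at N uniformly spaced frequencies covering the full circle, and the convention x(n) = Σ a_k x(n-k) + e(n) is assumed; the function names are illustrative.

```python
import math

def psd_to_autocorr(psd, max_lag):
    """Inverse DFT of a real, even power spectrum sampled at omega_k = 2*pi*k/N
    (k = 0..N-1). Returns the autocorrelation r[0..max_lag]."""
    n = len(psd)
    return [sum(p * math.cos(2.0 * math.pi * k * lag / n) for k, p in enumerate(psd)) / n
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations. Returns (a, err) for the model
    x(n) = sum_k a[k-1] x(n-k) + e(n), with err the prediction error variance."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err                         # reflection coefficient
        a_new = a[:]
        a_new[i] = k
        for j in range(i):
            a_new[j] = a[j] - k * a[i - 1 - j]
        a = a_new
        err *= 1.0 - k * k
    return a, err

a_true, sigma2, n_bins = 0.8, 0.5, 1024
psd = [sigma2 / (1.0 - 2.0 * a_true * math.cos(2.0 * math.pi * k / n_bins) + a_true ** 2)
       for k in range(n_bins)]
r = psd_to_autocorr(psd, max_lag=2)
lpcs, err_var = levinson_durbin(r, order=2)
print([round(c, 6) for c in lpcs], round(err_var, 6))  # close to [0.8, 0.0] and 0.5
```

Because an LPC power spectrum already has the all-pole form, this round trip recovers the coefficients and the prediction error variance directly, with no whitening filter in the loop.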

Signals, 2021

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias in the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods tune the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate the noise in each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in the KF gain during speech absence and the sensitivity metric does so during speech presence, to achieve better noise reduction. The noise variance and the speech model parameters are adopted...

International Journal of Signal Processing Systems, 2019

This paper presents an iterative Kalman filter (IT-KF) with a reduced-bias Kalman gain for single channel speech enhancement in non-stationary noise conditions (NNCs). The proposed IT-KF aims to offset the bias in the Kalman gain through efficient parameter estimation, leading to improved speech enhancement performance. To do this, we introduce a decision-directed (DD) and a posteriori SNR-based noise variance estimation method controlled by a speech activity detector (SAD). The proposed SAD uses majority voting over three distinct SAD fusions. The LPC parameters are computed from pre-smoothed noisy speech. With these initial parameter estimates, the IT-KF processes the noisy speech in a first iteration. The parameters are then re-estimated from the processed speech to readjust the Kalman gain, and the process is repeated in a second iteration. It is shown that the adjusted Kalman gain enables the IT-KF to minimize the remaining artifacts of the processed speech, yielding the enhanced speech. Extensive simulation results reveal that the proposed method outperforms other benchmark methods in NNCs over a wide range of SNRs.

The performance of speech coding, speech recognition, and speech enhancement largely depends in practice upon the accuracy of the linear prediction coefficients (LPCs) of clean speech and noise. Formulating speech and noise LPC estimation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised technique, typically a deep neural network (DNN), is trained to learn a mapping from noisy speech features to clean speech and noise LPCs. Training targets for DNN-based clean speech and noise LPC estimation fall into four categories: line spectrum frequency (LSF), LPC power spectrum (LPC-PS), power spectrum (PS), and magnitude spectrum (MS). The choice of training target, as well as of the DNN method, can have a significant impact on LPC estimation in practice. Motivated by this, we perform a comprehensive study on the training targets using two state-of-the-art DNN methods: residual network and temporal convolutional network (ResNet-TCN) and ...

IEEE Access, 2021

Current augmented Kalman filter (AKF)-based speech enhancement algorithms utilise a temporal convolutional network (TCN) to estimate the clean speech and noise linear prediction coefficients (LPCs). However, the multi-head attention network (MHANet) has demonstrated the ability to model the long-term dependencies of noisy speech more efficiently than TCNs. Motivated by this, we investigate the MHANet for LPC estimation. We aim to produce clean speech and noise LPC parameters with the least bias to date and, with this, higher quality and more intelligible enhanced speech than any current KF or AKF-based speech enhancement algorithm (SEA). To this end, we investigate MHANet within the DeepLPC framework, a deep learning framework for jointly estimating the clean speech and noise LPC power spectra. DeepLPC is selected because it exhibits significantly less bias than other frameworks by avoiding the use of whitening filters and post-processing. DeepLPC-MHANet is evaluated on the NOIZEUS corpus using subjective AB listening tests, as well as seven different objective measures (CSIG, CBAK, COVL, PESQ, STOI, SegSNR, and SI-SDR), and is compared to five existing deep learning-based methods. Compared to the other deep learning approaches, DeepLPC-MHANet produced clean speech LPC estimates with the least amount of bias. DeepLPC-MHANet-AKF also produced higher objective scores than any of the competing methods (with an improvement of 0.17 for CSIG, 0.15 for CBAK, 0.19 for COVL, 0.24 for PESQ, 3.70% for STOI, 1.03 dB for SegSNR, and 1.04 dB for SI-SDR over the next best method). The enhanced speech produced by DeepLPC-MHANet-AKF was also the most preferred among ten listeners.
By producing LPC estimates with the least amount of bias to date, DeepLPC-MHANet enables the AKF to produce enhanced speech of higher quality and intelligibility than any previous KF or AKF-based method. Index Terms: speech enhancement, Kalman filter, augmented Kalman filter, LPC, temporal convolutional network, multi-head attention network.
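Two of the objective measures listed, SI-SDR and SegSNR, are simple enough to sketch directly. This is a minimal sketch following common definitions, not the evaluation code used in the paper: the SI-SDR here does not remove the signal means (some implementations do), and the SegSNR frame length of 256 samples and the [-10, 35] dB per-frame clipping are common conventions assumed for illustration.

```python
import math

def si_sdr(ref, est):
    """Scale-invariant SDR in dB: project est onto ref and compare target vs. residual energy."""
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]
    resid = [e - t for e, t in zip(est, target)]
    return 10.0 * math.log10(sum(t * t for t in target) / sum(x * x for x in resid))

def seg_snr(ref, est, frame_len=256, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: mean of per-frame SNRs, each clipped to [lo, hi]."""
    snrs = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        s = ref[start:start + frame_len]
        e = [a - b for a, b in zip(s, est[start:start + frame_len])]
        num = sum(x * x for x in s)
        den = sum(x * x for x in e)
        snr = 10.0 * math.log10(max(num, 1e-12) / max(den, 1e-12))
        snrs.append(min(max(snr, lo), hi))
    return sum(snrs) / len(snrs)

ref = [1.0, 0.0, 1.0, 0.0]
est = [1.0, 0.1, 1.0, -0.1]
print(si_sdr(ref, est), si_sdr(ref, [3.0 * x for x in est]))  # both 20 dB: scale-invariant
print(seg_snr([1.0] * 8, [1.1] * 4 + [1.01] * 4, frame_len=4))  # (20 + 35) / 2 after clipping
```

The scale invariance is why SI-SDR is often preferred over plain SNR for comparing enhancement methods whose outputs differ in overall gain.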

2020 IEEE International Symposium on Circuits and Systems (ISCAS), 2020

The existing augmented Kalman filter (AKF) suffers from poor LPC estimates in real-world noise conditions, which degrades speech enhancement performance. In this paper, a deep learning technique is used to obtain the LPC estimates for the AKF to enhance speech in various noise conditions. Specifically, a deep residual network is used to estimate the noise PSD for computing the noise LPCs. A whitening filter constructed with the noise LPCs is also used to pre-whiten the noisy speech signal prior to estimating the speech LPCs. It is shown that the improved speech and noise LPCs enable the AKF to minimize both the residual noise and the distortion in the enhanced speech. Experimental results show that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than the benchmark methods in various noise conditions over a wide range of SNR levels.

Speech corrupted by background noise (noisy speech) can reduce the efficiency of human-to-human and human-to-machine communication. A speech enhancement algorithm (SEA) can be used to suppress the embedded background noise and increase the quality and intelligibility of noisy speech. Many applications, such as speech communication systems, hearing aid devices, and speech recognition systems, typically rely upon speech enhancement algorithms for robustness. This dissertation focuses on single-channel speech enhancement using Kalman filtering with machine learning methods. In Kalman filter (KF)-based speech enhancement, each clean speech frame is represented by an auto-regressive (AR) process, whose parameters comprise the linear prediction coefficients (LPCs) and the prediction error variance. The LPC parameters and the additive noise variance are used to form the recursive equations of the KF. In the augmented KF (AKF), both the clean speech and additive noise LPC parameters are incorporated...
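The difference between the KF and the AKF lies in the state-space model. A minimal sketch, assuming speech and noise are both modelled as AR processes with the convention x(n) = Σ a_k x(n-k) + e(n): the AKF state stacks the last p speech samples and the last q noise samples, the transition matrix is block-diagonal with one companion block each, and the observation is their sum, so no separate measurement-noise term remains. Only the model matrices are built here; the function names are hypothetical, and the usual KF predict/update recursion would then run on (F, H, Q) with zero measurement noise.

```python
def companion(lpcs):
    """Companion (transition) matrix for x(n) = sum_k lpcs[k-1] x(n-k) + e(n)."""
    p = len(lpcs)
    F = [[0.0] * p for _ in range(p)]
    F[0] = [float(c) for c in lpcs]
    for i in range(1, p):
        F[i][i - 1] = 1.0      # shift older samples down the state vector
    return F

def akf_model(speech_lpcs, speech_var, noise_lpcs, noise_var):
    """Augmented model: state = [s(n)..s(n-p+1), v(n)..v(n-q+1)], y(n) = s(n) + v(n)."""
    p, q = len(speech_lpcs), len(noise_lpcs)
    F = [[0.0] * (p + q) for _ in range(p + q)]
    for i, row in enumerate(companion(speech_lpcs)):
        F[i][:p] = row         # speech block (top-left)
    for i, row in enumerate(companion(noise_lpcs)):
        F[p + i][p:] = row     # noise block (bottom-right)
    H = [1.0] + [0.0] * (p - 1) + [1.0] + [0.0] * (q - 1)
    Q = [[0.0] * (p + q) for _ in range(p + q)]
    Q[0][0], Q[p][p] = speech_var, noise_var   # excitation variances
    return F, H, Q

F, H, Q = akf_model([1.3, -0.6], 0.1, [0.5], 0.2)
for row in F:
    print(row)
print(H, [Q[i][i] for i in range(3)])
```

Because the noise has its own AR model inside the state, coloured noise is handled explicitly, which is why accurate noise LPCs matter as much as speech LPCs for the AKF.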

2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2020

Speech enhancement using the augmented Kalman filter (AKF) suffers from inaccurate estimates of its key parameters, the linear prediction coefficients (LPCs) of the speech and noise signals, in noisy conditions. The existing AKF is particularly suited to enhancing speech in colored noise conditions. In this paper, a deep residual network (ResNet)-based method is used to obtain the LPC estimates of the AKF for speech enhancement in various noise conditions. Specifically, a ResNet20 (constructed with 20 layers) estimates the noise waveform for each noisy speech frame, from which the noise LPC parameters are computed. Each noisy speech frame is pre-whitened by a whitening filter constructed with the corresponding noise LPCs, and the speech LPC parameters are computed from the pre-whitened speech. The improved speech and noise LPC parameters enable the AKF to minimize both residual noise and distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the proposed method exhibi...

2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), 2020

Speech enhancement using the Kalman filter (KF) suffers from inaccurate estimates of the noise variance and the linear prediction coefficients (LPCs) in real-life noise conditions, which degrades speech enhancement performance. In this paper, a causal convolutional neural network (CCNN) model is used to estimate the noise variance and LPC parameters of the KF more accurately for speech enhancement in real-life noise conditions. Specifically, the CCNN model gives an instantaneous estimate of the noise waveform for each noisy speech frame, from which the noise variance is computed. Each noisy speech frame is pre-whitened by a whitening filter constructed with coefficients computed from the estimated noise, and the LPC parameters are computed from the pre-whitened speech. The improved noise variance and LPCs enable the KF to minimize both residual noise and distortion in the enhanced speech. Objective and subjective testing on the NOIZEUS corpus reveals that the enhanced speech produced b...