Kalyan Banerjee | IIT Kanpur

Papers by Kalyan Banerjee

Comparing ANN and GMM in a voice conversion framework

In this paper, we present a comparative analysis of artificial neural networks (ANNs) and Gaussian mixture models (GMMs) for the design of a voice conversion system using line spectral frequencies (LSFs) as feature vectors. Both ANN-based and GMM-based models are explored to capture nonlinear mapping functions for modifying the vocal tract characteristics of a source speaker according to a desired target speaker. The LSFs represent the vocal tract transfer function of a particular speaker. The intonation patterns (pitch contour) are mapped using a codebook-based model at the segmental level. The energy profile of the signal is modified using a fixed scaling factor defined between the source and target speakers at the segmental level. Two methods of residual modification, residual copying and residual selection, are used to generate the target residual signal. The ANN-based and GMM-based voice conversion (VC) systems are evaluated using subjective and objective measures. The results indicate that the proposed ANN-based model using the LSF feature set may be used as an alternative to the state-of-the-art GMM-based models used to design voice conversion systems.
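As an illustration of the fixed energy scaling factor mentioned in the abstract, the sketch below shows one plausible reading: the factor is the square root of the ratio of average target energy to average source energy, applied to each source segment. The abstract does not give the formula, so this definition and the function names `segment_energy`, `fixed_energy_scale`, and `apply_scale` are assumptions made for illustration only.

```python
import math

def segment_energy(segment):
    """Mean squared amplitude of one segment (a list of samples)."""
    return sum(s * s for s in segment) / len(segment)

def fixed_energy_scale(source_segments, target_segments):
    """Hypothetical fixed scaling factor between two speakers.

    Assumption: the factor is sqrt(mean target energy / mean source energy),
    computed once from training segments of both speakers.
    """
    src = sum(segment_energy(s) for s in source_segments) / len(source_segments)
    tgt = sum(segment_energy(s) for s in target_segments) / len(target_segments)
    return math.sqrt(tgt / src)

def apply_scale(segment, scale):
    """Scale the amplitude of one source segment by the fixed factor."""
    return [s * scale for s in segment]

# Toy data: the source has mean energy 1.0, the target has mean energy 4.0.
source_segments = [[1.0, -1.0], [1.0, 1.0]]
target_segments = [[2.0, -2.0]]
scale = fixed_energy_scale(source_segments, target_segments)
converted = apply_scale([0.5, -0.5], scale)
```

Because the factor is fixed, it is computed once and reused for every segment of the source utterance.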

A pitch synchronous approach to design voice conversion system using source-filter correlation

We propose a pitch synchronous approach to designing a voice conversion system that takes into account the correlation between the excitation signal and the vocal tract system characteristics of the speech production mechanism. The glottal closure instants (GCIs), also known as epochs, are used as anchor points for analysis and synthesis of the speech signal. The Gaussian mixture model (GMM) is considered the state-of-the-art method for vocal tract modification in a voice conversion framework. However, GMM-based models generate overly smooth utterances and need to be tuned according to the amount of available training data. In this paper, we propose a support vector machine multi-regressor (M-SVR) based model that requires fewer tuning parameters to capture a mapping function between the vocal tract characteristics of the source and the target speaker. The prosodic features are modified using an epoch-based method and compared with the baseline pitch synchronous overlap and add (PSOLA) based method for pitch and time scale modification. The linear prediction residual (LP residual) signal corresponding to each frame of the converted vocal tract transfer function is selected from the target residual codebook using a modified cost function. The cost function is calculated from the mapped vocal tract transfer function and its dynamics, along with the minimum residual phase, pitch period and energy differences with respect to the codebook entries. The LP residual signal corresponding to the target speaker is generated by concatenating the selected frame and its previous frame so as to retain the maximum information around the GCIs. The proposed system is also tested using a GMM-based model for vocal tract modification. The average mean opinion score (MOS) and ABX test results are 3.95 and 85 for the GMM-based system and 3.98 and 86 for the M-SVR based system, respectively.
The subjective and objective evaluation results suggest that the proposed M-SVR based model for vocal tract modification, combined with modified residual selection and the epoch-based model for prosody modification, can provide a good-quality synthesized target output. The results also suggest that the proposed integrated system performs slightly better than the GMM-based baseline system designed using either the epoch-based or the PSOLA-based model for prosody modification.
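The residual selection step described in this abstract can be sketched as a nearest-entry search over a codebook under a weighted cost. The sketch below is not the authors' cost function: the abstract does not give its form or weights, so the weighted sum of squared differences, the equal default weights, the omission of the residual phase term, and the names `CodebookEntry`, `selection_cost`, and `select_residual` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class CodebookEntry:
    """One hypothetical entry of a target residual codebook."""
    lsf: List[float]        # vocal tract features of the frame
    delta_lsf: List[float]  # frame-to-frame dynamics of those features
    pitch_period: float     # pitch period of the frame, in samples
    energy: float           # frame energy
    residual: List[float]   # stored LP residual frame

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def selection_cost(entry, lsf, delta_lsf, pitch_period, energy,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Assumed cost: weighted sum of squared differences (weights are made up)."""
    w_lsf, w_delta, w_pitch, w_energy = weights
    return (w_lsf * sq_dist(entry.lsf, lsf)
            + w_delta * sq_dist(entry.delta_lsf, delta_lsf)
            + w_pitch * (entry.pitch_period - pitch_period) ** 2
            + w_energy * (entry.energy - energy) ** 2)

def select_residual(codebook, lsf, delta_lsf, pitch_period, energy):
    """Return the codebook entry with the lowest cost for the mapped frame."""
    return min(codebook, key=lambda e: selection_cost(
        e, lsf, delta_lsf, pitch_period, energy))

# Toy codebook with two entries; the query frame is closest to the second.
codebook = [
    CodebookEntry([0.1, 0.2], [0.0, 0.0], 80.0, 1.0, [0.5, -0.5]),
    CodebookEntry([0.3, 0.4], [0.01, 0.01], 100.0, 2.0, [0.2, -0.2]),
]
best = select_residual(codebook, [0.31, 0.39], [0.01, 0.0], 99.0, 2.1)
```

In practice the terms would need normalizing, since pitch period in samples dominates features on a 0 to 1 scale under equal weights.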

Removal of High Density Salt and Pepper Noise from Color Images through Variable Window Size

In this paper we propose a new, efficient algorithm for restoring both color and grayscale images affected by impulse noise. The algorithm works adaptively at different noise levels. Based on the noise density, a pixel is healed by calculating the mean of the surrounding healthy pixels in a window of the considered size, with the center pixel taken as the kernel of the window. The proposed algorithm restores pixels that are badly affected by fixed-valued impulse noise (0 and 255 in 8-bit encoding). The algorithm processes the whole image in a single pass and heals the affected kernel in the considered window by measuring successive amplitudes of pixels. The proposed algorithm also aims to preserve the edges of images, and thus the distortion measured at edges is sufficiently lower than with other proposed algorithms.
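The general idea of healing a noisy pixel with the mean of healthy neighbours in a variable-size window can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the rule used here for growing the window (enlarge it until at least one healthy pixel is found, up to `max_half`), the border handling, and the function name `restore_channel` are assumptions, and the abstract's "successive amplitudes" step is not modelled.

```python
def restore_channel(img, max_half=3):
    """Heal salt-and-pepper pixels in one 8-bit channel.

    img is a list of rows of ints in 0..255. A pixel equal to 0 or 255 is
    treated as noisy. Each noisy pixel is replaced by the mean of the healthy
    pixels in the smallest window (3x3, 5x5, ...) that contains any.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # single pass: always read from the input
    for y in range(height):
        for x in range(width):
            if img[y][x] not in (0, 255):
                continue  # healthy pixel, keep as is
            for half in range(1, max_half + 1):
                healthy = [
                    img[j][i]
                    for j in range(max(0, y - half), min(height, y + half + 1))
                    for i in range(max(0, x - half), min(width, x + half + 1))
                    if img[j][i] not in (0, 255)
                ]
                if healthy:
                    out[y][x] = round(sum(healthy) / len(healthy))
                    break  # stop growing once a healthy neighbour exists
    return out

# Low density: isolated noisy pixels are healed from the 3x3 window.
sparse = [[100, 100, 100],
          [100, 255, 100],
          [100,   0, 100]]
restored_sparse = restore_channel(sparse)

# High density: the 3x3 window around the centre is entirely noisy,
# so the window grows to 5x5 before healthy pixels are found.
dense = [[50, 50, 50, 50, 50],
         [50, 255, 0, 255, 50],
         [50, 0, 255, 0, 50],
         [50, 255, 0, 255, 50],
         [50, 50, 50, 50, 50]]
restored_dense = restore_channel(dense)
```

A color image could be handled by applying the same function to each channel separately, which is again an assumption rather than something the abstract states. Note that this sketch also treats genuine pure-black or pure-white pixels as noise.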
