Jozef Polacký - Academia.edu

Papers by Jozef Polacký

An Impact of Narrowband Speech Codec Mismatch on a Performance of GMM-UBM Speaker Recognition over Telecommunication Channel

Communications - Scientific letters of the University of Zilina, 2016

Over the past decades, Automatic Speaker Recognition (ASR) has become a very popular area of research in pattern recognition and machine learning. Researchers around the world have been working steadily to improve speaker recognition systems and to find more effective procedures that increase the recognition rate. ASR is a general term for both speaker identification and speaker verification tasks. The principles of speaker identification and verification are shown in Figs. 1 and 2, respectively.

Robust Speaker Verification over Narrowband and Wideband Communication Channels

Modern speaker recognition applications involve the authentication of users by their voices. A wide range of systems require reliable personal recognition techniques to either determine or confirm the identity of a person requesting their services. The main purpose of these techniques is to ensure that the provided services are accessed only by a legitimate user and no one else. Voice biometrics for user authentication is a task in which the goal is convenient and secure authentication of speakers. In this work we investigate the use of voiced segments of speech utterances and feature normalization techniques in a text-independent speaker verification system based on the GMM-UBM approach. A variety of narrowband and wideband codecs were used to simulate a real communication channel. The results show that feature enhancement techniques help, especially for narrowband verification.

Automatic speaker verification on narrowband and wideband lossy coded clean speech

IET Biometrics, Jul 1, 2017

Substantial progress has been achieved in voice-based biometrics in recent times, but a variety of challenges still remain for the speech research community. One such obstacle is reliable speaker authentication from speech signals degraded by lossy compression. Compression is commonplace in modern telecommunications, such as mobile telephony, VoIP services, teleconferencing, voice messaging or gaming. In this study, the authors investigate the effect of lossy speech compression on text-independent speaker verification. Voice biometrics performance is evaluated on clean speech signals distorted by state-of-the-art narrowband (NB) as well as wideband (WB) speech codecs. The tests are performed in both channel-matched and channel-mismatched scenarios. The test results show that coded WB speech improves voice authentication precision by 1–3% of equal error rate over coded NB speech, even at the lowest investigated bitrates. It is also shown that the enhanced voice services codec does not p...
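The equal error rate (EER) quoted in the abstract above is the operating point at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of how it can be estimated from two lists of verification scores. The function name `equal_error_rate`, the "accept when score >= threshold" convention, and the example scores are assumptions made for illustration; they are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine (target) and impostor (non-target) scores.

    Sweeps every observed score as a candidate threshold and returns the
    point where false-accept and false-reject rates are closest, as
    (eer, threshold). Convention (an assumption): accept when score >= threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # targets rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)  # 0.25 0.6
```

With this convention, a 1–3% difference in EER between two conditions means the crossing point of the two error curves moves by that many percentage points.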

Influence of packet loss on a speaker verification system over IP network

2016 26th International Conference Radioelektronika (RADIOELEKTRONIKA), 2016

The paper considers the influence of packet loss on remote speaker verification in a Voice over IP (VoIP) environment. Lossy speech coding and packet loss account for a significant part of speech degradation in the VoIP environment. Because the extent of the packet loss impact is closely tied to the type of speech coder used to transmit speech data, different transmission conditions along with different speech codecs are investigated here. The speaker verification system used in this experimental study is based on a probabilistic GMM-UBM approach. Speaker verification accuracy is evaluated against the level of packet loss in narrowband and wideband communication channels.
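To illustrate what "a level of packet loss" means for a speech signal, here is a minimal sketch that drops fixed-length frames independently at random and substitutes silence for each lost frame. Everything in it is an assumption for illustration: the function name `apply_packet_loss`, one frame per packet, the independent (Bernoulli) loss model, and the zero-fill concealment. The paper's actual codecs, loss models, and concealment methods are not described in the abstract.

```python
import random


def apply_packet_loss(samples, frame_len, loss_rate, seed=0):
    """Simulate independent packet loss on a sampled speech signal.

    Each frame of `frame_len` samples is treated as one packet and is lost
    with probability `loss_rate`. Lost frames are replaced with zeros
    (silence substitution, an illustrative choice).
    Returns (degraded_samples, observed_loss_fraction).
    """
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    out = list(samples)
    lost = 0
    total = 0
    for start in range(0, len(out), frame_len):
        total += 1
        if rng.random() < loss_rate:
            lost += 1
            for i in range(start, min(start + frame_len, len(out))):
                out[i] = 0.0
    return out, (lost / total if total else 0.0)


# Hypothetical usage: 1 s of a dummy signal at 8 kHz, 20 ms frames, 10% loss.
signal = [1.0] * 8000
degraded, observed = apply_packet_loss(signal, frame_len=160, loss_rate=0.10)
```

A verification experiment of the kind described would then score the degraded signal at several loss rates and compare the resulting error rates.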

An Impact of Wideband Speech Codec Mismatch on a Performance of GMM-UBM Speaker Verification over Telecommunication Channel

Proceedings of the 11th International Conference ELEKTRO 2016

Automatic verification of a person's identity from their voice is a part of modern telecommunication services. To execute a verification task, a speech signal has to be transmitted to a remote server. The performance of the verification system can therefore be influenced by various distortions that occur when a speech signal is transmitted through a communication channel. This paper studies the effect of state-of-the-art wideband (WB) speech codecs on the performance of automatic speaker verification in the context of a channel/codec mismatch between enrollment and test utterances. The speaker verification system is built on the GMM-UBM method. The results show that the EVS codec provides the best performance over all the scenarios investigated in this study. Moreover, deploying the G.729.1 codec in the training process of the verification system provides the best equal error rate in the fully codec-mismatched scenario. However, the differences between the equal error rates reported for the codecs involved in this scenario are mostly non-significant.

Text-Independent Speaker Identification Using GMM With Universal Background Model

The state of the art in speaker recognition is well advanced. Various well-known technologies are used to process voice, including Gaussian mixture models. The paper presents our work on identifying a speaker from their voice. In our experiment we first extract key features from a speech signal using the VOICEBOX [1] toolbox in MATLAB. These features are represented by a matrix of mel frequency cepstral coefficients (MFCC). Then, using the MSR Identity Toolbox, we build an identity for each person enrolled in our system from a statistical Gaussian Mixture Model with Universal Background Model (GMM-UBM) and the features extracted from speech signals. The Universal Background Model improves the Gaussian Mixture Model statistical computation for the decision logic in the speaker verification task. As a corpus, we used the TIMIT database. Finally, we compared the recognition accuracy for several different scenarios of our experiments.
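The decision logic mentioned above is commonly a log-likelihood ratio between a speaker's GMM and the UBM, averaged over feature frames. The sketch below shows that scoring step in plain Python for diagonal-covariance models. It is not the MSR Identity Toolbox or VOICEBOX API: the function names `gmm_avg_loglik` and `llr_score`, the `(weights, means, variances)` model layout, and the toy one-dimensional models are all assumptions for illustration, and model training (for example, adapting the speaker model from the UBM) is left out.

```python
import math


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames:    list of feature vectors (e.g. MFCC frames)
    weights:   mixture weights, summing to 1
    means:     one mean vector per component
    variances: one per-dimension variance vector per component
    """
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def llr_score(frames, speaker_gmm, ubm):
    """Log-likelihood ratio: positive values favour the claimed speaker."""
    return gmm_avg_loglik(frames, *speaker_gmm) - gmm_avg_loglik(frames, *ubm)


# Toy one-component, one-dimensional models (hypothetical values).
ubm = ([1.0], [[0.0]], [[1.0]])
speaker = ([1.0], [[2.0]], [[1.0]])
print(llr_score([[2.0]], speaker, ubm))  # 2.0
```

For identification, the same score can be computed against every enrolled speaker model and the highest-scoring speaker chosen; for verification, the score is compared with a threshold.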

An analysis of the impact of packet loss, codecs and type of voice on internal parameters of P.563 model

Digital Technologies (DT 2014), Jul 2014

This paper analyzes the internal parameters of the P.563 non-intrusive quality prediction model, which together form the model's overall quality prediction, in the context of natural and synthesized speech degraded by packet loss (independent and dependent losses) and speech coding (ITU-T G.711, ITU-T G.729AB and iLBC codecs). The main aim of the paper is to identify the dominant internal parameters of the P.563 model for all the investigated codecs and clp parameters by conducting two-way analysis of variance (ANOVA) tests on all internal parameters of the model. The identified dominant internal parameters will be further used in an investigation of the non-monotonic behavior of the P.563 model predictions in this context, reported for the ITU-T G.729AB codec in [6].
