Zhanibek Kozhirbayev | Nazarbayev University

Papers by Zhanibek Kozhirbayev

Leveraging Wav2Vec2.0 for Kazakh Speech Recognition: An Experimental Study

Lecture Notes in Computer Science, 2024

Enhancing Neural Machine Translation with Fine-Tuned mBART50 Pre-trained Model: An Examination with Low-Resource Translation Pairs

Ingénierie des systèmes d'information, Jun 20, 2024

Preliminary Tasks of Word Embeddings Comparison of Unaligned Audio and Text Data for the Kazakh Language

Kazakh Speech Recognition: Wav2vec2.0 vs. Whisper

Journal of Advances in Information Technology, Dec 31, 2022

In recent years, progress in neural models trained on extensive multilingual text or speech data has shown great potential for improving the status of under-resourced languages. This paper experiments with three state-of-the-art speech recognition models on the Kazakh language: Facebook's Wav2Vec2.0 and Wav2Vec2-XLS-R, and OpenAI's Whisper. The objective is to investigate the effectiveness of these models in transcribing Kazakh speech and to compare their performance with existing supervised Automatic Speech Recognition (ASR) systems. The study also explores the possibility of using data from other languages for pre-training and tests whether fine-tuning on target-language data can improve model performance. This work can thus provide insights into the effectiveness of pretrained multilingual models in under-resourced language settings. The wav2vec2.0 model achieved a Character Error Rate (CER) of 2.8 and a Word Error Rate (WER) of 8.7 on the test set, which closely matches the best result achieved by the end-to-end Transformer model. The large Whisper model achieves a CER of approximately 4 on the test set. The results of this study can contribute to the development of robust and efficient ASR systems for the Kazakh language, benefiting various applications, including speech-to-text translation, voice assistants, and speech-based communication tools.
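For readers unfamiliar with the two metrics reported in this abstract, the following is a minimal sketch of how WER and CER are conventionally computed from an edit distance. It is not the authors' evaluation code: the function names and the toy Kazakh strings are illustrative assumptions, scores are expressed as percentages (an assumption about the abstract's units), and this CER variant counts spaces as characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate in percent: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate in percent; spaces are counted as characters here."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Toy example (hypothetical data): one wrong letter in a two-word reference.
reference = "сәлем әлем"
hypothesis = "салем әлем"
print(wer(reference, hypothesis))  # 50.0 (1 of 2 words wrong)
print(cer(reference, hypothesis))  # 10.0 (1 of 10 characters wrong)
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same output, which is consistent with the 2.8 CER and 8.7 WER reported above.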

Speech and Computer

Lecture Notes in Computer Science, 2014

Text-to-Speech has traditionally been viewed as a “black box” component, where standard “portfolio” voices are typically offered with a professional but “neutral” speaking style. For commercially important languages many different portfolio voices may be offered all with similar speaking styles. A customer wishing to use TTS will typically choose one of these voices. The only alternative is to opt for a “custom voice” solution. In this case, a customer pays for a TTS voice to be created using their preferred voice talent. Such an approach allows for some “tuning” of the scripts used to create the voice. Limited script elements may be added to provide better coverage of the customer’s expected domain and “gilded phrases” can be included to ensure that specific phrase fragments are spoken perfectly. However, even with such an approach the recording style is strictly controlled and standard scripts are augmented rather than redesigned from scratch. The “black box” approach to TTS allows for systems to be produced which satisfy the needs of a large number of customers, even if this means that solutions may be limited in the persona they present. Recent advances in conversational agent applications have changed people’s expectations of how a computer voice should sound and interact. Suddenly, it’s much more important for the TTS system to present a persona which matches the goals of the application. Such systems demand a more flamboyant, upbeat and expressive voice. The “black box” approach is no longer sufficient; voices for high-end conversational agents are being explicitly “designed” to meet the needs of such applications. These voices are both expressive and light in tone, and a complete contrast to the more conservative voices available for traditional markets. This paper will describe how Nuance is addressing this new and challenging market.

KazNLP: A Pipeline for Automated Processing of Texts Written in Kazakh Language

Lecture Notes in Computer Science, 2020

Document and Word-level Language Identification for Noisy User Generated Text

We present herein our work on language identification applied to comments left by the readers of online news sites popular in Kazakhstan. Such comments are typically written in one of the two languages spoken widely in the area (Kazakh and Russian) and sometimes in a mixture of both. Code-switching (mixing languages) makes it desirable to identify the language not only at the document level but also at the level of individual words. We approach both tasks in a single two-step framework, performing unsupervised normalization and Naive Bayes text classification successively. Moreover, we applied a deep learning model based on recurrent networks with LSTM cells to classify text. Our results suggest an improvement over the state of the art for the Kazakh language.

Regarding the impact of Kazakh phonetic transcription on the performance of automatic speech recognition systems

A performance comparison of container-based technologies for the Cloud

Future Generation Computer Systems, Mar 1, 2017

Highlights:
- The key features of the micro-service hosting technologies for the Cloud were identified.
- We perform test cases to evaluate the virtualization performance of these technologies.
- The examined technologies incurred almost no overhead on memory utilization or CPU.
- I/O and operating system interactions incurred some overheads.

Kazakh Text Normalization using Machine Translation Approaches

We present herein our work on text normalization applied to user-generated content (UGC) in the Kazakh language collected from the Kazakhstani segment of the Internet. UGC is notoriously difficult to process as text due to the rapid introduction of neologisms, peculiar spelling, code-switching and transliteration. All of this increases lexical variety, thereby aggravating the most prominent problems of NLP, such as out-of-vocabulary words and data sparseness. It has been shown that certain preprocessing, known as lexical normalization or simply normalization, is required for NLP tools to work properly on such text. We applied machine translation techniques to normalize Kazakh texts. For this, a parallel corpus was created with a set of aligned sentences in canonical and non-canonical forms. Using these comments, we created a phrase-based statistical machine translation system as a baseline. Furthermore, we applied a word-based sequence-to-sequence model to the normalization task. The former achieves a BLEU score of 21.67 on the test set, whereas the latter obtains a BLEU score of approximately 30.

Cascade Speech Translation for the Kazakh Language

Applied Sciences

Speech translation systems have become indispensable in facilitating seamless communication across language barriers. This paper presents a cascade speech translation system tailored specifically for translating speech from the Kazakh language to Russian. The system aims to enable effective cross-lingual communication between Kazakh and Russian speakers, addressing the unique challenges posed by these languages. To develop the cascade speech translation system, we first created a dedicated speech translation dataset ST-kk-ru based on the ISSAI Corpus. The ST-kk-ru dataset comprises a large collection of Kazakh speech recordings along with their corresponding Russian translations. The automatic speech recognition (ASR) module of the system utilizes deep learning techniques to convert spoken Kazakh input into text. The machine translation (MT) module employs state-of-the-art neural machine translation methods, leveraging the parallel Kazakh-Russian translations available in the datase...

Comparison of Word Embeddings of Unaligned Audio and Text Data Using Persistent Homology

Springer eBooks, 2022

Preliminary tasks of unsupervised speech recognition based on unaligned audio and text data

2022 International Conference on Engineering & MIS (ICEMIS)

VITA Search - An Intelligent Multimodal Search and Archive System for Online Media Resources

2019 IEEE 13th International Conference on Application of Information and Communication Technologies (AICT)

In this paper we present our work on an intelligent multimodal search and archive system, which applies scientific findings from our work on Kazakh and Russian speech recognition, language identification and spoken term detection. The paper describes the goals and objectives, the architecture, and the subsystem modules of the developed system. The VITA Search system determines the exact time at which the required spoken information occurs in Kazakh and Russian data from various broadcast channels. The speech recognition unit uses the Kaldi toolkit to generate lattices from the raw audio data. An acoustic model trained using deep neural networks shows significant results. The word error rate on the train set was 3.86 for Kazakh speech and 9.85 for Russian speech. Moreover, we integrated a language identification model trained using Long Short-Term Memory Recurrent Neural Networks in order to select the correct model for the input audio. For spoken term detection, we applied word-based and proxy-based approaches to search for keyword terms among the lattices.

KazNLP: A Pipeline for Automated Processing of Texts Written in Kazakh Language

Speech and Computer, 2020

Extended language modeling experiments for Kazakh

In this article we present a dataset for language modeling in the Kazakh language. It is an analogue of the Penn Treebank dataset for the Kazakh language, as we followed all instructions to create it. The main source for our dataset is articles on web pages that were primarily written in Kazakh, since there are many new articles translated into Kazakh in Kazakhstan. The dataset is publicly available for research purposes. Several experiments were conducted with this dataset. Together with the traditional n-gram models, we created neural network models for the word-based language model (LM). The latter model, based on a large parameterized long short-term memory (LSTM) network, shows the best performance. Since Kazakh is considered an agglutinative language and might have a high out-of-vocabulary (OOV) rate on unseen datasets, we also carried out morph-based LM experiments. With regard to experimental results, sub-word based LM is fitted well for Kazakh in both ngram and neural ...

Kazakh and Russian Languages Identification Using Long Short-Term Memory Recurrent Neural Networks

2017 IEEE 11th International Conference on Application of Information and Communication Technologies (AICT), 2017

Automatic language identification (LID) is the automatic process whereby the language spoken in a speech sample is identified. In recent decades, spoken language identification has advanced significantly, benefiting from technological achievements in related areas such as signal processing, pattern recognition, machine learning and neural networks. This work investigates the use of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) for automatic language identification. The main reason for applying LSTM RNNs to this task is their capacity for handling sequences. This study shows that LSTM RNNs can efficiently take advantage of temporal dependencies in acoustic data in order to learn relevant features for language recognition tasks. We report results of language identification experiments for the Kazakh and Russian languages and show that the presented LSTM RNN model can deal with short utterances (2 s). The model was trained using the open-source high-level neural network API Keras on limited computational resources.

