Nilufar Abdurakhmonova | National University of Uzbekistan

Papers by Nilufar Abdurakhmonova

Developing named entity recognition algorithms for Uzbek: Dataset Insights and Implementation

Data in brief, Apr 1, 2024

“Turkic Morpheme”: From the Portal to the Linguistic Platform

Lecture notes in networks and systems, 2024

Statistical machine translation proposal for Uzbek to English

DOAJ (DOAJ: Directory of Open Access Journals), Nov 30, 2021

Automating the Transition from Dialectal to Literary Forms in Uzbek Language Texts: An Algorithmic Perspective

Morpheme Analysis of Occasionalisms in Natural Language Processing (NLP)

Zenodo (CERN European Organization for Nuclear Research), Aug 15, 2023

Yakub Umar ogly, Doctor of Philology, Professor (Turkey); Almaz Ulvi Binnatova, Doctor of Philology, Professor (Azerbaijan); Bakieva Gulandom, Doctor of Philology, Professor (Uzbekistan); Minnullin Kim, Doctor of Philology, Professor (Tatarstan); Makhmudov Nizomiddin, Doctor of Philology, Professor (Uzbekistan); Kerimov Ismail, Doctor of Philology, Professor (Russia); Dzhuraev Mamatkul, Doctor of Philology, Professor (Uzbekistan); Kurenov Rakhymmamed, Candidate of Philology (Turkmenistan); Christopher James Fort, University of Michigan (USA); Umarkhadzhaev Mukhtar, Doctor of Philology, Professor (Uzbekistan); Mirzaev Ibodullo, Doctor of Philology, Professor (Uzbekistan); Baltabaev Khamidulla, Doctor of Philology, Professor (Uzbekistan); Dustmukhammedov Khurshid, Doctor of Philology, Professor (Uzbekistan); Likhodzievsky A.S., Doctor of Philology, Professor (Uzbekistan); Siddikova Iroda, Doctor of Philology, Professor (Uzbekistan); Shiukashvili Tamar, Doctor of Philology (Georgia); Yusupov Oybek, Executive Secretary, Associate Professor (Uzbekistan)

Modeling WordNet Type Thesaurus for Uzbek Language Semantic Dictionary

International journal of systems engineering, 2018

Creating a text corpus for the Uzbek language and developing linguistic databases and search engine systems are among the crucial tasks of computational linguistics today. Electronic dictionary-thesauruses and semantic dictionaries are part of this work. Defining the formation structure of a dictionary-thesaurus for Uzbek, transferring the terminological dictionary into an electronic version, and implementing rules for establishing semantic relations between words make it possible to automate the linguistic processing of dictionary-thesauruses, which are the foundation of linguistic databases. Analyzing the logical structure of paper-based dictionary-thesauruses made it possible to formalize that structure and to create rules, written in predicate language, for converting dictionary-thesaurus entries to an electronic version. A system of descriptors is proposed as a set of PROLOG rules for constructing the electronic version of dictionary entries.

Developing NLP Tool for Linguistic Analysis of Turkic Languages

2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)

Turkic Morpheme Web Portal as a Platform for Turkology Research

2020 International Conference on Information Science and Communications Technologies (ICISCT), 2020

This paper describes the development of the Turkic Morpheme web portal, a toolkit that takes into account core features of Turkic languages and meets the requirements of research in computational linguistics and typology. The portal was created on the basis of the structural-parametric functional model of the Turkic morpheme and contains special linguistic databases that describe the categories of Turkic languages at different levels: morphological, syntactic, and semantic. The portal can also be used in the educational process as a reference system for Turkic languages.

Applying Web Crawler Technologies for Compiling Parallel Corpora as one Stage of Natural Language Processing

2022 7th International Conference on Computer Science and Engineering (UBMK)

Modeling It of Discourse Analysis and the Ussues Machine Translation

The aim of the research is to analyze theories on forming the linguistic database of a program that translates simple texts from English into Uzbek, and to create the foundations of that program. The object of the research is word combinations and simple sentences of English and Uzbek, as well as grammatical expressions. The scientific novelty of the research is as follows: the conclusions drawn ensure translation accuracy when creating a linguistic database for machine translation; a linguistic database was created of phrasal verbs, a morphological lexicon, and affixes of English and Uzbek, together with their morphological and syntactic models; the coordination of simple sentence models for automatic translation was identified; principles for the morphological and syntactic-semantic analysis of texts in machine translation were substantiated; recommendations were worked out on coordinating paradigmatic relations when creating principles for an electronic dictionary and software for linguistic data...

MorphUz: Morphological Analyzer for the Uzbek Language

2022 7th International Conference on Computer Science and Engineering (UBMK)

Theoritical Moethodological Foundations of Creation Learner Dictionary

This article analyzes the views of scholars on the creation of learner dictionaries. Accordingly, it studies theoretically the principles of creating such a dictionary: defining the lexical minimum, studying audience demand, and creating entry content with computer tools and corpus-based statistical analysis. The specificity, structure, goals, and objectives of these dictionaries are important for creating the theoretical and methodological foundations of Uzbek educational lexicography, drawing on practical research in this field for Russian and English.

UZWORDNET: A Lexical-Semantic Database for the Uzbek Language

Proceedings of the 11th International Global Wordnet Conference (GWC-2021), 2021

The results reported in this paper aim to increase the presence of the Uzbek language on the Internet and its usability within IT applications. We describe the initial development of a “word-net” for the Uzbek language compatible with Princeton WordNet. We called it UZWORDNET. In the current version, UZWORDNET contains 28140 synsets, 64389 senses and 20683 words; its estimated accuracy is 75.98%. To the best of our knowledge, it is the largest wordnet for Uzbek existing to date, and the second wordnet developed overall.

Formal-Functional Models of the Uzbek Electron Corpus

The paper is devoted to the structure of the Uzbek Corpus and its linguistic annotation. Linguistic annotation, metadata, and the corpus manager, which together form the formal-functional model of the corpus, are important for its use for many purposes. The platform allows users to address language and literature issues and to use it online. The Uzbek corpus, based on structural and subcorpus models that are partially presented in this paper, is being developed to advance Uzbek language technology.

Linguistic functionality of Uzbek Electron Corpus: uzbekcorpus.uz

2021 International Conference on Information Science and Communications Technologies (ICISCT)

Dependency Parsing Based On Uzbek Corpus

Syntactic parsing is a crucial stage among the different types of parsing in NLP. It helps identify the sentence type and the word combinations that represent grammatical relations between words. Although languages differ in their grammatical features, almost all follow common linguistic rules. Uzbek is an agglutinative language with free constituent order in syntax. Our investigations show that the morphology of word forms plays an essential role in identifying and composing syntactic relations in Uzbek. Morphological and lexical information can also solve some of the problems connected with syntactic parsing. The article presents the main points of view on the stages of parsing in the CoNLL-U format, based on analysis of the Uzbek corpus. (Translated from the Uzbek version of the abstract:) In natural language processing, syntactic analysis is considered important among the various methods of analysis. Syntactic analysis of the language's g...

O'zbek tili elektron korpusining kompyuter modellari (Computer Models of the Uzbek Language Electronic Corpus)

Personal Names Spell-Checking – a Study Related to Uzbek

The Journal of social sciences and humanities, 2018

Objective: In the paper we describe the development process of the dictionary of Uzbek names and surnames. Methodology: The dictionary is created to support the identification of personal names in Uzbek texts, and to aid the spell-checking of texts written in Uzbek. Results: Apart from discussing the development process, we also evaluate the dictionary by performing a set of experiments. Conclusion: We verify whether the information collected in the dictionary can be successfully used to find and, if needed, correct the misspelled names and surnames.

First Results of the TurkLang-7 Project: Creating Russian-Turkic Parallel Corpora and MT Systems

The idea of the “TurkLang-7” project is to create datasets and neural machine translation systems for a set of Russian-Turkic low-resource language pairs. It is planned to achieve this goal through a hybrid approach to the creation of a multilingual parallel corpus between Russian and Turkic languages, studying the applicability and effectiveness of neural network learning methods (transfer learning, multi-task learning, back-translation, dual learning) in the context of the selected language pairs, as well as the development of specialized methods for the unification of parallel data in different languages, based on the agglutinative nature of the selected Turkic languages (structural and functional model of the Turkic morpheme). In this paper, we describe the main stages of work on this project and the results of the first year: we developed a semiautomatic process for creating parallel corpora, collected data from several sources on 7 Turkic languages, and conducted the first exp...

Development of Intellectual Web System for Morph Analyzing of Uzbek Words

Applied Sciences

Currently, the Uzbek sector of the Internet is developing actively. In it, as in other national sectors, the most common form of presenting textual information is semi-structured documents, and working with them presupposes reliable algorithms for text analysis, including analysis of lexical characteristics. The article presents an intelligent web application for morphological analysis of Uzbek words. The web application is based on the generation and stem analysis of Uzbek word forms. The well-known Porter algorithm was chosen as the basis for stemming. The morphological analyzer generates Uzbek word forms based on a division of words into classes that takes into account the specifics and structure of the language. For example, nouns can be classified by meaning (related, nominal), by number (singular and plural), by case, and by possessive endings.

Korpus lingvistikasi (Corpus Linguistics)

Globe, 2023

This textbook is intended for master's students in the specialty 70230801 – Computational Linguistics. It helps build, in theory and in practice, the knowledge and skills needed for the following: studying foreign experience in the conceptual and structural design of an electronic corpus of the Uzbek language; constructing linguistic algorithms and transferring linguistic models into machine language by applying automatic methods of morphological and syntactic tagging and analysis, such as FST and UDPipe, to Uzbek when creating a linguistic corpus; building the linguistic and software support of a text corpus for analyzing the representativeness of text fragments and search units (lemma and token); implementing corpus-creation technologies and methods for Uzbek with linguistic tools; and forming a corpus interface based on the formal-functional models of the corpus manager.