Rositsa Dekova | University of Plovdiv

Papers by Rositsa Dekova

Author Correction: Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, Jan 8, 2024

In the version of the article initially published, in Table 1, "Bulgarian" previously read "Bulgaria", and "Bulgaria" previously read "Eurasia". These errors have been corrected in the HTML and PDF versions of the article.

Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, Oct 29, 2023

Speakers of different (spoken) languages share the same perceptual apparatus, so one might expect that the world's 7,000 or so living languages 1 may have evolved communication systems that also share common properties 2,3. Yet the idea that there are universals in communication systems has been challenged with studies documenting extensive cross-linguistic variation in domains closely yoked to perception, including colour naming 4-8 and spatial communication 9-11. Spatial communication is an important test case; as Evans and Levinson note in 'The myth of language universals', "spatial cognition is fundamental to any animal, and therefore if Fodor is right anywhere [that languages directly encode the categories we think in], it should be here" 12 (p. 436). However, extensive cross-linguistic variation has been

Dissertation abstract: "Lexical Encoding of Verbs in Bulgarian and English"

The dissertation offers an in-depth analysis of the elements of the semantic representation of verbs in relation to their overt syntactic realization, and presents options for the lexical encoding of this information both within a single language and in comparison across languages.
The study covers numerous phenomena that arise when conceptual structure is expressed in syntax. The analyses are based on empirical data and experimental results from two Indo-European languages, Bulgarian and English.
For the purposes of the study, groups of verbs were selected that represent several major verb classes in Bulgarian and English, and their semantic characteristics were examined in close connection with their syntactic distribution. Particular attention is paid to the subgroups of the class known as Verbs of Contact by Impact (defined in Levin, 1993), as well as to verbs involving motion (in Levin's classification they belong to the group of Verbs of Throwing). These verbs are of particular interest because of the variety of alternations they allow and the restrictions they impose on their syntactic environments.
The dissertation uses analyses of corpus data and the results of computer-based linguistic studies, which serve as a reliable source for determining the information encoded in verbs and for testing how native speakers use this information in speech production.

Electronic Corpora as a Research Tool-Possibilities and Prospects

Automatic excerption of language material from electronic corpora provides great opportunities for research. It facilitates the excerption, accelerates the finding of examples and refines them. In this report we present the search options in the Bulgarian National Corpus, as it is freely available, morphologically annotated and balanced in terms of genre and theme of the texts, as it contains texts from different periods in the range of 100 years. We limited the search to find examples of the syntactic structure Small clause, since its specificity illustrates well both the advantages and the limitations of the automatic search. After a series of experiments with different search queries for specific types of small clauses, depending on the expression and its syntax, we came to the conclusion that automatic excerption of linguistic material offers a number of advantages and opportunities, but it has its limits, difficulties and prospects.

Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, 2023

The extent to which languages share properties reflecting the non-linguistic constraints of the speakers who speak them is key to the debate regarding the relationship between language and cognition. A critical case is spatial communication, where it has been argued that semantic universals should exist, if anywhere. Here, using an experimental paradigm able to separate variation within a language from variation between languages, we tested the use of spatial demonstratives, the most fundamental and frequent spatial terms across languages. In n = 874 speakers across 29 languages, we show that speakers of all tested languages use spatial demonstratives as a function of being able to reach or act on an object being referred to. In some languages, the position of the addressee is also relevant in selecting between demonstrative forms. Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition.

Affirmative "Da be, da"? Da be, da!

Proceedings of the Jubilee Scientific Session "Contemporary Trends in Linguistic Research" (dedicated to the 85th anniversary of the birth of Prof. Yordan Penchev, DSc), 2017

This paper discusses the hypothesis that the Bulgarian expression da be, da may express either affirmation or negation (expressed by irony), depending on the context and the intonational contour of the phrase. An experiment was carried out in which the informants were recorded reading out short exchanges of both types, and the intonation contours of the phrase were analysed.

Intelligent System for the Ontological Representation of Bulgarian Dialects: Ontological Module

Teaching English Conditional Clauses through Language Resources

The categorization and usage of conditional clauses has drawn the interest of generations of linguists working in various linguistic frameworks. Introducing the students to the academic studies on this topic is a compulsory prerequisite, but it should be properly balanced with their own research experience and that is where language resources fit in. The paper focuses on conditional clauses presented through lexical-semantic databases such as FrameNet and a number of freely available English corpora: the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the News on the Web Corpus (NOW), etc. Practical applications: The paper presents an innovative approach to teaching conditional sentences in an English Practice Class. The main objective is to implement language resources such as lexical-semantic databases and corpora in order to enhance the students' analytical abilities and improve their language competency. In addition, students can apply this approach ...

Electronic Language Resources in Teaching Mathematical Linguistics

Application of Clause Alignment for Statistical Machine Translation

The paper presents a new resource-light flexible method for clause alignment which combines the Gale-Church algorithm with internally collected textual information. The method does not resort to any pre-developed linguistic resources, which makes it very appropriate for resource-light clause alignment. We experiment with a combination of the method with the original Gale-Church algorithm (1993) applied for clause alignment. The performance of this flexible method, as it will be referred to hereafter, is measured over a specially designed test corpus. The clause alignment is explored as means to provide improved training data for the purposes of Statistical Machine Translation (SMT). A series of experiments with Moses demonstrate ways to modify the parallel resource and effects on translation quality: (1) baseline training with a Bulgarian-English parallel corpus aligned at sentence level; (2) training based on parallel clause pairs; (3) training with clause reordering, where clauses ...

Bulgarian X-language Parallel Corpus

The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling – general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisat...

Bulgarian-English Sentence- and Clause-Aligned Corpus

The paper presents the partially automatically annotated and fully manually validated Bulgarian-English Sentence- and Clause-Aligned Corpus. The discussion covers the motivation behind the corpus development, the structure and content of the corpus, illustrated by statistical data, the segmentation and alignment strategy and the tools used in the corpus processing. The paper sketches the principles of clause annotation adopted in the creation of the corpus and addresses some issues related to interlingual asymmetry. The paper concludes with an outline of some applications of the corpus in the field of computational linguistics.

The Ontology of Bulgarian Dialects – Architecture and Information Retrieval

Following a concise description of the structure, the paper focuses on the potential of the Ontology of the Bulgarian Dialects, which demonstrates a novel usage of the ontological modelling for the purposes of dialect digital archiving and information processing. The ontology incorporates information on the dialects of the Bulgarian language and includes data from 84 dialects, spoken not only on the territory of the Republic of Bulgaria, but also abroad. It encodes both their geographical distribution and some of their main diagnostic features, such as the different mutations (also referred to as reflexes) of some of the Old Bulgarian vowels. The mutations modelled so far in the ontology include the reflex of the back nasal vowel /ѫ/ under stress, the reflex of the back er vowel /ъ/ under stress, and the reflex of the yat vowel /ѣ/ under stress when it precedes a syllable with a back vowel. Besides the opportunity for formal structuring of the considerable amount of data gathered th...

Lexical Encoding of Verbs in English and Bulgarian

The Ontology of Bulgarian Dialects – architecture and information retrieval

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Following a concise description of the structure, the paper focuses on the potential of the Ontology of the Bulgarian Dialects, which demonstrates a novel usage of the ontological modelling for the purposes of dialect digital archiving and information processing. The ontology incorporates information on the dialects of the Bulgarian language and includes data from 84 dialects, spoken not only on the territory of the Republic of Bulgaria, but also abroad. It encodes both their geographical distribution and some of their main diagnostic features, such as the different mutations (also referred to as reflexes) of some of the Old Bulgarian vowels. The mutations modelled so far in the ontology include the reflex of the back nasal vowel /ѫ/ under stress, the reflex of the back er vowel /ъ/ under stress, and the reflex of the yat vowel /ѣ/ under stress when it precedes a syllable with a back vowel. Besides the opportunity for formal structuring of the considerable amount of data gathered through the years by dialectologists, the ontology also provides numerous possibilities for information retrieval: searches by dialect, country, dialect region, city or village, and various combinations of diagnostic features.

Introducing Computational Linguistics and NLP to High School Students

Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), 2018

The paper addresses a possible way of introducing core concepts of Computational Linguistics through problems given at the linguistic contests organized for high school students in Bulgaria and abroad. Following a brief presentation of the foundation and the underlying objective of these contests, we outline some of the types of problems as reflecting the different levels of language processing and the diversity of approaches and tasks to be solved. By presenting the variety of problems given so far through the years, we would like to attract the attention of the academic community to this captivating method through which high school students might be acquainted with the challenges and the main goals of Computational Linguistics (CL) and Natural Language Processing (NLP).

Research paper thumbnail of Author Correction: Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, Jan 8, 2024

In the version of the article initially published, in Table 1, "Bulgarian" previously read "Bulga... more In the version of the article initially published, in Table 1, "Bulgarian" previously read "Bulgaria", and "Bulgaria" previously read "Eurasia". These errors have been corrected in the HTML and PDF versions of the article.

Research paper thumbnail of Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, Oct 29, 2023

Speakers of different (spoken) languages share the same perceptual apparatus, so one might expect... more Speakers of different (spoken) languages share the same perceptual apparatus, so one might expect that the world's 7,000 or so living languages 1 may have evolved communication systems that also share common properties 2,3. Yet the idea that there are universals in communication systems has been challenged with studies documenting extensive cross-linguistic variation in domains closely yoked to perception, including colour naming 4-8 and spatial communication 9-11. Spatial communication is an important test case; as Evans and Levinson note in 'The myth of language universals', "spatial cognition is fundamental to any animal, and therefore if Fodor is right anywhere [that languages directly encode the categories we think in], it should be here" 12 (p. 436). However, extensive cross-linguistic variation has been

Research paper thumbnail of Автореферат на дисератaционен труд "Лексикално кодиране на глаголи в български и английски"

В рамките на дисертационния труд се прави задълбочен анализ на елементите от семантичната репрезе... more В рамките на дисертационния труд се прави задълбочен анализ на елементите от семантичната репрезентация на глаголите в съответствие с тяхната явна синтактична реализация и се представят варианти за лексикалното кодиране на тази информация както в рамките на един език, така и в съпоставка между различни езици.
Изследването обхваща множество феномени, възникващи в процеса на изразяване на концептуалната структура в синтаксиса. Анализите са основани на емпирични данни и експериментални резултати от два индоевропейски езика – български и английски.
За целите на изследването са подбрани групи от глаголи, представящи няколко основни глаголни класа в български и английски, като техните семантичните характеристики са изследвани в тясна връзка със синтактичното им разпространение. Основно внимание е обърнато на подгрупите на така наречения клас Глаголи на контакт чрез удар (дефиниран в Левин, 1993), както и на глаголи, включващи движение (в класификацията на Левин те спадат към групата на Глаголи за хвърляне). Тези глаголи представляват особен интерес поради разнообразието на алтернации, които позволяват, както и заради ограниченията, които налагат върху синтактичните си обкръжения.
В дисертацията са използвани анализи на корпусни данни и резултатите от компютърно базирани лингвистични изследвания, които да послужат за надежден източник при определянето на информацията, кодирана в глаголите, както и за тестване на начините, по които носителите на езика си служат с тази информация при речева продукция.

Research paper thumbnail of Electronic Corpora as a Research Tool-Possibilities and Prospects

Automatic excerption of language material from electronic corpora provides great opportunities fo... more Automatic excerption of language material from electronic corpora provides great opportunities for research. It facilitates the excerption, accelerates the finding of examples and refines them. In this report we present the search options in the Bulgarian National Corpus, as it is freely available, morphologically annotated and balanced in terms of genre and theme of the texts, as it contains texts from different periods in the range of 100 years. We limited the search to find examples of the syntactic structure Small clause, since its specificity illustrates well both the advantages and the limitations of the automatic search. After a series of experiments with different search queries for specific types of small clauses, depending on the expression and its syntax, we came to the conclusion that automatic excerption of linguistic material offers a number of advantages and opportunities, but it has its limits, difficulties and prospects.

Research paper thumbnail of Spatial communication systems across languages reflect universal action constraints

Nature Human Behaviour, 2023

The extent to which languages share properties reflecting the non-linguistic constraints of the s... more The extent to which languages share properties reflecting the non-linguistic constraints of the speakers who speak them is key to the debate regarding the relationship between language and cognition. A critical case is spatial communication, where it has been argued that semantic universals should exist, if anywhere. Here, using an experimental paradigm able to separate variation within a language from variation between languages, we tested the use of spatial demonstratives—
the most fundamental and frequent spatial terms across languages. In n = 874 speakers across 29 languages, we show that speakers of all tested languages use spatial demonstratives as a function of being able to reach or act on an object being referred to. In some languages, the position of the addressee is also relevant in selecting between demonstrative forms. Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition.

Research paper thumbnail of УТВЪРДИТЕЛНО "ДА БЕ, ДА"? ДА БЕ, ДА!

Доклади от Юбилейна научна сесия „Съвременни тенденции в езиковедските изследвания“ (посветена на 85 години от рождението на проф. д.ф.н. Йордан Пенчев), 2017

This paper discusses the hypothesis that the Bulgarian expression da be, da may express either af... more This paper discusses the hypothesis that the Bulgarian expression da be, da may express either affirmation or negation (expressed by irony), depending on the context and the intonational contour of the phrase. An experiment was carried out in which the informants were recorded reading out short exchanges of both types, and the intonation contours of the phrase have been analysed.

Research paper thumbnail of Интелигентнa система за онтологично представяне на българските диалекти: онтологичен модул

Research paper thumbnail of Electronic Corpora as a Research Tool-Possibilities and Prospects

Automatic excerption of language material from electronic corpora provides great opportunities fo... more Automatic excerption of language material from electronic corpora provides great opportunities for research. It facilitates the excerption, accelerates the finding of examples and refines them. In this report we present the search options in the Bulgarian National Corpus, as it is freely available, morphologically annotated and balanced in terms of genre and theme of the texts, as it contains texts from different periods in the range of 100 years. We limited the search to find examples of the syntactic structure Small clause, since its specificity illustrates well both the advantages and the limitations of the automatic search. After a series of experiments with different search queries for specific types of small clauses, depending on the expression and its syntax, we came to the conclusion that automatic excerption of linguistic material offers a number of advantages and opportunities, but it has its limits, difficulties and prospects.

Research paper thumbnail of Teaching English Conditional Clauses through Language Resources

The categorization and usage of conditional clauses has drawn the interest of generations of ling... more The categorization and usage of conditional clauses has drawn the interest of generations of linguists working in various linguistic frameworks. Introducing the students to the academic studies on this topic is a compulsory prerequisite, but it should be properly balanced with their own research experience and that is where language resources fit in. The paper focuses on conditional clauses presented through lexical-semantic databases such as FrameNet and a number of freely available English corpora-The British National Corpus (BNC), The Corpus of Contemporary American (COCA), The News on the Web Corpus (NOW), etc. Practical applications The paper presents an innovative approach to teaching conditional sentences in an English Practice Class. The main objective is to implement language resources such as lexical-semantic databases and corpora in order to enhance the students' analytical abilities and improve their language competency. In addition, students can apply this approach ...

Research paper thumbnail of Electronic Language Resources in Teaching Mathematical Linguistics

Research paper thumbnail of Application of Clause Alignment for Statistical Machine Translation

The paper presents a new resource light flexible method for clause alignment which combines the G... more The paper presents a new resource light flexible method for clause alignment which combines the Gale-Church algorithm with internally collected textual information. The method does not resort to any pre-developed linguistic resources which makes it very appropriate for resource light clause alignment. We experiment with a combination of the method with the original Gale-Church algorithm (1993) applied for clause alignment. The performance of this flexible method, as it will be referred to hereafter, is measured over a specially designed test corpus. The clause alignment is explored as means to provide improved training data for the purposes of Statistical Machine Translation (SMT). A series of experiments with Moses demonstrate ways to modify the parallel resource and effects on translation quality: (1) baseline training with a Bulgarian-English parallel corpus aligned at sentence level; (2) training based on parallel clause pairs; (3) training with clause reordering, where clauses ...

Research paper thumbnail of Bulgarian X-language Parallel Corpus

The paper presents the methodology and the outcome of the compilation and the processing of the B... more The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling – general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisat...

Research paper thumbnail of Bulgarian X-language Parallel Corpus

The paper presents the methodology and the outcome of the compilation and the processing of the B... more The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent in terms of compilation methodology, text representation, metadata description and annotation conventions. The approaches implemented in the construction of Bul-X-Cor include using readily available text collections on the web, manual compilation (by means of Internet browsing) and preferably automatic compilation (by means of web crawling ― general and focused). Certain levels of annotation applied to Bul-X-Cor are taken as obligatory (sentence segmentation and sentence alignment), while others depend on the availability of tools for a particular language (morpho-syntactic tagging, lemmatisat...

Research paper thumbnail of Bulgarian-English Sentence- and Clause-Aligned Corpus

The paper presents the partially automatically annotated and fully manually validated Bulgarian-E... more The paper presents the partially automatically annotated and fully manually validated Bulgarian-English Sentenceand Clause-Aligned Corpus. The discussion covers the motivation behind the corpus development, the structure and content of the corpus, illustrated by statistical data, the segmentation and alignment strategy and the tools used in the corpus processing. The paper sketches the principles of clause annotation adopted in the creation of the corpus and addresses some issues related to interlingual asymmetry. The paper concludes with an outline of some applications of the corpus in the field of computational linguistics.

The Ontology of Bulgarian Dialects – Architecture and Information Retrieval

Following a concise description of the structure, the paper focuses on the potential of the Ontology of the Bulgarian Dialects, which demonstrates a novel use of ontological modelling for the purposes of dialect digital archiving and information processing. The ontology incorporates information on the dialects of the Bulgarian language and includes data from 84 dialects, spoken not only on the territory of the Republic of Bulgaria, but also abroad. It encodes both their geographical distribution and some of their main diagnostic features, such as the different mutations (also referred to as reflexes) of some of the Old Bulgarian vowels. The mutations modelled so far in the ontology include the reflex of the back nasal vowel /ѫ/ under stress, the reflex of the back er vowel /ъ/ under stress, and the reflex of the yat vowel /ѣ/ under stress when it precedes a syllable with a back vowel. Besides the opportunity for formal structuring of the considerable amount of data gathered th...

Application of Clause Alignment for Statistical Machine Translation

The paper presents a new resource-light, flexible method for clause alignment which combines the Gale-Church algorithm with internally collected textual information. The method does not resort to any pre-developed linguistic resources, which makes it very appropriate for resource-light clause alignment. We experiment with a combination of the method with the original Gale-Church algorithm (1993) applied for clause alignment. The performance of this flexible method, as it will be referred to hereafter, is measured over a specially designed test corpus. The clause alignment is explored as a means to provide improved training data for the purposes of Statistical Machine Translation (SMT). A series of experiments with Moses demonstrates ways to modify the parallel resource and the effects on translation quality: (1) baseline training with a Bulgarian-English parallel corpus aligned at sentence level; (2) training based on parallel clause pairs; (3) training with clause reordering, where clauses ...
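For orientation, the length-based baseline that the abstract refers to can be sketched as a small dynamic program. This is only an illustration of the Gale-Church idea under stated assumptions, not the paper's flexible method: the "internally collected textual information" is not described here and is left out. The constants `C`, `S2` and `PRIORS` are the values reported by Gale and Church (1993) for other language pairs and are placeholders for Bulgarian-English; the function names `length_cost` and `align` are hypothetical, and the mean-length normalisation is a common variant rather than the original formula.

```python
import math

# Assumed constants from Gale & Church (1993); not tuned for Bulgarian-English.
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of the length difference per character

# Prior probability of each bead type: (source segments, target segments).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def length_cost(len_src, len_tgt):
    """Negative log of the two-sided tail probability of the length mismatch."""
    if len_src == 0 and len_tgt == 0:
        return 0.0
    # Normalise by the mean length so that insertions/deletions do not divide by zero.
    mean = (len_src + len_tgt / C) / 2.0
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # 2 * (1 - Phi(|delta|)) expressed through the complementary error function.
    p = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(p, 1e-300))


def align(src_lens, tgt_lens):
    """Align two sequences of segment lengths (in characters).

    Returns a list of beads, each a pair (source indices, target indices).
    """
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                total = cost[i][j] + step - math.log(prior)
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

Calling `align([10, 12, 30], [23, 29])` on clause lengths in characters groups the first two source clauses with the first target clause and pairs the remaining ones one-to-one. A resource-light extension along the lines of the abstract would add further evidence to the step cost, but what that evidence is cannot be inferred from this listing.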

Lexical Encoding of Verbs in English and Bulgarian

Lexical encoding of verbs in English and Bulgarian

The Ontology of Bulgarian Dialects – architecture and information retrieval

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Following a concise description of the structure, the paper focuses on the potential of the Ontology of the Bulgarian Dialects, which demonstrates a novel use of ontological modelling for the purposes of dialect digital archiving and information processing. The ontology incorporates information on the dialects of the Bulgarian language and includes data from 84 dialects, spoken not only on the territory of the Republic of Bulgaria, but also abroad. It encodes both their geographical distribution and some of their main diagnostic features, such as the different mutations (also referred to as reflexes) of some of the Old Bulgarian vowels. The mutations modelled so far in the ontology include the reflex of the back nasal vowel /ѫ/ under stress, the reflex of the back er vowel /ъ/ under stress, and the reflex of the yat vowel /ѣ/ under stress when it precedes a syllable with a back vowel. Besides the opportunity for formal structuring of the considerable amount of data gathered through the years by dialectologists, the ontology also provides numerous possibilities for information retrieval: searches by dialect, country, dialect region, city or village, or by various combinations of diagnostic features.
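The kinds of search listed in the abstract can be pictured with a small filter over dialect records. This is a plain-Python stand-in, not the ontology's actual schema or query interface, which this listing does not describe. The class `Dialect`, the function `search`, the feature keys (`back_nasal`, `back_er`, `yat`) and every value in `EXAMPLE` are hypothetical placeholders, not real dialect data.

```python
from dataclasses import dataclass, field


@dataclass
class Dialect:
    """Hypothetical record: location plus reflexes of Old Bulgarian vowels."""
    name: str
    country: str
    region: str
    settlements: tuple = ()
    reflexes: dict = field(default_factory=dict)  # feature key -> reflex label


def search(dialects, country=None, region=None, settlement=None, **reflexes):
    """Return names of dialects matching every criterion that was supplied."""
    matches = []
    for d in dialects:
        if country is not None and d.country != country:
            continue
        if region is not None and d.region != region:
            continue
        if settlement is not None and settlement not in d.settlements:
            continue
        # All requested diagnostic features must have the requested reflex.
        if any(d.reflexes.get(key) != value for key, value in reflexes.items()):
            continue
        matches.append(d.name)
    return matches


# Placeholder data only; names and reflex labels are invented for illustration.
EXAMPLE = [
    Dialect("dialect_A", "country_X", "region_1", ("village_1",),
            {"back_nasal": "r1", "yat": "r2"}),
    Dialect("dialect_B", "country_X", "region_2", ("village_2", "town_1"),
            {"back_nasal": "r1", "yat": "r3"}),
    Dialect("dialect_C", "country_Y", "region_3", ("village_3",),
            {"back_nasal": "r4", "yat": "r2"}),
]
```

With this layout, `search(EXAMPLE, country="country_X", yat="r2")` combines a geographical criterion with a diagnostic feature, mirroring the combined searches the abstract mentions.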

Introducing Computational Linguistics and NLP to High School Students

Proceedings of the Third International Conference Computational Linguistics in Bulgaria (CLIB 2018), 2018

The paper addresses a possible way of introducing core concepts of Computational Linguistics through problems given at the linguistic contests organized for high school students in Bulgaria and abroad. Following a brief presentation of the foundation and the underlying objective of these contests, we outline some of the types of problems as reflecting the different levels of language processing and the diversity of approaches and tasks to be solved. By presenting the variety of problems given so far through the years, we would like to attract the attention of the academic community to this captivating method through which high school students might be acquainted with the challenges and the main goals of Computational Linguistics (CL) and Natural Language Processing (NLP).

Computational Linguistics Problems for High School Students

A brief overview of some types of computational linguistics problems for high school students (in Bulgarian)