Voula Giouli | Aristotle University of Thessaloniki
Papers by Voula Giouli
In this paper, we describe work in progress on the development of a named entity recognizer for Greek. The system targets information extraction applications where large-scale text processing is needed. Speed of analysis, system robustness, and accuracy of results have been the basic guidelines for the system's design. Our system is an automated pipeline of linguistic components for Greek text processing based on pattern matching techniques. Non-recursive regular expressions have been implemented on top of it in order to capture different types of named entities. A corpus of financial texts has been collected from several web sources and manually annotated for development and testing purposes. Overall precision and recall are 86% and 81% respectively.
ERCIM News 136, 2024
Our research is aimed at employing Large Language Models (LLMs) to build a multi-dimensional paraphrase tool in view of enhancing reading comprehension and writing skills of learners of Greek as their mother tongue, second or foreign language (L1, L2/FL). The idea is to integrate ChatGPT into classroom settings, thus turning Generative AI from a threat to an assistant.
Slovenščina 2.0: Empirične, Aplikativne in Interdisciplinarne Raziskave, Nov 13, 2019
The authors report on a recent survey of monolingual dictionaries available on the Greek market. General dictionaries outnumber spelling and educational ones and enjoy a prestigious status. Only one general dictionary is born digital and only two are available through the web, but several are available as CDs. Most, but not all, of the prestigious dictionaries have received public funding. Lexicography is well regarded in Greece, where printed dictionaries still seem to have the lead.
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as "to take a decision", "to break one's heart" or "to turn off", have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
Les émotions dans le discours- Emotions in Discourse, 2000
This paper presents corpus work aimed at manually annotating sentiment expressions in the Greek (EL) sub-corpus of a corpus of movies coupled with both orthographic (EN) transcriptions and subtitles in (EL) and (ES). Our effort involves the treatment of emotion predicates and emotion-related concepts in naturally occurring texts and their integration into an existing lexical resource that is being re-designed and re-focused so as to ultimately form an Ontology of Emotions. This ontology will be appropriate for both lexicographic and computational approaches to sentiment. Emotion polarity and intensity were also studied based on corpus data.
We present work aimed at giving an account of Greek verbs denoting emotion. The work is placed within a larger context that aims to define and describe the semantic field of emotions by identifying, selecting, classifying and organizing a core lexicon of emotions in a conceptual database. The ultimate goal is the exhaustive description of Modern Greek and the development of a wide-coverage lexical resource that will be appropriate for a range of Natural Language Processing applications.
LREC, 2008
The paper reports on completed work aimed at the creation of a resource, namely the Greek Textual Entailment Corpus (GTEC), that is appropriate for guiding the training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was collected in view of a range of NLP applications where semantic interpretation is of paramount importance, and it was manually annotated at the level of Textual Entailment. Moreover, a number of linguistic annotations deemed useful for prospective system developers were also integrated. The critical issue was the development of a final resource that is re-usable and adaptable to different NLP systems, in order either to enhance their accuracy or to evaluate their output. We focus here on the methodological issues underpinning data selection and annotation. An initial approach towards the development of a system for the automatic Recognition of Textual Entailment in Greek is also presented, and preliminary results are reported.
Language Technology for Cultural Heritage, 2011
There has been a long tradition in the digitization and manual documentation of cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work aimed at highlighting textual cultural resources that, as of yet, remain under-exploited by creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of cultural heritage of the ...
This paper reports on work aimed at (a) developing an application tailored to integrate and highlight textual cultural resources that, as of yet, remain under-exploited, and (b) creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of the cultural heritage of Greece and Bulgaria (the focus being on the neighboring areas) and to raise awareness of their common cultural identity, with emphasis on literature, folklore and language. To this end, a ...
This paper presents ongoing work focusing on the development of an audiovisual corpus resource and its annotation in terms of sentiments and opinions. A modular annotation schema has been employed, based on the specifications of existing schemas and extending or adapting them to cater for the peculiarities of the corpus-specific data. Keywords: annotation of emotion, annotation of opinion, movies corpus
repository.dlsi.ua.es
The paper reports on completed work aimed at the creation of a resource, namely, the Greek Textual Entailment Corpus (GTEC) that is appropriate for guiding training and evaluation of a system that recognizes Textual Entailment in Greek texts. The corpus of textual units was ...
Bulgaria: Institute for Language and Speech …, 2002
Lecture Notes in Computer Science, 2000
... 1 Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 151 25, Athens, Greece, tel: +301 6875300, fax: +301 6854270, {sboutsis,iason,voula,xaris,spip}@ilsp.gr; 2 National Technical University of Athens; 3 Cambridge University, ml257@cam.ac.uk. Abstract. ...
This paper describes work aimed at the creation of a lexical resource that incorporates multiword expressions, that is, word combinations with lexical, morphological, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Sag et al., 2002) that exceed the boundaries of single lexical units. We present the methodology adopted for identifying and manually annotating multiword expressions in naturally occurring texts. The focus is on the typology of the identified expressions, their semantic properties, and their encoding in the lexical resource.
In this paper we present the Dictionary for Junior High School students, a monolingual Greek dictionary targeted at young native language learners, that has been compiled in the framework of a national project for the production of new teaching material. After presenting the specifications set by the Ministry of Education, we discuss the main methodological principles that underlie its construction and elaborate on the main features of the dictionary. In particular, we report on the macrostructure of the dictionary (the ...
Proceedings of the 10th Workshop on Multiword Expressions (MWE), 2014