George Tambouratzis | Institute for Language and Speech Processing

Papers by George Tambouratzis

Preliminaries

Machine Translation with Minimal Reliance on Parallel Resources, 2017

Machine Translation with Minimal Reliance on Parallel Resources

SpringerBriefs in Statistics, 2017

Swarm Algorithms for NLP - The Case of Limited Training Data

Journal of Artificial Intelligence and Soft Computing Research, 2019

The present article describes a novel phrasing model which can be used for segmenting sentences of unconstrained text into syntactically-defined phrases. This model is based on the notion of attraction and repulsion forces between adjacent words. Each of these forces is weighted appropriately by system parameters, the values of which are optimised via particle swarm optimisation. This approach is designed to be language-independent and is tested here for different languages. The phrasing model’s performance is assessed per se, by calculating the segmentation accuracy against a gold-standard segmentation. Operational testing also involves integrating the model into a phrase-based Machine Translation (MT) system and measuring the translation quality when the phrasing model is used to segment input text into phrases. Experiments show that the performance of this approach is comparable to other leading segmentation methods and that it exceeds that of baseline systems.

Language-Independent Hybrid MT: Comparative Evaluation of Translation Quality

Theory and Applications of Natural Language Processing, 2016

The present chapter reviews the development of a hybrid Machine Translation (MT) methodology, which is readily portable to new language pairs. This MT methodology (which has been developed within the PRESEMT project) is based on sampling mainly monolingual corpora, with very limited use of parallel corpora, thus supporting portability to new language pairs. In designing this methodology, no assumptions are made regarding the availability of extensive and expensive-to-create linguistic resources. In addition, the general-purpose NLP tools used can be chosen interchangeably. Thus PRESEMT circumvents the requirement for specialised resources and tools so as to further support the creation of MT systems for diverse language pairs.

Expanding the Language model in a low-resource hybrid MT system

Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014

The present article investigates the fusion of different language models to improve translation accuracy. A hybrid MT system, recently developed in the European Commission-funded PRESEMT project, that combines example-based MT and Statistical MT principles is used as a starting point. In this article, the syntactically-defined phrasal language models (NPs, VPs etc.) used by this MT system are supplemented by n-gram language models to improve translation accuracy. For specific structural patterns, n-gram statistics are consulted to determine whether the pattern instantiations are corroborated. Experiments indicate improvements in translation accuracy.

Studying the SPEA2 algorithm for optimising a pattern-recognition based machine translation system

2011 IEEE Symposium on Computational Intelligence in Multicriteria Decision-Making (MDCM), 2011

In this article, aspects regarding the optimisation of machine translation systems via evolutionary computation algorithms are examined. The article focuses on pattern-recognition based machine translation systems that use large monolingual corpora in the target language from which statistical information is extracted. The research reported here uses a specific machine translation system as a representative for experimentation. Based on previous ...

Parameter Optimisation of MT Systems Operating on Monolingual Corpora Employing a Genetic Algorithm

Accurate phrase alignment in a bilingual corpus for EBMT systems

An ongoing trend in the creation of Machine Translation (MT) systems concerns the automatic extraction of information from large bilingual parallel corpora. As these corpora are expensive to create, the largest possible amount of information needs to be extracted in a consistent manner. The present article introduces a phrase alignment methodology for transferring structural information between languages using only a limited-size parallel corpus. This is used as a first processing stage to support a phrase-based MT system that can be readily ported to new language pairs. The essential language resources used in this MT system include a large monolingual corpus and a small parallel one. An analysis of different alignment cases is provided and the solutions chosen are described. In addition, the application of the system to different language pairs is reported and the results obtained are compared across language pairs to investigate the language-independent aspect of the proposed approach.

Implementing a language-independent MT methodology

The current paper presents a language-independent methodology, which facilitates the creation of machine translation (MT) systems for various language pairs. This methodology is implemented in the PRESEMT hybrid MT system. PRESEMT has the lowest possible requirements on specialised resources and tools, given that for many languages (especially less widely used ones) only limited linguistic resources are available. In PRESEMT, the main translation process comprises two phases. The first one, Structure selection, determines the overall structure of a target language (TL) sentence, drawing on syntactic information from a small bilingual corpus. The second phase, Translation equivalent selection, relies on models extracted solely from monolingual corpora to implement translation disambiguation, determine intra-phrase word order and handle functional words. This paper proposes extracting information for disambiguation from the monolingual corpus. Experimental results indicate that such information contributes substantially to improving translation quality.

Multi-objective optimisation of real-valued parameters of a hybrid MT system using Genetic Algorithms

Pattern Recognition Letters, 2010

In this paper, an automated method is proposed for optimising the real-valued parameters of a hybrid Machine Translation (MT) system that employs pattern recognition techniques together with extensive monolingual corpora in the target language from which statistical information is extracted. The absence of a parallel corpus prohibits the use of the training techniques traditionally employed in state-of-the-art Statistical Machine Translation systems. The proposed approach for fine-tuning the system parameters towards the generation of high-quality translations is based on a Genetic Algorithm and the multi-objective evolutionary algorithm SPEA2. In order to evaluate the translation quality, established MT automatic evaluation criteria are employed, such as BLEU and METEOR. Furthermore, various ways of combining these criteria are explored, in order to exploit each one's characteristics and evaluate the produced translations. The experimental results indicate the effectiveness of this approach, since the translation quality of the evaluation sentence sets used is substantially improved in all studied configurations, when compared to the output of the same system operating with manually-defined parameters. Out of all configurations, the multi-objective evolutionary algorithms, combining several MT evaluation metrics, are found to produce the highest quality translations.

Using patterns for machine translation (MT)

Proceedings of the …, 2006

With this work, we further explore the ideas tested within the METIS-I system (Dologlou et al. 2003) which proved the feasibility of the innovative idea that sound translations could be obtained with hybrid MT that relied on monolingual corpora – rather than parallel ones – and flat ...

Pattern Matching-Based System for Machine Translation (MT)

Lecture Notes in Computer Science, 2006

The innovative feature of the system presented in this paper is the use of pattern-matching techniques to retrieve translations, resulting in a flexible, language-independent approach, which employs a limited amount of explicit a priori linguistic knowledge. Furthermore, while all state-of-the-art corpus-based approaches to Machine Translation (MT) rely on bitexts, this system relies on extensive target language monolingual corpora. The translation process distinguishes three phases: 1) pre-processing with 'light' rule- and statistics-based NLP techniques, 2) search & retrieval, 3) synthesising. At Phase 1, the source language sentence is mapped onto a lemma-to-lemma translated string. This string then forms the input to the search algorithm, which retrieves similar sentences from the corpus (Phase 2). This retrieval process is performed iteratively at increasing levels of detail, until the best match is detected. The best retrieved sentence is sent to the synthesising algorithm (Phase 3), which handles phenomena such as agreement.

VEMUS: An Integrated Platform to Support Music Tuition Tasks

2008 Eighth IEEE International Conference on Advanced Learning Technologies, 2008

In this paper, the VEMUS platform is presented, as a novel approach for music tuition that focuses on beginner and intermediate students, typically aged from 9 to 15 years. This platform is characterized by an open, highly interactive and networked multilingual music tuition framework that covers a selection of popular wind instruments. The VEMUS environment integrates innovative, pedagogically-motivated e-learning components to augment traditional music teaching in three distinct learning settings, namely self-practicing, classroom and distance learning. In the present article, the current stage of development of VEMUS is presented, and the areas where it might be of most use towards supporting the educational activities associated with music tuition are identified.

The effectiveness of surrogate functions in improving the accuracy of PSO-type algorithms in an NLP task

2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017

Applying particle swarm optimisation to the morphological segmentation of words from Ancient Greek texts

Pattern Analysis and Applications, 2016

The present article investigates the effectiveness of evolutionary computation algorithms in a specific optimisation task, namely morphological segmentation of words into subword segments, focusing on the definition of stems and endings. More precisely, particle swarm optimisation (PSO) is compared to an earlier study on the same task using ant colony optimisation (ACO), using a number of different optimisation criteria, for each of which independent experiments are run. In the present article, the system architecture has been revised over earlier implementations, to allow substantially faster simulation times (by several orders of magnitude), which in turn allows the realisation of more iterations. The effect of local search on the PSO final segmentation quality is investigated in detail, with different local search processes being compared in terms of their effectiveness. In addition, issues involving the convergence of PSO are examined, encompassing variants which adopt global versus local training schemes. Experimental results show that, for different datasets, as a rule both PSO and ACO achieve higher segmentation accuracies than manual tuning. A comparison between ACO and PSO is made, over the different criteria used. When focusing on the highest performing criteria, ACO and PSO are comparable, while the system revisions allow the process to be completed much faster. In terms of the highest segmentation accuracy obtained for a specific system configuration, PSO is more effective, achieving the highest segmentation accuracy amongst all optimisation methods tested.

Establishing sentential structure via realignments from small parallel corpora

Proceedings of the Fourth Workshop on Hybrid Approaches to Translation (HyTra), 2015

Comparing CRF and template-matching in phrasing tasks within a Hybrid MT system

Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra), 2014

Reducing spectral mismatches in concatenative speech synthesis via systematic database enrichment

This paper presents work performed for the Time-Domain TTS system, which is being developed at the ILSP for the Greek language. It focuses on the enhancement of the synthetic speech quality, by reducing the spectral mismatches between concatenated segments. To that end, a study has been performed to determine the distance that can best predict when a spectral mismatch is audible. Experimentation with different spectral distances has taken place and the distance with the best performance has been used in order to systematically enrich the segment database, which initially contained only one instance per segment. Results of this procedure indicate a substantial improvement in the synthetic speech quality.

Optimising the clustering performance of a self-organising logic neural network with topology-preserving capabilities

Pattern Recognition Letters, 1994

In this article, a self-organising logic neural network is studied. This network successfully clusters input patterns into classes characterised by a high similarity, while assigning these classes to the network nodes so that relationships existing in the pattern space are replicated on the network structure. The network performance is optimised by (i) introducing a mechanism which ensures the efficient use of the network nodes for storage of pattern classes and by (ii) determining the training strategy which results in optimal topology-preservation characteristics.

Assessing the effectiveness of feature groups in author recognition tasks with the SOM model

IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 2006
