Angelina Gaspar - Academia.edu
Papers by Angelina Gaspar
International communication requires the use of language technologies in translation and highlights the issue of electronic resources and tools. Electronic dictionaries, corpora, translation memories, terminology databases and machine translation are some of the technologies used for translation in the EU. The paper aims to show the importance of language technologies for achieving multilingual standards and for including Croatian modules in the EU's multilingual communication.
Language Resources and Evaluation, Jul 10, 2023
In today's world of distorted views of life, religious values and beliefs, powerlessness and despair, there is a growing need for the virtue of mercy, which can grant hope, peace and justice to mankind. The aim of this paper is to identify the core terminology, focusing on the concept of mercy and mercy-related terms, in a million-word specialized English corpus compiled for this research. The corpus consists of the pontifical discourses of the last three Holy Fathers, freely available on the Holy See website. The keyword lists generated by WordSmith Tools for the three subcorpora are contrasted to confirm the preliminary assumption about their possible correspondence. The assumption rests on the fact that, regardless of the different contexts and times in which the discourses were created, their authors share a common religious legacy and common beliefs, views and attitudes based on the Scriptures. The paper seeks corpus-based evidence of an unbroken continuity among the spiritual authorities in interpreting and pleading for mercy and justice in their pontifical discourses. It aims to contribute to theological reflection on the virtue of mercy, to corpus linguistics and domain-specific terminology, and to the clarity of religious concepts and ideas.
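As a rough illustration of how keyword lists from several subcorpora might be produced and contrasted, the sketch below scores words with the log-likelihood (G2) keyness statistic against a reference corpus and then measures the overlap of two keyword lists with the Jaccard index. This is an assumption-laden stand-in: the abstract does not say which keyness statistic, reference corpus or comparison measure was used in WordSmith Tools, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(study: list[str], reference: list[str], top_k: int = 10) -> list[str]:
    """Top_k words that are positively key in `study` compared with `reference`."""
    fs, fr = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in fs.items():
        b = fr.get(word, 0)
        if a / c > b / d:  # keep only words relatively more frequent in the study corpus
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest G2 first, ties alphabetical
    return [w for _, w in scored[:top_k]]

def jaccard(x: list[str], y: list[str]) -> float:
    """Share of keywords two lists have in common."""
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy) if sx | sy else 0.0

# Hypothetical toy subcorpora and reference corpus
reference = "the the the the economy market the".split()
sub_a = "mercy mercy mercy justice peace the the".split()
sub_b = "mercy mercy hope hope peace the the".split()
print(keywords(sub_a, reference, 3))  # ['mercy', 'justice', 'peace']
print(keywords(sub_b, reference, 3))  # ['hope', 'mercy', 'peace']
print(jaccard(keywords(sub_a, reference, 3), keywords(sub_b, reference, 3)))  # 0.5
```

A high overlap between the keyword lists of different subcorpora is one simple way to quantify the kind of correspondence the paper looks for.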
The paper describes a methodology for bilingual terminology extraction and termbase building based on terminological, lexical and pragmatic criteria, combined with the translator's knowledge and experience. The research is conducted on a sentence-aligned, million-word Croatian-English parallel corpus of legislative texts, the first large corpus designed for this language pair. To assess the hybrid, statistical and linguistic approaches, as well as the tools for automatic term extraction, the automatically obtained lists of term candidates are compared with a manually created reference list. Term extraction covers multi-word units and single-word units that correspond to multi-word ones. The tools used are SDL Trados WinAlign (sentence alignment), SDL MultiTerm Extract and WordSmith (statistically based term extraction), and NooJ (a linguistically based environment). The evaluation is reported with the statistical measures of precision, recall and F-measure. Language resources covering a specific domain speed up translation, reduce cost and time, and enable communication across languages and cultures. Their application also greatly facilitates machine translation, computer-assisted translation, information retrieval, and the building of multilingual termbases, glossaries and other resources, which are prerequisites for the development of a language with insufficient linguistic resources, such as Croatian.
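Comparing an automatically extracted candidate list with a manually created reference list reduces to set overlap. The minimal sketch below computes precision, recall and F-measure (F1) that way; the normalization step (lowercasing and collapsing whitespace) and the example terms are assumptions for illustration, not the paper's matching criteria.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match (an assumption)."""
    return " ".join(term.lower().split())

def evaluate(candidates: list[str], reference: list[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of extracted term candidates against a reference list."""
    cand = {normalize(t) for t in candidates}
    ref = {normalize(t) for t in reference}
    tp = len(cand & ref)  # candidates that also appear in the reference list
    precision = tp / len(cand) if cand else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical candidate and reference lists
candidates = ["Legal person", "member  state", "court", "the"]
reference = ["legal person", "member state", "competent authority"]
p, r, f = evaluate(candidates, reference)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.500 R=0.667 F=0.571
```

Here two of the four candidates are in the reference list (precision 0.5) and two of the three reference terms were found (recall 2/3), giving F1 = 4/7.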
Expert Systems With Applications, 2022
Communications in Computer and Information Science, 2023
Je-LKS: Journal of e-Learning and Knowledge Society, 2020
An e-Learning environment demands self-motivation and perseverance in studying and completing learning tasks. However, the more autonomy students have in managing their e-Learning, the harder it is for them to cope with distractions and stay focused and engaged. This study assesses the level of student engagement on four e-Learning platforms (CoLaB Tutor, AC-ware Tutor, CM Tutor and Moodle) in higher education. A model for Tracking Student Learning and Knowledge (TSLAK) is developed, based on two sets of variables: variables tracking students' learning activities (VTL) and variables tracking students' knowledge (VTK). The study aims to show how a model for tracking students' online learning and knowledge can be formalized for the four e-Learning platforms, and how students' learning and knowledge acquisition processes can be described and measured by VTL and VTK. The results obtained by VTL and VTK indicate a significant decline in student engagement. Of the 218 most engaged students, 77 (35%) used CoLaB Tutor, 41 (19%) AC-ware Tutor, 52 (24%) CM Tutor and 48 (22%) Moodle. Of the total number of students, only 88 (13%) were both the most engaged and the most successful: 63 (71%) graduates and 25 (29%) undergraduates. Such levels of engagement and success, as measured by VTL and VTK, indicate the need to increase students' motivation in blended learning environments, to better prepare and introduce them to e-Learning platforms, and to gather their feedback during a research study.
JUCS - Journal of Universal Computer Science
This paper describes and evaluates the performance of a semi-automatic authoring tool (SAAT) for knowledge extraction in the AC&NL Tutor, highlighting its strengths and weaknesses. We assessed the accuracy of automatic annotation tasks (part-of-speech tagging, named entity recognition, dependency parsing and coreference resolution) performed on a dataset of 160 sentences from an unstructured Wikipedia text about computers. We compared the automatic annotations with a gold standard created through human post-editing and validation. The human error analysis covered 3,769 words, 582 subsentences, 1,129 questions, 917 propositions, 1,020 concepts and 667 relations. It yielded an error-type classification and a set of custom rules that were subsequently used for automatic error identification and correction. The results showed that, on average, 68.7% of the error corrections concerned CoreNLP performance and 31.3% the SAAT extraction algorithms. Our main contributions include an integrated approach t...
Information, 2022
Consistent terminology can positively influence communication, information transfer and proper understanding. In multilingual written communication, the challenges are greater because of translation variants. The main aim of this study was to apply the Herfindahl-Hirschman Index (HHI) to assess the consistency of translated terminology in parallel corpora. The research was conducted on three legal-domain subcorpora from different periods: the Croatian-English parallel corpus (1991–2009), the Latin-English and Latin-Croatian versions of the Code of Canon Law (1983), and the English and Croatian versions of EU legislation (2013). After terminology extraction, the term candidates were validated and then evaluated. Terminology consistency was measured using the HHI, a commonly accepted measure of market concentration. Results show that the HHI can be used for measuring terminology consistency to ...
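The HHI is the sum of squared shares, HHI = Σ s_i², where s_i is the share of the i-th "competitor". Applied to terminology, a natural reading is to treat each target-language variant of a source term as a competitor and its share of the aligned occurrences as s_i: one consistent translation gives HHI = 1, and many equally frequent variants push it toward 1/N. The sketch below follows that reading; the abstract does not give the paper's exact operationalization, and the function names and Croatian-English pairs are hypothetical. Shares are fractions here; the market-concentration convention with percentage shares scales the same index to a 0 to 10,000 range.

```python
from collections import Counter, defaultdict

def hhi(variants: list[str]) -> float:
    """Herfindahl-Hirschman Index over translation variants, using shares in [0, 1]."""
    counts = Counter(variants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no occurrences to score")
    return sum((n / total) ** 2 for n in counts.values())

def hhi_per_term(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Map each source term to the HHI of the target variants it is aligned with."""
    by_term: dict[str, list[str]] = defaultdict(list)
    for source, target in pairs:
        by_term[source].append(target)
    return {term: hhi(v) for term, v in by_term.items()}

# Hypothetical aligned term occurrences (source term, target variant)
pairs = [("pravna osoba", "legal person")] * 4 + [("pravna osoba", "legal entity")] \
      + [("sud", "court")] * 3
scores = hhi_per_term(pairs)
print(scores)  # {'pravna osoba': 0.68..., 'sud': 1.0}, i.e. 0.8**2 + 0.2**2 and 1**2
```

"sud" is always rendered the same way (HHI = 1.0), while "pravna osoba" splits 4:1 between two variants and scores lower, flagging it for terminology review.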
Angelina Gaspar (Faculty of Humanities and Social Sciences; Catholic Faculty of Theology, University of Split, Croatia). This paper presents a corpus-based approach to the semi-automatic extraction of English phrasal verbs, which are highly productive but complex and often non-transparent lexical units. Extraction proceeds via the particles (prepositions and adverbs) that phrasal verbs contain, which are among the top-ranking function words in the word list of the British National Corpus (BNC). The research is carried out with WordSmith Tools 6.0 on a comparable English corpus of publicly available legal texts totalling 392,255 words. System efficiency is evaluated with the statistical measures of precision, recall and F-measure, and the list of extracted phrasal verbs is checked against the Cambridge Phrasal Verbs Dictionary (2015). The results show that the semi-automatic extraction of phrasal verbs requires considerable human intervention as well as control...
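The particle-driven idea can be sketched as a scan over part-of-speech-tagged text: whenever a verb is followed, within a small window, by one of the target particles, the pair becomes a candidate. This is a minimal sketch under several assumptions: the input uses Penn Treebank-style tags, the particle set is a hypothetical subset, verbs are not lemmatized, and the window size is illustrative. It is not the WordSmith Tools procedure used in the paper.

```python
# Hypothetical subset of high-frequency particles to search for
PARTICLES = {"up", "out", "off", "down", "in", "on", "over", "back", "away", "through"}

def phrasal_verb_candidates(tagged: list[tuple[str, str]], max_gap: int = 2) -> list[tuple[str, str]]:
    """Return (verb, particle) pairs where a particle follows a verb with at
    most `max_gap` intervening tokens (Penn-style tags assumed)."""
    candidates = []
    for i, (tok, tag) in enumerate(tagged):
        if not tag.startswith("VB"):
            continue
        # look ahead at positions i+1 .. i+1+max_gap
        for j in range(i + 1, min(i + 2 + max_gap, len(tagged))):
            t, g = tagged[j]
            if t.lower() in PARTICLES and g in {"RP", "RB", "IN"}:
                candidates.append((tok.lower(), t.lower()))
                break
            if g.startswith("VB") or g in {".", ","}:
                break  # stop at the next verb or clause punctuation
    return candidates

s1 = [("The", "DT"), ("authority", "NN"), ("carried", "VBD"), ("out", "RP"),
      ("the", "DT"), ("inspection", "NN"), (".", ".")]
s2 = [("They", "PRP"), ("turned", "VBD"), ("the", "DT"), ("offer", "NN"),
      ("down", "RP"), (".", ".")]
s3 = [("The", "DT"), ("court", "NN"), ("ruled", "VBD"), ("on", "IN"),
      ("the", "DT"), ("appeal", "NN")]
print(phrasal_verb_candidates(s1))  # [('carried', 'out')]
print(phrasal_verb_candidates(s2))  # [('turned', 'down')]
print(phrasal_verb_candidates(s3))  # [('ruled', 'on')]
```

The third example shows why the abstract stresses human intervention: a verb followed by a preposition ("ruled on") matches the same surface pattern as a true phrasal verb, so candidates still need manual validation against a reference such as the Cambridge Phrasal Verbs Dictionary.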
INFuture2007: The Future of …, 2007
Sentence alignment is the basis for computer-assisted translation (CAT), terminology management, term extraction, word alignment and cross-language information retrieval. Translation memory (TM), which is created through sentence alignment, is the basis for further research on translation equivalence. Automatic sentence alignment of parallel texts faces two types of problems, robustness and discrepancies between source and target texts (in layout and through omissions), both of which affect the accuracy of the alignment. The paper presents research on the sentence alignment of Croatian-English parallel texts (laws, regulations, acts and decisions) performed with the alignment tool WinAlign 7.5.0 from SDL Trados 2006 Professional. The alignment process and its impact on the creation of translation memories are shown by comparing translation memories that differ in the level of expert intervention in setting up the alignment program and in preparing the source text for segmentation. Recommendations for further development using statistical analysis, machine learning techniques and linguistic knowledge are suggested.
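WinAlign's internal algorithm is not described in the abstract. To illustrate how a length-based aligner copes with the problems it names (omissions, and sentences split or merged differently in source and target), here is a simplified dynamic-programming sketch in the spirit of Gale and Church's length-based method. It replaces their probabilistic cost with a plain log length ratio, and the bead penalties in `BEADS` are illustrative assumptions, not tuned parameters.

```python
import math

# Allowed alignment "beads": (source sentences, target sentences) consumed at once.
# Penalties are illustrative assumptions; omissions (1-0, 0-1) are the most expensive.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

def length_cost(src_len: int, tgt_len: int, ratio: float = 1.0) -> float:
    """Penalty |log(tgt / (ratio * src))| for pairing texts of these character lengths."""
    if src_len == 0 or tgt_len == 0:
        return 0.0  # omissions are priced by the bead penalty alone
    return abs(math.log(tgt_len / (ratio * src_len)))

def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Minimum-cost alignment as a list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order works because every bead moves to a later cell.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + length_cost(s_len, t_len)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads

# Hypothetical texts: target sentence 1 merges source sentences 1 and 2.
src = ["a" * 20, "b" * 30, "c" * 10, "d" * 25]
tgt = ["A" * 21, "B" * 41, "D" * 24]
print(align(src, tgt))  # [([0], [0]), ([1, 2], [1]), ([3], [2])]
```

Raising or lowering the omission penalty relative to the merge penalties changes whether an unmatched sentence is dropped or glued to a neighbour, which mirrors the trade-off between robustness and accuracy that the abstract describes.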