Santhosh Thottingal - Academia.edu
Papers by Santhosh Thottingal
ArXiv, 2015
The quality and quantity of articles in each Wikipedia language vary greatly. Translating from another Wikipedia is a natural way to add more content, but the translation process is not properly supported in the software used by Wikipedia. Past computer-assisted translation tools built for Wikipedia are not commonly used. We created a tool that adapts to the specific needs of an open community and to the kind of content in Wikipedia. Qualitative and quantitative data indicate that the new tool helps users translate articles more easily and quickly.
This paper presents OPUS-MT, a project that focuses on the development of free resources and tools for machine translation. The current status is a repository of over 1,000 pre-trained neural machine translation models that are ready to be launched in online translation services. For this we also provide open-source implementations of web applications that can run efficiently on average desktop hardware with a straightforward setup and installation.
This paper presents a finite-state transducer approach to a morphological analyser and generator for Malayalam, an agglutinative, inflectional Dravidian language spoken by 38 million people, mainly in Kerala, India. The system, named Mlmorph, is implemented using the Stuttgart Finite State Transducer (SFST) formalism and uses Helsinki Finite-State Technology (HFST) as its toolkit. Evaluations show that it is fast and effective in addressing the morphological and phonological nature of Malayalam. Applications such as a spellchecker, named entity recognition, and a number spell-out parser and generator are also built on top of Mlmorph.
Graphemics in the 21st Century
This volume of the Series Grapholinguistics gathers contributions by the participants of the Graphemics in the 21st Century (/gʁafematik/) conference that was organized by Yannis Haralambous with the support of IMT Atlantique and the CNRS (UMR 6285 Lab-STICC, unit DECIDE) and was held in Brest from June 13 to June 15, 2018. Its aim was to bring together disciplines concerned with writing systems and their representation in written communication, as well as to reflect on the current state of research in the area, and on the role that writing and writing systems play in neighboring disciplines like computer science and information technology, communication, typography, psychology, and pedagogy. Not surprisingly, the papers gathered in this volume belong to various disciplines and consider writing from different points of view, involving linguistics, history, archeology, education, and natural language processing. In his paper "'Alphabetic writing is in and for itself the more intelligent Form'. Reflections on the evaluation of writing systems" (in German), Coulmas takes Hegel's statement in favor of the alphabetic script as a starting point and questions whether, and how, writing systems can be compared. This paper is followed by Küster's "Open and Closed Writing Systems. Some Reflections," which gives a different classification approach of scripts based on their openness, e.g., their possibility to increase their set of graphs. The paper "The History of the Graphematic Foot in English and German" by Evertz deals with a notion inspired by the phonological foot unit (hierarchically located between the syllable and the word), the graphematic foot. The author discusses its pertinence, providing examples in English and German. In several languages there have been attempts to provide gender-neutral forms. The paper "Graphemic Methods for Gender-Neutral Writing" by Haralambous & Dichy describes a particular case of gender-neutral forms, namely graphemic ones, in
The Manjari typeface was an experiment in using spiral splines to define the curves of Malayalam glyphs. It was a milestone in the progression of the script towards maximally rounded glyphs. The script has been evolving from its rectangular characteristics in the early days of metal type to the circular characteristics of popular digital fonts. The design of curves in Manjari is theoretically based on the PhD thesis by Raph Levien, "From Spiral to Spline: Optimal Techniques in Interactive Curve Design". In this paper we present the design principles and aesthetic concepts behind this font. We also discuss the popularity and adoption of Manjari in digital as well as print media.