The Digital Form of the Thesaurus Dictionary of the Romanian Language

2007

The paper argues in favour of an electronic form of the thesaurus dictionary of the Romanian language, the dictionary edited by the Romanian Academy in two editions since 1913. Preliminary steps like scanning, optical character recognition, and pre-processing operations have already been done. The paper presents a prototype for the correction of the digital form of the dictionary. The numerous advantages of the digital thesaurus dictionary are discussed, as a basis for future work in Romanian lexicography and, more generally, in language processing.

The dictionary of Romanian language: steps toward the electronic version

In the context of the globalised Information Society and the variety of solutions for computer-aided acquisition of traditional dictionaries, the paper presents the current stage of development of the new series of the Romanian Dictionary edited by the Romanian Academy. Through a project financed by the National University Research Council of Romania, some preliminary steps toward a computer-aided acquisition of the dictionary have been made and are outlined in this article.

Digital Lexicographic Systems and Traditional Paper Dictionaries (From Traditional Paper Dictionaries to Digital Lexicographic Systems)

Cognitive Studies | Études cognitives

The main problems of modern lexicography are analysed. The theory of lexicographical systems is presented, along with its application to describing the structure of lexicographical systems for the new digital Ukrainian Explanatory Dictionary and the Etymological Dictionary of the Ukrainian Language. The concept of virtual lexicographical laboratories is introduced, and the implementation of two such laboratories is described: the first for the Ukrainian Explanatory Dictionary and the second for the Etymological Dictionary of the Ukrainian Language.

The Page-Building a Pennsylvania German Thesaurus through the Correction of OCR Errors (co-author Camilla Balsamo)

2019

The aim of our project is to build an online Thesaurus of Pennsylvania German. This North American minority language has been in contact with American English ever since its emergence around 1800 and still lacks a received standard, which makes it quite a linguistic challenge. The research has been carried out on different text-based sources. We opted for open-source software: first, we scanned images of the source text and converted them using OCR. Errors occur repeatedly in the conversion phase, due either to flecks in the original text or to the process of machine encoding. The errors vary: non-word errors, word-boundary errors, punctuation failures, missing diacritical marks in the output, tokenization errors, and misrecognition of part of speech (POS). We process the text with the programming language Python, working on routines that fix errors automatically without causing further issues. In order to write the algorithms of correction, several techniques have...
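The abstract does not say which correction techniques the authors used, so the following is only a minimal Python sketch of how two of the listed error classes (word-boundary or punctuation errors, and non-words) could be handled. The `LEXICON` word list, the function names, and the similarity cutoff are all assumptions made for illustration. The words are placeholders, not a vetted Pennsylvania German lexicon.

```python
import difflib
import re

# Hypothetical placeholder word list; a real project would load a curated lexicon.
LEXICON = ["aa", "der", "die", "haus", "hund", "im", "is", "katz", "un"]


def fix_spacing(text: str) -> str:
    """Repair simple word-boundary and punctuation-spacing errors."""
    # Rejoin words split by a hyphen at a line break. This is an assumption:
    # it would also merge genuine hyphenated compounds broken at that point.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Remove stray whitespace before punctuation marks.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Collapse remaining runs of whitespace (including newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


def correct_token(token: str, lexicon=LEXICON, cutoff: float = 0.7) -> str:
    """Replace a non-word with its closest lexicon entry, if one is close enough."""
    lower = token.lower()
    if lower in lexicon:
        return token
    matches = difflib.get_close_matches(lower, lexicon, n=1, cutoff=cutoff)
    if not matches:
        return token  # leave unknown words untouched rather than guess
    best = matches[0]
    # Preserve the capitalisation of the original token.
    return best.capitalize() if token[0].isupper() else best


def correct_text(text: str) -> str:
    """Apply spacing fixes, then correct each alphabetic token in place."""
    text = fix_spacing(text)
    return re.sub(r"[^\W\d_]+", lambda m: correct_token(m.group(0)), text)


if __name__ == "__main__":
    print(correct_text("Der Hvnd is im Ha-\nus ."))
```

Leaving unmatched tokens unchanged is a deliberate choice in this sketch: it avoids introducing new errors, in line with the stated goal of fixing errors "without causing further issues".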

Electronic Lexicography in the 21st Century: New Applications for New Users

2011

Continuous technological progress has had a considerable impact on dictionary-making, both on how dictionaries are made and on the format(s) in which they are presented to the user. There has been an ongoing discussion on whether electronic dictionaries will slowly replace paper dictionaries, and in many parts of the world this has already taken place. It is now the time of online dictionaries and dictionary apps - the users want to click, tap, slide, etc. But dictionary users who use these new technologies are putting new demands on dictionary-makers. It is now often expected that dictionaries are free, offer quick access to all the types of information in them, contain every single word in existence, and include other types of features, e.g. grammar, games, blogs and forums. The question is thus no longer about the electronic format competing with the paper format, but more about how to utilize the many advantages of the electronic medium to make dictionaries as user-friendly as possible. There is another group of users that has been affected by technological progress, namely lexicographers themselves. As corpora get larger and larger, there is more and more data to analyse. Also, the existence of different dictionary formats means that the needs of different types of users have to be met. It is thus essential that lexicographers are provided with tools that speed up their work and automate the procedures that require little human intervention. The papers found in these proceedings from the eLex 2011 conference on electronic lexicography, which took place between November 10th and 12th in Bled, Slovenia, contain reports on electronic dictionaries or ongoing lexicographic projects that seek to address some of these issues. The interest in the conference from both members of academia and representatives of industry is clear evidence that electronic lexicography needs an event where current projects are presented and topical issues are discussed.
We would like to thank everyone who contributed to the success of the conference: the keynote speakers, the presenters, the sponsors, the programme committee, and the organising committee.

The Dictionary of the Serbian Academy: from the Text to the Lexical Database

2018

In this paper we discuss the project of digitization of the Dictionary of the Serbo-Croatian Standard and Vernacular Language. Scanning and character recognition were a particular challenge, since various non-standard character set encodings were used in the course of the almost 60-year-long production of the dictionary. The first aim of the project was to formalize the micro-structure of the dictionary articles in order to parse the digitized text and transform it into structured data stored in a relational lexical database. This approach is compatible with several standard structured forms and ontologies (TEI, LMF, Ontolex, LexInfo). A lexical database model was designed in compliance with these structured forms, following mostly the lemon model. Mapping of the lexical entry markers to LexInfo and TEI enabled export of the lexical data to the mentioned formats. A software solution for the dictionary text analysis, parsing and lexical database population was developed and tested on ...
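The general idea of formalizing an article's micro-structure, parsing the text, and populating a relational database can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the one-line entry layout (`lemma pos. 1. sense 2. sense`), the two-table schema, and the function names are hypothetical. They do not reproduce the dictionary's actual micro-structure or the project's lemon-based database model.

```python
import re
import sqlite3

# Hypothetical micro-structure: "<lemma> <pos>. 1. <sense> 2. <sense> ..."
ENTRY_RE = re.compile(r"^(?P<lemma>\S+)\s+(?P<pos>\S+\.)\s+(?P<body>.+)$")
# A sense is a number, a full stop, and text up to the next sense number or the end.
SENSE_RE = re.compile(r"(\d+)\.\s+(.*?)(?=\s+\d+\.\s|$)")


def parse_entry(line: str) -> dict:
    """Split one dictionary article into lemma, part of speech, and numbered senses."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"entry does not follow the assumed layout: {line!r}")
    senses = [(int(num), text.strip()) for num, text in SENSE_RE.findall(match["body"])]
    return {"lemma": match["lemma"], "pos": match["pos"], "senses": senses}


def create_schema(conn: sqlite3.Connection) -> None:
    """Create an illustrative two-table relational schema."""
    conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, lemma TEXT, pos TEXT)")
    conn.execute(
        "CREATE TABLE sense ("
        "entry_id INTEGER REFERENCES entry(id), number INTEGER, definition TEXT)"
    )


def store_entry(conn: sqlite3.Connection, entry: dict) -> int:
    """Insert a parsed entry and its senses; return the new entry id."""
    cur = conn.execute(
        "INSERT INTO entry (lemma, pos) VALUES (?, ?)", (entry["lemma"], entry["pos"])
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO sense (entry_id, number, definition) VALUES (?, ?, ?)",
        [(entry_id, num, text) for num, text in entry["senses"]],
    )
    return entry_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Illustrative input only; not quoted from the dictionary.
    entry_id = store_entry(conn, parse_entry("kuća f. 1. house. 2. household."))
    print(conn.execute("SELECT number, definition FROM sense").fetchall())
```

Keeping senses in their own table, keyed to the entry, is one way to let each marker in an article map to a separate field, which is what makes a later export to formats such as TEI or LexInfo straightforward.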

PRE-REQUISITES FOR THE PREPARATION OF AN ELECTRONIC THESAURUS FOR A TEXT PROCESSOR IN INDIAN LANGUAGES

In this article, we do not propose to evaluate the thesaurus facility available in this text processor for English. We plan instead to look ahead to preparing an Electronic Thesaurus for Text Processing (ETTP for short) for Indian languages, which is in fact more ambitious and complex than the one we have seen above. It will reflect the psychological make-up of the mental lexicon, so that the user can utilize the thesaurus in whatever way he likes. The Tamil language is taken for this case study. The text processor is ambitious enough that, if one wants to write a novel centering around a hospital, he will be provided with the lexical items related to the hospital situation. This will be a great boon, especially in the Indian context, since most writers have difficulty finding the right word for such concepts in the Indian language they use.