Snezana Petrovic | Институт за српски језик САНУ (Serbian Language Institute of SASA)
Books by Snezana Petrovic
From September 5th to 7th, 2006, the international scholarly symposium “Slavic Etymology today” took place in Belgrade, Serbia. Organized by the Serbian Academy of Sciences and Arts and the Serbian Language Institute of SASA, it was held under the auspices of the Etymological Commission of the International Slavistic Committee. More than a year later, the proceedings of this symposium are now being published under the same title. It is a rather prosaic one, not for lack of imagination on the part of the conference organizers, but because they wished to state clearly their initial purpose: to gather, by personal invitation, a representative number of the most prominent specialists in the field of Slavic etymology, coming from all leading centers of this kind of research, as well as some selected scholars from adjacent disciplines, in order not only to provide the maximum insight into present-day approaches to etymological studies, but also to present various possibilities of their interdisciplinary connections. We hope that the readers of this book will be able to share the unanimous impression of the organizers, participants and guests of the Belgrade symposium that this gathering was successful in fulfilling this task.
This monograph contains the final versions of all the articles accepted for the symposium, presented at it and submitted to the editorial board on time: a total of thirty-six papers, arranged here in alphabetical order. By country, the largest single contribution, with nine articles, comes from Serbia itself, but contributions from other Slavic and European countries prevail: seven from Russia, four from Bulgaria, three each from Poland and the Czech Republic, two each from Slovenia and Cyprus, and one each from Slovakia, Macedonia, Ukraine, France, Italy and Romania. It is noteworthy that, through their collaborators, almost all ongoing projects for etymological thesaurus-type dictionaries are represented: both Proto-Slavic dictionaries, the Moscow and the Cracow one, as well as the voluminous etymological dictionaries of the Old Church Slavonic, Bulgarian, Slovenian, Ukrainian and Serbian languages.
The methodological scope of these contributions ranges widely, from the traditional philological approach, which employs a contextual analysis of attestations to establish the original meaning of a word as a basis for its etymology, to attempts at amending and supplementing the established phonetic laws, which open new prospects for reconsidering some old etymologies and proposing new ones. While some works combine etymological quests with historical phonology and accentology, others focus on hitherto insufficiently studied word-formation models or morphological phenomena, or on demonstrating how etymology can benefit from historical syntax and from the text linguistics of folklore. In the majority of works, topics in historical semantics, such as the typology of semantic fields, are not only present but often dominant. Due prominence is given to the method of studying words by word-families, which avoids the potential traps of treating them individually. The breadth and depth of the comparative insights achieved range from prehistoric (Indo-European, Balto-Slavic, Proto-Slavic) to present-day Balkanistic perspectives. The objects of study are the most diverse lexical categories: from words attested early in Church Slavonic manuscripts to modern dialectisms whose recent dating often does not speak against their great antiquity; from lexemes with a firm terminological status to expressive forms and argotisms; from archaisms that can be projected onto the deepest levels of the proto-language to fairly recent borrowings into Slavic languages or from Slavic into other languages. Some papers deal with tracing loanwords, their chronological stratification and areal distribution, all of which reflect not only the history of contacts between languages and nations, but also the cultural history of the respective parts of Europe. A number of papers represent disciplines closely related to etymology, such as toponomastics or ethnolinguistics. Following the methodological postulate of comparative study of the “Wörter und Sachen” type, a number of authors draw on information and data from extralinguistic disciplines, such as ethnology, botany and zoology. Bearing all that in mind, one can assert that this monograph reflects, to a great extent, the present state of Slavic etymology, with all its complexity and interdisciplinary intertwining.
In accord with the tradition of the discipline, in which multilingualism is not only a postulate of the scholarly profile but also a basis for mutual communication, the papers and their summaries feature almost all Slavic languages, as well as English, German and French. All the communications in Slavic languages are provided with summaries in one of the world languages, which makes it possible for scholars outside the domain of Slavistics to gain an insight into the contents of this monograph. For the sake of economy, the commonly known and most frequently used abbreviations are not given among the references accompanying individual articles, but are collected in a separate list following the last paper. To make the book more reader-friendly, an index of selected words is provided at its very end.
Papers by Snezana Petrovic
Juznoslovenski filolog, 2015
UDC 808. Snežana M. Petrović, Institute for the Serbian Language of SASA, Belgrade; Marija S. Đinđić, Institute for the Serbian Language of SASA, Belgrade. A Cultural Borrowing: Serbian jogurt between East and West. The paper examines the history and origin of the word јогурт 'yogurt' in Serbian, both on its own and in the proverb Ко се једном на млеко опече, тај и у јогурт дува ("Whoever has once been burned by milk blows on yogurt too"), from the perspective of separating the layers of Ottoman, post-Ottoman and non-Turkish cultural heritage, as well as its semantic distinctiveness in relation to its Turkish model. In the light of the process of cultural borrowing, the word is also considered in a broader Balkan and European context. Keywords: yogurt, Serbian language, Turkism, cultural borrowing, phraseology.
Dictionaries and Societies, Proceedings of the XX Euralex International Congress, 2022
In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of paper editions. Therefore, one essential step, lacking until now, is needed before venturing into the dictionary-making process in the digital environment: OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and to extract each character with its diacritics precisely, such software solutions are not efficient enough. This paper presents the OCR software called "SCyDia", developed to overcome this issue. We demonstrate the organizational structure of "SCyDia" and the first results. "SCyDia" is a web-based software solution that relies on the open-source software "Tesseract" in the background. It also contains a module for semi-automatic text correction. We have already processed over 15,000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of "SCyDia" on the 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, with accuracy ranging from 97.19% to 99.87%.
HISTORICAL LEXICOGRAPHY OF THE SERBIAN LANGUAGE, 2021
The aim of this paper is to present the advantages of writing and publishing dictionaries in the digital environment from the standpoints of both authors and users. The paper presents examples from three digitized historical dictionaries (Russian, Dutch (with Frisian) and Polish) and the possibilities of data retrieval enabled by the ways the lexicographic material was modeled. Using the example of one lemma from the historical dictionary of Serbian, the authors propose principles of data modeling in accordance with the international standards for text annotation.
Digital Humanities Workshop, 2022
This paper aims to present the digitization process of a very important piece of Serbian intangible cultural heritage, Српске народне пословице и друге различне као оне у обичај узете ријечи (Engl. Serbian folk proverbs), compiled by Vuk Stefanović Karadžić during the first half of the 19th century. In the paper, we discuss the necessary steps in the digitization process, the challenges we had to deal with, and the solutions we came up with. The goal of this process is to have a fully digitized, user-friendly version of Serbian folk proverbs which will also integrate easily and be compatible with other digitized resources and/or multi-dictionary portals. CCS Concepts: • Applied computing → Extensible Markup Language (XML); Arts and humanities; Digital libraries and archives; Annotation.
Balcanica, 2003
Institute for the Serbian Language, Belgrade. Paths of Lexical Borrowing in the Balkans: Loanwords from Albanian in the Serbian Dialect of Prizren.
Abstract: The paper gives an etymological analysis of the following loanwords from Albanian in the Serbian dialect of Prizren: бајмак m., adj. indecl. "bow-legged (person or animal)", цуб adj. indecl. "short; bobtailed", цуб m., adj. indecl. "hajduk, outlaw", ћул adj. indecl. "soaking wet", ђиза f. "a kind of fine, crumbly cheese", глистра f. "worm", корсе, корсем, крсем adv. "supposedly, allegedly", кулме n., куљма f. "top of the roof", љајка f. "lie", љапер m. "rascal", љочка f. "soul, heart", љум adj. indecl. "dear", љунга f. "growth, swelling", путарка f. "dried and salted fish roe", равш adv. "straight, level", роктар m. "servant, hired hand", шкрет adj. "desolate", шкрум adj. indecl. "completely dry".
The study of Albanian-Slavic and, more narrowly, Albanian-Serbian language contacts has a significant tradition and comprises a large number of studies and papers devoted directly or indirectly to the topic. Scholars have paid particular attention to the mutual influences of these languages as reflected in the lexicon, and the results of that research have contributed above all to broadening knowledge in the history, dialectology and etymology of Albanian and the Slavic languages, as well as in Balkan studies. The influence of Albanian on Serbian is, except in a small number of cases, areally limited to the regions where the two peoples and languages are in contact: Montenegro, and Kosovo and Metohija. Previous studies have shown that loanwords from Albanian belong mostly to the dialectal lexicon and that they can be classified into several definite semantic groups. For a fuller picture of Serbian-Albanian lexical relations, however, a more detailed investigation of as large a dialectal corpus of both languages as possible would be needed.
Research on Serbian vernaculars, above all those in direct contact with Albanian, has in recent decades yielded an abundance of new lexical material. Etymological treatment of that material would certainly bring to light hitherto unrecorded loanwords from Albanian in Serbian and thus make it possible to establish new isoglosses and extend existing ones, as well as to analyze interlingual interference in the Balkans more thoroughly.
The Image of the Monolingual Dictionary Across Europe. Results of the European Survey of Dictionary use and Culture, 2018
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. The survey was delivered via an online survey platform, in language versions specific to each target country. It was completed by 9,562 respondents, over 300 respondents per country on average. The survey consisted of the general section, which was translated and presented to all participants, as well as country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.
International Journal of Lexicography, 2018
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. The survey was delivered via an online survey platform, in language versions specific to each target country. It was completed by 9,562 respondents, over 300 respondents per country on average. The survey consisted of the general section, which was translated and presented to all participants, as well as country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.
Српска славистика : колективна монографија. Том 1, Језик, Радови српске делегације на XVI међународном конгресу слависта, Vol. 1 (2018) Article 16 (p. 245-258), 2018
Two etymological dictionaries of the Serbian language are currently being compiled: Etimološki rečnik srpskog jezika (ERSJ), an exhaustive, thesaurus-type, long-term project, and Priručni etimološki rečnik srpskog jezika (PERSJ), a concise manual.
These dictionaries differ conceptually in two major aspects: a) the lexical material they are based on (exhaustive literary and dialectal material for ERSJ on the one hand, and standard Serbian for PERSJ on the other); b) the audiences they address (experts in etymology and linguists in general for ERSJ on the one hand, and both linguistic experts and the broader audience for PERSJ on the other).
The aim of this paper was to investigate whether the conceptual differences between the two dictionaries have resulted in differences in methodological treatment. The analysis of the respective material (covering letters a- and b-) has exposed the following differences: a smaller number of lemmas in PERSJ than in ERSJ (§ 3.1); a more nest-oriented approach in the organization of lemmas in PERSJ than in ERSJ (§ 3.2); more recent borrowings in PERSJ than in ERSJ (§ 3.3); the headword in PERSJ is a form from standard Serbian, accented and grammatically determined (§ 5.1.2), while in ERSJ the headword is a kind of ‘hyperlemma’, an initial derivation form without accent (§ 5.1.1); the first part of the lemma in PERSJ is significantly shorter than in ERSJ, due to a more restrictive selection of derivatives and to the absence of dialectal material, semantic definitions, references and place names (§ 5.2.2); a more precise identification of the etymon in the case of inter-Slavic borrowings in PERSJ, especially in the case of loanwords from (Old) Church Slavonic (§ 6.3); explanations of synonyms and functional or regional variants, often given in the third part of the lemma in PERSJ, are absent from ERSJ (§ 7.6); in the case of loanwords, parallels from other languages, placed in the second part of the lemma in ERSJ, are transferred to the third part in PERSJ (§ 6.2); information on word history and cultural history in a broader sense appears more frequently, and is more detailed, in PERSJ than in ERSJ (§ 7.7); references in PERSJ are clustered at the end of the lemma, whilst in ERSJ each piece of data cited has its own reference (§ 8).
Abstract (translated from Serbian): Two dictionaries are currently being compiled in contemporary Serbian etymological lexicography: the multi-volume Etimološki rečnik srpskog jezika (ERSJ) and the Priručni etimološki rečnik srpskog jezika (PERSJ). The first is a thesaurus-type dictionary whose material covers the entire lexical stock, while the second is limited to interpreting the basic lexical stock. The two dictionaries are conceived differently, which is reflected in how the lexical material is selected and organized, how entries are formed and headwords chosen, how the entry is structured, and how literature and sources are cited. The aim of the paper is to identify and present, by comparing the corresponding segments of the two dictionaries (lemmas under a- and b-), the differences in the methodological treatment of entries that stem from the different conceptions. Keywords: etymology, etymological dictionaries, lexicographic methodology, metalexicography, Serbian language.
0. Two etymological dictionaries of Serbian are being compiled in parallel at the Etymological Department of the Institute for the Serbian Language of SASA. At the end of the last century, the long-term project of compiling the Etimološki rečnik srpskog jezika (ERSJ) began [1]; it is conceived as a multi-volume thesaurus intended primarily for specialists. That dictionary is characterized by broad coverage of the literary language and the dialects [2], leaving out practically only the most recent lexical layers [3]. The younger project of compiling the Priručni etimološki rečnik srpskog jezika (PERSJ) [4], which focuses on the basic [...]
* snezzanaa@gmail.com
** This paper was produced within the project "Етимолошка истраживања српског језика и израда Етимолошког речника српског језика" (178007), funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia.
[1] Three volumes of ERSJ have been published so far, covering lexemes from а to бјенути.
[2] On the conceptual changes regarding dialect coverage, see ERSJ 1: 6 for more detail.
[3] On the selective approach to certain segments of the lexicon, see ОС XVI–XVIII for more detail.
[4] The analysis is based on a working version of PERSJ, which is subject to further editing. Since both dictionaries are co-authored undertakings, the entries, although they share the same general structure, inevitably bear the stamp of the professional profile, and even the personality, of their principal compilers.
In the paper, a vocabulary of 91 Turkish loanwords is presented. They are excerpted from the printed version of the early XVIth century Czech manuscript entitled Memoirs of a Janissary, written by Konstantin Mihailović. The loanwords are provided with an etymological explanation, and evidence of their presence in selected dictionaries and lexical databases of the Czech language is given.
Зборник Матице српске за филологију и лингвистику, 1993
The paper discusses the process of digitizing and annotating lexicographic paper slips from the manuscript collection of the Serbian dialect of Prizren, compiled by Dimitrije Čemerikić. It emphasizes two problems of the digitization process: the determination of headwords and the standardized forms of phonetically / orthographically nonstandard lemmas. Various types of solutions are proposed as a contribution to the guidelines for the digitization of similar lexical resources. The increased search possibilities provided by the availability of standardized lemma forms are important not only because they enrich and offer multi-level insights into this valuable lexicographic material, but also because they will help with the integration of various digitized resources into the Dictionary Portal of the Serbian Language Institute of SASA, which is currently under development.
The paper offers an analysis of the cultural borrowing process of the Turkish word yoğurt ‘yogurt’ into Serbian. The study begins with the analysis of the Serbian lexical material from the 19th century, when jogurt was marked by usage as an exclusively foreign word. Comparison with the present meaning and usage of the word jogurt in contemporary Serbian, where it is part of the standard language, shows two different layers of the borrowing process: the Ottoman Turkish period and the Non-Turkish one. The most recent, Post-Ottoman borrowing layer is illustrated by the usage of jogurt as an element of a proverb, a calque from the Turk. Sütten ağzı yanan yoğurdu (veya ayranı) üfleyerek yer (veya içer) “who gets burned by milk, blows into yogurt as well”. Aiming to present a complex image of the paths by which this cultural borrowing spread, the Serbian material is compared with relevant data from other Balkan and European languages.
From September 5th to 7th, 2006 the international scholarly symposium “Slavic Etymology today” to... more From September 5th to 7th, 2006 the international scholarly symposium “Slavic Etymology today” took place in Belgrade, Serbia. Organized by the Serbian Academy of Sciences and Arts and the Serbian Language Institute of SASA, it was held under the auspices of the Etymological Commission of the International Slavistic Committee. More than a year later the proceedings of this symposium are now being published, under the same title — a rather prosaic one, which is not due to the lack of imagination on the side of the conference organizers, but rather to their intention to clearly express their initial purpose to gather — by personal invitation
— a representative number of the most prominent specialists in the field of Slavic etymology, coming from all
leading centers of this kind of research, as well as some selected scholars from adjacent disciplines, in order to provide not only the maximum insight into present-
day approaches to etymological studies, but also to present various possibilities of their interdisciplinary connections.
We hope that the readers of this book will be able to share the unanimous impression of the organizers, participants
and guests of the Belgrade symposium that this gathering was successful in fulfilling this task.
This monograph contains the final versions all the articles accepted for the symposium, presented at it and timely submitted to the editorial board —a total of thirty six papers, here arranged in an alphabetical order. Countrywise,
the most numerous individual contribution, with nine articles, comes from Serbia proper, but prevailing are
those from other Slavic and European countries: seven from Russia, four from Bulgaria, three each from Poland and the Czech Republic, two each from Slovenia and Cyprus, and one each from Slovakia, Macedonia, Ukraine, France, Italy and Romania. It is noteworthy that through their collaborators
almost all ongoing projects for etymological thesaurus-type dictionaries are represented: both Proto-Slavic dictionaries,
the Mocow and the Cracow one, as well as the voluminous etymological dictionaries of the Old Church Slavonic,
Bulgarian, Slovenian, Ukrainian and Serbian languages. The methodological scope of these contributions varies in a wide range from the traditional philological approach which employs a contextual analysis of attestations for establishing the original meaning of a word as a basis for its etymology, to
the attempts at amending and supplementing the established phonetic laws, which opens new prospects for reconsidering some old etymologies and proposing new ones. While those works combine etymological quests with historical phonology and accentology, the others focus on some hitherto insufficiently studied word-formation models
or morphological phenomena or on demonstrating the benefit etymology can take from historical syntax as well as from the folklore text linguistics. In the majority of works, not only present but often dominating, are some topics in historical semantics, such as the typology of semantic fields. Duly positioned is the method of studying words by word-families, by which the potential traps of their individual treatment are avoided. The width and depth of achieved comparative insights vary from prehistoric — Indo-European, Balto-Slavic, Proto-Slavic — to present-day Balkanistic prospectives. The objects of study are most diverse lexical categories: from words early attested in Church Slavonic manuscripts to modern dialectisms whose recent dating often does not speak against their great antiquity; from lexemes with a firm terminological status to expressive forms and argotisms; from archaisms which can be projected on the deepest levels of the proto-language to fairly recent borrowings into Slavic languages or from Slavic into other languages. Some papers deal with tracing loanwords, their chronological stratification and areal distribution, which all reflect not only the history of contacts between the languages
and nations, but also the cultural history of the respective parts of Europe. A certain number of papers present some disciplines that are closely related to etymology, such as toponomastics or ethnolinguistics. Following the methodological postulation for a comparative study of "Worter und Sachen” type, a number of authors reach for information and data from extralinguistic disciplines, such as ethnology, botany, zoology, etc. Bearing all that in mind, one can assert that this monograph reflects, to a great extent,
the present state of Slavic etymology, with all its complexity and interdisciplinary intertwining.
In accord with the tradition of the discipline in which multilingualism is not only a postulate of the scholarly profile but also a basis for mutual
communication, the papers and their summaries feature almost all Slavic languages, as well as English, German
and French. All the communications in Slavic languages are provided with summaries in one of the world languages, which makes it possible for scholars outside the domain of
Slavistics to have an insight into the contents of this monograph. For the sake of economy, the commonly
known and most frequently used abbreviations are not given among the references accompaning individual articles, but collected in a separate list following the last paper. In order to make the book more reader-friendly, at its very end an index of selected words is provided.
Juznoslovenski filolog, 2015
UDK 808 СНЕЖАНА М. ПЕТРОВИЋ* Институт за српски језик САНУ Београд МАРИЈА С. ЂИНЂИЋ** Институт за... more UDK 808 СНЕЖАНА М. ПЕТРОВИЋ* Институт за српски језик САНУ Београд МАРИЈА С. ЂИНЂИЋ** Институт за српски језик САНУ Београд КУЛТУРНА ПОЗАЈМЉЕНИЦА-СРПСКИ ЈОГУРТ ИЗМЕЂУ ИСТОКА И ЗАПАДА*** Рад разматра историјат и порекло речи јогурт у српском језику, самостално и у пословици Ко се једном на млеко опече, тај и у јогурт дува, из перспективе раздвајања слојева османског, постосманског и нетурског културног наслеђа, као и његову семантичку посебност у односу на турски предложак. У светлу процеса културног позајмљивања ова реч посматрана је и у ширем, балканском и европском контексту. Кључне речи: јогурт, српски језик, турцизам, културно по зај мљива ње, фразеологија.
Dictionaries and Societies, Proceedings of the XX Euralex International Congress, 2022
In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the big... more In the currently ongoing process of retro-digitization of Serbian dialectal dictionaries, the biggest obstacle is the lack of machine-readable versions of paper editions. Therefore, one essential step is needed before venturing into the dictionary-making process in the digital environment-OCRing the pages with the highest possible accuracy. Successful retro-digitization of Serbian dialectal dictionaries, currently in progress, has shown a dire need for one basic yet necessary step, lacking until now-OCRing the pages with the highest possible accuracy. OCR processing is not a new technology, as many open-source and commercial software solutions can reliably convert scanned images of paper documents into digital documents. Available software solutions are usually efficient enough to process scanned contracts, invoices, financial statements, newspapers, and books. In cases where it is necessary to process documents that contain accented text and precisely extract each character with diacritics, such software solutions are not efficient enough. This paper presents the OCR software called "SCyDia", developed to overcome this issue. We demonstrate the organizational structure of the OCR software "SCyDia" and the first results. The "SCyDia" is a web-based software solution that relies on the open-source software "Tesseract" in the background. "SCyDia" also contains a module for semi-automatic text correction. We have already processed over 15000 pages, 13 dialectal dictionaries, and five dialectal monographs. At this point in our project, we have analyzed the accuracy of the "SCyDia" by processing 13 dialectal dictionaries. The results were analyzed manually by an expert who examined a number of randomly selected pages from each dictionary. The preliminary results show great promise, spanning from 97.19% to 99.87%.
HISTORICAL LEXICOGRAPHY OF THE SERBIAN LANGUAGE, 2021
The aim of this paper is to present the advantages of writing and publishing dictionaries in the ... more The aim of this paper is to present the advantages of writing and publishing dictionaries in the digital environment from the standpoints of both authors and users. The paper
presents examples from three digitized historical dictionaries – Russian, Dutch (with Frisian)
and Polish – and the possibilities of data retrieval enabled by the ways the lexicographic
material was modeled. On the example of one lemma from the historical dictionary of
Serbian, the authors propose the principles of data modeling in accordance with the international standards for text annotation.
Digital Humanities Workshop, 2022
This paper aims to present the digitization process of a very important piece of Serbian intangib... more This paper aims to present the digitization process of a very important piece of Serbian intangible cultural heritage, Српске народне пословице и друге различне као оне у обичаj узете риjечи (Engl. Serbian folk proverbs), compiled by Vuk Stefanović Karadžić during the first half of the 19th century. In the paper, we discuss the necessary steps in the digitization process, the challenges we had to deal with as well as the solutions we came up with. The goal of this process is to have a fully digitized, user-friendly version of Serbian folk proverbs, that will also easily integrate and be compatible with other digitized resources and/or multi-dictionary portals. CCS Concepts: • Applied computing → Extensible Markup Language (XML); Arts and humanities; Digital libraries and archives; Annotation.
Balcanica, 2003
Institute for the Serbian Language, Belgrade. Paths of Lexical Borrowing in the Balkans. Loanwords from Albanian in the Serbian Dialect of Prizren. Abstract: The paper offers an etymological analysis of the following loanwords from Albanian in the Serbian dialect of Prizren: бајмак m., adj. indecl. "bow-legged (person or animal)", цуб adj. indecl. "short; dock-tailed", цуб m., adj. indecl. "hajduk, outlaw", ћул adj. indecl. "soaking wet", ђиза f. "a kind of fine, crumbly cheese", глистра f. "worm", корсе, корсем, крсем adv. "supposedly, allegedly", кулме n., куљма f. "top of the roof", љајка f. "lie", љапер m. "rascal", љочка f. "soul, heart", љум adj. indecl. "dear", љунга f. "growth, swelling", путарка f. "dried and salted fish roe", равш adv. "level, straight", роктар m. "servant, hired hand", шкрет adj. "desolate", шкрум adj. indecl. "completely dry". The study of Albanian-Slavic and, more narrowly, Albanian-Serbian language contacts has a significant tradition and comprises a large number of studies and papers devoted directly or indirectly to the topic. Scholars have paid particular attention to the mutual influences of these languages at the lexical level, and the results of that research have contributed above all to broadening knowledge of the history, dialectology and etymology of Albanian and the Slavic languages, as well as of Balkan studies. The influence of Albanian on Serbian is, except in a small number of cases, areally restricted to the regions where the two peoples and languages are in contact: Montenegro and Kosovo and Metohija. Research to date has shown that loanwords from Albanian belong mainly to the dialectal lexicon and that they fall into a few specific semantic groups. A fuller picture of Serbian-Albanian lexical relations, however, would require a more detailed study of as large a dialectal corpus of both languages as possible.
Investigations of Serbian vernaculars, above all those in direct contact with Albanian, have yielded an abundance of new lexical material in recent decades. An etymological treatment of that material would certainly bring to light hitherto unrecorded loanwords from Albanian in Serbian, and would thus make it possible to establish new isoglosses and extend existing ones, as well as to analyze interlingual interference in the Balkans more thoroughly.
The Image of the Monolingual Dictionary Across Europe. Results of the European Survey of Dictionary use and Culture, 2018
The article presents the results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries. The survey is the broadest survey of dictionary use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty countries. Our survey covers varied user groups, going beyond the students and translators who have tended to dominate such studies thus far. The survey was delivered via an online survey platform, in language versions specific to each target country. It was completed by 9,562 respondents, over 300 respondents per country on average. The survey consisted of the general section, which was translated and presented to all participants, as well as country-specific sections for a subset of 11 countries, which were drafted by collaborators at the national level. The present report covers the general section.
International Journal of Lexicography, 2018
The article presents the results of a survey on dictionary use in Europe, focusing on
general monolingual dictionaries. The survey is the broadest survey of dictionary
use to date, covering close to 10,000 dictionary users (and non-users) in nearly thirty
countries. Our survey covers varied user groups, going beyond the students and
translators who have tended to dominate such studies thus far. The survey was
delivered via an online survey platform, in language versions specific to each target
country. It was completed by 9,562 respondents, over 300 respondents per country
on average. The survey consisted of the general section, which was translated and
presented to all participants, as well as country-specific sections for a subset of 11
countries, which were drafted by collaborators at the national level. The present report covers the general section.
Српска славистика : колективна монографија. Том 1, Језик, Радови српске делегације на XVI међународном конгресу слависта, Vol. 1 (2018) Article 16 (p. 245-258), 2018
Two etymological dictionaries of the Serbian language are currently being compiled: Etimološki rečnik
srpskog jezika (ERSJ), an exhaustive, thesaurus-type, long-term project, and Priručni etimološki rečnik srpskog
jezika (PERSJ), a concise manual.
These dictionaries differ conceptually in two major aspects: a) the lexical material they are based on
– exhaustive literary and dialectal material for ERSJ on the one hand, and standard Serbian for PERSJ on the
other; b) the audiences they address – experts in etymology and linguists in general for ERSJ on the one hand,
and both linguistic experts and the broader audience for PERSJ on the other.
The aim of this paper was to investigate whether the conceptual differences between the two dictionaries
have resulted in differences in methodological treatment. The analysis of the respective material (covering letters
a- and b-) has exposed the following differences: a smaller number of lemmas in PERSJ than in ERSJ (§
3.1); a more nest-oriented approach in the organization of lemmas in PERSJ than in ERSJ (§ 3.2); more recent
borrowings in PERSJ than in ERSJ (§ 3.3); the headword in PERSJ is a form from standard Serbian, accented
and grammatically determined (§ 5.1.2), while in ERSJ the headword is a kind of ’hyperlemma’, an initial derivation
form without accent (§ 5.1.1); the first part of the lemma in PERSJ is significantly shorter than in ERSJ,
due to a more restrictive selection of derivatives and to the absence of dialectal material, semantic definitions,
references and place names (§ 5.2.2); a more precise identification of the etymon in the case of inter-Slavic borrowings
in PERSJ, especially in the case of loanwords from (Old) Church Slavonic (§ 6.3); explanations of
synonyms and functional or regional variants, often given in the third part of the lemma in PERSJ, are absent
from ERSJ (§ 7.6); in the case of loanwords, parallels from other languages placed in the second part of the
lemma in ERSJ, are transferred to the third part in PERSJ (§ 6.2); information on word history and cultural history
in a broader sense appears more frequently, and is more detailed in PERSJ than in ERSJ (§ 7.7); references in
PERSJ are clustered at the end of the lemma, whilst in ERSJ each piece of data cited has its own reference (§ 8). /
Снежана М. Петровић, Марија Д. Вучковић
In contemporary Serbian etymological lexicography, two dictionaries are currently being compiled: the multi-volume Етимолошки речник српског језика (ЕРСЈ) and the Приручни етимолошки речник српског језика (ПЕРСЈ). The first is a thesaurus-type dictionary whose material encompasses the entire lexical stock, while the second is limited to interpreting the basic lexical stock. The two dictionaries have different conceptions, which is reflected in the selection and organization of the lexical material, in the formation of entries and the choice of headword, in the structure of the entry, and in the way literature and sources are cited. The aim of the paper is to identify and present, by comparing corresponding segments of the two dictionaries (lemmas under а- and б-), the differences in the methodological treatment of entries that stem from the different conceptions.
Keywords: etymology, etymological dictionaries, lexicographic methodology, metalexicography, Serbian language.
0. Two etymological dictionaries of the Serbian language are being compiled in parallel at the Etymological Department of the Serbian Language Institute of SASA. The long-term project of compiling the Етимолошки речник српског језика (ЕРСЈ) began at the end of the last century; 1 it is conceived as a multi-volume thesaurus intended primarily for specialists. That dictionary is characterized by broad coverage of the literary language and the dialects, 2 leaving out practically only the most recent lexical layers. 3 The younger project of compiling the Приручни етимолошки речник српског језика (ПЕРСЈ), 4 which focuses on the basic
* snezzanaa@gmail.com
** This paper was written within the project "Етимолошка истраживања српског језика и израда Етимолошког речника српског језика" (178007), funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia.
1 Three fascicles of ЕРСЈ have been published so far, covering lexemes from а to бјенути.
2 On conceptual changes in dialectal coverage see in more detail ЕРСЈ 1: 6.
3 On the selective approach to certain segments of the lexicon see in more detail ОС XVI–XVIII.
4 The analysis is based on a working version of ПЕРСЈ, which is subject to later editing. As both dictionaries are co-authored undertakings, the entries, although they share the same general structure, inevitably bear the stamp of the professional profile, and even the personality, of their principal compilers.
In the paper, a vocabulary of 91 Turkish loanwords is presented. They are excerpted
from the printed version of the early XVIth century Czech manuscript entitled
Memoirs of a Janissary, written by Konstantin Mihailović. The loanwords are provided with
an etymological explanation and the evidence of their presence in selected dictionaries and
lexical databases of the Czech language is given.
Зборник Матице српске за филологију и лингвистику, 1993
The paper discusses the process of digitizing and annotating lexicographic paper slips from the manuscript collection of the Serbian dialect of Prizren, compiled by Dimitrije Čemerikić. It emphasizes two problems of the digitization process: the determination of headwords and the standardized forms of phonetically or orthographically nonstandard lemmas. Various types of solutions are proposed as a contribution to the guidelines for the digitization of similar lexical resources. The increased search possibilities provided by the availability of standardized lemma forms are important not only because they enrich and offer multi-level insights into this valuable lexicographic material, but also because they will help with the integration of various digitized resources into the Dictionary Portal of the Serbian Language Institute of SASA, which is currently under development.
The paper offers an analysis of the cultural borrowing process of the Turkish word yoğurt ‘yogurt’ into Serbian. The study begins with the analysis of the Serbian lexical material from the 19th century, when jogurt was marked by usage as an exclusively foreign word. Comparison with the present meaning and usage of the word jogurt in contemporary Serbian, where it is part of the standard language, shows two different layers of the borrowing process: the Ottoman Turkish period and the non-Turkish one. The most recent, post-Ottoman borrowing layer is illustrated by the usage of jogurt as an element of a proverb, a calque of the Turk. Sütten ağzı yanan yoğurdu (veya ayranı) üfleyerek yer (veya içer) “who gets burned by milk, blows into yogurt as well”. To present a complex image of the paths taken by this cultural borrowing, the Serbian material is compared with relevant data from other Balkan and European languages.
The paper describes the process of digitizing and annotating some 23,000 lexicographic paper
slips compiled by the amateur lexicographer Dimitrije Čemerikić (1882-1960) to document the
Serbian dialect from the historic city of Prizren. This previously unpublished dictionary of the
Prizren dialect is an important resource not only for dialectologists and linguists, but also for
ethnolinguists and ethnologists who are interested in various aspects of popular culture and
urban life in the city of Prizren. The alphabetic arrangement of the macrostructure, however,
is not conducive to exploratory searches: if users want to find out which dialect word
corresponds to a standard Serbian word, or explore a certain type of vocabulary, they need
access paths to the dictionary content that go beyond the indexing of the macrostructure. The
paper describes an elaborate annotation strategy based on marking up headwords with
standardized orthographic alternatives, providing lexical equivalents and assigning semantic
fields to entries in order to achieve robust navigability and searchability of the collection
without full-text transcription and/or structural data modeling.
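The annotation strategy described above can be pictured as a small record per slip plus inverted indexes over the annotations. The sketch below is illustrative only: the class `SlipAnnotation`, its field names, the function `build_index`, the semantic-field labels and the standardized spelling "лајка" are all assumptions made for this example, not the project's actual schema or data. The headwords and their standard Serbian equivalents are taken from the Prizren loanword list quoted earlier on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SlipAnnotation:
    """One annotated paper slip (hypothetical field names)."""
    headword: str                                              # dialect form as written on the slip
    standardized: list[str] = field(default_factory=list)      # standardized orthographic alternatives
    equivalents: list[str] = field(default_factory=list)       # standard Serbian lexical equivalents
    semantic_fields: list[str] = field(default_factory=list)   # assigned semantic fields


def build_index(slips: list[SlipAnnotation], attribute: str) -> dict[str, list[str]]:
    """Map every value of one annotation layer to the headwords that carry it."""
    index: dict[str, list[str]] = defaultdict(list)
    for slip in slips:
        for key in getattr(slip, attribute):
            index[key].append(slip.headword)
    return dict(index)


# Illustrative slips; the semantic-field labels and "лајка" are made up.
SLIPS = [
    SlipAnnotation("љајка", standardized=["лајка"], equivalents=["лаж"],
                   semantic_fields=["speech"]),
    SlipAnnotation("глистра", standardized=["глистра"], equivalents=["глиста"],
                   semantic_fields=["animals"]),
]

by_equivalent = build_index(SLIPS, "equivalents")
by_field = build_index(SLIPS, "semantic_fields")
```

With indexes of this kind, a user who knows only the standard word лаж can reach the dialect headword through `by_equivalent["лаж"]`, which is the sort of access path beyond the alphabetical macrostructure that the abstract calls for.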
The paper presents the methodological approach of identifying Slavic layers in the
standard Serbian lexicon, applied in the “one volume” Etymological dictionary of the Serbian
language, through the case of two Serbian words: svest ‘consciousness’ and savest ‘conscience’.
The presented language material shows that in the Serbian Slavic and Old Serbian sources,
the word сьвѣсть was used in texts with Gospel content in the meaning ‘conscience’, and in
a legal context in the meaning ‘consciousness’. In addition to their phonetic and orthographic
forms being identical in the past, their meanings also intertwined through history – both
words used to mean ‘knowledge in general, consciousness’ and ‘moral knowledge, conscience’.
Over time, one word in three orthographic variants (сьвѣсть / съвѣсть / свѣсть)
developed two separate forms (svest and savest) the semantics of which narrowed and specialized,
thus resulting in the creation of two independent lexemes at the standard Serbian
language level.
In this paper the etymology of the Serbian noun неимар ‘chief architect, constructor’,
explained as a loanword from Turk. mimar ‘id.’, will be reconsidered and some problems
of this interpretation will be analysed. A potential link with the Serb. verbs наимати,
најмити ‘to hire’ will be pointed out. The specific context of epic poetry, in which this word
was initially used, as well as the historical circumstances connected with the loaning process
will be emphasized.
The paper offers an overall view on the significance of Miklosich’s Die türkischen
Elemente in den südost- und osteuropäischen Sprachen for the study of the Turkish loans in
the Serbo-Croatian language and compares it with his other etymological studies from this
perspective. The pioneering character of this work of Miklosich’s from the methodological
viewpoint is stressed and certain shortcomings and limitations are pointed out. On the basis
of the selected Serbian lexical material and several examples of the etymology of Turcisms,
the paper shows how and to what extent these studies have been used in later etymological
dictionaries. The analysis indicates that even when these studies by Miklosich are included
in the bibliographies of etymological dictionaries of Serbian Turcisms, they are
usually cited from second-hand lexicographic sources; therefore their material and etymological
explanations are not entirely incorporated in these dictionaries.
Glasnik Etnografskog instituta, 2014
Turcisms attested in written sources from the pre-standard epoch of the
Serbian language are significant for studying the chronology of this type of loanwords,
as well as for the insight into the history of language and cultural
contacts between the Serbs and the Ottoman Turks. Ljubomir Stojanović’s six-volume
book Stari srpski zapisi i natpisi often presents the first written evidence
of Turcisms in the Serbian language. Relevant lexicographic resources that include
this material in their corpora, and thus offer historical linguistic data
for the Turkish loanwords in the Serbian language, are very few (RJA, STACHOWSKI
1967 and МИХАЈЛОВИЋ) and not always reliable. The paper analyzes
sixteen Turkish loanwords from Stojanović’s book, not included in these
three manuals: aršin, dešerme, juzeli, kesa, mal, mašrapa, milć, tefter (absent
from all the manuals), rakija, šinik (absent from RJA and STACHOWSKI 1967),
kavga, kapičija, kujunžija, ortak, čauš, čoha (absent from STACHOWSKI 1967).
The importance of studying Turkish loanwords from the pre-standard epoch is
presented through the etymological analysis of the word dešerme, hitherto not
mentioned in a single dictionary of the Serbian language, descriptive or
etymological.
[Јужнословенски филолог LXIX (2013), p. 398]
Snežana Petrović
Turkish Loanwords in the Dialects of Montenegro (1)
Summary
Based on the fact that the Turkish loanwords attested in numerous dialectal dictionaries from Montenegro show considerable peculiarities in comparison with the other Serbo-Croatian lexical material, this paper offers an analysis of some phonetic and semantic features significant for a more accurate etymological explanation of the words analyzed. Phonetic forms of the words jangluš ‘wrong’, mirićep ‘ink’, džizdan ‘wallet’ and ćepča ‘ladle’, showing certain Turkish archaic and dialectal characteristics, are more adequately related to their Turkish etyma and are connected with respective word-forms from Serbian dialects and other Balkan languages. Some specific semantic shifts in words are pointed out (tepter ‘package of cigarette paper’, seratlija ‘respectable man, gentleman’ and toprak ‘home; origin’, otopračiti (se) ‘acting as if being at home’), which have developed only locally or regionally in Montenegro.