InterCorp - ČNK Manual


InterCorp is a large parallel synchronic corpus covering a number of languages. It is compiled mostly by teachers and students of the Faculty of Arts, Charles University in Prague, and by other collaborators of the Institute of the Czech National Corpus (ICNC). It serves as a data source for theoretical studies, lexicography, student research, foreign-language learning and computer applications, and it is used by translators and the general public.

All texts in InterCorp and all features of the search interfaces are available after free registration and login via the KonText or Treq interfaces. The registration is identical for all public ICNC corpora, so if you already have a login and password for the Czech National Corpus, no separate registration for InterCorp is required.

InterCorp is part of the Czech National Corpus, a project funded by the Ministry of Education of the Czech Republic within the programme Large Research, Development and Innovation Infrastructures (LM2018137; 2020–22). In 2016–2019, 2012–2015 and 2005–2011 the project was supported from the same source (projects no. LM2015044, LM2011023 and 0021620823, respectively). The entire project is academic and non-commercial.

Description

Starting with Release 6, InterCorp can be regarded as a reference corpus: all its previous releases remain available in their originally published form. The volume of texts, the number of languages and the extent of annotation (lemmatization and tagging) may grow with each new release and with the introduction of new tools.

For more details about the individual releases of InterCorp see the overview below:

The corpus consists of two parts: the core and the collections. The core of InterCorp consists mostly of fiction with manually checked alignments. The collections contain texts acquired in multiple languages and processed and aligned automatically, so their concordances may include more misaligned segments. Moreover, the collections do not always include all texts from the original source; texts without a Czech counterpart, for example, are left out. Some texts from the Acquis Communautaire and Europarl corpora have been partially corrected or omitted, so they may differ in form or size from the original source. A similar selection was applied to the Open Subtitles database, where, as an additional reduction, only a single translation was kept per title and language. On the other hand, some metadata items missing in the original resources but detectable from context or other sources have been added.

Each text has a Czech counterpart. As a result, Czech is the pivot language: for every text there is a single Czech version (original or translation), aligned with one or more foreign-language versions.
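The pivot arrangement can be pictured with a small data model. The Python sketch below is purely illustrative: the class `PivotText`, its methods and the sample sentences are hypothetical and do not describe how InterCorp or KonText actually store alignments. It assumes, as a simplification, that each foreign segment is aligned one-to-one with a Czech segment, and shows how two foreign versions of the same text can be related indirectly through their shared Czech version.

```python
from dataclasses import dataclass, field


@dataclass
class PivotText:
    """Hypothetical model of one text: a Czech version (the pivot) plus
    foreign-language versions, each aligned to Czech segment by segment."""

    czech: list[str]
    # language code -> {index of Czech segment: aligned foreign segment}
    foreign: dict[str, dict[int, str]] = field(default_factory=dict)

    def align(self, lang: str, cz_index: int, segment: str) -> None:
        """Record that `segment` in `lang` is aligned with Czech segment `cz_index`."""
        if not 0 <= cz_index < len(self.czech):
            raise IndexError("no such Czech segment")
        self.foreign.setdefault(lang, {})[cz_index] = segment

    def pair(self, lang_a: str, lang_b: str) -> list[tuple[str, str]]:
        """Derive lang_a/lang_b segment pairs via the Czech segments both share."""
        a = self.foreign.get(lang_a, {})
        b = self.foreign.get(lang_b, {})
        shared = sorted(a.keys() & b.keys())
        return [(a[i], b[i]) for i in shared]


if __name__ == "__main__":
    text = PivotText(czech=["Miluji tě.", "Dobrou noc."])
    text.align("en", 0, "I love you.")
    text.align("de", 0, "Ich liebe dich.")
    text.align("en", 1, "Good night.")  # no German segment for this one
    print(text.pair("en", "de"))
```

Only segments aligned to the same Czech segment in both languages end up paired, so the English "Good night." has no German partner in this example.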

InterCorp can be accessed with a standard web browser through KonText, the integrated search interface of the Czech National Corpus (previously also through NoSketch Engine and Park). A tutorial on KonText is available in Czech.

[Figure: Specifying a parallel query]

[Figure: Result of a query for the substrings lieb and lov]

Project coordination, technical support and web page administration

Discussion group

intercorp(at mark)ff.cuni.cz: the group address; please use it when appropriate

Participants

Coordinators for specific languages

Arabic: doc. PhDr. Petr Zemánek, CSc. (Institute of Comparative Linguistics); PhDr. Jiří Milička, Ph.D. (Institute of the Czech National Corpus)
Belarusian: PhDr. Veranika Bialkovich
Bulgarian: Prof. PhDr. Hana Gladkova, CSc. (Department of South Slavonic and Balkan Studies); Mgr. Natalie Kalajdžievová, Ph.D. (Department of South Slavonic and Balkan Studies)
Catalan: Mgr. Andreu Bauçà i Sastre, Ph.D. (Centre Carlemany de Llengua Catalana, Department of Romance Studies, Ensenyament Superior, Recerca i Ajuts a l’Estudi, Govern d'Andorra)
Chinese: Mgr. Vlastimil Dobečka (Department of Asian Studies, Faculty of Arts, Palacký University, Olomouc)
Croatian: Mgr. Karel Jirásek, Ph.D. (Department of South Slavonic and Balkan Studies)
Danish: Mgr. Jana Pavlisová; Mgr. Kateřina Haušildová (Department of Germanic Studies)
Dutch: Mgr. Eliška Boková; PhDr. Zdenka Hrnčířová (Department of Germanic Studies)
English: Mgr. Denisa Šebestová (Department of English Language and ELT Methodology); doc. PhDr. Markéta Malá, Ph.D. (Department of Linguistics); Mgr. Michal Kubánek (Department of English and American Studies, Faculty of Arts, Palacký University Olomouc)
Finnish: Mgr. Lenka Fárová, Ph.D. (Department of Germanic Studies)
French: PhDr. Olga Nádvorníková, Ph.D. (Department of Romance Studies)
German: Mgr. Štěpán Zbytovský, Ph.D. (Department of Germanic Studies); Mgr. Tomáš Káňa, Ph.D. (Department of German Language and Literature, Faculty of Education, Masaryk University, Brno); PhDr. Hana Peloušková, Ph.D. (Department of German Language and Literature, Faculty of Education, Masaryk University, Brno); PhDr. Vít Dovalil, Ph.D. (Department of Germanic Studies)
Hindi: Mgr. Nora Melnikova, Ph.D. (Institute of South and Central Asia); Bc. Vojtěch Diatka (Department of Linguistics)
Hungarian: Mgr. Simona Kolmanová, Ph.D. (Department of Central European Studies)
Italian: doc. Pavel Štichauer, Ph.D. (Department of Romance Studies)
Japanese: Mgr. Petra Kanasugi, Ph.D. (Institute of East Asian Studies)
Latvian: Mgr. Michal Škrabal, Ph.D. (Institute of the Czech National Corpus); Mgr. Marija Lazar
Lithuanian: Mgr. Věra Kociánová; RNDr. Hana Skoumalová, Ph.D.
Macedonian: PhDr. Michala Adamová (Institute of the Czech National Corpus); Mgr. Vojkan Milenković
Norwegian: Mgr. Pavel Vondřička, Ph.D. (Institute of the Czech National Corpus)
Polish: Mgr. Łucja Bańczyk; Dr. Renata Dybalska (Department of Central European Studies)
Portuguese: PhDr. Jaroslava Jindrová, Ph.D. (Department of Romance Studies)
Romani: Ruben Pellar, Master of Arts, Ph.D.
Romanian: Ing. Alexandr Krestovský (Charles University in Prague, CERGE)
Russian: PhDr. Natálie Rajnochová, Ph.D. (Department of East European Studies); Mgr. Naděžda Runštuková
Serbian: PhDr. Ana Adamovičová (Institute of Czech Studies)
Slovak: doc. PhDr. Mira Nábělková, CSc. (Department of East European Studies)
Slovenian: Mgr. Leoš Soustružník; Mgr. David Blažek, Ph.D. (Institute of Slavonic Studies, Czech Academy of Sciences)
Spanish: doc. PhDr. Petr Čermák, Ph.D. (Department of Romance Studies)
Swedish: Lenka John (Embassy of Sweden); Mgr. Silvie Cinková, Ph.D.
Ukrainian: Dr. Natalia Kotsyba

Citing InterCorp

Specific language combination: Author 1, Author 2 & Author 3 (2022): InterCorp – English, German, Release 15 of 11 November 2022. Institute of the Czech National Corpus, Charles University, Prague. Available from: http://www.korpus.cz

Whole corpus: Rosen, A., Vavřín, M. & Zasina, A. J. (2022): InterCorp, Release 15 of 11 November 2022. Institute of the Czech National Corpus, Charles University. Available from: http://www.korpus.cz

Čermák, F. & Rosen, A. (2012): The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3), 411–427. Electronic version available at IngentaConnect; a preprint version is also available.

See also