Sara Castagnoli | University of Macerata, Italy
Papers by Sara Castagnoli
Lingue e Linguaggi, 2019
Corporate Social Responsibility (CSR) reports constitute a relatively new form of corporate disclosure used by companies to present their values and philosophy with respect to socially relevant themes on which they may have an impact, mainly the environment, the community and employees. Companies thus publish CSR reports to communicate with a variety of stakeholders and provide information about their sustainability initiatives, with the ultimate aim of building, reinforcing, and promoting their corporate image. Personalisation plays an important role in the discursive construction of identity and in the definition of relationships between social actors. The personification of the company, obtained through 1st person plural deixis within corporate reports, is a very powerful rhetorical tool to convey a collective subject which takes responsibility for the actions and results it is giving account of, indicating and enacting a specific relationship with the reader. As a sociopragmatic item, however, it is largely language/culture-dependent, and thus represents an interesting locus to observe the impact of translation strategies on the meaning conveyed to the target audience. This paper sets out to analyse how CSR reports translated into English from Italian compare, as regards personalisation, with reports originally produced in English, in order to detect differences in the way corporate identity is construed and conveyed. The study is based on a bilingual corpus which includes translated English reports and their Italian source texts, as well as comparable originals in English and Italian. Corroborating previous research conducted on similar genres, the study shows that (im)personalisation patterns are considerably different in original and translated English CSR reports, largely due to a tendency for the latter to reproduce Italian conventions in this form of specialised discourse.
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
This paper reports on work carried out in the framework of an ongoing project aimed at building an online, corpus-based lexicographic resource for Italian Word Combinations. Our aim is to compare two of the most commonly used methods for the automatic extraction of word combinations from corpora, with a view to evaluating their performance – and ultimately their efficacy – with respect to the task of acquiring word combinations for inclusion in the lexicographic combinatory resource.
This article aims to investigate trainee translators’ contrastive pragmalinguistic competence, starting from the assumption that – although more elusively than knowledge about culture-specific references – it represents an important subcomponent of intercultural competence, which can determine the adequacy of translated texts. The study focuses in particular on the translation of interclausal linkage, as it is a form of cohesion which displays different preferences across languages. A multi-parallel corpus of English-to-Italian learner translations of the same source text is analysed to detect regularities and variation in the language behaviour of trainee translators. The frequency of connectives in target texts is compared to both the respective source texts and comparable non-translated Italian texts, in order to determine whether translations are mainly shaped by interference or normalisation. The results of the quantitative analysis confirm previous findings that interference is predominant, with students closely reproducing source text conjunctive patterns at the risk of making translations sound unnatural; more refined qualitative observations, however, reveal that there are also attempts at normalisation. A discussion of the relevance to translator training of the insights obtained is provided, together with suggestions for inclusion in training programmes.
This work introduces SYMPAThy, a data representation model in which the combinatorial properties of a lexical item are described by merging surface and deeper linguistic information. The proposed approach is then evaluated by comparing, for a sample list of verbal idioms, a set of SYMPAThy-based fixedness indexes against the relevant speaker-elicited indexes available in the descriptive norms collected by Tabossi et al. (2011).
We report on three experiments aimed at comparing two popular methods for the automatic extraction of Word Combinations from corpora, with a view to evaluating: i) their efficacy in acquiring data to be included in a combinatory resource for Italian; ii) the impact of different types of benchmarks on the evaluation itself.
The legal knowledge base resulting from the LOIS (Lexical Ontologies for legal Information Sharing) project consists of legal WordNets in six languages (Italian, Dutch, Portuguese, German, Czech, English). Its architecture is based on the EuroWordNet (EWN) framework (Vossen et al., 1997). Using the EWN framework assures compatibility of the LOIS WordNets with EWN, allowing them to function as an extension of EWN for the legal domain. For each legal system, the document-derived legal concepts are integrated into a taxonomy, which links into existing formal ontologies. These give the legal wordnets a first formal backbone, which can, in future, be further extended. The database consists of 33,000 synsets and is intended for use in information retrieval, where it provides mono- and multilingual access to European legal databases for legal experts as well as for laymen. The LOIS knowledge base also provides a flexible, modular architecture that allows integration of multiple classificat...
Daniel Gile, Gyde Hansen and Nike K. Pokorn (eds.) Why Translation Studies Matter. Amsterdam: John Benjamins Publishing, 2010
In recent years, corpora have gained considerable importance in Translation Studies, and a number of studies have also appeared which show their value for translator training (e.g. Zanettin et al. 2003). However, results from a recent survey reveal that current practising and trainee translators still have insufficient awareness of corpora and expertise in using them to help in their translation workflow. In addition, while corpus linguistics courses are offered at some universities, no materials for self-learning are available to our knowledge: such materials might not only complement traditional courses, but would also be of special interest for professional translators, who are often under serious time constraints. This paper presents a free eLearning course on “Corpora for Translators” which has been developed by the EU-funded MeLLANGE project in an attempt to fill this gap. It deals with the use of corpora for different translation-related activities (e.g. source text analy...
“Designing a Learner Translator Corpus for Training Purposes”. In N. Kübler (ed.) (2011) Corpora, Language, Teaching, and Resources: From Theory to Practice. Bern: Peter Lang. 221-248, 2011
This contribution sets out to share the aims, development methodology and results of the first studies conducted on the PAISÀ corpus, a corpus of contemporary Italian texts downloaded from the web, designed for language-teaching and research purposes within the project of the same name (§1.1). We show how the project fits into the ever-growing landscape of web-derived corpora, the measures that proved necessary during its creation to avoid the thorny issue of copyright (§1.2), and the repercussions this had on its contents (§1.3). We then focus on the different levels of annotation that enrich the PAISÀ corpus, dwelling in particular on the effort to classify texts by topic, communicative intent and text genre (§2 and §3), three parameters which, once turned into criteria for searching and exploring the corpus, will allow users, language teachers first and foremost, to consult the texts in an extremely targeted and refined way.
Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014, 2014
The paper presents SYMPAThy, a new approach to the extraction of Word Combinations. The approach is new in that it combines pattern-based (P-based) and syntax-based (S-based) methods in order to obtain an integrated and unified view of a lexeme’s combinatory potential.
In this essay, we present data collected as part of a pilot project on telephone interpreting at a local health and social care unit in the Emilia-Romagna region. Drawing on a conversation-analytic study of a few authentic telephone calls, we first show what makes remote interpreting complex compared with other forms of oral translation, focusing on some aspects that appear to have a particular impact on the effectiveness of interpreter-mediated telephone interactions. We then reflect on the implications of this new technology for both healthcare staff and interpreters. We conclude by stressing the importance of training both groups in the proper use of the system, so as not to lower the quality of the service.
An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POS-patterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources compare with POS sequences exhibited by statistically significant n-grams including an adjective position extracted from a large corpus of Italian. All literature-derived patterns are found, and new meaningful candidate patterns emerge, among the top-ranking trigrams for three association measures. We conclude that a final solid set to be used for MWE extraction will have to be further refined through a combination of association measures as well as manual inspection.
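The general idea of pairing POS-patterns with an association measure can be sketched as follows. This is only an illustrative toy, not the procedure used in the paper: the tagged tokens, the two patterns, the function names `pmi_candidates` and `rank`, and the choice of bigrams scored by pointwise mutual information (PMI) are all assumptions made for the example, whereas the abstract refers to trigrams and three unnamed association measures.

```python
import math
from collections import Counter

# Toy POS-tagged corpus (hypothetical data, not from the paper).
TAGGED = [
    ("carta", "NOUN"), ("bianca", "ADJ"), ("e", "CCONJ"),
    ("carta", "NOUN"), ("bianca", "ADJ"), ("su", "ADP"),
    ("casa", "NOUN"), ("grande", "ADJ"), ("e", "CCONJ"),
    ("carta", "NOUN"), ("grande", "ADJ"),
]

# Illustrative POS-patterns containing an adjective (assumed, not the paper's list).
PATTERNS = {("NOUN", "ADJ"), ("ADJ", "NOUN")}


def pmi_candidates(tagged, patterns):
    """Return {bigram: PMI} for adjacent pairs whose POS sequence is in patterns."""
    tokens = [tok for tok, _ in tagged]
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)       # number of unigram positions
    n_bi = len(tokens) - 1    # number of bigram positions
    scores = {}
    for (w1, p1), (w2, p2) in zip(tagged, tagged[1:]):
        if (p1, p2) not in patterns:
            continue  # POS filter: keep only pattern-matching candidates
        p_xy = bigrams[(w1, w2)] / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def rank(scores):
    """Sort candidates from highest to lowest association score."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for bigram, score in rank(pmi_candidates(TAGGED, PATTERNS)):
        print(" ".join(bigram), round(score, 3))
```

On this toy data the pair seen only once ("casa grande") outranks the pair seen twice ("carta bianca"), a familiar tendency of PMI to favour rare pairs. That is one reason a single measure is usually not enough, in line with the abstract's conclusion that candidate sets need several measures plus manual inspection.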
Proceedings of the 9th Web as Corpus Workshop (WaC-9), 2014
PAISÀ is a Creative Commons licensed, large web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.
Rassegna Italiana di Linguistica Applicata 1-2/2011, 2011
This paper reports on a study of the use of connectives in a multiple translation corpus (i.e. a parallel corpus in which several translations are available for each source text), aimed at investigating whether the analysis of regularities and variations in the way different translators cope with the same source text can provide new insights as regards the presence and nature of explicitation and interference, two alleged Translation Universals, in translated texts. In particular, the study set out to explore whether identified regularities can be indicative of translators' attempts to conform to target language (TL) standards or, vice versa, whether and to what extent they represent a breach of TL norms, possibly caused by source language interference.