Sonia Cenceschi | Scuola universitaria professionale della Svizzera italiana
Papers by Sonia Cenceschi
The paper presents the preliminary protocols and sessions designed and carried out to evaluate the initial communicative behaviors and the willingness to engage of children and young people involved in the technology-mediated prosodic storytelling sessions of the LYV Project. The LYV project focuses on stimulating and improving the prosodic skills of young Italian speakers with autism and intellectual or linguistic disabilities through the use of vocal interactive stories.
This work investigates crises in examinations held in Italian courts of law. Our hypothesis is that crises are correlated with a set of acoustic and psychological features that could therefore be leveraged to detect them. The analysis is based on the DIKE project (Sbattella, Tedesco, Trivilini, 2015). DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an annotated audio/textual corpus (“crisis” being one such annotation).
Proceedings of 11th International Conference of Experimental Linguistics, 2020
This research offers a preliminary survey of vowel and diphthong variation between two Irish English varieties: Galway (GW) and Letterkenny (LK). The results showed only a small difference between GW and LK with respect to the monophthongs, whereas a larger difference was found for the MOUTH diphthong. Despite the large body of literature on English dialects, a phonetic investigation of these specific varieties is still lacking. This study may open the path to further investigations of sociophonetic values and the stereotypes associated with different varieties, in particular those of the northern regions.
Indagatio Didactica, 2021
In forensic phonetics, speaker recognition is considered a routine task. The purpose of this work is to analyse whether and to what extent (1) the expertise of the evaluators and (2) read versus spontaneous speaking styles influence speaker identification. Our analysis is founded on two different perception experiments. The first is a real case we worked on, in which we challenged both speakers with experience in the audio field and lay speakers to compare two voices in short, low-quality audio files, obtaining very weak results. Building on these findings, we set up a second 'laboratory' experiment with purpose-made files recorded by one Italian female speaker in three settings, for both read and spontaneous speech: high-quality recordings, WhatsApp audio, and phone call recordings. These data were used in a perceptual test in which respondents were asked to indicate the recording modality of each sample and specify whether the speaker was the same. The results of this second test show that self-declared experts in audio analysis or transcription behave similarly to lay speakers, and that the comparison is more reliable for spontaneous speech when the audio quality is not the same. This result confirms the need to train professionals adequately, combining subjective listening with in-depth acoustic and linguistic analyses, and to take speech style into account when recording speech samples for comparative analyses.
Estudios de Fonética Experimental Journal of Experimental Phonetics, 2021
CALLIOPE is a conceptual multi-dimensional model that aims at approximating and categorizing prosodic phenomena, taking into account all possible independent factors affecting the sound of so-called Information Units (IUs). In CALLIOPE, each IU is associated with a tuple of 12 labels, each belonging to a different dimension that represents a characteristic influencing prosodic behaviour. Its ultimate aim is to create well-defined corpora suitable for linguistic and engineering research.
Speech Prosody 2018
The term prosody defines the group of paralinguistic and suprasegmental audio cues involved in the communication and understanding of human speech. This paper presents our approach to the automatic recognition of prosodic forms. In particular, we present: CALLIOPE, a multi-dimensional and abstract model that aims at categorising all prosodic forms; SI-CALLIOPE, a sub-space for which we defined a corpus of recorded prosodic forms; and the psychoacoustic experiment we are currently carrying out to investigate the main acoustic behaviours and features involved in the discrimination of prosodic forms. The experiment results will be useful for defining the feature set to rely on for the automatic recognition of prosodies. For that reason, we are also defining a classifier based on neural networks. This study is part of the LYV project, which focuses on improving the prosodic expressiveness skills of Italian speakers with autism and other cognitive disabilities.
TEANGA, the Journal of the Irish Association for Applied Linguistics
What are speech data? The question is not as trivial as it may seem: every day both theoretical and applied linguistic research come up against problems deriving from bad data management. This topic is particularly thorny in interdisciplinary approaches such as forensic speech analysis, where recorded speech can be exploited as legal evidence, with important repercussions on public security and citizens’ rights. The datum does not exist in nature, being a consequence of the human analysis of a given phenomenon. In fact, data extraction is based on explicit and implicit theories implemented by the researcher within the application of specific frameworks. Researchers and professionals working on empirical data should be more aware of these underlying processes in order to avoid data misuse and maximize results. In this paper, we will briefly address the issue of speech data epistemology, with a particular focus on the interdisciplinarity required in forensics among ...
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application addressing the problem of learning English as a foreign language for native speakers of Italian (or other transparent Romance languages) with dyslexia. We aim at enriching traditional learning-based methods by leveraging a multi-sensory and emotional approach. The basic idea is to invite subjects to imitate the pronunciation and prosody of a native English speaker with real-time visual and auditory feedback, to stimulate the modulation and control of their oral linguistic production. The project uses knowledge from different fields of study, first of all by creating a link between the prosodic problems dyslexic learners face in English and the extraction of acoustic features already known in the Music Information Retrieval field and in MPEG-7 encoding. Analysis protocols, based on multidimensional analysis techniques of data, collected from ...
This work investigates babblings during crises in Italian persons undergoing court-of-law examinations. The analysis was conducted on an audio/textual corpus that extends the one provided by the DIKE project. We found that most crises (more than 80%) were characterized by babblings. We therefore tried to characterize babblings by looking at low-level acoustic features (such as speech pauses, word durations, intensity/pitch contours, and intensity/pitch mean values) and found interesting results. We then analyzed the words in babblings, highlighting semantic roles and grammatical typologies, and again found interesting clues. We conclude that in stressful settings such as court-of-law examinations, babblings during crises exhibit a precise behavior in terms of both acoustic and grammatical features.
Appears in: ICERI2014 Proceedings Pages: 1-8 Publication year: 2014 ISBN: 978-84-617-2484-0 ISSN: 2340-1095
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application addressing the problem of learning English as a foreign language for native speakers of Italian (or other transparent Romance languages) with dyslexia. We aim at enriching traditional learning-based methods by leveraging a multi-sensory and emotional approach.
The basic idea is to invite subjects to imitate the pronunciation and prosody of a native English speaker, giving the student real-time visual and auditory feedback and a global evaluation.
Of course, it is impossible to obtain a perfect imitation of the native speaker but, as happens during a musical performance, imitation is a natural and efficient way to stimulate the subject’s abilities (in our case, modulating and controlling her/his oral linguistic production).
Two types of activities are defined: a Prosodic Session and a Pronunciation Session. Both sessions are conducted in a room equipped with a sound system and a smart board.
The Prosodic Session will consist of a simple, real-time visualization (similar to "karaoke") of the most important vocal parameters: pitch, amplitude, silences, timbre evaluation, and harmonicity. The subject will imitate the speaker’s voice, guided by the vocal parameter graphs generated by the system.
The Pronunciation Session is dedicated to the pronunciation of vowels, consonant groups, and syllabic groups belonging to specific words (a typical problem of people with dyslexia). Subjects pronounce specific words containing sensitive linguistic patterns (like “dad” / “did”), and the system generates a graph showing the position of the phoneme of interest within a reference schema (for example, the vowel triangle). The smart board will give the student the opportunity to interact with the graph and listen to the correct pronunciations of phonemes.
Analysis protocols, based on data collected from the sessions, will assess improvements in the subject’s speech abilities and the impact on her/his specific issues.
The project is based on the extraction of acoustic features already known in the Music Information Retrieval field and in MPEG-7 encoding. For the prosodic session, both on-line and off-line processes will be carried out, with specific attention to making the final parameters independent of equipment and environmental conditions. For the pronunciation session, individual phonemes will be recognized by means of Gaussian Markov Models; multi-dimensional feature vectors, collected from native English speakers, will be used to train these models. In order to decrease the computational complexity, the initial list of features will be reduced by means of the PCA and SVD techniques.
The design of the mathematical models and software functionalities has been completed; implementation is in progress. Testing and validation will take place at La Musa, a new joint applied research laboratory of Politecnico di Milano and Fondazione Sequeri Esagramma, a non-profit organization providing support and rehabilitation programs to children and adults with cognitive or mental problems.
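The feature-reduction step mentioned above (PCA and SVD) can be illustrated with a minimal sketch. It is not the KaSPAR implementation: the function name `pca_reduce`, the random stand-in data, and the choice of 12 input features and 3 retained components are all assumptions made for illustration. The sketch computes PCA through a singular value decomposition of the mean-centred feature matrix using NumPy.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components via SVD.

    features: array of shape (n_samples, n_features); hypothetical acoustic features.
    Returns (projected data, principal axes, explained-variance ratios).
    """
    X = np.asarray(features, dtype=float)
    # PCA requires mean-centred data
    Xc = X - X.mean(axis=0)
    # Thin SVD: rows of Vt are principal axes, ordered by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance carried by each axis
    explained = (S ** 2) / np.sum(S ** 2)
    return Xc @ components.T, components, explained[:n_components]

# Stand-in data (assumption): 200 frames described by 12 features
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 12))
reduced, components, ratio = pca_reduce(demo, 3)
```

Here `reduced` holds the 3-dimensional vectors that a downstream phoneme recognizer could be trained on in place of the original 12-dimensional ones, and `ratio` indicates how much variance each retained axis preserves.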
@ ICMC-SMC 2014, Athens
This research integrates sensory and scientific instruments to analyze the relationship between subjective evaluations of digitally restored audio and its computer-extracted perceptual descriptors. Statistical methods have been used to compare the displacement of three types of remediated content in subspaces obtained from data expressed both by individuals and by feature extraction algorithms.
Qualitative demands in audio restoration are tightly connected to the information embedded in remediated content: it is crucial to be aware that every choice rebalances that information and affects its reception. Listeners, in turn, do not perform an acousmatic reduction of auditory information but recode it, interleaving contextual and aesthetic approaches according to their sensitivity and under the influence of their cultural background.
By analyzing the displacement in the subspaces related to the descriptive characteristics with the greatest variability, we interpreted the semantic divergence resulting from sound-quality enhancement operations and hypothesized a predictive model aimed at optimizing them.
Conference Presentations by Sonia Cenceschi
La prosodia e il turn-taking nell’era di WhatsApp (Prosody and turn-taking in the WhatsApp era) - SSSL 2019, Trento, 2019
This talk aims to stimulate discussion about a new type of spontaneous speech: that of instant messaging applications. WhatsApp, Messenger, and Telegram are just some of the new tools that allow audio messages to be sent in near real time (connection permitting). To what extent, and in what way, can this speech be considered spontaneous? How far can the prosody that users employ be compared with that of speech produced when speaker and interlocutor are physically co-present? How does the interaction between successive contributions change and evolve? Many of us know well what it means to be the helpless listener of unsettling soliloquies sent by a friend in existential crisis, or perhaps we ourselves are the authors of some never-ending message. What changes in expectations, and what changes with respect to text messaging? The talk offers a moment of reflection to introduce what, according to the author, is a new category of speech, with new rules, rhythms, and musicality, and one that offers interesting possibilities, particularly for the creation of corpora and their use in research and investigation in the forensic sector. The considerations will be linked to the activities and needs of the Servizio di Informatica Forense SUPSI in Italian-speaking Switzerland and placed in the context of forensic audio in preliminary investigations. Cenceschi S., Trivilini A., Sbattella L., Tedesco R. (2019) Collecting Italian spontaneous social media speech: the WAsp2 project. XV convegno AISV,
The paper presents the preliminary protocols and sessions designed and carried out to evaluate the initial communicative behaviours and the willingness to engage of 'extra-ordinary' children and young people during the Land Your Voice (LYV) Project. At Politecnico di Milano and at Fondazione Sequeri Esagramma, the LYV Project focuses on stimulating and improving the prosodic skills of young Italian speakers with autism and intellectual or linguistic disabilities through the use of vocal, technology-mediated prosodic storytelling sessions.
IAFL conference - Porto, 2017
The purpose of this paper is to carry out a comparative survey of the similarities and differences in the application of phonetic evidence in Brazilian and Italian courtrooms. The Brazilian legal system is historically influenced by Roman law, whose principles are still applied in contemporary European democracies. Nevertheless, regarding the use of linguistic evidence, mainly from phonetic science, the main influences in both countries, academically and in casework, are inspired by, or similar to, those of the United Kingdom. The following main questions guide this investigation: (1) Is there any legal requirement to use phonetic knowledge as evidence in legal proceedings? (2) What conditions must be met to become an expert in forensic phonetics and to work on real cases? (3) Once phonetic evidence exists, are judges obliged to use it as an argument in forming their conviction about the case? (4) Is there any difference in the payment of experts working for the public sector (courts, law enforcement, etc.) and for private parties (lawyers, private citizens, etc.)? Does this possible phenomenon have repercussions on the legal system? The incorrect use of recordings is an obstacle for the entire legal process: it has an important impact on justice, the perception of safety, and social costs. In this investigation we will also try to summarize the problems in both countries and lay the ground for a future study on their possible solutions.
This work investigates crises in examinations held in Italian courts of law, where by "crisis" we mean an abrupt change or hesitation in speech production. The main goal is to lay the groundwork for a possible automatic detection of crises in examinations and other stressful contexts, such as emergency phone calls. Our hypothesis is that crises are correlated with a set of acoustic, psychological, and grammatical features that could therefore be leveraged to detect them. The analysis is based on the DIKE project. DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an annotated audio/textual corpus ("crisis" being one such annotation).
Materials
The corpus we adopted consists of 44 examinations, tagged according to the DIKE (Sbattella et al., 2015) specifications. These examinations involved 38 speakers (17 examinees and 21 examiners) and contained 127 crises, 95 belonging to examinees and 32 to examiners. In order to work with a corpus as homogeneous as possible, we chose to concentrate on examinees (5 females and 12 males), as most crises belonged to them. We found that babblings were present in 78 of the examinees' crisis utterances (82%), while they were absent in non-crisis utterances. This result allows us to infer that babbling is significantly correlated with examinees' crises. By "babbling" we mean the repetition of a word two or more times. More precisely, given a sequence of $n$ instances of the word $R$, followed by a different word $W$, a babbling instance $B$ can be represented as the following sequence: $B = R_1, R_2, \dots, R_n, W_{n+1}$
Vocal features
Following Shriberg's approach (1995), we started by detecting pauses, where a pause indicates a measurable silence between two consecutive words (Zellner, 1994).
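The babbling definition above lends itself to a short illustration. The following is a minimal sketch, not the authors' tooling: it assumes the transcript is already tokenized into words, treats repetitions case-insensitively, and uses an invented Italian utterance as input. The names `Babbling` and `find_babblings` are hypothetical.

```python
from typing import List, NamedTuple

class Babbling(NamedTuple):
    start: int       # index of R_1 in the token list
    word: str        # the repeated word R (lower-cased)
    count: int       # n, the number of repetitions (n >= 2)
    next_word: str   # W_{n+1}, the different word that follows

def find_babblings(tokens: List[str]) -> List[Babbling]:
    """Find runs R_1..R_n (n >= 2) of one word followed by a different word W."""
    found = []
    i = 0
    while i < len(tokens):
        # Extend j over the run of tokens equal to tokens[i]
        j = i + 1
        while j < len(tokens) and tokens[j].lower() == tokens[i].lower():
            j += 1
        n = j - i
        # The definition requires a following word W, so a run that ends
        # the utterance is not counted here (an assumption of this sketch)
        if n >= 2 and j < len(tokens):
            found.append(Babbling(i, tokens[i].lower(), n, tokens[j]))
        i = j
    return found

# Invented example utterance (assumption, not taken from the DIKE corpus)
example = "io io io non ricordo".split()
print(find_babblings(example))
```

Each detected instance keeps the position of $R_1$, so word-level timing data could later be attached to decide whether a pause separates $R_n$ from $W_{n+1}$.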
Shriberg labels each babbling as prospective or retrospective depending on the presence of a pause after the last repetition $R_n$: in a retrospective crisis, $R_n$ and $W_{n+1}$ are not separated by a pause, while in a prospective crisis they are (highlighting a stalling moment). In our corpus, we found 8 prospective and 70 retrospective crises. Calculating the average intensity of each word, $I(R_i)$, we found that $I(R_1)$ is characterized by higher values than the following $I(R_i)$. Moreover, all $R_i$ exhibited a similar pitch contour; this means that persons do not change intonation while babbling. We classified each babbling according to the duration $D(R_i)$ of all the words involved (Plauché, 1999). Two typologies of crisis behaviour, defined by the difference in duration between the words, were present: in about 91% of babblings, each word was longer than (or equal to) the following one; in particular, for 71 samples $D(R_1) \geq D(R_2)$, while for 7 of them $D(R_1) < D(R_2)$.
Grammatical features
We searched for relevant grammatical elements as mentioned in Martin & Robustelli (2007). Notice that grammatical elements are related to semantics; in particular: nouns, adjectives, non-auxiliary verbs, and adverbs are considered words carrying semantics; the adverb of negation non
IAFPA 2016 Annual Conference
This work investigates crises in examinations held in Italian courts of law, where by “crisis” we mean an abrupt change or hesitation in speech production. The main goal is to lay the groundwork for a possible automatic detection of crises in examinations and other stressful contexts, such as emergency phone calls. Our hypothesis is that crises are correlated with a set of acoustic, psychological, and grammatical features that could therefore be leveraged to detect them. The analysis is based on the DIKE project. DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an annotated audio/textual corpus (“crisis” being one such annotation).
This research integrates sensory and scientific instruments to analyze the relationship between subjective evaluations of digitally restored audio and its computer-extracted perceptual descriptors. Statistical methods have been used to compare the displacement of three types of remediated content in subspaces obtained from data expressed both by individuals and by feature extraction algorithms.
For the forensic sector, audio analysis is a scientific field still in full expansion and extremely fragmented. Nevertheless, it holds great potential, not only with regard to the human voice, an evident carrier of clues and meaning, but also for the extraction of all those sound details useful to an investigation, such as a gunshot (and hence the type of weapon), the environmental context, the type of car, the sex of the speaker, and so on. During the meeting, the main methods for the digital analysis of recordings will be presented through examples and real cases on topics such as speaker recognition, verification of the integrity of audio material, and the problem of the subjective interpretation of vocal content in the presence of strong environmental noise (such as radio, road traffic, rain, and so on). Interception, after all, whether telephone-based or of another kind, is a fundamental tool for the preliminary investigation and for the prosecution and defense cases, but at the same time it is delicate material that requires correct analysis methodologies. Given a question, science in this field does not yet provide evidentiary certainties, but rather clues for or against. Unfortunately, it is much easier to cause irreparable harm to society when potential evidence has been analyzed incorrectly. This concept will be made explicit through the example of a real case (the review of a conviction), to provide a further means of defense against analyses based on scientifically incorrect methods.
inproceedings by Sonia Cenceschi
XV AISV conference, 2019
This work presents the WAsp2 project (WhatsApp SPontaneous SPeech), focused on collecting private social media speech through the WhatsApp application. The main purpose is to collect a wide corpus of spontaneous vocal messages useful in forensic investigations but also in other research areas such as experimental linguistics, speech therapy, artificial intelligence, and ASR systems.
Abstracts of the 25th Annual Conference of the International Association for Forensic Phonetics and Acoustics
This work investigates crises in examinations held in Italian courts of law. Our hypothesis is that crises are correlated with a set of acoustic and psychological features that could therefore be leveraged to detect them. The analysis is based on the DIKE project (Sbattella, Tedesco, Trivilini, 2015). DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an annotated audio/textual corpus (“crisis” being one such annotation).
The paper presents the preliminary protocols and sessions designed and realized to evaluate the i... more The paper presents the preliminary protocols and sessions designed and realized to evaluate the initial communicative behaviors and the willingness to be engaged of kids and young persons involved into the LYV Project for technological mediated prosodic storytelling sessions. The LYV project focuses on stimulating and improving prosodic skills of Italian young speakers with autism, intellectual and linguistic disabilities through the use of vocal interactive stories
This work investigates crises in Italian Courts of Law examinations. Our hypothesis is that crise... more This work investigates crises in Italian Courts of Law examinations. Our hypothesis is that crises are correlated to a set of acoustic and psychological features that, therefore, could be leveraged to detect them. The analysis is based on the DIKE project (Sbattella Tedesco Trivilini, 2015). DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic the- ories, in an audio/textual annotated corpus (“crisis” being one of such annotations)
Proceedings of 11th International Conference of Experimental Linguistics, 2020
This research offers a preliminary survey on vowels and diphthong variation between two Irish Eng... more This research offers a preliminary survey on vowels and diphthong variation between two Irish English varieties: Galway (GW) and Letterkenny (LK). The results showed only a smaller difference between GW and LK with respect to the monophthongs, whereas a larger difference was found for the MOUTH diphthong. Despite the great amount of literature on English dialects, a phonetic investigation of these specific varieties is still lacking. This study may open the path to further investigations of sociophonetic values and the stereotypes associated with different varieties, in particular those of the northern regions.
Indagatio Didactica, 2021
In forensic phonetics, speaker's recognition is considered as a conventional chore. The purpose o... more In forensic phonetics, speaker's recognition is considered as a conventional chore. The purpose of this work is to analyse whether and to what extent (1) the expertise of the evaluators and (2) reading and spontaneous speaking styles influence the speakers' identification. Our analysis is founded on two different perception experiments. The first one is a real case we worked on in which we challenged both speaker with experience in the audio field and lay speakers to compare two voices in short and low-quality audio files, obtaining very weak result. From these findings, we settled a second 'laboratory' experiment with made-up files recorded by 1 Italian female speaker in 3 settings both for reading and spontaneous speech: high quality recordings, WhatsApp audio and phone call recordings. These data were used in a perceptive test where respondents were asked to point out the recording modality of each sample and specify whether the speaker was the same. Results of this second test show that self-declared experts in audio analysis or transcriptions behave similarly to lay speakers, and that the comparison is more reliable for spontaneous speech when the audio quality is not the same. This result confirms the need to adequately train professionals combining subjective listening with in-depth acoustic and linguistic analyses, and take speech style into account when recording speech samples for comparative analyses.
Estudios de Fonética Experimental Journal of Experimental Phonetics, 2021
CALLIOPE is a conceptual multi-dimensional model that aims at approximating and categorizing the ... more CALLIOPE is a conceptual multi-dimensional model that aims at approximating and categorizing the prosodic phenomena taking into account of all possible independent factors affecting the sound of so-called Information Units (IUs). In CALLIOPE, each IU is associated with a tuple composed of 12 labels, each belonging to a different dimension representing a characteristic influencing the prosodic behaviour. Its ultimate aim is creating well-defined corpora suitable for linguistic and engineering research.
Speech Prosody 2018
The term prosody defines the group of audio paralinguistic and suprasegmental cues involved in th... more The term prosody defines the group of audio paralinguistic and suprasegmental cues involved in the communicative and understanding process of human speech. This paper presents our approach to automatic recognition of prosodic forms. In particular , we present: CALLIOPE, a multi-dimensional and abstract model, aiming at categorising all prosodic forms; SI-CALLIOPE, a sub-space for which we defined a corpus of recorder prosodic forms; and the psychoacoustic experiment we are currently carrying on for investigating main acoustic behaviours and features involved into the discrimination of prosodic forms. The experiment results will be useful for defining the feature set to rely on for automatic recognition of prosodies. For that reason, we are also defining a classifier, based on Neural Nets. This study is part of the LYV project, which focuses on improving prosodic expressiveness skills of Italian speakers with autism and other cognitive disabilities.
TEANGA, the Journal of the Irish Association for Applied Linguistics
What are speech data? The question is not as trivial as it may seem: every day both theoretical a... more What are speech data? The question is not as trivial as it may seem: every day both theoretical and applied linguistic research come up against problems deriving from bad data management. This topic is particularly thorny in interdisciplinary approaches such as the speech forensics analysis, whereby the recorded speech can be exploited as legal clues, with important repercussions on public security and citizens’ rights. The datum does not exist in nature, being it a consequence of the human analysis of a given phenomenon. In fact, data extraction is based on explicit and implicit theories implemented by the researcher within the application of specific frameworks. Researchers and professionals working on empiric data should be more aware of these underlined processes in order to avoid data misuse and, indeed, maximize results. In this paper, we will briefly address the issue of the speech data epistemology with a particular focus on the interdisciplinary required in forensics among ...
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), ... more In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application dealing with the problematic of learning English as a foreign language, for Italian (or other transparent, romance languages) mother-tongue subjects with dyslexia. We aim at enriching the traditional learning-based methods, and leveraging a multi-sensorial and emotional approach. The basic idea is to invite subjects to imitate pronunciation and prosody of an English mother-tongue speaker with visual-auditory and real-time feedback, to stimulate the modulation and the control of their oral linguistic productions. The project uses knowledge coming from different fields of study, first of all, creating a link between learning English prosodic problems in dyslexics and the extraction of acoustic features, already known in the Music Information Retrieval field and MPEG-7 encoding. Analysis protocols, based on multidimensional analysis techniques of data, collected from ...
This work investigates babblings during crises in Italian persons undergoing Court-of-Law examinations. The analysis was conducted on an audio/textual corpus that extends the one provided by the DIKE project. We found that most crises (more than 80%) were characterized by babblings. We therefore tried to characterize babblings by looking at low-level acoustic features (such as speech pauses, word durations, intensity/pitch contours, and intensity/pitch mean values) and found interesting results. We then analyzed the words in babblings, highlighting semantic roles and grammatical typologies, and again found interesting clues. We conclude that in stressful settings such as Court-of-Law examinations, babblings during crises exhibit a precise behavior in terms of both acoustic and grammatical features.
Appears in: ICERI2014 Proceedings Pages: 1-8 Publication year: 2014 ISBN: 978-84-617-2484-0 ISSN: 2340-1095
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application that addresses the problem of learning English as a foreign language for native speakers of Italian (or of other transparent Romance languages) with dyslexia. We aim to enrich traditional learning-based methods by leveraging a multi-sensory and emotional approach.
The basic idea is to invite the student to imitate the pronunciation and prosody of a native English speaker, while giving the student real-time visual and auditory feedback and a global evaluation.
Of course, a perfect imitation of the native speaker is impossible but, as in a musical performance, imitation is a natural and efficient way to stimulate the subject’s abilities (in our case, modulating and controlling her/his oral linguistic production).
Two types of activities are defined: a Prosodic Session and a Pronunciation Session. Both sessions are conducted in a room equipped with a sound system and a smart board.
The Prosodic Session will consist of a simple, real-time visualization (similar to "karaoke") of the most important vocal parameters: pitch, amplitude, silences, timbre evaluation, and harmonicity. The subject will imitate the speaker’s voice, leveraging the vocal parameter graphs generated by the system.
The Pronunciation Session is dedicated to the pronunciation of vowels, consonant groups, and syllabic groups belonging to specific words (a typical difficulty for people with dyslexia). Subjects pronounce specific words containing sensitive linguistic patterns (like “dad” / “did”), and the system generates a graph showing the position of the phoneme of interest within a reference schema (for example, the vowel triangle). The smart board will let the student interact with the graph and listen to the correct pronunciations of phonemes.
Analysis protocols, based on data collected from the sessions, will assess improvements of the subject’s speech abilities, and the impact on her/his specific issues.
The project is based on the extraction of acoustic features already known in the Music Information Retrieval field and in MPEG-7 encoding. For the prosodic session, both on-line and off-line processes will be carried out, with specific attention to making the final parameters independent of equipment and environmental conditions. For the pronunciation session, individual phonemes will be recognized by means of Gaussian Markov Models; multi-dimensional feature vectors, collected from native English speakers, will be used to train these models. To reduce computational complexity, the initial list of features will be reduced by means of the PCA and SVD techniques.
The design of the mathematical models and software functionalities has been completed; implementation is in progress. Testing and validation will take place at La Musa, a new joint applied research laboratory of Politecnico di Milano and Fondazione Sequeri Esagramma, a non-profit organization providing support and rehabilitation programs to children and adults with cognitive or mental problems.
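The abstract names PCA and SVD for feature reduction but gives no implementation details. As a minimal sketch of the general technique only (not the KaSPAR implementation), the following Python code computes PCA through the SVD of the mean-centred feature matrix. The function name `pca_reduce`, the feature columns, and all numeric values are hypothetical and chosen purely for illustration.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components.

    PCA is computed via the SVD of the mean-centred data matrix.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # Thin SVD: the rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Fraction of the total variance carried by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ components.T, components, explained[:n_components]


# Hypothetical data: 6 frames x 3 acoustic features
# (pitch in Hz, amplitude, harmonicity); the values are made up.
frames = np.array([
    [120.0, 0.30, 0.80],
    [125.0, 0.35, 0.78],
    [180.0, 0.60, 0.50],
    [175.0, 0.55, 0.52],
    [140.0, 0.40, 0.70],
    [160.0, 0.50, 0.60],
])
reduced, components, ratio = pca_reduce(frames, n_components=2)
print(reduced.shape)  # each frame is now described by 2 values instead of 3
```

In practice, features measured on very different scales (such as pitch in Hz against a normalized amplitude) would usually be standardized before this step; the sketch omits that for brevity.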
@ ICMC-SMC 2014, Athens
This research integrates sensory and scientific instruments
to analyze the relationship between subjective evaluations
of digitally restored audio and its computer extracted perceptual
descriptors. Statistical methods have been used
to compare the displacement of three types of remediated
content in subspaces obtained by data expressed both by
individuals and by feature extraction algorithms.
Qualitative demands in audio restoration are tightly connected
to the information embedded in remediated content:
it is crucial to be aware that every choice re-balances
it and affects its reception. Listeners, in turn, do not
perform an acousmatic reduction of auditory information but recode
it, interleaving contextual and aesthetic approaches
according to their sensitivity and under the influence of their
cultural background.
By analyzing the displacement in the subspaces
related to the descriptive characteristics with the greatest variability,
we interpreted the semantic divergence resulting from the operations
that improve sound quality
and hypothesized a predictive model aimed at their optimization.
Prosody and turn-taking in the WhatsApp era - SSSL 2019, Trento, 2019
This talk aims to stimulate discussion about a new type of spontaneous speech: that of instant messaging applications. WhatsApp, Messenger, and Telegram are only some of the new media that allow audio messages to be sent in almost real time (connection permitting). To what extent, and in what way, can this speech be considered spontaneous? How far can the prosody adopted by users be compared with that of speech produced when speaker and interlocutor are physically co-present? How does the interaction between successive contributions change and evolve? Many of us know well what it means to be the helpless listener of unsettling soliloquies sent by a friend in the middle of an existential crisis, or perhaps we ourselves are the authors of some "never-ending" message. What changes in expectations, and what changes with respect to text messaging? This is a moment of reflection to introduce what the author considers a new category of speech, with new rules, rhythms, and musicality, which offers interesting possibilities, particularly for the creation of corpora and their use in research and investigation in the forensic sector. The considerations will be tied to the activities and needs of the Servizio di Informatica Forense SUPSI (the SUPSI digital forensics service) of Italian-speaking Switzerland and set in the context of forensic audio in preliminary investigations. Cenceschi S., Trivilini A., Sbattella L., Tedesco R. (2019) Collecting Italian spontaneous social media speech: the WAsp2 project. XV convegno AISV,
The paper presents the preliminary protocols and sessions designed and carried out to evaluate the initial communicative behaviours and the willingness to engage of 'extra-ordinary' children and young people during the Land Your Voice (LYV) Project. At Politecnico di Milano and at Fondazione Sequeri Esagramma, the LYV Project focuses on stimulating and improving the prosodic skills of young Italian speakers with autism and with intellectual and linguistic disabilities through vocal, technologically mediated, prosodic storytelling sessions.
IAFL conference - Porto, 2017
The purpose of this paper is to carry out a comparative survey of the similarities and differences in the application of phonetic evidence in Brazilian and Italian courtrooms. The Brazilian legal system is historically influenced by Roman law, whose principles are still applied in contemporary European democracies. Nevertheless, regarding the use of linguistic evidence, mainly from phonetic science, the main influence in both countries, academically and in casework, comes from, or resembles, the United Kingdom. The following main questions guide this investigation: (1) Is there any legal requirement to use phonetic knowledge as evidence in legal proceedings? (2) What conditions must be met to become an expert in forensic phonetics and to work on real cases? (3) Once phonetic evidence exists, are judges obliged to use it as an argument in forming their conviction about the case? (4) Is there any difference in pay between experts working for public bodies (courts, law enforcement, etc.) and for private parties (lawyers, private citizens, etc.), and does any such difference have repercussions on the legal system? The incorrect use of recordings is an obstacle for the entire legal process: it has an important impact on justice, perception of safety, and social costs. In this investigation we will also try to summarize the problems in both countries and lay the ground for a future study of their possible solutions.
This work investigates crises in Italian Courts of Law examinations, where by "crisis" we mean an abrupt change or hesitation in speech production. The main goal is to lay the groundwork for a possible automatic detection of crises in examinations and other stressful contexts, such as emergency phone calls. Our hypothesis is that crises are correlated with a set of acoustic, psychological and grammatical features that, therefore, could be leveraged to detect them. The analysis is based on the DIKE project. DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an audio/textual annotated corpus ("crisis" being one such annotation).

Materials: The corpus we adopted consists of 44 examinations, tagged according to the DIKE (Sbattella et al., 2015) specifications. These examinations involved 38 speakers (17 examinees and 21 examiners) and contained 127 crises, 95 belonging to examinees and 32 to examiners. To keep the corpus as homogeneous as possible, we chose to concentrate on examinees (5 females and 12 males), as most crises belonged to them. We found that babblings were present in 78 of the examinees' crisis utterances (82%), while they were absent from non-crisis utterances. This result allows us to infer that babbling is significantly correlated with examinees' crises. By "babbling" we mean the repetition of a word two or more times. More precisely, given a sequence of n instances of the word R, followed by a different word W, a babbling instance (B) can be represented as the following sequence: $B = R_1, R_2, \ldots, R_n, W_{n+1}$

Vocal features: Following Shriberg's approach (1995), we started by detecting pauses, where a pause is a measurable silence between two consecutive words (Zellner, 1994). Shriberg labels each babbling as prospective or retrospective depending on the presence of a pause after the last repetition $R_n$: in a retrospective crisis, $R_n$ and $W_{n+1}$ are not separated by a pause, while in a prospective crisis they are (highlighting a stalling moment). In our corpus, we found 8 prospective and 70 retrospective crises. Calculating the average intensity of each word, $I(R_i)$, we found that $I(R_1)$ is characterized by higher values than the following $I(R_i)$. Moreover, all $R_i$ exhibited a similar pitch contour; this means that speakers do not change intonation while babbling. We classified each babbling according to the duration $D(R_i)$ of all the words involved (Plauché, 1999). Two types of crisis behaviour were present, according to the difference in duration between the words: in about 91% of babblings, each word was longer than (or equal to) the following one; in particular, for 71 samples $D(R_1) \geq D(R_2)$, while for 7 of them $D(R_1) < D(R_2)$.

Grammatical features: We searched for relevant grammatical elements as mentioned in Martin & Robustelli (2007). Notice that grammatical elements are related to semantics; in particular: nouns, adjectives, non-auxiliary verbs, and adverbs are considered words carrying semantics; the adverb of negation non
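The babbling definition above (n ≥ 2 repetitions of a word R followed by a different word W, with the label depending on whether a pause separates the last repetition from W) can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis code: the `Word` type, the function `find_babblings`, the 50 ms pause threshold, and the example timings are all assumptions introduced here. The label "prospective" is used for the case with a pause and "retrospective" for the case without one.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # onset time in seconds
    end: float    # offset time in seconds


def find_babblings(words, min_pause=0.05):
    """Find runs of n >= 2 identical words followed by a different word.

    min_pause is a hypothetical threshold (seconds) for deciding whether
    a measurable silence separates the last repetition from the next word.
    """
    results = []
    i = 0
    while i < len(words):
        j = i + 1
        while j < len(words) and words[j].text.lower() == words[i].text.lower():
            j += 1
        # A babbling needs at least two repetitions and a following word W.
        if j - i >= 2 and j < len(words):
            pause = words[j].start - words[j - 1].end
            durations = [w.end - w.start for w in words[i:j]]
            results.append({
                "word": words[i].text,
                "repetitions": j - i,
                "next_word": words[j].text,
                "kind": "prospective" if pause >= min_pause else "retrospective",
                # True when each repetition lasts at least as long as the next.
                "non_increasing_duration": all(
                    a >= b for a, b in zip(durations, durations[1:])
                ),
            })
        i = j
    return results


# Made-up word timings, for illustration only.
words = [
    Word("I", 0.00, 0.20), Word("the", 0.30, 0.60), Word("the", 0.65, 0.85),
    Word("man", 1.20, 1.50), Word("said", 1.55, 1.90),
    Word("no", 2.00, 2.30), Word("no", 2.30, 2.50), Word("wait", 2.50, 2.90),
]
found = find_babblings(words)
for b in found:
    print(b["word"], b["repetitions"], b["kind"])
```

Word-level time alignment is assumed to be available already (for example from a forced aligner); the sketch only covers the grouping and labelling step.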
IAFPA 2016 Annual Conference
This work investigates crises in Italian Courts of Law examinations, where by “crisis” we mean an abrupt change or hesitation in speech production. The main goal is to lay the groundwork for a possible automatic detection of crises in examinations and other stressful contexts, such as emergency phone calls. Our hypothesis is that crises are correlated with a set of acoustic, psychological and grammatical features that, therefore, could be leveraged to detect them. The analysis is based on the DIKE project. DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an audio/textual annotated corpus (“crisis” being one such annotation).
This research integrates sensory and scientific instruments to analyze the relationship between subjective evaluations of digitally restored audio and its computer extracted perceptual descriptors. Statistical methods have been used to compare the displacement of three types of remediated content in subspaces obtained by data expressed both by individuals and by feature extraction algorithms.
In the forensic sector, audio analysis is a scientific field that is still expanding rapidly and is extremely fragmented. Nevertheless, it holds great potential, not only with regard to the human voice, an evident carrier of clues and meaning, but also for extracting all those sound details useful to an investigation, such as a gunshot (and hence the type of weapon), the environmental context, the type of car, the sex of the speaker, and so on. During the meeting, the main methods for the digital analysis of recordings will be presented through examples and real cases on topics such as speaker recognition, verification of the integrity of audio material, and the problem of subjective interpretation of vocal content in the presence of strong environmental noise (such as radio, road traffic, rain, and so on). Interception, after all, whether of telephone calls or of another kind, is a fundamental tool for the preliminary investigation and for building the prosecution and defence cases, but at the same time it is delicate material that requires the use of correct analysis methodologies. Given a question, science in this field does not yet provide evidentiary certainties, but rather clues for or against. Unfortunately, it is much easier to cause irreparable harm to society when a potential piece of evidence has been analysed incorrectly. This concept will be illustrated through the example of a real case (the review of a conviction) to provide a further means of defence against analyses based on scientifically incorrect methods.
XV AISV conference, 2019
This work presents the WAsp2 project (WhatsApp SPontaneous SPeech), which focuses on collecting private social media speech through the WhatsApp application. The main purpose is to collect a large corpus of spontaneous voice messages useful in forensic investigations and also in other research areas such as experimental linguistics, speech therapy, artificial intelligence, and ASR systems.
Abstracts of the 25th Annual Conference of the International Association for Forensic Phonetics and Acoustics
This work investigates crises in Italian Courts of Law examinations. Our hypothesis is that crises are correlated with a set of acoustic and psychological features that, therefore, could be leveraged to detect them. The analysis is based on the DIKE project (Sbattella, Tedesco, Trivilini, 2015). DIKE represents several aspects of examinations, according to psychological, juridical, and linguistic theories, in an audio/textual annotated corpus (“crisis” being one such annotation).
This study is a web-based psychoacoustic test for adult native speakers of Italian, investigating the detection of different prosodic phenomena in Standard Italian utterances. The purpose was to investigate the influence of semantics on the human ability to recognise different prosodic aspects, in order to understand the basic pieces of information involved in the psychoacoustic process of verbal comprehension. In particular, one section of the test concerned the ability to recognise the presence of a Corrective Focus, which is a spoken constituent that directly rejects an alternative. The results show that Corrective Focus seems difficult to detect in isolated audio utterances. Semantics seems to improve detection accuracy; phonotactics, instead, does not seem to add useful information; finally, our test confirms a correlation with prominent syllables.
Proceedings of International Conference of Education, Research and Innovation (ICERI)
In this work we describe our roadmap to KaSPAR (Karaoke Speech-Prosody Analyzer and Recognizer), a software application that addresses the problem of learning English as a foreign language for native speakers of Italian (or of other transparent Romance languages) with dyslexia. We aim to enrich traditional learning-based methods by leveraging a multi-sensory and emotional approach. The basic idea is to invite subjects to imitate the pronunciation and prosody of a native English speaker with real-time visual and auditory feedback, to stimulate the modulation and control of their oral linguistic production. The project draws on knowledge from different fields of study, first of all by linking the prosodic problems of learners with dyslexia to the extraction of acoustic features already known in the Music Information Retrieval field and in MPEG-7 encoding. Analysis protocols, based on multidimensional analysis of the data collected from the sessions, will assess improvements in the subject’s speech abilities and the impact on her/his specific issues.
Fattori sociali e biologici nella variazione fonetica, 2018
This paper presents our approach to the automatic recognition of prosodic forms. In particular, we present: CALLIOPE, a multi-dimensional model that aims to categorize all prosodic forms; SI-CALLIOPE, a sub-space for which we defined a corpus of recorded prosodic forms; and the psychoacoustic experiment we are currently planning to investigate the main acoustic behaviours and features involved in the discrimination of prosodic forms. The results of the experiment will be useful for defining the acoustic/textual features to rely on for the automatic recognition of prosodic forms. For that reason, we are also defining a classifier based on neural networks. This study is part of the LYV project, which focuses on improving the prosodic expressiveness skills of Italian speakers with autism and other cognitive disabilities.