Luca Cristoforetti | Bruno Kessler Foundation
Papers by Luca Cristoforetti
Springer eBooks, 1998
The object-oriented paradigm has become widely used to develop large information systems. This paper presents a method for estimating the size and effort of developing object-oriented software. The approach is analogous to function points, and it is based on counting rules that pick up the elements in a static object model and combine them in order to produce a composite measure. Rules are proposed for counting "Object Oriented Function Points" from an object model, and several questions are identified for empirical research. A key aspect of this method is its flexibility: an organization can experiment with different counting policies to find the most accurate predictors of size, effort, etc. in its environment. "Object Oriented Function Points" counting has been implemented in a Java tool, and results on size estimation obtained from a pilot project with an industrial partner are encouraging.
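To make the counting idea concrete, here is a minimal sketch of how elements of a static object model could be tallied and combined under a configurable policy. The element types, weights, and class names are illustrative assumptions, not the counting rules defined in the paper.

```python
# Illustrative sketch of counting elements of a static object model and
# combining them into a composite measure. Weights are a hypothetical
# counting policy, not the rules proposed in the paper.
from dataclasses import dataclass, field

@dataclass
class ClassModel:
    name: str
    attributes: list = field(default_factory=list)    # simple data members
    methods: list = field(default_factory=list)       # service requests
    associations: list = field(default_factory=list)  # links to other classes

def oofp(classes, weights=None):
    # A counting policy is just a set of weights; an organization can
    # experiment with different values to find its best size/effort predictor.
    weights = weights or {"class": 4, "attribute": 1, "method": 3, "association": 2}
    total = 0
    for c in classes:
        total += weights["class"]
        total += weights["attribute"] * len(c.attributes)
        total += weights["method"] * len(c.methods)
        total += weights["association"] * len(c.associations)
    return total

model = [
    ClassModel("Customer", ["name", "address"], ["register", "update"], ["Order"]),
    ClassModel("Order", ["date", "total"], ["add_item", "checkout"], ["Customer"]),
]
print(oofp(model))  # composite size measure under this particular policy
```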
Lecture Notes in Computer Science, 2023
Multi-microphone data collection and WOZ experiments for the analysis of user behaviour in the DICIT scenarios
Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems
Analysis models, technologies and tools are extensively used in the automotive domain to validate and optimize the design and implementation of SW systems. This is especially true for modern systems including advanced autonomous (and complex) features. The range of analysis methods that can be applied is extremely wide, going from functional correctness to functional safety, timing (and schedulability), security, and possibly even more. The AUTOSAR automotive standard has been defined with the purpose of standardizing the SW architecture of automotive systems and enabling the construction of systems by composing SW components that are portable and abstract with respect to the underlying HW/SW platform. However, AUTOSAR was originally developed with portability of code in mind, and even though it quickly evolved to include a system-level modeling language (with its metamodel) and later extensions to deal with the needs of analysis methods (and tools), it is hardly comprehensive and is still affected by several omissions and limitations. To fix the limitations with respect to timing and schedulability analysis, Bosch developed the Amalthea (later App4MC) metamodel and tools. In Huawei, a more general (and ambitious) approach was undertaken to support not only timing analysis, but also model checking (or other types of formal verification), safety analysis and even design optimization. The approach is based on the concepts of a unified (modular) metamodel and a framework based on Eclipse to integrate analysis methods and tools. In this paper we describe the framework and the results obtained.
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents.
Procedia Computer Science, 2011
International Conference on Language Resources and Evaluation, 2008
This paper describes a multichannel acoustic data collection recorded under the European DICIT project, during the Wizard of Oz (WOZ) experiments carried out at the FAU and FBK-irst laboratories. The scenario is a distant-talking interface for interactive control of a TV. The experiments involve the acquisition of multichannel data for the signal-processing front-end and were carried out to collect a database for testing acoustic pre-processing algorithms. In this way, realistic scenarios can be simulated at a preliminary stage, instead of relying on real-time implementations, allowing for repeatable experiments. To match the project requirements, the WOZ experiments were recorded in three languages: English, German and Italian. Besides the user inputs, the database also contains non-speech acoustic events, room impulse response measurements and video data, the latter used to compute 3D labels. Sessions were manually transcribed and segmented at the word level, also introducing specific labels for acoustic events.
This paper describes the annotation of an Italian corpus of in-car speech material, with specific reference to the JavaSgram tool, developed for annotating multichannel speech corpora. Some pre/post-processing tools used with JavaSgram are briefly described, together with a concise description of the annotation criteria that were adopted. The final objective is to use the resulting corpus for training and testing a hands-free speech recognizer under development.
Journal of The Audio Engineering Society, 2011
SCENIC is an EC-funded project aimed at developing a harmonized corpus of methodologies for environment-aware acoustic sensing and rendering. The project focuses on space-time acoustic processing solutions that do not just accommodate the environment in the modeling process, but make the environment itself help towards achieving the goal at hand. The solutions developed within this project cover a wide range of applications, including acoustic self-calibration, aimed at estimating the parameters of the acoustic system, and environment inference, aimed at identifying and characterizing all the relevant acoustic reflectors in the environment. The information gathered through these steps is then used to boost the performance of wavefield rendering methods as well as source localization/characterization/extraction in reverberant environments.
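As an illustration of the environment-inference step, the sketch below recovers a planar reflector from an estimated first-order image source in 2-D: the wall lies on the perpendicular bisector of the segment joining the source and its image. This is a textbook image-source relation used only for illustration, not the SCENIC algorithms; the function name and interface are assumptions.

```python
import numpy as np

def reflector_from_image_source(source, image_source):
    """Toy environment-inference step: given a source and its estimated
    first-order image source (2-D), the reflecting wall lies on the
    perpendicular bisector of the segment joining them.
    Returns a point on the wall and the wall's unit normal."""
    source = np.asarray(source, float)
    image_source = np.asarray(image_source, float)
    point_on_wall = 0.5 * (source + image_source)  # midpoint lies on the wall
    normal = image_source - source
    normal /= np.linalg.norm(normal)               # wall normal direction
    return point_on_wall, normal

wall_point, wall_normal = reflector_from_image_source([1.0, 2.0], [1.0, -2.0])
print(wall_point, wall_normal)  # wall through y = 0, normal along the y axis
```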
This paper describes some activities being conducted at IRST with the aim of developing a technology for hands-free speech recognition in the car environment. This technology is based on Hidden Markov Models and is being developed and evaluated using the car databases collected in the European projects SpeechDatCar and VODIS-II. Preliminary experiments are based on the use of filtered clean-speech corpora for HMM training and on the application of MLLR adaptation to further reduce the mismatch between training and testing conditions. Results are promising but show the difficulty of this task, even when exploiting some material collected in the test environment for HMM adaptation.
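As a rough illustration of the adaptation idea mentioned above, the sketch below estimates a single global affine transform of the Gaussian means from a small amount of target-environment data. Proper MLLR maximizes likelihood with state-occupancy weights inside an EM loop; this least-squares stand-in, with hypothetical function names and toy data, only conveys the shape of the computation.

```python
import numpy as np

def estimate_mllr_transform(frames, means, alignment):
    """Least-squares stand-in for estimating a single global MLLR-style
    transform W (d x (d+1)) such that adapted_mean = W @ [1, mean].
    frames:    (N, d) adaptation observations from the target environment
    means:     (S, d) Gaussian means of the clean-trained HMMs
    alignment: length-N array mapping each frame to a Gaussian index
    """
    ext = np.hstack([np.ones((len(frames), 1)), means[alignment]])  # (N, d+1)
    # Solve ext @ X ~= frames in the least-squares sense; W = X.T
    X, *_ = np.linalg.lstsq(ext, frames, rcond=None)
    return X.T  # (d, d+1)

def adapt_means(means, W):
    ext = np.hstack([np.ones((len(means), 1)), means])  # (S, d+1)
    return ext @ W.T  # means shifted toward the target (e.g. car) environment

# toy usage with synthetic data
rng = np.random.default_rng(0)
means = rng.normal(size=(10, 13))              # 10 Gaussians, 13-dim cepstra
alignment = rng.integers(0, 10, size=200)
frames = means[alignment] + 0.5 + 0.1 * rng.normal(size=(200, 13))
W = estimate_mllr_transform(frames, means, alignment)
adapted = adapt_means(means, W)
```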
This demo presents a small and portable system embedding microphone array processing and robust speech recognition for distant-speech interaction to control a floor lamp. The system is entirely contained inside the lamp and operates in real-time, “always-listening” mode. It runs on a small, low-power, fanless board and acts as the light control interface. The prototype shows the feasibility, potential and limits of integrating speech technology into devices of everyday use.
7th ISCA Workshop on Speech and Language Technology in Education (SLaTE 2017)
In this paper we describe a system for analyzing the reading errors made by children in primary and middle schools. To assess the reading skills of children in terms of reading accuracy and speed, a standard reading achievement test, developed by educational psychologists and named "Prove MT" (MT reading test), is used in Italian schools. This test is based on a set of texts specific to different ages, from 7 to 13 years old. At present, during the test, children are asked to read aloud short stories, while teachers manually write down the reading errors on a sheet and then compute a total score based on several measures, such as the duration of the whole reading, the number of read syllables per second, the number and type of errors, etc. The system we have developed aims to support teachers in this task by automatically detecting the reading errors and estimating the needed measures. To do this we use an automatic speech-to-text transcription system that employs a language model (LM) trained over the texts containing the stories to read. In addition, we embed in the LM an error model that takes into account typical reading errors, mostly consisting of pronunciation errors, substitutions of syllables or words, word truncations, etc. To evaluate the performance of our system we collected 20 audio recordings, uttered by children aged 8-13, reading a novel belonging to the "Prove MT" set. It is worth mentioning that the error model proposed in this paper for assessing the reading capabilities of children performs closely to an "oracle" error model obtained from manual transcriptions of the readings themselves.
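A minimal sketch of the kind of scoring step such a system could perform is shown below: aligning the automatic transcript against the reference story and deriving MT-style measures. The real system embeds the error model directly in the LM; the alignment-based report and the vowel-group syllable heuristic here are simplifying assumptions for illustration only.

```python
import difflib

def reading_report(reference_text, transcript, reading_seconds):
    """Toy scoring step: align the automatic transcript with the reference
    story and derive MT-style measures (error count, syllables per second).
    The syllable counter is a crude vowel-group heuristic, not the one used
    by the MT test."""
    ref = reference_text.lower().split()
    hyp = transcript.lower().split()
    errors = 0
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, ref, hyp).get_opcodes():
        if op != "equal":
            errors += max(i2 - i1, j2 - j1)  # substitutions, omissions, insertions
    syllables = 0
    for w in ref:
        # approximate syllable count as the number of vowel groups
        groups, prev = 0, False
        for ch in w:
            is_vowel = ch in "aeiouàèéìòù"
            groups += is_vowel and not prev
            prev = is_vowel
        syllables += max(groups, 1)
    return {"errors": errors, "syllables_per_second": syllables / reading_seconds}

print(reading_report("il gatto dorme sul divano", "il gato dorme divano", 4.0))
```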
LREC, 2010
The EC-funded project DICIT developed distant-talking interfaces for interactive TV. The final DICIT prototype system processes multimodal user input by speech and remote control. It was designed to understand both natural language and command-and-control-style speech input. We conducted an evaluation campaign to examine the usability and performance of the prototype. The task-oriented evaluation involved naïve test persons and consisted of a subjective part, with a usability questionnaire, and an objective part. We used three groups of objective metrics to assess the system: one related to speech component performance, one related to interface design and user awareness, and a final group related to task-based effectiveness and usability. These metrics were acquired with a dedicated transcription and annotation tool. The evaluation revealed quite positive subjective assessments of the system and reasonable objective results. We report how the objective metrics helped us to identify problems in specific areas and to distinguish design-related issues from technical problems. The metrics computed over modality-specific groups also show that speech input gives a usability advantage over remote control for certain types of tasks.
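As an example of a speech-component metric of the first group, the sketch below computes word error rate with a standard edit-distance recursion. It is a generic illustration, not the DICIT evaluation code.

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein-based WER, a standard speech-component metric; shown only
    as an example of the kind of objective measure used in such campaigns."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[-1][-1] / max(len(ref), 1)

print(word_error_rate("switch to channel five", "switch channel five please"))
```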
This paper describes a multi-microphone, multi-language acoustic corpus being developed under the EC project Distant-speech Interaction for Robust Home Applications (DIRHA). The corpus is composed of several sequences obtained by convolution of dry acoustic events with more than 9000 impulse responses measured in a real apartment equipped with 40 microphones. The acoustic events include in-domain sentences of different typologies uttered by native speakers in four different languages, and non-speech events representing typical domestic noises. To increase the realism of the resulting corpus, background noises were recorded in the real home environment and then added to the generated sequences. The purpose of this work is to describe the simulation procedure and the data sets that were created and used to derive the corpus. The corpus contains signals of different characteristics, making it suitable for various multi-microphone signal processing and distant speech recognition tasks.
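A minimal sketch of the simulation procedure described above, under assumed parameter names and a simple SNR policy: convolve a dry event with a measured impulse response and mix in recorded background noise.

```python
import numpy as np

def simulate_microphone_signal(dry_event, room_impulse_response, background, snr_db=10.0):
    """Sketch of the corpus-generation step: convolve a dry acoustic event
    with a measured room impulse response and add recorded background noise
    at a chosen SNR. Names and the SNR policy are illustrative, not the
    DIRHA simulation toolchain."""
    reverberant = np.convolve(dry_event, room_impulse_response)
    noise = np.resize(background, reverberant.shape)  # loop/crop noise to length
    sig_pow = np.mean(reverberant ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# toy usage with synthetic data (16 kHz assumed)
rng = np.random.default_rng(1)
dry = rng.normal(size=16000)                                     # 1 s dry event
rir = np.exp(-np.arange(4000) / 800.0) * rng.normal(size=4000)   # synthetic decay
noise = rng.normal(size=32000)
mic = simulate_microphone_signal(dry, rir, noise, snr_db=15)
```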
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015
This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent valuable material for various studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), in line with the international state of the art.