Walcir Cardoso - Academia.edu
Papers by Walcir Cardoso
This study explores the potential of Automatic Speech Recognition (ASR) as a writing tool by investigating user behaviours (strategies henceforth) and text quality (lexical diversity) when users engage with the technology. Thirty English second language writers dictated texts into an ASR system (Google Voice Typing) while also using optional additional input devices, such as keyboards and mice. Analysis of video recordings and field observations revealed four strategies employed by users to produce texts: use of ASR exclusively, ASR in tandem with keyboarding, ASR followed by keyboarding, and ASR followed by both keyboarding and ASR. These strategies reflected cognitive differences and text generation challenges. Text quality was operationalized through lexical diversity metrics. Results showed that ASR use in tandem with keyboarding and ASR followed by both keyboarding and ASR yielded greater lexical diversity, whereas the use of ASR exclusively or ASR followed by keyboarding had lower diversity. Findings suggest that the integrated use of ASR and keyboarding activates dual channels, thus dispersing cognitive load and possibly improving text quality (i.e. lexical diversity). This exploratory study demonstrates potential for ASR as a complementary writing tool and lays groundwork for further research on the strategic integration of ASR and keyboarding to improve the quality of written texts.
This case study examines the self-regulated use of Pimsleur, a Language Learning Platform (LLP), as a tool to aid in the acquisition of spoken phrases in Brazilian Portuguese (BP) and their related pronunciation. As with many LLPs, research on Pimsleur is scant, as is the number of studies done on BP compared to other major languages. This study aims to address this gap in research. The participant-researcher completed the Pimsleur program through daily study over a 10-week period, after which quantitative data were collected through a post-test and delayed post-test. The results showed that Pimsleur contributed to the learning of the target phrases in the short term and that the participant produced speech that was highly intelligible, moderately comprehensible, but heavily accented. This shows that Pimsleur can be an effective tool for developing spoken BP and can offer a unique learning experience with its methodology and mobile capability that mitigates some of the issues around mobile-assisted language learning (e.g. app attrition).
Improvements in Automatic Speech Recognition (ASR) have created opportunities for using it as a tool to facilitate second and foreign language (L2) assessment. These technical improvements have not only enabled automation of language proficiency test scoring but also reduced evaluator bias and errors, decreased processing time, and lowered costs for testing organizations. The purpose of this study was to evaluate English as a Second Language (ESL) pronunciation using the ASR feature in the Microsoft 365 product suite, Transcribe (MS-T). The study involved adult ESL learners at a Canadian university who took part in a language proficiency test. We examined the audio recordings of 56 candidates during the pronunciation portion of the test. Building on previous studies that found a strong correlation between scores from Google Voice Typing and human raters, the current study conducted a similar analysis comparing scores derived from MS-T to both human ratings and Google Voice Typing. Our findings indicate that the ASR capabilities of MS-T, similar to Google Voice Typing, can assume an important role in L2 speaking assessment by providing objectivity and reliability to the testing process, expediting scoring, and reducing costs.
This paper has two objectives. In Part 1, we report the findings of a mixed-method study that examines the pronunciation needs of non-francophone immigrants in Quebec after they complete the Program for the Linguistic Integration of Immigrants (known as "francization"), a language learning initiative to equip non-French-speaking immigrants with essential French skills. The findings indicate a noticeable disparity between the instruction provided to learners in the program and their practical requirements in real-life situations, and a strong need by the participants to improve their pronunciation autonomously post francization. Part 2 addresses the pedagogical implications of these findings, responding to our participants' needs with a set of technology-enhanced pedagogical recommendations for blended and autonomous learning.
Beyond the walls of classrooms: Exploring the pedagogical effectiveness of text-to-speech-based shadowing on the development of Mandarin tones. In CALL for all Languages-EUROCALL 2023 Short Papers.
Following previous research into predictable sentence contexts, this study assesses the pronunciation feedback provided by Google Translate's (GT) Automatic Speech Recognition (ASR) in unpredictable contexts. We examined the accuracy of GT transcriptions for target items recorded by male and female Quebec Francophones (QFs). The items occurred in neutral carrier sentences such that no contextual cues help ASR identify the targets. Th-initial vs t-initial (thank-tank) and h-initial vs vowel-initial (heat-eat) items were used to investigate the potential for feedback on the QF errors of th-substitution, h-deletion, and h-epenthesis, comparing real-word (thank→tank) vs nonword output (thief→tief). As with predictable contexts in our previous research, we observed high transcription accuracy for real words only. Without contextual cues, accuracy rates were lower than in predictable contexts for correctly pronounced items but higher than for incorrect pronunciations constituting real words. Unpredictable contexts are thus inferior at confirming correct pronunciation (confirmative feedback) but superior at flagging real-word errors (corrective feedback). Contrary to the anticipated ASR gender bias, female recordings showed higher transcription accuracy than male recordings. Our findings both confirm the usefulness of GT's ASR for generating pronunciation feedback and highlight the importance of context (predictable vs unpredictable) and lexical status (real vs nonword).
the CALICO Journal, Aug 25, 2021
This pilot study examines users’ perceptions of Bande à Part, a music application designed for learners of French. The technology acceptance model (TAM) was adopted to investigate users’ perceptions of the app’s usability and potential for second language (L2) learning. The model’s two constructs, perceived usefulness and perceived ease of use, and one added factor, perceived enjoyment, formed the main predictors of users’ intentions to continue using the app. Mean scores for the predictors were: perceived usefulness = 4.27/6, perceived ease of use = 3.88/6, and perceived enjoyment = 3.95/6, which are confirmed by the survey results that show that 10 of 13 participants intend to continue using the app. Qualitative results suggest that the app enhances users’ ability to notice targeted forms in the musical input (e.g., liaison, gender) and, corroborating the quantitative data, suggest that users find the features in the app useful. Several comments also indicate that the ease of use could be improved (e.g., improved mobile device access). This study helps to establish the TAM in Computer-Assisted Language Learning (CALL) literature and forms the basis for future work evaluating how songs aid L2 acquisition.
EDULEARN proceedings, Jul 1, 2022
De Gruyter eBooks, Dec 5, 2022
The struggles that second language learners experience when navigating the sociophonetic variability of speakers are often explained by the lack of exposure to varied input in the classroom because of its emphasis on teaching (usually invariable) standard varieties. In the realm of French as a second language (FSL) learning, our understanding of variational input in the classroom comes primarily from textbook studies; little empirical evidence has quantified the amount and kind of social speech markers (e.g., age) found in the FSL audiovisual curriculum. This study examines the audiovisual input of two FSL classrooms. Interviews and a questionnaire were used to elicit FSL instructors' criteria for selecting input, and their experiences with and attitudes towards including variation in their lessons. Additionally, the audiovisual input from one semester of each instructor's classes was categorized by clip length and by five social markers: age, gender, race, region, and native speaker status. Results showed that the instructors held positive viewpoints towards including variation; however, the audiovisual input in both settings was invariant across multiple social markers, accounting for less than 5% of total class time. Suggestions for incorporating more varied input in the language learning curriculum are discussed.
Springer eBooks, Oct 15, 2018
Computer Assisted Language Learning, Mar 27, 2022