Renu Balyan | Arizona State University
Papers by Renu Balyan
Communications in computer and information science, 2023
Lecture Notes in Computer Science, 2022
2022 International Conference on Computational Science and Computational Intelligence (CSCI)
International Journal of Human-Computer Studies
Zenodo (CERN European Organization for Nuclear Research), Jul 18, 2022
The National Council of Teachers of Mathematics (NCTM) has been emphasizing the importance of teachers' pedagogical communication as part of mathematical teaching and learning for decades. Specifically, NCTM has provided guidance on how teachers can foster mathematical communication that positively impacts student learning. A teacher may have different academic goals towards what needs to be achieved in a classroom, which require a variety of discourse-based tools that allow students to engage fully in mathematical thinking and reasoning. Accountable or academically productive talk is one such approach for classroom discourse that may ensure that the discussions are coherent, purposeful and productive. This paper discusses the use of a transformer model for classifying classroom talk moves based on the Accountable Talk framework. We investigate the extent to which the classroom Accountable Talk framework can be successfully applied to one-on-one online mathematics tutoring environments. We further propose a framework adapted from Accountable Talk, but more specifically aligned to one-on-one online tutoring. The model performance for the proposed framework is evaluated and compared with a small sample of expert coding. The results obtained from the proposed framework for one-on-one tutoring are promising and improve classification performance of the talk moves for our dataset.
Computers & Education
Practice and Experience in Advanced Research Computing
Computers
Academic discourse communities and learning circles are characterized by collaboration, sharing commonalities in terms of social interactions and language. The discourse of these communities is composed of jargon, common terminologies, and similarities in how they construe and communicate meaning. This study examines the extent to which discourse reveals “shared language” among its participants that can promote inclusion or affinity. Shared language is characterized in terms of linguistic features and lexical, syntactical, and semantic similarities. We leverage a multi-method approach, including (1) feature engineering using state-of-the-art natural language processing techniques to select the most appropriate features, (2) the bag-of-words classification model to predict linguistic similarity, (3) explainable AI using the local interpretable model-agnostic explanations to explain the model, and (4) a two-step cluster analysis to extract innate groupings between linguistic similarit...
While hierarchical machine learning approaches have been used to classify texts into different content areas, this approach has, to our knowledge, not been used in the automated assessment of text difficulty. This study compared the accuracy of four classification machine learning approaches (flat, one-vs-one, one-vs-all, and hierarchical) using natural language processing features in predicting human ratings of text difficulty for two sets of texts. The hierarchical classification was the most accurate for the two text sets considered individually (Set A, 77.78%; Set B, 82.05%), while the nonhierarchical approaches, one-vs-one and one-vs-all, performed similarly to the hierarchical classification for the combined set (71.43%). These findings suggest both promise and limitations for applying hierarchical approaches to text difficulty classification. It may be beneficial to apply a recursive top-down approach to discriminate the subsets of classes that are at the top of the hierarchy a...
CRC Press eBooks, Aug 9, 2022
This study leverages natural language processing to assess dimensions of language and discourse in students’ discussion board posts and comments within an online learning platform, Math Nation. This study focusses on 1,035 students whose aggregated posts included more than 100 words. Students’ wall post discourse was assessed using two linguistic tools, Coh-Metrix and SEANCE, which report linguistic indices related to language sophistication, cohesion, and sentiment. A linear model including prior math scores (i.e., Mathematics Florida Standards Assessments), grade level, semantic overlap (i.e., LSA givenness), incidence of pronouns, and noun hypernymy accounted for 64.48% of the variance for the Algebra I end of course scores (RMSE=13.73). Students with stronger course outcomes used more sophisticated language, across a wider range of topics, and with less personalized language. Overall, this study confirms the contributions of language and communication skills over and above prior...
Proceedings of the Ninth ACM Conference on Learning @ Scale
This paper addresses diagnostic evaluation of machine translation (MT) systems for Indian languages, English to Hindi translation in particular. Evaluation of MT output is an important but difficult task. The difficulty arises primarily from some inherent characteristics of the language pairs, which range from simple word-level discrepancies to more difficult structural variations for Hindi from English, such as reduplication of words, free word order etc. The proposed scheme is based on identification of linguistic units (often referred to as checkpoints). We use the diagnostic evaluation tool DELiC4MT to analyze the contribution of various PoS classes for different categories. We further suggest some additional checkpoints based on named entities, ambiguous words, word order and inflections that are relevant for the evaluation of Hindi. The evaluation of these checkpoints provides a detailed analysis and helps in monitoring how an MT system handles these linguistic phenomena as we...
Science Advances, 2021
Journal of biomedical informatics, 2021
OBJECTIVE In the National Library of Medicine funded ECLIPPSE Project (Employing Computational Linguistics to Improve Patient-Provider Secure Emails exchange), we attempted to create novel, valid, and scalable measures of both patients' health literacy (HL) and physicians' linguistic complexity by employing natural language processing (NLP) techniques and machine learning (ML). We applied these techniques to >400,000 patients' and physicians' secure messages (SMs) exchanged via an electronic patient portal, developing and validating an automated patient literacy profile (LP) and physician complexity profile (CP). Herein, we describe the challenges faced and the solutions implemented during this innovative endeavor. MATERIALS AND METHODS To describe challenges and solutions, we used two data sources: study documents and interviews with study investigators. Over the five years of the project, the team tracked their research process using a combination of Google Docs...
Artificial Intelligence in Education, 2018
Intelligent Tutoring Systems (ITSs) focus on promoting knowledge acquisition, while providing relevant feedback during students’ practice. Self-explanation practice is an effective method used to help students understand complex texts by leveraging comprehension. Our aim is to introduce a deep learning neural model for automatically scoring student self-explanations that are targeted at specific sentences. The first stage of the processing pipeline performs an initial text cleaning and applies a set of predefined rules established by human experts in order to identify specific cases (e.g., students who do not understand the text, or students who simply copy and paste their self-explanations from the given input text). The second stage uses a Recurrent Neural Network with pre-trained GloVe word embeddings to predict self-explanation scores on a scale of 1 to 3. In contrast to previous SVM models trained on the same dataset of 4109 self-explanations, we obtain a significant increase of...