Sahan Bulla Bulathwela | University College London

Papers by Sahan Bulla Bulathwela

Could AI Democratise Education? Socio-Technical Imaginaries of an EdTech Revolution

arXiv (Cornell University), Dec 3, 2021

Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first delivery of AI in Education could be greater educational inequality, along with a global misallocation of educational resources motivated by the current technological determinism narrative. In this paper, we focus on speculating and posing questions around the future of AI in Education, with the aim of starting the pressing conversation that would set the right foundations for the new generation of education that is permeated by technology. This paper starts by synthesising how AI might change how we learn and teach, focusing specifically on the case of personalised learning companions, and then moves on to discuss some socio-technical features that will be crucial for avoiding the perils of these AI systems worldwide (and perhaps ensuring their success). This paper also discusses the potential of using AI together with free, participatory and democratic resources, such as Wikipedia, Open Educational Resources and open-source tools. We also emphasise the need for collectively designing human-centered, transparent, interactive and collaborative AI-based algorithms that empower and give complete agency to stakeholders, as well as support new emerging pedagogies. Finally, we ask what it would take for this educational revolution to provide egalitarian and empowering access to education, beyond any political, cultural, language, geographical and learning ability barriers.

Leveraging Semantic Knowledge Graphs in Educational Recommenders to Address the Cold-Start Problem

CRC Press eBooks, Jun 28, 2023

Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts

arXiv (Cornell University), Oct 18, 2022

Extracting useful information from the user history to clearly understand informational needs is a crucial feature of a proactive information retrieval system. Regarding understanding information and relevance, Wikipedia can provide the background knowledge that an intelligent system needs. This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text. We formulate two models that use entity linking to associate Wikipedia topics with the relevance model. Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts and that a ranking model can improve precision by incorporating them. We also find that Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.

Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems

arXiv (Cornell University), Dec 8, 2021

In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim of better predicting learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve the predictive performance of educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of the TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance with a simple extension that adds semantic awareness to the model.

Towards an Integrative Educational Recommender for Lifelong Learners (Student Abstract)

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model. Related Work: Contrary to conventional recommender systems, educational recommenders face different challenges that stem from attempting to bring learners closer to their goals effectively.

TrueLearn: A Family of Bayesian Algorithms to Match Lifelong Learners to Open Educational Resources

arXiv (Cornell University), Nov 21, 2019

The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learners' knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty that would help the learner improve their knowledge about the subject and keep them engaged. We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.

PEEK: A Large Dataset of Learner Engagement with Educational Videos

arXiv (Cornell University), Sep 3, 2021

In this work, we release a large and novel dataset of learners engaging with educational videos in the wild. The dataset, named Personalised Educational Engagement with Knowledge Topics (PEEK), is one of the first publicly available datasets that address personalised educational engagement. Educational recommenders have received much less attention in comparison to e-commerce and entertainment-related recommenders, even though efficient personalised learning systems could improve learning gains significantly. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In the PEEK dataset, educational video lectures have been associated with Wikipedia concepts related to the material of the lecture, thus providing a humanly intuitive taxonomy. We believe that granular learner engagement signals, in unison with rich content representations, will pave the way to building powerful personalisation algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experimentation on the PEEK dataset to demonstrate its value. Our experiments with the dataset show promise in building powerful informational recommender systems. The dataset and the support code are available at https://github.com/sahanbull/PEEK-Dataset.

VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-based Engagement

arXiv (Cornell University), Nov 2, 2020

With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, demanding more efficient, scalable and automatic solutions for managing learning resources. Although a few datasets related to engagement with educational videos exist, there is still an important need for data and research aimed at understanding learner engagement with scientific video lectures. This paper introduces VLEngagement, a novel dataset that consists of content-based and video-specific features extracted from publicly available scientific video lectures and several metrics related to user engagement. We introduce several novel tasks related to predicting and understanding context-agnostic engagement in video lectures, providing preliminary baselines. This is the largest and most diverse publicly available dataset to our knowledge that deals with such tasks. The extraction of Wikipedia topic-based features also allows associating more sophisticated Wikipedia-based features to the dataset to improve the performance in these tasks. The dataset, helper tools and example code snippets are available publicly at https://github.com/sahanbull/context-agnostic-engagement.

Predicting Engagement in Video Lectures

arXiv (Cornell University), May 31, 2020

The explosion of Open Educational Resources (OERs) in recent years creates the demand for scalable, automatic approaches to process and evaluate OERs, with the end goal of identifying and recommending the most suitable educational materials for learners. We focus on building models to find the characteristics and features involved in context-agnostic (i.e. population-based) engagement, a seldom researched topic compared to other contextualised and personalised approaches that focus more on individual learner engagement. Learner engagement is arguably a more reliable measure than popularity/number of views, is more abundant than user ratings and has also been shown to be a crucial component in achieving learning outcomes. In this work, we explore the idea of building a predictive model for population-based engagement in education. We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement and propose both cross-modal and modality-specific feature sets to achieve this task. We further test different strategies for quantifying learner engagement signals. We demonstrate the use of our approach in the case of data scarcity. Additionally, we perform a sensitivity analysis of the best-performing model, which shows promising performance and can be easily integrated into an educational recommender system for OERs.

Towards an Integrative Educational Recommender for Lifelong Learners

arXiv (Cornell University), Dec 3, 2019

One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model. Related Work: Conventional recommendation systems that exist today mainly focus on exploiting user interests. On the contrary, educational recommenders face different challenges as a

Power to the Learner: Towards Human-Intuitive and Integrative Recommendations with Open Educational Resources

Sustainability, Sep 17, 2022

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

SUM'20: State-based User Modelling

Scalable Educational Question Generation with Pre-trained Language Models

arXiv (Cornell University), May 13, 2023

The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that EduQG can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on scientific text and science question data.

Pre-Training With Scientific Text Improves Educational Question Generation

arXiv (Cornell University), Dec 7, 2022

With the boom of digital educational materials and scalable e-learning systems, the potential for realising AI-assisted personalised learning has skyrocketed. In this landscape, the automatic generation of educational questions will play a key role, enabling scalable self-assessment when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our initial experiments demonstrate that EduQG can produce superior educational questions by pre-training on scientific text.

An AI-based Learning Companion Promoting Lifelong Learning Opportunities for All

arXiv (Cornell University), Nov 16, 2021

Throughout 2020 the IRCAI programme committees authored an Opinion Series on a range of topics across AI and sustainability as part of a wider portfolio of IRCAI's international initiatives. This opinion series explores what AI currently means to researchers across the world and across a variety of disciplines both in AI and sustainable development, and was led by the IRCAI executive. The views and opinions expressed by authors are their own and do not reflect the position of IRCAI, but are simply an illustration of the various opinions reflective of the diverse initiatives that IRCAI is pursuing, from basic research to policy creation in and around AI. As IRCAI is working from home for the foreseeable future, we are not organising in-person events. Introducing this new quick report format has the same spirit (original stripped-down research pieces, an attempt at an intellectual dialogue), just a different space.

Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments

arXiv (Cornell University), Jun 22, 2022

This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content- and video-based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly attributable to the increase of training examples. The VLE dataset's suitability in building models towards Computer Science/Artificial Intelligence education focused on e-learning/MOOC use-cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. This is the largest and most diverse publicly available dataset to our knowledge that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are available publicly.

What's in it for me?

Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, 2020

(This is the accepted manuscript). This paper introduces an interface that enables the user to quickly identify relevant fragments within multiple long documents. The proposed method relies on a machine-generated layer of annotations that reveals the coverage of topics per fragment and document. To illustrate how the annotations double as a tool for preview as well as navigation, an example application is presented in the form of a personalised learning system that recommends relevant fragments of video lectures according to the user's history. Potential implications of this approach for lifelong learning are discussed. We argue that this approach is generally applicable to recommender and information retrieval systems, across multiple knowledge domains and document types.

Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos?

ACM SIGIR Conference on Human Information Interaction and Retrieval, 2022

Prior research has shown how 'content preview tools' improve speed and accuracy of user relevance judgements across different information retrieval tasks. This paper describes a novel user interface tool, the Content Flow Bar, designed to allow users to quickly identify relevant fragments within informational videos to facilitate browsing, through a cognitively augmented form of navigation. It achieves this by providing semantic "snippets" that enable the user to rapidly scan through video content. The tool provides visually appealing pop-ups that appear in a time series bar at the bottom of each video, allowing users to see in advance and at a glance how topics evolve in the content. We conducted a user study to evaluate how the tool changes the users' search experience in video retrieval, as well as how it supports exploration and information seeking. The user questionnaire revealed that participants found the Content Flow Bar helpful and enjoyable for finding relevant information in videos. The interaction logs of the user study, where participants interacted with the tool for completing two informational tasks, showed that it holds promise for enhancing discoverability of content both across and within videos. This discovered potential could be leveraged in a new generation of navigation tools in search and information retrieval.

X5Learn: A Personalised Learning Companion at the Intersection of AI and HCI

26th International Conference on Intelligent User Interfaces, 2021

X5Learn (available at https://x5learn.org) is a human-centered AI-powered platform for supporting access to free online educational resources. X5Learn provides users with a number of educational tools for interacting with open educational videos, and a set of tools adapted to suit the pedagogical preferences of users. It is intended to support both teachers and students alike. For teachers, it provides a powerful platform to reuse, revise, remix, and redistribute open courseware produced by others. These can be videos, pdfs, exercises and other online material. For students, it provides a scaffolded and informative interface to select content to watch, read, make notes and write reviews, as well as a powerful personalised recommendation system that can optimise learning paths and adjust to the user's learning preferences. What makes X5Learn stand out from other educational platforms is how it combines human-centered design with AI algorithms and software tools with the goal of making it intuitive and easy to use, as well as making the AI transparent to the user. We present the core search tool of X5Learn, intended to support exploring open educational materials.

Report on the WSDM 2020 workshop on state-based user modelling (SUM'20)

ACM SIGIR Forum, 2020

The SUM'20 workshop was held at the 13th ACM International WSDM Conference on Web Search and Data Mining (WSDM 2020) in Houston, Texas. The purpose of the workshop was to stimulate the research community to explore open challenges in building systems that can capture the user's state, context and goals, as well as effectively use these for leveraging intelligent user-centric systems in a wide range of applications. The workshop incorporated different plenary sessions and contributed talks. The workshop website and proceedings are available at https://www.k4all.org/event/wsdmsum20.

Research paper thumbnail of Could AI Democratise Education? Socio-Technical Imaginaries of an EdTech Revolution

arXiv (Cornell University), Dec 3, 2021

Artificial Intelligence (AI) in Education has been said to have the potential for building more p... more Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first delivery of AI in Education could be greater educational inequality, along with a global misallocation of educational resources motivated by the current technological determinism narrative. In this paper, we focus on speculating and posing questions around the future of AI in Education, with the aim of starting the pressing conversation that would set the right foundations for the new generation of education that is permeated by technology. This paper starts by synthesising how AI might change how we learn and teach, focusing specifically on the case of personalised learning companions, and then move to discuss some socio-technical features that will be crucial for avoiding the perils of these AI systems worldwide (and perhaps ensuring their success). This paper also discusses the potential of using AI together with free, participatory and democratic resources, such as Wikipedia, Open Educational Resources and open-source tools. We also emphasise the need for collectively designing human-centered, transparent, interactive and collaborative AI-based algorithms that empower and give complete agency to stakeholders, as well as support new emerging pedagogies. Finally, we ask what would it take for this educational revolution to provide egalitarian and empowering access to education, beyond any political, cultural, language, geographical and learning ability barriers.

Research paper thumbnail of Leveraging Semantic Knowledge Graphs in Educational Recommenders to Address the Cold-Start Problem

CRC Press eBooks, Jun 28, 2023

Research paper thumbnail of Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts

arXiv (Cornell University), Oct 18, 2022

Extracting useful information from the user history to clearly understand informational needs is ... more Extracting useful information from the user history to clearly understand informational needs is a crucial feature of a proactive information retrieval system. Regarding understanding information and relevance, Wikipedia can provide the background knowledge that an intelligent system needs. This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text. We formulate two models that use entity linking to associate Wikipedia topics with the relevance model. Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts while a ranking model can improve precision by incorporating them. We also find Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.

Research paper thumbnail of Semantic TrueLearn: Using Semantic Knowledge Graphs in Recommendation Systems

arXiv (Cornell University), Dec 8, 2021

In informational recommenders, many challenges arise from the need to handle the semantic and hie... more In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim to better predict learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve the predictive performance of educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance with a simple extension that adds semantic awareness to the model.

Research paper thumbnail of Towards an Integrative Educational Recommender for Lifelong Learners (Student Abstract)

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

One of the most ambitious use cases of computer-assisted learning is to build a recommendation sy... more One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overseeing the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learners representations for long periods of time. We attempt to build a novel educational recommender, that relies on an integrative approach combining multiple drivers of learners engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human interpretable learner model. Related Work Contrary to conventional recommender systems, educational recommenders face different challenges, that stem from attempting to bring learners closer to their goals effectively.

Research paper thumbnail of TrueLearn: A Family of Bayesian Algorithms to Match Lifelong Learners to Open Educational Resources

arXiv (Cornell University), Nov 21, 2019

The recent advances in computer-assisted learning systems and the availability of open educationa... more The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learner's knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and, ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty that would help the learner improve their knowledge about the subject and keep them engaged. We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.

PEEK: A Large Dataset of Learner Engagement with Educational Videos

arXiv (Cornell University), Sep 3, 2021

In this work, we release a large and novel dataset of learners engaging with educational videos in the wild. The dataset, named Personalised Educational Engagement with Knowledge Topics (PEEK), is one of the first publicly available datasets that address personalised educational engagement. Educational recommenders have received much less attention than e-commerce and entertainment-related recommenders, even though efficient personalised learning systems could improve learning gains significantly. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In the PEEK dataset, educational video lectures have been associated with Wikipedia concepts related to the material of the lecture, thus providing a humanly intuitive taxonomy. We believe that granular learner engagement signals, in unison with rich content representations, will pave the way to building powerful personalisation algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experimentation on the PEEK dataset to demonstrate its value. Our experiments with the dataset show promise in building powerful informational recommender systems. The dataset and the support code are available at https://github.com/sahanbull/PEEK-Dataset.

VLEngagement: A Dataset of Scientific Video Lectures for Evaluating Population-based Engagement

arXiv (Cornell University), Nov 2, 2020

With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to the masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, demanding more efficient, scalable and automatic solutions for managing learning resources. Although a few datasets related to engagement with educational videos exist, there is still an important need for data and research aimed at understanding learner engagement with scientific video lectures. This paper introduces VLEngagement, a novel dataset that consists of content-based and video-specific features extracted from publicly available scientific video lectures and several metrics related to user engagement. We introduce several novel tasks related to predicting and understanding context-agnostic engagement in video lectures, providing preliminary baselines. To our knowledge, this is the largest and most diverse publicly available dataset that deals with such tasks. The extraction of Wikipedia topic-based features also allows associating more sophisticated Wikipedia-based features with the dataset to improve performance in these tasks. The dataset, helper tools and example code snippets are publicly available at https://github.com/sahanbull/context-agnostic-engagement.

Predicting Engagement in Video Lectures

arXiv (Cornell University), May 31, 2020

The explosion of Open Educational Resources (OERs) in recent years creates the demand for scalable, automatic approaches to process and evaluate OERs, with the end goal of identifying and recommending the most suitable educational materials for learners. We focus on building models to find the characteristics and features involved in context-agnostic (i.e. population-based) engagement, a seldom-researched topic compared to other contextualised and personalised approaches that focus more on individual learner engagement. Learner engagement is arguably a more reliable measure than popularity or number of views, is more abundant than user ratings and has also been shown to be a crucial component in achieving learning outcomes. In this work, we explore the idea of building a predictive model for population-based engagement in education. We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement and propose both cross-modal and modality-specific feature sets to achieve this task. We further test different strategies for quantifying learner engagement signals. We demonstrate the use of our approach in the case of data scarcity. Additionally, we perform a sensitivity analysis of the best-performing model, which shows promising performance and can be easily integrated into an educational recommender system for OERs.

Towards an Integrative Educational Recommender for Lifelong Learners

arXiv (Cornell University), Dec 3, 2019

One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the need to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model. Related Work: Conventional recommendation systems that exist today mainly focus on exploiting user interests. On the contrary, educational recommenders face different challenges as a

Power to the Learner: Towards Human-Intuitive and Integrative Recommendations with Open Educational Resources

Sustainability, Sep 17, 2022

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

SUM'20: State-based User Modelling

Scalable Educational Question Generation with Pre-trained Language Models

arXiv (Cornell University), May 13, 2023

The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that EduQG can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on scientific text and science question data.

Pre-Training With Scientific Text Improves Educational Question Generation

arXiv (Cornell University), Dec 7, 2022

With the boom of digital educational materials and scalable e-learning systems, the potential for realising AI-assisted personalised learning has skyrocketed. In this landscape, the automatic generation of educational questions will play a key role, enabling scalable self-assessment when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our initial experiments demonstrate that EduQG can produce superior educational questions by pre-training on scientific text.

An AI-based Learning Companion Promoting Lifelong Learning Opportunities for All

arXiv (Cornell University), Nov 16, 2021

Throughout 2020, the IRCAI programme committees authored an Opinion Series on a range of topics across AI and sustainability as part of a wider portfolio of IRCAI's international initiatives. This opinion series explores what AI currently means to researchers across the world and across a variety of disciplines both in AI and sustainable development, and was led by the IRCAI executive. The views and opinions expressed by authors are their own and do not reflect the position of IRCAI, but are simply an illustration of the various opinions reflective of the diverse initiatives that IRCAI is pursuing, from basic research to policy creation in and around AI. As IRCAI is working from home for the foreseeable future, we are not organising in-person events. This new quick report format has the same spirit (original, stripped-down research pieces and an attempt at an intellectual dialogue), just in a different space.

Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiments

arXiv (Cornell University), Jun 22, 2022

This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content- and video-based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly owing to the increase in training examples. The VLE dataset's suitability for building models for Computer Science/Artificial Intelligence education focused on e-learning/MOOC use cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. To our knowledge, this is the largest and most diverse publicly available dataset that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are publicly available.

What's in it for me?

Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, 2020

(This is the accepted manuscript). This paper introduces an interface that enables the user to quickly identify relevant fragments within multiple long documents. The proposed method relies on a machine-generated layer of annotations that reveals the coverage of topics per fragment and document. To illustrate how the annotations double as a tool for preview as well as navigation, an example application is presented in the form of a personalised learning system that recommends relevant fragments of video lectures according to the user's history. Potential implications of this approach for lifelong learning are discussed. We argue that this approach is generally applicable to recommender and information retrieval systems, across multiple knowledge domains and document types.

Watch Less and Uncover More: Could Navigation Tools Help Users Search and Explore Videos?

ACM SIGIR Conference on Human Information Interaction and Retrieval, 2022

Prior research has shown how 'content preview tools' improve the speed and accuracy of user relevance judgements across different information retrieval tasks. This paper describes a novel user interface tool, the Content Flow Bar, designed to allow users to quickly identify relevant fragments within informational videos to facilitate browsing, through a cognitively augmented form of navigation. It achieves this by providing semantic "snippets" that enable the user to rapidly scan through video content. The tool provides visually appealing pop-ups that appear in a time series bar at the bottom of each video, allowing users to see in advance and at a glance how topics evolve in the content. We conducted a user study to evaluate how the tool changes the user's search experience in video retrieval, as well as how it supports exploration and information seeking. The user questionnaire revealed that participants found the Content Flow Bar helpful and enjoyable for finding relevant information in videos. The interaction logs of the user study, where participants interacted with the tool to complete two informational tasks, showed that it holds promise for enhancing discoverability of content both across and within videos. This potential could be leveraged to build a new generation of navigation tools in search and information retrieval.

X5Learn: A Personalised Learning Companion at the Intersection of AI and HCI

26th International Conference on Intelligent User Interfaces, 2021

X5Learn (available at https://x5learn.org) is a human-centered, AI-powered platform for supporting access to free online educational resources. X5Learn provides users with a number of educational tools for interacting with open educational videos, and a set of tools adapted to suit the pedagogical preferences of users. It is intended to support both teachers and students alike. For teachers, it provides a powerful platform to reuse, revise, remix, and redistribute open courseware produced by others. These can be videos, PDFs, exercises and other online material. For students, it provides a scaffolded and informative interface to select content to watch, read, make notes and write reviews, as well as a powerful personalised recommendation system that can optimise learning paths and adjust to the user's learning preferences. What makes X5Learn stand out from other educational platforms is how it combines human-centered design with AI algorithms and software tools with the goal of making it intuitive and easy to use, as well as making the AI transparent to the user. We present the core search tool of X5Learn, intended to support exploring open educational materials.

Report on the WSDM 2020 workshop on state-based user modelling (SUM'20)

ACM SIGIR Forum, 2020

The SUM'20 workshop was held at the 13th ACM International WSDM Conference on Web Search and Data Mining (WSDM 2020) in Houston, Texas. The purpose of the workshop was to stimulate the research community to explore open challenges in building systems that can capture the user's state, context and goals, and that can use these effectively to power intelligent user-centric systems in a wide range of applications. The workshop incorporated different plenary sessions and contributed talks. The workshop website and proceedings are available at https://www.k4all.org/event/wsdmsum20.