Sahan Bulla Bulathwela - Profile on Academia.edu

Papers by Sahan Bulla Bulathwela

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learners' knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty that would help the learner improve their knowledge about the subject and keep them engaged. We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.

Scalable Educational Question Generation with Pre-trained Language Models

Lecture Notes in Computer Science, 2023

arXiv (Cornell University), Dec 3, 2021

Artificial Intelligence (AI) in Education has been said to have the potential for building more personalised curricula, as well as democratising education worldwide and creating a Renaissance of new ways of teaching and learning. Millions of students are already starting to benefit from the use of these technologies, but millions more around the world are not. If this trend continues, the first delivery of AI in Education could be greater educational inequality, along with a global misallocation of educational resources motivated by the current technological determinism narrative. In this paper, we focus on speculating and posing questions around the future of AI in Education, with the aim of starting the pressing conversation that would set the right foundations for the new generation of education that is permeated by technology. This paper starts by synthesising how AI might change how we learn and teach, focusing specifically on the case of personalised learning companions, and then moves on to discuss some socio-technical features that will be crucial for avoiding the perils of these AI systems worldwide (and perhaps ensuring their success). This paper also discusses the potential of using AI together with free, participatory and democratic resources, such as Wikipedia, Open Educational Resources and open-source tools. We also emphasise the need for collectively designing human-centered, transparent, interactive and collaborative AI-based algorithms that empower and give complete agency to stakeholders, as well as support new emerging pedagogies. Finally, we ask what it would take for this educational revolution to provide egalitarian and empowering access to education, beyond any political, cultural, language, geographical and learning ability barriers.

Leveraging Semantic Knowledge Graphs in Educational Recommenders to Address the Cold-Start Problem

CRC Press eBooks, Jun 28, 2023

arXiv (Cornell University), Oct 18, 2022

Extracting useful information from the user history to clearly understand informational needs is a crucial feature of a proactive information retrieval system. Regarding understanding information and relevance, Wikipedia can provide the background knowledge that an intelligent system needs. This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text. We formulate two models that use entity linking to associate Wikipedia topics with the relevance model. Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts and that a ranking model can improve precision by incorporating them. We also find that Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.

arXiv (Cornell University), Dec 8, 2021

In informational recommenders, many challenges arise from the need to handle the semantic and hierarchical structure between knowledge areas. This work aims to advance towards building a state-aware educational recommendation system that incorporates semantic relatedness between knowledge topics, propagating latent information across semantically related topics. We introduce a novel learner model that exploits this semantic relatedness between knowledge components in learning resources using the Wikipedia link graph, with the aim of better predicting learner engagement and latent knowledge in a lifelong learning scenario. In this sense, Semantic TrueLearn builds a humanly intuitive knowledge representation while leveraging Bayesian machine learning to improve the predictive performance of educational engagement. Our experiments with a large dataset demonstrate that this new semantic version of the TrueLearn algorithm achieves statistically significant improvements in terms of predictive performance with a simple extension that adds semantic awareness to the model.

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model. Related Work. Contrary to conventional recommender systems, educational recommenders face different challenges that stem from attempting to bring learners closer to their goals effectively.

arXiv (Cornell University), Nov 21, 2019

arXiv (Cornell University), Sep 3, 2021

In this work, we release a large and novel dataset of learners engaging with educational videos in-the-wild. The dataset, named Personalised Educational Engagement with Knowledge Topics (PEEK), is one of the first publicly available datasets that address personalised educational engagement. Educational recommenders have received much less attention in comparison to e-commerce and entertainment-related recommenders, even though efficient personalised learning systems could improve learning gains significantly. One of the main challenges in advancing this research direction is the scarcity of large, publicly available datasets. In the PEEK dataset, educational video lectures have been associated with Wikipedia concepts related to the material of the lecture, thus providing a humanly intuitive taxonomy. We believe that granular learner engagement signals, in unison with rich content representations, will pave the way to building powerful personalisation algorithms that will revolutionise educational and informational recommendation systems. Towards this goal, we 1) construct a novel dataset from a popular video lecture repository, 2) identify a set of benchmark algorithms to model engagement, and 3) run extensive experimentation on the PEEK dataset to demonstrate its value. Our experiments with the dataset show promise in building powerful informational recommender systems. The dataset and the support code are available at https://github.com/sahanbull/PEEK-Dataset.

arXiv (Cornell University), Nov 2, 2020

With the emergence of e-learning and personalised education, the production and distribution of digital educational resources have boomed. Video lectures have now become one of the primary modalities to impart knowledge to masses in the current digital age. The rapid creation of video lecture content challenges the currently established human-centred moderation and quality assurance pipeline, demanding more efficient, scalable and automatic solutions for managing learning resources. Although a few datasets related to engagement with educational videos exist, there is still an important need for data and research aimed at understanding learner engagement with scientific video lectures. This paper introduces VLEngagement, a novel dataset that consists of content-based and video-specific features extracted from publicly available scientific video lectures and several metrics related to user engagement. We introduce several novel tasks related to predicting and understanding context-agnostic engagement in video lectures, providing preliminary baselines. This is the largest and most diverse publicly available dataset to our knowledge that deals with such tasks. The extraction of Wikipedia topic-based features also allows associating more sophisticated Wikipedia-based features to the dataset to improve the performance in these tasks. The dataset, helper tools and example code snippets are available publicly at https://github.com/sahanbull/context-agnostic-engagement.

arXiv (Cornell University), May 31, 2020

The explosion of Open Educational Resources (OERs) in recent years creates the demand for scalable, automatic approaches to process and evaluate OERs, with the end goal of identifying and recommending the most suitable educational materials for learners. We focus on building models to find the characteristics and features involved in context-agnostic engagement (i.e. population-based), a seldom researched topic compared to other contextualised and personalised approaches that focus more on individual learner engagement. Learner engagement is arguably a more reliable measure than popularity/number of views, is more abundant than user ratings and has also been shown to be a crucial component in achieving learning outcomes. In this work, we explore the idea of building a predictive model for population-based engagement in education. We introduce a novel, large dataset of video lectures for predicting context-agnostic engagement and propose both cross-modal and modality-specific feature sets to achieve this task. We further test different strategies for quantifying learner engagement signals. We demonstrate the use of our approach in the case of data scarcity. Additionally, we perform a sensitivity analysis of the best performing model, which shows promising performance and can be easily integrated into an educational recommender system for OERs.

arXiv (Cornell University), Dec 3, 2019

One of the most ambitious use cases of computer-assisted learning is to build a recommendation system for lifelong learning. Most recommender algorithms exploit similarities between content and users, overlooking the necessity to leverage sensible learning trajectories for the learner. Lifelong learning thus presents unique challenges, requiring scalable and transparent models that can account for learner knowledge and content novelty simultaneously, while also retaining accurate learner representations for long periods of time. We attempt to build a novel educational recommender that relies on an integrative approach combining multiple drivers of learner engagement. Our first step towards this goal is TrueLearn, which models content novelty and background knowledge of learners and achieves promising performance while retaining a human-interpretable learner model. Related Work. Conventional recommendation systems that exist today mainly focus on exploiting user interests. On the contrary, educational recommenders face different challenges as a

Sustainability, Sep 17, 2022

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

arXiv (Cornell University), May 13, 2023

The automatic generation of educational questions will play a key role in scaling online education, enabling self-assessment at scale when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our extensive experiments demonstrate that EduQG can produce superior educational questions by further pre-training and fine-tuning a pre-trained language model on scientific text and science question data.

arXiv (Cornell University), Dec 7, 2022

With the boom of digital educational materials and scalable e-learning systems, the potential for realising AI-assisted personalised learning has skyrocketed. In this landscape, the automatic generation of educational questions will play a key role, enabling scalable self-assessment when a global population is manoeuvring their personalised learning journeys. We develop EduQG, a novel educational question generation model built by adapting a large language model. Our initial experiments demonstrate that EduQG can produce superior educational questions by pre-training on scientific text.

arXiv (Cornell University), Nov 16, 2021

Throughout 2020 the IRCAI programme committees authored an Opinion Series on a range of topics across AI and sustainability as part of a wider portfolio of IRCAI's international initiatives. This opinion series explores what AI currently means to researchers across the world and across a variety of disciplines both in AI and sustainable development, and was led by the IRCAI executive. The views and opinions expressed by authors are their own and do not reflect the position of IRCAI, but are simply an illustration of the various opinions reflective of the diverse initiatives that IRCAI is pursuing, from basic research to policy creation in and around AI. As IRCAI is working from home for the foreseeable future, we are not organising in-person events. Introducing this new quick report format has the same spirit (original, stripped-down research pieces, an attempt at an intellectual dialogue), just a different space.

arXiv (Cornell University), Jun 22, 2022

This work explores how population-based engagement prediction can address cold-start at scale in large learning resource collections. The paper introduces i) VLE, a novel dataset that consists of content and video based features extracted from publicly available scientific video lectures coupled with implicit and explicit signals related to learner engagement, ii) two standard tasks related to predicting and ranking context-agnostic engagement in video lectures with preliminary baselines and iii) a set of experiments that validate the usefulness of the proposed dataset. Our experimental results indicate that the newly proposed VLE dataset leads to building context-agnostic engagement prediction models that are significantly more performant than ones based on previous datasets, mainly owing to the increase in training examples. The VLE dataset's suitability for building models for Computer Science/Artificial Intelligence education focused on e-learning/MOOC use-cases is also evidenced. Further experiments in combining the built model with a personalising algorithm show promising improvements in addressing the cold-start problem encountered in educational recommenders. This is the largest and most diverse publicly available dataset to our knowledge that deals with learner engagement prediction tasks. The dataset, helper tools, descriptive statistics and example code snippets are available publicly.

Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, 2020

(This is the accepted manuscript). This paper introduces an interface that enables the user to quickly identify relevant fragments within multiple long documents. The proposed method relies on a machine-generated layer of annotations that reveals the coverage of topics per fragment and document. To illustrate how the annotations double as a tool for preview as well as navigation, an example application is presented in the form of a personalised learning system that recommends relevant fragments of video lectures according to the user's history. Potential implications of this approach for lifelong learning are discussed. We argue that this approach is generally applicable to recommender and information retrieval systems, across multiple knowledge domains and document types.

ACM SIGIR Conference on Human Information Interaction and Retrieval, 2022

Prior research has shown how 'content preview tools' improve speed and accuracy of user relevance judgements across different information retrieval tasks. This paper describes a novel user interface tool, the Content Flow Bar, designed to allow users to quickly identify relevant fragments within informational videos to facilitate browsing, through a cognitively augmented form of navigation. It achieves this by providing semantic "snippets" that enable the user to rapidly scan through video content. The tool provides visually appealing pop-ups that appear in a time series bar at the bottom of each video, allowing users to see in advance and at a glance how topics evolve in the content. We conducted a user study to evaluate how the tool changes the user's search experience in video retrieval, as well as how it supports exploration and information seeking. The user questionnaire revealed that participants found the Content Flow Bar helpful and enjoyable for finding relevant information in videos. The interaction logs of the user study, where participants interacted with the tool for completing two informational tasks, showed that it holds promise for enhancing discoverability of content both across and within videos. This discovered potential could drive a new generation of navigation tools in search and information retrieval.
