Michel Desmarais - Academia.edu

Papers by Michel Desmarais

arXiv (Cornell University), Feb 10, 2023

Context: Teachers and students are increasingly relying on online learning resources to supplement the ones provided in school. This increase in the breadth and depth of available resources is a great thing for students, but only if they are able to find answers to their queries. Question-answering and information retrieval systems ease the task of finding the relevant resources, and public datasets of labeled data are essential to train and evaluate their algorithms, but most of these datasets are English text written by and for adults. Objectives: We introduce a new public French question-answering dataset collected from Alloprof, a Quebec-based primary and high-school help website, containing 29 349 questions and their explanations in a variety of school subjects from 10 368 students, with more than half of the explanations containing links to other questions or some of the 2 596 reference pages on the website. We also present a case study using this dataset for an information retrieval task. Method: This dataset was collected on the Alloprof public forum, with all questions verified for their appropriateness and the explanations verified both for their appropriateness and their relevance to the question. To predict relevant documents (questions or reference pages), architectures using pre-trained BERT models were fine-tuned and evaluated on their ability to predict at least one document referred to in the explanation from their top-3 predictions for a question, as well as on the MRR and nDCG metrics. Results: The best model obtains a prediction score of 58.5% (MRR: 0.54, nDCG: 0.62). These results are substantially better than a TF-IDF vector space model (prediction score: 24%, MRR: 0.20, nDCG: 0.32), but their computational cost is also greater. The fastest model obtains a score of 51.2% (MRR: 0.467, nDCG: 0.560) with an inference time of 0.7 s, which is still considered acceptable in the context of Alloprof. Conclusions: This dataset will allow researchers to develop question-answering, information retrieval and other algorithms specifically for the French-speaking education context. Furthermore, the range of language proficiency, images, mathematical symbols and spelling mistakes will necessitate algorithms based on a multimodal comprehension. The case study we present as a baseline shows that an approach relying on recent techniques provides an acceptable level of performance, but more work is necessary before it can reliably be used and trusted in a production setting.
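To make the evaluation measures named in this abstract concrete, here is a minimal Python sketch of a top-3 hit check, reciprocal rank, and nDCG for a single question. It assumes binary relevance (a document counts as relevant if the explanation links to it). The abstract does not give the exact formulas used, and the document IDs below are made up for illustration.

```python
import math

def top_k_hit(ranked, relevant, k=3):
    """True if at least one relevant document is among the first k predictions."""
    return any(doc in relevant for doc in ranked[:k])

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg(ranked, relevant):
    """nDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked, start=1) if doc in relevant)
    ideal_hits = min(len(relevant), len(ranked))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking for one question; "d1" and "d2" are the linked documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(top_k_hit(ranked, relevant))
print(reciprocal_rank(ranked, relevant))
print(round(ndcg(ranked, relevant), 3))
```

Averaging `top_k_hit` over all questions would give a prediction score of the kind reported above, and averaging `reciprocal_rank` would give the MRR.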

arXiv (Cornell University), Jul 11, 2022

Accurate assessment of the domain expertise of developers is important for assigning the proper candidate to contribute to a project, or to fill a job role. Since potential candidates can come from a large pool, the automated assessment of this domain expertise is a desirable goal. While previous methods have had some success within a single software project, the assessment of a developer's domain expertise from contributions across multiple projects is more challenging. In this paper, we employ doc2vec to represent the domain expertise of developers as embedding vectors. These vectors are derived from different sources that contain evidence of developers' expertise, such as the description of repositories that they contributed to, their issue resolving history, and API calls in their commits. We name it dev2vec and demonstrate its effectiveness in representing the technical specialization of developers. Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods and improves the F1 score by up to 21%. Moreover, our findings suggest that the "issue resolving history" of developers is the most informative source of information to represent the domain expertise of developers in embedding spaces.
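As an illustration of how expertise embeddings might be used once they exist, the sketch below ranks developers by cosine similarity to a query vector. This is an assumption for illustration only: the abstract does not say how dev2vec vectors are compared, and the three-dimensional vectors and developer names here are toy stand-ins for doc2vec output.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_developers(query_vec, dev_vecs):
    """Return developer names sorted from most to least similar to the query vector."""
    return sorted(dev_vecs,
                  key=lambda name: cosine(query_vec, dev_vecs[name]),
                  reverse=True)

# Hypothetical embeddings; real ones would come from a trained doc2vec model.
dev_vecs = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.5, 0.5, 0.0],
}
query = [1.0, 0.0, 0.0]  # stand-in for the embedding of a target domain

print(rank_developers(query, dev_vecs))
```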

Adaptive and Adaptable Learning, 2016

There are numerous algorithms and tools to help an expert map exercises and tasks to underlying skills. The last decade has witnessed a wealth of data-driven approaches aiming to refine expert-defined mappings of tasks to skills. This refinement can be seen as a classification problem: for each possible mapping of task to skill, the classifier has to decide whether the expert's advice is correct or incorrect. Whereas most algorithms work at the level of individual mappings, we introduce an approach based on a multi-label classification algorithm that is trained on the mapping of a task to all skills simultaneously. The approach is shown to outperform the existing task-to-skill mapping refinement techniques.
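The difference between the two framings described in this abstract can be sketched with a small task-to-skill matrix. The sketch below only builds the training targets under each framing and lists the cells where a refined matrix disagrees with the expert. It does not train a classifier, since the abstract does not name the algorithm, and the tasks, skills, and matrix values are hypothetical.

```python
# Hypothetical expert-defined mapping: one row per task, one column per skill.
SKILLS = ["fractions", "algebra", "geometry"]
expert_q = {
    "task1": [1, 0, 0],
    "task2": [1, 1, 0],
    "task3": [0, 0, 1],
}

def per_cell_examples(q):
    """Individual-mapping framing: one binary example per (task, skill) cell."""
    return [((task, skill), row[j])
            for task, row in q.items()
            for j, skill in enumerate(SKILLS)]

def multi_label_examples(q):
    """Multi-label framing: one example per task, labeled with its full skill vector."""
    return [(task, tuple(row)) for task, row in q.items()]

def disagreements(expert, refined):
    """Cells where a refined matrix differs from the expert's mapping."""
    return [(task, SKILLS[j], expert[task][j], refined[task][j])
            for task in expert
            for j in range(len(SKILLS))
            if expert[task][j] != refined[task][j]]

# A made-up refinement that drops the algebra requirement from task2.
refined_q = {"task1": [1, 0, 0], "task2": [1, 0, 0], "task3": [0, 0, 1]}

print(len(per_cell_examples(expert_q)), len(multi_label_examples(expert_q)))
print(disagreements(expert_q, refined_q))
```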

The JEDM editor and associate editors express their sincere gratitude and thank the editorial board and colleagues who devoted their time and effort to reviewing for JEDM in 2013.

Submissions were assigned to 1 TC member and received at least 3 reviews. After the initial reviews were submitted, the designated TC member facilitated discussion amongst reviewers in order to resolve differences and correct misunderstandings. The TC member then provided a recommendation to the Program Chairs. The final decisions were based on these recommendations, the meta-reviews, and reviewer scores. A total of 131 submissions were reviewed. Out of 80 regular paper submissions, 29 were accepted (36% acceptance rate); out of 51 short paper submissions, 11 were accepted (22% acceptance rate). This year, we did not invite regular papers to be published as short papers, but instead invited them to either be included in the main proceedings as extended abstracts, or be published in the adjunct proceedings Late Breaking Results track (LBR). Six of them were published in the LBR track, and a total of 27 extended abstracts are published in the main proceedings. The program also features 3 demos, 3 theory, opinion and reflection papers and 14 late breaking results papers presented in the UMAP poster session, which collectively showcase the wide spectrum of novel ideas and latest results in user modeling, adaptation and personalization. We also invited three distinguished keynote speakers, each illustrating significant issues and prospective directions for the field. Pearl Pu, School of Computer and Communication Sciences at EPFL, describes in her talk the various challenges related to understanding, detecting, and visualizing emotions in large text datasets. Jennifer Golbeck, University of Maryland, focuses on how to consider issues of privacy and consent when users cannot explicitly state their preferences, The Creepy Factor, and how to balance users' concerns with the benefits personalized technology can offer.
Paul De Bra, Eindhoven University of Technology, discusses in his talk "After twenty-five years of user modeling and adaptation what makes us UMAP?" how the field evolved, where it is headed, and the hottest topics for exploration. The conference includes a doctoral consortium that provides an opportunity for doctoral students to explore and develop their research interests under the guidance of distinguished scholars. This track received 15 submissions, of which seven were accepted as full papers and six as posters. A set of 8 workshops rounded off the program: EdRecSys: Educational Recommender Systems, organized by Kurt Driessens (University of Maastricht, The Netherlands), Irena Koprinska (University of Sydney, Australia), Olga C. Santos (Spanish National University for Distance Education, Spain), Evgueni Smirnov (University of Maastricht, The Netherlands), Kalina Yacef (University of Sydney, Australia), Osmar Zaiane (University of Alberta, Canada); EvalUMAP: Towards Comparative Evaluation in User Modeling, Adaptation and Personalization, organized by Owen Conlan, Liadh Kelly, Kevin Koidl, Seamus Lawless, Athanasios Staikopoulos (Trinity College Dublin, Ireland); HAAPIE: Human Aspects in Adaptive and Personalized Interactive Environments, organized by Panagiotis Germanakos (SAP SE, Germany), Styliani Kleanthous-Loizou (University of Cyprus, Cyprus), George Samaras (Department of Computer Science, University of Cyprus), Vania Dimitrova (University of Leeds, UK), Ben Steichen (Santa Clara University, USA); PALE: Personalization Approaches in Learning Environments, organized by Milos Kravcik (RWTH Aachen University, Germany), Olga C. Santos (UNED, Spain), Jesus G. Boticario (UNED, Spain), Maria Bielikova (FIIT STUBA, Slovakia), Tomas Horvath (Eotvos Lorand University, Budapest, Hungary); PATCH: Personalized Access to Cultural Heritage, organized by Liliana Ardissono (University of Torino, Italy), Cristina Gena (University of Torino, Italy), Tsvi Kuflik (University of Haifa, Israel); SOAP: Surprise, Opposition, and Obstruction in Adaptive and Personalized Systems, organized by Peter Knees (Johannes Kepler University Linz, Austria), Kristina Andersen (Studio for Electro Instrumental Music, Amsterdam, the Netherlands), Alan Said (Recorded Future, Gothenburg, Sweden), and Marko Tkalcic (Free University of Bozen-Bolzano, Italy); THUM: Temporal and Holistic User Modeling, organized by Cataldo Musto (University of Bari Aldo Moro, Italy), Amon Rapp (University of Torino, Italy), Federica Cena (University of Torino, Italy), Frank Hopfgartner (University of Glasgow), Judy Kay (University of Sydney, Australia), Giovanni Semeraro (University of Bari Aldo Moro, Italy), Veronika Bogina (University of Haifa, Israel), David Konopnicki (IBM Research, Haifa, Israel), Tsvi Kuflik (University of Haifa, Israel), Bamshad Mobasher (DePaul University, Chicago, USA); WPPG: Fifty Shades of Personalization, Workshop on Personalization in Serious and Persuasive Games and Gameful Interactions, organized by Elke Mattheiss (Austrian Institute of Technology), Marc Busch (Austrian Institute of Technology), Rita Orji (University of Waterloo,…

Online Peer Instruction has become prevalent in many “flipped classroom” settings, yet little work has been done to examine the content students generate in such a learning environment. This study characterizes a dataset generated by an open-source, web-based homework system that prompts students to first answer questions, and then provide explanations of their reasoning. Of particular interest in this dataset is that students are also prompted to evaluate a subset of peer explanations based on how convincing they are, as part of the Peer Instruction learning script. Since these student “votes” are then used in the selection of what is shown to future learners, we cast this as an instance of learnersourcing, a paradigm that presents new research opportunities for the Learning Analytics community. This study characterizes a dataset from one Peer Instruction tool that includes not only the student-generated answers and explanations, but also this novel “vote” attribute, which aims to captur...

The EDM Conference was held in Raleigh this year, from June 29 to July 2, and for the second time it held a Journal track, edited this year by Kalina Yacef. The Journal track allows papers submitted to JEDM to be presented at the conference. A summary is available in the proceedings, and the full text is published in the Journal.

Recent interest in online education, such as Massive Open Online Courses and intelligent tutoring systems, promises large amounts of data from students solving items at different levels of proficiency over time. Existing approaches for inferring students’ knowledge from data require a cognitive model – a mapping between the tutor problems and the set of skills they require. This is a very expensive requirement, since it depends on expert domain knowledge. The success of previous methods in using student performance data to construct this mapping automatically has been limited in that they cannot handle data collected over time, or that they require expensive expert domain knowledge. This dissertation studies how to model students’ time-varying knowledge, without requiring expert domain knowledge. We introduce four novel methods: • Dynamic Cognitive Tracing: an easily implemented prototype that jointly estimates cognitive and student models. • Automatic Knowledge Tracing: a method ...

New Rev. Hypermedia Multim., 2018

ACM UMAP is an annual conference on user modeling, adaptation and personalization. User modeling concerns the process of understanding the user’s needs, preferences, interests, knowledge and other aspects. This is achieved by reasoning about and extracting knowledge from user data, which includes both data that is explicitly provided by the user, such as profile data, and implicitly gathered usage data, such as browsing data. Adaptation and personalization techniques exploit the user models in order to better tailor a software system, such as a website, to the user's needs. Recommender systems are the best-known type of personalized system, but the field is much wider and includes, among others, personalized search, adaptive user interfaces, personalized advice, and personalized technology-enhanced learning. This special issue contains extended versions of selected papers from UMAP 2017, the 25th edition of the conference series. The conference was hosted in Bratislava, Slovakia, from 9 t...

The EDM Conference was held in Madrid, Spain, this year, from June 26 to June 29, and it included for the first time a Journal track, which was edited by Michel Desmarais and Mykola Pechenizkiy. The Journal track allows papers submitted to JEDM to be presented at the conference. A summary is available in the proceedings, and the full text is published in the Journal.

Procedia Computer Science, 2021

Addressing Global Challenges and Quality Education, 2020

This paper presents the results of a study carried out as part of the design-based development of an online self-assessment for prospective students in higher online education. The self-assessment consists of a set of tests – predictive of completion – and is meant to improve informed decision making prior to enrolment. The rationale is that better decision making will help to address the ongoing concern of non-completion in higher online education. A prototypical design of the self-assessment was created based on an extensive literature review and correlational research, aimed at investigating validity evidence concerning the predictive value of the tests. The present study focused on investigating validity evidence regarding the content of the self-assessment (including the feedback it provides) from a user perspective. Results from a survey among prospective students (N = 66) indicated that predictive validity and content validity of the self-assessment are somewhat at odds: ...

Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, 2017

Fondements méthodologiques et empiriques d'un système consultant actif pour l'édition de texte

2014 International Conference on Data Science and Advanced Analytics (DSAA), 2014

A general Bayesian network classifier (GBNC) contains only the features necessary for classification, so an ideal structure learning solution is to learn the GBNC without having to learn the whole Bayesian network (BN). A local-search-based algorithm called LAS-GBNC is proposed. Given the faithfulness assumption, LAS-GBNC relies on the information about each variable's appearance in the so-called d-separator (cut set) to sort candidate conditional independence (CI) tests dynamically, performing 'effective' ones with priority. Experimental studies indicate that (1) LAS-GBNC achieves the same quality of networks as PC and IPC-BNC, (2) it is much more efficient than PC due to its local search design, and (3) it is clearly faster than IPC-BNC because of its adaptive search strategy.

Lecture Notes in Computer Science

Learning the Markov blanket (MB) can be regarded as an optimal solution to the feature selection problem. In this paper, an efficient and effective framework is suggested for learning the MB. First, we propose a novel algorithm, called Iterative Parent-Child based search of MB (IPC-MB), to induce the MB without having to learn a whole Bayesian network first. It is proved correct, and is demonstrated to be more efficient than the current state of the art, PCMB, by requiring far fewer conditional independence (CI) tests. We show how to incorporate an AD-tree into the implementation so that computational efficiency is further increased through collecting full statistics within a single data pass. We conclude that IPC-MB plus AD-tree appears to be a very attractive solution in very large applications.
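For readers unfamiliar with the target of this search, the Markov blanket of a node in a Bayesian network consists of its parents, its children, and its children's other parents (spouses). The sketch below reads the blanket off a known graph. It is not IPC-MB, which has to recover the blanket from data through CI tests, and the small graph is a made-up example.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}."""
    own_parents = set(parents.get(target, ()))
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses: the other parents of the target's children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return own_parents | children | spouses

# Hypothetical DAG with edges A -> T, T -> C, S -> C, C -> D.
dag = {
    "A": set(),
    "S": set(),
    "T": {"A"},
    "C": {"T", "S"},
    "D": {"C"},
}

print(sorted(markov_blanket(dag, "T")))
```

Here D is excluded from the blanket of T: once C is known, D carries no further information about T, which is why the blanket serves as a minimal feature set for predicting the target.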