Sebastián Ventura - Profile on Academia.edu

Papers by Sebastián Ventura

Educational Data Mining, Jul 1, 2009

EDM brings together researchers from computer science, education, psychology, psychometrics, and statistics to analyze large data sets in order to answer educational research questions. The increase in instrumented educational software and in databases of student test scores has created large repositories of data reflecting how students learn. The EDM conference focuses on computational approaches for using those data to address important educational questions. The broad collection of research disciplines ensures cross-fertilization of ideas, with the central questions of educational research serving as a unifying focus. We received a total of 54 submissions from 24 countries. Each submission was reviewed by three reviewers, and 20 were accepted as full papers (37.03% acceptance rate). Another 13 submissions were accepted as posters or student papers. All papers will appear both on the web, at www.educationaldatamining.org, and in the printed proceedings. The conference also included invited talks by Professor Arthur C. Graesser from the University of Memphis and by Professor Bamshad Mobasher from DePaul University. We would like to thank the Universidad de Córdoba, the Escuela Universitaria Politécnica, the Junta de Andalucía and the Ministerio de Ciencia e Innovación for their generous sponsorship of EDM2009. We would also like to thank the program committee members, the local committee, the web chair, the reviewers and the invited speakers for their enthusiastic help in putting this conference together.

Discovering clues to avoid middle school failure at early stages

ABSTRACT The use of data mining techniques in educational domains helps to uncover new knowledge about how students learn and how to improve resource management. Using these techniques to predict school failure is very useful for implementing corrective actions. With this purpose, we try to determine the earliest stage at which the quality of the results allows the possibility of school failure to be identified. We process real data from a Spanish high school, structuring them into incremental datasets that represent how students' academic records grow. Our study shows early and robust detection of the cases at risk of school failure at the end of the first of the four courses.
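The incremental-dataset idea can be sketched as follows. This is a minimal illustration under assumptions, not the authors' pipeline:

- Each student record is assumed to be a dict mapping a stage index (0 = first course) to that stage's list of grades.
- Every student is assumed to have data for every stage.
- The hypothetical helper `build_incremental_datasets` concatenates all features available up to each stage. Dataset k therefore contains only information known at the end of stage k.

```python
from typing import Dict, List

# Hypothetical record layout: stage index (0 = first course) -> that stage's grades.
StudentRecord = Dict[int, List[float]]


def build_incremental_datasets(records: List[StudentRecord], n_stages: int) -> List[List[List[float]]]:
    """Return one feature matrix per stage; dataset k only uses stages 0..k."""
    datasets = []
    for k in range(n_stages):
        rows = []
        for record in records:
            row: List[float] = []
            for stage in range(k + 1):
                row.extend(record[stage])  # assumes every student has every stage
            rows.append(row)
        datasets.append(rows)
    return datasets
```

Each of these datasets would then be passed to the same classifier. The earliest stage whose results are acceptable marks the earliest point at which corrective actions could be taken.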

This work presents the application of subgroup discovery techniques to e-learning data from the learning management systems (LMS) of Andalusian universities. The objective is to extract rules describing relationships between the use of the different activities and modules available in the e-learning platform and the final mark obtained by the students. For this purpose, the results of different classical and evolutionary subgroup discovery algorithms are compared, showing that the evolutionary algorithms are well suited to this problem. Some of the rules obtained are analyzed with the aim of extracting knowledge that allows teachers to take action to improve their students' performance.
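Subgroup discovery algorithms rank candidate rules with a quality measure, but the abstract does not say which measures were used. As an assumption, the sketch below uses weighted relative accuracy, a common choice in the subgroup discovery literature:

WRAcc(Cond -> Class) = p(Cond) * (p(Class | Cond) - p(Class))

The field names `quiz_attempts` and `mark` are hypothetical LMS attributes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # one student: hypothetical LMS usage fields plus final mark


def wracc(rows: List[Row], cond: Callable[[Row], bool], target: Callable[[Row], bool]) -> float:
    """Weighted relative accuracy of the rule cond -> target over rows."""
    n = len(rows)
    covered = [r for r in rows if cond(r)]
    if n == 0 or not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for r in rows if target(r)) / n
    p_target_given_cond = sum(1 for r in covered if target(r)) / len(covered)
    # Positive when the subgroup has a higher share of the target than the whole population.
    return p_cond * (p_target_given_cond - p_target)
```

Consider the rule "quiz_attempts >= 5 -> mark = pass" on four students, two of whom pass with many attempts. It covers half the population with a pass rate of 1.0 against a base rate of 0.75, giving 0.5 * 0.25 = 0.125.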

Advances in Engineering Software, Aug 1, 2011

Nowadays, a great number of both specific and general data mining tools are available to carry out association rule mining. However, obtaining only the most interesting and useful rules for a given problem and dataset requires using several of these tools. To overcome this drawback, this paper describes a fully integrated framework to help in the discovery and evaluation of association rules. Using this tool, any data mining user can easily discover, filter, visualize, evaluate and compare rules by following the practical guided process described in this paper. The paper also explains the results obtained on a sample public dataset.
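The framework itself is not described in the abstract. As a stand-in, the sketch below shows the discover-filter-evaluate loop that such a tool automates, using the standard support, confidence and lift measures. Two simplifying assumptions apply, so it suits only small datasets:

- Brute-force enumeration is used instead of Apriori or FP-Growth.
- The threshold values are arbitrary.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Transaction = Set[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float, float]  # (X, Y, support, confidence, lift)


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def discover_and_filter(transactions: List[Transaction], min_support: float = 0.3,
                        min_conf: float = 0.6, min_lift: float = 1.0,
                        max_size: int = 3) -> List[Rule]:
    """Brute-force rule discovery followed by support/confidence/lift filtering."""
    items = sorted(set().union(*transactions))
    kept: List[Rule] = []
    for size in range(2, max_size + 1):
        for itemset in map(frozenset, combinations(items, size)):
            sup = support(itemset, transactions)
            if sup == 0 or sup < min_support:
                continue
            # Split the itemset into every antecedent X and consequent Y = itemset - X.
            for r in range(1, size):
                for ante in map(frozenset, combinations(sorted(itemset), r)):
                    cons = itemset - ante
                    conf = sup / support(ante, transactions)
                    lift = conf / support(cons, transactions)
                    if conf >= min_conf and lift > min_lift:
                        kept.append((ante, cons, sup, conf, lift))
    # Compare rules: strongest lift first.
    return sorted(kept, key=lambda rule: rule[4], reverse=True)
```

The lift filter discards rules whose confidence is explained by the consequent's overall frequency. In the usual toy basket data this removes "bread -> milk" when milk appears in most transactions anyway.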

Association Rule Mining with Grammar-Guided Genetic Programming

In this work we present a grammar-guided genetic programming algorithm, G3P (Grammar Guided Genetic Programming), for extracting association rules from datasets. To this end, we propose two versions for the extraction of association rules: ...
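In G3P each individual is a derivation of a context-free grammar, so every rule the algorithm builds is syntactically valid by construction. The grammar below is a hypothetical toy for a two-attribute dataset; the actual grammar is not given in the abstract. The sketch only shows how a random individual is derived; the genetic operators and fitness evaluation are not included.

```python
import random
from typing import Dict, List

# Hypothetical grammar. By convention the first production of each
# nonterminal is non-recursive, so derivations can always terminate.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<rule>": [["IF", "<antecedent>", "THEN", "<consequent>"]],
    "<antecedent>": [["<condition>"], ["<condition>", "AND", "<antecedent>"]],
    "<consequent>": [["<condition>"]],
    "<condition>": [["outlook", "=", "<outlook>"], ["windy", "=", "<windy>"]],
    "<outlook>": [["sunny"], ["rainy"]],
    "<windy>": [["true"], ["false"]],
}


def derive(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 6) -> List[str]:
    """Randomly expand `symbol` into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal symbol
    options = GRAMMAR[symbol]
    # Past max_depth, take the first (non-recursive) production to stop growth.
    production = options[0] if depth >= max_depth else rng.choice(options)
    tokens: List[str] = []
    for sym in production:
        tokens.extend(derive(sym, rng, depth + 1, max_depth))
    return tokens
```

For example, `" ".join(derive("<rule>", random.Random(0)))` yields a string of the form "IF ... THEN ...". The exact rule depends on the random seed.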

Modeling and predicting students’ engagement behaviors using mixture Markov models

Knowledge and Information Systems

Students' engagement reflects their level of involvement in an ongoing learning process, which can be estimated through their interactions with a computer-based learning or assessment system. A prerequisite for stimulating student engagement is the ability to build an approximate model that represents students' varied (dis)engagement behaviors. In this paper, we used model-based clustering for this purpose, generating a mixture of K Markov models to group students' traces containing their (dis)engagement behavioral patterns. To prevent the Expectation-Maximization (EM) algorithm from getting stuck in a local maximum, we also introduced a new initialization method named K-EM. The proposed method initializes the EM algorithm using the results of a preliminary K-means clustering performed on students' logged problem-solving actions. We performed experiments on two real datasets using three variants of the EM algorithm (the original EM, emEM, and K-EM) as well as non-mixture baseline models. The proposed K-EM method showed very promising results and a significant performance difference compared with the other approaches, particularly on Dataset1 (which contains shorter traces than Dataset2). Hence, we suggest performing...
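The K-EM initialization can be sketched with a small EM for a mixture of first-order Markov chains, seeded by a K-means clustering. Several choices here are assumptions, not the paper's exact method:

- Traces are lists of integer states.
- K-means runs on each trace's action-frequency histogram, seeded with the first k traces.
- Transition counts use add-one smoothing.

The names `kmeans_labels`, `fit_markov` and `k_em` are hypothetical.

```python
import math
from typing import List, Tuple

Seq = List[int]  # a trace of discrete (dis)engagement states 0..n_states-1
Model = Tuple[List[float], List[List[float]]]  # (initial-state probs, transition matrix)


def kmeans_labels(points: List[List[float]], k: int, n_iter: int = 10) -> List[int]:
    """Plain K-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_markov(seqs: List[Seq], weights: List[float], n_states: int, alpha: float = 1.0) -> Model:
    """Weighted maximum-likelihood Markov chain with add-alpha smoothing."""
    start = [alpha] * n_states
    trans = [[alpha] * n_states for _ in range(n_states)]
    for seq, w in zip(seqs, weights):
        start[seq[0]] += w
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += w
    total = sum(start)
    return [s / total for s in start], [[x / sum(row) for x in row] for row in trans]


def log_lik(seq: Seq, model: Model) -> float:
    start, trans = model
    return math.log(start[seq[0]]) + sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def k_em(seqs: List[Seq], k: int, n_states: int, n_iter: int = 20, eps: float = 1e-6):
    # Initialization: K-means on per-trace action histograms gives hard responsibilities.
    features = [[seq.count(s) / len(seq) for s in range(n_states)] for seq in seqs]
    labels = kmeans_labels(features, k)
    resp = [[1.0 if lab == j else 0.0 for j in range(k)] for lab in labels]
    for _ in range(n_iter):
        # M-step: smoothed mixture weights and one Markov chain per component.
        weights = [(sum(r[j] for r in resp) + eps) / (len(seqs) + k * eps) for j in range(k)]
        models = [fit_markov(seqs, [r[j] for r in resp], n_states) for j in range(k)]
        # E-step: posterior responsibilities, computed stably in log space.
        resp = []
        for seq in seqs:
            logs = [math.log(weights[j]) + log_lik(seq, models[j]) for j in range(k)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            z = sum(probs)
            resp.append([p / z for p in probs])
    return weights, models, resp
```

Starting EM from a reasonable hard partition, rather than random responsibilities, is what makes the result less sensitive to poor local maxima. This is the motivation the abstract gives for K-EM.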

Information Processing & Management, 2020

A database about colorectal cancer, with 1516 patients and 126 attributes, has been fed for more than 10 years. Finding useful knowledge in it has proven to be a difficult endeavor. We present four heuristic operators and a complete methodology for searching for interesting rules that describe cases with complications and recurrences. Our proposal has shown some advantages over the well-known Apriori algorithm for class association rule mining and over adaptations of three representative associative classification methods. It has also allowed us to identify rules of practical interest among the vast number of trivial and sporadic associations.

Colorectal cancer is a significant pathology because of its high incidence, morbidity and mortality, around 70 per 100,000 patients. In fact, it is one of the most prevalent cancers in Spain and the second leading cause of cancer-related death in the country. According to the National Statistics Institute, it accounts for 3.5% of all deaths. Its incidence is higher in people between 50 and 70 years old, but it also appears in hereditary contexts in people under 40. Colorectal cancer demands a multidisciplinary approach in which different specialists, such as oncologists, surgeons, radiologists and gastroenterologists, have to work in close coordination to provide the best care to patients and reduce the complication rate. This leads to shorter hospital stays and fewer readmissions. With the aim of auditing and improving the management of this pathology, professionals at the Reina Sofia University Hospital have fed a database with the information of 1516 colorectal cancer patients for more than 10 years; this database is the object of study in this work. Unlike most works that apply knowledge discovery approaches to colorectal cancer, whose data are sets of medical images (...
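Class association rules fix the consequent to a class value, here a complication, and search only over antecedents. The sketch below is a simple stand-in, not the paper's four heuristic operators:

- It enumerates short antecedents exhaustively.
- It drops "sporadic" rules with a minimum support.
- It drops "trivial" rules whose confidence barely exceeds the base rate, using a lift threshold.

The attribute names and thresholds are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Record = Dict[str, str]  # one patient: hypothetical attribute -> value


def class_rules(records: List[Record], target: Tuple[str, str],
                min_support: float = 0.2, min_lift: float = 1.2, max_len: int = 2):
    """Enumerate rules (attr=value AND ...) -> target, dropping sporadic and trivial ones."""
    n = len(records)
    t_attr, t_val = target
    base_rate = sum(1 for r in records if r[t_attr] == t_val) / n
    conditions = sorted({(a, v) for r in records for a, v in r.items() if a != t_attr})
    rules = []
    for size in range(1, max_len + 1):
        for ante in combinations(conditions, size):
            if len({a for a, _ in ante}) < size:
                continue  # skip contradictory antecedents such as stage=I AND stage=II
            covered = [r for r in records if all(r[a] == v for a, v in ante)]
            hits = sum(1 for r in covered if r[t_attr] == t_val)
            support = hits / n
            if not covered or support < min_support:
                continue  # sporadic: too few supporting cases
            confidence = hits / len(covered)
            lift = confidence / base_rate if base_rate else 0.0
            if lift >= min_lift:  # otherwise trivial: barely beats the base rate
                rules.append((ante, support, confidence, lift))
    return sorted(rules, key=lambda rule: -rule[2])  # most confident first
```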

Interactive Learning Environments, 2019

The aim of this paper is to categorize and describe different types of learners in massive open online courses (MOOCs) by means of a subgroup discovery (SD) approach based on MapReduce. The proposed SD approach, an extension of the well-known FP-Growth algorithm, relies on emerging parallel methodologies such as MapReduce to cope with extremely large datasets. As an additional feature, the proposal includes a threshold on the number of courses in which each discovered rule should hold. A post-processing step is also included so that redundant subgroups can be removed. The experimental stage considers de-identified data from the first year of 16 MITx and HarvardX courses on the edX platform. Experimental results demonstrate that the proposed MapReduce approach outperforms traditional sequential SD approaches, achieving a runtime that is almost constant across courses. Additionally, thanks to the final post-processing step, only interesting and non-redundant rules are discovered, reducing the number of subgroups by one or two orders of magnitude. Finally, course instructors can easily use the discovered subgroups not only for descriptive purposes but also for additional tasks such as recommendation or personalization.
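The map/reduce split, the course threshold and the redundancy filter can be simulated in plain Python on a single machine. This sketch makes several assumptions not in the abstract:

- Learners are sets of hypothetical attribute items.
- The map phase uses brute-force counting rather than FP-Growth.
- A subgroup is treated as redundant when a more general subgroup has exactly the same per-course supports.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Set, Tuple

Learner = Set[str]  # hypothetical: a learner described as a set of attribute items
Emitted = Tuple[FrozenSet[str], Tuple[str, float]]


def map_course(course_id: str, learners: List[Learner], min_support: float,
               max_len: int = 2) -> Iterator[Emitted]:
    """Map phase for one course: emit (subgroup, (course, support)) for frequent subgroups."""
    counts: Counter = Counter()
    for learner in learners:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(learner), size):
                counts[frozenset(combo)] += 1
    for itemset, c in counts.items():
        sup = c / len(learners)
        if sup >= min_support:
            yield itemset, (course_id, sup)


def reduce_by_key(pairs: Iterator[Emitted]) -> Dict[FrozenSet[str], Dict[str, float]]:
    """Reduce phase: gather the per-course supports of each subgroup."""
    grouped: Dict[FrozenSet[str], Dict[str, float]] = defaultdict(dict)
    for itemset, (course_id, sup) in pairs:
        grouped[itemset][course_id] = sup
    return grouped


def discover(courses: Dict[str, List[Learner]], min_support: float, min_courses: int):
    pairs = (kv for cid, ls in courses.items() for kv in map_course(cid, ls, min_support))
    grouped = reduce_by_key(pairs)
    # Course threshold: keep subgroups frequent in at least `min_courses` courses.
    kept = {s: sups for s, sups in grouped.items() if len(sups) >= min_courses}
    # Assumed redundancy rule: drop a subgroup when a more general one has exactly
    # the same per-course supports (the extra condition changes nothing).
    return {s: sups for s, sups in kept.items()
            if not any(o < s and kept[o] == sups for o in kept)}
```

In a real MapReduce job each `map_course` call would run on a separate worker. The course count then comes out of the shuffle and reduce stage for free, which is why the runtime can stay nearly flat as courses are added.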

Educational data science in massive open online courses

WIREs Data Mining and Knowledge Discovery, 2016

The current massive open online course (MOOC) euphoria is revolutionizing online education. Despite its expediency, there is considerable skepticism over various issues. To address some of these problems, educational data science (EDS) has been used with success. MOOCs provide a wealth of information about the way in which a large number of learners interact with educational platforms and engage with the courses offered. The extensive data that MOOCs provide on students' usage is a gold mine for EDS. This paper aims to provide the reader with a comprehensive review of the existing literature that helps us understand the application of EDS in MOOCs. The main works in this area are described and grouped by the task or issue to be solved, along with the techniques used. WIREs Data Mining Knowl Discov 2017, 7:e1187. doi: 10.1002/widm.1187

This article is categorized under: Application Areas > Education and Learning

Single and multi-objective ant programming for mining interesting rare association rules

International Journal of Hybrid Intelligent Systems, 2014

Extracting frequent and reliable rules has been the main interest of the association task of data mining. However, the discovery of infrequent or rare rules is attracting a lot of interest in many domains, such as bank fraud, biomedical data and network intrusion. Most existing solutions for discovering reliable rules that rarely appear are based on exhaustive classical approaches. These become infeasible on highly complex datasets, and they do not take into account any measure of the interestingness of the mined rules. This paper explores the application of ant programming, a bio-inspired technique for finding computer programs, to the discovery of rare association rules. To this end, it proposes two algorithms: the first evaluates the generated individuals from a single-objective point of view, and the second considers several objectives simultaneously to evaluate individuals' fitness. Both algorithms are able to find a highly reliable and interesting set of rare rules for the data miner in a short period of time, without the drawbacks of exhaustive algorithms.
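As a rough illustration of the multi-objective evaluation style, the sketch below first keeps only rare rules. It then applies Pareto dominance over two objectives, as a multi-objective algorithm would when ranking individuals.

The maximum-support threshold and the choice of confidence and lift as objectives are assumptions; the abstract does not name the objectives used. Ant programming itself, the pheromone-guided construction of rules, is not shown.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    support: float
    confidence: float
    lift: float


def is_rare(rule: Rule, max_support: float = 0.1) -> bool:
    # Hypothetical rarity criterion: the rule occurs, but in few transactions.
    return 0 < rule.support <= max_support


def dominates(a: Rule, b: Rule) -> bool:
    """a dominates b if it is no worse on every objective and differs on at least one."""
    objs_a, objs_b = (a.confidence, a.lift), (b.confidence, b.lift)
    return all(x >= y for x, y in zip(objs_a, objs_b)) and objs_a != objs_b


def pareto_rare_rules(rules: List[Rule], max_support: float = 0.1) -> List[Rule]:
    """Non-dominated rare rules: the trade-off set offered to the data miner."""
    rare = [r for r in rules if is_rare(r, max_support)]
    return [r for r in rare if not any(dominates(o, r) for o in rare)]
```

A single-objective variant would instead collapse the objectives into one weighted score. That forces the analyst to choose the weights in advance, whereas the Pareto set leaves the trade-off open.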

Educational Data Mining - EDM 2009, Cordoba, Spain, July 1-3, 2009. Proceedings of the 2nd International Conference on Educational Data Mining

Web Usage Mining for Predicting Final Marks of MOODLE Students

Computer Applications in Engineering Education, 2010

Expert Systems, 2015

Early prediction of school dropout is a serious problem in education, but it is not an easy issue to resolve. On the one hand, many factors can influence student retention. On the other hand, the traditional classification approach used to solve this problem normally has to be applied at the end of the course, when the maximum information has been gathered, in order to achieve the highest accuracy. In this paper, we propose a methodology and a specific classification algorithm to discover comprehensible prediction models of student dropout as early as possible. We used data gathered from 419 high school students in Mexico. We carried out several experiments to predict dropout at different steps of the course, to select the best indicators of dropout and to compare our proposed algorithm with some well-known classical and imbalanced classification algorithms. Results show that our algorithm was capable of predicting student dropout within the first 4–6 weeks of the course and trust...
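Because dropouts are usually a small minority, plain accuracy can look high even when few dropouts are detected. The abstract does not name its evaluation measures. As an assumption, the sketch below computes per-class rates and their geometric mean (G-mean), a measure commonly reported when comparing against imbalanced classification algorithms. The label 1 for dropout is a hypothetical encoding.

```python
import math
from typing import List, Tuple


def imbalance_metrics(y_true: List[int], y_pred: List[int],
                      positive: int = 1) -> Tuple[float, float, float, float]:
    """Return (accuracy, TPR on dropouts, TNR on non-dropouts, G-mean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # G-mean is zero whenever one class is never recognised.
    return accuracy, tpr, tnr, math.sqrt(tpr * tnr)
```

Consider a class of 20 with 2 dropouts and a classifier that predicts "no dropout" for everyone. It reaches 0.9 accuracy but a G-mean of 0, which is why accuracy alone is a poor guide in this setting.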

Application of Grammar Guided Ant Programming Models to Association Rule Mining

Integrated Computer Aided Engineering

Lecture Notes in Computer Science, 2009

In this paper, we introduce a Gene Expression Programming algorithm for multi-label classification. This algorithm encodes each individual as a discriminant function that indicates whether a pattern belongs to a given class or not. The algorithm also applies a niching technique to guarantee that the population includes functions for each existing class. To evaluate the quality of our algorithm, its performance is compared with that of four recently published algorithms. The results show that our proposal is the best in terms of accuracy, precision and recall.
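The role of the discriminant functions can be sketched as follows. Each class has one function, and a pattern receives every label whose function returns a positive value. Two parts of this sketch are assumptions:

- The functions are hand-written stand-ins for evolved expression trees, and the label names are hypothetical.
- The accuracy, precision and recall are the common example-based multi-label definitions; the abstract does not specify which variants were used.

Niching is not shown.

```python
from typing import Callable, Dict, List, Sequence, Set

Pattern = Sequence[float]

# Hypothetical discriminant functions, one per class; output > 0 means "belongs to class".
DISCRIMINANTS: Dict[str, Callable[[Pattern], float]] = {
    "sports": lambda x: x[0] - 0.5,
    "politics": lambda x: x[1] * 2.0 - 1.0,
    "economy": lambda x: x[0] + x[1] - 1.2,
}


def predict(x: Pattern) -> Set[str]:
    """Label set of a pattern: every class whose discriminant is positive."""
    return {label for label, f in DISCRIMINANTS.items() if f(x) > 0}


def example_based_scores(true_sets: List[Set[str]], pred_sets: List[Set[str]]):
    """Example-based multi-label accuracy, precision and recall (averaged per pattern)."""
    acc = prec = rec = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter, union = len(y & z), len(y | z)
        acc += inter / union if union else 1.0
        prec += inter / len(z) if z else 0.0
        rec += inter / len(y) if y else 0.0
    n = len(true_sets)
    return acc / n, prec / n, rec / n
```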

IEEE Transactions on Cybernetics, 2014

This paper proposes a novel grammar-guided genetic programming algorithm for subgroup discovery. This algorithm, called comprehensible grammar-based algorithm for subgroup discovery (CGBA-SD), combines the requirement of discovering comprehensible rules with the ability to mine expressive and flexible solutions, owing to the use of a context-free grammar. Each rule is represented as a derivation tree that shows a solution described using the language denoted by the grammar. The algorithm includes mechanisms to adapt the diversity of the population by self-adapting the probabilities of recombination and mutation. We compare the approach with existing evolutionary and classic subgroup discovery algorithms. CGBA-SD appears to be a very promising algorithm that discovers comprehensible subgroups and performs better than the other algorithms, as measured by complexity, interest, and precision. The results obtained were validated by means of a series of nonparametric tests.
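The abstract says CGBA-SD self-adapts its recombination and mutation probabilities to manage population diversity, but it does not give the update rule. The sketch below is therefore a hypothetical rule, shown only to make the idea concrete:

- Diversity is measured as the fraction of distinct individuals.
- When diversity falls below a target, probability shifts from crossover to mutation; otherwise it shifts back.
- Both probabilities are clamped to fixed bounds.

```python
from typing import List, Tuple


def diversity(population: List[str]) -> float:
    """Fraction of distinct individuals (rules serialized as strings)."""
    return len(set(population)) / len(population) if population else 0.0


def _clamp(p: float, low: float, high: float) -> float:
    return min(high, max(low, p))


def adapt_probabilities(p_cross: float, p_mut: float, population: List[str],
                        target: float = 0.5, step: float = 0.05,
                        low: float = 0.05, high: float = 0.95) -> Tuple[float, float]:
    """Hypothetical rule: shift probability toward mutation when diversity is low."""
    if diversity(population) < target:
        p_cross, p_mut = p_cross - step, p_mut + step  # explore more
    else:
        p_cross, p_mut = p_cross + step, p_mut - step  # exploit more
    return _clamp(p_cross, low, high), _clamp(p_mut, low, high)
```

Called once per generation, this lets a converging population recover variety through mutation without a fixed mutation rate being tuned by hand.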

Technology, Knowledge and Learning, 2015

This book is an edited volume concerning the current emerging topics of learning analytics (LA). It is important to note that it is closely related to other similar topics such as academic analytics (AA) (...

A Survey on Pre-Processing Educational Data

Educational Data Mining, 2013

ABSTRACT Data pre-processing is the first step in any data mining process, and one of the most important but least studied tasks in educational data mining research. Pre-processing transforms the available raw educational data into a suitable format, ready to be used by a data mining algorithm for solving a specific educational problem. However, most authors rarely describe this important step, and only a few works focus on the pre-processing of data. To address the lack of specific references on this topic, this paper specifically surveys the task of preparing educational data. First, it describes different types of educational environments and the data they provide. Then, it shows the main tasks and issues in the pre-processing of educational data, mainly using Moodle data in the examples. Next, it describes some general and specific pre-processing tools, and finally, some conclusions and future research lines are outlined.
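A typical pre-processing pass over LMS logs cleans raw events and aggregates them into one summary row per student, ready for a mining algorithm. The column names, the sample rows and the two cleaning rules below are assumptions for illustration, not Moodle's actual log schema:

- Events with no user id are dropped.
- Exact duplicate events are dropped.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict

# Hypothetical export format: one row per logged event.
RAW_LOG = """userid,action,time
3,forum view,1609459200
3,quiz attempt,1609462800
,quiz attempt,1609466400
5,forum view,1609470000
5,forum view,1609470000
5,assignment upload,1609473600
"""


def summarize(raw: str) -> Dict[str, Counter]:
    """Clean a raw event log and aggregate it into per-student action counts."""
    seen = set()
    per_student: Dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(raw)):
        if not row["userid"]:
            continue  # drop events not attributable to a student
        key = (row["userid"], row["action"], row["time"])
        if key in seen:
            continue  # drop exact duplicate events
        seen.add(key)
        per_student[row["userid"]][row["action"]] += 1
    return dict(per_student)
```

The resulting per-student counters can then be flattened into a table with one column per action type, which is the shape most classification and clustering algorithms expect.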

A genetic programming free-parameter algorithm for mining association rules

2012 12th International Conference on Intelligent Systems Design and Applications (ISDA), 2012

Abstract This paper presents a free-parameter grammar-guided genetic programming algorithm for mining association rules. This algorithm uses a context-free grammar to represent individuals, encoding the solutions as trees that conform to the grammar, which makes them more expressive and flexible. The algorithm presented here has the advantages of using evolutionary algorithms for mining association rules, and it also solves the problem of tuning the huge number of parameters required by these algorithms. The main feature of ...

Data Mining in E-Learning

WIT Transactions on State of the Art in Science and Engineering, 2006

1 Introduction
2 Adaptive (educational) hypermedia
3 The AHAM reference architecture
4 A general-purpose adaptive web-based platform
4.1 Overall architecture of AHA!
4.2 The AHA! authoring tools
5 Questions, quizzes and tasks
6 Adapting to learning styles
7 Conclusions