Machine Learning Prediction of University Student Dropout: Does Preference Play a Key Role?

Towards a Students’ Dropout Prediction Model in Higher Education Institutions Using Machine Learning Algorithms

International Journal of Emerging Technologies in Learning (iJET)

Using machine learning to predict students’ dropout in higher education institutions and programs has proven to be effective in many use cases. In an approach based on machine learning algorithms to detect students at risk of dropout, there are three main factors: the choice of features likely to influence a partial or total stop of the student, the choice of the algorithm to implement a prediction model, and the choice of the evaluation metrics to monitor and assess the credibility of the results. This paper aims to provide a diagnosis of machine learning techniques used to detect students’ dropout in higher education programs and a critical analysis of the limitations of the models proposed in the literature. Its major contribution is to present recommendations that may resolve the lack of a global model that can be generalized across higher education institutions, at least within the same country or the same university.

A Real-Life Machine Learning Experience for Predicting University Dropout at Different Stages Using Academic Data

IEEE Access, 2021

High levels of school dropout are a major burden on the educational and professional development of a country's inhabitants. A country's prosperity depends, among other factors, on its ability to produce higher education graduates capable of moving the country forward. To alleviate the dropout problem, more and more institutions are turning to the possibilities that artificial intelligence can provide to predict dropout as early as possible. The difficulty of accessing personal data and the privacy issues it entails force institutions to rely on the Academic Data of their students to create accurate and reliable predictive systems. This work focuses on creating the best possible predictive model based solely on academic data, and accordingly, its capacity to infer knowledge must be maximised. Thus, Feature Engineering and Instance Engineering techniques such as dealing with redundancy, significance of the features, correlation, cardinality features, missing values, creation or elimination of features, data fusion, removal of unuseful instances, binning, resampling, normalisation, or encoding are applied in detail before the construction of well-known models such as Gradient Boosting, Random Forest, and Support Vector Machine along with an Ensemble of them at different stages: prior to enrolment, at the end of the first semester, at the end of the second semester, at the end of the third semester, and at the end of the fourth semester. Constructing these predictive models, which serve as inputs to a decision support system, makes it possible to apply effective dropout prevention policies.
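Three of the preprocessing steps named in the abstract above (normalisation, binning, and encoding) can be illustrated with a minimal sketch. This is not the authors' pipeline: the records, the field names (`credits_passed`, `mean_grade`, `access_route`), and the grade cut points are all hypothetical, chosen only to show the shape of each transformation.

```python
# Illustrative sketch only: toy records and hypothetical field names.

def min_max_normalise(values):
    """Rescale numeric values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges):
    """Return the index of the first bin whose upper edge is >= value."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 list over a fixed category order."""
    return [1 if value == c else 0 for c in categories]


# Hypothetical academic records (not taken from the paper).
students = [
    {"credits_passed": 30, "mean_grade": 7.5, "access_route": "exam"},
    {"credits_passed": 12, "mean_grade": 4.0, "access_route": "transfer"},
    {"credits_passed": 24, "mean_grade": 6.0, "access_route": "exam"},
]

credits = min_max_normalise([s["credits_passed"] for s in students])
routes = sorted({s["access_route"] for s in students})
grade_edges = [5.0, 7.0]  # assumed cut points: low / medium / high grade

# One feature row per student: [normalised credits, grade bin, one-hot route...]
features = [
    [credits[i], bin_value(s["mean_grade"], grade_edges)]
    + one_hot(s["access_route"], routes)
    for i, s in enumerate(students)
]
```

Each row in `features` is then a purely numeric vector that a classifier could consume. In a real setting the scaling range and category list would be fitted on training data only and reused for new students.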

A Machine Learning Approach to Detect Student Dropout at University

International Journal of Advanced Trends in Computer Science and Engineering, 2021

In universities, student dropout is a major concern that reflects on the university's quality. Certain characteristics lead students to drop out of university. A high dropout rate affects the university's reputation and the students' future careers. Therefore, student dropout analysis is needed to improve academic planning and management, to reduce dropout from the university, and to raise the standard of the higher education system. Machine learning provides powerful methods for analysing and predicting dropout. This study uses a dataset from a university representative to develop a model for predicting student dropout. In this work, machine learning models were used to detect dropout. Machine learning is being more widely used in the field of knowledge mining diagnostics. Following an examination of certain studies, we observed that dropout detection may be done using several methods. We used five dropout detection models: Decision Tree, Naïve Bayes, Random Forest Classifier, SVM, and KNN. We used machine learning to analyze the data and found that the Random Forest classifier is highly promising for predicting dropout, with a training accuracy of 94% and a testing accuracy of 86%.
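The gap between training and testing accuracy reported above can be made concrete with a small sketch. It does not reproduce the study's Random Forest. Instead it fits a one-feature threshold rule (a decision stump, swapped in for brevity) on hypothetical grade data and scores it on a held-out set. The data and the names `fit_stump` and `accuracy` are assumptions for illustration.

```python
def accuracy(model, rows):
    """Fraction of (feature, label) rows that the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def fit_stump(rows):
    """Choose the threshold on one feature that maximises training accuracy.

    The resulting rule predicts 'dropout' when the feature is below the
    threshold and 'stay' otherwise.
    """
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        candidate = lambda x, t=t: "dropout" if x < t else "stay"
        acc = accuracy(candidate, rows)
        if acc > best_acc:  # keep the first threshold reaching the best score
            best_t, best_acc = t, acc
    return (lambda x: "dropout" if x < best_t else "stay"), best_t


# Hypothetical (mean_grade, outcome) pairs; not data from the study.
train = [
    (2.0, "dropout"), (3.5, "dropout"), (4.0, "dropout"), (4.5, "stay"),
    (5.5, "stay"), (6.0, "stay"), (7.5, "stay"), (8.0, "stay"),
]
test = [(3.0, "dropout"), (5.0, "dropout"), (6.5, "stay"), (7.0, "stay")]

model, threshold = fit_stump(train)
train_acc = accuracy(model, train)  # scored on data the rule was fitted to
test_acc = accuracy(model, test)    # scored on unseen data
```

On this toy data the rule fits the training rows perfectly but misclassifies one held-out student, which is the same qualitative pattern as a training accuracy that exceeds the testing accuracy.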

Impact of Postgraduate Students Dropout and Delay in University: Analysis Using Machine Learning Algorithms

International Journal of Advanced Trends in Computer Science and Engineering, 2021

Cost of education and economic background are some factors that influence student dropout from postgraduate studies. However, high dropout rates do not only affect students; they also impact university revenue. This research analyzes various literature on machine learning algorithms and applies a suitable algorithm to produce a prediction model. This study indicates that Decision Tree and Random Forest algorithms have better accuracy, class recall, and class precision than Naïve Bayes. Therefore, the prediction model uses the Decision Tree algorithm to provide various approaches to maximize revenue in universities. The findings indicate high dropout rates negatively impact university revenue, while low rates influence revenue positively. Other aspects like grants received by students, the number of research publications, and degree level also positively or negatively impact revenue if the dropout rate is medium. A complete understanding of this prediction model can identify and minimize the risk of early withdrawal or delayed graduation and improve revenue generation by universities.

Evaluation of Prediction Algorithms in the Student Dropout Problem

Journal of Computer and Communications, 2020

University dropout is a growing problem, and in recent years computer techniques have been used to assist in its detection. The paper presents the evaluation of some prediction algorithms to detect a student with a high possibility of scholar desertion. The approach uses real data from past scholar periods to create a dataset with different information about the students (i.e., personal, economic, and academic records). The algorithms selected in the experimental phase were: J48 decision tree, K-nearest neighbors, and support vector machine. We use two similarity metrics to split the dataset with cases with at least 80% of similarity to evaluate each case. We use the data from 2010 to 2016 with real students' information to predict if there exists the possibility of a real academic dropout in one test for a period. The results show that the J48 algorithm reaches a better performance in both experiments. Besides, the tree generated for each student is taken as a path of attention, reaching around 88% of effectiveness. Finally, the conclusions argue the contributions of the paper and propose a future line of research.

The Use of Predictive Analyzes for University Dropout Cases

Iraqi Journal of Science, 2021

We will also derive practical solutions using predictive analytics. This includes an application that makes predictions with a real-world example from University of Faculty of Chariaa of Fez. As soon as students enroll at the university, they encounter many difficulties and problems that undermine their motivation for their courses and push them to leave the university. The aim of our article is to investigate the issue of students dropping out of their studies. This investigation actively integrates the benefits of machine learning. Hence, we concentrate on two fundamental methods: KNN, which relies on the idea of similarity among data points, and the well-known SVM, which can solve classification problems. Thanks to predictive analytics, we can come up with concrete solutions to reduce this issue. Our case study was limited specifically to University of Chariaa-Fez, Morocco.
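The idea behind KNN mentioned above, classifying a case by its similarity to known cases, can be sketched in a few lines. This is a generic illustration, not the article's implementation. The two features (attendance rate and scaled mean grade), the toy data, and the function names are assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical students: (attendance_rate, scaled_mean_grade) -> outcome.
train = [
    ((0.90, 0.8), "stay"),
    ((0.85, 0.7), "stay"),
    ((0.80, 0.9), "stay"),
    ((0.30, 0.2), "dropout"),
    ((0.40, 0.3), "dropout"),
    ((0.20, 0.4), "dropout"),
]

prediction = knn_predict(train, (0.35, 0.25), k=3)
```

Because distance drives the vote, features on very different scales should be normalised first; otherwise the largest-valued feature dominates the notion of similarity.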

A MACHINE LEARNING APPROACH TO PREDICT STUDENTS DROPOUT IN ZIMBABWE UNIVERSITIES: A CASE STUDY OF CHINHOYI UNIVERSITY OF TECHNOLOGY

Chinhoyi University of Technology, 2021

Student dropout is a major problem facing many academic institutions worldwide. Student dropouts not only negatively affect the student but also affect the institution’s reputation and quality. Although there are several theoretical perspectives on student dropouts that focus on their causes, recent studies have leveraged machine learning techniques to explore the problem further. This study presents a novel approach that uses data mining techniques to predict dropouts among university students with similar cultural and social backgrounds. The built models successfully identified dropouts at an early stage. Thus, the models can act as a warning system that identifies students at risk of dropping out, so that management can promptly intervene to steer them to graduation. In addition, the machine learning models could effectively determine the top predictor variables for dropouts. The study reveals that students’ high school performance and first and second semester GPA are the top predictors of dropouts. The research also shows that SVM is a robust algorithm for making predictions on student dropouts, with an AUC value of more than 78%. An interesting direction for future research would be to build a degree program recommender system to recommend degree programs to students. This would enhance student success rates and steer students to graduation.

Early Detection of Students at Risk – Predicting Student Dropouts Using Administrative Student Data and Machine Learning Methods

SSRN Electronic Journal, 2018

To successfully reduce student attrition, it is imperative to understand which students are at risk of dropping out. We develop an early detection system (EDS) to predict student success in tertiary education as a basis for a targeted intervention. The EDS uses regression analysis, neural networks, decision trees and the AdaBoost algorithm to identify student characteristics which distinguish potential dropouts from graduates. The developed method can be implemented in every German university, as it uses student performance and demographic data collected and maintained by legal mandate. Therefore the EDS self-adjusts to the university where it is employed. The EDS is tested and applied on a state university and a private university of applied sciences. Both institutes of higher education differ considerably in their organization, tuition fees and student-teacher ratios. Our results indicate a prediction accuracy at the end of the first semester of 79% for the state university and 85% for the private university of applied sciences. After the fourth semester, the accuracy improves to 90% for the state university and 95% for the private university of applied sciences.

Intelligent System to Predict University Students Dropout

International Journal of Online and Biomedical Engineering (iJOE)

The objective of this research is to reduce the dropout rate of students in the Faculty of Systems Engineering and Informatics of the Universidad Nacional Mayor de San Marcos – FISI-UNMSM, through the implementation of an intelligent system with a data mining approach and the autonomous learning algorithm (decision trees) that predicts which students are at risk of dropping out. It was developed in Python and the free software Weka; for this purpose, student data was collected from 2014 to 2020. This solution increases the availability and the level of satisfaction of the faculty. In the learning process, an accuracy of 90.34% and a precision of 95.91% were obtained, so the data mining model is considered valid. In addition, it was found that the variables that most influenced students in making the decision to abandon their studies were the historical weighted average, the weighted average of the last cycle and the number of credits passed.
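Accuracy and precision, the two figures quoted above, measure different things and can diverge on the same predictions. The sketch below computes both (plus recall, for contrast) from a confusion count, treating "dropout" as the positive class. The labels and predictions are invented for illustration and are unrelated to the study's data.

```python
def confusion_counts(y_true, y_pred, positive="dropout"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of predicted dropouts that really dropped out."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual dropouts that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical ground truth and predictions for ten students.
y_true = ["dropout"] * 3 + ["stay"] * 7
y_pred = ["dropout", "dropout", "stay", "dropout"] + ["stay"] * 6

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
prec = precision(tp, fp)
rec = recall(tp, fn)
```

Here eight of ten predictions are correct, yet only two of the three flagged students actually drop out, so accuracy and precision differ. When dropouts are rare, reporting precision and recall alongside accuracy gives a fuller picture.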

An Analysis of Student Representation, Representative Features and Classification Algorithms to Predict Degree Dropout

Proceedings of the 9th International Conference on Learning Analytics & Knowledge

Identifying and monitoring students who are likely to drop out is a vital issue for universities. Early detection allows institutions to intervene, addressing problems and retaining students. Prior research into the early detection of at-risk students has opted for the use of predictive models, but a comprehensive assessment of the suitability of different algorithms and approaches is complicated by the large number of variable features that constitute a student's educational experience. Predictive models vary in terms of their amplitude, temporality and the learning algorithms employed. While amplitude refers to the ability of the model to operate on multiple degrees, temporality is often considered due to the natural temporal aspect of the data. In the absence of a comparative framework of learning algorithms, the aim of this paper has been to provide such an analysis, based on a proposed classification of strategies for predicting dropouts in Higher Education Institutions. Three different student representations are implemented (namely Global Feature-Based, Local Feature-Based, and Time Series) in conjunction with the appropriate learning algorithms for each of them. A description of each approach, as well as its implementation process, is presented in this paper as a technical contribution. The experiment uses a dataset of student information from two degrees, namely Business Administration and Architecture, acquired through an automated management system from a university in Brazil. Our findings can be summarized as: (i) of the three proposed student representations, the Local Feature-Based was the most suitable approach for predicting dropout. In addition to providing high quality results, the Local Feature-Based representations are simple to build, and the construction of the model is less expensive when compared to more complex ones; (ii) as a conclusion of the results obtained via Local Feature-Based, dropout can be said to be accurately predicted using grades of a few core courses, so there is no need for a complex features extraction process; (iii) considering temporal aspects of