Decision Tree Research Papers - Academia.edu

Corner kicks are one of the most important set pieces in high-level football. The present study aimed to analyze the evolution of the tactical approach to corner kicks in high-performance football. To this end, a total of 1704 corner kicks executed in the 192 matches of the 2010, 2014 and 2018 FIFA World Cups were analyzed using observational methodology. The results show an evolution in the mode of execution of these actions, although the success rate remains low. The log-linear test revealed significant relationships between some of the most important categorical variables in these actions: match status, number of intervening attackers and time. The decision tree models show that the number of players involved in these actions is the criterion with the greatest information gain. These results corroborate previous multivariate studies, although more research is still needed. Finally, the results of the presen...
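As a rough illustration of the "information gain" criterion mentioned in the abstract above, the following minimal sketch computes the entropy reduction obtained by splitting on each candidate variable. The records, field names (`attackers`, `status`, `shot`) and values are hypothetical and are not taken from the study; the study's own software and settings are not stated here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction obtained by splitting `rows` on `feature`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[target])
    # Weighted average entropy of the groups created by the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy records, NOT data from the study.
corners = [
    {"attackers": "1-2", "status": "drawing", "shot": "no"},
    {"attackers": "1-2", "status": "losing", "shot": "no"},
    {"attackers": "3+", "status": "drawing", "shot": "yes"},
    {"attackers": "3+", "status": "losing", "shot": "yes"},
]

gains = {f: information_gain(corners, f, "shot") for f in ("attackers", "status")}
best_feature = max(gains, key=gains.get)
```

In this toy data the hypothetical `attackers` field separates the outcomes perfectly, so a tree builder that ranks splits by information gain would choose it for the root.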

Feed is a crucial variable because it can determine the success of fish farming. Breeders can use two types of artificial feed: alternative feed and pellets. Many cultivators rely on pellets as the main feed for their fish because pellet composition is adjusted to the type and age of the fish. However, cultivators currently face the problem of high fish pellet prices on the market. Therefore, a classification analysis for selecting fish feed sellers is needed, based on several criteria such as the number of feed types, price, order, delivery, and availability of discounts. This study conducted a classification analysis with simplified characteristics (feature selection) for selecting fish feed sellers in Kendal Regency, and compared it with a model without feature selection, using the Decision Tree C4.5 method. The results of this study are the decision tree with th...
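C4.5, the method named in the abstract above, ranks candidate splits by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows that calculation on a handful of made-up seller records; the attribute names and values are assumptions for illustration only and do not come from the Kendal Regency data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain divided by split information, as used by C4.5."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(
        len(part) / total * entropy(part) for part in partitions.values()
    )
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical seller records, NOT data from the study.
price = ["low", "low", "high", "high", "high", "low"]
discount = ["yes", "no", "yes", "no", "no", "yes"]
chosen = ["yes", "yes", "no", "no", "no", "yes"]

ratios = {
    "price": gain_ratio(price, chosen),
    "discount": gain_ratio(discount, chosen),
}
```

Here the hypothetical `price` attribute would be preferred, because it separates chosen from rejected sellers completely while `discount` only does so partially.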

The field of educational data mining has gained significant traction for its pivotal role in assessing students' academic achievements. However, to ensure that the algorithms are compatible with the selected dataset, a comprehensive analysis of the algorithms is imperative. This study developed machine learning algorithms that use students' online learning activities to classify their academic performance. In the data cleaning stage, we employed VarianceThreshold to discard features that are all zeros. Feature selection and oversampling were integrated into the data preprocessing, using information gain for feature selection and the synthetic minority oversampling technique (SMOTE) to address class imbalance. In the classification phase, three supervised machine learning algorithms, k-nearest neighbors (KNN), multi-layer perceptron (MLP), and logistic regression (LR), were implemented with 3-fold cross-validation to enhance robustness. The classifiers' performance was refined through hyperparameter tuning via GridSearchCV. Accuracy, precision, recall, and F1-score were measured for each classifier. Notably, both MLP and LR achieved scores of 100% across all metrics, while KNN showed a noticeable performance boost after hyperparameter tuning.
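The oversampling step in the abstract above can be pictured with a simplified, SMOTE-style sketch: each synthetic point is placed at a random position on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is an illustration of the idea only, not the authors' pipeline; the function name `smote_like`, its parameters and the sample points are all hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority-class samples, NOT data from the study.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (3.0, 3.0)]
new_points = smote_like(minority, n_new=4)
balanced_minority = minority + new_points
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.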

A chest X-ray can convey a lot about a patient's condition. However, it requires a specialized and skilled doctor to determine the type of lung disease with high accuracy. This is where deep learning (DL) and artificial intelligence (AI) techniques come in, accelerating the detection and classification of lung diseases with high precision, which saves time and effort for patient and doctor alike. This work presents a proposed model for a machine learning (ML) and AI system to analyze chest X-ray images and categorize them into four cases: normal, viral pneumonia, bacterial pneumonia, and coronavirus disease 2019 (COVID-19). The system relies on extracting Mel frequency cepstral coefficient (MFCC) features from a dataset of 4,800 chest X-ray images; these features are then used to train four basic classifiers in the data mining tool Orange3: adaptive boosting (AdaBoost), decision trees (DTs), gradient boosting (GB), and random forest (RF). The model was tested and evaluated, and the AdaBoost classifier performed best with an accuracy of 100%, followed by RF with an accuracy of 99.5%. Finally, GB and DTs achieved classification accuracies of 98.5% and 97.2%, respectively.

Heart disease is one of the most common diseases today. We used different attributes related to heart disease to find a better prediction method, and we applied several algorithms for prediction. The naive Bayes algorithm is analyzed on a dataset based on risk factors. We also used decision trees and a combination of algorithms to predict heart disease based on the above attributes. The results show that naive Bayes gives more accurate results when the dataset is small, and decision trees give more accurate results when the dataset is large.

In this paper, a digital twin of the network of heating systems for smart cities is developed using the example of the city of Almaty. The study used machine learning algorithms to estimate future thermal energy consumption and develop thermodynamic formulas. This work offers a thorough and in-depth analysis of thermal energy consumption. In addition, the paper identifies the relationship between thermal energy consumption, ambient temperature, and wind uncertainty in certain urban areas, using machine learning methods to predict thermal energy consumption. Using both training and regression models, this interdependence is revealed. The obtained forecasts provide useful information for studying the structure of heat consumption in Almaty and reducing heat losses by reducing overheating in the zones of heating networks. In addition, the study analyzes high-resolution spatial data collected from 385 homes and 62 heat transfer circuits located throughout the city during the heating season. The study examines the degree of relationship between the ambient temperature and the amount of heat energy used in the areas of Astana. A minor impact of wind speed is also estimated. These discoveries allow us to use machine learning algorithms to find the location of hot spots and inefficient zones with high losses.

Backpackers often travel for longer periods and have their own budgets and accommodation requirements. Existing systems do not offer personalized recommendation criteria, and some proposed recommender systems (RS) are inefficient for users. Moreover, apart from searching for information on websites and blogs, only a limited number of systems have been designed specifically for recommending backpackers’ accommodations. An observation and an online survey were conducted to collect information from backpackers about their preferences when looking for accommodations. Fifty (50) respondents took part in the survey, and the data were analyzed and classified to build a decision tree. The decision tree model was then implemented in the Backpackers’ accommodations Recommender System (BRS). BRS offers backpackers a convenient solution by using the decision tree technique to suggest the accommodations that best suit their preferences.

Objective: The main objective of this paper is to compare the performance of logistic regression and decision tree classification methods and to find the significant environmental determinants that cause preterm birth. Design, setting and population: Between 2017 and 2018, 90 pregnant women were followed to birth outcome by research staff at our institutions; of these, 50 had full-term and 40 had preterm births. Method: Logistic regression and decision tree classifier models were compared on this dataset before and after feature selection to evaluate model accuracy. Main outcome measures: The accuracy of the machine learning classification models and the important factors for preterm birth. Results: The chi-square test identified area of residence, GSH, MDA, α-HCH, total HCH and total DDT as responsible for preterm birth. Using multiple logistic regression, preterm birth was associated with MDA and α-HCH (95% CI 0.04 to 0.48 and 95% CI 0...

In the Philippines, passing the Licensure Examination for Teachers (LET) is the first step toward becoming a professional teacher and a crucial evaluation tool for assessing the quality of teacher education programs in Higher Education Institutions (HEIs). The alarming decline in the LET’s passing rate from 31.45% in 2010 to 27.28% in 2018 has raised the need for a proactive approach to predicting candidates’ performance in the LET. Thus, the study aims to determine the best machine-learning classification model for predicting LET results for mathematics teachers, which can help improve and ensure that they are well-prepared to pass the LET. The study employs educational data mining and machine learning principles to test the three algorithms: Gradient Boosted Trees, Logistic Regression, and Naïve Bayes. Data were collected from four participating universities, comprising 769 data points. The performance of the models was measured using accuracy, classification error, precision, recall, Area Under the Curve, and F1-score. All three models performed satisfactorily to excellent, with the Gradient Boosted Trees outperforming the other models in the training and testing phases. Nevertheless, Logistic Regression outperforms the other two in all indices on the evaluation data set. Thus, it was concluded that Logistic Regression is the most suitable model for predicting LET results for mathematics teachers due to its stability and reliability when subjected to evaluation data. The findings emphasize the importance of utilizing machine learning models to gain insights into LET results, enabling HEIs to create policies and provide targeted support and interventions to teacher candidates.
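Most of the performance measures listed in the abstract above follow directly from the confusion-matrix counts. The sketch below computes accuracy, classification error, precision, recall and F1-score for a binary pass/fail prediction; Area Under the Curve is left out because it needs predicted scores rather than hard labels. The function name and the label lists are hypothetical and are not the study's data.

```python
def classification_metrics(y_true, y_pred, positive="pass"):
    """Accuracy, error, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical exam outcomes and model predictions, NOT data from the study.
actual = ["pass", "pass", "pass", "fail", "fail", "fail", "fail", "pass"]
predicted = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass"]

metrics = classification_metrics(actual, predicted)
```

Comparing these figures on the training, testing and separate evaluation sets is one way to notice the pattern the abstract reports, where one model leads during training and testing but another holds up better on evaluation data.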

Many deaf people worldwide face problems with integrating into society and interacting with people who do not understand sign language. This can lead to isolation and difficulty in expressing feelings. In this research, our primary goal is to help deaf people communicate, express their feelings, and socialize with others. Toward that end, 40 Arabic words that are commonly used in social interactions were used to build a dataset of hand movements used by deaf people to express these words. These movements were recorded using a Leap Motion Controller (LMC). The resulting dataset consists of 1,579 instances and 112 features, recorded with the help of five deaf persons. Feature reduction and oversampling techniques were applied to analyze the dataset. Machine learning algorithms were then used to build a model that is able to classify any given hand posture or gesture into one of those 40 words. This work compared the performance of nine classification algorithms: Random Forest, Decis...

According to the Breast Cancer Organization, breast cancer is one of the most dangerous diseases affecting women worldwide. According to medical experts, detecting this cancer at an early stage helps save lives. Cancer.net offers individualized guides for more than 120 kinds of cancer and related genetic diseases. AI techniques are primarily used to detect breast cancer. We propose an adaptive ensemble voting scheme for diagnosing breast cancer using the Wisconsin Breast Cancer (WBC) dataset. The aim of our work is to compare and describe how CNN and the logistic algorithm perform in detecting breast cancer even when the number of variables is reduced. There are two categories of tumours: benign tumours, which are non-cancerous, and malignant tumours, which are cancerous.

This article provides an overview of modern machine learning methods in the context of their active use in credit scoring, with particular attention to the following algorithms: light gradient boosting machine (LGBM) classifier, logistic regression (LR), linear discriminant analysis (LDA), decision tree (DT) classifier, gradient boosting classifier and extreme gradient boosting (XGB) classifier. Each of the methods mentioned is subject to careful analysis to evaluate their applicability and effectiveness in predicting credit risk. The article examines the advantages and limitations of each method, identifying their impact on the accuracy and reliability of borrower creditworthiness assessments. Current trends in machine learning and credit scoring are also covered, warning of challenges and discussing prospects. The analysis highlights the significant contributions of methods such as LGBM classifier, LR, LDA, DT classifier, gradient boosting classifier and XGB classifier to the development of modern credit scoring practices, highlighting their potential for improving the accuracy and reliability of borrower creditworthiness forecasts in the financial services industry. Additionally, the article discusses the importance of careful selection of machine learning models and the need to continually update methodology in light of the rapidly changing nature of the financial market.

This study investigates the pertinence of machine learning techniques to various datasets and how they can be leveraged to predict health risks. I investigated how well two algorithms, Logistic Regression and Multi-Layered Perceptron (MLP), predict health outcomes and risk. To be more precise, I evaluated each model's capacity to recognize stroke risk using a stroke prediction dataset. Through comparative analysis, this study seeks to clarify the advantages and disadvantages of each algorithm when used with these disparate data types, providing information about how well-suited they are for different prediction tasks. Additionally, I provided a framework for data analysis that outlines crucial procedures for data preparation, cleaning, and exploration. This framework may be used to improve the efficacy of machine learning models on a variety of datasets.

There remains a possibility of errors in appraising collateral as the reference for loan value, which opens the door to non-performing loans (NPL). A valuation (value prediction) method that is sufficiently proportional, credible, and accurate is therefore needed. Inaccurate predictions lead to improper credit management planning. Collateral value prediction has attracted the interest of many researchers because of its importance both theoretically and empirically. Different models can yield different accuracies. This study therefore aims to apply the C4.5 decision tree algorithm to the valuation of collateral in credit applications. The study uses collateral data from credit applications in the city of Banjarmasin. Algorithm performance is evaluated using precision and recall and AUC, and the results of other analysis methods (Naive Bayes, K-NN) are then compared and analyzed against the predictions of the C4.5 classification algorithm. The results show that Decision Tree C4.5 can be applied to credit collateral valuation with an accuracy...

Dilation of the biliary tree can be an indicator of several diseases such as stones, tumors, benign strictures, and in some cases cancer. This dilation can have many causes, such as gallstones, inflammation of the bile ducts, trauma, injury, and severe liver damage. Automatic measurement of the biliary tree in magnetic resonance images (MRI) helps hepatobiliary surgeons plan minimally invasive surgery. In this paper, we propose a model to segment biliary tree MRI images using a Fully Convolutional Neural (FCN) network. Based on the extracted area, seven features are computed: entropy, standard deviation, RMS, kurtosis, skewness, energy and maximum. A database of images from King Hussein Medical Center (KHMC) is used in this work, containing 800 MRI images labeled by surgeons: 400 with a normal biliary tree and 400 with a dilated biliary tree. Once the features are extracted, four classifiers (Multi-Layer perceptron neural network, support vector machin...

Breast cancer (BC) is a major global health concern. Detecting BC at an early stage gives more treatment options and can help avoid more aggressive treatments. The use of machine learning (ML) in BC prediction offers significant potential for improving the accuracy and speed of diagnosis, personalizing treatment, and identifying high-risk patients. However, there are significant challenges associated with the use of ML, including the need for high-quality data and more flexible models with optimal parameters to achieve high efficiency. In this paper, we propose an optimized framework based on multi-stage data exploration. This framework is designed to provide a comprehensive approach to data exploration, ensuring that the data is well-prepared for ML. In addition, the framework includes dynamic ensemble-based classifiers, which combine multiple independent classifiers to improve accuracy and mitigate the risk of overfitting in conjunction with the cross-validation techniques. These classifiers are optimized using Bayesian hyperparameter tuning, which involves selecting the optimal values for the various hyperparameters of the model. This approach can significantly improve the prediction accuracy of the resulting model. The study evaluates the framework using the publicly available Wisconsin Diagnostic Breast Cancer (WDBC) dataset and compares our results with other state-of-the-art models. The experimental results show that the best result is 100% for accuracy and recall with hyperparameters of (Ensemble Method = AdaBoost, Number of learners = 322, learning rate = 0.9350, and the Maximum number of splits = 1). The highest performance has been achieved with the proposed framework compared with the other models in terms of accuracy (mean = 99.35%, best = 100%, worst = 98.7%, and Standard Deviation = 0.325). The framework can potentially improve the accuracy and efficiency of BC prediction, ultimately leading to better outcomes for patients.

In the electrical discharge machining (EDM) process, especially during the machining of hardened steels, changes in tool shape have been identified as one of the major problems. This experimental study was undertaken to understand that problem. To assess the distortion in tool shape that occurs during the machining of EN31 tool steel, variations in tool shape were examined by monitoring the roundness of the tooltip before and after machining with a coordinate measuring machine. The change in out-of-roundness of the tooltip varied from 5.65 to 37.8 µm during machining under different experimental conditions. It was revealed that the input current, the pulse on time, and the pulse off time had the most significant effect on changes in the out-of-roundness values during machining. Machine learning techniques (decision tree, random forest, generalized linear model, and neural network) were applied for the prediction of changes in tool shape. It was o...

Managerial flexibility has value in the context of uncertain R&D projects, as management can repeatedly gather information about uncertain project and market characteristics and, based on this information, change its course of action. This value is now well accepted and referred to as “real option value.” We introduce, in addition to the familiar real option of abandonment, the option of corrective action that management can take during the project. The intuition from options pricing theory is that higher uncertainty in project payoffs increases the real option value of managerial decision flexibility. However, R&D managers face uncertainty not only in payoffs, but also from many other sources. We identify five example types of R&D uncertainty, in market payoffs, project budgets, product performance, market requirements, and project schedules. How do they influence the value from managerial flexibility? We find that if uncertainty is resolved or costs/revenues occur after all decisi...

COVID-19 continues to cause a significant impact on public health. To minimize this impact, policy makers undertake containment measures that, when carried out disproportionately to the actual threat as a result of erroneous threat assessment, cause undesirable long-term socio-economic complications. In addition, macro-level or national-level decision making fails to consider the localized sensitivities in small regions. Hence, the need arises for region-wise threat assessments that provide insights on the behaviour of COVID-19 through time, enabled through accurate forecasts. In this study, a forecasting solution is proposed to predict daily new cases of COVID-19 in regions small enough for containment measures to be implemented locally, by targeting three main shortcomings that exist in the literature: the unreliability of existing data caused by inconsistent testing patterns in smaller regions, weak deployability of forecasting models towards predicting cases in previ...

The issue of identifying the prevalence of sickness that is linked to the population of a nation, state, neighborhood, organization, or school has not been taken into consideration by the majority of prior studies on the prediction of illness among populations. They frequently merely choose any sickness based on assumption, while those that determined the prevalence of the condition before developing their framework utilized survey data or data from web repositories, which removes idiosyncrasies from those data. In order to increase performance, this research suggests an enhanced data analytics framework for the predictive diagnosis of common illnesses affecting university students. In order to do this, exploratory data analysis (EDA) using a multivariate analytic technique was conducted using a high-level model methodology using CRISP-DM stages. When the suggested strategy was evaluated on support vector machines, ensemble gradient boosting, random forest, decision tree, K-neighbor...

As markets have become increasingly saturated, companies have acknowledged that their business strategies need to focus on identifying those customers who are most likely to churn. It is becoming common knowledge in business that retaining existing customers is the best core marketing strategy to survive in industry. In this research, both descriptive and predictive data mining techniques were used to determine the calling behaviour of subscribers and to recognise subscribers with a high probability of churn in a telecommunications company subscriber database. First, a data model for the input data variables obtained from the subscriber database was developed. Then Simple K-Means and Expectation Maximization (EM) clustering algorithms were used for the clustering stage, while Decision Stump, M5P and RepTree Decision Tree algorithms were used for the classification stage. The best algorithms in both the clustering and classification stages were used for the prediction process where customers that...
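A decision stump, one of the classifiers named in the abstract above, is a decision tree with a single split. The sketch below fits a stump on one numeric attribute by trying each midpoint between neighbouring values and keeping the threshold with the fewest misclassifications. It is a simplified illustration, not the implementation used in the study; the function names, the "monthly call minutes" attribute and the values are hypothetical, and the code assumes at least two distinct attribute values.

```python
def fit_stump(values, labels):
    """Return (threshold, label_above, errors) for the single split that
    misclassifies the fewest samples on one numeric attribute."""
    best = None
    points = sorted(set(values))
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        for above in (True, False):
            errors = sum(
                (above if v > threshold else not above) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[2]:
                best = (threshold, above, errors)
    return best


def predict_stump(stump, value):
    """Predict the boolean label for one attribute value."""
    threshold, above, _ = stump
    return above if value > threshold else not above


# Hypothetical monthly call minutes and churn flags, NOT data from the study.
minutes = [520, 480, 450, 120, 90, 60, 300, 150]
churned = [False, False, False, True, True, True, False, True]

stump = fit_stump(minutes, churned)
low_usage_churns = predict_stump(stump, 100)
high_usage_churns = predict_stump(stump, 400)
```

On this toy data the stump learns that subscribers below a usage threshold are flagged as likely churners, which is the kind of single-rule model a decision stump can express.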