Phishing Website Detection Paradigm using XGBoost

Detection of Phishing Websites Using Ensemble Machine Learning Approach

ITM Web of Conferences, 2021

In this paper, we propose the use of ensemble machine learning methods, such as the Random Forest algorithm and the Extreme Gradient Boosting (XGBoost) algorithm, for efficient and accurate phishing website detection based on a website's Uniform Resource Locator (URL). Phishing is one of the most widely executed cybercrimes in the modern digital sphere: an attacker imitates an existing, and often trusted, person or entity in an attempt to capture a victim’s login credentials, account information, and other sensitive data. Phishing websites are visually and semantically similar to real ones. The rise in online trading activities has resulted in a rise in the number of phishing scams. Cybersecurity jobs are among the most difficult to fill, so an automated system for phishing website detection is urgently needed. Machine learning is one of the most feasible approaches to this problem, as it is capable of handling the dynamic nature of phishing techniques, in addition to prov...
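The abstract does not list which URL features the authors extracted. As a rough illustration only, the Python sketch below computes a handful of lexical features often used in URL-based detection (lengths, dot and hyphen counts, an `@` symbol, a literal IP address as host, the HTTPS scheme). The feature names and the selection are assumptions, not the paper's feature set; each returned dictionary would become one row of the feature matrix passed to a classifier such as Random Forest or XGBoost.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Compute a few illustrative lexical features of a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # hostname excludes any "user@" prefix and the port
    try:
        ipaddress.ip_address(host)  # raises ValueError unless host is a literal IP address
        has_ip_host = 1
    except ValueError:
        has_ip_host = 0
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_at_symbol": int("@" in url),
        "has_ip_host": has_ip_host,
        "uses_https": int(parsed.scheme == "https"),
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
    }


if __name__ == "__main__":
    for u in ["https://www.example.com/a/b", "http://paypal.com@evil.example/x"]:
        print(u, url_features(u))
```

For `http://paypal.com@evil.example/x`, `urlparse` treats everything before `@` as user information, so the actual host is `evil.example`; this is why an `@` in a URL is a common red flag.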

PhiBoost- A novel phishing detection model Using Adaptive Boosting approach

Jordanian Journal of Computers and Information Technology, 2021

Cyberattacks increase every day and use different strategies. One of the most common cyberattacks is phishing, in which the attacker collects sensitive and confidential information by pretending to be a trusted party. Different traditional anti-phishing strategies have been introduced, such as blacklisting, heuristic search, and visual similarity. Most of these traditional methods have a high false detection rate and take a long time to detect a phishing website. New models have been introduced using machine learning techniques, which improve detection accuracy. Machine learning techniques require a large amount of data, in the form of features collected from different websites. These collected features are classified into four categories. This paper introduces a novel detection model that uses feature selection to pick the features most highly correlated with the class label. The feature-selection phase employs the independent significance features library from MATLAB and a heat map from Python to find the highly correlated features. The proposed model then uses an adaptive boosting approach, which combines multiple classifiers, to increase the model's accuracy. The proposed model achieves a very high predictive accuracy of approximately 99%.
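The paper performs feature selection with a MATLAB library and a Python heat map; the abstract does not give the exact criterion. A common way to rank features by how strongly they relate to a binary class label is the absolute Pearson correlation (for a 0/1 label this is the point-biserial correlation). The pure-Python sketch below assumes that criterion and a fixed number `k` of features to keep; both are assumptions, not the authors' settings.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_top_k(X, y, k):
    """Indices of the k columns of X with the largest |correlation| with the label y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by lower index
    return [j for _, j in scores[:k]]
```

Here `X` is a list of samples (rows of numeric features) and `y` holds 0 (legitimate) or 1 (phishing); the selected column indices would then be used to slice `X` before training the boosted model.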

A Comparative Analysis of Phishing Website Detection Using Xgboost Algorithm

2019

As more human activities move to cyberspace, phishers and other cybercriminals are making cyberspace unsafe, posing serious risks to users and businesses and threatening global security and the economy. Phishers constantly devise new methods for luring users into revealing their sensitive information. To keep users from falling victim to cybercriminals, phishing detection algorithms need to be developed. Machine learning or data mining algorithms are used for phishing detection, for example classification, which categorizes cyber users as either malicious or safe, or regression, which predicts the chance of being attacked by cybercriminals in a given period of time. Many techniques have been proposed in the past for phishing detection, but because of the dynamic nature of many of the phishing strategies employed by cybercriminals, the quest for a better solution continues. In this paper, we propose a new phishing detection model based on Ex...
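The abstract is cut off before describing the model. For reference, the standard XGBoost formulation is summarized below: at boosting round t a new tree f_t is added, and a second-order Taylor expansion of the loss yields closed-form leaf weights and a split gain. Here T is the number of leaves, w_j the leaf weights, γ and λ regularization parameters, I_j the samples in leaf j, and G_L, H_L, G_R, H_R the gradient and Hessian sums in the left and right children of a candidate split. The loss and hyperparameters actually used in this paper are not stated in the abstract.

```latex
\begin{align*}
\mathcal{L}^{(t)} &= \sum_{i=1}^{n} \ell\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\big) + \Omega(f_t),
  \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
g_i &= \frac{\partial \ell(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}},
  \qquad h_i = \frac{\partial^2 \ell(y_i, \hat{y}_i^{(t-1)})}{\partial \big(\hat{y}_i^{(t-1)}\big)^2},
  \qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda},
  \qquad \text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma \\
g_i &= p_i - y_i, \qquad h_i = p_i(1 - p_i)
  \qquad \text{(log loss, } p_i = \sigma\big(\hat{y}_i^{(t-1)}\big)\text{)}
\end{align*}
```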

Solving the Problem of Detecting Phishing Websites Using Ensemble Learning Models

Scientific Journal of Astana IT University

Because phishing is among the easiest and most popular ways for attackers to obtain personal information, phishing detection is becoming a popular area of research aimed at countering such attacks. Malicious website detection is essential to prevent the spread of malware and to keep end users from becoming victims. Unfortunately, malicious URL detection still needs to be better understood because of a lack of features and inaccurate classification. Prior sources were examined to investigate the subject. Based on the information collected from previous studies, this study addresses the problem of detecting phishing websites using ensemble learning. The aim of the work is to choose the optimal algorithm for classifying phishing websites using gradient boosting algorithms. AdaBoost, CatBoost, and the Gradient Boosting Classifier were chosen as the ensemble learning algorithms and were used to improve the efficiency of classifiers. Practical studies of the parameters of ea...
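The abstract does not give implementation details. To illustrate how adaptive boosting (also the core of PhiBoost above) combines weak learners, here is a minimal discrete AdaBoost sketch in pure Python with one-feature decision stumps as base learners. It assumes labels are encoded as -1 (legitimate) and +1 (phishing); it is not the CatBoost or scikit-learn implementation and omits their refinements.

```python
import math


def stump_predict(row, j, thr, polarity):
    """Predict +polarity if feature j >= thr, else -polarity."""
    return polarity if row[j] >= thr else -polarity


def fit_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost; y must contain -1 (legitimate) / +1 (phishing)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, j, thr, pol))
        # Misclassified samples (label * prediction = -1) are up-weighted, correct ones down-weighted
        w = [wi * math.exp(-alpha * label * stump_predict(row, j, thr, pol))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, thr, pol) for alpha, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1
```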

Phishing Websites Detection by Using Optimized Stacking Ensemble Model

Computer Systems Science and Engineering

Phishing attacks are security attacks that affect not only individuals' or organizations' websites but also Internet of Things (IoT) devices and networks. The IoT environment is highly exposed to such attacks. Attackers may use thingbot software to distribute hidden junk emails that users do not notice. Machine learning, deep learning, and other methods have been used to design detection methods for these attacks; however, detection accuracy still needs to be improved. This study proposes an optimized ensemble classification method for phishing website (PW) detection. A Genetic Algorithm (GA) was used to optimize the proposed method by tuning the parameters of several ensemble Machine Learning (ML) methods, including Random Forest (RF), AdaBoost (AB), XGBoost (XGB), Bagging (BA), GradientBoost (GB), and LightGBM (LGBM). The optimized classifiers were then ranked, and the best ones were picked as the base of the proposed method. A PW dataset made up of 4898 PWs and 6157 legitimate websites (LWs) was used for this study's experiments. As a result, detection accuracy improved, reaching 97.16 percent.
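As a sketch of how a genetic algorithm can tune classifier hyperparameters, the Python code below evolves candidate settings with truncation selection, uniform crossover, random-reset mutation, and elitism. The search space (`n_estimators`, `max_depth`), the operators, and the population settings are assumptions, not the paper's configuration. The `fitness` argument stands in for what a real run would compute, such as the cross-validated accuracy of a Random Forest or XGBoost model trained with those parameters.

```python
import random

# Hypothetical integer search space for an ensemble classifier's hyperparameters.
SPACE = {
    "n_estimators": (10, 500),
    "max_depth": (1, 20),
}


def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from either parent with probability 0.5
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SPACE}


def mutate(ind, rng, rate=0.2):
    # Random-reset mutation: each gene is redrawn from its range with probability `rate`
    child = dict(ind)
    for k, (lo, hi) in SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def genetic_search(fitness, pop_size=20, generations=30, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:elite]          # elitism: best individuals survive unchanged
        parents = pop[: pop_size // 2]  # truncation selection: top half may reproduce
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            next_pop.append(mutate(crossover(a, b, rng), rng))
        pop = next_pop
    return max(pop, key=fitness)
```

With a toy fitness such as `lambda p: -abs(p["n_estimators"] - 300) - 10 * abs(p["max_depth"] - 6)`, elitism guarantees the best score never decreases across generations. Because fitness is recomputed for surviving individuals every generation, a real run, where each evaluation trains a model, would cache scores per individual.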

Detection of Phishing URL using Ensemble Learning Techniques

2020

Phishing is one of the prevailing means of performing cyber-attacks. Spoofed emails, social media, and cloned websites are the main media phishers use to steal individuals' private information. Uniform Resource Locators (URLs) are the main vehicle for spreading malware, trojans, and false information. Therefore, accurately distinguishing legitimate URLs from phishing URLs is very important. Traditional methods for detecting phishing URLs mainly relied on blacklisting and signature-based methods. Both are time-consuming and cannot work effectively on new URLs. Many machine learning classifiers have also been used to classify URLs as phishing or legitimate, but traditional machine learning approaches have achieved low accuracy. Therefore, in this work we propose the use of ensemble learning methods: we used the Bagging, AdaBoost, Random Forest, and Gradient Boosting algorithms. La...
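Of the methods listed, bagging is the simplest to show end to end. The sketch below trains each base learner on a bootstrap sample (drawn with replacement, the same size as the training set) and combines the learners by majority vote. It assumes 0/1 labels and uses a one-feature decision stump as a stand-in base learner; the paper's base learners and settings are not given in the abstract. Random Forest follows the same bagging pattern, using full decision trees with a random feature subset considered at each split.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Stand-in base learner: best single-feature threshold rule by training accuracy (labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for hi_label in (0, 1):
                correct = sum((hi_label if row[j] >= thr else 1 - hi_label) == label
                              for row, label in zip(X, y))
                if best is None or correct > best[0]:
                    best = (correct, j, thr, hi_label)
    _, j, thr, hi_label = best
    return lambda row: hi_label if row[j] >= thr else 1 - hi_label


def bagging(X, y, n_estimators=15, seed=0):
    """Fit one base learner per bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Majority vote; an odd number of models avoids ties for binary labels."""
    return Counter(m(row) for m in models).most_common(1)[0][0]
```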

URL Based Phishing Website Detection by Using Gradient and Catboost Algorithms

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

Phishing is one of the most common and most dangerous cybercrimes. These attacks aim to steal the information that individuals and organizations use to conduct transactions. Phishing websites contain various hints in their content and in web browser-based information. The existing system uses the Random Forest algorithm. In our proposed system, we use different classification algorithms, namely bagging and boosting algorithms such as Gradient Boosting and CatBoost, to increase accuracy. The features are extracted based on the website features in the UC Irvine Machine Learning Repository. We compared the performance of the boosting algorithms (Gradient Boosting and CatBoost) with that of Random Forest. From this performance analysis, we determine the algorithm best suited to detecting phishing websites. This design is considered applicable to automated systems that require high-performing classification of phishing websites.
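The abstract does not say which metrics the performance analysis used. A common choice is accuracy, precision, recall, and F1 computed from the confusion matrix with phishing as the positive class; the helpers below are a minimal sketch under that assumption. `y_true` and `y_pred` would come from each model's predictions on a held-out test set, and the model names in the test are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` (phishing) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def best_model(predictions, y_true, metric="f1"):
    """Name of the model whose predictions score highest on the chosen metric."""
    return max(predictions,
               key=lambda name: classification_metrics(y_true, predictions[name])[metric])
```

Recall matters particularly here, since a missed phishing site (false negative) is usually costlier than a false alarm.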

A Machine Learning Approach to Identifying Phishing Websites: A Comparative Study of Classification Models and Ensemble Learning Techniques

ICST Transactions on Scalable Information Systems

Phishing assaults are one of the more prevalent types of cybercrime in the world today. To steal information, attackers send users emails and messages, and they also use websites. Phishing primarily targets corporate websites, such as those for e-commerce, finance, and governmental organizations. In phishing, attackers impersonate websites in order to obtain sensitive user information. In addition to exploring the use of machine learning algorithms to identify and stop web phishing assaults, this research suggests using machine learning techniques to detect phishing URLs by analysing various aspects of the URLs. The study uses classification models such as Logistic Regression, Random Forest, Decision Trees, KNN, Naive Bayes, and SVM, along with ensemble learning techniques such as Gradient Boosting, XGBoost, Histogram Gradient Boosting, Light Gradient Boosting, and AdaBoost, to detect phishing websites.
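Comparing this many classifiers fairly requires evaluating each one on the same data splits. The abstract does not state the evaluation protocol; k-fold cross-validation is one standard option, sketched below in pure Python. `fit` and `predict` are hypothetical callables that would wrap any of the listed models.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for k folds over a shuffled range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Mean test-fold accuracy; the same seed gives every model identical splits."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Reusing one `seed` for every model means differences in the averaged scores reflect the models rather than luckier or harder splits.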

Detection of Phishing Website using XG – Boost Algorithm

International Journal of Advanced Research in Science, Communication and Technology, 2024

Phishing is a social engineering attack, carried out through email or websites, in which a cybercriminal attempts to steal valuable data or personal information. Many methods are available to detect phishing, but new ones are still being introduced to increase detection accuracy and reduce the success of phishing websites in stealing information. Phishing is generally detected with machine learning methods using various kinds of algorithms. In this study, our aim is to use machine learning to detect phishing websites. We used a Kaggle dataset of 11,430 URLs described by 86 features, half of them phishing and half legitimate. We trained our data using
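In practice, a model like this would be trained with the `xgboost` library (for example `xgboost.XGBClassifier`). To show what such a library does internally, the dependency-free sketch below applies the same second-order idea to depth-1 trees: each round computes the gradient and Hessian of the log loss, picks the split with the largest regularized gain, and sets each leaf weight to -G / (H + lambda), scaled by a learning rate `eta`. It omits deeper trees, column subsampling, missing-value handling, and the other features of the real library, and the hyperparameter defaults are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def best_split(X, g, h, lam, gamma):
    """Depth-1 split with the largest regularized gain, or None if no gain is positive."""
    G, H = sum(g), sum(h)
    best_gain, best = 0.0, None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            GL = sum(gi for row, gi in zip(X, g) if row[j] < thr)
            HL = sum(hi for row, hi in zip(X, h) if row[j] < thr)
            GR, HR = G - GL, H - HL
            gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                          - G ** 2 / (H + lam)) - gamma
            if gain > best_gain:
                best_gain, best = gain, (j, thr, GL, HL, GR, HR)
    return best


def fit_boosted_stumps(X, y, n_rounds=20, eta=0.3, lam=1.0, gamma=0.0):
    """Second-order boosting of depth-1 trees on log loss; y holds 0 (legitimate) / 1 (phishing)."""
    margins = [0.0] * len(X)  # raw scores; sigmoid(0) = 0.5 acts as the base prediction
    trees = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # gradient of log loss w.r.t. the margin
        h = [pi * (1.0 - pi) for pi in p]      # Hessian of log loss w.r.t. the margin
        split = best_split(X, g, h, lam, gamma)
        if split is None:
            break  # no split improves the regularized objective
        j, thr, GL, HL, GR, HR = split
        w_left = -eta * GL / (HL + lam)   # shrunken optimal leaf weights
        w_right = -eta * GR / (HR + lam)
        trees.append((j, thr, w_left, w_right))
        margins = [m + (w_left if row[j] < thr else w_right) for m, row in zip(margins, X)]
    return trees


def predict_proba(trees, row):
    """Estimated probability that `row` is phishing."""
    margin = sum(wl if row[j] < thr else wr for j, thr, wl, wr in trees)
    return sigmoid(margin)
```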

An Optimized Stacking Ensemble Model for Phishing Websites Detection

Electronics

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites—the Phishing Websites Data Set from U...
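The rank-then-stack step can be sketched as follows. Assumptions: each base model's predicted phishing probabilities on a validation set are already available (in a careful setup these would be out-of-fold predictions, so the meta-learner never sees probabilities produced on a model's own training data), models are ranked by accuracy, and the meta-learner is a logistic regression fit by batch gradient descent. The model names and all of these choices are illustrative; the abstract does not specify the meta-learner.

```python
import math


def accuracy(probs, y):
    """Fraction of samples where thresholding the probability at 0.5 matches the 0/1 label."""
    return sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)


def select_top_models(base_probs, y, k=3):
    """Rank base models by validation accuracy and keep the k best (ties keep input order)."""
    ranked = sorted(base_probs, key=lambda name: accuracy(base_probs[name], y), reverse=True)
    return ranked[:k]


def fit_meta_learner(base_probs, names, y, lr=1.0, epochs=1000):
    """Logistic regression over the selected models' probabilities, by batch gradient descent."""
    w, b, n = [0.0] * len(names), 0.0, len(y)
    for _ in range(epochs):
        gw, gb = [0.0] * len(names), 0.0
        for i in range(n):
            x = [base_probs[name][i] for name in names]
            z = b + sum(wk * xk for wk, xk in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y[i]  # derivative of log loss w.r.t. z
            gw = [gk + err * xk / n for gk, xk in zip(gw, x)]
            gb += err / n
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b


def meta_predict(w, b, x):
    """Stacked probability of phishing from the selected base models' outputs x."""
    z = b + sum(wk * xk for wk, xk in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```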