Phish Catch: Machine Learning Way of Detecting Phishing Websites

Prediction of phishing websites using machine learning

Spatial Informing Research, 2022

With the growing popularity of information science, more applications are being integrated with websites that can be accessed directly through the internet. This has increased the opportunity for attackers to steal personal information. Several strategies have been proposed to identify phishing attacks, but there is still room for improvement. The objective of this research paper is to develop a more accurate prediction model using Decision Tree (DT), Random Forest (RF), and Gradient Boosting Classifier (GBC) models with three feature selection techniques: Extra Tree (ET), Chi-Square, and Recursive Feature Elimination (RFE). Because the phishing websites dataset contains 89 features, we first applied the extra tree and chi-square feature selection methods to identify a limited set of important features, and then used recursive feature elimination to reduce the dataset to an optimal set of important features. We compared the performance of the models built with these machine learning algorithms and found the best prediction performance with GBC, followed by RF and DT. These models capture the trends across the various phishing cases, with performance reported in each case in terms of R-square, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).
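To make the chi-square step concrete, the following is a minimal sketch of how features can be ranked by their dependence on the class label. It is not the paper's implementation: it assumes binary (0/1) features and labels, computes the Pearson chi-square statistic on a 2x2 contingency table, and uses hypothetical feature names and toy data.

```python
from typing import Dict, List, Sequence


def chi_square_binary(feature: Sequence[int], label: Sequence[int]) -> float:
    """Pearson chi-square statistic for a 0/1 feature against a 0/1 label."""
    n = len(feature)
    # observed[f][y] counts samples with feature value f and label y.
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, label):
        observed[f][y] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            # Expected count if the feature and the label were independent.
            expected = row[f] * col[y] / n
            if expected > 0:
                stat += (observed[f][y] - expected) ** 2 / expected
    return stat


def rank_features(columns: Dict[str, List[int]], label: List[int]) -> List[str]:
    """Return feature names sorted from highest to lowest chi-square score."""
    scores = {name: chi_square_binary(col, label) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): 1 = phishing, 0 = legitimate.
label = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_ip_in_url": [1, 1, 1, 0, 0, 0, 0, 0],
    "uses_https": [1, 0, 1, 0, 1, 0, 1, 0],
}
ranking = rank_features(columns, label)
```

In this toy data `has_ip_in_url` tracks the label closely while `uses_https` is independent of it, so the former ranks first. A selection step would keep the top-scoring features and pass them on to a wrapper method such as recursive feature elimination.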

Machine Learning-Based Phishing Attack Detection

International Journal of Advanced Computer Science and Applications, 2020

This paper explores machine learning techniques and evaluates their performance when trained on datasets whose features can differentiate a phishing website from a safe one. The ability to tell these sites apart is vital in modern-day internet surfing. As more and more of our resources shift online, a single vulnerability or leak of sensitive information could bring down an entire connected network. The objective of this research is to highlight the best technique for identifying one of the most commonly occurring cyberattacks, and thus to allow faster identification and blacklisting of such sites, leading to a safer and more secure web surfing experience for everyone. To achieve this, we describe each of the techniques we examine in detail and use different evaluation techniques to portray their performance visually. After comparing all of these techniques against each other, we conclude that the Random Forest Classifier works best for phishing website detection.

Machine Learning-Based Phishing Detection

IRJET, 2023

Millions of users around the world are now connected by the internet, and as a result, users' reliance on this platform for data browsing, online transactions, and information downloads has grown. Cybersecurity is a term for a collection of technologies and procedures used to safeguard software and hardware against intrusion, harm, and attacks. DoS attacks, Man-in-the-Middle attacks, phishing attacks, and SQL injection attacks are some of the most frequently seen cybersecurity threats. Over the past few years, there has been an uptick in consumers losing access to their highly sensitive and private information. Fraudsters use such methods to trick their victims in an effort to steal personal information, including usernames, passwords, bank account information, and credit card information. Attacks against users are frequently delivered via spoofed emails, illegitimate websites, malware, and similar channels. To handle complicated and massive amounts of data, a structured automated technique is necessary. According to research, the most common and effective approach to this issue is machine learning. The most widely used machine learning methods include neural networks, decision trees, logistic regression, and support vector machines (SVM). In this study, a group of deep learning and machine learning models will be trained to identify phishing websites.

Phishing Detection with Machine Learning

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

The goal of our project is to implement a machine learning solution to the problem of detecting phishing and malicious web links. The end result of our project will be a software product that uses a machine learning algorithm to detect malicious URLs. Phishing is the technique of extracting credentials and sensitive data from users by masquerading as a genuine website. In phishing, the user is presented with a mirror website that is identical to the legitimate one but contains malicious code to extract user credentials and send them to phishers. Phishing attacks can lead to huge financial losses for customers of banking and financial services. The traditional approach to phishing detection has been either to use a blacklist of known phishing links or to heuristically evaluate the attributes of a suspected phishing page to detect the presence of malicious code. The heuristic function relies on trial and error to define the threshold that is used to separate malicious links from benign ones. The drawback of this approach is poor accuracy and low adaptability to new phishing links. We plan to use machine learning to overcome these drawbacks by implementing several classification algorithms and comparing their performance on our dataset. We will test algorithms such as Logistic Regression, SVM, Decision Trees, and Neural Networks on a dataset of phishing links from the UCI Machine Learning Repository and pick the best model to develop a browser plugin, which can be published as a browser extension.
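The traditional approach described in this abstract, a blacklist lookup followed by a hand-tuned heuristic threshold, might look roughly like the sketch below. The blacklist entry, the suspicious keywords, the individual rules, and the threshold value are all hypothetical and chosen only for illustration; none of them come from the paper.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of known phishing hosts.
BLACKLIST = {"login-secure-bank.example"}

# Hypothetical keywords and threshold, the kind of values tuned by trial and error.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account")
THRESHOLD = 2


def heuristic_score(url: str) -> int:
    """Count how many simple warning signs the URL shows."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    score = 0
    if "@" in url:  # '@' can hide the real host from the reader
        score += 1
    if host.count(".") >= 4:  # unusually deep subdomain nesting
        score += 1
    if parsed.scheme != "https":  # no TLS
        score += 1
    if any(word in url.lower() for word in SUSPICIOUS_WORDS):
        score += 1
    return score


def classify(url: str) -> str:
    """Blacklist lookup first, then the thresholded heuristic score."""
    host = urlparse(url).hostname or ""
    if host in BLACKLIST:
        return "phishing"
    return "phishing" if heuristic_score(url) >= THRESHOLD else "benign"
```

A lookalike URL such as `https://paypa1.example/home` is on no blacklist and triggers none of these rules, so the sketch labels it benign. That is the low adaptability to new phishing links that the abstract cites as the motivation for a learned classifier.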

Phishing Detection Based on Machine Learning and Feature Selection Methods

With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There are a considerable number of web pages serving different purposes, but not all of these sites are legitimate. So-called phishing sites deceive users in order to serve their operators' interests. This paper addresses this problem using machine learning algorithms together with a novel phishing detection dataset that contains 5000 legitimate web pages and 5000 phishing ones. To obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset to improve the efficiency of the models. The best result in the experiment was achieved by using 20 of the 48 features with the Random Forest algorithm, giving an accuracy of 98.11%.

Phishing Website Detection Using Machine Learning: A Review

Wasit Journal for Pure sciences

Phishing, a form of cyber attack in which perpetrators employ fraudulent websites or emails to deceive individuals into divulging sensitive information such as passwords or financial data, can be mitigated through various machine learning algorithms for website detection. These algorithms, including decision trees, support vector machines, and Random Forest, analyze multiple website features, such as URL structure, website content, and the presence of specific keywords or patterns, to ascertain the likelihood of a website being a phishing site. This comprehensive review elucidates the concept of phishing website detection and the diverse techniques employed while summarizing previous studies, their outcomes, and their contributions. Overall, machine learning algorithms serve as a potent tool in the identification of phishing websites, thereby safeguarding users against falling prey to such malicious attacks.

A comparison study of machine learning techniques for phishing detection

Journal of Business and Information Systems (e-ISSN: 2685-2543)

In the last few years, phishing attacks have been increasing steadily. As the internet develops, securing it is becoming a challenging task, and cyber-attacks and threats are increasing rapidly. Many fake websites are now created to deceive victims by collecting their login credentials, bank details, and similar data. Many anti-phishing products have been launched on the market that use blacklists, heuristics, and visual and machine learning-based approaches, but these products cannot prevent all phishing attacks. Moreover, although many studies predict phishing URLs, only a few compare machine learning techniques for predicting phishing. The present study compares the predictive accuracy of several machine learning methods, including Decision Tree, Random Forest, Multilayer Perceptron, Support Vector Machine, and XGBoost, for predicting phishing URLs.

Detection of Phishing Websites using Machine Learning Techniques

IJCSIS Vol 18 No. 7 July Issue, 2020

With the growing integration of the Internet and social life, the Internet is changing how people learn and work, but it also exposes us to increasingly serious security threats. How to recognize various network attacks, especially attacks not seen previously, is a key issue that must be solved urgently. The aim of phishing website URLs is to collect personal data such as user names, passwords, and online banking transactions. Phishers use websites that are visually and semantically similar to genuine websites. Since most users go online to access the services provided by government and financial institutions, there has been a significant increase in phishing attacks in the last few years. Machine learning is a powerful tool for combating phishing attacks, and there are several methods or approaches to identifying phishing websites. The main aim of this paper is to implement a system with high efficiency and accuracy in a cost-effective way. The project is implemented using four supervised ML classification models: K-Nearest Neighbor, Kernel Support Vector Machine, Decision Tree, and Random Forest classifier. The Random Forest classifier was found to be the most accurate for the chosen dataset, with an accuracy score of 96.82%. Keywords: Machine Learning, classification, Cyber security, Phishing, KNN, Kernel SVM, Decision Tree, Random Forest Classifier

Efficient prediction of phishing websites using supervised learning algorithms

Procedia Engineering, 2012

Phishing is one of the luring techniques used by phishing artists with the intention of exploiting the personal details of unsuspecting users. A phishing website is a mock website that looks similar in appearance to a legitimate one but leads to a different destination. Unsuspecting users submit their data thinking that these websites come from trusted financial institutions. New anti-phishing techniques emerge continuously, but phishers respond with new techniques that break the existing anti-phishing mechanisms. Hence, there is a need for an efficient mechanism for predicting phishing websites. This paper employs machine learning to model the prediction task, and the supervised learning algorithms multilayer perceptron, decision tree induction, and Naïve Bayes classification are used to explore the results. It was observed that the decision tree classifier predicts phishing websites more accurately than the other learning algorithms.

Phishing Websites Detection using Machine Learning

International Journal of Advanced Computer Science and Applications

Tremendous resources are spent by organizations guarding against and recovering from cybersecurity attacks by online hackers who gain access to sensitive and valuable user data. Many cyber infiltrations are accomplished through phishing attacks where users are tricked into interacting with web pages that appear to be legitimate. In order to successfully fool a human user, these pages are designed to look like legitimate ones. Since humans are so susceptible to being tricked, automated methods of differentiating between phishing websites and their authentic counterparts are needed as an extra line of defense. The aim of this research is to develop these methods of defense utilizing various approaches to categorize websites. Specifically, we have developed a system that uses machine learning techniques to classify websites based on their URL. We used four classifiers: the decision tree, Naïve Bayesian classifier, support vector machine (SVM), and neural network. The classifiers were tested with a data set containing 1,353 real world URLs where each could be categorized as a legitimate site, suspicious site, or phishing site. The results of the experiments show that the classifiers were successful in distinguishing real websites from fake ones over 90% of the time.
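Classifying a website from its URL alone requires turning the URL string into numeric features before any of the classifiers above can be trained. The sketch below shows one way this could be done with lexical features. The specific feature set and names are assumptions made for illustration; the abstract does not list the features the authors used.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IP address instead of a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url: str) -> dict:
    """Map a URL to a small dictionary of numeric lexical features (hypothetical set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "host_is_ip": int(host_is_ip(host)),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }


example = url_features("http://192.168.0.1/login")
```

Each URL in a labeled dataset would be converted this way, and the resulting feature rows, paired with their legitimate, suspicious, or phishing labels, would form the training input for a decision tree, Naïve Bayes, SVM, or neural network model.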