Phishing Detection Based on Machine Learning and Feature Selection Methods
Related papers
Phishing Detection Based on Machine Learning and Feature Selection Methods
International Journal of Interactive Mobile Technologies (iJIM), 2019
With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There are a considerable number of web pages offering different benefits, but not all of these sites are legitimate: so-called phishing sites deceive users in order to serve the attackers' interests. This paper addresses the problem using machine learning algorithms together with a novel phishing detection dataset containing 5000 legitimate web pages and 5000 phishing ones. To obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset to improve the efficiency of the models. The best result was achieved by using 20 of the 48 features with the Random Forest algorithm, giving an accuracy of 98.11%.
Phishing Detection: A Hybrid Model with Feature Selection and Machine Learning Techniques
International Journal of Experimental Research and Review, 2023
Phishing problems in cyberspace are increasing as information technology progresses. Phishing is one of the most prominent cyber-attacks rooted in social engineering. This malicious activity aims to deceive individuals into divulging sensitive information, including credit card details, login credentials, and passwords. The main contribution of this research is finding the best outcome across various machine learning (ML) techniques. This paper uses an Extra Tree Classifier (ETC), Forward Selection, Pearson correlation, a Logit-LR model, and Principal Component Analysis for feature selection. The Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), K-Nearest Neighbor (K-NN), Support Vector Machine (SVM), Random Forest (RF), AdaBoost, and Bagging classifiers are used to develop the phishing detection model. We have studied the model in four cases. Case 1 uses the 6 features commonly selected by ET, forward selection, and Pearson's correlation; case 2 uses 25 features selected by the logit model; case 3 uses all features; and case 4 uses principal component analysis (3 and 5 components). We find the highest accuracy, 97.3%, in case 2 with the random forest model.
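As an illustration of the Pearson-correlation filter named in this abstract, the following is a minimal sketch, not the authors' implementation. It ranks features by the absolute value of their correlation with the class label. The feature names and the toy data are hypothetical and are not taken from the paper's dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)

def rank_features(columns, labels):
    """Return feature names sorted by |correlation with the label|, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data with hypothetical feature names: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "has_ip_in_url": [1, 1, 1, 0, 0, 0],        # perfectly aligned with the label
    "url_length":    [90, 75, 60, 40, 62, 30],  # longer URLs lean towards phishing
    "uses_https":    [1, 0, 1, 0, 1, 0],        # only weakly related in this toy set
}
ranking = rank_features(columns, labels)
print(ranking)
```

A filter of this kind scores each feature independently of any classifier, which is why it is cheap to run before the wrapper-style methods (such as forward selection) that the abstract also lists.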
Phishing Website Detection Using Effective Classifiers and Feature Selection Techniques
2019
Phishing is a relatively new form of network attack in which a web page illegitimately prompts users for financial data, personal data, or passwords. This jeopardizes the privacy of many users, and consequently research is ongoing to find detection tools and to improve existing solutions. Classifiers based on machine learning can detect phishing websites effectively; therefore, several machine learning classification algorithms, namely Naive Bayes, J48, and HNB, are implemented and compared in this research. In addition, the performance of a classifier combining HNB and J48 was closely observed as a solution to the stated problem. The study proposes a novel manual feature selection approach and presents a comparative study with filter-method feature selection techniques. The dataset used in this research was collected from the UCI machine learning repository and has 2670 instances and 30 attributes of website structure. The empirical result indicated...
Prediction of phishing websites using machine learning
Spatial Informing Research, 2022
With the growing popularity of information science, more applications are being integrated with websites that can be accessed directly through the Internet. This has increased the opportunity for malicious actors to steal personal information. Several strategies have been presented for identifying a phishing attack, but there is still room for progress in the fight against phishing. The objective of this research paper is to develop a more accurate prediction model using Decision Tree (DT), Random Forest (RF), and Gradient Boosting Classifiers (GBC) with three feature selection techniques: Extra Tree (ET), Chi-Square, and Recursive Feature Elimination (RFE). Since the phishing websites dataset contains 89 features, we applied the extra tree and chi-square feature selection methods to identify a limited set of important features, and then used recursive feature elimination to reduce the dataset to an optimal set of important features. We compared the performance of the developed models and found the best prediction performance with GBC, followed by RF and DT. These models capture the trends across various cases of phishing, with R-square, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) reported in each case.
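The chi-square filter mentioned in this abstract can be sketched as follows. This is the classic contingency-table statistic for a binary feature against a binary label, written as an assumption about the general technique; the abstract does not state which variant or library the authors used, and the toy vectors are invented for illustration.

```python
def chi_square_score(feature, labels):
    """Chi-square statistic of independence between a binary feature and a binary label."""
    n = len(labels)
    # Observed counts for each (feature value, label value) cell of the 2x2 table.
    observed = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, labels):
        observed[(f, y)] += 1

    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for y in (0, 1):
            col_total = observed[(0, y)] + observed[(1, y)]
            # Count expected in this cell if feature and label were independent.
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score

# Toy example: 1 = phishing, 0 = legitimate.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # matches the label exactly
uninformative = [1, 0, 1, 0]  # independent of the label
print(chi_square_score(informative, labels))
print(chi_square_score(uninformative, labels))
```

A higher score means the feature's distribution differs more between the two classes, so features can be sorted by this score and the top ones passed on to a later stage such as recursive feature elimination.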
Phishing Webpage Detection using Feature Selection Methods
Deleted Journal, 2024
Phishing attacks are increasing rapidly around the globe, which makes efficient phishing detection methods vital. The available datasets are generally voluminous, with a vast number of features. Furthermore, many of these features are redundant or irrelevant and do not substantially help in determining the final outcome. It is therefore necessary to identify and eliminate those features to reduce resource use and time. This paper proposes two phishing detection techniques: one incorporates an ensemble feature reduction method, and the other incorporates a feature reduction method based on average weight. Both help to eliminate irrelevant features and produce a compact subset of features for identifying phishing attacks. The two methods are based on correlation, chi-square, gain ratio, and information gain. The system uses a Random Forest classifier, which outperforms the other classifiers. A comparison of the two methods is provided, and the best method is determined by taking factors such as accuracy and computational time into consideration. The Phishing Webpage dataset is taken from Mendeley Data.
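One plausible reading of a feature reduction method "based on average weight" is sketched below. The abstract does not give the exact procedure, so the min-max normalisation, the averaging rule, the threshold, and all feature names and scores here are assumptions made purely for illustration.

```python
def min_max(scores):
    """Rescale one method's scores to [0, 1] so different methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}

def average_weight_select(score_tables, threshold):
    """Average normalised scores across methods; keep features at or above threshold."""
    normalised = [min_max(table) for table in score_tables]
    names = list(normalised[0])
    average = {
        name: sum(table[name] for table in normalised) / len(normalised)
        for name in names
    }
    kept = [name for name in names if average[name] >= threshold]
    return kept, average

# Hypothetical scores from two filter methods for three hypothetical features.
info_gain = {"has_ip_in_url": 0.8, "url_length": 0.4, "num_dots": 0.0}
chi_square = {"has_ip_in_url": 30.0, "url_length": 10.0, "num_dots": 20.0}

kept, average = average_weight_select([info_gain, chi_square], threshold=0.5)
print(kept)
print(average)
```

Normalising before averaging prevents a method with a large numeric range (chi-square here) from dominating one with a small range (information gain).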
Feature Selection for Improved Phishing Detection
Lecture Notes in Computer Science, 2012
Phishing, a hotbed of a multibillion-dollar underground economy, has become an important cybersecurity problem. The centralized blacklist approach used by most web browsers usually fails to detect zero-day attacks, leaving ordinary users vulnerable to new phishing schemes; therefore, machine learning based approaches have been implemented for phishing detection. Many existing techniques for phishing website detection seem to include as many features as can be conceived, while identifying a relevant and representative subset of features to construct an accurate classifier remains an interesting issue in this particular application of machine learning. This paper evaluates correlation-based and wrapper-type feature selection techniques using real-world phishing data sets with 177 initial features. Experimental results show that applying an effective feature selection procedure generally results in statistically significant improvements in the classification accuracies of, among others, Naïve Bayes, Logistic Regression, and Random Forests, in addition to improved efficiency in training time.
Detection of Phishing Websites using Machine Learning Techniques
IJCSIS Vol 18 No. 7 July Issue, 2020
With the growing integration of the Internet into public life, the Internet is changing how people learn and work, but it also exposes us to increasingly serious security threats. How to recognize different network attacks, especially attacks not seen before, is a key issue that must be resolved urgently. The aim of phishing website URLs is to collect personal data such as user names, passwords, and online banking transactions. Phishers use websites that are visually and semantically similar to genuine sites. Since a large share of users go online to access services provided by government and financial institutions, there has been a significant increase in phishing attacks in the last few years. Machine learning is a useful tool for fighting phishing attacks, and there are several approaches to identifying phishing sites. The main aim of this paper is to implement a system that is efficient, accurate, and cost-effective. The work is implemented using four supervised ML classification models: K-Nearest Neighbor, Kernel Support Vector Machine, Decision Tree, and Random Forest. The Random Forest classifier was found to be the most accurate for the chosen dataset, with an accuracy score of 96.82%. Keywords: Machine Learning, classification, Cyber security, Phishing, KNN, Kernel SVM, Decision Tree, Random Forest Classifier
IJERT-Detection of Phishing Websites using an Efficient Machine Learning Framework
International Journal of Engineering Research and Technology (IJERT), 2020
https://www.ijert.org/detection-of-phishing-websites-using-an-efficient-machine-learning-framework https://www.ijert.org/research/detection-of-phishing-websites-using-an-efficient-machine-learning-framework-IJERTV9IS050888.pdf Phishing is one of the most commonly known attacks, in which an intruder steals information from Internet users. Users lose sensitive information such as protected passwords, personal information, and transaction details to the intruders. The attack is normally carried out by manipulating and masking legitimate, frequently used websites in order to gather users' personal information. The intruders then use that information and can manipulate transactions for their own gain. The literature contains various anti-phishing techniques proposed by various authors, including blacklist or whitelist, heuristic, and visual similarity based methods. Despite these techniques, many users are still attacked by intruders who use phishing to gather their sensitive information. This paper proposes a novel machine learning based classification approach that uses heuristic features, with feature selection drawn from attributes such as the Uniform Resource Locator, source code, session, type of security involved, protocol used, and type of website. The proposed model has been evaluated using five machine learning algorithms: Random Forest, K-Nearest Neighbor, Decision Tree, Support Vector Machine, and Logistic Regression. Of these, Random Forest performs best, with an attack detection accuracy of 91.4%. Moreover, the Random Forest model uses orthogonal and oblique classifiers to select the best classifiers for accurate detection of phishing attacks on websites.
Phish Catch: Machine Learning Way of Detecting Phishing Websites
2020
With the advent of 4G technology, the Internet became available to the masses. Everyone started to use Internet services in different spheres of life, making them vulnerable to diverse threats. One of the primary risks for Internet users is phishing websites. Instead of breaching the security of systems, phishing websites try to fool users into giving away credentials that they are not supposed to share with anyone. In this study, we took 21 features and tried to predict the class of each website, i.e. legitimate or phishing, using a supervised learning algorithm. Index Terms: Phishing, Machine Learning, SVM, Decision Tree, Random Forest, Internet, Security
Intelligent Methods for Accurately Detecting Phishing Websites
With ongoing technological development, there is a massive number of websites with varying purposes. Within this large collection is a particular type, the so-called phishing sites, which aim to deceive their users. The main challenge in detecting phishing websites is discovering the techniques that have been used, because phishers continually improve their strategies and create web pages that can evade many forms of detection. It is therefore necessary to develop reliable, active, and up-to-date phishing detection methods to combat the adaptive techniques used by phishers. In this paper, different phishing detection approaches are reviewed by classifying them into three main groups. Then the proposed model is presented in two stages. In the first stage, different machine learning algorithms are applied to validate the chosen dataset, and feature selection methods are applied to it. The best accuracy, 98.11%, was achieved by using only 20 of the 48 features combined with Random Forest. In the second stage, the same dataset is applied to various fuzzy logic algorithms. The experimental results from the fuzzy logic algorithms were also strong: applying the FURIA algorithm with only five features gave an accuracy of 99.98%. Finally, the results of the machine learning algorithms and the fuzzy logic algorithms are compared and discussed; in these experiments, the fuzzy logic algorithms outperform the machine learning algorithms.