TF-IDF Feature-Based Spam Filtering of Mobile SMS Using Machine Learning Approach

SMS Spam Detection Using Machine Learning

Background: The number of people using mobile phones is increasing; hence, the number of SMS messages is also increasing day by day. Correspondingly, SMS spam messages and spam email are also increasing, as are the challenges of SMS spam detection, such as limited message size, use of local and shortcut words, and incomplete slogan information. These challenges need to be solved. Objectives: Efficient spam detection is an important tool for helping people classify whether a message is spam or not. In order to construct a model that is capable of distinguishing between legitimate and malicious Android applications, it provides a systematic approach to managing safety risks in operations. Methods: In this paper, we applied various machine learning techniques for SMS spam detection, including the SVM algorithm and Naïve Bayes, together with further algorithms and preprocessing methods applied to the datasets used for SMS spam detection. Statistical Analysis: The SVM and CNN algorithms are used to predict spam in the given datasets and to provide information about spammers. Applications: The model also shows details of the spammers whose spam messages were sent to end users. Improvements: The paper ends with implications and suggestions for future research, and more datasets are to be processed to obtain better spam details and information.
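TF-IDF features, named in the page title above, are a common input to the SVM and Naïve Bayes classifiers mentioned in this abstract; the abstract itself does not say how they are computed. The following is only a minimal sketch, assuming whitespace tokenization and a smoothed inverse document frequency. The function names `tokenize` and `tfidf_vectors` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real SMS preprocessing
    # would usually also handle punctuation, numbers and abbreviations.
    return text.lower().split()


def tfidf_vectors(messages):
    """Return (sorted vocabulary, list of {term: tf-idf weight}) for messages."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)

    # Document frequency: in how many messages does each term occur?
    df = Counter(term for doc in docs for term in set(doc))

    # Smoothed inverse document frequency (one of several common variants,
    # chosen here as an assumption).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        if total == 0:
            vectors.append({})
            continue
        # Term frequency is the term count divided by the message length.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return sorted(df), vectors
```

Terms that occur in few messages receive a larger weight than terms that occur in many, which is what makes the representation useful for short texts such as SMS.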

A Comparative Analysis of SMS Spam Detection Employing Machine Learning Methods

IEEE, 2022

In recent times, the increase in mobile phone usage has resulted in a huge number of spam messages. Spammers continuously apply new tricks, which makes managing or preventing spam messages a challenging task. The aim of this study is to detect spam messages in order to prevent various cybercrimes, as spam messages have become a security threat. In this paper, we build on previous studies of the SMS spam problem to achieve better accuracy using several different techniques such as Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Random Forest, Logistic Regression and some more. Our results indicated that Support Vector Machine achieved the highest accuracy of 99%, indicating it might be useful as an effective machine learning system for future research.

Contributions to the study of SMS spam filtering: new collection and results

2011

The growth of mobile phone users has led to a dramatic increase in SMS spam messages. In practice, fighting mobile phone spam is made difficult by several factors, including the lower rate of SMS that has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that Support Vector Machine outperforms the other evaluated classifiers and, hence, can be used as a good baseline for further comparison.

Towards SMS Spam Filtering: Results under a New Dataset

International Journal of Information Security Science, 2013

The growth of mobile phone users has led to a dramatic increase in SMS spam messages. Recent reports clearly indicate that the volume of mobile phone spam is dramatically increasing year by year. In practice, fighting such a plague is made difficult by several factors, including the lower rate of SMS that has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. Probably, one of the major concerns in academic settings is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, traditional content-based filters may have their performance seriously degraded since SMS messages are fairly short and their text is generally rife with idioms and abbreviations. In this paper, we present details about a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we offer a comprehensive analysis of this dataset in order to ensure that there are no duplicated messages coming from previously existing datasets, since duplicates may ease the task of learning SMS spam classifiers and could compromise the evaluation of methods. Additionally, we compare the performance achieved by several established machine learning techniques. In summary, the results indicate that the procedure followed to build the collection does not lead to near-duplicates and, regarding the classifiers, that Support Vector Machines outperform the other evaluated techniques and, hence, can be used as a good baseline for further comparison.

Spam Detection in SMS Using Machine Learning Through Text Mining

International Journal of Scientific & Technology Research, 2020

The growth in the number of mobile phone users has led to a dramatic increase in SMS spam messages. Although in many parts of the world the mobile messaging channel is currently regarded as "clean" and trusted, recent reports on the contrary clearly show that the volume of mobile phone spam is increasing dramatically year by year. It is a growing problem, especially in the Middle East and Asia. SMS spam filtering is a comparatively recent task for dealing with this problem. It inherits many concerns and quick fixes from email spam filtering. However, it faces its own specific issues and problems. This paper takes on the task of filtering mobile messages as Ham or Spam for Indian users by adding Indian messages to the globally available SMS dataset. The paper analyzes different machine learning classifiers on a large corpus of SMS messages for individuals.

SMS SPAM FILTERING FOR MODERN MOBILE DEVICES

This work examines solutions to the growing problem of spam and fraudulent messages that are prevalent in the mobile phone industry today. It begins with an examination of some common methods for detecting spam messages, such as the rule-based method and the statistical learning method using the Naive Bayes approach. This work specifically explores the Naive Bayes classifier for categorizing messages based on their resemblance to words that feature in other spam and non-spam messages in the training set, thereby reducing the number of spam messages that get through to the end user and completely eliminating false positives (messages that are misclassified as spam). Incorporated in the dataset for this project is the SMS Spam Corpus v.0.1 Big. It has 1,002 SMS ham (legitimate) messages and 322 spam messages. For the initial training and testing of the Naive Bayes classifier, the Python 2.7 interpreter, the Sublime Text editor, and Plotly (for data visualization and comparing results) were used, while the Java Development Kit, Android SDK, Android Studio and Android Emulator were used for deployment. It can be concluded that using a spam threshold of 0.7, along with adjustments to the Naive Bayes algorithm, produced some desirable results. In an attempt to improve on the method used in this work, we are currently working on how to use hybridized machine learning algorithms for detecting spam messages on mobile devices.
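The abstract mentions a Naive Bayes classifier with a spam threshold of 0.7 but does not give the "adjustments" the authors made. As an illustration only, the sketch below shows a plain multinomial Naive Bayes filter with Laplace smoothing (an assumption, not the authors' variant) that flags a message as spam when the estimated spam probability reaches the threshold. The class name `NaiveBayesSpamFilter` and its methods are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace smoothing and a decision threshold."""

    def __init__(self, threshold=0.7, alpha=1.0):
        self.threshold = threshold  # spam threshold quoted in the abstract
        self.alpha = alpha          # Laplace smoothing constant (assumed)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        # label must be "spam" or "ham"
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def spam_probability(self, text):
        # Requires at least one training message for each class.
        words = [w for w in text.lower().split() if w in self.vocab]
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long messages.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])

    def is_spam(self, text):
        return self.spam_probability(text) >= self.threshold
```

Raising the threshold above 0.5 trades some missed spam for fewer legitimate messages being blocked, which matches the abstract's emphasis on avoiding false positives.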

Spam SMS Filtering Using Support Vector Machines

Intelligent Data Communication Technologies and Internet of Things, 2021

In recent years, SMS spam messages have been increasing exponentially due to the increase in mobile phone users. The volume of mobile phone spam also grows every year, so filtering spam messages has become a key concern. At the same time, machine learning has become an attractive research area and has shown its capacity for data analysis. In this paper, two popular algorithms, Naive Bayes and support vector machine, are applied to SMS data. The SMS dataset is taken from Kaggle. A detailed analysis of the results is presented. Accuracies of 96.19% and 98.79% are observed for the chosen algorithms, respectively, for spam SMS detection.

A Survey of Emerging Techniques in Detecting SMS Spam

Transactions on Machine Learning and Artificial Intelligence, 2019

In the past years, spammers have focused their attention on sending spam through short message services (SMS) to mobile users. They have had some success because of the lack of appropriate tools to deal with this issue. This paper is dedicated to reviewing and studying the relative strengths of various emerging technologies for detecting spam messages sent to mobile devices. Machine learning methods and topic modelling techniques have been remarkably effective in classifying spam SMS. SMS spam detection suffers from the limited availability of SMS datasets and the small number of features in SMS messages. The various features extracted and datasets used by researchers are also discussed, along with some related issues. The most important measurements used by researchers to evaluate the performance of these techniques were recall, precision, accuracy and the CAP curve. In this review, the performance achieved by machine learning algorithms was compared, and we found that Naive Bayes and SVM produce effective performance.
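For reference, accuracy, precision and recall, three of the measurements named in this abstract, can be computed from predicted and true labels as in the sketch below. The function name `classification_metrics` and the choice of "spam" as the positive class are assumptions made for illustration; the CAP curve is not covered.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision and recall for a binary spam/ham task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    return {
        # Share of all messages that were labelled correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of messages flagged as spam that really were spam.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual spam messages that were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

On imbalanced SMS collections, where ham greatly outnumbers spam, accuracy alone can look high even when recall is poor, so the three values are best read together.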

Machine Learning SMS Spam Detection Model

2020

Millions of shillings are lost by mobile phone users every year in Kenya due to SMS spam, a social engineering technique that attempts to obtain sensitive information such as passwords, personal identification numbers and other details by masquerading as a trustworthy entity in electronic commerce. The design of efficient fraud detection algorithms and techniques is key to reducing these losses. Fraud detection using machine learning is a new approach to detecting fraud, especially in mobile commerce. The design of fraud detection techniques on a mobile platform is challenging due to the non-stationary distribution of the data. Most machine learning techniques, especially for SMS spam, deal with one language. It is against this background that the study focuses on client-side SMS spam detection on mobile devices in Kenya using machine learning. The Naive Bayes algorithm was used for this purpose because it is highly scalable in text classification. The contributors of Corpus include mobile service prov...

Mobile SMS Spam Detection using Machine Learning Techniques

Journal of Emerging Technologies and Innovative Research, 2018

Spam SMS are unwanted messages to users, which are annoying and sometimes harmful. There are a number of survey papers available on SMS spam detection techniques. We studied and reviewed the techniques, approaches and algorithms they used, their advantages and disadvantages, evaluation measures, discussion of datasets and, lastly, a comparison of the studies' results. Although SMS spam detection techniques are more demanding than email spam detection techniques because of local content and the use of shortened words, unfortunately none of the existing research addresses these challenges. There is enormous scope for future research in this area, and this survey can act as a reference point for the future direction of research.