Deep Learning Algorithm Models for Spam Identification on Cellular Short Message Service
Related papers
SMS Spam Detection Based on Long Short-Term Memory and Gated Recurrent Unit
International Journal of Future Computer and Communication, 2019
SMS spam consists of messages that attackers craft and send to mobile-phone users in order to obtain their sensitive information. Unsuspecting recipients who follow the instructions in such a message and enter sensitive details, such as internet-banking credentials, into a fake website or application may hand that information to the attacker and lose money as a result. Efficient spam detection is therefore an important tool for helping people determine whether an SMS is spam. In this research, we propose a novel SMS spam detection method based on a case study of English-language SMS spam, using Natural Language Processing and Deep Learning techniques. To prepare the data for model development, we apply word tokenization, padding, truncation, and word embedding, which represents each word as a multi-dimensional vector. These data are then used to develop models based on the Long Short-Term Memory and Gated Recurrent Unit algorithms. The performance of the proposed models ...
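The preprocessing steps named in this abstract (tokenization, padding, truncation, and word embedding) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the regular-expression tokenizer, the reserved ids 0 (padding) and 1 (out-of-vocabulary), all helper names, and the random, untrained embedding table are hypothetical choices made for the example.

```python
import random
import re
from collections import Counter

PAD_ID = 0  # hypothetical: id reserved for padding
OOV_ID = 1  # hypothetical: id reserved for out-of-vocabulary words


def tokenize(text: str) -> list:
    """Lower-case a message and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(messages, max_words=1000):
    """Map the most frequent words to integer ids, starting after the reserved ids."""
    counts = Counter(tok for msg in messages for tok in tokenize(msg))
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=8, padding="post", truncating="post"):
    """Convert a message to a fixed-length list of word ids."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)]
    if len(ids) > max_len:
        # "post" drops tokens from the end, "pre" drops them from the start.
        ids = ids[:max_len] if truncating == "post" else ids[-max_len:]
    pad = [PAD_ID] * (max_len - len(ids))
    return ids + pad if padding == "post" else pad + ids


def make_embedding_table(vocab_size, dim=4, seed=0):
    """Random (untrained) embedding table; in a real model these rows are learned."""
    rng = random.Random(seed)
    table = [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]
    table[PAD_ID] = [0.0] * dim  # keep the padding vector at zero
    return table


def embed(ids, table):
    """Look up one dense vector per word id."""
    return [table[i] for i in ids]


if __name__ == "__main__":
    vocab = build_vocab(["Win a free prize now", "Call me now"])
    ids = encode("Call now to claim your prize", vocab, max_len=5)
    vectors = embed(ids, make_embedding_table(len(vocab) + 2))
    print(ids, len(vectors), len(vectors[0]))
```

The embedded sequence (one vector per position) is the kind of input an LSTM or GRU layer consumes. If Keras is used, `pad_sequences` performs the padding and truncation step, but its `padding` and `truncating` arguments default to `"pre"`, whereas this sketch defaults to `"post"`.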
A Model for Filtering Spam SMS Using Deep Machine Learning Technique
IJARCCE, 2021
In recent years, the mobile phone market has grown substantially: a total of 432.1 million mobile devices were shipped in the second quarter of 2013, an increase of 6.0% year over year. As owning a mobile phone has become commonplace, the Short Message Service (SMS) has developed into a multi-billion-dollar business. The growing popularity of mobile platforms has also brought a surge in unsolicited commercial messages sent to phones by text, and this growth has attracted attackers, giving rise to the SMS spam problem. This study presents a model for SMS spam classification using deep machine learning techniques. The system uses a deep learning model (MLPNM) built with the TensorFlow and Keras frameworks to classify an SMS dataset containing 5574 messages. The dataset was read from a directory using the pandas.read_csv function and cleaned to ensure that no null values were present. The model has three dense layers, takes 8672 inputs, and produces 1 output; it was trained with a batch size of 32 (the batch size is stated to equal the total dataset, making the iteration and epoch counts equivalent) for 50 epochs. The trained model was saved and deployed to the web with Python Flask for easy access and testing, so that users can submit their own SMS messages. The front end was designed with the Bootstrap framework (HTML and CSS), and the back end was written in Python. In testing, the model classified input messages as ham (legitimate) or spam with an accuracy of 99.82%.
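The relationship between batch size, iterations, and epochs in this abstract can be checked with a short calculation. The sketch below assumes, for simplicity, that all 5574 messages are used for training (the abstract does not state a train/test split); the function names are hypothetical. The number of iterations equals the number of epochs only under full-batch training, where the batch size equals the training-set size; with mini-batches of 32, each epoch needs 175 gradient updates.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of gradient updates needed to see every sample once.

    The final batch may be smaller than batch_size, hence the ceiling.
    """
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples: int, batch_size: int, epochs: int) -> int:
    """Total gradient updates over a whole training run."""
    return epochs * iterations_per_epoch(num_samples, batch_size)


if __name__ == "__main__":
    n = 5574  # messages in the dataset, per the abstract
    print(total_iterations(n, 32, 50))  # mini-batches of 32 → 8750
    print(total_iterations(n, n, 50))   # full-batch training → 50
```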
A Novel Approach for Arabic SMS Spam Detection Using Hybrid Deep Learning Techniques
Procedia Computer Science
Spam detection in SMS communication is a crucial task for maintaining the quality of messaging services and protecting users from unwanted and potentially harmful messages. Arabic SMS spam detection poses unique challenges due to the rich morphology and complex structure of the Arabic language, which can significantly impact the performance of traditional text classification methods. To address these challenges, this paper presents a novel approach for Arabic SMS spam detection using a hybrid deep learning model that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. The proposed model leverages the strengths of CNNs in capturing local features and patterns in the text and the capability of Bi-LSTM networks to understand long-term dependencies and contextual information. This hybrid architecture is designed to effectively handle the complexities of the Arabic language and improve the accuracy of spam detection. The model was evaluated on a dataset of Arabic SMS messages, consisting of both ham (non-spam) and spam messages. The dataset underwent preprocessing steps, including text cleaning, tokenization, and padding, to prepare it for training the deep learning model. The hybrid model was trained using the Adam optimizer and evaluated using accuracy, precision, recall, and F1 score metrics. Early stopping was implemented to prevent overfitting during the training process. The results demonstrate that the hybrid model achieved high performance, with an accuracy of 0.9699, precision of 0.9739, recall of 0.9675, and an F1 score of 0.9707. These metrics indicate the model's effectiveness in accurately detecting spam in Arabic SMS messages. Additionally, the paper provides visualizations of the confusion matrix, ROC curve, and training-validation loss graph to illustrate the model's performance. The implications of this research are significant for the field of Arabic text classification and spam detection. 
The proposed hybrid model offers a robust solution for accurately classifying Arabic SMS messages, which can be integrated into messaging platforms to enhance spam detection capabilities and improve user experience. Future work could explore data augmentation techniques, transfer learning, and advanced hybrid architectures to further enhance the model's performance and applicability.
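The reported metrics can be related to one another with a few lines of code. This sketch assumes spam is the positive class; the function names are hypothetical. F1 is the harmonic mean of precision and recall, so plugging in the reported precision (0.9739) and recall (0.9675) gives about 0.9707, consistent with the reported F1 score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Spam is treated as the positive class: tp = spam flagged as spam,
    fp = ham flagged as spam, fn = spam missed, tn = ham passed through.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Consistency check against the reported precision and recall.
    print(round(f1_score(0.9739, 0.9675), 4))  # → 0.9707
```

Accuracy alone can hide poor spam recall when ham messages dominate a dataset, which is one reason to report all four values, as this paper does.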
International Journal of Scientific Research in Science, Engineering and Technology, 2022
The Short Message Service (SMS) is a widely used and easily accessible medium for reaching a large number of users at low cost. Although these messages are useful for advertising in sectors such as banking and agriculture, and even for government schemes, they can become a nuisance for users who are not the intended audience. Some messages may even contain malicious links. This work proposes restricting such spam messages with a hybrid mechanism that combines deep learning and artificial intelligence.
Advancements of SMS Spam Detection: A Comprehensive Survey of NLP and ML Techniques
Procedia Computer Science, 2024
In the digital age, the ubiquity of text messaging has unfortunately paved the way for SMS phishing, or 'smishing,' a deceptive practice in which fraudsters send fraudulent messages to extract sensitive information from unsuspecting recipients. The issue is far from trivial: it represents a significant threat to both personal privacy and organizational security, leading to potential data breaches and financial repercussions. Against this backdrop, the need for advanced detection strategies is clear. This survey uses a systematic review methodology to assess the effectiveness of Natural Language Processing (NLP) and Machine Learning (ML) techniques for detecting smishing. By methodically analyzing research spanning various detection strategies, the study traces the evolution from basic rule-based frameworks to sophisticated ML algorithms enriched with NLP for deep analysis. The findings underscore the superior efficacy of combining ML classifiers with NLP, particularly through advanced deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), which offer unprecedented accuracy in identifying and thwarting complex smishing attacks. The value of this study lies in its comprehensive synthesis of current methodologies and its contribution to the ongoing enhancement of cybersecurity defenses. It serves as a guide for future research, emphasizing the need to adopt and innovate cutting-edge NLP and ML techniques to stay ahead of evolving digital threats.
Detecting Bengali Spam SMS Using Recurrent Neural Network
Journal of Communications, 2020
An SMS is considered spam when the sender sends it to targeted users in order to obtain important personal information. If the targeted users respond with that information, the sender achieves their goal. This phenomenon is increasing rapidly, and Machine Learning (ML) is the most common approach to classifying such messages. In Bangladesh, email spam detection is common, but detecting SMS spam in a Bengali dataset is an entirely new research problem. This research detects Bengali spam SMS using traditional Machine Learning algorithms along with Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, and compares the performance of all algorithms to find the best among them. LSTM and GRU both achieve the highest testing accuracy, 99%. To the best of our knowledge, this work is the first to apply the deep learning algorithms LSTM and GRU to detecting Bengali spam. In addition, a comparative analysis is performed between traditional supervised ML algorithms and deep learning algorithms, and the effects of various activation functions and optimizers on LSTM and GRU are examined experimentally. The ADAGRAD optimizer achieves better accuracy than RMSPROP, ADAMAX, ADADELTA, and SGD. Finally, the best combinations of deep learning algorithms, activation functions, and optimizers are proposed based on the experimental analysis.
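To show what a GRU computes at each time step, here is a minimal single-unit sketch with scalar input and state. It is a textbook illustration, not the paper's model: the parameter names (`w_*` input weights, `u_*` recurrent weights, `b_*` biases) and values are hypothetical. Conventions differ on whether the update gate `z` or `1 - z` weights the previous state; this sketch uses the `(1 - z) * h + z * h_cand` form. In a classifier, the final state would typically feed a dense output layer.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: float, h: float, p: dict) -> float:
    """One time step of a single-unit GRU with scalar input and state.

    p holds hypothetical scalar weights w_* (input), u_* (recurrent), b_* (bias).
    """
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])  # reset gate
    # Candidate state: the reset gate controls how much of h is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])
    # Interpolate between the old state and the candidate.
    return (1.0 - z) * h + z * h_cand


def gru_run(xs, p, h0=0.0):
    """Feed a sequence of scalar inputs through the GRU; return the final state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, p)
    return h


if __name__ == "__main__":
    keys = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")
    params = dict.fromkeys(keys, 0.5)  # hypothetical values, not trained
    print(gru_run([1.0, -0.5, 0.25], params))
```

Because each new state is a convex combination of the previous state and a `tanh` output, the state stays in (-1, 1) when it starts there.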
Journal of Advances in Mathematics and Computer Science, 2023
In the modern era, mobile phones have become ubiquitous, and the Short Message Service (SMS) has grown into a multi-million-dollar service because of the widespread adoption of mobile devices and the millions of people who use SMS daily. However, SMS spam has also become a pervasive problem that endangers users' privacy and security through phishing and fraud. Despite numerous spam filtering techniques, a more effective solution to this problem is still needed [1]; in particular, high false-positive rates remain a challenge. The study introduces a novel approach that uses Natural Language Processing (NLP) and machine learning models, particularly BERT (Bidirectional Encoder Representations from Transformers), for SMS spam detection and classification. Data preprocessing techniques such as stop-word removal and tokenization are applied, and features are extracted with BERT. Machine learning models, including SVM, Logistic Regression, Naive Bayes, Gradient Boosting, and Random Forest, are combined with BERT to distinguish spam from ham messages. Evaluation results
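This abstract describes BERT as a feature extractor whose outputs feed classical classifiers. A minimal sketch of the second half of that pipeline, assuming each message has already been turned into a fixed-length vector, trains one of the listed models, logistic regression, with full-batch gradient descent. BERT-base produces 768-dimensional hidden states; the 2-D vectors below are hypothetical toy stand-ins, and the learning rate, epoch count, and helper names are illustrative assumptions rather than the paper's configuration.

```python
import math


def sigmoid(z: float) -> float:
    # Split on sign to avoid overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by full-batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (spam) if the predicted spam probability reaches the threshold."""
    return int(sigmoid(dot(w, x) + b) >= threshold)


if __name__ == "__main__":
    # Hypothetical 2-D stand-ins for per-message embeddings.
    X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
    y = [1, 1, 0, 0]  # 1 = spam, 0 = ham
    w, b = train_logreg(X, y)
    print([predict(w, b, x) for x in X])
```

Raising `threshold` above 0.5 is one simple lever for trading recall for fewer false positives, the concern this abstract raises.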
Spam text classification using LSTM Recurrent Neural Network
International Journal of Emerging Trends in Engineering Research, 2021
Sequence classification is an in-demand research area in Natural Language Processing (NLP). Classifying images or text into the appropriate category or class is a complex task that many Machine Learning (ML) models fail to perform accurately, ending up underfitting the given dataset. ML algorithms used in text classification include KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM). For this experimental study, LSTM and a few other algorithms were chosen for comparison. The dataset is the SMS Spam Collection Dataset from Kaggle, supplemented with 150 entries from other sources. Each data point has one of two class labels, spam or ham. Each entry consists of the class label and a few sentences of text, followed by a few unused features that are removed. After the text is converted to the required format, the models are trained and evaluated with various metrics. In the experiments, LSTM gives much better classification accuracy than the other machine learning models, achieving F1-scores in the high nineties. The other models showed very low F1-scores and cosine similarities, indicating that they underperformed on the dataset. Another notable observation is that LSTM produced fewer false positives and false negatives than any other model.
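Loading entries of the shape described here (a label, the message text, then extra columns to discard) can be sketched with Python's standard `csv` module. This assumes a header row with the label in the first column and the text in the second, as in the commonly distributed Kaggle `spam.csv` (columns `v1`, `v2`, plus unnamed extras); the sample rows and the `load_sms_csv` name are hypothetical. The Kaggle copy is often opened with `encoding="latin-1"` and `newline=""`.

```python
import csv
import io

LABELS = {"ham": 0, "spam": 1}


def load_sms_csv(stream):
    """Read (text, label) pairs from a CSV stream, ignoring any extra columns.

    Assumes a header row, the label in column 0 and the text in column 1.
    """
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row (e.g. v1, v2, ...)
    texts, labels = [], []
    for row in reader:
        if len(row) < 2 or row[0].strip().lower() not in LABELS:
            continue  # skip blank or malformed rows
        texts.append(row[1])  # columns beyond the second are dropped
        labels.append(LABELS[row[0].strip().lower()])
    return texts, labels


if __name__ == "__main__":
    sample = io.StringIO(  # hypothetical rows in the same layout
        "v1,v2,,,\n"
        "ham,See you at lunch,,,\n"
        'spam,"Free entry, reply WIN now",,,\n'
    )
    texts, labels = load_sms_csv(sample)
    print(labels)  # → [0, 1]
```

The quoted field in the spam row shows why a real CSV parser is safer than splitting on commas: message text often contains commas.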
Performance Evaluation of LSTM and RNN Models in the Detection of Email Spam Messages
European Journal of Information Technologies and Computer Science, 2022
Email spam is unwanted bulk mail sent to a recipient's address without the recipient's explicit consent. It is usually a means of advertising and maximizing profit, especially as internet use for social networking grows, but it can also be very frustrating and annoying for recipients. Recent research has shown that about 14.7 billion spam messages are sent every day, more than 45% of which are promotional sales content that recipients did not opt in to. This has drawn the attention of many researchers in natural language processing. In this paper, we use Long Short-Term Memory (LSTM) to classify messages as spam or ham. The performance of LSTM is compared with that of a Recurrent Neural Network (RNN), which can also be used for this kind of classification task but suffers from short-term memory: it tends to lose important information from earlier time steps by the time it makes predictions at later ones. The evaluation shows that LSTM achieved 97% accuracy with both the Adam and RMSprop optimizers, whereas the RNN achieved 94% accuracy with RMSprop and 87% with Adam.
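The "short-term memory" contrast between a vanilla RNN and an LSTM can be illustrated with a toy forward pass. This is a simplified sketch, not the paper's models: single units, scalar inputs, and hand-picked hypothetical weights. Fed zeros, the RNN state is squashed through `tanh` with a recurrent weight of 0.5 at every step and decays geometrically, while an LSTM whose forget gate is biased open and input gate biased shut carries its cell state almost unchanged, because the cell update is additive and gated. (In training, the same structure is what helps gradients survive long sequences; real networks learn these weights.)

```python
import math

GATE_KEYS = ("w_i", "u_i", "b_i", "w_f", "u_f", "b_f",
             "w_o", "u_o", "b_o", "w_g", "u_g", "b_g")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rnn_step(x, h, w, u, b):
    """Vanilla RNN: the whole state passes through tanh at every step."""
    return math.tanh(w * x + u * h + b)


def lstm_step(x, h, c, p):
    """One step of a single-unit LSTM; returns (new hidden state, new cell state)."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h + p["b_g"])  # candidate values
    c_new = f * c + i * g  # additive, gated cell-state update
    h_new = o * math.tanh(c_new)
    return h_new, c_new


if __name__ == "__main__":
    steps = 50
    h = 1.0  # RNN starts with state 1.0 and is fed zeros
    for _ in range(steps):
        h = rnn_step(0.0, h, w=0.0, u=0.5, b=0.0)
    p = dict.fromkeys(GATE_KEYS, 0.0)
    p["b_f"], p["b_i"] = 10.0, -10.0  # hypothetical: forget open, input shut
    lh, c = 0.0, 1.0
    for _ in range(steps):
        lh, c = lstm_step(0.0, lh, c, p)
    print(f"RNN state after {steps} steps: {h:.2e}")
    print(f"LSTM cell state after {steps} steps: {c:.4f}")
```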