Spam Filtering Research Papers - Academia.edu
Classification is considered one of the building blocks of data mining, and the major issues concerning data mining in large databases are efficiency and scalability. In this paper we propose a data classification method using AVL trees that enhances the quality and stability of data mining. Researchers from various disciplines, such as statistics, machine learning, pattern recognition, and data mining, have considered the issue of growing a decision tree from available data. Specifically, we consider a scenario in which we apply the multi-level mining method to the data set and show how the proposed approach tends to give efficient multiple-level classifications of large amounts of data. The paper also discusses a performance evaluation of the proposed algorithm, which is used to acquire design rules from the knowledge database.
- by Stephen B Joseph and +1
- Machine Learning, Data Mining, Mobile Technology, Mobile Computing
Unsolicited communications currently account for over sixty percent of all sent e-mail, with projections reaching the mid-eighties. While much spam is innocuous, a portion is engineered by criminals to prey upon, or scam, unsuspecting people. The senders of scam spam attempt to mask their messages as non-spam and con recipients through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phishing mechanisms (e.g. faux PayPal or AOL websites). To lessen the suspicion of fraudulent activity, scam messages sent by the same individual, or collaborating group, augment the text of their messages and assume an endless number of pseudonyms with an equal number of different stories. In this paper, we introduce ScamSlam, a software system designed to learn the underlying number of criminal cells perpetrating a particular type of scam, as well as to identify which scam spam messages were written by which cell. The system consists of two main components: 1) a filtering mec...
Fake checks are one of the most common instruments used to commit fraud against consumers. This fraud is particularly costly for victims, since they generally lose thousands of dollars and are also exposed to judicial proceedings. Currently, there is no solution for authenticating checks and detecting fake ones instantly. Instead, banks must wait more than 48 hours to detect the scam. In this context, we propose a blockchain-based scheme to authenticate checks. More precisely, our approach helps banks share information about the checks they are given without exposing their customers' personal data. A proof of concept of our scheme was implemented in Python, relying on the Namecoin blockchain.
Many social media platforms and messengers are in use today, and because of the coronavirus pandemic they have become an integral part of our daily lives, including work activities. However, users receive large quantities of unnecessary information, so the problem of dealing with spam messages on social networks and messengers is now very relevant. By spam we mean any messages that a particular user (person, company, etc.) considers unnecessary in a particular text stream. The project is dedicated to solving the scientific problem of detecting spam messages in the text context of any social network or messenger using an anti-spam bot based on various spam detection algorithms. Four algorithms were implemented and investigated: a naive Bayes classifier, a support vector machine, a multilayer perceptron neural network and a convolutional neural network. The main idea is to develop a complex spam detection algorithm for ...
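As an illustration of the first of the four algorithms named above, the following is a minimal sketch of a multinomial naive Bayes text classifier with Laplace smoothing. It is not the project's implementation: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, the smoothing constant `alpha` and the four toy training messages are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes classifier for short messages."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        # Count messages per class and word occurrences per class.
        for text, label in zip(messages, labels):
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label) over known words.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


# Toy training data, invented purely for illustration.
messages = [
    "win free prize now",
    "free money click now",
    "meeting at noon tomorrow",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesSpamFilter().fit(messages, labels)
print(model.predict("free prize money"))
print(model.predict("lunch meeting tomorrow"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a word seen in only one class from driving the other class's probability to zero.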
This chapter is an attempt to pull together the three contexts sketched above: the multitude of practices experienced in one’s infra-ordinary use of social media; the sensationalist narratives mustered under buzzwords throughout news media narratives; and the scholarly representation of a variety of practices often described as deceiving, confrontational, offensive, negative, disruptive, abusive, unethical, non-normative, deviant or antisocial. A reader might be expected to ask: What is trolling? Why do some social media users troll others? Is trolling good or bad? How can trolling be stopped? Unfortunately, my contribution is not meant to answer any of these questions. As I summarize in the first half of this chapter, more than twenty years of research into problematic social media practices have already tackled many of these interrogatives, providing a wide variety of comprehensive answers, and producing a rich and detailed cartography of the possibilities of academic approaches to trolling. The broader theoretical question that this chapter addresses is, rather: Where next? How can we write about trolling, and about other problematic social media practices, while avoiding both the oversimplification of popular media narratives and the overdetermination that results from academic treatments?
The profitability promoted by Google on its well-known video distribution platform YouTube has attracted an increasing number of users. However, this success has also attracted a large number of malicious users, who aim to self-promote their videos or circulate viruses and malware. Because YouTube offers limited tools for comment moderation, spam increases very rapidly, which is why owners disable the comment section on their videos. It is very difficult to establish classification methods for automatic spam filtering, since the messages are very short and often rife with slang, symbols, and abbreviations. In this paper, we have evaluated several top-performance classification techniques for detecting and analyzing spam comments. The statistical analysis of results indicates that, at a 99.9% confidence level, decision trees, logistic regression, Bernoulli Naive Bayes, random forests, and linear and Gaussian SVMs are statistically equivalent. Therefore, it is very important to find a way to detect these comments on videos and report them before they are viewed by innocent users.
In this research, we propose a methodology for advert value calculation in CPM, CPC and CPA networks. Accurately estimating this value increases the income of all three kinds of network by allowing the most profitable advert to be selected. By increasing income, publishers are better paid and improved services are afforded to advertisers. To develop this methodology, we propose a system based on traditional Machine Learning methods and Deep Learning methods. The system has two inputs and one output. The inputs are the user visit and the data about the advertiser. The output is the advert value expressed in dollars. Deep Learning predicts model behavior more precisely for many supervised problems. The three experiments carried out allow us to conclude that DL is a supervised method that is very efficient in the classification of spam adverts and in the estimation of the CTR. In the prediction of online sales, DLNNs have shown, on average, worse performance than cubist and random forest methods, although better performance than model tree, model rules and linear regression methods.
- by Dafne Rosso Pelayo, PhD and +1
- Machine Learning, Online Advertising, CPC, Deep Learning
The use of the Internet and its related services is increasing day by day. Many millions of people surf the net every day and use it for various reasons. With so much use of the Internet, security threats are a major concern today. Net surfers face many security threats because of malware, which takes many forms, such as viruses, worms, Trojan horses, rootkits, botnets and various other forms of data attack. Among all the threats mentioned above, botnets seem to be quite prevalent nowadays. They have already taken root in wide area networks (WANs) such as the Internet and continue to spread at a very high pace. A botnet is a network of computers that have been infected by installing a harmful program on them. Each computer that is part of a botnet is called a bot or zombie. A botnet is remotely controlled by a person who commands and controls the bots through a server called the command and control (C&C) server. The person who commands the bots is called a botmaster or bot herder. The objective of this paper is an extensive study of botnets and their detection. It focuses on the study of malware, with special emphasis on botnets and their detection.
Short messaging service (SMS) spam refers to unsolicited or undesired messages received on mobile phones. SMS spam constitutes a veritable nuisance to mobile subscribers. This marketing practice also worries service providers, because it upsets their clients or even causes them to lose subscribers. To mitigate this practice, researchers have proposed several solutions for the detection and filtering of SMS spam. In this paper, we present a review of the currently available methods, challenges and future research directions in the detection, filtering and mitigation of mobile SMS spam. The existing research literature is critically reviewed and analysed. The most popular techniques for SMS spam detection, filtering and mitigation are compared, including the datasets used, their findings and limitations, and future research directions are discussed. This review is designed to help expert researchers identify open areas that need further improvement.
Satire is an attractive subject in deception detection research: it is a type of deception that intentionally incorporates cues revealing its own deceptiveness. Whereas other types of fabrications aim to instill a false sense of truth in the reader, a successful satirical hoax must eventually be exposed as a jest. This paper provides a conceptual overview of satire and humor, elaborating and illustrating the unique features of satirical news, which mimics the format and style of journalistic reporting. Satirical news stories were carefully matched and examined in contrast with their legitimate news counterparts in 12 contemporary news topics in 4 domains (civics, science, business, and “soft” news). Building on previous work in satire detection, we proposed an SVM-based algorithm, enriched with 5 predictive features (Absurdity, Humor, Grammar, Negative Affect, and Punctuation) and tested their combinations on 360 news articles. Our best predicting feature combination (Absurdity, Grammar and Punctuation) detects satirical news with a 90% precision and 84% recall (F-score=87%). Our work in algorithmically identifying satirical news pieces can aid in minimizing the potential deceptive impact of satire. [Note: The associated dataset of the Satirical and Legitimate News, S-n-L News DB 2015-2016, is available via http://victoriarubin.fims.uwo.ca/news-verification/ . The set is password-protected to avoid automated harvesting. Please feel free to request the password, if you are interested.]
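The reported F-score can be checked against the stated precision and recall. The sketch below assumes the figure is the balanced F1 measure, that is, the harmonic mean of precision and recall; the function name `f_score` and the `beta` parameter are illustrative choices, not taken from the paper.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract above.
f1 = f_score(0.90, 0.84)
print(f"{f1:.2f}")  # prints 0.87
```

With precision 0.90 and recall 0.84 this gives 2 × 0.90 × 0.84 / (0.90 + 0.84) ≈ 0.869, which rounds to the 87% quoted in the abstract.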
Artificial Intelligence (AI) in combination with the Internet of Things (IoT), known as AIoT, is an emerging trend in industrial applications and is capable of intelligent decision-making with self-driven analytics. With their extensive usage in diverse scenarios, IoT devices generate bulk data that attackers manipulate to disrupt normal operations and services. Hence, there is a pressing need for proactive data analysis to prevent cyber-attacks and crimes. To investigate crimes involving electronic mail (email), analysis of both the header and the email body is required, since the semantics of communication help to identify the source of potential evidence. With the continued growth of data shared via email, investigators now face the daunting challenge of extracting the required semantic information from bulk email, which delays the investigation process. This gives criminals an edge in erasing the footprints of their malicious acts. The existing keyw...
Email spam is one of the major challenges faced daily by every email user in the world. On a daily basis, email users receive hundreds of spam mails with new content, from anonymous addresses that are automatically generated by robot software agents. Traditional methods of spam filtering, such as blacklists and whitelists (using domains, IP addresses, or mailing addresses), have proven to be grossly ineffective in curtailing the menace of spam messages. This has brought to the fore the need for highly reliable email spam filters. Recently, machine learning approaches have been successfully applied to detecting and filtering spam emails. This paper proposes the use of the random forest machine learning algorithm for efficient classification of email spam messages. The main purpose is to develop a spam email filter with better prediction accuracy and fewer features. From the Enron public dataset consisting of 5180 emails of ham, spam and normal emails, a set of prominent spam email features (from the literature) was extracted and used by the random forest algorithm, with a resultant classification accuracy of 99.92%, a very low false positive rate (0.01) and a very high true positive rate (0.999). All experiments were conducted in the WEKA data mining and machine learning simulation environment.
With over a billion Internet users surfing the Web daily in search of information, buying, selling and accessing social networks, marketers focus intensively on developing websites that are appealing to both the searchers and the search engines. Millions of webpages are submitted each day for indexing to search engines. The success of a search engine lies in its ability to provide accurate search results. Search engines' algorithms constantly evaluate websites and webpages that could violate their respective policies. For this reason some websites and webpages are subsequently blacklisted from their index. Websites are increasingly being utilised as marketing tools, which results in major competition amongst websites. Website developers strive to develop websites of high quality, which are unique and content rich, as this will assist them in obtaining a high ranking from search engines. By focusing on websites of a high standard, website developers utilise search engine optimisation (SEO) strategies to earn a high search engine ranking. From time to time SEO practitioners abuse SEO techniques in order to trick the search engine algorithms, but the algorithms are programmed to identify and flag these techniques as spamdexing. Search engines do not clearly explain how they interpret keyword stuffing (one form of spamdexing) in a webpage. However, they regard spamdexing in many different ways and do not provide enough detail to clarify what crawlers take into consideration when interpreting the spamdexing status of a website. Furthermore, search engines differ in the way that they interpret spamdexing, but offer no clear quantitative evidence for the crossover point from keyword-dense website text to spamdexing. Scholars have indicated different views in respect of spamdexing, characterised by different keyword density measurements in the body text of a webpage. This raised several fundamental questions that form the basis of this research. This research was carried out using triangulation in order to determine how scholars, search engines and SEO practitioners interpret spamdexing. Five websites with varying keyword densities were designed and submitted to Google, Yahoo! and Bing. Two phases of the experiment were carried out and the results were recorded. During both phases almost all of the webpages, including the one with a 97.3% keyword density, were indexed. This enabled the research to conclusively disregard the keyword stuffing issue, blacklisting and any form of penalisation. Designers are urged to concentrate instead on usability and good values behind building a website. The research explored the fundamental contribution of keywords to webpage indexing and visibility. Keywords used with or without an optimum level of measurement of richness and poorness result in website ranking and indexing. However, the focus should be on the way in which the end user would interpret the content displayed, rather than how the search engine would react towards the content. Furthermore, spamdexing is likely to scare away potential clients and end users instead of embracing them, which is why the time spent on spamdexing should rather be used to produce quality content.
- by Melius Weideman and +1
- Search Engine Optimisation, Websites, Spam Filtering
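The abstract above refers to keyword density measurements in the body text of a webpage without stating the formula used. A common definition, assumed here, is the number of occurrences of the keyword divided by the total number of words, expressed as a percentage. The function name `keyword_density`, the tokenizing regular expression and the sample sentence are hypothetical.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in `text` equal to `keyword` (case-insensitive)."""
    # Assumed tokenization: runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


# Invented sample body text for illustration.
sample = "cheap shoes cheap prices buy cheap shoes today"
density = keyword_density(sample, "cheap")
print(density)  # prints 37.5
```

This single-word version does not handle multi-word key phrases or stemming, which a fuller measurement would need to define.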
Today e-mail is a very important means of communication that allows people all over the world to communicate, share data, and do business. Yet there is nothing worse than an inbox full of spam, i.e., information crafted to be delivered to a large number of recipients against their wishes. In this paper, we present numerous anti-spam methods and solutions that have been proposed and deployed but are not effective, because most mail servers rely on blacklists and rule engines, leaving much of the work of identifying spam to the user, while others rely on filters that may carry a high false positive rate.
This article was printed in the magazine ComputerWorld, no. 37, 1999. It presents an early stage in the development of the information society and the threats to that development identified at the time, which were described with the metaphor of "information smog".
The information society has not yet come into being, and nobody knows what it will be. But the process of its formation is under way, and that fact is already creating problems. Interestingly, these problems are very similar to those produced by the technical revolution at the beginning of the 19th century. At that time, the side effect of industrialization was environmental pollution. The next technical revolution is producing pollution of the information space. Will we be able to combat this threat, or will we be engulfed by the by-product of information globalization that hides under the term information smog?
The subject of this research is the development of an expert system architecture for a distributed content aggregation system, the main purpose of which is the categorization of aggregated data. The author examines the advantages and disadvantages of expert systems, toolsets for developing expert systems, the classification of expert systems, and the application of expert systems to the categorization of data. Special attention is given to the description of the architecture of the proposed expert system, which consists of a spam filter, a component that determines the main category for each type of processed content, and components that determine subcategories, one of which is based on domain rules, while the other uses machine learning methods and complements the first. The conclusion is that an expert system can be effectively applied to the problems of data categorization in content aggregation systems. The author establishes that hybrid solutions, which combine an approach based on a knowledge base and rules with neural networks, reduce the cost of the expert system. The novelty of this research lies in the proposed architecture of the system, which is easily extensible and adaptable to workloads by scaling existing modules or adding new ones. The proposed module for spam detection adapts the behavioral algorithm for detecting spam in emails; the proposed module for determining the key categories of content uses two types of algorithms: fuzzy fingerprints and Twitter topic fuzzy fingerprints, which were initially applied to the categorization of messages in the social network Twitter. The module that determines subcategories based on keywords works in interaction with the thesaurus database. The latter classifier uses the reference vector algorithm for the final determination of subcategories.
Email (electronic mail) is one of today's technological developments. With email, messages can be sent quickly and delivered to many recipients in a short time. However, as the number of email users has grown, many people misuse email for their personal interests. One form of email misuse is SPAM (Stupid Pointless Annoying Message). This inconveniences users and can fill up the storage of the email account; in addition, the content of SPAM messages varies widely, including hoax news, company advertisements, and the spread of viruses and phishing.
The adaptable nature of spam emails allows them to change in ways that easily evade spam filters. This makes it necessary to develop more effective spam filters. Machine learning approaches have proved to be an efficient method for solving the problem of spam emails wreaking havoc on email users. Conventional techniques of spam filtering, like blacklists and whitelists (using domains, IP addresses, mailing addresses, etc.), have not been able to effectively curb the hazards posed by spam emails. In this paper, we applied the Logistic Model Tree machine learning algorithm for efficient classification of email spam messages. The aim of this study is to develop an email spam filter with superior prediction accuracy and fewer features. From the Enron public dataset consisting of 5,180 emails of ham, spam, and normal emails, some features were extracted and used by the Logistic Model Tree Induction algorithm. Our technique has a classification accuracy of 99.305%, a very low false positive rate (0.05), and a very high true positive rate (0.995). All experiments were conducted in the WEKA data mining and machine learning simulation environment.
The upsurge in the volume of unwanted emails, called spam, has created an intense need for the development of more dependable and robust antispam filters. Machine learning methods have recently been used to successfully detect and filter spam emails. We present a systematic review of some of the popular machine learning based email spam filtering approaches. Our review covers the important concepts, attempts, efficiency, and research trends in spam filtering. The preliminary discussion in the study background examines the application of machine learning techniques to the email spam filtering process of leading internet service providers (ISPs), such as the Gmail, Yahoo and Outlook spam filters. We discuss the general email spam filtering process and the various efforts by different researchers to combat spam through the use of machine learning techniques. Our review compares the strengths and drawbacks of existing machine learning approaches and the open research problems in spam filtering. We recommend deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails.
- by Emmanuel Gbenga Dada and +1
- Spam Filtering
Spam emails regularly cause huge losses to businesses. Spam filtering is an automated technique for identifying SPAM and HAM (non-spam). Web spam filters can be categorized as content-based spam filters and list-based spam filters. In this research work, we have studied the spam statistics of a well-known spambot, 'Srizbi'. We have also discussed different approaches to spam filtering and finally proposed a new algorithm that is based on the behavioral approaches of spammers and aims to restrict the budding economic growth of spam-generating companies. We have used a hidden Honeypot and a Honeytrap module to minimize the spam generated from contact and feedback forms on public and social networking CMS websites.
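The abstract does not describe how its hidden Honeypot module works internally. A widely used form of the idea, assumed here, is to add a form field that is hidden from human visitors; automated form fillers tend to populate every field, so a non-empty value in the hidden field marks the submission as probable spam. The field name `website_url` and the function `is_probable_bot` are hypothetical names chosen for this sketch.

```python
# Hypothetical name of a form field hidden from humans via CSS.
HONEYPOT_FIELD = "website_url"


def is_probable_bot(form_data, honeypot_field=HONEYPOT_FIELD):
    """Return True when the hidden honeypot field was filled in."""
    value = form_data.get(honeypot_field, "")
    return bool(value.strip())


# A human never sees the hidden field, so it arrives empty.
human_submission = {"name": "Ada", "message": "Hello", "website_url": ""}
# A naive bot fills in every field it finds.
bot_submission = {
    "name": "x",
    "message": "Buy now",
    "website_url": "http://spam.example",
}

print(is_probable_bot(human_submission))  # prints False
print(is_probable_bot(bot_submission))    # prints True
```

A check like this only catches unsophisticated bots, so in practice it would sit alongside content-based or list-based filtering rather than replace it.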
Spam emails are messages sent to recipients without their consent, generally by people with malicious or promotional purposes. Because email is easy to use and inexpensive, it is actively used by people or groups who want to spread propaganda, advertise, or carry out phishing. To achieve their goals, they send unnecessary and unwanted messages to email accounts they do not know. In this study, the methods in the literature for filtering spam emails are examined. These spam filtering methods are examined under two main headings: those not based on artificial intelligence and those based on artificial intelligence. Methods not based on artificial intelligence give effective results in detecting spam, but the literature also contains techniques that can bypass them. Systems that use artificial-intelligence-based machine learning algorithms to detect spam have increased in popularity, and research has gained momentum in this direction. Deep learning methods in particular have begun to be preferred for spam detection because of their high performance. In the literature, spam detection methods that use classical machine learning algorithms such as Bayes, Support Vector Machine, Artificial Neural Network, Random Forest, Multilayer Perceptron, and K-Nearest Neighbour achieve high performance. It has been demonstrated on different datasets that deep-learning-based spam detection methods using Long Short-Term Memory and Convolutional Neural Network algorithms increase performance further. In addition, the open problems in spam detection systems and the current state of such studies for Turkish are examined, and various suggestions are made.
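Of the classical algorithms listed above, a Bayes classifier is the simplest to sketch. The following is a minimal multinomial naive Bayes filter with add-one (Laplace) smoothing, written with the Python standard library only. It is an illustration under stated assumptions, not the method of any surveyed paper: the class name `NaiveBayesSpamFilter`, the whitespace tokenizer, and the toy training messages are all made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        # Assumes at least one training message per class; otherwise the
        # class prior would be zero and its logarithm undefined.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            # Log prior of the class.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Smoothed log likelihood of each word given the class.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win free money now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")
print(spam_filter.classify("claim free money"))
print(spam_filter.classify("monday team meeting"))
```

Working in log space avoids numerical underflow on long messages, and the smoothing keeps a word that never appeared in one class from driving that class's probability to zero.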
This paper analyses intelligent spam filtering techniques in the SMS (Short Message Service) text paradigm, in the context of mobile text message spam. The unique characteristics of SMS content indicate that not all approaches are equally effective or efficient. The paper compares some popular mobile SMS spam filtering techniques on a publicly available SMS spam corpus to identify the methods that work best in the SMS text context. The results can give hints on optimized SMS spam detection for mobile text messages.
In this age of popular instant messaging applications, Short Message Service (SMS) has lost relevance for personal communication and has become the domain of service providers, business houses, and other organizations that use it to target ordinary users for marketing and spamming. A recent trend in spam messaging is the use of regional-language content typed in English characters, which makes the detection and filtering of such messages more challenging. In this work, a standard SMS corpus of spam and non-spam messages is extended with labeled text messages in regional languages such as Hindi and Bengali typed in English characters, gathered from local mobile users. A Monte Carlo approach is used for supervised learning and classification, with a set of features and machine learning algorithms commonly used by researchers. The results illustrate how effectively different algorithms address this challenge.
Emerging technology has led to the development of various platforms through which millions of people collaborate and communicate with each other. Spamming is the act of sending unsolicited messages through an electronic messaging system, and spam is a form of platform manipulation. Spam messages can be sent over multiple communication media such as email, instant messaging (IM), and online social networks (OSNs). Statistics show that a large proportion of internet traffic is spam. People who spread unsolicited content to others are known as spammers. Spammers intentionally send messages to recipients who did not grant them permission. Most of these messages are advertisements, and some of them are sources of security breaches that lead to phishing or malware attacks. This is because spammers present their content as valuable or genuine when sending it to users, and authentic users mistake the spam for important information.
The volume of email worldwide is 269 billion messages per day, and 49.7% of it is spam, which includes emails from fraudsters. These cyber criminals intend to "phish" for sensitive personal information from their victims or to infect their computers with viruses or malicious content for illegal financial gain. This article therefore explains the different ways these online scams are perpetrated and presents several investigations and counterattack strategy proposals by machine learning experts to tackle the issue of spam filtering. The paper reports different research designs and solutions that use machine learning algorithms, ranging from techniques based on text categorization to strategies that examine email content with attached images. The effectiveness and efficiency of these machine learning tools are examined and discussed. In conclusion, further research on spam filtering tools based on machine learning algorithms is encouraged, as cyber criminals continuously devise new methods that threaten and abuse these systems in their bid to evade spam filters.
Every year, the number of unsolicited emails received by the average email user increases dramatically. According to IDC, spam accounted for 38 percent of the 31 billion emails sent each day in North America in 2004, up from 24 percent in 2002. Keeping pace with the amount of spam is the number of filtering solutions available to help eliminate it. This paper describes in detail how several of the most common spam filtering technologies work, how effective they are at stopping spam, their strengths and weaknesses, and the techniques used by spammers to circumvent them.