Identification of Slang Words Used in Pornographic Unsolicited Bulk Emails
Related papers
Identification of Hindi Words Used in Pornographic Unsolicited Bulk Emails
E-mail has become a fast and cheap means of online communication. The main threat to e-mail is Unsolicited Bulk E-mail (UBE), commonly known as spam e-mail. The current work aims at identification of Hindi words in pornographic UBE. The motives of the paper are manifold. This is an attempt to better understand UBE and its interplay with the regional language from the perspective of international spamming. The problem has been addressed by employing a tokenization technique and the unigram bag-of-words (BOW) model. The current paper reports the first results on the identification of 87 Hindi words from more than 1,850 pornographic UBE analyzed by us. To the best of our knowledge, this is the first attempt to identify Hindi words in a corpus of pornographic UBE. Introduction: E-mail has become an efficient and popular communication mechanism as the number of Internet users has increased. It has provided both a faster and a cheaper form of communication. But a large part of e-mail traffic consists of non-personal, non-time-critical and unsolicited information. This type of e-mail is called Unsolicited Bulk E-mail (UBE) and is commonly known by various other synonymous names like spam
Identification of Non-lexicon Slang Unigrams in Body-enhancement Medicinal UBE
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is not present in an English dictionary and is a slang term. The motives of the paper are manifold. This is an attempt to analyze spamming behavior and the employment of the word-mutation technique. On the sidelines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing the tokenization technique and the unigram BOW model. We found that the non-lexicon words constitute nearly 66% of the total number of lexis of the corpus, whereas slang words constitute nearly 5.34% of non-lexicon words.
Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in a dictionary nor a slang term. The motives of the paper are manifold. This is an attempt to analyze spamming behaviour and the employment of the word-mutation technique. On the sidelines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing the tokenization technique and the unigram BOW model. We found that the non-lexicon words constitute nearly 66% of the total number of lexis of the corpus, whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.
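The tokenization and unigram BOW steps named in the abstracts above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the tiny `LEXICON` and `SLANG` sets, the sample emails, and all function names are hypothetical, and a real study would load a full dictionary and slang list.

```python
import re
from collections import Counter

# Hypothetical stand-ins for a full English dictionary and a slang word list.
LEXICON = {"buy", "cheap", "pills", "now", "best", "price", "get", "the", "online"}
SLANG = {"meds"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Unigram BOW: count every token across all documents."""
    bow = Counter()
    for doc in documents:
        bow.update(tokenize(doc))
    return bow


def categorize(bow, lexicon, slang):
    """Assign each distinct unigram to exactly one of three categories."""
    cats = {"lexicon": set(), "slang": set(), "non_lexicon_non_slang": set()}
    for token in bow:
        if token in lexicon:
            cats["lexicon"].add(token)
        elif token in slang:
            cats["slang"].add(token)
        else:
            cats["non_lexicon_non_slang"].add(token)
    return cats


def split_two_lexicon_words(token, lexicon):
    """Return (left, right) if the token is two lexicon words joined, else None."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in lexicon and right in lexicon:
            return left, right
    return None


# Invented sample messages, used only to exercise the functions.
emails = [
    "Buy cheap meds now",
    "Get the bestprice online, buy cheappills now",
    "cheappills xqzt",
]
bow = bag_of_words(emails)
cats = categorize(bow, LEXICON, SLANG)
joined = {t: split_two_lexicon_words(t, LEXICON)
          for t in cats["non_lexicon_non_slang"]}
```

Here `joined` maps each non-lexicon non-slang unigram to the pair of lexicon words it concatenates, or to `None` when no such split exists, which is one simple way to detect the word-mutation pattern the abstract describes.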
Slang Unigrams Based Classification of Male Body-enhancement Medicinal UBE
E-mail has become a fast and cheap means of online communication. The main threat to e-mail is Unsolicited Bulk E-mail (UBE), commonly known as spam e-mail. [...] in its inherent disposition of excessive slang usage. The current work aims at classification of UBE that advertises body enhancement drugs for males. The classification is based on the presence of slang unigrams of male body parts for which the enhancement drug is advertised. The motives of the paper are manifold. This is an attempt to provide sub-categories for medicinal UBE and analyze the spamming behavior. It is also an attempt to better understand the UBE, the slang and their interplay. The problem has been addressed by employing the tokenization technique and unigram Bag of
A Morphological Analysis of Slang Words Used by Characters in “Ralph Breaks the Internet” Movie
2021
The current study, entitled “A Morphological Analysis of Slang Words Used by Characters in Ralph Breaks the Internet Movie”, aimed to investigate the morphological processes that construct slang words and the meanings of the slang words used by the movie characters. This research used a descriptive qualitative method with a content analysis design. The findings revealed 42 slang words categorized into different morphological processes, including compounding (14.28%), clipping (11.90%), blending (14.28%), affixation (16.66%), reduplication (7.14%), backformation (2.4%), abbreviation (2.4%), conversion (4.76%), alternation (14.28%), extension (4.76%) and word manufacture (7.14%). This study demonstrated the meaning changes of slang words that have been affected by certain morphological processes through modification of their word category. Consequently, some slang terms have preserved the original meaning despite the changes in their spelling. In the meantime, certain slang...
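As a quick arithmetic check on the percentages reported above, the following sketch converts each share back into a word count out of 42. The counts are inferred here by rounding and are not stated in the abstract; the category labels are shortened for use as dictionary keys.

```python
TOTAL_SLANG_WORDS = 42  # total reported in the abstract

# Percentages as reported in the abstract (decimal points instead of commas).
reported_pct = {
    "compound": 14.28,
    "clipping": 11.90,
    "blending": 14.28,
    "affixation": 16.66,
    "reduplicative": 7.14,
    "backformation": 2.4,
    "abbreviation": 2.4,
    "conversion": 4.76,
    "alternation": 14.28,
    "extension": 4.76,
    "word manufacture": 7.14,
}

# Inferred counts: nearest whole number of words for each percentage.
inferred_counts = {
    name: round(pct * TOTAL_SLANG_WORDS / 100)
    for name, pct in reported_pct.items()
}

total_pct = sum(reported_pct.values())
total_count = sum(inferred_counts.values())
```

Under this rounding, the inferred counts add up to the reported total of 42 and the percentages add up to 100 within rounding error, so the reported breakdown is internally consistent.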
ETERNAL (English Teaching Journal), 2019
People all around the world now use the internet, and internet usage changes language usage as well. People tend to use short words, so they abbreviate some words or phrases. Many new words have been coined lately because of the use of the internet. The formation of new words belongs to the study of morphology. The main theory of this research is morphology. Moreover, it uses the theory of abbreviation. Some types of abbreviation such as blends, acronyms, alphabetisms and clippings are found in this research. This research is entitled ‘Modelling Abbreviation in Internet Slang: A Comparison Study of Indonesian Internet Slang and English Internet Slang’. The data are taken from some websites which have internet slang dictionaries. The writer took only 20 data items for this research from those online internet slang dictionaries. There are ten (10) items for English internet slang and ten (10) items for Indonesian internet slang. Conducting the analysis, the writer ...
The Identification of Pornographic Sentences in Bahasa Indonesia
Procedia Computer Science, 2019
Positive and negative content is mixed on the Internet. The government of Indonesia notes that negative content is a potential issue that might threaten Internet users. The government has launched several services such as DNS Nawala and the TRUST+ TM Positif database. However, government action is not enough because the validation of the TRUST+ TM Positif database requires many human resources. This research is the beginning of the identification of negative content on a web page. It provides the core system to determine the category of a sentence, which is pornography or non-pornography. The research begins with corpus building, continues with training the model, and ends with testing. The corpus is downloaded from the pornographic websites in the TRUST+ TM Positif database. Moreover, we tested the identification process by using K-Nearest Neighbor (KNN), Passive Aggressive Classifier, and Support Vector Machine (SVM). Both Passive Aggressive Classifier and SVM show an excellent performance. Meanwhile, KNN yields a mediocre result. The SVM algorithm has the highest accuracy of 98.25%.
E-mail has become an important means of electronic communication, but the viability of its usage is marred by Unsolicited Bulk E-mail (UBE) messages. UBE consists of many types, such as pornographic, virus-infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to the usage of e-mail. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of more than 2700 body enhancement medicinal UBE. Technically, this is an application of Text Parsing and Tokenization for an unstructured textual document, and we approach it using Bag of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the UBE documents that advertise various products for body enhancement. The analysis of the top 100 such lexis is also presented. We exhibit the relationship between the occurrence of a word from the identified lexis-set in a given UBE and the probability that the given UBE is one advertising a fake medicinal product. To the best of our knowledge and survey of related literature, this is the first formal attempt at identification of the most frequently occurring lexis in such UBE by textual analysis. Finally, this is a sincere attempt to raise alertness against, and mitigate the threat of, such luring but fake UBE.
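The two quantities described above, the most frequent lexis and the probability that a message is medicinal UBE given that a word occurs in it, could be estimated along the following lines. This is a minimal sketch assuming a small labelled corpus; the sample messages, the labels, and the function names are hypothetical, and it uses plain counting rather than the Vector Space Document Model the abstract mentions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", text.lower())


def top_lexis(documents, k):
    """Return the k most frequent unigrams as (word, count) pairs."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(k)


def prob_label_given_word(labelled_docs, word, target_label):
    """Estimate P(label | word occurs) as a simple relative frequency."""
    labels = [label for text, label in labelled_docs
              if word in set(tokenize(text))]
    if not labels:
        return None  # the word never occurs, so no estimate is possible
    return sum(1 for label in labels if label == target_label) / len(labels)


# Invented labelled messages, used only to exercise the functions.
corpus = [
    ("enhance your body with these pills", "medicinal"),
    ("cheap pills enhance results fast", "medicinal"),
    ("order pills online", "medicinal"),
    ("meeting moved to monday", "other"),
    ("fast delivery of your order", "other"),
]
texts = [text for text, _ in corpus]

most_frequent = top_lexis(texts, 1)
p_pills = prob_label_given_word(corpus, "pills", "medicinal")
p_fast = prob_label_given_word(corpus, "fast", "medicinal")
```

In this toy corpus every message containing "pills" is labelled medicinal, while "fast" appears equally often in both classes, which illustrates why some frequent words are far more indicative than others.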
2002-A corpus-based investigation of junk emails
Almost everyone who has an email account receives unwanted emails from time to time. These emails can be jokes from friends or commercial product offers from unknown people. In this paper we focus on these unwanted messages which try to promote a product or service, or to offer some “hot” business opportunities. These messages are called junk emails. Several methods to filter junk emails were proposed, but none considers the linguistic characteristics of junk emails. In this paper, we investigate the linguistic features of a corpus of junk emails, and try to decide if they constitute a distinct genre. Our corpus of junk emails was built from the messages received by the authors over a period of time. Initially, the corpus consisted of 1563 files, but after eliminating the duplicates automatically we kept only 673 files, totalising just over 373,000 tokens. In order to decide if the junk emails constitute a different genre, a comparison with a corpus of leaflets extracted from BNC and wi...
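The abstract does not say how duplicates were eliminated. One common approach, shown here purely as an assumption, is to hash a normalised form of each message and keep only the first message for each hash; the whitespace tokenizer and the sample messages are likewise hypothetical.

```python
import hashlib


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def deduplicate(messages):
    """Keep the first occurrence of each message, compared after normalisation."""
    seen = set()
    unique = []
    for msg in messages:
        digest = hashlib.sha256(normalise(msg).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(msg)
    return unique


def count_tokens(messages):
    """Count whitespace-separated tokens over all messages."""
    return sum(len(msg.split()) for msg in messages)


# Invented messages, used only to exercise the functions.
messages = [
    "Hot business opportunity!",
    "hot   business opportunity!",
    "Earn money from home",
    "Hot business opportunity!",
]
unique_messages = deduplicate(messages)
token_total = count_tokens(unique_messages)
```

Exact hashing only catches messages that are identical after normalisation; near-duplicates that differ by a name or a tracking code would need a fuzzier comparison.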
Usage Sphere and Recognition Rate of Vulgarisms
2024
The article scrutinizes vulgarisms used in swear words, slang, jargon, proverbs, sayings and idioms. In addition, the study surveyed respondents from different age groups on the vulgarisms used in these lexical units. The analysis showed that vulgarisms do not emerge only as a result of anger or fury; they can also appear as a result of pampering or cuddling. Elderly respondents know more proverbs and sayings rich in vulgarity due to longer life experience, while middle-aged people know more sexual slang due to access to porn sites and adult videos. Teenagers do not know any proverbs or sayings containing vulgar words because of a lack of life experience. The investigation also showed that swear words host the most vulgarity, followed by slang and jargon. Sexual slang contains the second biggest share of vulgarity. Vulgarisms are sometimes euphemized in slang or jargon, while this is impossible in swear words.