Identification of Non-lexicon Slang Unigrams in Body-enhancement Medicinal UBE

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identifying unigrams in more than 2700 UBE that advertise body-enhancement drugs. A unigram is identified only if it is not present in an English dictionary and is a slang term. The motives of the paper are manifold. It is an attempt to analyze spamming behavior and the use of the word-mutation technique. Alongside this, we have attempted to better understand spam, slang, and their interplay. The problem has been addressed by employing a tokenization technique and a unigram bag-of-words (BOW) model. We found that non-lexicon words constitute nearly 66% of the total number of lexis in the corpus, whereas slang words constitute nearly 5.34% of non-lexicon words. Further, non-lexicon slang unigrams composed of a mutated form of a single lexicon word form more than 90% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze the usage of non-lexicon slang unigrams in any kind of UBE.
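The pipeline the abstract describes (tokenize each email, build a unigram bag-of-words, keep unigrams absent from an English dictionary, then keep those that are slang) can be illustrated with a minimal sketch. It is not the authors' implementation. `LEXICON` and `SLANG` are tiny hypothetical stand-ins for the dictionary and slang reference the paper relies on, the function names are illustrative, and the percentages are computed over distinct unigrams, which is an assumption about how "lexis" is counted.

```python
import re
from collections import Counter

# Hypothetical toy stand-ins: the paper's English dictionary and slang
# reference are not reproduced here.
LEXICON = {"buy", "now", "get", "cheap", "at", "enlarge", "your", "best", "pills"}
SLANG = {"v1agra", "c1alis", "pil1s"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def unigram_bow(emails):
    """Unigram bag-of-words: maps each token to its count across all emails."""
    bow = Counter()
    for email in emails:
        bow.update(tokenize(email))
    return bow

def non_lexicon_slang(bow, lexicon, slang):
    """Return the distinct unigrams, the non-lexicon ones, and the non-lexicon slang ones."""
    vocab = set(bow)
    non_lexicon = {w for w in vocab if w not in lexicon}
    slang_terms = non_lexicon & slang
    return vocab, non_lexicon, slang_terms

emails = [
    "Buy v1agra now!",
    "Get cheap c1alis pil1s at rxshop",
    "Enlarge your best pills now",
]
vocab, non_lex, slang_found = non_lexicon_slang(unigram_bow(emails), LEXICON, SLANG)
print(f"non-lexicon: {len(non_lex)}/{len(vocab)} = {100 * len(non_lex) / len(vocab):.2f}%")
print(f"slang among non-lexicon: {len(slang_found)}/{len(non_lex)} = {100 * len(slang_found) / len(non_lex):.2f}%")
```

On this toy input, 4 of the 13 distinct unigrams are non-lexicon (30.77%) and 3 of those 4 are slang (75.00%); "rxshop" is non-lexicon but not slang. Set operations keep the two filtering steps explicit, mirroring the two conditions stated in the abstract.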