Authorship Authentication of Short Messages from Social Networks Machines
Related papers
Southeast Europe Journal of Soft Computing, 2018
The dataset consists of 17,000 tweets collected from Twitter: 500 tweets for each of 34 authors who meet certain criteria. The raw data were collected with the NVivo software and preprocessed to extract the frequencies of 200 features. During data analysis, 128 of these features were eliminated because they are rare in tweets. For a progressive presentation, subsets of 5, 10, 15, 20, 30, and all 34 of the authors were selected in turn. Recurrent artificial neural networks were preferred in this work because they are more stable and their iterations converge more quickly. ANNs are generally more successful at distinguishing two classes, so for N authors, N×N neural networks were trained for pairwise classification. These N×N experts were then organized into N special teams (CANNT) that aggregate the experts' decisions. The number of authors had little effect on authentication accuracy: around 80% accuracy was achieved for any number of authors.
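The pairwise scheme in this abstract can be illustrated with a minimal sketch. The abstract does not say how the CANNT teams combine the experts' decisions, so the simple majority vote below is an assumption, as is the use of one expert per unordered author pair. The names `pairwise_vote`, `toy_decide`, and `profiles` are hypothetical, and `toy_decide` merely stands in for a trained two-class network.

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(authors, decide, sample):
    """Attribute `sample` by majority vote over all pairwise experts.

    `decide(a, b, sample)` is a hypothetical stand-in for a trained
    two-class network; it must return either `a` or `b`.
    """
    votes = Counter({a: 0 for a in authors})
    for a, b in combinations(authors, 2):
        votes[decide(a, b, sample)] += 1
    # Assumption: an author's "team" score is the number of pairwise
    # contests that author won; the highest score wins.
    winner = max(authors, key=lambda a: votes[a])
    return winner, dict(votes)

# Toy stand-in for the experts: compare mean word length to a per-author value.
profiles = {"alice": 4.0, "bob": 5.5, "carol": 7.0}

def toy_decide(a, b, sample):
    words = sample.split()
    mean_len = sum(len(w) for w in words) / len(words)
    return a if abs(profiles[a] - mean_len) <= abs(profiles[b] - mean_len) else b

winner, votes = pairwise_vote(list(profiles), toy_decide, "short easy text here")
print(winner, votes)
```

With real classifiers, `decide` would wrap the prediction of the network trained on the two authors in question.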
Authorship Attribution for Online Social Media
2018
This chapter gives a comprehensive overview of machine learning classifiers for authorship attribution (AA) on short texts, specifically tweets. The need for authorship identification arises from increasing crime on the internet, which breaches cyber ethics by raising the level of anonymity. AA of online messages has attracted interest from many research communities. Linguists and researchers have proposed many statistical and computational methods to identify an author from their writing style. Various ways of extracting and selecting features on the basis of the dataset are reviewed. The authors focus on n-gram features, as they proved to be very effective in identifying the true author from a given list of known authors. The study demonstrates that AA on short texts is achievable depending on the selection criteria for features and methods, and that the accuracy of the analysis changes with the combination of features. The authors fou...
Optimizing Authorship Profiling of Online Messages
2016
Authorship profiling is of growing importance in the current information age, partly due to its application in digital forensics. Profiling methodologies, like any other authorship analysis, consist mainly of feature extraction and the application of analytical techniques. The choice of feature sets and analytical techniques may significantly affect the performance of authorship analysis. Hence, there is a need for methods that can help improve the success of authorship profiling. The present study sought, through experiments, the writing features, analytical technique, and number of class labels that can help improve the effectiveness of profiling the country of affiliation of authors of online messages. The experiments showed that the most effective model was achieved when all feature set types in the study were used on a two-class dataset analysed with the Neural Network (Multilayer Perceptron) machine learning scheme. The study recommends further work on models that can maximize both effectiveness and efficiency in profiling the authorship of online messages.
Authorship verification for short messages using stylometry
2013 International Conference on Computer, Information and Telecommunication Systems (CITS), 2013
Authorship can be verified with stylometric techniques, which analyze the linguistic styles and writing characteristics of authors. Stylometry is a behavioral feature that a person exhibits during writing, and it can be extracted and potentially used to check the identity of the author of online documents. Although stylometric techniques can achieve high accuracy for long documents, identifying the author of a short document remains challenging, in particular when dealing with large author populations. These hurdles must be addressed for stylometry to be usable in checking the authorship of online messages such as emails, text messages, or Twitter feeds. In this paper, we take some steps toward that goal by proposing a supervised learning technique combined with n-gram analysis for authorship verification in short texts. Experimental evaluation on the Enron email dataset, involving 87 authors, yields very promising results: an Equal Error Rate (EER) of 14.35% for message blocks of 500 characters.
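For readers unfamiliar with the metric reported here, the following is a minimal sketch of how an Equal Error Rate can be read off a set of verification scores. It assumes higher scores mean "more likely the same author"; the function name and the toy scores are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over the observed scores.

    genuine:  scores for texts truly written by the claimed author
    impostor: scores for texts written by someone else
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)      # genuine rejected
        far = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]

# Toy scores for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(eer, threshold)
```

A lower EER means the genuine and impostor score distributions overlap less.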
Journal of The American Society for Information Science and Technology, 2006
With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific features) are extracted and inductive learning algorithms are used to build feature-based classification models to identify authorship of online messages. To examine this framework, we conducted experiments on English and Chinese online-newsgroup messages. We compared the discriminating power of the four types of features and of three classification techniques: decision trees, backpropagation neural networks, and support vector machines. The experimental results showed that the proposed approach was able to identify authors of online messages with satisfactory accuracy of 70 to 95%. All four types of message features contributed to discriminating authors of online messages. Support vector machines outperformed the other two classification techniques in our experiments. The high performance we achieved for both the English and Chinese datasets showed the potential of this approach in a multiple-language context.
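The four feature types named in this abstract can be sketched with a small extractor. The specific features, the word lists, and the function name below are illustrative assumptions and are not the feature set used in the paper.

```python
import re

FUNCTION_WORDS = ("the", "of", "and", "to", "in")  # tiny illustrative list
CONTENT_WORDS = ("price", "sale")                  # hypothetical topic keywords

def style_features(message):
    """Return one toy feature or two for each of the four feature types."""
    words = re.findall(r"[A-Za-z']+", message.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    lines = message.splitlines()
    return {
        # Lexical: character- and word-level statistics
        "lexical_avg_word_len": sum(map(len, words)) / n_words,
        "lexical_vocab_richness": len(set(words)) / n_words,
        # Syntactic: function words and punctuation
        "syntactic_function_word_rate": sum(w in FUNCTION_WORDS for w in words) / n_words,
        "syntactic_punct_count": sum(ch in ",.;:!?" for ch in message),
        # Structural: layout of the message
        "structural_line_count": len(lines),
        "structural_has_greeting": bool(lines)
        and lines[0].lower().startswith(("hi", "hello", "dear")),
        # Content-specific: topic keywords
        "content_keyword_count": sum(w in CONTENT_WORDS for w in words),
    }

features = style_features("Hello all,\nThe price of the sale is low.")
print(features)
```

A feature dictionary of this kind would then be turned into a numeric vector and passed to a classifier such as a decision tree, neural network, or support vector machine.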
Procedia Computer Science, 2019
Identifying authors by their style of writing is a very challenging task. This problem has several applications, one of which is to identify fake online reviews written by spam accounts. The existence of such fake reviews degrades the credibility of the whole review collection, hence these fake reviews should be identified and removed. This process, however, needs to be automated since it is impossible to perform it manually in large review collections. Current authorship identification approaches identify authors based on large-scale texts such as documents. For this reason, these methods do not scale well to short texts such as online reviews that have limited features to learn from. This paper introduces a new method of author identification in short texts using combinations of machine learning algorithms and natural language processing techniques. The experiments we conducted on Yelp reviews gave promising results.
On the Empirical Evaluation of Hybrid Author Identification Method
2015
In this paper we focus on identifying the author of a written text. We present a new hybrid method that combines a set of stylistic and statistical features in a machine learning process. We tested the effectiveness of the linguistic and statistical features combined with the inter-textual distance "Delta" on the PAN@CLEF 2015 English corpus and obtained a c@1 score of 0.59.
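The abstract does not define c@1. As commonly used in PAN evaluations, the measure rewards leaving a problem unanswered over answering it wrongly: c@1 = (n_c + n_u · n_c / n) / n, where n is the number of problems, n_c the number answered correctly, and n_u the number left unanswered. A minimal sketch under that assumption, with a hypothetical function name and toy answers:

```python
def c_at_1(answers, truth):
    """Compute c@1 for binary verification answers.

    answers: list of True / False / None, where None means "left unanswered"
    truth:   list of True / False ground-truth labels
    """
    n = len(truth)
    n_correct = sum(a is not None and a == t for a, t in zip(answers, truth))
    n_unanswered = sum(a is None for a in answers)
    # Unanswered problems are credited at the accuracy achieved overall.
    return (n_correct + n_unanswered * n_correct / n) / n

score = c_at_1([True, False, None, True], [True, True, False, True])
print(score)  # 0.625
```

When every problem is answered, c@1 reduces to plain accuracy.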
Writer Identification Using Microblogging Texts for Social Media Forensics
IEEE Transactions on Biometrics, Behavior, and Identity Science, 2021
Establishing authorship of online texts is fundamental to combating cybercrime. Unfortunately, text length is limited on some platforms, which makes the challenge harder. We aim to identify the authorship of Twitter messages limited to 140 characters. We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features such as URLs, hashtags, replies, and quotes. We use two databases with 93 and 3957 authors, respectively. We test author sets of varying size and varying amounts of training/test texts per author. Performance is further improved by feature combination via automatic selection. With a large amount of training Tweets (>500), good accuracy (Rank-5 > 80%) is achievable with only a few dozen test Tweets, even with several thousand authors. With smaller sample sizes (10-20 training Tweets), the search space can be diminished by 9-15% while keeping a high chance that the correct author is retrieved among the candidates. In such cases, automatic attribution can provide significant time savings to experts in suspect search. For completeness, we report verification results. With few training/test Tweets, the EER is above 20-25%, which is reduced to < 15% if hundreds of training Tweets are available. We also quantify the computational complexity and time permanence of the employed features.
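The Rank-5 figure quoted here is a rank-k identification rate: a query counts as a hit if the true author appears among the k highest-scoring candidates. A minimal sketch, with hypothetical names and toy scores that are not drawn from the paper:

```python
def rank_k_accuracy(score_lists, true_authors, k=5):
    """Fraction of queries whose true author is among the top-k candidates.

    score_lists:  one dict per query mapping candidate author -> score
                  (higher score = better match)
    true_authors: the correct author for each query
    """
    hits = 0
    for scores, truth in zip(score_lists, true_authors):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_authors)

# Toy example with three candidate authors and three queries.
queries = [
    {"a": 0.9, "b": 0.5, "c": 0.1},
    {"a": 0.2, "b": 0.3, "c": 0.8},
    {"a": 0.6, "b": 0.1, "c": 0.5},
]
truths = ["a", "b", "b"]
rank1 = rank_k_accuracy(queries, truths, k=1)
rank2 = rank_k_accuracy(queries, truths, k=2)
print(rank1, rank2)
```

Raising k trades precision for recall, which is why a top-k candidate list can still save time for a human expert even when Rank-1 accuracy is low.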
Authorship Verification of Tweets Cross Topics Using Weighted Word Vectors Similarity
2020
Authorship Verification (AV) is one of the interesting topics that has developed rapidly and distinctly since the middle of the 19th century. In the social media era, there is a recurring problem of determining whether a given tweet, post, or comment was written by a certain user. We propose a new approach to verify whether a tweet belongs to a claimed user. Our method utilizes the benefits of one-shot learning. It is based on vector similarity, which depends on Term Frequency–Inverse Document Frequency (TF-IDF) and word embeddings for better verification accuracy. In our comparisons, the proposed approach outperforms existing methods in the cross-topic case.
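One way to read "weighted word vectors similarity" is to weight each word's embedding by its TF-IDF value, sum the result into a single vector per text, and compare texts by cosine similarity. The abstract does not give the exact weighting or model, so the sketch below is an assumption throughout: the smoothed IDF formula, the two-dimensional toy embeddings, and all names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(doc, corpus):
    """TF-IDF weight per word of `doc`, using a smoothed IDF (an assumption)."""
    tf = Counter(doc)
    n = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(word in d for d in corpus)  # documents containing the word
        weights[word] = (count / len(doc)) * (math.log((1 + n) / (1 + df)) + 1)
    return weights

def weighted_vector(doc, corpus, emb):
    """Sum of word embeddings, each scaled by its TF-IDF weight."""
    dim = len(next(iter(emb.values())))
    vec = [0.0] * dim
    for word, weight in tfidf_weights(doc, corpus).items():
        if word in emb:  # skip out-of-vocabulary words
            for i, x in enumerate(emb[word]):
                vec[i] += weight * x
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy 2-D embeddings and tokenized tweets, for illustration only.
emb = {"coffee": [1.0, 0.0], "espresso": [0.9, 0.1], "football": [0.0, 1.0],
       "goal": [0.1, 0.9], "love": [0.5, 0.5]}
corpus = [["love", "coffee"], ["love", "espresso"], ["love", "football", "goal"]]
known, claimed, other = (weighted_vector(d, corpus, emb) for d in corpus)
same_sim = cosine(known, claimed)
diff_sim = cosine(known, other)
print(same_sim, diff_sim)
```

A verification decision would then compare the similarity against a threshold chosen on held-out data.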
A Sentiment-Based Author Verification Model Against Social Media Fraud
Joint Proceedings of the 19th World Congress of the International Fuzzy Systems Association (IFSA), the 12th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT), and the 11th International Summer School on Aggregation Operators (AGOP), 2021
The prevalence and capability of IoT devices have made them a primary enabler of online fraud and fake authorship on social media. We present a novel approach that uses sentiment analysis to solve the problem of author verification in short text. We evaluate our model on tweets and show that it yields promising results.