Homogeneity-Based Transmissive Process to Model True and False News in Social Networks
Related papers
Model-based non-Gaussian interest topic distribution for user retweeting in social networks
Neurocomputing, 2017
Retweeting behavior is critical for dissecting information diffusion, innovation propagation and event bursting in networks. However, because of the varied content of tweets, recent work mainly focuses on influence relationships and cannot derive the different pathways of information diffusion. Our work therefore tries to reveal the pattern by tracking retweeting behavior through user interest and categories of tweets. The key to modeling user interest is modeling the topic distribution of tweets, which has non-Gaussian characteristics (e.g., a power-law distribution); we thus present the Latent Topics of user Interest (LTI) model, which makes full use of the non-Gaussian distribution of topics among tweets to uncover user interest and then predict users' likely actions. After dividing users into conceited users and altruistic users by whether they select definitively when retweeting, and categorizing tweets into repeated hot tweets and novel hot tweets by whether their topics always occur in the training set, we demonstrate a pattern: conceited users promote the diffusion of repeated hot tweets, whereas altruistic users expand the diffusion of novel hot tweets. The pattern is evaluated by the correlation coefficient between types of users and tweets, which is greater than .61 for 10 and 100 million tweets of Weibo and Twitter, with respect to 70 and 58 thousand users, over a period of one month.
Spam Diffusion in Social Networking Media using Latent Dirichlet Allocation
International Journal of Innovative Technology and Exploring Engineering, 2019
Just as web spam has been a major threat to almost every aspect of the current World Wide Web, social spam, especially in information diffusion, has posed a serious threat to the utility of online social media. To combat this challenge, the significance and impact of such entities and content should be analyzed critically. To address this issue, this work used Twitter as a case study, modeled the contents of information through topic modeling, and coupled it with user-oriented features to handle it with good accuracy. Latent Dirichlet Allocation (LDA), a widely used topic modeling technique, is applied to capture the latent topics from the tweets' documents. The major contribution of this work is twofold: constructing the dataset which serves as the ground truth for analyzing the diffusion dynamics of spam/non-spam information, and analyzing the effects of topics on diffusibility. Exhaustive experiments clearly reveal the variation in topics shared by the spam an...
Modeling topic specific credibility on twitter
2012
This paper presents and evaluates three computational models for recommending credible topic-specific information in Twitter. The first model focuses on credibility at the user level, harnessing various dynamics of information flow in the underlying social graph to compute a credibility rating. The second model applies a content-based strategy to compute a finer-grained credibility score for individual tweets. Lastly, we discuss a third model which combines facets from both models in a hybrid method, using both averaging and filtering hybrid strategies. To evaluate our novel credibility models, we perform an evaluation on 7 topic-specific data sets mined from the Twitter streaming API, with specific focus on a data set of 37K users who tweeted about the topic "Libya". Results show that the social model outperforms the hybrid and content-based prediction models in terms of predictive accuracy over a set of manually collected credibility ratings on the "Libya" dataset.
New Generation Computing, 2022
Online social media has become a major source of information for a huge section of society. The amount of information flowing through online social media is enormous, but fact-checking resources are limited. This shortfall of fact-checking gives rise to the problems of misinformation and disinformation about the truthfulness of facts on online social media, which can have serious effects on the wellbeing of society. This problem of misconception becomes more rapid and critical during events like the recent outbreak of Covid-19, when little or no information is available anywhere. In this scenario, identifying content that is mostly propagated from person to person rather than by any governing authority is the need of the hour. To solve this problem, information available online should be verified properly before being accepted by any individual. We propose a scheme to classify online social media posts (tweets) with the help of a BERT (Bidirectional Encoder Representations from Transformers)-based model. We also compare the performance of the proposed approach with other machine learning techniques and other state-of-the-art techniques. The proposed model not only classifies the tweets as relevant or irrelevant, but also creates a set of topics by which one can identify a text as relevant or irrelevant to one's need simply by matching the keywords of the topic. To accomplish this task, after the classification of the tweets, we apply a topic modelling approach based on latent semantic analysis and latent Dirichlet allocation to identify which topics are mostly propagated as false information.
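The keyword-matching step this abstract describes, deciding relevance by matching a post against a topic's keyword list, can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation; the keyword set, tokenization, and hit threshold are assumptions:

```python
def is_relevant(text, topic_keywords, min_hits=2):
    """Flag a post as relevant if it shares at least `min_hits`
    keywords with a topic's keyword list (hypothetical threshold)."""
    tokens = set(text.lower().split())
    return sum(1 for kw in topic_keywords if kw in tokens) >= min_hits

# Toy topic keyword list (illustrative only, not from the paper)
covid_topic = {"covid", "vaccine", "symptoms", "lockdown", "virus"}

print(is_relevant("New lockdown rules as virus cases rise", covid_topic))  # True
print(is_relevant("My cat knocked over the plant again", covid_topic))     # False
```

In practice the keyword lists would come from the LSA/LDA topics the paper extracts, and the threshold would be tuned on labeled data.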
Empirical Study of Topic Modeling in Twitter
Social networks such as Facebook, LinkedIn, and Twitter have been a crucial source of information for a wide spectrum of users. In Twitter, popular information that is deemed important by the community propagates through the network. Studying the characteristics of content in the messages becomes important for a number of tasks, such as breaking news detection, personalized message recommendation, friends recommendation, sentiment analysis and others. While many researchers wish to use standard text mining tools to understand messages on Twitter, the restricted length of those messages prevents them from being employed to their full potential.
Dirichlet-Survival Process: Scalable Inference of Topic-Dependent Diffusion Networks
Lecture Notes in Computer Science, 2023
Information spread on networks can be efficiently modeled by considering three features: documents' content, time of publication relative to other publications, and position of the spreader in the network. Most previous works model at most two of these jointly, or rely on heavily parametric approaches. Building on recent Dirichlet-Point processes literature, we introduce the Houston (Hidden Online User-Topic Network) model, which jointly considers all these features in a non-parametric unsupervised framework. It infers dynamic topic-dependent underlying diffusion networks in a continuous-time setting along with said topics. It is unsupervised; it takes as input an unlabeled stream of triplets of the form (time of publication, information's content, spreading entity). Online inference is conducted using a sequential Monte Carlo algorithm that scales linearly with the size of the dataset. Our approach yields substantial improvements over existing baselines on both cluster recovery and subnetwork inference tasks.
A time-dependent topic model for multiple text streams
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11, 2011
In recent years social media have become indispensable tools for information dissemination, operating in tandem with traditional media outlets such as newspapers, and it has become critical to understand the interaction between the new and old sources of news. Although social media as well as traditional media have attracted attention from several research communities, most of the prior work has been limited to a single medium. In addition, temporal analysis of these sources can provide an understanding of how information spreads and evolves. Modeling temporal dynamics while considering multiple sources is a challenging research problem. In this paper we address the problem of modeling text streams from two news sources: Twitter and Yahoo! News. Our analysis addresses both their individual properties (including temporal dynamics) and their inter-relationships. This work extends standard topic models by allowing each text stream to have both local topics and shared topics. For temporal modeling we associate each topic with a time-dependent function that characterizes its popularity over time. By integrating the two models, we effectively model the temporal dynamics of multiple correlated text streams in a unified framework. We evaluate our model on a large-scale dataset, consisting of text streams from both Twitter and news feeds from Yahoo! News. Besides overcoming the limitations of existing models, we show that our work achieves better perplexity on unseen data and identifies more coherent topics. We also provide analysis of finding real-world events from the topics obtained by our model.
Measuring the Future Popularity of a Tweet Containing Novel Topics
IJCSIS, 2018
A fundamental question in modeling information cascades is to predict the final size of an information cascade, that is, to predict how many reshares a given post will ultimately receive. A growing line of recent research has studied spread prediction of online content in online social networks (OSN). Predicting the spread of such content is important for obtaining the latest information on different topics, viral marketing, etc. Existing approaches to spread prediction mainly focus on content and past behavior of users. However, not enough attention is paid to the structural characteristics of the network. We apply the Latent Dirichlet Allocation (LDA) model on users' past tweets to learn the users' latent interests in different topics. We next identify the top-k topics relevant to the new tweet using the word-topic distribution from LDA. Finally, we measure the spread prediction of the new tweet, considering its acceptance in the underlying social network by taking into account the possible effect of all propagation paths between the tweet owner and the recipient user. Our experimental results on a real dataset show the efficacy of the proposed approach. Keywords: Information cascade, online social network, Latent Dirichlet Allocation.
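The LDA step this abstract relies on, fitting topics to past tweets and then ranking the top-k topics for a new tweet via the word-topic distribution, can be sketched with a toy collapsed Gibbs sampler. This is a minimal stdlib illustration under assumed hyperparameters (alpha, beta, iteration count), not the authors' implementation, and the corpus is hypothetical:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})              # vocabulary size
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of each token
    for di, doc in enumerate(docs):                    # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(n_topics)
            zs.append(t); ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):                             # resample each token's topic
        for di, doc in enumerate(docs):
            for wi, w in enumerate(doc):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw, nk, V

def top_k_topics(tweet, nkw, nk, V, k=2, beta=0.01):
    """Rank topics for a new tweet by its words' probability under each
    topic's word distribution (the word-topic distribution from LDA)."""
    scores = [sum((nkw[t][w] + beta) / (nk[t] + V * beta) for w in tweet)
              for t in range(len(nk))]
    return sorted(range(len(nk)), key=lambda t: -scores[t])[:k]

# Hypothetical corpus of users' past tweets (tokenized)
docs = [["ball", "goal", "match"], ["vote", "law", "senate"],
        ["ball", "match"], ["law", "vote"]]
nkw, nk, V = lda_gibbs(docs, n_topics=2, iters=100)
print(top_k_topics(["ball", "goal"], nkw, nk, V, k=1))
```

A production system would use an optimized LDA implementation and feed the resulting topic scores into the cascade-size model, together with the network-structure features the paper emphasizes.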
Through the Grapevine: A Comparison of News in Microblogs and Traditional Media
Lecture Notes in Social Networks, 2017
In recent years the greater part of news dissemination has shifted from traditional news media to individual users on microblogs such as Twitter and Reddit. Therefore, there has been increasing research effort on how to automatically detect newsworthy and otherwise useful information on these platforms. In this paper, we present two novel algorithmic approaches, content-similarity computation and graph analysis, to automatically capture the main differences in newsworthy content between microblogs and traditional news media. For the content-similarity algorithm, we discuss why it is difficult to capture such unique information using traditional text-based search mechanisms. We performed an experiment to evaluate the content-similarity algorithm using a corpus of 35 million topic-specific Twitter messages and 6,112 New York Times articles on a variety of topics. This is followed by an online user study (N=200) to evaluate how users assess the content recommended by the algorithm. The results show significant differences in user perception of newsworthiness and uniqueness of the content returned by our algorithm. Secondly, we investigate a method for identifying unique content in microblogs by harnessing the network structure of the information propagation graphs. In this approach, we study how these two types of information differ from each other in terms of topic and dissemination behavior in the network. The results show that while the majority of subgraphs in the traditional group have long retweet chains and exhibit a giant component surrounded by a number of small components, unique content typically propagates from a dominating node with only a few multi-hop retweet chains observed. Furthermore, results from the LDA and BPR algorithms indicate that strong and dense topic associations between users are frequently observed in the graphs of the traditional group, but not in the unique group.
A Social Network Newsworthiness Filter Based on Topic Analysis
International Journal of Technology, 2016
Assessing the trustworthiness of social media posts is increasingly important as the number of online users and activities grows. Currently deployed assessment systems measure post trustworthiness as credibility. However, they measure the credibility of all posts indiscriminately. The credibility concept was intended for news-type posts; labeling other types of posts with credibility scores may confuse users. Previous notable works envisioned filtering out non-newsworthy posts before credibility assessment as a key factor towards a more efficient credibility system. Thus, we propose a topic-based supervised learning approach that uses Term Frequency-Inverse Document Frequency (TF-IDF) and cosine similarity to filter out posts that do not need credibility assessment. Our experimental results show that users agree with about 70% of the proposed filtering suggestions. Such results support the notion of newsworthiness introduced in the pioneering work on credibility assessment. The topic-based supervised learning approach is shown to provide a viable social network filter.
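The TF-IDF-plus-cosine-similarity filter this abstract describes can be sketched with the standard formulas. This is a minimal stdlib illustration; the tokenization, the IDF smoothing variant, the example posts, and the similarity threshold are all assumptions, not the paper's exact setup:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Raw term frequency times smoothed inverse document frequency,
    one sparse vector (dict) per tokenized document."""
    N = len(docs)
    df = Counter(w for d in docs for w in set(d))     # document frequency
    idf = {w: math.log(N / df[w]) + 1.0 for w in df}  # smoothed idf (one variant)
    return [{w: tf * idf[w] for w, tf in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical newsworthy examples vs. an incoming post to filter
news = [["election", "results", "announced"], ["storm", "warning", "coast"]]
posts = news + [["lunch", "was", "great"]]
vecs = tfidf_vectors(posts)
news_vecs, incoming = vecs[:2], vecs[2]
newsworthy = max(cosine(incoming, nv) for nv in news_vecs) >= 0.2  # assumed threshold
print(newsworthy)  # False: the post shares no terms with the news examples
```

A post scoring below the threshold against every newsworthy example would be filtered out before any credibility assessment, which is the pipeline ordering the abstract argues for.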