Embracing Domain Differences in Fake News: Cross-domain Fake News Detection using Multi-modal Data

Can Machines Learn to Detect Fake News? A Survey Focused on Social Media

2019

Through a systematic literature review, in this work we searched classical electronic libraries to find the most recent papers related to fake news detection on social media. Our goal is to map the state of the art in fake news detection, define fake news, and identify the most useful machine learning techniques for this task. We concluded that the most used method for automatic fake news detection is not a single classical machine learning technique but an amalgamation of classical techniques coordinated by a neural network. We also identified the need for a domain ontology that would unify the different terminology and definitions in the fake news domain, since this lack of consensus may lead to misleading opinions and conclusions.

A Review of Fake News Detection Models: Highlighting the Factors Affecting Model Performance and the Prominent Techniques Used

International Journal of Advanced Computer Science and Applications

In recent times, social media has become the primary way people get news about what is happening in the world, and fake news surfaces on it every day. Fake news on social media has harmed several domains, including politics, the economy, and health, and it has negatively affected social stability. Although numerous studies have offered useful models for identifying fake news in social networks using many techniques, limitations and challenges remain. Moreover, detection accuracy is still notably poor given how critical the topic is. Although many review articles exist, most have concentrated on the same limited components of fake news detection models. For instance, most reviews in this discipline only describe datasets or categorize them by label, content, and domain. Since most detection models are built with supervised learning, the limitations of these datasets directly shape model performance, yet how they affect detection models has not been investigated. This review article highlights the most significant components of fake news detection models and the main challenges they face. Data augmentation, feature extraction, and data fusion are among the approaches explored in this review to improve detection accuracy. It also discusses the most prominent techniques used in detection models along with their main advantages and disadvantages. This review aims to help other researchers improve fake news detection models.

Supervised Learning for Fake News Detection

IEEE Intelligent Systems

A large body of recent work has focused on understanding and detecting fake news stories disseminated on social media. To accomplish this goal, these works explore several types of features extracted from news stories, including their sources and related social media posts. In addition to exploring the main features proposed in the literature for fake news detection, we present a new set of features and measure the prediction performance of current approaches and features for automatic detection of fake news. Our results reveal interesting findings on the usefulness and importance of features for detecting false news. Finally, we discuss how fake news detection approaches can be used in practice, highlighting challenges and opportunities.

Automated Fake News Detection using cross-checking with reliable sources

2022

Over the past decade, fake news and misinformation have turned into a major problem that has impacted different aspects of our lives, including politics and public health. Inspired by natural human behavior, we present an approach that automates the detection of fake news. Natural human behavior is to cross-check new information with reliable sources. We use Natural Language Processing (NLP) and build a machine learning (ML) model that automates the process of cross-checking new information with a set of predefined reliable sources. We implement this for Twitter and build a model that flags fake tweets. Specifically, for a given tweet, we use its text to find relevant news from reliable news agencies. We then train a Random Forest model that checks if the textual content of the tweet is aligned with the trusted news. If it is not, the tweet is classified as fake. This approach can be generally applied to any kind of information and is not limited to a specific news story or a catego...
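The cross-checking idea can be illustrated with a minimal sketch. It assumes plain bag-of-words cosine similarity and a fixed, hypothetical threshold of 0.3 in place of the trained Random Forest the authors describe, and the "trusted" headlines are made up for illustration. A tweet is flagged when its best match among the trusted items is too weak.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_check(tweet: str, trusted_news: list[str], threshold: float = 0.3) -> bool:
    """Return True (flag as possibly fake) when no trusted item is similar enough.

    Hypothetical rule: the 0.3 threshold stands in for the paper's trained
    Random Forest, which decides alignment from learned features instead.
    """
    tweet_vec = Counter(tokenize(tweet))
    best = max(
        (cosine(tweet_vec, Counter(tokenize(doc))) for doc in trusted_news),
        default=0.0,
    )
    return best < threshold


# Hypothetical "reliable source" headlines, for illustration only.
trusted = [
    "Central bank raises interest rates by a quarter point",
    "Health ministry reports decline in new infections",
]
print(cross_check("Central bank raises interest rates again", trusted))  # False
print(cross_check("Aliens land on the central bank roof", trusted))  # True
```

A learned classifier, as in the paper, can weigh richer alignment signals than a single similarity score; the fixed threshold here only keeps the sketch self-contained.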

Fake news detection: When complex problems demand complex solutions

2021

Fake news detection is one of the most challenging problems in today's information and communication systems. In this article we address the challenge of detecting the generation and spread of misleading information in the specific scenarios of rumour propagation and clickbait. We find that the way the dataset used to study this kind of problem is constructed dramatically affects model performance and, thus, model selection. Hence, we conduct experiments with two datasets of different complexity. In experiment A, using a simple dataset with rumour propagation data from Twitter, we demonstrate that good performance scores can be obtained without the high computational cost of hyperparameter tuning. In experiment B, we find that an approach with fewer parameters and computational layers is not suitable for studying clickbait in a larger dataset with more complex dynamics. Information deluge clearly demands the automation of the procedures for information treatme...

Developed Models Based on Transfer Learning for Improving Fake News Predictions

Journal of Universal Computer Science, 2023

In conjunction with global concern about the spread of fake news on social media, a large body of research has emerged to address this phenomenon. The wide growth of social media and online forums has made it easy for misleading news to blend in with legitimate news, negatively affecting people's perceptions and misleading them. As such, this study uses deep learning, pre-trained models, and machine learning to predict Arabic and English fake news on three publicly available datasets: the Fake-or-Real dataset, the AraNews dataset, and the Sentimental LIAR dataset. Based on the GloVe (Global Vectors) and FastText pre-trained models, a hybrid network is proposed to improve the prediction of fake news. In this network, a CNN (Convolutional Neural Network) identifies the most important features, while a BiGRU (Bidirectional Gated Recurrent Unit) captures long-term dependencies in sequences. Finally, a multi-layer perceptron (MLP) classifies each news article as fake or real. Separately, an Improved Random Forest model is built on embeddings extracted from the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model together with relevant speaker-based features, which are identified by a fuzzy model based on feature selection methods. Accuracy was used to measure the quality of the proposed models, with prediction accuracy reaching 0.9935, 0.9473, and 0.7481 on the Fake-or-Real, AraNews, and Sentimental LIAR datasets, respectively. The proposed models showed a significant improvement in the accuracy of predicting Arabic and English fake news compared to previous studies that used the same datasets.

Fake News Data Exploration and Analytics

Electronics

Before the internet, people acquired their news from the radio, television, and newspapers. With the internet, news moved online, and suddenly anyone could post information on websites such as Facebook and Twitter. The spread of fake news has also increased with social media, and it has become one of the most significant issues of this century. Some people spread fake news to damage the reputation of well-regarded organizations for their own benefit. The main motivation for this project is to build a tool that uses machine learning to examine the language patterns that distinguish fake news from real news. This paper proposes machine learning models that can successfully detect fake news. These models identify which news is real or fake and report the accuracy of these predictions, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequen...
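Pairing TF-IDF features with logistic regression is a common baseline for this kind of text classification. The following is a minimal, standard-library sketch of that pairing, not the authors' pipeline: it assumes a tiny hypothetical corpus with made-up labels, a smoothed IDF of log((1 + n) / (1 + df)) + 1, and plain full-batch gradient descent with illustrative settings (lr=0.5, 200 epochs).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (raw counts times IDF); unseen terms are dropped."""
    counts = Counter(tokenize(doc))
    return {term: c * idf[term] for term, c in counts.items() if term in idf}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability that the sparse vector x belongs to class 1 (fake)."""
    return sigmoid(sum(w.get(t, 0.0) * v for t, v in x.items()) + b)


def train_logreg(X, y, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = predict_proba(x, w, b) - label  # derivative of log-loss w.r.t. the score
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g / len(X)
        b -= lr * grad_b / len(X)
    return w, b


# Hypothetical toy corpus; 1 = fake, 0 = real.
docs = [
    "shocking miracle cure doctors hate",
    "you will not believe this shocking secret",
    "government announces new budget plan",
    "city council approves new school budget",
]
labels = [1, 1, 0, 0]
idf = fit_idf(docs)
weights, bias = train_logreg([tfidf(d, idf) for d in docs], labels)
print(predict_proba(tfidf("shocking miracle cure", idf), weights, bias))
```

In practice a library implementation with regularization and a held-out test split would replace this hand-rolled loop; the sketch only shows how the two pieces fit together.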

Automated Multi-Model Fake News Classifier

Journal of Emerging Technologies and Innovative Research, 2021

The widespread increase in fake news, whether created by humans or machines, has a negative impact on society and individuals on both a political and a social level. The rapid turnover of news in the age of social media makes it difficult to assess its authenticity quickly. As a result, automated fake news identification tools have become a necessity. To address this problem, we use a hybrid neural network architecture that combines the capabilities of CNN and LSTM, together with two separate dimensionality reduction methods, PCA and Chi-Square. We use data from the Fake News Challenge (FNC) website, which labels four stances: agree, disagree, discuss, and unrelated. The aim of this study is to determine the stance of a news article's body relative to its headline using different deep learning and ML models.
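Unlike PCA, chi-square is usually applied as a feature-selection filter: each term is scored by how strongly its presence depends on the class, and only the top-scoring terms are kept. A minimal sketch of that filter, assuming documents are already tokenized into sets and using hypothetical stance labels (the function names are illustrative, not from the paper), computes Pearson's statistic over the full term-presence by class contingency table, so it works for the four FNC stances as well as for binary labels.

```python
def chi2_score(docs, labels, term):
    """Pearson chi-square statistic for term presence vs. class label.

    Builds the (present/absent) x (class) contingency table and sums
    (observed - expected)^2 / expected over every cell.
    """
    n = len(docs)
    present = [term in doc for doc in docs]
    n_present = sum(present)
    score = 0.0
    for cls in set(labels):
        n_cls = labels.count(cls)
        obs_present = sum(1 for p, y in zip(present, labels) if p and y == cls)
        cells = ((obs_present, n_present), (n_cls - obs_present, n - n_present))
        for observed, row_total in cells:
            expected = row_total * n_cls / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def select_k_best(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi2_score(docs, labels, t), t))
    return ranked[:k]


# Hypothetical token sets and stance labels; FNC itself uses
# agree / disagree / discuss / unrelated.
docs = [{"officials", "confirm"}, {"deny", "false"}, {"confirm", "report"}, {"deny", "hoax"}]
stances = ["agree", "disagree", "agree", "disagree"]
print(select_k_best(docs, stances, 2))  # ['confirm', 'deny']
```

Here "confirm" and "deny" each appear only under a single stance in every document of that stance, so they receive the maximum score (4.0 for four documents), while terms seen once score lower.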

A benchmark study of machine learning models for online fake news detection

Machine Learning with Applications, 2021

The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of them focused on a specific type of news (such as political news), which raises the question of dataset bias in the models used. In this research, we conducted a benchmark study to assess the performance of different applicable machine learning approaches on three different datasets, one of which, the largest and most diverse, we assembled ourselves. We explored a number of advanced pre-trained language models for fake news detection along with traditional and deep learning ones and, to the best of our knowledge, compared their performance from different aspects for the first time. We find that BERT and similar pre-trained models perform best for fake news detection, especially with very small datasets. Hence, these models are a significantly better option for languages with limited electronic content, i.e., limited training data. We also carried out several analyses based on the models' performance, the article's topic, and the article's length, and discuss the lessons learned from them. We believe that this benchmark study will help the research community explore further and help news sites and blogs select the most appropriate fake news detection method.

FANG-COVID: A New Large-Scale Benchmark Dataset for Fake News Detection in German

Proceedings of the Fourth Workshop on Fact Extraction and VERification (FEVER), 2021

As the world continues to fight the COVID-19 pandemic, it is simultaneously fighting an 'infodemic': a flood of disinformation and conspiracy theories leading to health threats and the division of society. To combat this infodemic, there is an urgent need for benchmark datasets that can help researchers develop and evaluate models geared towards automatic detection of disinformation. While there are increasing efforts to create adequate, open-source benchmark datasets for English, comparable resources are virtually unavailable for German, leaving research for the German language lagging significantly behind. In this paper, we introduce the new benchmark dataset FANG-COVID, consisting of 28,056 real and 13,186 fake German news articles related to the COVID-19 pandemic, as well as data on their propagation on Twitter. Furthermore, we propose an explainable textual- and social-context-based model for fake news detection, compare its performance to "black-box" models, and perform feature ablation to assess the relative importance of human-interpretable features in distinguishing fake news from authentic news.