Predicting Socio-Economic Indicators using News Events

Predicting market inflation expectations with news topics and sentiment

2021

This study presents a novel approach to incorporating news topics and their associated sentiment into predictions of breakeven inflation rate (BEIR) movements for eight countries with mature bond markets. We calibrate five classes of machine learning models including narrative-based features for each country, and find that they generally outperform corresponding benchmarks that do not include such features. We find Logistic Regression and XGBoost classifiers to deliver the best performance across countries. We complement these results with a feature importance analysis, showing that economic and financial topics are the key performance drivers in our predictions, with additional contributions from topics related to health and government. We examine cross-country spillover effects of news narrative on BEIR via Graphical Granger Causality and confirm their existence for the US and Germany, while five other countries considered in our study are only influenced by local narrative.

Using Words from Daily News Headlines to Predict the Movement of Stock Market Indices

Managing Global Transitions, 2017

Stock market analysis is one of the biggest areas of interest for text mining. Many researchers have proposed different approaches that use text information for predicting the movement of stock market indices. Many of these approaches focus either on maximising the predictive accuracy of the model or on devising alternative methods for model evaluation. In this paper, we propose a more descriptive approach focusing on the models themselves, trying to identify the individual words in the text that most affect the movement of stock market indices. We use data from two sources (for the past eight years): the daily data for the Dow Jones Industrial Average index ('open' and 'close' values for each trading day) and the headlines of the 25 most-voted news items on the Reddit WorldNews Channel for the previous 'trading days.' By applying machine learning algorithms to these data and analysing individual words that appear in the final predictive models, we find that the words gay, propaganda and massacre are typically associated with a daily increase of the stock index, while the word iran mostly coincides with its decrease. While this work presents a first step towards qualitative analysis of stock market models, there is still plenty of room for improvement.
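The idea of ranking individual headline words by their association with index direction can be illustrated with a minimal sketch. It uses smoothed per-day log-odds as a simple stand-in; the abstract does not say which machine learning algorithms were applied, so the function name `word_log_odds`, the smoothing parameter `alpha`, and the whitespace tokenisation are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_log_odds(headlines, labels, alpha=1.0):
    """Smoothed log-odds of each word appearing on 'up' days versus 'down' days.

    headlines: list of strings, one per trading day.
    labels: list of 1 (index rose) or 0 (index fell), aligned with headlines.
    alpha: additive smoothing so unseen words do not produce log(0).
    Positive scores lean towards up days, negative towards down days.
    """
    up, down = Counter(), Counter()
    n_up = n_down = 0
    for text, label in zip(headlines, labels):
        words = set(text.lower().split())  # presence per day, not raw frequency
        if label == 1:
            up.update(words)
            n_up += 1
        else:
            down.update(words)
            n_down += 1
    scores = {}
    for word in set(up) | set(down):
        # Smoothed share of up days (resp. down days) that contain the word.
        p_up = (up[word] + alpha) / (n_up + 2 * alpha)
        p_down = (down[word] + alpha) / (n_down + 2 * alpha)
        scores[word] = math.log(p_up / (1 - p_up)) - math.log(p_down / (1 - p_down))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    days = ["talks resume", "talks stall", "sanctions announced", "sanctions widen"]
    moves = [1, 1, 0, 0]
    for word, score in sorted(word_log_odds(days, moves).items(), key=lambda kv: -kv[1]):
        print(f"{word:12s} {score:+.3f}")
```

Sorting the scores gives a ranked word list that can be inspected directly, which is the descriptive use the abstract has in mind; a fitted classifier's coefficients could play the same role.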

Forecasting with Economic News

ArXiv, 2022

The goal of this paper is to evaluate the informational content of sentiment extracted from news articles about the state of the economy. We propose a fine-grained aspect-based sentiment analysis that has two main characteristics: 1) we consider only the text in the article that is semantically dependent on a term of interest (aspect-based) and, 2) we assign a sentiment score to each word based on a dictionary that we develop for applications in economics and finance (fine-grained). Our data set includes six large US newspapers, for a total of over 6.6 million articles and 4.2 billion words. Our findings suggest that several measures of economic sentiment closely track business cycle fluctuations and that they are relevant predictors for four major macroeconomic variables. We find that there are significant improvements in forecasting when sentiment is considered along with macroeconomic factors. In addition, we also find that sentiment matters for explaining the tails of the probability distribution across several macroeconomic variables.
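The two characteristics described above (aspect-based selection of text, then word-level dictionary scoring) can be sketched roughly as follows. This is only an illustration under stated assumptions: a fixed token window around the term of interest stands in for the semantic dependency the paper relies on, and the tiny `LEXICON` with its scores is hypothetical, not the authors' dictionary.

```python
import re

# Hypothetical toy lexicon; the paper's dictionary is not reproduced here.
LEXICON = {"strong": 0.6, "growth": 0.4, "weak": -0.5, "collapse": -0.9, "rising": 0.3}


def aspect_sentiment(text, aspect, window=3, lexicon=LEXICON):
    """Sum lexicon scores of tokens within `window` tokens of each `aspect` mention.

    A proximity window is an assumed stand-in for real dependency parsing.
    Returns 0.0 when the aspect term does not occur.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = [i for i, tok in enumerate(tokens) if tok == aspect.lower()]
    # Collect each nearby token index once, even if windows overlap.
    nearby = set()
    for pos in positions:
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        nearby.update(range(lo, hi))
    return sum((lexicon.get(tokens[i], 0.0) for i in nearby), 0.0)


if __name__ == "__main__":
    article = ("The economy shows strong growth this year. "
               "Analysts say the housing market remains weak.")
    print("economy:", aspect_sentiment(article, "economy"))
    print("housing:", aspect_sentiment(article, "housing"))
```

Scoring per aspect rather than per article lets one article contribute different sentiment to different terms of interest, which is the point of the aspect-based restriction.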

Identifying Predictive Causal Factors from News Streams

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

We propose a new framework to uncover the relationship between news events and real world phenomena. We present the Predictive Causal Graph (PCG), which allows the detection of latent relationships between events mentioned in news streams. This graph is constructed by measuring how the occurrence of a word in the news influences the occurrence of another (set of) word(s) in the future. We show that PCG can be used to extract latent features from news streams, outperforming other graph-based methods in prediction error of 10 stock price time series for 12 months. We then extended PCG to be applicable for longer time windows by allowing time-varying factors, leading to stock price prediction error rates between 1.5% and 5% for about 4 years. We then manually validated PCG, finding that 67% of the causation semantic frame arguments present in the news corpus were directly connected in the PCG, the remaining being connected through a semantically relevant intermediate node.
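To make the notion of one word's occurrence influencing another's future occurrence concrete, here is a minimal sketch that scores a directed word pair by lagged lift. This is a simplified proxy chosen for illustration, not the measure used to build the PCG; the function name `lagged_lift`, the per-day word sets, and the `lag` parameter are assumptions.

```python
def lagged_lift(docs_by_day, source, target, lag=1):
    """Lift of `target` appearing `lag` days after a day containing `source`.

    docs_by_day: list of sets of words, one set per day, in time order.
    Returns P(target at t+lag | source at t) / P(target at t+lag), or None when
    the ratio is undefined (no usable day pairs, `source` never occurs in an
    earlier day, or `target` never occurs in a later day).
    """
    pairs = [(docs_by_day[t], docs_by_day[t + lag])
             for t in range(len(docs_by_day) - lag)]
    if not pairs:
        return None
    # Later-day word sets that follow a day mentioning `source`.
    source_days = [later for earlier, later in pairs if source in earlier]
    base_rate = sum(target in later for _, later in pairs) / len(pairs)
    if not source_days or base_rate == 0:
        return None
    conditional = sum(target in later for later in source_days) / len(source_days)
    return conditional / base_rate


if __name__ == "__main__":
    # Hypothetical toy stream, one set of words per day.
    days = [{"strike", "port"}, {"delay", "shipping"}, {"earnings"},
            {"strike"}, {"delay"}, {"weather"}]
    print(lagged_lift(days, "strike", "delay"))
```

A lift above 1 suggests that `target` follows `source` more often than its base rate; keeping only such pairs as directed edges would give a crude graph in the spirit of the description above.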

Detecting Economic Events Using a Semantics-Based Pipeline

Lecture Notes in Computer Science, 2011

In today’s information-driven global economy, breaking news on economic events such as acquisitions and stock splits has a substantial impact on the financial markets. Therefore, it is important to be able to automatically identify events in news items accurately and in a timely manner. For this purpose, one has to be able to mine a wide variety of heterogeneous sources

Impact of News on the Commodity Market: Dataset and Results

ArXiv, 2020

Over the last few years, machine learning based methods have been applied to extract information from news flow in the financial domain. However, this information has mostly been in the form of the financial sentiments contained in the news headlines, primarily for the stock prices. In our current work, we propose that various other dimensions of information can be extracted from news headlines, which will be of interest to investors, policy-makers and other practitioners. We propose a framework that extracts information such as past movements and expected directionality in prices, asset comparison and other general information that the news is referring to. We apply this framework to the commodity "Gold" and train the machine learning models using a dataset of 11,412 human-annotated news headlines (released with this study), collected from the period 2000-2019. We experiment to validate the causal effect of news flow on gold prices and observe that the information produce...

Forecasting Stock Market Trends Using News Headline Analysis

IRJET, 2021

This study aims to find a correlation between newspaper headlines and their subsequent effect on stock market trends using sentiment analysis, logistic regression, XGBoost, and deep learning. We suggest a forecasting model for predicting sentiment around stock prices. We map sentiment to see if there is a link between news-predicted sentiment and the original stock price, as well as to test the efficient market theory. Finding future stock trends is a difficult endeavor since stock trends are influenced by a variety of factors. Presumably, news items and stock prices are connected. Furthermore, news has the potential to change market patterns. As a result, we set out to investigate this link in depth and see if stock movements can be forecast using news articles and prior price histories.

Mining the Web to Predict Future Events

We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories. We consider the examples of identifying significant increases in the likelihood of disease outbreaks, deaths, and riots in advance of the occurrence of these events in the world. We provide details of methods and studies, including the automated extraction and generalization of sequences of events from news corpora and multiple web resources. We evaluate the predictive power of the approach on real-world events withheld from the system.

Mining Economic Topic Sentiment for Time Series Modeling

2014

Global businesses must react to daily changes in market conditions over multiple geographies and industries. Consuming reputable daily economic reports assists in understanding these changing conditions, but requires both a significant human time commitment and a subjective assessment of each topic area of interest. To combat these constraints, Dow's Advanced Analytics team has constructed a process to calculate sentence-level topic frequency and sentiment scoring from unstructured economic reports. Daily topic sentiment scores are aggregated to weekly and monthly intervals and used as exogenous variables to model external economic time series data. These models serve both to validate our sentiment scoring process against these series and as near-term forecasts where daily or weekly variables are unavailable. This paper will first describe our process of using SAS® Text Miner to import and discover economic topics and sentiment from unstructured economic reports. The ne...

Linking Twitter Events With Stock Market Jitters

arXiv (Cornell University), 2017

Predicting investors' reactions to financial and political news is important for the early detection of stock market jitters. Evidence from several recent studies suggests that online social media could improve prediction of stock market movements. However, utilizing such information to predict strong stock market fluctuations has not been explored so far. In this work, we propose a novel event detection method on Twitter, tailored to detect financial and political events that influence a specific stock market. The proposed approach applies a bursty topic detection method on a stream of tweets related to finance or politics, followed by a classification process which filters out events that do not influence the examined stock market. We train our classifier to recognise real events by using solely information about stock market volatility, without the need for manual labeling. We model Twitter events as feature vectors that encompass a rich variety of information, such as the geographical distribution of tweets, their polarity, information about their authors as well as information about bursty words associated with the event. We show that utilizing only information about tweet polarity, like most previous studies, results in wasting important information. We apply the proposed method to high-frequency intra-day data from the Greek and Spanish stock markets and we show that our financial event detector successfully predicts most of the stock market jitters.
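The bursty-word step mentioned above can be illustrated with a generic z-score heuristic. This is a sketch under assumptions, not the detection method used in the paper: the function name `bursty_terms`, the `threshold` and `min_count` parameters, and the floor of 1.0 on the standard deviation are all illustrative choices.

```python
from collections import Counter
from statistics import mean, pstdev


def bursty_terms(history, current, threshold=3.0, min_count=3):
    """Return terms whose count in `current` is unusually high versus `history`.

    history: list of token lists, one per past time window.
    current: token list for the window being tested.
    A term is flagged when (count - mean) / std exceeds `threshold`; std is
    floored at 1.0 so that terms with a flat history are not divided by zero.
    """
    past_counts = [Counter(window) for window in history]
    flagged = {}
    for term, count in Counter(current).items():
        if count < min_count:
            continue  # ignore rare terms to reduce noise
        series = [c[term] for c in past_counts]  # Counter returns 0 for unseen terms
        mu = mean(series) if series else 0.0
        sigma = max(pstdev(series), 1.0) if series else 1.0
        score = (count - mu) / sigma
        if score > threshold:
            flagged[term] = score
    return flagged


if __name__ == "__main__":
    # Hypothetical tokenised tweet windows.
    past = [["bank", "rates"], ["bank", "bank", "vote"], ["bank", "rates"]]
    now = ["referendum"] * 6 + ["bank"] * 2 + ["rates"]
    print(bursty_terms(past, now))
```

In a pipeline like the one described, windows flagged this way would only be candidates; the subsequent classifier decides which of them actually matter for the examined market.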