Quantifying the semantics of search behavior before stock market moves - PubMed

Chester Curme et al. Proc Natl Acad Sci U S A. 2014.

Abstract

Technology is becoming deeply interwoven into the fabric of society. The Internet has become a central source of information for many people when making day-to-day decisions. Here, we present a method to mine the vast data Internet users create when searching for information online, to identify topics of interest before stock market moves. In an analysis of historic data from 2004 until 2012, we draw on records from the search engine Google and online encyclopedia Wikipedia as well as judgments from the service Amazon Mechanical Turk. We find evidence of links between Internet searches relating to politics or business and subsequent stock market moves. In particular, we find that an increase in search volume for these topics tends to precede stock market falls. We suggest that extensions of these analyses could offer insight into large-scale information flow before a range of real-world events.

Keywords: complex systems; computational social science; data science; financial markets; online data.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.

Google Trends-based trading strategies for 55 different semantic topics. (A) For each topic, we depict the distribution of cumulative returns from 30 trading strategies, each based on search volume data for one term belonging to the topic. Strategies trade weekly on the SPXT from 2004 to 2012, using Δ_t_ = 3 wk. We show in the top row the distribution of cumulative returns for a random strategy. The mean percentage returns for each topic appear in the left column. We compare the cumulative returns for search volume-based strategies to the distribution of cumulative returns from the random strategy using two-sample Wilcoxon rank sum tests, with FDR correction for multiple comparisons among a range of topics and values of the parameter Δ_t_. We find that strategies based on keywords in the categories Politics I (W = 20,713, P = 0.01) and Business (W = 19,919, P = 0.04), shown in red, lead to higher cumulative returns than the random strategy. (B) Colored cells denote values of Δ_t_ for which the cumulative returns for a semantic topic are significantly higher than those of a random strategy (P < 0.05). Terms within the categories Business, Politics I, and Politics II result in significant returns across a range of values of Δ_t_. (C and D) Same as A and B, but using shuffled search volumes and finding no significant “topics.”
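The comparison described in this caption (a two-sample Wilcoxon rank sum test per topic, followed by FDR correction across many tests) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not say which FDR procedure was used, so Benjamini-Hochberg is assumed here; the p-value uses the normal approximation without a tie correction; and the function names `rank_sum_test` and `benjamini_hochberg` are hypothetical. Reporting conventions for W also differ between tools (some report the rank sum minus n1(n1 + 1)/2), so values from this sketch need not match those quoted in the caption.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sample Wilcoxon rank sum test (normal approximation).

    Returns (W, p): W is the sum of the ranks of x in the pooled sample,
    p is a two-sided p-value. No tie correction is applied to the
    variance, so p is only approximate when ties are present.
    """
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # average rank shared by tied values
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = mid_rank
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return w, erfc(abs(z) / sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one would call `rank_sum_test(topic_returns, random_returns)` once per topic and per value of Δ_t_, collect the resulting p-values in a list, and pass that list to `benjamini_hochberg` before comparing against the 0.05 threshold.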

Fig. 2.

Effect of changing time window on returns. For the Business, Politics I, and Politics II topics, we depict the distribution of cumulative returns from the corresponding trading strategies in six overlapping 4-yr time windows. Distributions are plotted using a kernel density estimate, with a Gaussian kernel and bandwidth calculated with Silverman’s rule of thumb (42). Strategies trade weekly on the SPXT, using Δ_t_ = 3 wk. The distribution of cumulative returns for a random strategy is also shown in each time window. The mean percentage return R̄ for each topic is provided on the right of the figure. We compare the cumulative returns for search volume-based strategies to the distribution of cumulative returns from the random strategy using two-sample Wilcoxon rank sum tests, with FDR correction for multiple comparisons. Terms in the Politics I category result in significant returns (all _W_s ≥ 18,839, all _P_s < 0.05 after FDR correction) for all time windows, with the exception of 2009–2012 and 2010–2013. Terms relating to Business result in significant returns for the periods 2004–2007, 2006–2009, 2007–2010, and 2008–2011 (all _W_s ≥ 18,511, all _P_s < 0.05 after FDR correction). Finally, terms in the Politics II category result in significant returns for the periods 2005–2008, 2006–2009, 2007–2010, and 2008–2011 (all _W_s ≥ 19,196, all _P_s < 0.05 after FDR correction).
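The density curves mentioned in this caption rely on a Gaussian kernel density estimate with a rule-of-thumb bandwidth. A minimal sketch follows, with the assumptions stated up front: the caption does not say which variant of Silverman's rule was applied, so the common form h = 0.9 · min(σ, IQR/1.34) · n^(−1/5) is assumed (another widely used variant is 1.06 · σ · n^(−1/5)); the function names `silverman_bandwidth` and `gaussian_kde` are hypothetical; and the quartiles come from Python's default `statistics.quantiles` method, which may differ slightly from other software.

```python
from math import exp, pi, sqrt
from statistics import quantiles, stdev


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(data)
    sd = stdev(data)
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    # Fall back to the standard deviation if the IQR collapses to zero.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-0.2)


def gaussian_kde(data, bandwidth=None):
    """Return a function that evaluates a Gaussian KDE of `data` at x."""
    h = silverman_bandwidth(data) if bandwidth is None else bandwidth
    norm = 1.0 / (len(data) * h * sqrt(2 * pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each observation.
        return norm * sum(exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density
```

Calling `gaussian_kde(returns)` on a list of cumulative returns gives a callable that can be evaluated on a grid of points to draw one smoothed distribution per topic and time window.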

References

    1. Shleifer A. Inefficient Markets: An Introduction to Behavioral Finance. Oxford: Oxford Univ Press; 2000.
    2. Lillo F, Farmer JD, Mantegna RN. Econophysics: Master curve for price-impact function. Nature. 2003;421(6919):129–130.
    3. Gabaix X. Power laws in economics and finance. Annu Rev Econ. 2009;1:255–293.
    4. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. A theory of power-law distributions in financial market fluctuations. Nature. 2003;423(6937):267–270.
    5. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. Institutional investors and stock market volatility. Q J Econ. 2006;121(2):461–504.
