Data mining for unemployment rate prediction using search engine query data

Abstract

Unemployment rate prediction has become critically important because it helps governments make decisions and design policies. In previous studies, traditional univariate time series models and econometric methods for unemployment rate prediction have attracted much attention from governments, organizations, research institutes, and scholars. Recently, novel methods using search engine query data have been proposed to forecast the unemployment rate. In this paper, a data mining framework using search engine query data for unemployment rate prediction is presented. Under the framework, a set of data mining tools, including neural networks (NNs) and support vector regressions (SVRs), is developed to forecast the unemployment trend. In the proposed method, search engine query data related to employment activities are first extracted. Second, a feature selection model is proposed to reduce the dimension of the query data. Third, various NNs and SVRs are employed to model the relationship between unemployment rate data and query data, and a genetic algorithm is used to optimize the parameters and refine the features simultaneously. Fourth, an appropriate data mining method is selected as the selective predictor by cross-validation. Finally, the selective predictor with the best feature subset and proper parameters is used to forecast the unemployment trend. The empirical results show that the proposed framework clearly outperforms the traditional forecasting approaches, and that support vector regression with a radial basis function (RBF) kernel is dominant for unemployment rate prediction. These findings imply that the data mining framework is efficient for unemployment rate prediction and that it can strengthen the government’s quick response and service capability.
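
The second and fourth steps of the framework (reducing the query data by feature selection, then choosing a predictor by cross-validation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the filter is a plain Pearson-correlation ranking, the candidate models are a least-squares line and a mean baseline standing in for the NNs and SVRs, and the folds are contiguous blocks. The abstract does not specify any of these choices, and the genetic algorithm step is omitted.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, k):
    """Keep the k query series with the largest absolute correlation to the target."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]


def fit_linear(xs, ys):
    """Stand-in model: least-squares line y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_mean(xs, ys):
    """Baseline that ignores the query data and predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def cv_rmse(fit, xs, ys, folds=5):
    """Cross-validated RMSE, holding out one contiguous block per fold."""
    n = len(xs)
    squared_errors = []
    for f in range(folds):
        lo, hi = f * n // folds, (f + 1) * n // folds
        model = fit(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:])
        squared_errors += [(model(x) - y) ** 2
                           for x, y in zip(xs[lo:hi], ys[lo:hi])]
    return math.sqrt(mean(squared_errors))


# Synthetic stand-in data (hypothetical, not the paper's data set).
rng = random.Random(0)
n = 60
rate = [5.0 + 0.05 * t + rng.gauss(0, 0.1) for t in range(n)]
columns = {
    "filing unemployment": [10 * r + rng.gauss(0, 0.5) for r in rate],
    "police jobs": [rng.gauss(50, 5) for _ in range(n)],
    "unemployment office": [8 * r + rng.gauss(0, 2.0) for r in rate],
}

# Step 2: reduce the dimension of the query data.
selected = select_features(columns, rate, k=2)

# Step 4: pick the candidate with the lowest cross-validated error.
candidates = {"linear": fit_linear, "mean_baseline": fit_mean}
scores = {name: cv_rmse(fit, columns[selected[0]], rate)
          for name, fit in candidates.items()}
selective_predictor = min(scores, key=scores.get)
print(selected, selective_predictor)
```

In a fuller implementation the candidate dictionary would hold the NN and SVR variants, and the folds would probably need to respect time order so that later observations do not leak into the training data.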

Acknowledgments

This research work was partly supported by the 973 Project (Grant No. 2012CB316205), the National Natural Science Foundation of China (Grant No. 71001103), and the Beijing Natural Science Foundation (No. 9122013).

Author information

Authors and Affiliations

  1. School of Information, Renmin University of China, Beijing, 100872, China
    Wei Xu, Ziang Li & Cheng Cheng
  2. School of Economics and Management, Tsinghua University, Beijing, 100084, China
    Tingting Zheng

Corresponding author

Correspondence to Wei Xu.

Appendix: The top 100 search engine query data

No.  Key words                          No.  Key words
1    filing unemployment                51   ohio unemployment rate
2    unemployment filing for            52   unemployment ny
3    unemployment office                53   unemployment compensation
4    file for unemployment              54   unemployment in az
5    unemployment file for              55   to apply for unemployment
6    unemployment state                 56   unemployment insurance claim
7    state of unemployment              57   unemployment department of labor
8    insurance unemployment             58   department of labor unemployment
9    washington unemployment            59   labor department unemployment
10   unemployment file                  60   unemployment check
11   unemployment insurance             61   unemployment for mn
12   unemployment apply                 62   unemployment in indiana
13   department of unemployment         63   unemployment in california
14   unemployment website               64   snag a job
15   unemployment application           65   unemployment grants
16   unemployment new york              66   unemployment in pennsylvania
17   washington state unemployment      67   unemployment benefit insurance
18   wisconsin unemployment benefits    68   claim unemployment benefit
19   insurance for unemployment         69   part time unemployment
20   apply for unemployment             70   security jobs
21   unemployment claims                71   new york unemployment benefit
22   unemployment apply for             72   unemployment insurance benefit
23   apply for unemployment             73   unemployment dol
24   unemployment ca                    74   unemployment info
25   unemployment services              75   unemployment commission
26   unemployment security              76   michigan unemployment benefits
27   unemployment                       77   weekly unemployment insurance
28   to file unemployment               78   weekly unemployment benefits
29   unemployment benefits              79   nyc unemployment benefits
30   file for unemployment online       80   green jobs
31   ohio unemployment benefits         81   how to claim unemployment
32   unemployment file claims           82   unemployment rate
33   to file for unemployment           83   unemployment insurance benefits
34   unemployment benefits pa           84   unemployment weekly benefits
35   unemployment benefit               85   online unemployment application
36   nys dept labor                     86   unemployment rate ny
37   state unemployment benefit         87   jobs in usa
38   connecticut unemployment benefits  88   new york unemployment benefits
39   dept of unemployment               89   benefits for unemployment
40   nys dept of labor                  90   police jobs
41   for unemployment benefits          91   dc unemployment
42   uimn.org                           92   unemployment in kansas
43   unemployment in michigan           93   mass unemployment benefits
44   unemployment benefit claim         94   unemployment online
45   unemployment payment               95   unemployment in florida
46   unemployment in colorado           96   eligible for unemployment
47   apply for unemployment online      97   benefits of unemployment insurance
48   unemployment benefits insurance    98   unemployment eligibility
49   application for unemployment       99   construction jobs
50   benefits unemployment insurance    100  unemployment rate recession

Cite this article

Xu, W., Li, Z., Cheng, C. et al. Data mining for unemployment rate prediction using search engine query data. SOCA 7, 33–42 (2013). https://doi.org/10.1007/s11761-012-0122-2
