Expanding on Repeated Consumer Search Using Multi-Armed Bandits and Secretaries

Abstract

We take a different approach to deriving the optimal search policy for the repeated consumer search model of Fishman & Rob (1995), motivated chiefly by dropping the assumption of prior knowledge of the price distribution F(p) in each period. We do this by incorporating the multi-armed bandit (MAB) problem. We first modify the MAB framework to fit the setting of the repeated consumer search model and formulate the objective as a dynamic optimization problem. Then, given any sequence of exploration, we assign a value to each store in that sequence using Bellman equations. Next, we break the problem down into an individual optimal stopping problem for each period; each of these coincides with the framework of the secretary problem, for which we derive the optimal stopping policy. Finally, we show that implementing the optimal stopping policy in each period solves the original dynamic optimization problem by ‘forward induction’ reasoning.
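
For orientation, the per-period stopping problem mentioned in the abstract can be illustrated with the textbook secretary rule, adapted here to a consumer looking for a low price. This is only a sketch of the classical 1/e rule, not the policy derived in the paper. The function names, the uniform price draws, and the choice of 50 stores are all illustrative assumptions.

```python
import math
import random


def secretary_stop(prices):
    """Classical 1/e stopping rule, adapted to searching for a low price.

    Assumption: stores are visited in the given order and a passed-over
    store cannot be recalled. The first floor(n / e) stores are only
    observed; the consumer then buys at the first later store whose price
    beats every price seen so far, or at the last store if none does.
    Returns the index of the store where the consumer buys.
    """
    n = len(prices)
    if n == 0:
        raise ValueError("need at least one store")
    cutoff = int(n / math.e)  # length of the observation-only phase
    best_seen = min(prices[:cutoff], default=math.inf)
    for i in range(cutoff, n):
        if prices[i] < best_seen:
            return i
    return n - 1  # forced to buy at the final store


def success_rate(n_stores=50, trials=20000, seed=0):
    """Estimate how often the rule buys at the single lowest price.

    Prices are drawn i.i.d. uniform on [0, 1] purely for illustration;
    the rule itself never uses the distribution, only relative ranks.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        prices = [rng.random() for _ in range(n_stores)]
        chosen = secretary_stop(prices)
        hits += prices[chosen] == min(prices)
    return hits / trials
```

Calling `success_rate()` estimates the probability that the rule picks the lowest price. For the classical problem this probability approaches 1/e (about 0.37) as the number of stores grows. The rule needs no knowledge of F(p), which is the property the abstract emphasizes.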

References (11)

Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32 (1), 48-77.
Beckmann, M. (1990). Dynamic programming and the secretary problem. Computers & Mathematics with Applications, 19 (11), 25-28.
Derman, C. (1970). Finite state Markovian decision processes (Tech. Rep.).
Ferguson, T. S. (1989). Who solved the secretary problem? Statistical Science, 4 (3), 282-289.
Fishman, A., & Rob, R. (1995). The durability of information, market efficiency and the size of firms. International Economic Review, 19-36.
Gittins, J. (1974). A dynamic allocation index for the sequential design of experiments. Progress in Statistics, 241-266.
McCall, B. P., & McCall, J. J. (1987). A sequential study of migration and job search. Journal of Labor Economics, 5 (4, Part 1), 452-476.
Reinganum, J. F. (1979). A simple model of equilibrium price dispersion. Journal of Political Economy, 87 (4), 851-858.
Slivkins, A. (2019). Introduction to multi-armed bandits. arXiv preprint arXiv:1904.07272.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25 (3/4), 285-294.
Weng, L. (2018). The multi-armed bandit problem and its solutions. lilianweng.github.io/lil-log. Retrieved from http://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutio