Contextual Bandits: Approximate Linear Bayes for Large Contexts

Abstract

Contextual bandits, and informed decision making in general, can be studied in the stochastic/statistical setting through the conditional probability paradigm, where Bayes' theorem plays a central role. However, when informed decisions must be made over very large contexts, or when the relevant information is spread across many variables with long observation histories and decision time is critical, the exact computation of Bayes' rule to produce the best decision given the available information becomes prohibitively expensive. In this increasingly common setting, derivations of and approximations to conditional probability and Bayes' rule will progressively gain applicability. This article presents an algorithm able to handle large contextual information, in the form of binary features, for optimal decision making in contextual bandits. The algorithm is analyzed with respect to its scalability: the time required to select the best action and the time required to update its policy. Last but not least, we address the exploration-exploitation issue, explaining how, despite the incomputability of an optimal tradeoff, the proposed algorithm "naturally" balances exploration and exploitation by using common sense.
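The abstract does not reproduce the paper's algorithm, but a minimal sketch can illustrate the kind of approach it describes. The sketch below assumes a per-arm Bayesian linear reward model with a diagonal (independent-feature) posterior approximation and Thompson sampling for action selection; the class name, the diagonal approximation, and all parameters are our illustrative assumptions, not the authors' stated method. With sparse binary contexts, both selection and update touch only the active features, which is what scaling to large contexts demands.

import numpy as np

class DiagonalLinearTS:
    """Per-arm Bayesian linear model with a diagonal Gaussian posterior.

    Keeping the posterior covariance diagonal makes both action selection
    and the policy update run in time proportional to the number of
    *active* binary features, rather than to the full context dimension.
    (Illustrative sketch; not the paper's exact algorithm.)
    """

    def __init__(self, n_arms, n_features, prior_var=1.0, noise_var=1.0):
        self.mean = np.zeros((n_arms, n_features))           # posterior means
        self.var = np.full((n_arms, n_features), prior_var)  # posterior variances
        self.noise_var = noise_var

    def select(self, active):
        """Thompson sampling over the active binary features.

        `active` lists the indices of the features equal to 1; an arm's
        predicted reward is the sum of its sampled weights on those features.
        """
        sampled = (self.mean[:, active]
                   + np.sqrt(self.var[:, active])
                   * np.random.randn(self.mean.shape[0], len(active)))
        return int(np.argmax(sampled.sum(axis=1)))

    def update(self, arm, active, reward):
        """Assumed-density (diagonal) Gaussian update for the chosen arm.

        An exact Bayesian update would couple the active features; here each
        feature absorbs the shared residual independently, trading exactness
        for an O(|active|) update.
        """
        residual = reward - self.mean[arm, active].sum()
        for j in active:
            precision = 1.0 / self.var[arm, j] + 1.0 / self.noise_var
            self.var[arm, j] = 1.0 / precision
            self.mean[arm, j] += self.var[arm, j] * residual / self.noise_var

Posterior sampling of this kind also echoes the abstract's closing claim: exploration and exploitation are balanced "naturally," since arms whose weights remain uncertain are sampled more optimistically and thus get tried, while well-estimated arms are chosen on their means.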
