Hongseok Namkoong
arXiv:2408.04570 [cs.LG], 2024
Selected for oral presentations at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference and the Conference on Digital Experimentation
Adaptivity can significantly improve the efficiency of experimentation, but it is challenging to implement even at large online platforms with mature experimentation systems. As a result, many real-world experiments are deliberately run with large batches and only a handful of opportunities to update the sampling allocation, as a way to reduce the operational costs of experimentation.
In this work, we focus on adaptive experiments with limited adaptivity (short horizons, T < 10). Bandit algorithms designed for long-horizon settings are tailored to provide regret guarantees for each specific problem class, and we find they often underperform static A/B tests on practical instances featuring batched feedback, non-stationarity, multiple objectives and constraints, and personalization.
In response, we develop a mathematical programming framework for designing adaptive experimentation algorithms. Instead of the problem-specific research paradigm (akin to an optimization solver developed for a particular linear program), we ask the modeler to write down a flexible optimization formulation and use modern machine learning systems to (heuristically) solve for adaptive designs. Since a naive formulation of the adaptive experimentation problem as a dynamic program is intractable, we propose a batched view of the experimentation process. Rather than modeling unit-level outcomes, whose distributions are commonly unknown and lead to intractable dynamic programs with combinatorial action spaces, we model the uncertainty around the batch-level sufficient statistics needed to make allocation decisions.
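As a rough illustration of this batched view (a minimal Python sketch, not the paper's code; the function and argument names are hypothetical), the sufficient statistics for an allocation decision can be as simple as per-arm batch means and standard errors:

```python
import numpy as np

def batch_sufficient_stats(outcomes_by_arm):
    """Collapse unit-level outcomes from one batch into per-arm
    (mean, standard error) pairs -- the batch-level sufficient
    statistics that drive the next allocation decision."""
    stats = {}
    for arm, outcomes in outcomes_by_arm.items():
        y = np.asarray(outcomes, dtype=float)
        n = len(y)
        # By the CLT, the batch mean is approximately Gaussian around
        # the arm's true effect with this standard error.
        se = y.std(ddof=1) / np.sqrt(n) if n > 1 else float("inf")
        stats[arm] = (y.mean(), se)
    return stats
```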
A sequential Gaussian approximation is the main intellectual vehicle powering our mathematical programming framework. CLT-based normal approximations are universal in statistical inference, and the sequential variant we prove yields a simple optimization formulation that lends itself to modern computational tools. Through extensive empirical evaluation, we observe that even a preliminary, heuristic solution approach can provide major robustness benefits. Unlike bespoke methods (e.g., Thompson sampling variants), our mathematical programming framework provides consistent gains over static randomized controlled trials and exhibits robust performance across problem instances.
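To make the optimization angle concrete, here is a heuristic sketch (assuming PyTorch; the function, its arguments, the conjugate-Gaussian update, and the best-arm objective are our illustrative assumptions, not the paper's algorithm) of choosing the next batch's sampling proportions by gradient ascent on a Monte Carlo estimate of a Gaussian-approximated objective:

```python
import torch

def optimize_allocation(mu, se, batch_size, noise_sd,
                        steps=500, lr=0.05, n_mc=1000):
    """Hypothetical sketch: pick next-batch sampling proportions by
    gradient ascent on the expected best post-batch posterior mean,
    under Gaussian approximations of the batch-level statistics.

    mu, se   -- current per-arm posterior means / standard errors
    noise_sd -- assumed per-unit outcome noise s.d. for each arm
    """
    mu = torch.as_tensor(mu, dtype=torch.float32)
    var = torch.as_tensor(se, dtype=torch.float32) ** 2
    s2 = torch.as_tensor(noise_sd, dtype=torch.float32) ** 2
    logits = torch.zeros(len(mu), requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        p = torch.softmax(logits, dim=0)
        # Measuring arm k with batch_size * p_k units yields a Gaussian
        # "observation" of its mean with this variance.
        obs_var = s2 / (batch_size * p + 1e-8)
        # Conjugate-Gaussian posterior variance after the batch.
        post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        # Predictive s.d. of the post-batch posterior mean; the
        # reparameterized draws keep the objective differentiable in p.
        pred_sd = torch.sqrt(torch.clamp(var - post_var, min=0.0))
        z = torch.randn(n_mc, len(mu))
        post_mu = mu + pred_sd * z
        loss = -post_mu.max(dim=1).values.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.softmax(logits, dim=0).detach()
```

The reparameterized Monte Carlo objective is differentiable in the allocation, which is what lets off-the-shelf machine learning tooling (here, Adam) search over adaptive designs rather than relying on a bespoke, problem-specific algorithm.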