Near-Optimal Regret for Distributed Adversarial Bandits: A Black-Box Approach

Abstract: We study distributed adversarial bandits, where $N$ agents cooperate to minimize the global average loss while observing only their own local losses. We show that the minimax regret for this problem is $\tilde{\Theta}(\sqrt{(\rho^{-1/2}+K/N)T})$, where $T$ is the horizon, $K$ is the number of actions, and $\rho$ is the spectral gap of the communication matrix. Our algorithm, based on a novel black-box reduction to bandits with delayed feedback, requires agents to communicate only through gossip. It achieves an upper bound that significantly improves over the previous best bound $\tilde{O}(\rho^{-1/3}(KT)^{2/3})$ of Yi and Vojnovic (2023). We complement this result with a matching lower bound, showing that the problem's difficulty decomposes into a communication cost $\rho^{-1/4}\sqrt{T}$ and a bandit cost $\sqrt{KT/N}$. We further demonstrate the versatility of our approach by deriving first-order and best-of-both-worlds bounds in the distributed adversarial setting. Finally, we extend our framework to distributed linear bandits in $R^d$, obtaining a regret bound of $\tilde{O}(\sqrt{(\rho^{-1/2}+1/N)dT})$, achieved with only $O(d)$ communication cost per agent and per round via a volumetric spanner.
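
As a brief worked note (not part of the original abstract), the stated decomposition into a communication cost and a bandit cost follows from an elementary inequality. The shorthand $a$ and $b$ below is introduced here only for illustration, and logarithmic factors hidden by the tilde notation are ignored:

```latex
% Shorthand (assumed for this note): a = \rho^{-1/2} T, \quad b = KT/N, \quad a, b \ge 0.
% Since (\sqrt{a}+\sqrt{b})^2 = a + b + 2\sqrt{ab} and 0 \le 2\sqrt{ab} \le a + b:
\[
  \sqrt{a+b} \;\le\; \sqrt{a} + \sqrt{b} \;\le\; \sqrt{2}\,\sqrt{a+b}.
\]
% Substituting a and b, and using \sqrt{\rho^{-1/2} T} = \rho^{-1/4}\sqrt{T}:
\[
  \sqrt{(\rho^{-1/2} + K/N)\,T}
  \;\le\;
  \underbrace{\rho^{-1/4}\sqrt{T}}_{\text{communication cost}}
  + \underbrace{\sqrt{KT/N}}_{\text{bandit cost}}
  \;\le\;
  \sqrt{2}\,\sqrt{(\rho^{-1/2} + K/N)\,T}.
\]
```

So the single-square-root form of the minimax rate and the sum of the two costs agree up to a factor of at most $\sqrt{2}$.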

Submission history

From: Hao Qiu
[v1] Fri, 6 Feb 2026 05:53:38 UTC (51 KB)