Multi-Armed Bandits with Delayed and Aggregated Rewards

Report No. ARL-TR-8754
Authors: Jacob Tyo, Ojash Neopane, Jonathon Byrd, Chirag Gupta, Conor Igoe
Date/Pages: August 2019; 20 pages
Abstract: We study the canonical multi-armed bandit problem under delayed feedback. Recently proposed algorithms have desirable regret bounds in the delayed-feedback setting but require strict prior knowledge of expected delays. In this work, we study the regret of such delay-resilient algorithms under milder assumptions on delay distributions. We experimentally investigate known theoretical performance bounds and attempt to improve on a recently proposed algorithm by making looser assumptions on prior delay knowledge. Further, we investigate how delay assumptions affect the decision to mark an arm as suboptimal.
Distribution: Approved for public release

Last Update / Reviewed: August 1, 2019