Log in

AMSI-ANZIAM Lecture Tour: Reinforcement Learning for Restless Bandits

  • 7 Feb 2023 3:20 PM
    Message # 13087608

    As part of the AMSI-ANZIAM lecture tour, Professor Konstantin Avrachenkov (INRIA, France) presents this specialist talk at the University of Newcastle.

    Visiting Adelaide, Melbourne, Canberra, Newcastle and Brisbane, the lecture tour runs from 13-22 February.

    Event Information: 

    Lecture: Reinforcement Learning for Restless Bandits
    Date: Monday 20 February
    Venue: The University of Newcastle (room details TBC)
    Time: 2pm AEDT
    No registration required

    Abstract: The Whittle index policy is a heuristic that has shown remarkably good performance and guaranteed asymptotic optimality when applied to the class of hard problems known as Restless Multi-Armed Bandit Problems (RMABPs). Some examples of applications of RMABPs are: machine maintenance, wireless channel scheduling, A/B testing and clinical trials, just to name a few. RMABP provides a classical example when a decision-maker needs to balance between exploration and exploitation. We present two approaches (tabular and neural network based) for learning the Whittle indices. The key feature of our approaches is the usage of two time-scales, a faster one to update the state-action Q-values, and a relatively slower one to update the Whittle indices. The neural network based approach computes the Q-values on the faster time-scale and is able to extrapolate information from one state to another, which makes the approach naturally scalable to environments with large state spaces. We present both the theoretical convergence analysis as well as illustrations by numerical examples.

    For more information, please visit

Powered by Wild Apricot Membership Software