Grand Challenge on Offline Reinforcement Learning for Bandwidth Estimation in Real Time Communications

Organized and sponsored by

Video conferencing systems have recently emerged as indispensable tools that sustain global business operations and enable accessible education, revolutionizing the way people connect, collaborate, and communicate across physical barriers and geographical divides. The quality of experience (QoE) these systems deliver to the end user depends on bandwidth estimation: the problem of estimating, over time, the variable capacity of the bottleneck link between the sender and the receiver. In real-time communication (RTC) systems, the bandwidth estimate serves as a target bit rate for the audio/video encoder and controls the send rate from the client.

Overestimating the capacity of the bottleneck link causes network congestion, as the client sends data at a higher rate than the network can handle. Congestion is characterized by increased packet delivery delays, jitter, and potential packet losses. Users then typically experience frequent resolution switches, video freezes, garbled speech, and audio/video desynchronization, to name a few symptoms. Underestimating the available bandwidth, on the other hand, causes the client to encode and transmit the audio/video streams at a lower rate than the network can handle, which leads to underutilization and degraded QoE. Accurate bandwidth estimation is therefore critical to providing the best possible QoE to users in RTC systems.

Nonetheless, bandwidth estimation faces a multitude of challenges: dynamic network paths between senders and receivers with fluctuating traffic loads; diverse wired and wireless access network technologies with distinct characteristics; different transmission protocols competing for bandwidth to carry side and cross traffic; and partial observability of the network, since only local packet statistics are available at the client side on which to base the estimate.

To improve QoE for users in RTC systems, this ACM MMSys 2024 Grand Challenge focuses on learning a fully data-driven bandwidth estimator using offline reinforcement learning, based on a real-world dataset of packet traces together with objective metrics that reflect user-perceived audio/video quality in Microsoft Teams.


Offline reinforcement learning (RL) is a variant of RL in which the agent learns from a fixed dataset of previously collected experiences, without interacting with the environment during training. The goal is to learn a policy that maximizes the expected cumulative reward based on that data. This contrasts with online RL, where the agent interacts with the environment using its updated policy and learns from the feedback it receives online.
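To make the distinction concrete, the sketch below runs Q-learning updates over a fixed batch of logged transitions, never stepping an environment. The two-state, two-action problem and its rewards are invented for illustration and are not part of the challenge dataset; real offline RL on logged data additionally has to contend with distribution shift, which methods such as conservative Q-learning address.

```python
# Toy offline Q-learning sketch (illustrative only; states, actions, and
# rewards are made up, not from the challenge dataset). A fixed list of
# (state, action, reward, next_state) transitions stands in for logged
# trajectories; the agent never interacts with an environment while training.

# Hypothetical labels: state 0 = "link underutilized", state 1 = "link congested"
DATASET = [
    # (s, a, r, s_next)
    (0, 1, 1.0, 0),   # raising the rate while underutilized is rewarded
    (0, 0, 0.0, 0),   # keeping the rate low wastes capacity
    (1, 1, -1.0, 1),  # raising the rate while congested is penalized
    (1, 0, 0.5, 0),   # backing off relieves congestion
] * 25               # repeat to mimic a larger logged dataset

def offline_q_learning(dataset, n_states=2, n_actions=2,
                       alpha=0.1, gamma=0.9, epochs=50):
    """Sweep Q-learning updates over a fixed dataset (no env interaction)."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s_next in dataset:
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q

q = offline_q_learning(DATASET)
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(greedy)  # greedy policy: raise rate when underutilized, back off when congested
```

Because every update reads only from `DATASET`, the learned policy is entirely a product of the logged behaviour, which is exactly the offline setting described above.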

In this challenge, participants are provided with a dataset of real-world trajectories from Microsoft Teams audio/video calls. Each trajectory is a sequence of high-dimensional observation vectors computed from the packet information received by the client during one audio/video call, along with the corresponding bandwidth estimates. In addition, objective signals that capture the user-perceived audio/video quality during the call are provided. The dataset is based on calls made with different bandwidth estimators (behaviour policies), including both traditional and machine learning (ML) policies. The task of the challenge is to train a policy model (a receiver-side bandwidth estimator) that maps observations (observed network statistics) to actions (bandwidth estimates) so as to improve QoE for users. To this end, participants are free to define the agent's state-action space and reward function based on the provided data, and to use offline RL techniques, such as imitation learning, conservative Q-learning, inverse reinforcement learning, and constrained policy optimization, to train a deep learning-based model for bandwidth estimation.
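As a minimal sketch of the simplest of these techniques, imitation learning, the snippet below clones a behaviour policy by regressing its logged bandwidth estimates on the observation vectors. The data, feature names, and linear form of the behaviour policy are all invented for illustration; the challenge's real observations are high-dimensional and the policy model is expected to be a deep network rather than a linear fit.

```python
import numpy as np

# Behaviour-cloning sketch (illustrative; features and data are synthetic,
# not from the challenge dataset).
rng = np.random.default_rng(0)

# Synthetic stand-in for logged trajectories: each row is an observation
# vector of hypothetical normalized network statistics,
# e.g. columns: receive rate, queuing delay, loss ratio (all made up).
n_samples, n_features = 1000, 3
obs = rng.uniform(0.0, 1.0, size=(n_samples, n_features))

# Pretend the behaviour policy was roughly linear in these features.
true_w = np.array([2.0, -1.0, -3.0])   # hypothetical Mbps per unit feature
logged_bwe = obs @ true_w + 5.0 + rng.normal(0.0, 0.05, size=n_samples)

# Imitation learning as least squares: find weights that reproduce the
# behaviour policy's actions (bandwidth estimates) from its observations.
X = np.hstack([obs, np.ones((n_samples, 1))])   # append a bias column
w_hat, *_ = np.linalg.lstsq(X, logged_bwe, rcond=None)

def policy(observation):
    """Cloned policy: observation vector -> bandwidth estimate (Mbps)."""
    return float(np.append(observation, 1.0) @ w_hat)

print(np.round(w_hat, 2))
```

A pure imitation learner can at best match the logged behaviour policies; reward-based offline RL methods such as conservative Q-learning aim to improve on them, which is the point of letting participants define their own reward from the provided quality signals.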

For more information on the challenge task, dataset, submission guidelines and requirements, evaluation criteria, and awards, please refer to the challenge website and the challenge GitHub repository.

Participants with queries related to this grand challenge can either contact Sami Khairy by email or create an issue on the GitHub repository.

Important Dates

Please refer to the important dates page.