AUEB Stats Seminars 17/12/2020: Optimal data driven policies under constrained multi-armed bandit observations by Odysseas Kanavetas
Tue 15 Dec 2020 - 11:26
Event date:
Thursday, December 17, 2020 - 12:30
STATISTICS SEMINAR SERIES, DECEMBER 2020
Odysseas Kanavetas, Leiden University, Mathematical Institute
Optimal data driven policies under constrained multi-armed bandit observations
Google Meet link: meet.google.com/usq-firh-fhs
ABSTRACT
After a brief review of the multi-armed bandit (MAB) problem and its online machine learning applications, we present our work on the model with side constraints. The constraints represent circumstances in which bandit activations are restricted by the availability of certain resources that are replenished at a constant rate.
We consider the class of feasible uniformly fast (f-UF) convergent policies, which satisfy the constraints sample-path-wise. We first establish a necessary asymptotic lower bound for the rate of increase of the regret function (i.e., the loss due to the need to estimate unknown parameters) of f-UF policies. Then, under pertinent conditions, we establish the existence of asymptotically optimal policies by constructing a class of f-UF policies that achieve this lower bound.
We provide the explicit form of such policies for cases in which the unknown distributions are a) Normal with unknown means and known variances, b) Normal with unknown means and unknown variances, and c) arbitrary discrete distributions with finite support.
- AUEB Stats Seminars 17/12/2020: Optimal data driven policies under constrained multi-armed bandit observations by Odysseas Kanavetas
- AUEB SEMINARS - 21/10/2015: Optimal Adaptive Policies in the Multi-Armed-Bandit Problem and Extensions
- AUEB STATS SEMINARS 6/2/2020: Monitoring Compositional Data using Multivariate EWMA by Philippe Castagliola
- AUEB Stats Seminars - Schedule Sep-Dec 2020
- AUEB STATS SEMINARS 13/9/2017: Optimal portfolio and Consumption allocation under a Disappointment Aversion Type Utility Function