Jiangsu Universities Priority Disciplines: Frontiers of Probability and Statistics Lecture Series, No. 157
Published: 2023-10-12

Speaker: Prof. Weidong Liu (刘卫东)

Title: Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

Time: Saturday, October 14, 2023, 10:10 a.m.

Venue: Academic Lecture Hall, 太阳集团官方网站入口 (Room 1506, Jingyuan Building)

Organizers: Institute of Mathematics, 太阳集团官方网站入口, Institute of Science and Technology

About the speaker:

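       Weidong Liu is a Distinguished Professor at Shanghai Jiao Tong University, a recipient of the National Science Fund for Distinguished Young Scholars, and a council member of the China Society for Industrial and Applied Mathematics. His main research interests are statistics and machine learning. He has published more than sixty papers in leading journals and conferences, including AOS, JASA, JRSSB, Biometrika, JMLR, ICML, IJCAI, and IEEE TSP. He has served as principal investigator for one project under the National Key R&D Program of China, one National Science Fund for Distinguished Young Scholars grant, and one National Science Fund for Excellent Young Scholars grant.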

Abstract:

       Reinforcement learning has recently gained prominence in modern statistics, with policy evaluation as a key component. Unlike the traditional machine learning literature on this topic, our work emphasizes statistical inference for the parameter estimates computed by reinforcement learning algorithms. Most existing analyses assume that random rewards follow standard distributions, which limits their applicability. We instead bring the concept of robust statistics into reinforcement learning, addressing outlier contamination and heavy-tailed rewards simultaneously within a unified framework. In this paper, we develop an online robust policy evaluation procedure and establish the limiting distribution of our estimator, based on its Bahadur representation. Furthermore, we develop a fully online procedure to conduct statistical inference efficiently based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments in real-world reinforcement learning settings.

