CISE Seminar: April 5, 2019 – Mengdi Wang, Princeton University
BU Photonics Building
8 St. Mary’s Street, PHO 211
3:00pm-4:00pm
Mengdi Wang
Princeton University
State Compression and Primal-Dual Reinforcement Learning
Recent years have witnessed increasing empirical successes in reinforcement learning. However, many statistical questions about reinforcement learning are not well understood even in the most basic setting. For example, how many sample transitions are necessary and sufficient for estimating a near-optimal policy for a Markov decision problem (MDP)? In the first part, we survey recent advances on the methods and complexity for MDPs with finitely many states and actions, the most basic model for reinforcement learning. In the second part, we study the statistical state compression of general finite-state Markov processes. We propose a spectral state compression method for learning state features and aggregation structures from data. The state compression method is able to “sketch” a black-box Markov process from its empirical data, for which both minimax statistical guarantees and scalable computational tools are provided. In the third part, we propose a bilinear primal-dual π learning method for learning the optimal policy of an MDP, which utilizes given state features. The method is motivated by a saddle point formulation of the Bellman equation. Its sample complexity depends only on the number of parameters and is invariant with respect to the dimension of the problem, making high-dimensional reinforcement learning possible using “small” data.
Mengdi Wang is an assistant professor in the Department of Operations Research and Financial Engineering at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s Center for Statistics and Machine Learning. Her research focuses on data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, and the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018. She is currently serving as an associate editor for Operations Research.
Faculty Host: Yannis Paschalidis
Student Host: Noushin Mehdipour