Game theory, distributional reinforcement learning, control and verification
Wednesday, 7 June 2023, 12.00pm to 1.30pm
Speakers: Prof. Alessandro Abate, Dr. Licio Romao, Dr. Yulong Gao and Dr. Jiarui Gan, University of Oxford
Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38.
This week, the MLG looks forward to welcoming four guest speakers from Oxford.
_Title:_ Formal Synthesis with Neural Templates
_Speaker:_ Prof. Alessandro Abate (Dept. Computer Science, Univ. of Oxford, UK)
_Abstract:_ I shall present recent work on CEGIS, a "counterexample-guided inductive synthesis" framework for sound synthesis tasks that are relevant for dynamical models, control problems, and software programs. The inductive synthesis framework comprises the interaction of two components, a learner and a verifier. The learner trains a neural template on finite samples. The verifier soundly validates the candidates trained by the learner, by means of calls to a satisfiability modulo theories (SMT) solver. Whenever a candidate is not valid, SMT-generated counterexamples are passed to the learner for further training.
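For readers new to the idea, the learner-verifier loop described in the abstract can be sketched as below. This is an illustration only and is not taken from the talk: the learner here enumerates integer coefficients of a hypothetical linear template instead of training a neural network, and the verifier exhaustively checks a small finite domain instead of calling an SMT solver. The names `spec`, `learner`, `verifier`, `cegis` and `DOMAIN` are all assumptions made for this sketch.

```python
from itertools import product

# Finite domain that the stand-in verifier checks exhaustively.
# (A real CEGIS verifier would query an SMT solver over an infinite domain.)
DOMAIN = range(-20, 21)


def spec(a, b, x):
    """Hypothetical specification: the template a*x + b must upper-bound |x|."""
    return a * x + b >= abs(x)


def learner(samples):
    """Return the first candidate (a, b) consistent with every sample seen so far."""
    for a, b in product(range(-3, 4), range(0, 31)):
        if all(spec(a, b, x) for x in samples):
            return a, b
    return None  # no candidate in the template class fits the samples


def verifier(candidate):
    """Return a counterexample from DOMAIN, or None if the candidate is valid."""
    a, b = candidate
    for x in DOMAIN:
        if not spec(a, b, x):
            return x
    return None


def cegis(max_iters=100):
    samples = [0]  # initial finite sample set
    for _ in range(max_iters):
        candidate = learner(samples)
        if candidate is None:
            return None  # synthesis failed within this template class
        counterexample = verifier(candidate)
        if counterexample is None:
            return candidate  # verified on the whole domain
        samples.append(counterexample)  # refine the sample set and retry
    return None


result = cegis()
print("verified candidate (a, b):", result)
```

Each counterexample is a point the current candidate fails on, so it is never already in the sample set; with a finite domain the loop therefore terminates.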
_Bio:_ Alessandro Abate is Professor of Verification and Control in the Department of Computer Science at the University of Oxford, where he is also Deputy Head of Department. Earlier, he did research at Stanford University and at SRI International, and was an Assistant Professor at the Delft Center for Systems and Control, TU Delft. He received an MS/PhD from the University of Padova and UC Berkeley. His research interests lie in the formal verification and control of stochastic hybrid systems, and in their applications in cyber-physical systems, particularly involving safety criticality and energy. He blends in techniques from machine learning and AI, such as Bayesian inference, reinforcement learning, and game theory.
_Title:_ Policy synthesis with guarantees
_Speaker:_ Dr. Licio Romao (Dept. Computer Science, Univ. of Oxford, UK)
_Abstract:_ In this talk, I will present two techniques to perform feedback policy synthesis with guarantees. First, I will introduce a new concept of RL robustness and show how to obtain the best robust policy within a class of sub-optimal solutions by leveraging lexicographic optimisation. The proposed notion of robustness is motivated by the fact that, at deployment, the state of the system may not be precisely known due to measurement errors. In the second part of the talk, I will present a new technique to derive abstractions of stochastic dynamical systems. Our methodology is agnostic to the probability measure that generates the noise and leads to an interval Markov Decision Process (iMDP) representation of the original dynamics; the interval transition probability contains, with high probability, the true transition probability between states of the abstraction. The probably approximately correct (PAC) guarantees of the proposed framework follow from a non-trivial connection with the scenario approach theory, a technique that has had tremendous success within the control community.
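To give a feel for what an interval transition probability is, the sketch below estimates intervals from sampled successor states of a one-dimensional system. It is an illustration only: the talk derives its bounds from the scenario approach, whereas this sketch swaps in the simpler Hoeffding inequality with a union bound. The dynamics, the noise law, the partition and the name `hoeffding_intervals` are all assumptions made for the example.

```python
import math
import random


def hoeffding_intervals(successors, regions, delta):
    """Interval estimates of P(next state in region) from i.i.d. samples.

    Uses Hoeffding's inequality with a union bound over the regions, so that
    with probability at least 1 - delta every true probability lies inside
    its interval. (The talk uses the scenario approach instead.)
    """
    n = len(successors)
    k = len(regions)
    eps = math.sqrt(math.log(2 * k / delta) / (2 * n))
    intervals = {}
    for name, (low, high) in regions.items():
        p_hat = sum(low <= s < high for s in successors) / n
        intervals[name] = (max(0.0, p_hat - eps), min(1.0, p_hat + eps))
    return intervals


# Hypothetical system x+ = 0.8 * x + w. The estimator only sees samples of w,
# never its distribution (triangular noise here, chosen to be non-Gaussian).
rng = random.Random(0)
x = 0.5
successors = [0.8 * x + rng.triangular(-0.5, 0.5, 0.0) for _ in range(2000)]

# Hypothetical partition of [-2, 2) into four abstract states.
regions = {
    "[-2,-1)": (-2.0, -1.0),
    "[-1,0)": (-1.0, 0.0),
    "[0,1)": (0.0, 1.0),
    "[1,2)": (1.0, 2.0),
}

intervals = hoeffding_intervals(successors, regions, delta=0.01)
for name, (lower, upper) in intervals.items():
    print(f"P(x+ in {name}) in [{lower:.3f}, {upper:.3f}]")
```

Repeating this for every abstract state and action would give the interval-valued transition function of an iMDP.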
_Bio:_ Licio Romao is a postdoctoral research assistant in the Department of Computer Science at the University of Oxford. He obtained his PhD in August 2021 from the Department of Engineering Science, and MSc and BSc from the University of Campinas (UNICAMP) and the Federal University of Campina Grande (UFCG), respectively. His PhD thesis was awarded the Institution of Engineering and Technology's (IET) Control and Automation Dissertation Prize 2021. His research combines techniques from formal verification, control theory, applied mathematics, and machine learning to enable the design of safer and more reliable feedback systems.
· D. Jarne, L. Romao, L. Hammond, M. Mazo Jr, A. Abate. Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives. 2023. Link: https://licioromao.com/assets/papers/JRHMA23.pdf.
· T. Badings, L. Romao, A. Abate, D. Parker, H. Poonwala, M. Stoelinga, N. Jensen. Robust Control for Dynamical Systems with Non-Gaussian Noise via Formal Abstractions. Journal of Artificial Intelligence Research. 2023. Link: https://licioromao.com/assets/papers/BRAPPSJ23.pdf.
· T. Badings, L. Romao, A. Abate, N. Jensen. Probabilities are not enough: formal controller synthesis for stochastic dynamical systems with epistemic uncertainty. AAAI Conference on Artificial Intelligence, 2023. Link: https://licioromao.com/assets/papers/BRAJ23a.pdf.
_Title:_ Policy Evaluation in Distributional LQR
_Speaker:_ Dr. Yulong Gao (Dept. Computer Science, Univ. of Oxford, UK)
_Abstract:_ Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this talk, I will discuss a special class of DRL problems that rely on a discounted linear quadratic regulator (LQR) for control, advocating for a new distributional approach to LQR, which we call distributional LQR. Specifically, we provide a closed-form expression of the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results. (https://arxiv.org/abs/2303.13657)
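The two central objects of the abstract, the random discounted return and its CVaR, can be illustrated with a small Monte Carlo experiment. This is only a sketch and does not reproduce the paper's closed-form expression or its policy gradient algorithm: it samples a finite-horizon truncation of the return for a hypothetical scalar system under a fixed linear policy, then takes the empirical mean of the worst outcomes. All system parameters and the names `sample_return` and `empirical_cvar` are assumptions.

```python
import math
import random


def sample_return(k, x0, horizon, rng, a=0.9, b=0.5, q=1.0, r=0.1, gamma=0.95):
    """One sample of the truncated discounted LQR cost under the policy u = -k * x.

    Hypothetical scalar dynamics: x+ = a * x + b * u + w, with i.i.d. uniform w.
    """
    x, total = x0, 0.0
    for t in range(horizon):
        u = -k * x
        total += (gamma ** t) * (q * x * x + r * u * u)
        x = a * x + b * u + rng.uniform(-0.5, 0.5)
    return total


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs, for 0 <= alpha < 1."""
    ordered = sorted(costs)
    tail = ordered[int(math.floor(alpha * len(ordered))):]
    return sum(tail) / len(tail)


rng = random.Random(1)
costs = [sample_return(k=1.0, x0=1.0, horizon=100, rng=rng) for _ in range(5000)]

mean_cost = sum(costs) / len(costs)
cvar = empirical_cvar(costs, alpha=0.9)
print(f"mean cost: {mean_cost:.3f}")
print(f"CVaR at level 0.9: {cvar:.3f}")
```

Because the return is a cost, the risk-averse quantity averages the upper tail, so it is never smaller than the plain mean that standard LQR would optimise.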
_Bio:_ Yulong Gao is a postdoctoral researcher at the Department of Computer Science, University of Oxford. He received a joint Ph.D. degree in Electrical Engineering in 2021 from KTH Royal Institute of Technology, Sweden, and Nanyang Technological University, Singapore. Before moving to Oxford, he was a Researcher at KTH from 2021 to 2022. He was the recipient of the VR International Postdoc Grant from the Swedish Research Council. His research interests include automatic verification, stochastic control and model predictive control with application to safety-critical systems.
_Title:_ Sequential information and mechanism design
_Speaker:_ Dr. Jiarui Gan (Dept. Computer Science, Univ. of Oxford, UK)
_Abstract:_ Many problems in game theory involve reasoning between multiple parties with asymmetric access to information. This broad class of problems leads to many research questions about information and mechanism design, with applications ranging from governance and public administration to e-commerce and financial services. In particular, there has been a recent surge of interest in exploring the more general sequential versions of these problems, where players interact over multiple time steps in a changing environment. In this talk, I will present a framework of sequential principal-agent problems that is capable of modeling a wide range of information and mechanism design problems. I will talk about our recent algorithmic results on the computation and learning of optimal decision-making in this framework.
_Bio:_ Jiarui Gan is a Departmental Lecturer at the Computer Science Department, University of Oxford, working in the Artificial Intelligence & Machine Learning research theme. Before this, he was a postdoctoral researcher at the Max Planck Institute for Software Systems, and he obtained his PhD from Oxford. Jiarui is broadly interested in algorithmic problems in game theory. His current focus is on sequential information and mechanism design problems. His recent work was selected for an Outstanding Paper Honorable Mention at the AAAI'22 conference.