B.S. 1967 University of Science and Technology of China
M.S. 1981 and Ph.D. 1984 Harvard University
Research Fellow Harvard University, 1984 to 1986
Chair Professor & Director The Research Center for Networking, The Hong Kong University of Science and Technology (HKUST)
Fellow IEEE
Member The Technical Board of IFAC
Chairman IEEE Fellow Evaluation Committee of the IEEE Control Systems Society
Editor-in-Chief Discrete Event Dynamic Systems: Theory and Applications
Associate Editor at Large IEEE Transactions on Automatic Control
Vice President Asian Control Association
The Outstanding Transactions Paper Award, IEEE Control Systems Society, 1987
The Outstanding Publication Award, the College on Simulation, Institute of Management Sciences, U.S.A., 1990
Visiting Positions Harvard University, Tsinghua University, AT&T Labs
|
|
Different approaches to learning and optimization of stochastic systems, such as perturbation analysis (PA) in control systems, Markov decision processes (MDPs) in operations research, and reinforcement learning (RL) in computer science, were developed to achieve the same goal: optimizing a system's performance by analyzing its dynamic properties. In this talk, we show that these different disciplines can be explained from a sensitivity point of view in a unified way. The fundamental elements of learning and optimization are two performance sensitivity formulas, one for performance gradients and the other for performance differences, which follow directly from the Poisson equation. These two formulas lead to many closely related results in different disciplines, including PA, MDPs, and RL. Many learning techniques and implementation methods, such as Q-learning, TD(λ), neuro-dynamic programming, PA-based gradient estimates, on-line policy iteration, potential aggregation, Lebesgue sampling, etc., fit well into this sensitivity-based framework. We also briefly introduce a new optimization approach, called event-based optimization, which was developed from this sensitivity-based view. The fundamental ideas can be illustrated clearly by a "map" of the learning and optimization world, with the two sensitivity formulas at its center.
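For concreteness, a standard form of the two sensitivity formulas for an ergodic finite Markov chain is sketched below; the notation (transition matrix \(P\), reward vector \(f\), stationary distribution \(\pi\), average reward \(\eta = \pi f\), performance potential \(g\)) is assumed here for illustration and is not taken from the abstract itself.

```latex
% Poisson equation defining the performance potential g
% (e denotes the column vector of all ones):
\begin{align}
  (I - P)\, g + \eta\, e &= f
\end{align}

% Performance-difference formula, comparing two policies
% with parameters (P, f) and (P', f'):
\begin{align}
  \eta' - \eta &= \pi' \bigl[ (P' - P)\, g + (f' - f) \bigr]
\end{align}

% Performance-gradient formula, when P and f depend
% smoothly on a parameter \theta:
\begin{align}
  \frac{d\eta}{d\theta}
    &= \pi \left[ \frac{dP}{d\theta}\, g + \frac{df}{d\theta} \right]
\end{align}
```

In this picture, the difference formula underlies policy-iteration-type methods (MDPs), while the gradient formula underlies PA-based and policy-gradient methods (RL), which is the sense in which the two formulas sit at the center of the "map."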