Topics in Reinforcement Learning
Reinforcement learning (RL) is a powerful framework for solving sequential decision-making problems and has enjoyed tremendous success, e.g., in playing the game of Go and in controlling nuclear fusion plasmas.
In this course, we will dive into several theoretical topics in RL. You are expected to be comfortable reading and writing proofs involving linear algebra and probability. You are NOT expected to know RL beforehand; we will make sure you can catch up even if you have no prior exposure to RL.
After this course, you will be able to follow most RL papers easily and will be well prepared to do research in RL.
Logistics:
- Instructor: Shangtong Zhang
- Location: Mechanical Engineering Building 339
- Time: Tuesday & Thursday, 9:30 - 10:45
- TA: Chijung Jung
- Office Hours:
  - Shangtong: Tuesday & Thursday, 11:00 - 12:00 (422 Rice Hall)
  - Chijung: Tuesday, 13:00 - 15:00 (Zoom by default, or in-person by request)
- UVACollab: RL-CS6501-F22
Topics:
- properties of Markov chains
- Markov decision processes and performance metrics
- dynamic programming
- temporal difference learning
- stochastic approximation
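To give a concrete taste of the first topic, here is a minimal sketch in Python/NumPy (with a made-up 3-state transition matrix, purely for illustration and not part of any assignment) that approximates the stationary distribution of a Markov chain by power iteration; in the course we will study when and why such a limiting distribution exists.

```python
import numpy as np

# A made-up 3-state Markov chain: each row of P sums to 1 and holds
# the transition probabilities out of one state.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# Power iteration: for an irreducible, aperiodic chain, the state
# distribution mu_{t+1} = mu_t P converges to the unique stationary
# distribution mu* satisfying mu* P = mu*.
mu = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0
for _ in range(1000):
    mu = mu @ P

print(mu)           # approximate stationary distribution
print(mu @ P - mu)  # close to zero: mu is (approximately) stationary
```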
Grading:
- Reading assignments (5% × 6 = 30%)
- A research or non-research course project (70%):
- Project proposal (5%)
- Progress report (5%)
- Write-up (30%)
- Presentation (30%): The length of each presentation will be decided later, depending on the number of groups, but will be at most 35 minutes, including questions.
Before the due date of the first reading assignment, you may choose to redistribute the 30% from the reading assignments to the research project. The weights then become reading assignments (0%), write-up (45%), and presentation (45%); the proposal and progress report remain at 5% each, so the total is still 100%. Consequently, I will use a higher standard when evaluating your research project. Though it is entirely up to you, I recommend considering this option only if you are already familiar with Sutton and Barto’s book. Please email me for confirmation if you want to do so.
All submissions are expected to be PDF files generated by LaTeX. Here are some tips for LaTeX, and here are some tips for writing.
Schedule:
Date | Materials & Deadlines |
---|---|
08/23 | introduction.pdf |
08/25 | markov_chains.pdf (Revised on Sept 6) |
08/30 | |
09/01 | |
09/06 | |
09/08 | markov_decision_process.pdf (Revised on Sept 26) |
09/13 | 1st reading assignment due |
09/15 | |
09/20 | 2nd reading assignment due; project proposal due |
09/22 | |
09/27 | dynamic_programming.pdf (Revised on Oct 6); 3rd reading assignment due |
09/29 | |
10/04 | No class, reading days |
10/06 | 4th reading assignment due |
10/11 | temporal_difference_learning.pdf (Revised on Oct 18) |
10/13 | 5th reading assignment due |
10/18 | |
10/20 | 6th reading assignment due |
10/25 | stochastic_approximation.pdf (Revised on Oct 31) |
10/27 | progress report due |
11/01 | |
11/03 | |
11/08 | No class, election day |
11/10 | project presentation |
11/15 | project presentation |
11/17 | project presentation |
11/22 | project presentation |
11/24 | No class, Thanksgiving recess |
11/29 | No class, NeurIPS |
12/01 | No class, NeurIPS |
12/06 | project presentation |
12/16 | write-up due |
Resources:
Books:
- Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto
- Markov Decision Processes: Discrete Stochastic Dynamic Programming by Martin Puterman
- Stochastic Approximation: A Dynamical Systems Viewpoint by Vivek Borkar
- Neuro-Dynamic Programming by Dimitri Bertsekas and John Tsitsiklis
- Markov Chains and Mixing Times by David Asher Levin, Elizabeth Wilmer, and Yuval Peres
Theses:
- Safe Reinforcement Learning by Philip Thomas
- Breaking the Deadly Triad in Reinforcement Learning by Shangtong Zhang
- Actor-Critic Algorithms by Vijaymohan Konda
Notes:
- Introduction to discrete-time Markov chains I by Karl Sigman
- Markov chains II: recurrence and limiting (stationary) distributions by Karl Sigman
Policies:
No late submissions are allowed except for medical needs and reasonable career development needs. See all policies here.