Reinforcement learning (RL) is a powerful framework for solving sequential decision-making problems and has enjoyed tremendous success in applications such as playing the game of Go and controlling nuclear fusion reactors.

In this course, we will dive into some theoretical topics of RL. You are expected to be comfortable reading and writing proofs involving linear algebra and probability. You are NOT expected to know RL beforehand; we will make sure you can catch up even without prior RL experience.

After this course, you will be able to follow most RL papers with ease and will be well prepared to do research in RL.


  • Instructor: Shangtong Zhang
  • Location: Mechanical Engineering Building 339
  • Time: Tuesday & Thursday, 9:30 - 10:45
  • TA: Chijung Jung
  • Office Hour:
    • Shangtong: Tuesday & Thursday 11:00 - 12:00 (422 Rice Hall)
    • Chijung: Tuesday 13:00 - 15:00 (Zoom by default, or in-person by request)
  • UVACollab: RL-CS6501-F22


  • properties of Markov chains
  • Markov decision processes and performance metrics
  • dynamic programming
  • temporal difference learning
  • stochastic approximation
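
As a small taste of the first topic, the sketch below computes the stationary distribution of a Markov chain with NumPy. The three-state transition matrix is made up purely for illustration; any ergodic chain would do.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# The stationary distribution pi satisfies pi P = pi with entries summing to 1,
# i.e., pi is a left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()  # normalize to a probability distribution

print(pi)       # the stationary distribution
print(pi @ P)   # matches pi, confirming stationarity
```

Verifying that `pi @ P` reproduces `pi` is exactly the fixed-point property that later topics (dynamic programming, stochastic approximation) generalize.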


  • Reading assignments (5% x 6 = 30%)
  • A Research or Non-research Course Project (70%):
    • Project proposal (5%)
    • Progress report (5%)
    • Write-up (30%)
    • Presentation (30%): The length of the presentation will be decided later depending on the number of groups but is at most 35 minutes including questions.

Before the due date of the first reading assignment, you may choose to redistribute the 30% from the reading assignments to the course project. Your grade would then consist of reading assignments (0%), write-up (45%), and presentation (45%). Consequently, I will hold your project to a higher standard. Though the choice is entirely yours, I recommend considering it only if you are already familiar with Sutton and Barto’s book. Please email me for confirmation if you want to do so.

All submissions must be PDF files generated by LaTeX. Here are some tips for LaTeX. Here are some tips for writing.


Date Comments
08/23 introduction.pdf
08/25 markov_chains.pdf (Revised on Sept 6)
09/08 markov_decision_process.pdf (Revised on Sept 26)
09/13 1st reading assignment due
09/20 2nd reading assignment due; project proposal due
09/27 dynamic_programming.pdf (Revised on Oct 6); 3rd reading assignment due
10/04 No class, reading days
10/06 4th reading assignment due
10/11 temporal_difference_learning.pdf (Revised on Oct 18)
10/13 5th reading assignment due
10/20 6th reading assignment due
10/25 stochastic_approximation.pdf (Revised on Oct 31)
10/27 progress report due
11/08 No class, election day
11/10 Project Presentation
11/15 Project Presentation
11/17 Project Presentation
11/22 Project Presentation
11/24 No class, Thanksgiving recess
11/29 No class, NeurIPS
12/01 No class, NeurIPS
12/06 Project Presentation
12/16 Write-up due

No late submissions are allowed except for medical needs and reasonable career development needs. See all policies here.