Reinforcement learning (RL) is a powerful framework for solving sequential decision-making problems and has enjoyed tremendous success in applications such as playing the game of Go and controlling nuclear fusion reactors.

In this course, we will dive into some theoretical topics of RL. You are expected to be comfortable reading and writing proofs involving linear algebra and probability. You are NOT expected to know RL beforehand; we will make sure you can catch up even without prior RL experience.

After this course, you will be able to follow most RL papers with ease and will be well prepared to do research in RL.


  • Instructor: Shangtong Zhang
  • Location: Mechanical Engineering Building 339
  • Time: Tuesday & Thursday, 9:30 - 10:45
  • TA: Chijung Jung
  • Office Hour:
    • Shangtong: Tuesday & Thursday 11:00 - 12:00 (422 Rice Hall)
    • Chijung: Tuesday 13:00 - 15:00 (Zoom by default, or in-person by request)
  • UVACollab: RL-CS6501-F22


  • properties of Markov chains
  • Markov decision processes and performance metrics
  • dynamic programming
  • temporal difference learning
  • stochastic approximation
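
As a small taste of the first topic, the sketch below computes the stationary distribution of a Markov chain with NumPy. The three-state transition matrix is made up purely for illustration; any ergodic chain would do.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

# The stationary distribution pi satisfies pi P = pi with entries summing to 1,
# i.e., pi is a left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()  # normalize to a probability distribution

print(pi)       # the stationary distribution
print(pi @ P)   # matches pi, confirming stationarity
```

Verifying that `pi @ P` reproduces `pi` is exactly the fixed-point property that later topics (dynamic programming, stochastic approximation) generalize.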


  • Reading assignments (5% x 6 = 30%)
  • A Research or Non-research Course Project (70%):
    • Project proposal (5%)
    • Progress report (5%)
    • Write-up (30%)
    • Presentation (30%): The length of the presentation will be decided later depending on the number of groups but is at most 35 minutes including questions.

Before the due date of the first reading assignment, you may choose to redistribute the 30% from the reading assignments to the course project. Your grade would then consist of reading assignments (0%), write-up (45%), and presentation (45%). Consequently, I will hold your project to a higher standard. Though the choice is entirely yours, I recommend considering it only if you are already familiar with Sutton and Barto’s book. Please email me for confirmation if you want to do so.

All submissions must be PDF files generated by LaTeX. Here are some tips for LaTeX. Here are some tips for writing.


Date Comments
08/23 introduction.pdf
08/25 markov_chains.pdf (Revised on Sept 6)
09/08 markov_decision_process.pdf (Revised on Sept 26)
09/13 1st reading assignment due
09/20 2nd reading assignment due; project proposal due
09/27 dynamic_programming.pdf (Revised on Oct 6); 3rd reading assignment due
10/04 No class, reading days
10/06 4th reading assignment due
10/11 temporal_difference_learning.pdf (Revised on Oct 18)
10/13 5th reading assignment due
10/20 6th reading assignment due
10/25 stochastic_approximation.pdf (Revised on Oct 31)
10/27 progress report due
11/08 No class, election day
11/10 Project Presentation
11/15 Project Presentation
11/17 Project Presentation
11/22 Project Presentation
11/24 No class, Thanksgiving recess
11/29 No class, NeurIPS
12/01 No class, NeurIPS
12/06 Project Presentation
12/16 Write-up due

No late submissions are allowed except for medical needs and reasonable career development needs. See all policies here.