Reinforcement learning (RL) is a powerful framework for solving sequential decision-making problems and has enjoyed tremendous success, e.g., in playing the game of Go and in training ChatGPT. This course covers basic but important ideas of RL, as well as milestone papers in deep RL, i.e., RL with deep neural networks.


  • Instructor: Shangtong Zhang
  • Location: TBA
  • Time: Tuesday & Thursday, 11:00 - 12:15
  • Office Hours:
    • TBA
  • UVACanvas: TBA
  • Prerequisite: This course will be light on math but still requires basic knowledge of probability, linear algebra, and calculus. Many assignments will be programming-based, so Python is also required.
  • If you need approval for registration, please go ahead and submit the proper forms, assuming I will approve them. All the information I have about this course is available on this website, so please exercise your judgment. Different schools and colleges require different forms. It is your responsibility to figure out which form to submit and where to submit it - this is another hidden prerequisite for this course. I will not be able to handle any individual enrollment / permission request until shortly before the enrollment deadline.


Grading (TBA):

Roadmap (TBA):


  • Late Policy: Late submission within 8 hours (a grace period) has no penalty. Beyond the grace period, late submission within 24 hours loses 33% of the score, late submission within 48 hours loses 66% of the score, and late submission after 48 hours loses the entire score. No late submission is allowed for the final project presentation and writeup. No exception will be made unless a doctor's note or an SDAC notification is provided.
  • Regrading Policy: For every assignment, one regrading request is allowed. I will regrade the entire assignment and there is no guarantee that the score will not decrease.
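
As an informal illustration only, the late policy above can be read as a piecewise function of how late a submission is. The sketch below is not an official grading tool. It assumes that lateness is measured in hours from the original deadline and that each boundary (8, 24, 48 hours) is inclusive; neither detail is stated above. The function names are hypothetical. It does not apply to the final project presentation and writeup, for which no late submission is allowed.

```python
def late_penalty_percent(hours_late: float) -> int:
    """Percentage of the score lost for a submission `hours_late` hours
    past the deadline (assumed: measured from the original deadline,
    boundaries inclusive)."""
    if hours_late <= 8:    # grace period, also covers on-time submissions
        return 0
    if hours_late <= 24:
        return 33
    if hours_late <= 48:
        return 66
    return 100             # more than 48 hours late


def adjusted_score(raw_score: float, hours_late: float) -> float:
    """Score remaining after the late penalty is applied."""
    return raw_score * (100 - late_penalty_percent(hours_late)) / 100


if __name__ == "__main__":
    # Print the adjusted score of a 100-point submission at several delays.
    for hours in (0, 8, 10, 30, 50):
        print(hours, adjusted_score(100, hours))
```

Under these assumptions, a submission 10 hours late keeps 67% of its score, and one 30 hours late keeps 34%.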