Reinforcement learning (RL) is a powerful framework for solving sequential decision-making problems and has enjoyed tremendous success, e.g., in playing the game of Go and in training ChatGPT. This course is designed to cover important ideas and milestone papers of RL.

Logistics

  • Instructor: Shangtong Zhang
  • TA: Jiuqi Wang
  • Location: Olsson Hall 011
  • Time: Tuesday, Thursday, 2:00 - 3:15
  • Office Hours:
    • Shangtong: Tuesday, Thursday, 1:30 - 2:00, Rice 422
    • Jiuqi: Friday, 10:00 - 11:00, Rice 442
  • UVACanvas
  • Prerequisite: This course requires a good understanding of probability, linear algebra, and calculus.
  • If you need approval for enrollment, please go ahead and submit the proper forms on the assumption that I will approve them. All the information I have about this course is available on this website, so please exercise your own judgment. Different schools and colleges require different forms. It is your responsibility to figure out which form to submit and where to submit it; this is another hidden prerequisite for this course. I will not be able to handle individual enrollment or waitlist permission requests manually until close to the enrollment deadline.

Teaching

Grading

  • Paper Review (60 points): You will read 12 papers / topics (one paper / topic each week starting from the ADD deadline). For each paper / topic, you need to write a summary and ask 3 questions. Each paper review is worth 5 points.
    • We will use a tool to check whether your paper review was generated by an LLM. If the tool flags your review, you will be asked to meet with me so I can ask you questions to evaluate whether you fully understand what you wrote in the paper review.
  • Project (40 points): You can do the project alone or in a team of no more than 3 people. If you have a team, all team members will receive exactly the same points for the project.
    • Milestone 1 (10 points): Identify a hypothesis you want to test. Articulate why this hypothesis is interesting and what the consequences are if the hypothesis is correct or wrong.
    • Milestone 2 (10 points): Design experiments to test your hypothesis.
    • Milestone 3 (10 points): Execute the experiments and write a report (importantly, it’s totally fine to have negative results).
    • Milestone 4 (10 points): Presentation (format TBD)
  • Bonus (5 points): Complete the course evaluation survey at the end of the semester.

Policies

  • Late Policy: Each deadline has an 8-hour grace period without any penalty. If you need an extension for career development purposes (e.g., attending a conference, preparing for an important interview), you need to email me one week before the deadline. Everyone has a single chance for a 1-week late submission without any penalty (note that this cannot be used for Project Milestones 3 and 4). If you want to use this late submission opportunity, please declare it here. No other after-the-fact extension is possible unless a doctor's note or an SDAC notification is provided.
  • Regrading Policy: For every homework, one regrading request is allowed.