Date Lecture Readings Logistics
M 08/30 Lecture #1 (Katerina & Ruslan):
Introduction to Reinforcement and Representation Learning
[ slides | video ]

W 09/01 Lecture #2 (Ruslan):
Multi-armed Bandits
[ slides | video ]

F 09/03 Recitation #1:
Neural Nets, TensorFlow & Keras, OpenAI Gym, Bandits
[ slides | slides 2 | video | notes | notes 2 ]

M 09/06 Lecture #3:
Labor Day - No Classes
[ slides ]

W 09/08 Lecture #4 (Ruslan):
Markov Decision Processes, Value Iteration, Policy Iteration
[ slides | video ]

HW1 out (tentative)

F 09/10 Recitation #2:
PyTorch, Training Resources & HW1
[ slides | video | notes ]

M 09/13 Lecture #5 (Ruslan):
Monte Carlo Learning and Temporal Difference Learning
[ slides | video ]

W 09/15 Lecture #6 (Ruslan):
Monte Carlo Learning and Temporal Difference Learning (Cont.)
[ slides | video ]

F 09/17 Recitation #3:
HW1
[ slides | video ]

M 09/20 Lecture #7 (Katerina):
Function approximation in prediction and control, Deep Q-learning, Deep SARSA
[ slides | video ]

W 09/22 Lecture #8 (Katerina):
Planning, Monte Carlo Tree Search
[ slides | video ]

F 09/24 Recitation #4:
MCTS, TD Learning, Deep Q-Learning
[ slides | video ]

M 09/27 Lecture #9 (Ruslan):
Policy gradients, REINFORCE, Actor-Critic methods
[ slides | video ]

HW1 due 09/27 11:59pm

W 09/29 Lecture #10 (Ruslan):
Policy gradients, REINFORCE, Actor-Critic methods (cont.)
[ slides | video ]

HW2 out (tentative)

F 10/01 Recitation #5:
Quiz 1 Review
[ slides | video ]

M 10/04 Lecture #11 (Katerina):
Natural PG, PPO, TRPO
[ slides | video ]

W 10/06 Lecture #12 (Katerina and Ben):
Maximum Entropy RL, Soft Actor-Critic, Deterministic Policy Gradient, Reparameterized PG
[ slides | slides 2 | video ]

F 10/08 Quiz 1 [covering everything through Lecture 10, Wednesday, September 29]

M 10/11 Lecture #13 (Katerina):
Evolutionary methods for policy search
[ slides | video ]

W 10/13 Lecture #14 (Ruslan):
Imitation learning, behavior cloning
[ slides | video ]

F 10/15 Mid-semester break - No classes

M 10/18 Lecture #14 (Katerina):
Imitation learning (cont.), Adversarial imitation learning
[ slides | video ]

HW3 out (tentative), HW2 due 10/18 11:59pm

W 10/20 Lecture #15 (Katerina):
Model-based RL, low-dimensional models, explicit models
[ slides | video ]

F 10/22 Recitation #7:
Homework 3
[ slides | video ]

M 10/25 Lecture #16 (Katerina):
MBRL (cont.), AlphaGo, AlphaGo Zero, MuZero
[ slides | video ]

W 10/27 Lecture #17 (Katerina):
MBRL (cont.), holistic and graph-based world models
[ slides | video ]

F 10/29 Recitation #8:
Quiz 2 Review
[ slides | video ]

M 11/01 Lecture #18 (Katerina):
MBRL (cont.), time-dependent linear models, iLQR
[ slides | video ]

HW3 due 11/01 11:59pm

W 11/03 Lecture #19 (Katerina):
MBRL (cont.), stochastic world models
[ slides | slides 2 | video ]

F 11/05 No Classes - Day for Community Engagement

M 11/08 Quiz 2 [covering everything from Lecture 10 through Lecture 19, Wednesday, November 3]

Pass/Fail Grade Option Deadline

W 11/10 Lecture #20 (Ruslan):
Offline RL
[ slides | slides 2 | video ]

F 11/12 Lecture #21 (Katerina):
Intelligent Exploration
[ slides | video ]

M 11/15 Lecture #22 (Katerina):
Deep exploration (cont.) and Sim2Real transfer
[ slides | video ]

W 11/17 Lecture #23 (Katerina):
Sim2Real transfer (cont.) and Visual Imitation Learning
[ slides | video ]

F 11/19 Recitation #9:
Homework 4
[ slides ]

M 11/22 Lecture #24 (Daniel Seita, https://www.cs.cmu.edu/~dseita/):
Guest lecture: Visual Imitation Learning (cont.) and vision-based manipulation with Transporters
[ slides ]

W 11/24 Thanksgiving Break - No classes

F 11/26 Thanksgiving Break - No classes

M 11/29 Lecture #25 (Ruslan):
Deep RL for Navigation
[ slides ]

HW4 due 11/29 11:59pm

W 12/01 Lecture #26 (Ruslan):
Efficient Distributed RL
[ slides ]

F 12/03 Recitation #10:
Quiz 3 Review
[ slides ]

Tu 12/07 Quiz 3 (1-4pm)