Date Lecture Readings Logistics
M 01/13 Lecture #1 :
Introduction to Reinforcement and Representation Learning
[ slides ]

W 01/15 Lecture #2 :
Multi-armed Bandits
[ slides ]

F 01/17 Recitation #1:
Neural Nets, PyTorch, OpenAI Gym, Bandits
[ slides ]

M 01/20 No Class, MLK Jr Day

W 01/22 Lecture #3 :
Value-based Methods
[ slides ]

HW1 out (tentative)

F 01/24 Recitation #2:
Bandits, MDPs & HW1
[ slides ]

M 01/27 Lecture #4 :
Value-based Methods (cont.)
[ slides ]

W 01/29 Lecture #5 :
Evolutionary Methods for Policy Search, Policy Gradients
[ slides ]

F 01/31 No Recitation

M 02/03 Lecture #6 :
Actor-Critic Methods
[ slides ]

W 02/05 Lecture #7 :
Trust Region Constraints
[ slides ]

F 02/07 Recitation #3:
MCTS, TD Learning, Deep Q Learning, HW2 (DQN)
[ slides ]

M 02/10 Lecture #8 :
Maximum Entropy RL, Distributional RL, Deterministic Policy Gradients
[ slides ]

HW1 due 11:59PM, HW2 out (tentative)

W 02/12 Lecture #9 :
Imitation Learning I: Behavioral Cloning, Online Imitation Learning
[ slides ]

F 02/14 Recitation #4:
Quiz 1 Review
[ slides ]

M 02/17 Lecture #10 :
Imitation Learning II: Multimodal Policies, Multitask Policies
[ slides ]

W 02/19 Lecture #11 :
Visual Imitation
[ slides ]

F 02/21 Quiz 1

M 02/24 Lecture #12 :
Model Based RL I: Shooting / Collocation methods, Dyna
[ slides ]

W 02/26 Lecture #13 :
Model Based RL II: Online planning, MCTS, AlphaGo, AlphaZero
[ slides ]

HW2 due 11:59PM

F 02/28 Recitation #5:
Solutions to Quiz 1
[ slides ]

M 03/03 Spring Break - No Classes

W 03/05 Spring Break - No Classes

F 03/07 Spring Break - No Classes

M 03/10 Lecture #14 :
Model Based RL III: diffusion models and advanced topics
[ slides ]

HW3 out (tentative)

W 03/12 Lecture #15 :
Summary so far - BUFFER
[ slides ]

F 03/14 Recitation #6:
Diffusion policies (cont.), HW3 Recitation
[ slides ]

M 03/17 Lecture #16 :
Exploration 1: Optimism in the face of uncertainty, count-based exploration, intrinsic motivation, RND, Go-Explore
[ slides ]

W 03/19 Lecture #17 :
Exploration 2: curiosity, maximum entropy exploration, bootstrapped ensembles
[ slides ]

F 03/21 Recitation #7:
MBRL in explicit and observable low-dimensional state spaces
[ slides ]

M 03/24 Lecture #18 :
Offline RL 1: going beyond imitation, problem statement, challenges in doing offline RL, policy gradient methods / policy constraints
[ slides ]

HW4 out (tentative)

W 03/26 Lecture #19 :
Offline RL 2: conservative methods, model-based approaches, modern model-free algorithms
[ slides ]

F 03/28 Quiz 2

M 03/31 Lecture #20 :
RL for POMDPs / meta-RL: Belief-space RL, how POMDP solutions differ from MDP solutions, RL^2, information-gathering actions vs optimal actions
[ slides ]

W 04/02 Lecture #21 :
Sim2real: history conditioning and domain randomization for robot locomotion, drone flying, and robot acrobatics (all use PPO)
[ slides ]

F 04/04 No Class, Spring Carnival

M 04/07 Lecture #22 :
Foundation Models for RL
[ slides ]

W 04/09 Lecture #23 :
RL for Training LLMs
[ slides ]

F 04/11 Recitation #8:
Homework 5
[ slides ]

M 04/14 Lecture #24 :
RL for Training LLMs
[ slides ]

HW5 out (tentative), HW4 due 11:59PM

W 04/16 Lecture #25 :

[ slides ]

F 04/18 Recitation #9:
No Recitation
[ slides ]

M 04/21 Lecture #26 :

[ slides ]

W 04/23 Lecture #27 :

[ slides ]

HW5 due 11:59PM

F 04/25 Recitation #10:
Quiz 3 Review
[ slides ]