Schedule
Date | Lecture | Readings | Logistics
---|---|---|---
M 01/13 | Lecture #1: Introduction to Reinforcement and Representation Learning [slides] | |
W 01/15 | Lecture #2: Multi-armed Bandits [slides] | |
F 01/17 | Recitation #1: Neural Nets, PyTorch, OpenAI Gym, Bandits [slides] | |
M 01/20 | No Class, MLK Jr. Day | |
W 01/22 | Lecture #3: Value-based Methods [slides] | | HW1 out (tentative)
F 01/24 | Recitation #2: Bandits, MDPs & HW1 [slides] | |
M 01/27 | Lecture #4: Value-based Methods (cont.) [slides] | |
W 01/29 | Lecture #5: Evolutionary Methods for Policy Search, Policy Gradients [slides] | |
F 01/31 | No Recitation | |||
M 02/03 | Lecture #6: Actor-Critic Methods [slides] | |
W 02/05 | Lecture #7: Trust Region Constraints [slides] | |
F 02/07 | Recitation #3: MCTS, TD Learning, Deep Q-Learning, HW2 (DQN) [slides] | |
M 02/10 | Lecture #8: Maximum Entropy RL, Distributional RL, Deterministic Policy Gradients [slides] | | HW1 due 11:59 PM; HW2 out (tentative)
W 02/12 | Lecture #9: Imitation Learning I: Behavioral Cloning, Online Imitation Learning [slides] | |
F 02/14 | Recitation #4: Quiz 1 Review [slides] | |
M 02/17 | Lecture #10: Imitation Learning II: Multimodal Policies, Multitask Policies [slides] | |
W 02/19 | Lecture #11: Visual Imitation [slides] | |
F 02/21 | Quiz 1 | |||
M 02/24 | Lecture #12: Model-Based RL I: Shooting / Collocation Methods, Dyna [slides] | |
W 02/26 | Lecture #13: Model-Based RL II: Online Planning, MCTS, AlphaGo, AlphaZero [slides] | | HW2 due 11:59 PM
F 02/28 | Recitation #5: Solutions to Quiz 1 [slides] | |
M 03/03 | Spring Break - No Classes | |||
W 03/05 | Spring Break - No Classes | |||
F 03/07 | Spring Break - No Classes | |||
M 03/10 | Lecture #14: Model-Based RL III: Diffusion Models and Advanced Topics [slides] | | HW3 out (tentative)
W 03/12 | Lecture #15: Summary So Far (buffer lecture) [slides] | |
F 03/14 | Recitation #6: Diffusion Policies (cont.), HW3 Recitation [slides] | |
M 03/17 | Lecture #16: Exploration 1: optimism in the face of uncertainty, count-based exploration, intrinsic motivation, RND, Go-Explore [slides] | |
W 03/19 | Lecture #17: Exploration 2: curiosity, maximum entropy exploration, bootstrapped ensembles [slides] | |
F 03/21 | Recitation #7: MBRL in explicit and observable low-dimensional state spaces [slides] | |
M 03/24 | Lecture #18: Offline RL 1: going beyond imitation, problem statement, challenges of offline RL, policy gradient methods / policy constraints [slides] | | HW4 out (tentative)
W 03/26 | Lecture #19: Offline RL 2: conservative methods, model-based approaches, modern model-free algorithms [slides] | |
F 03/28 | Quiz 2 | |||
M 03/31 | Lecture #20: RL for POMDPs / meta-RL: belief-space RL, how POMDP solutions differ from MDP solutions, RL², information-gathering actions vs. optimal actions [slides] | |
W 04/02 | Lecture #21: Sim2real: history conditioning + domain randomization; robot locomotion, drone flight, and robot acrobatics (all use PPO) [slides] | |
F 04/04 | No Class, Spring Carnival | |
M 04/07 | Lecture #22: Foundation Models for RL [slides] | |
W 04/09 | Lecture #23: RL for Training LLMs [slides] | |
F 04/11 | Recitation #8: Homework 5 [slides] | |
M 04/14 | Lecture #24: RL for Training LLMs [slides] | | HW5 out (tentative); HW4 due 11:59 PM
W 04/16 | Lecture #25 [slides] | |
F 04/18 | No Recitation | |
M 04/21 | Lecture #26 [slides] | |
W 04/23 | Lecture #27 [slides] | | HW5 due 11:59 PM
F 04/25 | Recitation #10: Quiz 3 Review [slides] | |