10-403 Deep RL | Schedule

Date	Lecture	Readings	Logistics
M 01/12	Lecture #1 : Welcome and Introduction to the Class [ slides ]	S & B Textbook, Ch1 (optional) Smith & Gasser. The Development of Embodied Cognition: Six Lessons from Babies (optional) Dan Wolpert's talk The real reason for brains
W 01/14	Lecture #2 : Introduction to Reinforcement Learning [ slides ]	S & B Textbook, Ch1, Ch3, Ch4, 5.1-5.4, 5.7, 6.1-6.7
F 01/16	Lecture #3 : Value-based Methods [ slides ]	Mnih et al. Playing Atari with Deep Reinforcement Learning Martin Riedmiller. Neural Fitted Q-Iteration S & B Textbook, Ch3, Ch4, 5.1-5.4, 5.7, 6.1-6.7	HW1 Out
M 01/19	No Class, MLK Day
W 01/21	Recitation #1: OpenAI Gym, PyTorch and DQN Details, & HW1 [ slides ]	G, B & C Textbook, Ch9, Ch10 Gym tutorial notebook PyTorch tutorial notebook
F 01/23	Recitation #2: Setup and Debugging OH (in class) [ slides ]
M 01/26	Lecture #4 : Value-based Methods (cont.), Evolutionary Methods for Policy Search [ slides ]	Deep Reinforcement Learning with Double Q-learning S & B Textbook, Ch8.11 Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning (optional) Nikolaus Hansen. The CMA Evolution Strategy: A Tutorial
W 01/28	Lecture #5 : Policy Gradient Methods [ slides ]	G, B & C Textbook, Ch13 http://karpathy.github.io/2016/05/31/rl/	HW1 Due, HW2 Out
F 01/30	Recitation #3: Policy Gradients & HW2 [ slides ]
M 02/02	Lecture #6 : How Far Can We Update a Policy? Step Size and Stability in On-Policy and Off-Policy Policy Gradients [ slides | slides 2 ]	Proximal Policy Optimization Algorithms Mnih et al. Asynchronous Methods for Deep Reinforcement Learning
W 02/04	Lecture #7 : Step Size and Stability in Policy Gradients (cont.) [ slides | slides 2 ]	Proximal Policy Optimization Algorithms	HW2 Due, HW3 Out
F 02/06	Recitation #4: Policy-based Methods & HW3 [ slides ]
M 02/09	Lecture #8 : Actor Critic with Pathwise Derivatives [ slides ]	Continuous control with deep reinforcement learning Addressing Function Approximation Error in Actor-Critic Methods Soft Actor-Critic Algorithms and Applications
T 02/10	HW4 Out
W 02/11	Lecture #9 : Imitation Learning, Behavior Cloning [ slides ]	(optional) Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Bojarski et al. End to End Learning for Self-Driving Cars
F 02/13	Recitation #5: Midterm Review and HW4 [ slides ]
M 02/16	Lecture #10 : GAIL, Multi-goal RL and IL [ slides ]	Andrychowicz et al. Hindsight Experience Replay Ding et al. Goal-conditioned Imitation Learning	HW3 Due
W 02/18	Lecture #11 : Diffusion Models for Imitation Learning [ slides ]	Luo et al. Understanding Diffusion Models: A Unified Perspective Pearce et al. Imitating Human Behaviour with Diffusion Models
F 02/20	Midterm
M 02/23	Lecture #12 : Diffusion Models for Imitation Learning (cont.) [ slides ]	Luo et al. Understanding Diffusion Models: A Unified Perspective Pearce et al. Imitating Human Behaviour with Diffusion Models
T 02/24	HW4 Due, HW5 Out
W 02/25	Lecture #13 : Learning and Search: MCTS, AlphaGo, AlphaZero [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
F 02/27	Recitation #6: IL Diffusion Policies and HW5 [ slides ]
M 03/02	Spring Break - No Classes
W 03/04	Spring Break - No Classes
F 03/06	Spring Break - No Classes
M 03/09	Lecture #14 : MBRL (cont.) [ slides ]	Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (optional) Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games (optional) Kaiser et al. Model-Based Reinforcement Learning for Atari Hansen et al. Temporal Difference Learning for Model Predictive Control
W 03/11	Lecture #15 : MBRL (cont.) [ slides ]	Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models (optional) Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games (optional) Kaiser et al. Model-Based Reinforcement Learning for Atari Hansen et al. Temporal Difference Learning for Model Predictive Control	HW5 Due, HW6 Out
F 03/13	Recitation #7: TD-MPC / PETS & HW6 [ slides ]
M 03/16	Lecture #16 : Model-based Methods for Offline RL [ slides ]	Julian Schrittwieser et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model Janner et al. Planning with Diffusion for Flexible Behavior Synthesis (optional) Yang et al. Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
W 03/18	Lecture #17 : Guided Diffusion [ slides ]	Janner et al. Planning with Diffusion for Flexible Behavior Synthesis (optional) Yang et al. Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
F 03/20	Recitation #8: Midterm Solutions [ slides ]
M 03/23	Lecture #18 : Offline RL [ slides ]	Kostrikov et al. Offline Reinforcement Learning with Implicit Q-Learning Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning Fujimoto & Gu. A Minimalist Approach to Offline Reinforcement Learning Mandlekar et al. IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data Peng et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
W 03/25	Lecture #19 : Offline RL (cont.) [ slides ]	Kostrikov et al. Offline Reinforcement Learning with Implicit Q-Learning Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning Fujimoto & Gu. A Minimalist Approach to Offline Reinforcement Learning Mandlekar et al. IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data Peng et al. Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning	HW6 Due, HW7 Out
F 03/27	Recitation #9: Offline RL & HW7 [ slides ]
M 03/30	Lecture #20 : Intelligent Exploration [ slides ]	Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Exploration by Random Network Distillation Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems
W 04/01	Lecture #21 : Intelligent Exploration [ slides ]	Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Exploration by Random Network Distillation Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems
F 04/03	Recitation #10: Exploration [ slides ]
M 04/06	Lecture #22 : Sim2Real Learning [ slides ]	Akkaya et al. Solving Rubik’s Cube with a Robot Hand Kumar et al. Rapid Motor Adaptation for Legged Robots Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Zhang et al. Track Any Motions under Any Disturbances
W 04/08	Lecture #23 : Foundation Models for RL [ slides ]	Liang et al. Code as Policies: Language Model Programs for Embodied Control Sarch et al. VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought Ma et al. Eureka: Human-Level Reward Design via Coding Large Language Models	HW7 Due, HW8 Out
F 04/10	Recitation #11: Sim2Real & HW8 (Video Recitation) [ slides | video ]
M 04/13	Recitation #12: RL for Foundation Models [ slides ]	Rafailov et al. Direct Preference Optimization: Your Language Model is Secretly a Reward Model Ouyang et al. Training language models to follow instructions with human feedback
W 04/15	Lecture #24 : RL for Foundation Models (cont.) [ slides ]	Sarch et al. Grounded Reinforcement Learning for Visual Reasoning Black et al. Training Diffusion Models with Reinforcement Learning Ren et al. Diffusion Policy Policy Optimization
F 04/17	Recitation #13: RL with Foundation Models [ slides ]
M 04/20	Lecture #25 (Aviral Kumar): Exploration, Extrapolation, and Chains of Thought [ slides ]
W 04/22	Lecture #26 : Training Diffusion Models with RL [ slides ]	Wagenmaker et al. Steering Your Diffusion Policy with Latent Space Reinforcement Learning Ren et al. Diffusion Policy Policy Optimization Black et al. Training Diffusion Models with Reinforcement Learning Wang et al. Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning
Th 04/23	HW8 Due
F 04/24	Recitation #14: Generative Models for RL & Final Review [ slides ]	slides 2 (lectures 1-9), slides 3 (lectures 10-18), slides 4 (lectures 19-26)
F 05/01	Final Exam (5:30pm - 8:30pm)