10-703 Deep RL | Schedule

Date	Lecture	Readings	Logistics
M 08/25	Lecture #1 : Welcome and Introduction to the Class [ slides \| slides 2 ]	S & B Textbook, Ch1 (optional) Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies (optional) Dan Wolpert's talk The real reason for brains
W 08/27	Lecture #2 : Introduction to Reinforcement Learning [ slides ]	S & B Textbook, Ch1
F 08/29	Recitation #1: Neural Nets, PyTorch, Gymnasium [ slides \| notes ]	G, B & C Textbook, Ch9, Ch10 Gym tutorial notebook PyTorch tutorial notebook
M 09/01	No Class, Labor Day
W 09/03	Lecture #3 : Policy Gradient Methods [ slides ]	G, B & C Textbook, Ch13 http://karpathy.github.io/2016/05/31/rl/	HW1 Out
F 09/05	Recitation #2: MDPs, Policy Gradients, & HW1 [ slides ]
M 09/08	Lecture #4 : Actor-Critic Methods (cont.) Evolutionary Methods for Policy Search [ slides \| slides 2 ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning (optional) Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial
W 09/10	Lecture #5 : Value-based Methods [ slides \| slides 2 ]	Mnih et al. Playing Atari with Deep Reinforcement Learning Martin Riedmiller Neural Fitted Q-Iteration Fu et al. Diagnosing Bottlenecks in Deep Q-learning Algorithms
F 09/12	Recitation #3: Value-Based Methods and Actor-Critic Methods [ slides ]		HW1 Due, HW2 Out
M 09/15	Lecture #6 : Value-based Methods (Cont.) [ slides \| slides 2 ]	Deep Reinforcement Learning with Double Q-learning S & B Textbook, Ch8.11 Stop Regressing - Training Value Functions via Classification for Scalable Deep RL (optional) Policy Evaluation with Temporal Differences: A Survey and Comparison (optional) A Distributional Perspective on Reinforcement Learning
W 09/17	Lecture #7 : Advanced Policy Gradient Methods [ slides ]	Approximately Optimal Approximate RL Advantage-weighted regression Proximal Policy Optimization Algorithms
F 09/19	Recitation #4: Lecture 7b Advanced Policy Gradient Methods (Cont.) [ slides \| slides 2 ]	Approximately Optimal Approximate RL Advantage-weighted regression Proximal Policy Optimization Algorithms	HW2 Due, Video Recitation HW3
M 09/22	Lecture #8 : Advanced Policy Gradient Methods (Cont.) [ slides \| slides 2 ]	Proximal Policy Optimization Algorithms Continuous control with deep reinforcement learning Addressing Function Approximation Error in Actor-Critic Methods	HW3 Out
W 09/24	Lecture #9 : Advanced Policy Gradient Methods (Cont.) MaxEntropy RL [ slides \| slides 2 ]	Soft Actor-Critic Algorithms and Applications
F 09/26	Recitation #5: Midterm Review and HW4 [ slides \| slides 2 \| video ]
M 09/29	Lecture #10 : Imitation Learning [ slides ]	(optional) Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Bojarski et al. End to End Learning for Self-Driving Cars Andrychowicz et al. Hindsight Experience Replay Ding et al. Goal-conditioned Imitation Learning
T 09/30	Project Description Out (tentative)
W 10/01	Lecture #11 : Imitation Learning (Cont.) [ slides ]	Luo. Understanding Diffusion Models: A Unified Perspective Pearce et al. Imitating Human Behaviour with Diffusion Models	HW4 Out
F 10/03	Midterm
M 10/06	Lecture #12 : Model-based Methods [ slides ]	Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
W 10/08	Lecture #13 : Model-based Methods (Cont.) [ slides ]	(optional) Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games (optional) Kaiser et al. Model-Based Reinforcement Learning for Atari Hansen et al. Temporal Difference Learning for Model Predictive Control	HW3 Due
F 10/10	Lecture #14 : Model-based Methods (Cont.), Learning and Tree-Search Planning [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
F 10/10	Recitation #6: IL Diffusion Policies and HW5 (Video) [ slides \| video ]
S 10/12	HW4 Due, HW5 Out
M 10/13	Fall Break - No Classes
W 10/15	Fall Break - No Classes
F 10/17	Fall Break - No Classes
M 10/20	Lecture #15 : Offline RL [ slides ]	Kumar and Levine Offline RL Tutorial, NeurIPS 2020 Levine et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems Kumar et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction Fujimoto et al. A Minimalist Approach to Offline Reinforcement Learning
W 10/22	Lecture #16 : Offline RL (Cont.) [ slides ]	Kumar et al. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction Kumar et al. Reinforcement Learning from Static Datasets (Chapters 1 to 5) Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration Kumar and Levine Offline RL Tutorial, NeurIPS 2020
F 10/24	Recitation #7: Midterm Solution Session [ slides ]		Project Team Formation and Project Proposal Due (tentative)
M 10/27	Lecture #17 : Model-based Methods for Offline RL [ slides \| slides 2 ]	Julian Schrittwieser et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model Julian Schrittwieser et al. Online and Offline Reinforcement Learning by Planning with a Learned Model Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning Kidambi et al. MOReL - Model-Based Offline Reinforcement Learning	HW5 Due
W 10/29	Lecture #18 : Guided Diffusion [ slides ]	Janner et al. Planning with Diffusion for Flexible Behavior Synthesis (optional) Yang et al. Diffusion-ES Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following	HW6 Out
F 10/31	Recitation #8: Offline RL and HW6 [ slides ]
M 11/03	Lecture #19 : Exploration [ slides ]	Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Exploration by Random Network Distillation
W 11/05	Lecture #20 : Sim2Real Learning [ slides ]	Akkaya et al. Solving Rubik’s Cube with A Robot Hand Kumar et al. Rapid Motor Adaptation for Legged Robots Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
F 11/07	Recitation #9: Exploration [ slides ]
M 11/10	Lecture #21 : Sim2Real Learning (Cont.) [ slides ]	Akkaya et al. Solving Rubik’s Cube with A Robot Hand Kumar et al. Rapid Motor Adaptation for Legged Robots Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
W 11/12	Lecture #22 : RL with Foundation Models [ slides ]	Ouyang et al. Training language models to follow instructions with human feedback Slides Direct Preference Optimization: A New RLHF Approach Xu et al. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
F 11/14	Recitation #10: Sim2Real [ slides ]		HW6 Due
M 11/17	Lecture #23 : RL with Foundation Models (Cont.) [ slides ]	Ouyang et al. Training language models to follow instructions with human feedback Slides Direct Preference Optimization: A New RLHF Approach Xu et al. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
W 11/19	Lecture #24 : Foundation Models for RL [ slides ]		Project Midway Report Due
F 11/21	Recitation #11: RL with Foundation Models [ slides ]
M 11/24	Lecture #25 : Generative Models for RL [ slides ]	Luo. Understanding Diffusion Models: A Unified Perspective Pearce et al. Imitating Human Behaviour with Diffusion Models
W 11/26	Thanksgiving Break - No Classes
F 11/28	Thanksgiving Break - No Recitation
M 12/01	Lecture #26 : Goal-Driven Reasoning for Multimodal AI Agents [ slides ]	Sarch et al. Grounded Reinforcement Learning for Visual Reasoning Sarch et al. VLM Agents Generate Their Own Memories
W 12/03	Lecture #27 : Course Review [ slides ]
F 12/05	Recitation #12: Generative Models for RL [ slides ]
M 12/08	Project Final Presentation, 8:30AM-11:30AM
H 12/11	Project Final Presentation, 10:00AM-1:00PM
F 12/12	[ slides ]		Project Final Report Due