10-703 Deep RL | Schedule

Date	Lecture	Readings	Logistics
M 08/26	Lecture #1 : Introduction to Reinforcement and Representation Learning [ slides ]	S & B Textbook, Ch1 Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies Dan Wolpert's talk The real reason for brains
W 08/28	Lecture #2 : Multi-armed Bandits [ slides ]	S & B Textbook, Ch2 2.1-2.7 Russo et al. A Tutorial on Thompson Sampling, Ch1-Ch4. Optional after Ch4
F 08/30	Recitation #1: Neural Nets, TensorFlow & Keras, OpenAI Gym, Bandits [ slides ]	G, B & C Textbook, Ch9, Ch10 TensorFlow tutorial notebook OpenAI Gym tutorial notebook PyTorch tutorial notebook The TensorFlow High Level (Keras) API
M 09/02	No Class, Labor Day
W 09/04	Lecture #3 : Markov Decision Processes, Value Iteration, Policy Iteration [ slides ]	S & B Textbook, Ch3, Ch4 The Path perspective on Value Learning (blogpost)	HW1 out (tentative)
F 09/06	Recitation #2: Bandits, MDPs & HW1 [ slides ]
M 09/09	Lecture #4 : Markov Decision Processes, Value Iteration, Policy Iteration (cont.) [ slides ]	S & B Textbook, Ch3, Ch4 The Path perspective on Value Learning (blogpost)
W 09/11	Lecture #5 : Monte Carlo Learning and Temporal Difference Learning [ slides ]	S & B Textbook, Ch5, Ch6
F 09/13	No Recitation
M 09/16	Lecture #6 : Monte Carlo Learning and Temporal Difference Learning (cont.) [ slides ]	S & B Textbook, Ch5, Ch6
W 09/18	Lecture #7 : Monte Carlo Tree Search, Function approximation in prediction and control, Deep Q-learning [ slides | slides 2 ]	S & B Textbook, Ch6, Ch7 7.1-7.3 S & B Textbook, Ch8.11 S & B Textbook, Ch9.1-9.3, 9.6 and Ch10.1 Mnih et al. Playing Atari with Deep Reinforcement Learning Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
F 09/20	Recitation #3: MCTS, TD Learning, Deep Q Learning, HW2 (DQN) [ slides ]
M 09/23	Lecture #8 : Deep Q-learning (cont.) [ slides ]	S & B Textbook, Ch9.1-9.3, 9.6 and Ch10.1 Mnih et al. Playing Atari with Deep Reinforcement Learning Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning Parables on the Power of Planning in AI: From Poker to Diplomacy: Noam Brown (OpenAI)	HW1 due 11:59pm, HW2 out (tentative)
W 09/25	Lecture #9 : Policy gradients [ slides ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning S & B Textbook, Ch13 http://karpathy.github.io/2016/05/31/rl/
F 09/27	Recitation #4: Quiz 1 Review [ slides ]
M 09/30	Lecture #10 : Natural PG, PPO, TRPO [ slides | slides 2 ]	(optional) Schulman et al. Trust Region Policy Optimization Schulman et al. Proximal Policy Optimization Algorithms (optional) Rajeswaran et al. Towards Generalization and Simplicity in Continuous Control
W 10/02	Lecture #11 : Evolutionary methods for policy search [ slides ]	Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning (optional) Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial (optional) Antoine Cully et al. Robots that can adapt like animals
F 10/04	Quiz 1
M 10/07	Lecture #12 : Deterministic Policy gradient, re-parametrized PG [ slides | slides 2 ]	Lillicrap et al. Continuous control with deep reinforcement learning
W 10/09	Lecture #13 : Generative Adversarial Imitation Learning [ slides ]	(optional) Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Bojarski et al. End to End Learning for Self-Driving Cars	HW2 due 11:59PM
F 10/11	Recitation #5: Solutions to Quiz 1 [ slides ]
M 10/14	Fall Break - No Classes
W 10/16	Fall Break - No Classes
F 10/18	Fall Break - No Classes
M 10/21	Lecture #14 : Multi-goal RL and Imitation learning, Visual Imitation Learning [ slides | slides 2 ]	Ding et al. Goal-conditioned Imitation Learning (optional) Zeng et al. Transporter Networks: Rearranging the Visual World for Robotic Manipulation Andrychowicz et al. Hindsight Experience Replay Peng et al. SFV: Reinforcement Learning of Physical Skills from Videos Baker et al. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos	HW3 out (tentative)
W 10/23	Lecture #15 : Imitation Learning with Diffusion Models [ slides ]	Luo Understanding Diffusion Models: A Unified Perspective Pearce et al. Imitating Human Behaviour with Diffusion Models (optional) Nakkiran et al. Step-by-Step Diffusion: An Elementary Tutorial
F 10/25	Recitation #6: Diffusion policies (cont.), HW3 Recitation [ slides | slides 2 ]
M 10/28	Lecture #16 : AlphaGo, AlphaGoZero, AlphaZero [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge (optional) David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
W 10/30	Recitation #7: Class Cancelled [ slides ]
F 11/01	Lecture #17 : MBRL in explicit and observable low-dimensional state spaces [ slides ]	Sanchez-Gonzalez et al. Learning to Simulate Complex Physics with Graph Networks Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
M 11/04	Lecture #18 : MBRL from Sensory Input, Planning in Sensory Space, Planning in a Latent State Space [ slides ]	Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games Hansen et al. Temporal Difference Learning for Model Predictive Control DeepMind Blog Post MuZero: Mastering Go, chess, shogi and Atari without rules Julian Schrittwieser et al. Mastering Atari, Go, chess and shogi by planning with a learned model	HW4 out (tentative)
W 11/06	Lecture #19 : Quiz 2 and HW4 Review Recitation [ slides ]
F 11/08	Quiz 2
M 11/11	Lecture #20 : MBRL with Multimodal Dynamics [ slides ]	Janner et al. Planning with Diffusion for Flexible Behavior Synthesis (optional) Yang et al. Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
W 11/13	Lecture #21 : Intelligent Exploration [ slides ]	Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems Salimans et al. Learning Montezuma's Revenge from a Single Demonstration
F 11/15	Recitation #8: Solutions to Quiz 2 [ slides ]
M 11/18	Lecture #22 : Intelligent Exploration (cont.) [ slides ]	Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems Salimans et al. Learning Montezuma's Revenge from a Single Demonstration
W 11/20	Lecture #23 : Offline RL (Aviral Kumar) [ slides ]	Kumar et al. Conservative Q-Learning for Offline Reinforcement Learning Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration
F 11/22	Recitation #9: Homework 5 [ slides ]
M 11/25	Lecture #24 : Sim2Real Transfer [ slides | slides 2 ]	Kumar et al. Rapid Motor Adaptation for Legged Robots Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World (optional) Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand	HW5 out (tentative), HW4 due 11:59PM
W 11/27	Lecture #25 : No Class - Thanksgiving Break [ slides ]
F 11/29	Recitation #10: No Recitation - Thanksgiving Break [ slides ]
M 12/02	Lecture #26 : Building Generalist Robots with Agility via Learning and Control: Humanoids and Beyond (Guanya Shi) [ slides ]	(optional) He et al. Learning Human to Humanoid Real Time Whole Body Teleoperation (optional) He et al. Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion Xiao et al. AnyCar to Anywhere: Learning Universal Dynamics Model for Agile and Adaptive Mobility (optional) Xue et al. Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing (optional) Yang et al. Agile Continuous Jumping in Discontinuous Terrains
W 12/04	Lecture #27 : Language and Robot Control [ slides ]	Radford et al. Learning Transferable Visual Models From Natural Language Supervision Shridhar et al. CLIPort: What and Where Pathways for Robotic Manipulation Brown et al. Language Models are Few-Shot Learners Liang et al. Code as Policies: Language Model Programs for Embodied Control	HW5 due 11:59PM
F 12/06	Recitation #11: Quiz 3 Review [ slides ]
T 12/10	Quiz 3, 1:00pm - 4:00pm, PH 100