10-403 Deep RL | Schedule

Date	Lecture	Readings	Logistics
W 01/18	Lecture #1 : Introduction to Reinforcement and Representation Learning [ slides ]	S & B Textbook, Ch1 Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies Dan Wolpert's talk The real reason for brains
F 01/20	Recitation #1: Introduction to PyTorch and OpenAI Gym [ slides ]	G, B & C Textbook, Ch9, Ch10 The TensorFlow High Level (Keras) API
M 01/23	Lecture #2 : Multi-armed Bandits [ slides ]	S & B Textbook, Ch2 2.1-2.7 Russo et al. A Tutorial on Thompson Sampling, Ch1-Ch4. Optional after Ch4
W 01/25	Lecture #3 : Gaussian Processes, Experiment design [ slides ]	Rasmussen & Williams Gaussian Processes for Machine Learning Ch1, Ch2 2.1-2.3 Brochu et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning Sections 1, 2.1, 2.2	HW1 out (tentative)
F 01/27	Recitation #2: Deep Learning Architectures, MDPs, Bandits [ slides ]
M 01/30	Lecture #4 : Markov Decision Processes, Value Iteration, Policy Iteration [ slides ]	S & B Textbook, Ch3, Ch4
W 02/01	Lecture #5 : Markov Decision Processes, Value Iteration, Policy Iteration [ slides ]	S & B Textbook, Ch3, Ch4
F 02/03	Recitation #3: Iterative Learning, Monte Carlo [ slides ]
M 02/06	Lecture #6 : Monte Carlo Learning and Temporal Difference Learning, Planning, Monte Carlo Tree search [ slides | slides 2 ]	S & B Textbook, Ch5, Ch6 The Path perspective on Value Learning (blogpost)
W 02/08	Lecture #7 : Monte Carlo Learning and Temporal Difference Learning, Planning, Monte Carlo Tree search [ slides ]	S & B Textbook, Ch8.11 The Path perspective on Value Learning (blogpost)
F 02/10	Recitation #4: Monte Carlo Learning (contd), HW1 Q&A [ slides ]	S & B Textbook, Ch8.11 The Path perspective on Value Learning (blogpost)
M 02/13	Lecture #8 : Function approximation in prediction and control, Deep Q-learning [ slides ]	S & B Textbook, Ch6, Ch7 7.1-7.3 S & B Textbook, Ch9.1-9.3, 9.6 and Ch10.1 Mnih et al. Playing Atari with Deep Reinforcement Learning Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning	HW1 due 11:59pm
W 02/15	Lecture #9 : Policy gradients, REINFORCE, Actor-Critic methods [ slides ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning S & B Textbook, Ch13 http://karpathy.github.io/2016/05/31/rl/	HW2 out
F 02/17	Recitation #5: Quiz 1 Review, MCTS & DQN Review [ slides | slides 2 ]
M 02/20	Lecture #10 : Policy gradients, REINFORCE, Actor-Critic methods (cont) [ slides ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning S & B Textbook, Ch13
W 02/22	Lecture #11 : Natural PG, PPO, TRPO [ slides ]	S & B Textbook, Ch13 Schulman et al. Trust Region Policy Optimization Schulman et al. Proximal Policy Optimization Algorithms (optional) Rajeswaran et al. Towards Generalization and Simplicity in Continuous Control
F 02/24	Quiz 1
M 02/27	Lecture #12 : No Class due to Illness [ slides ]
W 03/01	Lecture #13 : Re-parametrized policy gradients, deep deterministic policy gradients [ slides ]	Lillicrap et al. Continuous control with deep reinforcement learning
F 03/03	Lecture #14 : Evolutionary Methods [ slides ]	Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning (optional) Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial Antoine Cully et al. Robots that can adapt like animals
M 03/06	Spring Break - No Classes
W 03/08	Spring Break - No Classes		HW2 due 11:59PM
F 03/10	Spring Break - No Classes
M 03/13	Lecture #15 : Imitation learning, behavior cloning [ slides ]	Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Bojarski et al. End to End Learning for Self-Driving Cars Andrychowicz et al. Hindsight Experience Replay Ding et al. Goal-conditioned Imitation Learning
W 03/15	Lecture #16 : Adversarial imitation learning, Imitation learning for vision-based manipulation [ slides | slides 2 ]	(optional) Ho et al. Generative Adversarial Imitation Learning Zhu et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills Zeng et al. Transporter Networks Rearranging the Visual World for Robotic Manipulation Seita et al. Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks Ding et al. Goal-conditioned Imitation Learning
F 03/17	Recitation #6: Recitation 7 [ slides ]
M 03/20	Lecture #17 : Transporter networks for Robot manipulation (cont.), Model-based RL in low-dim state space [ slides ]	Zeng et al. Transporter Networks Rearranging the Visual World for Robotic Manipulation Seita et al. Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models	HW3 out (tentative)
W 03/22	Lecture #18 : MBRL (cont), AlphaGo, AlphaGoZero [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
F 03/24	Recitation #7: Imitation Learning [ slides ]
M 03/27	Lecture #19 : MBRL (cont.) latent dynamics models [ slides ]	Hafner et al. Dream to Control: Learning Behaviors by Latent Imagination Hafner et al. Mastering Atari with Discrete World Models Carl Doersch et al. Tutorial on Variational Autoencoders
W 03/29	Lecture #20 : Dynamics learning with graph neural networks [ slides ]	Fragkiadaki et al. Learning Visual Predictive Models of Physics for Playing Billiards Sanchez-Gonzalez et al. Learning to Simulate Complex Physics with Graph Networks
F 03/31	Quiz 2
M 04/03	Lecture #21 : MuZero, Intelligent Exploration [ slides ]	DeepMind Blog Post MuZero: Mastering Go, chess, shogi and Atari without rules Schrittwieser et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model (optional) Osband et al. Deep Exploration via Bootstrapped DQN (optional) Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems Salimans et al. Learning Montezuma's Revenge from a Single Demonstration	HW4 out (tentative), HW3 due 11:59PM
W 04/05	Lecture #22 : Intelligent Exploration [ slides ]	(optional) Osband et al. Deep Exploration via Bootstrapped DQN (optional) Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems Salimans et al. Learning Montezuma's Revenge from a Single Demonstration
F 04/07	Recitation #8: Recitation 9 [ slides ]
M 04/10	Lecture #23 : Self-Supervised Visual Learning [ slides ]	Chen et al. A Simple Framework for Contrastive Learning of Visual Representations Srinivas et al. CURL Contrastive Unsupervised Representations for Reinforcement Learning Radford et al. Learning Transferable Visual Models From Natural Language Supervision Khandelwal et al. Simple but Effective CLIP Embeddings for Embodied AI
W 04/12	Lecture #24 : Sim2Real Transfer [ slides ]	Kumar et al. Rapid Motor Adaptation for Legged Robots Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand
F 04/14	Spring Carnival - No Classes
M 04/17	Lecture #25 : Visual Imitation Learning [ slides ]	Hahn et al. No RL, No Simulation: Learning to Navigate without Navigating Aytar et al. Playing hard exploration games by watching YouTube Peng et al. SFV Reinforcement Learning of Physical Skills from Videos	HW5 out (tentative), HW4 due 11:59PM
W 04/19	Lecture #26 : Offline Reinforcement Learning [ slides ]	Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration Mandlekar et al. IRIS Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data Hahn et al. No RL, No Simulation: Learning to Navigate without Navigating
F 04/21	Recitation #9: Recitation 10 [ slides ]
M 04/24	Lecture #27 (Katerina): Language-guided Control [ slides ]	(optional) Shridhar et al. CLIPort What and Where Pathways for Robotic Manipulation
W 04/26	Lecture #28 : Diffusion models for action prediction and planning [ slides ]	Janner et al. Planning with Diffusion for Flexible Behavior Synthesis Pearce et al. Imitating Human Behaviour with Diffusion Models
F 04/28	Recitation #10: Recitation 11 [ slides ]
M 05/01	Quiz 3		HW5 due 11:59PM