10-703 Deep RL | Schedule

Date	Lecture	Readings	Logistics
M 08/31	Lecture #1 : Introduction to Reinforcement Learning and Representation Learning [ slides \| video ]	S & B Textbook, Ch1 Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies Dan Wolpert's talk The real reason for brains
W 09/02	Lecture #2 : Exploration-exploitation in multi-armed bandits [ slides \| video ]	S & B Textbook, Ch2 2.1 - 2.7 Russo et al. A Tutorial on Thompson Sampling
F 09/04	Recitation #1: CNNs, RNNs, Tensorflow [ slides ]	G, B & C Textbook, Ch9, Ch10 Tensorflow tutorial notebook OpenAI Gym tutorial notebook
M 09/07	Labor Day - No classes
W 09/09	Lecture #3 : Evolutionary methods for policy search [ slides \| video ]	Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning Mouret and Clune. Illuminating search spaces by mapping elites Cully et al. Robots that can adapt like animals Wang et al. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions (Optional)
F 09/11	Recitation #2: Gaussian Processes [ slides \| video ]
M 09/14	Lecture #4 : Exploration-exploitation in experiment design, Bayesian optimization with Gaussian Processes [ slides \| video ]	Rasmussen & Williams Gaussian Processes for Machine Learning Chapter 1, 2.1-2.3 Brochu et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning Sections 1, 2.1, 2.2	HW1 out
W 09/16	Lecture #5 : Imitation learning with behavior cloning [ slides \| video ]	Bagnell. An Invitation to Imitation Bojarski et al. End to End Learning for Self-Driving Cars Bansal et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst Florence et al. Self-Supervised Correspondence in Visuomotor Policy Learning Florence et al. Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation
F 09/18	Recitation #3: HW1 [ slides \| video ]
M 09/21	Lecture #6 : Generative adversarial / goal-conditioned imitation learning [ slides \| video ]	Ho et al. Generative Adversarial Imitation Learning Ding et al. Goal-conditioned imitation learning Background Material - G, B & C Textbook Ch20 Goodfellow et al. Generative Adversarial Nets
W 09/23	Lecture #7 : What if states are few, we know which state we are in, and the world model is known? Dynamic programming for policy search [ slides \| video ]	S & B Textbook, Ch3, Ch4 The Paths Perspective on Value Learning (blogpost)
F 09/25	Recitation #4: Behavior Cloning and Quiz 1 Review [ slides \| video ]
M 09/28	Lecture #8 : Dynamic Programming, Policy Iteration, Value Iteration [ slides \| video ]	S & B Textbook, Ch3, Ch4 The Paths Perspective on Value Learning (blogpost)	HW1 due 09/28 11:59pm
W 09/30	Lecture #9 : Monte Carlo Learning and Temporal Difference Learning [ slides \| video ]	S & B Textbook, Ch5, Ch6
F 10/02	Quiz 1 (online)		HW2 out
M 10/05	Lecture #10 : What if states are infinite and we do not know or wish to estimate the world model? Deep Q-learning, Deep SARSA [ slides \| video ]	S & B Textbook, Ch 6, Ch 7.1-7.3 Mnih et al. Playing Atari with Deep Reinforcement Learning van Hasselt et al. Deep Reinforcement Learning with Double Q-learning Schaul et al. Prioritized Experience Replay
W 10/07	Lecture #11 : Monte Carlo Tree search / Quiz 1 recap [ slides \| video ]	S & B Textbook, Ch 8.11 Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
F 10/09	Recitation #5: Policy and Value Iterations [ slides \| video ]
M 10/12	Lecture #12 : Monte Carlo Tree search [ slides \| video ]	S & B Textbook, Ch 8.11 Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
W 10/14	Lecture #13 : Policy gradients, REINFORCE, Actor-Critic methods [ slides \| video ]	S & B Textbook, Ch13 Deep Reinforcement Learning: Pong from Pixels (blogpost) Mnih et al. Asynchronous Methods for Deep Reinforcement Learning	HW3 out
F 10/16	Community Engagement Day - No classes		HW2 due 10/16 11:59pm
M 10/19	Lecture #14 : Actor-Critic methods (cont.), Deterministic PG, Re-parametrized PG [ slides \| slides 2 \| video ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning Lillicrap et al. Continuous control with deep reinforcement learning
W 10/21	Lecture #15 : Natural PG [ slides \| video ]	Natural Gradient Descent (blogpost) Schulman et al. Proximal Policy Optimization Algorithms Schulman et al. Trust Region Policy Optimization
F 10/23	Mid Semester Break - No classes
M 10/26	Lecture #16 : Natural PG (cont.) , off policy RL [ slides \| slides 2 \| video ]	Schulman et al. Proximal Policy Optimization Algorithms Schulman et al. Trust Region Policy Optimization Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration Doersch Tutorial on Variational Autoencoders
W 10/28	Lecture #17 : SAC, TD3, Soft Q-learning [ slides \| slides 2 \| video ]	Haarnoja et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor Haarnoja et al. Soft Actor-Critic Algorithms and Applications Fujimoto et al. Addressing Function Approximation Error in Actor-Critic Methods
F 10/30	Recitation #6: REC [ slides \| video ]		HW3 due 10/30 11:59PM
M 11/02	Lecture #18 : Model-based RL [ slides \| video ]	Nagabandi et al. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models Janner et al. When to Trust Your Model: Model-Based Policy Optimization Kurutach et al. Model-Ensemble Trust-Region Policy Optimization
W 11/04	Lecture #19 : Model-based RL (cont.) [ slides \| video ]	Silver et al. Mastering the Game of Go without Human Knowledge Schrittwieser et al. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games Kaiser et al. Model Based Reinforcement Learning for Atari
F 11/06	Quiz 2 (online)		HW4 out 11/07
M 11/09	Lecture #20 : Model-based RL (cont.) [ slides \| video ]	Sanchez-Gonzalez et al. Learning to Simulate Complex Physics with Graph Networks Sanchez-Gonzalez et al. Graph Networks as Learnable Physics Engines for Inference and Control Agrawal et al. Learning to Poke by Poking: Experiential Learning of Intuitive Physics Tutorial - Representation Learning on Networks
W 11/11	Lecture #21 : Curiosity driven exploration, guest lecture, Deepak Pathak [ slides \| video ]	Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Large-Scale Study of Curiosity-Driven Learning Pathak et al. Self-Supervised Exploration via Disagreement
F 11/13	Recitation #7: HW4 [ slides \| video ]
M 11/16	Lecture #22 : Model-based RL (cont.), imitating planners [ slides \| video ]	Hafner et al. Learning Latent Dynamics for Planning from Pixels Hafner et al. Dream to Control: Learning Behaviors by Latent Imagination Levine et al. End-to-End Training of Deep Visuomotor Policies
W 11/18	Lecture #23 : Intelligent Exploration [ slides \| video ]	Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems Osband et al. Deep Exploration via Bootstrapped DQN
F 11/20	Recitation #8: HW5 [ slides \| video ]
M 11/23	Lecture #24 : Learning and Planning [ slides \| video ]	Savinov et al. Semi-parametric Topological Memory for Navigation Eysenbach et al. Search on the Replay Buffer: Bridging Planning and Reinforcement Learning Emmons et al. Sparse Graphical Memory for Robust Planning Liu et al. Hallucinative Topological Memory for Zero-Shot Visual Planning	HW5 out 11/23 HW4 due 11/23 11:59pm
W 11/25	Thanksgiving Holiday - No classes
F 11/27	Thanksgiving Holiday - No classes
M 11/30	Lecture #25 : Learning from RL and demonstrations [ slides \| video ]	Mandlekar et al. IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data Salimans et al. Learning Montezuma's Revenge from a Single Demonstration Peng et al. SFV: Reinforcement Learning of Physical Skills from Videos
W 12/02	Lecture #26 : Sim2Real transfer [ slides \| video ]	Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand Zeng et al. TossingBot: Learning to Throw Arbitrary Objects with Residual Physics
F 12/04	Recitation #9: REC [ slides ]
M 12/07	Lecture #27 : Meta-Learning, learning to learn [ slides \| video ]	Botvinick et al. Reinforcement Learning, Fast and Slow Duan et al. RL2: Fast Reinforcement Learning via Slow Reinforcement Learning Mishra et al. A Simple Neural Attentive Meta-Learner Finn et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks Nichol and Schulman Reptile: a Scalable Metalearning Algorithm Clavera et al. Learning to Adapt: Meta-Learning for Model-Based Control	HW5 due 12/07 11:59pm
W 12/09	Lecture #28 : RL and generalization: A closer look at state representations for generalization in model free and model based RL [ slides \| video ]	Florence et al. Self-Supervised Correspondence in Visuomotor Policy Learning Tung et al. Visually-Grounded Library of Behaviours for Transferring Manipulation across Objects and Views Zambaldi et al. Relational Deep Reinforcement Learning Google AI Blog: Using Selective Attention in Reinforcement Learning Agents Ding et al. Mutual Information Maximization for Robust Plannable Representations
F 12/11	Recitation #10: Quiz 3 Review [ slides \| video ]
M 12/14	Quiz 3 (online) 1:00 pm - 2:20 pm EST