10-403 Deep RL | Schedule

Date	Lecture	Readings	Logistics
W 01/19	Lecture #1 (Katerina): Introduction to Reinforcement and Representation Learning [ slides ]	S & B Textbook, Ch1 Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies Dan Wolpert's talk The real reason for brains
F 01/21	Recitation #1: Neural Nets, TensorFlow & Keras, OpenAI Gym, PyTorch [ slides | slides 2 | notes ]	G, B & C Textbook, Ch9, Ch10 TensorFlow tutorial notebook OpenAI Gym tutorial notebook PyTorch tutorial notebook The TensorFlow High Level (Keras) API
M 01/24	Lecture #2 (Katerina): Multi-armed Bandits [ slides ]	S & B Textbook, Ch2 2.1-2.7 Russo et al. A Tutorial on Thompson Sampling, Ch1-Ch4. Optional after Ch4
W 01/26	Lecture #3 (Katerina): Markov Decision Processes, Value Iteration, Policy Iteration [ slides ]	S & B Textbook, Ch3, Ch4 The Path perspective on Value Learning (blogpost)
F 01/28	Recitation #2: Bandits, MDPs & HW1 [ slides | slides 2 ]
M 01/31	Lecture #4 (Katerina): Monte Carlo Learning and Temporal Difference Learning [ slides ]	S & B Textbook, Ch5, Ch6	HW1 out
W 02/02	Lecture #5 (Katerina): Monte Carlo Learning and Temporal Difference Learning (Cont.) [ slides ]	S & B Textbook, Ch5, Ch6
F 02/04	Recitation #3: HW1 [ slides ]
M 02/07	Lecture #6 (Katerina): Monte Carlo and TD (cont.), Function approximation in prediction and control, Deep Q-learning, Deep SARSA [ slides | slides 2 ]	S & B Textbook, Ch6, Ch7 7.1-7.3 Mnih et al. Playing Atari with Deep Reinforcement Learning
W 02/09	Lecture #7 (Katerina): Planning, Monte Carlo Tree search [ slides ]	S & B Textbook, Ch8.11 Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
F 02/11	Recitation #4: MCTS, TD Learning, Deep Q Learning [ slides | slides 2 ]
M 02/14	Lecture #8 (Katerina): MCTS (cont.), Policy gradients, REINFORCE, Actor-Critic methods [ slides ]	Mnih et al. Human-level control through deep reinforcement learning S & B Textbook, Ch 8.11 Pritzel et al. Neural Episodic Control, discussed in lecture Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning	HW1 due 02/14 11:59pm
W 02/16	Lecture #9 (Katerina): Policy gradients, REINFORCE, Actor-Critic methods (cont.) [ slides ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning S & B Textbook, Ch13	HW2 out (tentative)
F 02/18	Recitation #5: Quiz 1 Review [ slides ]
M 02/21	Lecture #10 (Katerina): Actor Critic Methods (cont.), Natural PG, PPO, TRPO [ slides | slides 2 ]	S & B Textbook, Ch13 Schulman et al. Trust Region Policy Optimization Schulman et al. Proximal Policy Optimization Algorithms (optional) Rajeswaran et al. Towards Generalization and Simplicity in Continuous Control
W 02/23	Lecture #11 (Katerina): Deterministic Policy gradient, re-parametrized PG [ slides | slides 2 ]	Lillicrap et al. Continuous control with deep reinforcement learning
F 02/25	Quiz 1 [covering everything through Lecture 10, Monday, February 21]
M 02/28	Lecture #12 (Katerina): Evolutionary methods for policy search [ slides | slides 2 ]	Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning (optional) Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial Antoine Cully et al. Robots that can adapt like animals
W 03/02	Lecture #13 (Katerina): Imitation learning, behavior cloning [ slides ]	Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Bojarski et al. End to End Learning for Self-Driving Cars Bansal et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
F 03/04	Recitation #6: No Recitation (Mid-semester break) [ slides ]
M 03/07	Spring break - No classes
W 03/09	Spring break - No classes
F 03/11	Recitation #7: No Recitation (Spring break) [ slides ]
M 03/14	Lecture #14 (Katerina): Imitation learning (cont.), Adversarial imitation learning [ slides ]	(optional) Ho et al. Generative Adversarial Imitation Learning Zhu et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills Andrychowicz et al. Hindsight Experience Replay Ding et al. Goal-conditioned Imitation Learning	HW3 out (tentative), HW2 due 03/14 11:59PM
W 03/16	Lecture #15 (Katerina): Model based RL, Low dimensional model, Explicit models. [ slides ]	Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models Janner et al. When to Trust Your Model: Model-Based Policy Optimization (optional) Kurutach et al. Model-Ensemble Trust-Region Policy Optimization (optional) Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games Lambert et al. Objective Mismatch in Model-based Reinforcement Learning
F 03/18	Recitation #8: Homework 3 [ slides ]
M 03/21	Lecture #16 (Katerina): MBRL (cont.) Holistic and graph-based world models [ slides ]	Yan et al. Learning Predictive Representations for Deformable Objects using Contrastive Estimation Fragkiadaki et al. Learning Visual Predictive Models of Physics for Playing Billiards Gonzalez et al. Learning to Simulate Complex Physics with Graph Networks (optional) Fish Tung et al. 3D-OES Viewpoint-Invariant Object-Factorized Environment Simulators
W 03/23	Lecture #17 (Katerina): MBRL (cont.), AlphaGo, AlphaGoZero, MuZero [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm DeepMind Blog Post MuZero: Mastering Go, chess, shogi and Atari without rules Julian Schrittwieser et al. Mastering Atari, Go, chess and shogi by planning with a learned model
F 03/25	Recitation #9: Quiz 2 Review [ slides ]
M 03/28	Lecture #18 (Katerina): MBRL (cont.), AlphaGo, AlphaGoZero, MuZero (cont.) [ slides ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm DeepMind Blog Post MuZero: Mastering Go, chess, shogi and Atari without rules Julian Schrittwieser et al. Mastering Atari, Go, chess and shogi by planning with a learned model	HW4 out (tentative), HW3 due 03/28 11:59pm
W 03/30	Lecture #19 (Katerina): MBRL (cont.) Time dependent linear models, iLQR [ slides ]
F 04/01	Quiz 2 [covering everything from lectures 10 through 17 (no MuZero) (Wednesday, March 23)] Pass/Fail Grade Option Deadline
M 04/04	Lecture #20 (Katerina): MBRL (cont.), stochastic world models [ slides ]	Hafner et al. Dream to Control: Learning Behaviors by Latent Imagination Hafner et al. Mastering Atari with Discrete World Models Carl Doersch et al. Tutorial on Variational Autoencoders
W 04/06	Lecture #21 (Katerina): Intelligent Exploration [ slides ]	(optional) Osband et al. Deep Exploration via Bootstrapped DQN Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems (optional) Ecoffet et al. First return, then explore Salimans et al. Learning Montezuma's Revenge from a Single Demonstration
F 04/08	No Classes - Spring Carnival
M 04/11	Lecture #22 (Katerina): Offline RL [ slides ]	Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration Mandlekar et al. IRIS Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data	HW4 due 04/11 11:59pm, HW5 out (tentative)
W 04/13	Lecture #23 (Katerina): Sim2Real transfer [ slides ]	Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand
F 04/15	Recitation #10: Homework 5 [ slides ]
M 04/18	Lecture #24 (Katerina): Visual Imitation Learning [ slides ]	Peng et al. SFV Reinforcement Learning of Physical Skills from Videos Aytar et al. Playing hard exploration games by watching YouTube
W 04/20	Lecture #25: Self-Supervised Visual Learning [ slides ]	Chen et al. A Simple Framework for Contrastive Learning of Visual Representations Srinivas et al. CURL Contrastive Unsupervised Representations for Reinforcement Learning Radford et al. Learning Transferable Visual Models From Natural Language Supervision Khandelwal et al. Simple but Effective CLIP Embeddings for Embodied AI
F 04/22	Recitation #11: Quiz 3 Review [ slides ]
M 04/25	Lecture #26 (Katerina): Control with 3D visual representations [ slides ]	(optional) Fish Tung et al. Learning Spatial Common Sense with Geometry-Aware Recurrent Networks (optional) Fish Tung et al. 3D-OES Viewpoint-Invariant Object-Factorized Environment Simulators (optional) Harley et al. Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping (optional) Yang et al. Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views
W 04/27	Lecture #27 (Katerina): Transporter Networks for Vision-Based Manipulation [ slides ]	Zeng et al. Transporter Networks Rearranging the Visual World for Robotic Manipulation Seita et al. Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks (optional) Shridhar et al. CLIPort What and Where Pathways for Robotic Manipulation
F 04/29	Recitation #12: Quiz 3 Review [ slides ]		Pass/Fail Grade Option Deadline
T 05/03	Quiz 3
