10-703 Deep RL | Schedule

Date	Lecture	Readings	Logistics
M 08/30	Lecture #1 (Katerina & Ruslan): Introduction to Reinforcement and Representation Learning [ slides \| video ]	S & B Textbook, Ch1 Smith & Gasser. The Development of Embodied Cognition - Six Lessons from Babies Dan Wolpert's talk The real reason for brains
W 09/01	Lecture #2 (Ruslan): Multi-armed Bandits [ slides \| video ]	S & B Textbook, Ch2 2.1-2.7 Russo et al. A Tutorial on Thompson Sampling, Ch1-Ch4. Optional after Ch4
F 09/03	Recitation #1: Neural Nets, TensorFlow & Keras, OpenAI Gym, Bandits [ slides \| slides 2 \| video \| notes \| notes 2 ]	G, B & C Textbook, Ch9, Ch10
M 09/06	Lecture #3 : Labor Day - No Classes [ slides ]
W 09/08	Lecture #4 (Ruslan): Markov Decision Processes, Value Iteration, Policy Iteration [ slides \| video ]	S & B Textbook, Ch3, Ch4 The Path perspective on Value Learning (blogpost)	HW1 out (tentative)
F 09/10	Recitation #2: Pytorch, Training Resources & HW1 [ slides \| video \| notes ]
M 09/13	Lecture #5 (Ruslan): Monte Carlo Learning and Temporal Difference Learning [ slides \| video ]	S & B Textbook, Ch5, Ch6
W 09/15	Lecture #6 (Ruslan): Monte Carlo Learning and Temporal Difference Learning (Cont.) [ slides \| video ]	S & B Textbook, Ch5, Ch6
F 09/17	Recitation #3: HW1 [ slides \| video ]
M 09/20	Lecture #7 (Katerina): Function approximation in prediction and control, Deep Q-learning, Deep SARSA [ slides \| video ]	S & B Textbook, Ch6, Ch7 7.1-7.3 Mnih et al. Playing Atari with Deep Reinforcement Learning
W 09/22	Lecture #8 (Katerina): Planning, Monte Carlo Tree search [ slides \| video ]	S & B Textbook, Ch8.11 Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
F 09/24	Recitation #4: MCTS, TD Learning, Deep Q Learning [ slides \| video ]
M 09/27	Lecture #9 (Ruslan): Policy gradients, REINFORCE, Actor-Critic methods [ slides \| video ]	Mnih et al. Human-level control through deep reinforcement learning S & B Textbook, Ch 8.11 Pritzel et al. Neural Episodic Control, discussed in lecture Guo et al. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning	HW1 due 09/27 11:59pm
W 09/29	Lecture #10 (Ruslan): Policy gradients, REINFORCE, Actor-Critic methods (cont.) [ slides \| video ]	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning (optional) Fujimoto et al. Addressing Function Approximation Error in Actor-Critic Methods S & B Textbook, Ch13	HW2 out (tentative)
F 10/01	Recitation #5: Quiz 1 Review [ slides \| video ]
M 10/04	Lecture #11 (Katerina): Natural PG, PPO, TRPO [ slides \| video ]	S & B Textbook, Ch13 Schulman et al. Trust Region Policy Optimization Schulman et al. Proximal Policy Optimization Algorithms (optional) Rajeswaran et al. Towards Generalization and Simplicity in Continuous Control (optional) Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
W 10/06	Lecture #12 (Katerina and Ben): Maximum Entropy RL, soft actor critic, Deterministic Policy gradient, re-parametrized PG [ slides \| slides 2 \| video ]	Haarnoja et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (optional) Haarnoja et al. Soft Actor-Critic Algorithms and Applications Lillicrap et al. Continuous control with deep reinforcement learning
F 10/08	Quiz 1 [covering everything through Lecture 10, Wednesday, September 29]
M 10/11	Lecture #13 (Katerina): Evolutionary methods for policy search [ slides \| video ]	Salimans et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning Nikolaus Hansen. The CMA Evolution Strategy - A Tutorial, optional Antoine Cully et al. Robots that can adapt like animals (optional) Rui Wang et al. Paired Open-Ended Trailblazer (POET) Max Jaderberg et al. Population Based Training of Neural Networks
W 10/13	Lecture #14 (Ruslan): Imitation learning, behavior cloning [ slides \| video ]	Chen et al. Learning by Cheating Bagnell. An Invitation to Imitation, Up To Page 10 Ross et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Optional Bojarski et al. End to End Learning for Self-Driving Cars (optional) Bansal et al. ChauffeurNet: Learning to Drive by Imitating the Best and Synthesizing the Worst
F 10/15	Mid-semester break - No classes
M 10/18	Lecture #14 (Katerina): Imitation learning (cont.), Adversarial imitation learning [ slides \| video ]	Ho et al. Generative Adversarial Imitation Learning (optional) Zhu et al. Reinforcement and Imitation Learning for Diverse Visuomotor Skills Andrychowicz et al. Hindsight Experience Replay Ding et al. Goal-conditioned Imitation Learning	HW3 out (tentative), HW2 due 10/18 11:59PM
W 10/20	Lecture #15 (Katerina): Model based RL, Low dimensional model, Explicit models. [ slides \| video ]	Chua et al. Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models Janner et al. When to Trust Your Model: Model-Based Policy Optimization (optional) Kurutach et al. Model-Ensemble Trust-Region Policy Optimization (optional) Oh et al. Action-Conditional Video Prediction using Deep Networks in Atari Games (optional) Kaiser et al. Model based Reinforcement Learning for Atari Lambert et al. Objective Mismatch in Model-based Reinforcement Learning
F 10/22	Recitation #7: Homework 3 [ slides \| video ]
M 10/25	Lecture #16 (Katerina): MBRL (cont.), AlphaGo, AlphaGoZero, MuZero [ slides \| video ]	(optional) David Silver et al. Mastering the game of Go with deep neural networks and tree search David Silver et al. Mastering the game of Go without human knowledge David Silver et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm DeepMind Blog Post MuZero: Mastering Go, chess, shogi and Atari without rules Julian Schrittwieser et al. Mastering Atari, Go, chess and shogi by planning with a learned model
W 10/27	Lecture #17 (Katerina): MBRL (cont.) Holistic and graph-based world models [ slides \| video ]	Battaglia et al. Interaction Networks for Learning about Objects, Relations and Physics Gonzalez et al. Learning to Simulate Complex Physics with Graph Networks Gonzalez et al. Graph Networks as Learnable Physics Engines for Inference and Control Yan et al. Representation Learning on Networks
F 10/29	Recitation #8: Quiz 2 Review [ slides \| video ]
M 11/01	Lecture #18 (Katerina): MBRL (cont.) Time dependent linear models, iLQR [ slides \| video ]		HW3 due 11/01 11:59pm
W 11/03	Lecture #19 (Katerina): MBRL (cont.), stochastic world models [ slides \| slides 2 \| video ]	Hafner et al. Dream to Control: Learning Behaviors by Latent Imagination Hafner et al. Mastering Atari with Discrete World Models Agrawal et al. Learning to Poke by Poking: Experiential Learning of Intuitive Physics Yan et al. Learning Predictive Representations for Deformable Objects using Contrastive Estimation
F 11/05	No Classes - Day for Community Engagement
M 11/08	Quiz 2 [covering everything from lectures 10-19 (Wednesday, Nov 03)] Pass/Fail Grade Option Deadline
W 11/10	Lecture #20 (Ruslan): Offline RL [ slides \| slides 2 \| video ]	Fujimoto et al. Off-Policy Deep Reinforcement Learning without Exploration Carl Doersch et al. Tutorial on Variational Autoencoders Wang et al. Instabilities of Offline RL with Pre-Trained Neural Representation
F 11/12	Lecture #21 (Katerina): Intelligent Exploration [ slides \| video ]	Osband et al. Deep Exploration via Bootstrapped DQN Pathak et al. Curiosity-driven Exploration by Self-supervised Prediction Burda et al. Large-Scale Study of Curiosity-Driven Learning Savinov et al. Episodic Curiosity through Reachability Ecoffet et al. Go-Explore: a New Approach for Hard-Exploration Problems (optional) Ecoffet et al. First return, then explore Salimans et al. Learning Montezuma's Revenge from a Single Demonstration
M 11/15	Lecture #22 (Katerina): Deep exploration (cont.) and Sim2Real transfer [ slides \| video ]	Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand
W 11/17	Lecture #23 (Katerina): Sim2Real transfer (cont.) and Visual Imitation Learning [ slides \| video ]	Tobin et al. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World Muller et al. Driving Policy Transfer via Modularity and Abstraction Akkaya et al. Solving Rubik’s Cube with A Robot Hand Aytar et al. Playing hard exploration games by watching YouTube
F 11/19	Recitation #9: Homework 4 [ slides ]
M 11/22	Lecture #24 (Daniel Seita (https://www.cs.cmu.edu/~dseita/)): GUEST lecture: Visual Imitation Learning (cont.) and vision-based manipulation with Transporters [ slides ]	Peng et al. SFV: Reinforcement Learning of Physical Skills from Videos Zeng et al. Transporter Networks: Rearranging the Visual World for Robotic Manipulation Seita et al. Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks
W 11/24	Thanksgiving Break - No classes
F 11/26	Thanksgiving Break - No classes
M 11/29	Lecture #25 (Ruslan): Deep RL for Navigation [ slides ]	Chen et al. Learning Exploration Policies for Navigation Singh Chaplot et al. Learning to Explore using Active Neural SLAM	HW4 due 11/29 11:59pm
W 12/01	Lecture #26 (Ruslan): Efficient Distributed RL [ slides ]	Parisotto et al. Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation Mnih et al. Asynchronous Methods for Deep Reinforcement Learning Duan et al. RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning
F 12/03	Recitation #10: Quiz 3 Review [ slides ]
F 12/07	Quiz 3 (1-4pm) Pass/Fail Grade Option Deadline
