Getting into deep reinforcement learning is a tough but rewarding journey. The primary reason it is so difficult for beginners is due to the shear number of fields involved. Deep Reinforcement Learning for Robotics can be broken down into three almost seperate parts:

Deep Learning (Neural Networks)
Reinforcement Learning Algorithm
Choice of State Space, Action Space and Reward Functions

How To Get Started

There are a finite number of ways to get started with DRL, but I believe that the right curriculum (learning xD) can make you go a long way.

RL Agents pretty much wither and die when left in environments with sparse rewards. One of the easiest solution for both RL Agents and beginners trying to learn DRL is to have a dense reward function that introduces a lot of fun and rewards at every timestep in the journey.

Target-Audiance

People who have prior experience with python, numpy(preferably) and basic understanding of probability and statistics. Added benefit if you are comfortable with basic calculus.

You can learn the maths and numpy on the go as and when it is required but a decent understanding of python datatypes, object-oriented-programming is expected.

End-Goal

The end-goal for this curriculum is to equip you with the skills required to read through research papers and reimplement/modify them, understand opensource projects and ultimately help you get started with research in DRL.

If this is not what your looking for then this probably isn’t the right curriculum for you.

Curriculum

The curriculum laid out below is my opinon and it might or might not be the best way for you to get into DRL, so please use it accordingly.

System Setup

Please create a virtualenv and switch to python 3.8 for the entire series. Linux(Ubuntu or any other distro) is the recommended OS, while some of the code and packages might work in windows, I will not be helping with any windows debugging.

Phase One

1. Get started with gymnasium

Prerequisites: python

Note: If you are an existing user of gym please refer to the migration guide to gymnasium here.

Explore the structure of gymnasium-api. Render a couple of environments until your comfortable with the syntax and have a general idea of what is happening in the code.
Progress Check: Assignmnet-1 | Solution-1

2. Play around with rewards, states and actions

Prerequisites: python, gymnasium

Get started with a 3rd party RL Library (SB3, CleanRL, RLlib or any other implementation of your choice) and a robust RL algorithm like PPO and focus on changing the rewards, states and actions in the basic environments to solve them. Gain an intution of how the RL framework works.
For robotics in paticular, try the MuJoCo environments like Ant or check here and here for other available options (This is not an exhaustive list).
What happens when you use torque instead of position in the action space ? What happens when you given a combination of negative and postive rewards ? What are termination conditions ?
Believe it or not, you are already in a position to replicate a couple of basic DRL papers. Search for papers that are related to blind locomotion in quadrupeds or robotic manipulators (for example), you should be able to comfortably work with any paper that involve changes to only the state, action and rewards.
Try importing different robot models into MuJoCo or any other physics engine of your choice and getting them to work or alternativly use one of the above listed rl-envs. Here here a couple of papers that you can implement along with a link to my implementation, feel free to try it on your own first or tinker around with my code directly:

Fu, Z., Kumar, A., Malik, J., & Pathak, D. (2021). Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots. doi:10.48550/ARXIV.2111.01674

Franceschetti, A., Tosello, E., Castaman, N., & Ghidoni, S. (2020). Robotic Arm Control and Task Training through Deep Reinforcement Learning. doi:10.48550/ARXIV.2005.02632

Michel Aractingi, Pierre-Alexandre Léziart, Thomas Flayols, Julien Perez, Tomi Silander, et al.. Controlling the Solo12 Quadruped Robot with Deep Reinforcement Learning. 2022. ⟨hal-03761331⟩

Fang-I Hsiao, Cheng-Min Chiang, Alvin Hou, et al.. Reinforcement Learning Based Quadcopter Controller
You can even try ideas like curriculum learning, dynamic goal generation and other ideas that vary the difficult of the training as per the agents performance.
Progress Check: Assignmnet-2 | Solution-2

3. Learn Tabular Reinforcement Learning Methods

Prerequisites: python, gymnasium, numpy

NOTE: You can continue to explore ideas and research papers from step 2 in parallel.

Learn about the basics/fundamentals of reinforcement learning mainly: K-Arm Bandits, MDP, Monte-Carlo Methods, Temporal Difference Methods, Bootstrapping
Refer to the Sources & References for links to external resources like video lectures.
Refer to the following sections of the blog series for code and theory:

Part 1 - Multi-Arm Bandits

Part 2 - Finite MDP

Part 3 - Dynamic Programming

Part 4 - Monte Carlo and Temporal Difference Methods
Solve some of the basic low dimenssional problems from the gym environments like toy-text problems
Progress Check: Assignmnet-3 | Solution-3

Phase Two

4. Deep Learning Framework

Prerequisites: python, numpy

Pick a library of your choice for learning Neural Networks, this guide will be based on PyTorch. (Other options like TensorFlow exist, pick whatever works best for you.)
Learn Deep Learning using PyTorch. In paticular try a couple of basic projects till your comfortable with the following ideas: Loss Functions, Activation Functions, PyTorch Syntax, MLP, CNN, RNN, LSTM, GAN, Autoencoders, Weight Initializations, Dropout, Optimizers.
Refer to the Sources & References for links to external resources for learning.
Do a couple of pure Deep-Learning projects like binary/multi-class classification, De-noising Images and so on..
Try going through some of the classic papers in DL that laid the foundation for modern DL. Here is a link to my collection of must read classics. Here are links to an external collection that is more exhaustive spinning-openai, awesome-deep-rl, awesome-deep-reinforcement-learning
Progress Check: Assignmnet-4 | Solution-4

5. Apply Deep Learning to RL

Prerequisites: python, numpy, rl-library-of-your-choice

Make small changes to the exisiting neural networks from your prior projects in Section 2. Play around with rewards, states and actions which were based on 3rd party RL libraries.
Change the number of layers, the type of NNet, the activation functon, maybe add a CNN and take camera input in the state.
Upgrade projects you worked on earlier like quadruped/robotics arm by adding camera inputs to the state space or change the NNet type to RNNs or LSTMs and check how the performace of the agent changes.
Progress Check: Assignmnet-5 | Solution-5

6. Approx. Methods in RL

Prerequisites: python, numpy, PyTorch

Learn Deep Q-Learning, Policy Gradient, Actor-Critic Methods and other algorithms and implement them.
Refer to the following sections of the blog series:

Part 5 - Deep SARSA and Q-Learning

Part 6 - REINFORCE, AC, DDPG, TD3, SAC

Part 7 - A2C, PPO, TRPO, GAE, A3C
You should now be able to implement a good number of research papers, explore ideas like HER, PER, World Models and other concepts.
Progress Check: Assignmnet-6 | Solution-6

Where Does This Blog Fit in ?

Some drawbacks of the existing resources for DRL:

Most of them only focus on the theory or the code but not both. A majority of the courses that cover the theory in great detail do not have any coding components making it very difficult to implement any learning. The courses which are coding centric only focus on the code and skip most of the theory and give a vague intution about the proof or the derivation for the formulas used.
Many courses use their own custom environments for teaching (ahm ahm Coursera Specialization) while this can make learning/teaching easy. Some use jupyter notebooks for teaching, most if not all the RL libraries and opensource projects in the internet use argparse and write their code in modular file structures. Once you step outside the course sandbox it becomes very difficult to switch or even follow other projects.
A good majority of courses are topic specific aka they only teach something with limits scope or prespective in mind. For example, there are tons of Deep Learning courses but there usually isn’t a deep learning for reinforcement learning course. So, you end up learning a lot more than what is needed and the course usually might focus on things that are not really required for DRL.

The primary goal for this blog series is to bridge the gap between theory and code in Deep Reinforcement Learning.

This blog isn’t a one stop solution and will not teach you DRL from start to finish, you will still need to learn a good portion of the curriculum from other resources. This blog is a sort of an extended cheatsheet for people to refer to when they are learning/implementing DRL via code. It contains a mix of theory and code that build on top of each other.

Structure of the blog

Part 0 - Getting Started
Part 1 - Multi-Arm Bandits
Part 2 - Finite MDP | Prerequisites: python, gymnasium, numpy
Part 3 - Dynamic Programming | Prerequisites: python, gymnasium, numpy
Part 4 - Monte Carlo, Temporal Difference & Bootstrapping Methods | Prerequisites: Python, gymnasium, numpy
Part 5 - Deep SARSA and Q-Learning | Prerequisites: python, gymnasium, numpy, torch
Part 6 - REINFORCE, AC, DDPG, TD3, SAC
Part 7 - A2C, PPO, TRPO, GAE, A3C
TBA (HER, PER, Distillation)

Sources & References

This section contains a collection of all the various sources for this blog series (in no paticular order):

Sutton, R. S., Barto, A. G. (2018 ). Reinforcement Learning: An Introduction. The MIT Press.
David Silver (2015). Lectures on Reinforcement Learning
Udemy Course Reinforcement Learning beginner to master - AI in Python
Udemy Course Modern Reinforcement Learning: Deep Q Learning in PyTorch
Chris G. Willcocks - Durham University Reinforcement Learning Lectures
(My repo) drl-algorithms
Pieter Abbeel Foundations of Deep RL
Weng, L. (2018, February 19). A (long) peek into reinforcement learning. Lil’Log
Aditya Chopra. (2022). Introduction to Concepts in Reinforcement Learning

This section contains a collection of various references which are required to learn DRL and have been mentioned in the curriculum but have not been covered in this blog series:

Udemy Course PyTorch for Deep Learning in 2023: Zero to Mastery
Udemy Course A deep understanding of deep learning (with Python intro)

This section contains other references that I have not used in this blog series but are in general useful:

Coursera Reinforcement Learning Specialization
HuggingFace Deep Reinforcement Learning Course
Professor Emma Brunskill, Stanford University Stanford CS234: Reinforcement Learning | Winter 2019
DeepMind x UCL RL Lecture Series
RAIL CS285: Deep Reinforcement Learning Series UC Berkeley

Deep Reinforcement Learning - Part 0 - Getting Started

How To Get Started

Target-Audiance

End-Goal

Curriculum

System Setup

Phase One

1. Get started with gymnasium

2. Play around with rewards, states and actions

3. Learn Tabular Reinforcement Learning Methods

Phase Two

4. Deep Learning Framework

5. Apply Deep Learning to RL

6. Approx. Methods in RL

Where Does This Blog Fit in ?

Structure of the blog

Sources & References

Trending Tags

Deep Reinforcement Learning - Part 0 - Getting Started

How To Get Started

Target-Audiance

End-Goal

Curriculum

System Setup

Phase One

1. Get started with gymnasium

2. Play around with rewards, states and actions

3. Learn Tabular Reinforcement Learning Methods

Phase Two

4. Deep Learning Framework

5. Apply Deep Learning to RL

6. Approx. Methods in RL

Where Does This Blog Fit in ?

Structure of the blog

Sources & References

Further Reading

Deep Reinforcement Learning - Part 3 - Dynamic Programming

Deep Reinforcement Learning - Part 4 - Monte Carlo, Temporal Difference & Bootstrapping Methods

Deep Reinforcement Learning - Part 2 - Finite MDP

Trending Tags