Live course

Fundamentals of Reinforcement Learning

Bandits, Bellman equations, Monte Carlo, TD learning, and planning.

An applied, foundations-first introduction to reinforcement learning for engineers and technically curious learners.

The course uses a small set of concrete environments and works through the algorithms carefully, often with actual numbers rather than abstract notation alone.

It starts with bandits and Markov decision processes, then moves through Bellman equations, dynamic programming, Monte Carlo methods, temporal-difference learning, Q-learning, and tabular planning.


What the live course covers

The current Udemy syllabus follows the live MVP course structure in Notion, moving from the simplest bandit problems through Bellman equations, dynamic programming, Monte Carlo methods, TD learning, and planning.

Introduction and RL framing

Start with the reinforcement-learning setup itself before moving into algorithms.

Introduction // tour of the repo
Goals, rewards, returns and episodes
From bandits to MDPs
From bandits to MDPs // real world examples
From bandits to MDPs // Frozen Lake walkthrough
Policies and value functions

K-armed bandits

Build intuition for explore-exploit tradeoffs before adding state.

Section intro
Environment intro // KAB testbed
Environment walkthrough // KAB testbed
Hands on: K-armed bandits
Action-value methods // part 1 - greedy
Action-value methods // part 2 - epsilon-greedy
Action-value methods // part 3 - efficient implementation details
Optimistic initial values
Upper-confidence-bound action selection
Non-stationary bandits
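
As a rough illustration of the explore-exploit tradeoff this section builds toward, here is a minimal sketch of epsilon-greedy action selection with incremental sample-average updates. It is not taken from the course repo: the function name `run_epsilon_greedy`, the three arm means, and the unit-variance Gaussian rewards are all assumptions made for the example.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a hypothetical k-armed Gaussian bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # current action-value estimates
    n = [0] * k     # number of times each arm has been pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                    # explore: random arm
        else:
            a = max(range(k), key=lambda i: q[i])   # exploit: greedy arm
        reward = rng.gauss(true_means[a], 1.0)      # assumed reward noise
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]              # incremental sample average
    return q, n


# Arm means are made up for illustration.
estimates, counts = run_epsilon_greedy([0.2, 0.8, 0.5])
print(estimates, counts)
```

The incremental update avoids storing every past reward: each estimate moves toward the newest reward by a step of 1/n, which gives the same result as recomputing the running mean.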

Markov decision processes and Bellman equations

Move from bandits to full sequential decision problems with value functions and Bellman reasoning.

Section intro
Bellman equations // P1: expectation equation for v(s)
Bellman equations // P2: expectation equation for q(s, a)
Bellman equations // P3: optimality equation
Walkthrough // Bellman expectation
Walkthrough // Bellman optimality
Frozen Lake - Bellman equation analytic solutions
Walkthrough // Matrix inversion and summary
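
For reference, the analytic solution covered in this section can be summarised as follows. The notation is the standard textbook one and is an assumption here; the course's own symbols may differ. For a fixed policy with state-to-state transition matrix P_pi and expected one-step reward vector r_pi, the Bellman expectation equation is linear in the values, so for a discount factor below 1 it can be solved directly by matrix inversion:

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_\pi(s') \bigr]

\mathbf{v}_\pi = \mathbf{r}_\pi + \gamma P_\pi \mathbf{v}_\pi
\quad\Longrightarrow\quad
\mathbf{v}_\pi = (I - \gamma P_\pi)^{-1}\, \mathbf{r}_\pi ,
\qquad 0 \le \gamma < 1
```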

Dynamic programming

Solve known-model environments by alternating evaluation and improvement.

Section intro
Environment intro // Jack's Car Rental
Environment walkthrough // Jack's Car Rental
Hands on: Jack's Car Rental
Intro to dynamic programming
Policy evaluation // intro
Policy evaluation // walkthrough
Policy improvement // intro & proof
Policy improvement // walkthrough
Policy iteration
Value iteration // intro
Value iteration // walkthrough
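
To make "alternating evaluation and improvement" concrete, here is a minimal policy iteration sketch. It assumes a hypothetical three-state chain rather than Jack's Car Rental, and the model `P`, the discount `GAMMA`, and all function names are invented for the example.

```python
GAMMA = 0.9

# Hypothetical known model: P[state][action] = [(prob, next_state, reward), ...]
# States 0 and 1 are non-terminal; state 2 is terminal (absorbing, zero reward).
# Action 0 moves left (or stays at the left edge), action 1 moves right.
P = {
    0: {0: [(1.0, 0, -1.0)], 1: [(1.0, 1, -1.0)]},
    1: {0: [(1.0, 0, -1.0)], 1: [(1.0, 2, -1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(s, a, v):
    """Expected one-step return of taking action a in state s, then following v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])


def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation with in-place sweeps."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            new = q_value(s, policy[s], v)
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v


def improve(v):
    """Greedy policy with respect to v (ties go to the lower action index)."""
    return {s: max(P[s], key=lambda a: q_value(s, a, v)) for s in P}


def policy_iteration():
    policy = {s: 0 for s in P}       # start from "always move left"
    while True:
        v = evaluate(policy)
        new_policy = improve(v)
        if new_policy == policy:     # stable policy: stop
            return policy, v
        policy = new_policy


policy, values = policy_iteration()
print(policy, values)
```

Starting from a deliberately poor policy, the loop stops once the greedy policy no longer changes. A discount below 1 keeps the evaluation step finite even for a policy that never reaches the terminal state.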

Monte Carlo methods

Learn model-free prediction and control from full returns and sample episodes.

Section intro
Monte Carlo example (pentagram)
Intro to Monte Carlo
Environment intro // Blackjack
Environment walkthrough // Blackjack
Hands on: Blackjack
Monte Carlo prediction
MC Control: Exploring Starts
MC Control: On Policy
MC Control: Off Policy // prelims
MC Control: Off Policy // prediction & control

TD learning and planning

Bridge the Monte Carlo and dynamic-programming views, then add model-based planning.

Section intro - screencast // part 1 - preamble
Section intro - screencast // part 2 - formal intro
Intro to TD
Environment walkthrough // Cliff Walking
Hands on: Cliff Walking
TD prediction
Q-learning
Intro to Planning
Tabular Dyna-Q // algo intro
Tabular Dyna-Q // walkthrough
Tabular Dyna-Q+
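
As a small illustration of the TD idea of updating from a bootstrapped one-step target, here is a minimal tabular Q-learning sketch. It assumes a hypothetical five-state corridor rather than the Cliff Walking environment used in the course; `step`, `q_update`, `train`, and all hyperparameters are invented for the example.

```python
import random

N_STATES = 5         # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)   # index 0 moves left, index 1 moves right


def step(state, action):
    """Deterministic move with a reward of -1 per step until the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, -1.0, nxt == N_STATES - 1


def q_update(q, s, a, r, s2, done, alpha, gamma):
    """One Q-learning update toward r + gamma * max_a' Q(s2, a')."""
    target = r if done else r + gamma * max(q[s2])
    q[s][a] += alpha * (target - q[s][a])


def train(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 1000:    # step cap is a safety guard
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                a = max((0, 1), key=lambda i: q[s][i])    # act greedily
            s2, r, done = step(s, ACTIONS[a])
            q_update(q, s, a, r, s2, done, alpha, gamma)
            s, steps = s2, steps + 1
    return q


q_table = train()
print(q_table)
```

Unlike a Monte Carlo update, each step here learns from the next state's current estimate instead of waiting for the full return. Dyna-Q can be read as the same update applied additionally to transitions replayed from a learned model.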

Close

Conclusion & congratulations

Student feedback

Selected written 5-star Udemy reviews. Many other ratings do not include text feedback.

Featured review

WHAT have you done ?!

You turned a very complicated topic into a very smooth & clear material, not easy...but really clear and smooth. The course is truly helpful. Indeed as the name suggests, it is "Fundamental".

I am an automation engineer and I got my graduation degree many years ago. AI was not much introduced with such details in educational programs (back then). I almost only just heard about the term AI in college. So, I was struggling for some duration in the past months with the available material online (books, videos, etc.). But, truly, your effort in that course deserves every "Thank you".

For me, when I first encountered the algorithms in many books, it just seemed somehow complicated with all these symbols. But, you really made every point clear as much as possible.

As you mentioned in the introduction, you used one simple example (frozen lake) and patiently you showed us how to actually apply different algorithms by solving the example with actual numbers each time. Not many books did this!

There are 3 things I would like to highlight: 1) The "Recap" at the beginning of each section & "summary" at the end are very core factors that helped me to memorize our road map through sections and lectures. Thank you for choosing that strategy. 2) You took care of small details like the difference in notation between CAPITAL and small letters. For sorry, it was my first time to understand such differences in your lectures. It is in my nature that if I don't understand the notations very well, I get lost & also lose my focus on the topic. Again, thank you for this. 3) DON'T stop...Please continue the amazing work you are doing! There are some few steps left (:D). If you in the future posted more lectures, I will be glad and honored to subscribe. It doesn't even have to be this large (10 Hrs). Maybe one lecture per algorithm will be very fine. I will be the first to buy & subscribe. I hope you consider continuing your amazing work.

Now, I have to look for a good material that explains TD3 & SAC algorithms. See, that's why I recommend you to continue, as I am going to struggle again.

I highly recommend your course to anyone who needs to understand the fundamentals.

Very Helpful & amazing work. Thank you, Tom.

Mohamed | 5 months ago | 5 / 5

Truly amazed and appreciate the effort that went in to make the concepts clear. I have looked at many RL courses till date, but by far this is simply the best course to understand the foundations of RL. Looking forward to more courses from Tom.

Raja | 4 months ago | 5 / 5

The course makes it easier for me to understand RL. Better than my university's class.

Rany | 6 months ago | 5 / 5

In reinforcement learning terms, me enrolling to this course was a bit of an "exploratory action" in my own learning policy :), I'm halfway through and I'm very happy with it. I'm working towards expanding my skill set as a game developer / creative technologist and it's been a great introduction to RL concepts.

Sina | 7 months ago | 5 / 5

In this course you have the opportunity to see how the concepts come into play while coding.

Luan Assis | 5 months ago | 5 / 5

Course is very good to understand the math and main concept of RL. I wish we had more practical coding part of it. Assignments are good but coding sessions are always good.

Isaac Matthew | 10 months ago | 5 / 5

Very clear explanations of the theory behind the techniques, and a great code repo with exercises to cement and build on what's taught in the lectures. An excellent introduction to the world of Reinforcement Learning!

Mark | 1 year ago | 5 / 5

Absolutely amazing course. I have learnt so much about reinforcement learning! I would recommend to anyone wanting to learn more about AI. I had relatively low starting level but found the explanations really clear and helpful.

Jack | 1 year ago | 5 / 5

Solid.

Andrey | 1 year ago | 5 / 5