Topic | Slides | Video |
---|---|---|
Topic 1.1: Foundations - What is reinforcement learning? Agent-environment interaction | Slides | - |
Topic 1.2: Foundations - Comparison with supervised and unsupervised learning | Slides | - |
Topic 1.3: Foundations - Key concepts: rewards, states, actions, policies | Slides | - |
Topic 1.4: Foundations - Examples of RL applications (games, robotics, recommendation systems) | Slides | - |
Topic 2.1: Markov Decision Processes (MDPs) - Markov property and Markov chains | Slides | - |
Topic 2.2: Markov Decision Processes (MDPs) - Finite MDPs: states, actions, rewards, transition probabilities | Slides | - |
Topic 2.3: Markov Decision Processes (MDPs) - Return, discounting, and value functions | - | - |
Topic 2.4: Markov Decision Processes (MDPs) - Bellman equations for state and action values | - | - |
Topic 2.5: Markov Decision Processes (MDPs) - Optimal policies and optimal value functions | - | - |
Topic 3.1: Dynamic Programming - Policy evaluation (prediction problem) | - | - |
Topic 3.2: Dynamic Programming - Policy improvement and policy iteration | - | - |
Topic 3.3: Dynamic Programming - Value iteration | - | - |
Topic 3.4: Dynamic Programming - Asynchronous dynamic programming | - | - |
Topic 3.5: Dynamic Programming - Generalized policy iteration | - | - |
Topic 4.1: Model-Free Prediction - Monte Carlo methods for value estimation | - | - |
Topic 4.2: Model-Free Prediction - Temporal difference (TD) learning | - | - |
Topic 4.3: Model-Free Prediction - TD(0) algorithm | - | - |
Topic 4.4: Model-Free Prediction - Comparison of MC and TD methods | - | - |
Topic 4.5: Model-Free Prediction - n-step TD methods | - | - |
Topic 5.1: Model-Free Control - Monte Carlo control methods | - | - |
Topic 5.2: Model-Free Control - On-policy vs. off-policy learning | - | - |
Topic 5.3: Model-Free Control - SARSA (State-Action-Reward-State-Action) | - | - |
Topic 5.4: Model-Free Control - Q-learning | - | - |
Topic 5.5: Model-Free Control - Expected SARSA | - | - |
Topic 5.6: Model-Free Control - Exploration vs. exploitation strategies (ε-greedy, softmax) | - | - |
Topic 6.1: Function Approximation - Need for function approximation in large state spaces | - | - |
Topic 6.2: Function Approximation - Linear function approximation | - | - |
Topic 6.3: Function Approximation - Gradient Monte Carlo and TD methods | - | - |
Topic 6.4: Function Approximation - Feature construction and basis functions | - | - |
Topic 6.5: Function Approximation - Convergence issues with function approximation | - | - |
Topic 7.1: Deep Reinforcement Learning - Neural networks as function approximators | - | - |
Topic 7.2: Deep Reinforcement Learning - Deep Q-Networks (DQN) | - | - |
Topic 7.3: Deep Reinforcement Learning - Experience replay and target networks | - | - |
Topic 7.4: Deep Reinforcement Learning - Double DQN and Dueling DQN | - | - |
Topic 7.5: Deep Reinforcement Learning - Introduction to policy gradient methods | - | - |
Topic 8.1: Policy Gradient Methods - REINFORCE algorithm | - | - |
Topic 8.2: Policy Gradient Methods - Actor-critic methods | - | - |
Topic 8.3: Policy Gradient Methods - Advantage functions | - | - |
Topic 8.4: Policy Gradient Methods - Proximal Policy Optimization (PPO) overview | - | - |
Topic 8.5: Policy Gradient Methods - Trust Region Policy Optimization (TRPO) concepts | - | - |
Topic 9.1: Advanced Topics - Multi-armed bandits | - | - |
Topic 9.2: Advanced Topics - Exploration strategies (UCB, Thompson sampling) | - | - |
Topic 9.3: Advanced Topics - Partially observable environments (POMDP introduction) | - | - |
Topic 9.4: Advanced Topics - Hierarchical reinforcement learning basics | - | - |
Topic 10.1: Applications and Case Studies - Game playing (AlphaGo, chess, Atari games) | - | - |
Topic 10.2: Applications and Case Studies - Robotics applications | - | - |
Topic 10.3: Applications and Case Studies - Autonomous vehicles | - | - |
Topic 10.4: Applications and Case Studies - Resource allocation and scheduling | - | - |
Topic 10.5: Applications and Case Studies - Financial trading | - | - |
Topic 11.1: Current Research and Future Directions - Meta-learning in RL | - | - |
Topic 11.2: Current Research and Future Directions - Multi-agent reinforcement learning | - | - |
Topic 11.3: Current Research and Future Directions - Safe reinforcement learning | - | - |
Topic 11.4: Current Research and Future Directions - Real-world deployment challenges | - | - |
Topic 11.5: Current Research and Future Directions - Open research problems | - | - |