

Introduction to Reinforcement Learning

Topic | Slides | Video

1. Foundations
Topic 1.1: What is reinforcement learning? Agent-environment interaction | - | -
Topic 1.2: Comparison with supervised and unsupervised learning | - | -
Topic 1.3: Key concepts: rewards, states, actions, policies | - | -
Topic 1.4: Examples of RL applications (games, robotics, recommendation systems) | - | -

2. Markov Decision Processes (MDPs)
Topic 2.1: Markov property and Markov chains | - | -
Topic 2.2: Finite MDPs: states, actions, rewards, transition probabilities | - | -
Topic 2.3: Return, discounting, and value functions | - | -
Topic 2.4: Bellman equations for state and action values | - | -
Topic 2.5: Optimal policies and optimal value functions | - | -

3. Dynamic Programming
Topic 3.1: Policy evaluation (prediction problem) | - | -
Topic 3.2: Policy improvement and policy iteration | - | -
Topic 3.3: Value iteration | - | -
Topic 3.4: Asynchronous dynamic programming | - | -
Topic 3.5: Generalized policy iteration | - | -

4. Model-Free Prediction
Topic 4.1: Monte Carlo methods for value estimation | - | -
Topic 4.2: Temporal difference (TD) learning | - | -
Topic 4.3: TD(0) algorithm | - | -
Topic 4.4: Comparison of MC and TD methods | - | -
Topic 4.5: n-step TD methods | - | -

5. Model-Free Control
Topic 5.1: Monte Carlo control methods | - | -
Topic 5.2: On-policy vs off-policy learning | - | -
Topic 5.3: SARSA (State-Action-Reward-State-Action) | - | -
Topic 5.4: Q-learning | - | -
Topic 5.5: Expected SARSA | - | -
Topic 5.6: Exploration vs exploitation strategies (ε-greedy, softmax) | - | -

6. Function Approximation
Topic 6.1: Need for function approximation in large state spaces | - | -
Topic 6.2: Linear function approximation | - | -
Topic 6.3: Gradient Monte Carlo and TD methods | - | -
Topic 6.4: Feature construction and basis functions | - | -
Topic 6.5: Convergence issues with function approximation | - | -

7. Deep Reinforcement Learning
Topic 7.1: Neural networks as function approximators | - | -
Topic 7.2: Deep Q-Networks (DQN) | - | -
Topic 7.3: Experience replay and target networks | - | -
Topic 7.4: Double DQN and Dueling DQN | - | -
Topic 7.5: Introduction to policy gradient methods | - | -

8. Policy Gradient Methods
Topic 8.1: REINFORCE algorithm | - | -
Topic 8.2: Actor-critic methods | - | -
Topic 8.3: Advantage functions | - | -
Topic 8.4: Proximal Policy Optimization (PPO) overview | - | -
Topic 8.5: Trust Region Policy Optimization (TRPO) concepts | - | -

9. Advanced Topics
Topic 9.1: Multi-armed bandits | - | -
Topic 9.2: Exploration strategies (UCB, Thompson sampling) | - | -
Topic 9.3: Partially observable environments (POMDP introduction) | - | -
Topic 9.4: Hierarchical reinforcement learning basics | - | -

10. Applications and Case Studies
Topic 10.1: Game playing (AlphaGo, chess, Atari games) | - | -
Topic 10.2: Robotics applications | - | -
Topic 10.3: Autonomous vehicles | - | -
Topic 10.4: Resource allocation and scheduling | - | -
Topic 10.5: Financial trading | - | -

11. Current Research and Future Directions
Topic 11.1: Meta-learning in RL | - | -
Topic 11.2: Multi-agent reinforcement learning | - | -
Topic 11.3: Safe reinforcement learning | - | -
Topic 11.4: Real-world deployment challenges | - | -
Topic 11.5: Open research problems | - | -