

Introduction to Reinforcement Learning

Topic | Slides | Video

1. Foundations
Topic 1.1: What is reinforcement learning? Agent-environment interaction | - | -
Topic 1.2: Comparison with supervised and unsupervised learning | - | -
Topic 1.3: Key concepts: rewards, states, actions, policies | - | -
Topic 1.4: Examples of RL applications (games, robotics, recommendation systems) | - | -

2. Markov Decision Processes (MDPs)
Topic 2.1: Markov property and Markov chains | - | -
Topic 2.2: Finite MDPs: states, actions, rewards, transition probabilities | - | -
Topic 2.3: Return, discounting, and value functions | - | -
Topic 2.4: Bellman equations for state and action values | - | -
Topic 2.5: Optimal policies and optimal value functions | - | -

3. Dynamic Programming
Topic 3.1: Policy evaluation (prediction problem) | - | -
Topic 3.2: Policy improvement and policy iteration | - | -
Topic 3.3: Value iteration | - | -
Topic 3.4: Asynchronous dynamic programming | - | -
Topic 3.5: Generalized policy iteration | - | -

4. Model-Free Prediction
Topic 4.1: Monte Carlo methods for value estimation | - | -
Topic 4.2: Temporal difference (TD) learning | - | -
Topic 4.3: TD(0) algorithm | - | -
Topic 4.4: Comparison of MC and TD methods | - | -
Topic 4.5: n-step TD methods | - | -

5. Model-Free Control
Topic 5.1: Monte Carlo control methods | - | -
Topic 5.2: On-policy vs off-policy learning | - | -
Topic 5.3: SARSA (State-Action-Reward-State-Action) | - | -
Topic 5.4: Q-learning | - | -
Topic 5.5: Expected SARSA | - | -
Topic 5.6: Exploration vs exploitation strategies (ε-greedy, softmax) | - | -

6. Function Approximation
Topic 6.1: Need for function approximation in large state spaces | - | -
Topic 6.2: Linear function approximation | - | -
Topic 6.3: Gradient Monte Carlo and TD methods | - | -
Topic 6.4: Feature construction and basis functions | - | -
Topic 6.5: Convergence issues with function approximation | - | -

7. Deep Reinforcement Learning
Topic 7.1: Neural networks as function approximators | - | -
Topic 7.2: Deep Q-Networks (DQN) | - | -
Topic 7.3: Experience replay and target networks | - | -
Topic 7.4: Double DQN and Dueling DQN | - | -
Topic 7.5: Introduction to policy gradient methods | - | -

8. Policy Gradient Methods
Topic 8.1: REINFORCE algorithm | - | -
Topic 8.2: Actor-critic methods | - | -
Topic 8.3: Advantage functions | - | -
Topic 8.4: Proximal Policy Optimization (PPO) overview | - | -
Topic 8.5: Trust Region Policy Optimization (TRPO) concepts | - | -

9. Advanced Topics
Topic 9.1: Multi-armed bandits | - | -
Topic 9.2: Exploration strategies (UCB, Thompson sampling) | - | -
Topic 9.3: Partially observable environments (POMDP introduction) | - | -
Topic 9.4: Hierarchical reinforcement learning basics | - | -

10. Applications and Case Studies
Topic 10.1: Game playing (AlphaGo, chess, Atari games) | - | -
Topic 10.2: Robotics applications | - | -
Topic 10.3: Autonomous vehicles | - | -
Topic 10.4: Resource allocation and scheduling | - | -
Topic 10.5: Financial trading | - | -

11. Current Research and Future Directions
Topic 11.1: Meta-learning in RL | - | -
Topic 11.2: Multi-agent reinforcement learning | - | -
Topic 11.3: Safe reinforcement learning | - | -
Topic 11.4: Real-world deployment challenges | - | -
Topic 11.5: Open research problems | - | -