Machine learning · Module 12
Reinforcement learning
Learn to act by trial and error, guided by reward.
Agent takes an action → environment responds with a new state → agent receives a reward → agent updates its policy.
Repeat the loop millions of times. The policy is the learned strategy: a mapping from states to actions.
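The loop above can be sketched in a few lines. This is a minimal illustration, not the method of any system named on this slide. It assumes a made-up five-state corridor environment and uses tabular Q-learning as the update rule. The names `step`, `train`, and `greedy_policy`, and all hyperparameter values, are hypothetical choices for the example.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal
    return next_state, reward, done

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run the agent-environment loop with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent takes an action: explore at random with probability
            # epsilon (or when the two estimates tie), otherwise act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # Environment responds; agent receives the reward.
            next_state, reward, done = step(state, ACTIONS[a])
            # Agent updates its value estimates, which define the policy.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Learned strategy: the best-valued action for each non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

After training, the greedy policy moves right in every state, which is the shortest route to the reward. Real problems replace the table with a function approximator such as a neural network, but the loop keeps the same shape.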
Game-playing
AlphaGo, chess engines, StarCraft AIs.
Robotics
Walking, grasping, dexterous manipulation.
Trading
Allocating capital across actions under uncertainty (Egbert’s domain).
LLM fine-tuning
RLHF (reinforcement learning from human feedback): the reward signal comes from human preference judgements.
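To make the last item concrete, here is one way a preference judgement can become a training signal. The slide does not say how this is done. The sketch assumes the Bradley-Terry model, a common choice for reward modelling, in which the probability that a human prefers one response over another depends on the difference in their scalar reward scores. The function names are hypothetical.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry: modelled probability that the chosen response wins."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))

def preference_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the human's recorded choice."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

if __name__ == "__main__":
    # The loss is small when the reward model agrees with the human
    # and large when it ranks the rejected response higher.
    print(preference_loss(2.0, 0.0), preference_loss(0.0, 2.0))
```

Minimising this loss over many human comparisons yields a reward model. Its scores then play the role of the reward in the agent-environment loop.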
Source slide 13