Machine learning · Module 12

Reinforcement learning

Learn to act by trial and error, guided by reward.

Agent → takes action → environment responds → agent receives reward → agent updates policy.

Repeat this loop millions of times. The policy is the learned strategy: the agent's rule for choosing an action in each situation.
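The loop above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a made-up five-cell corridor environment and uses tabular Q-learning with epsilon-greedy exploration, one of many possible update rules. The names `step`, `choose_action`, `train`, and `greedy_policy`, and all parameter values, are hypothetical choices for this example.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action_index):
    """Toy environment (assumed): move one cell, stay in the corridor, pay 1.0 at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))  # a random choice also breaks ties
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # action-value table; the policy is read from it
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)   # agent takes action
            next_state, reward, done = step(state, action)   # environment responds with a reward
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])  # agent updates its estimates
            state = next_state
    return q


def greedy_policy(q):
    """Learned strategy: the preferred action in each non-goal cell."""
    return ["right" if row[1] > row[0] else "left" for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Running the script prints the action the trained agent prefers in each cell. Real problems replace the table with a function approximator and the corridor with a far richer environment, but the act, observe, update cycle is the same.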

Game-playing

AlphaGo, chess engines, StarCraft AIs.

Robotics

Walking, grasping, dexterous manipulation.

Trading

Allocating capital across actions under uncertainty (Egbert’s domain).

LLM fine-tuning

RLHF (reinforcement learning from human feedback): the reward signal comes from human preference judgements.
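The slide does not say how preference judgements become a reward. One common approach, shown here only as an assumption, is to train a reward model on pairs of responses with a Bradley-Terry style loss: the loss is small when the response the human preferred gets the higher score. The function name `preference_loss` and the example scores are hypothetical.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    # The model agrees with the human (chosen response scored higher): low loss.
    print(preference_loss(2.0, 0.0))
    # The model disagrees with the human: higher loss.
    print(preference_loss(0.0, 2.0))
```

Minimizing this loss over many labelled pairs yields a scoring function that can then stand in for the environment's reward during policy updates.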

Reinforcement learning check

1. What guides a reinforcement-learning agent toward better behavior?

A reward signal from the environment

Only static input-output labels

Random actions without feedback

Source slide 13