AI Primer

Machine learning · Module 12

Reinforcement learning

Learn to act by trial and error, guided by reward.

Agent takes an action → environment responds with a new state → agent receives a reward → agent updates its policy.

Repeat the loop, often millions of times. The policy is the learned strategy: a rule for choosing an action in each situation.
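The loop above can be sketched in a few lines of Python. This is a minimal tabular Q-learning example, one of many ways to implement "update the policy", and it assumes a hypothetical five-cell corridor environment that is not part of this module: the agent starts in cell 0 and earns a reward only by reaching the last cell. The names `step`, `best_action`, `train` and `greedy_policy`, and the values of `alpha` (learning rate), `gamma` (discount) and `epsilon` (exploration rate), are illustrative choices, not a reference implementation.

```python
import random

# Hypothetical toy environment (an assumption for illustration only):
# a corridor of cells 0..N_STATES-1. The agent starts in cell 0; stepping
# into the last cell ends the episode with reward 1.0, any other step gives 0.0.
N_STATES = 5
ACTIONS = [-1, +1]  # index 0 = step left, index 1 = step right


def step(state, action):
    """Environment responds: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def best_action(q_row, rng):
    """Index of the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; alpha, gamma, epsilon are illustrative values."""
    rng = random.Random(seed)
    # q[s][i] estimates the long-run (discounted) reward of ACTIONS[i] in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Agent takes an action: usually the best-looking one,
            # occasionally a random one so it keeps exploring.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = best_action(q[state], rng)
            # Environment responds with a new state; agent receives a reward.
            next_state, reward, done = step(state, ACTIONS[a])
            # Agent updates: nudge the estimate toward
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """The learned policy: the best-valued action in each non-terminal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q = train()
    print(greedy_policy(q))  # expected: [1, 1, 1, 1] (always step right)
```

After a few hundred episodes of trial and error, the table favours stepping right in every cell, even though the agent was never told where the goal is. A table works here only because the corridor has five cells; game-playing or robotics agents replace it with a neural network.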

Game-playing

AlphaGo, AlphaZero (chess), AlphaStar (StarCraft II).

Robotics

Walking, grasping, dexterous manipulation.

Trading

Allocating capital across actions under uncertainty (Egbert’s domain).

LLM fine-tuning

RLHF (reinforcement learning from human feedback): the reward signal comes from human preference judgements.

Agent loop cycling through action, environment response, reward, and policy update
Draft for Pooneh review: reinforcement learning as repeated action and feedback.

Reinforcement learning check

1. What guides a reinforcement-learning agent toward better behavior?

Source slide 13