AI Learned to Think

Art Of The Problem
Dec 1, 2024
12 Notes in this Video

World Models and Algorithms: Core Components of Reasoning

WorldModels Algorithms Reasoning AiArchitecture
01:20

Computer science researchers building machine reasoning systems need two essential components in any domain, whether board games or general problem-solving: world models that simulate environments, and algorithms that make decisions using those models.
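
A toy illustration of this split (the number-line domain and all names here are invented for this note, not from the video):

```python
# Toy domain: the state is a position on a number line; the goal is 5.
State, Action = int, int

def world_model(state: State, action: Action) -> State:
    """World model: simulates how an action changes the state."""
    return state + action

def evaluate(state: State) -> float:
    """Scores a state: closer to the goal is better."""
    return -abs(5 - state)

def decide(state: State, actions=(-1, +1)) -> Action:
    """Decision algorithm: one-step lookahead through the world model."""
    return max(actions, key=lambda a: evaluate(world_model(state, a)))

print(decide(2))  # picks +1, since moving right brings the state closer to 5
```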

TD-Gammon: Neural Network Position Intuition

TdGammon NeuralNetworks PositionEvaluation SelfPlay
03:45

Gerald Tesauro’s TD-Gammon system (1992) marked the first major breakthrough in machines mimicking human position intuition, replacing Shannon’s traditional approach of hand-coded evaluation formulas with an evaluation function learned by a neural network.
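
A minimal sketch of the learning rule, assuming a linear value function for readability; TD-Gammon itself trained a multilayer network with TD(λ):

```python
import numpy as np

# Minimal TD(0) sketch of a learned evaluation function.
w = np.zeros(4)                       # weights of the evaluation function

def value(features: np.ndarray) -> float:
    return float(w @ features)        # learned position score

def td_update(s, s_next, reward, alpha=0.1, gamma=1.0):
    """Nudge value(s) toward reward + gamma * value(s_next)."""
    global w
    td_error = reward + gamma * value(s_next) - value(s)
    w += alpha * td_error * s         # gradient of a linear value function

# After many self-play transitions, value() approximates winning chances.
td_update(np.array([1., 0., 0., 0.]), np.array([0., 1., 0., 0.]), reward=1.0)
```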

AlphaGo Move Intuition: Policy Networks for Promising Moves

AlphaGo PolicyNetworks MoveIntuition GoAI
05:20

Researchers Clark and Storkey introduced move-intuition learning in 2014 by training neural networks to predict human expert moves; AlphaGo later combined this approach with position evaluation and Monte Carlo tree search to achieve superhuman Go performance.
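
A hypothetical sketch of move-intuition training; the layer sizes and names are illustrative (Clark and Storkey, like AlphaGo, used convolutional networks):

```python
import torch
import torch.nn as nn

# A network trained to predict which move a human expert played.
BOARD_CELLS = 19 * 19

policy_net = nn.Sequential(
    nn.Linear(BOARD_CELLS, 256),
    nn.ReLU(),
    nn.Linear(256, BOARD_CELLS),      # one logit per candidate move
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def train_step(positions: torch.Tensor, expert_moves: torch.Tensor) -> float:
    """positions: (batch, BOARD_CELLS); expert_moves: (batch,) move indices."""
    logits = policy_net(positions)
    loss = loss_fn(logits, expert_moves)   # match the expert's choice
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```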

AlphaGo Zero: Learning Without Human Games

AlphaGoZero SelfPlay TabulaRasa ReinforcementLearning
07:50

DeepMind researchers created AlphaGo Zero to answer the criticism that learning from human games was a kind of cheating, building a system that started with zero knowledge and learned entirely from self-play, with only wins and losses as feedback.
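
A schematic of the self-play data loop, with a stand-in coin-flip "game" in place of Go; the point is that only the final result labels the training data:

```python
import random

def play_self_play_game():
    """Returns the states visited and the final outcome (+1 or -1)."""
    states, state = [], 0
    for _ in range(10):                 # stand-in for moves chosen by search
        state += random.choice([-1, 1])
        states.append(state)
    outcome = 1 if state > 0 else -1    # win/loss is the only signal
    return states, outcome

replay_buffer = []
for _ in range(100):
    states, outcome = play_self_play_game()
    replay_buffer.extend((s, outcome) for s in states)
# Every visited state is labelled with the eventual result of its game;
# a real system would train a network on these (state, outcome) pairs.
```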

Learned World Models: Discovering Physics from Experience

WorldModels ModelLearning NeuralSimulators DreamingAi
09:10

In their 2018 World Models paper, Ha and Schmidhuber demonstrated learned world models with neural networks, building on Schmidhuber’s late-1980s ideas about learning game rules by watching play: train a system to act as a simulator that predicts future states.
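
A minimal sketch of such a neural simulator, assuming a simple feed-forward dynamics network with illustrative dimensions (the paper itself used a VAE plus a recurrent model):

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2

# Learned world model: predicts the next state from state and action.
dynamics = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64),
    nn.Tanh(),
    nn.Linear(64, STATE_DIM),            # predicted next state
)

def model_loss(state, action, next_state):
    """Train the simulator to match transitions observed while watching play."""
    pred = dynamics(torch.cat([state, action], dim=-1))
    return nn.functional.mse_loss(pred, next_state)
```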

MuZero: Game-Agnostic Learning from Rewards Only

MuZero TransferLearning GeneralAi GameAgnostic
10:30

DeepMind researchers developed MuZero in 2019, the year after the World Models paper, to fully generalize AlphaGo Zero’s approach: a system that learns to master games without ever being told the rules, discovering everything from the experience of rewards alone.
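
A schematic of MuZero's three learned functions, with illustrative dimensions and single linear layers standing in for the real networks:

```python
import torch
import torch.nn as nn

# h: observation -> latent state       (representation)
# g: latent, action -> latent, reward  (dynamics: no real rules needed)
# f: latent -> policy, value           (prediction, used by the search)
OBS_DIM, LATENT, N_ACTIONS = 16, 32, 4

h = nn.Linear(OBS_DIM, LATENT)
g = nn.Linear(LATENT + N_ACTIONS, LATENT + 1)   # next latent + reward
f = nn.Linear(LATENT, N_ACTIONS + 1)            # policy logits + value

def unroll(observation, actions):
    """Plan inside the learned model; never touches the real game rules."""
    s = h(observation)
    rewards = []
    for a in actions:                           # a: one-hot action tensor
        out = g(torch.cat([s, a], dim=-1))
        s, r = out[..., :LATENT], out[..., LATENT:]
        rewards.append(r)
    pred = f(s)
    return rewards, pred[..., :N_ACTIONS], pred[..., N_ACTIONS:]
```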

Chain of Thought: Step-by-Step Reasoning Prompts

ChainOfThought Prompting Llm Reasoning
12:15

Researchers discovered that large language models, trained to predict human-generated text across the entire web, can effectively simulate whatever world model the context requires, and that simply prompting them to reason step by step surfaces surprising abilities to evaluate situations and suggest actions.
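
A minimal chain-of-thought prompt; the model call itself is left abstract, since any completion API would do:

```python
# The key difference is asking the model for intermediate steps.
question = ("A bat and a ball cost $1.10 together, and the bat costs "
            "$1.00 more than the ball. How much is the ball?")

plain_prompt = f"Q: {question}\nA:"
cot_prompt = f"Q: {question}\nA: Let's think step by step."
# Models given the second form tend to write out the reasoning
# ("let x be the ball's price, then 2x + 1.00 = 1.10, so x = 0.05")
# and are markedly more likely to reach the correct answer, $0.05.
```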

Tree of Thought: Brainstorming Multiple Reasoning Paths

TreeOfThought Reasoning Exploration Evaluation
13:00

To address chain of thought’s limitations, researchers developed tree of thought, a form of brainstorming in which the system explores multiple reasoning paths rather than following a single chain, using the language model itself to judge which paths look most promising.
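
A sketch of the search pattern, with stubbed propose() and score() functions standing in for the two language-model calls:

```python
def propose(path, k=3):
    """Ask the model for k candidate next thoughts (stubbed here)."""
    return [path + [f"thought-{len(path)}-{i}"] for i in range(k)]

def score(path):
    """Ask the model to rate how promising a partial path looks (stubbed)."""
    return -len(path[-1])            # placeholder heuristic

def tree_of_thought(root, depth=3, beam=2):
    """Branch into several thoughts at each step, keep the best `beam`."""
    frontier = [[root]]
    for _ in range(depth):
        candidates = [p for path in frontier for p in propose(path)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]               # most promising reasoning path

print(tree_of_thought("problem statement"))
```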

Monte Carlo for Reasoning: Adapting Game AI to Logic

ReasoningSearch MonteCarloReasoning ThoughtSearch Convergence
13:45

Researchers merged gameplay reasoning techniques with language understanding by adapting game AI methods like Monte Carlo tree search to explore chains of thought, bringing together decades of game AI progress with recent language model capabilities.
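
A compact sketch of Monte Carlo tree search over partial chains of thought; expand() and rollout_value() are stubs for model calls that propose next steps and judge outcomes:

```python
import math, random

class Node:
    def __init__(self, thought, parent=None):
        self.thought, self.parent = thought, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    """Balance exploiting good thoughts with exploring untried ones."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def expand(node):
    node.children = [Node(node.thought + f" -> step{i}", node) for i in range(3)]

def rollout_value(node):
    return random.random()           # stand-in for model-judged quality

def mcts(root, iterations=50):
    for _ in range(iterations):
        node = root
        while node.children:                     # select by UCB
            node = max(node.children, key=ucb)
        if node.visits > 0 or node is root:      # expand visited leaves
            expand(node)
            node = node.children[0]
        value = rollout_value(node)              # simulate / judge
        while node:                              # backpropagate
            node.visits += 1
            node.value += value
            node = node.parent
    return max(root.children, key=lambda n: n.visits).thought

print(mcts(Node("problem")))
```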

Reinforcement Learning for Reasoning: Step-by-Step Feedback

ReinforcementLearning ReasoningFeedback VerificationRewards Learning
14:30

Following OpenAI’s “Let’s Verify Step by Step” paper, researchers turned to reinforcement learning to strengthen reasoning strategies, drawing a parallel to how humans refine their reasoning through social cues from teachers, such as a nod of understanding or a furrowed brow of confusion.
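
A sketch of per-step (process) feedback, with a stub verifier in place of a learned reward model:

```python
def step_reward(step: str) -> float:
    """Stub verifier: rate a single reasoning step in [0, 1]."""
    return 0.0 if "guess" in step else 1.0

def score_chain(steps: list[str]) -> list[float]:
    """Per-step feedback rather than one reward for the final answer,
    so an RL learner can reinforce each step of the strategy."""
    return [step_reward(s) for s in steps]

chain = ["define x as the ball's price",
         "so 2x + 1.00 = 1.10",
         "guess x = 0.10"]
print(score_chain(chain))   # [1.0, 1.0, 0.0]: only the bad step is penalized
```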

Test-Time Compute: Thinking Longer Improves Performance

TestTimeCompute ScalingLaws Deliberation ComputeEfficiency
15:45

Researchers confirmed the insight that thinking longer improves performance, finding a clear relationship between the computation spent during reasoning, measured in the words or tokens generated during internal deliberation, and the resulting problem-solving accuracy.
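
A toy illustration only, with an assumed (not measured) success curve, to show operationally what "accuracy climbs with thinking budget" means:

```python
import random

def solve(problem, token_budget: int) -> bool:
    """Stub solver whose success chance grows with deliberation length."""
    p_correct = 1 - 0.9 ** (token_budget / 100)   # assumed toy relationship
    return random.random() < p_correct

def accuracy(budget: int, trials: int = 2000) -> float:
    return sum(solve("task", budget) for _ in range(trials)) / trials

for budget in (100, 400, 1600):
    print(budget, round(accuracy(budget), 2))     # accuracy rises with budget
```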