But what is a neural network? | Deep learning chapter 1

3blue1brown
Oct 5, 2017
12 Notes in this Video

MNIST Digit Recognition: Canonical Neural Network Benchmark

MNIST DigitRecognition PatternRecognition MachineLearning ImageRecognition
0:49

The MNIST dataset contains handwritten digits rendered at 28x28 pixel resolution, serving as a classic introductory problem for machine learning. Researchers and students worldwide use this dataset to learn neural network concepts and test algorithms.
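
A quick way to get a feel for the data is to load it and inspect its shape; the sketch below assumes the copy of MNIST bundled with Keras, but any loader that yields 28x28 grayscale arrays behaves the same way.

```python
# Minimal sketch: inspect MNIST's dimensions (assumes TensorFlow/Keras
# is installed; any 28x28 grayscale digit loader looks the same).
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28): 60,000 training images, 28x28 pixels
print(y_train[:5])    # integer labels 0 through 9
```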

Neuron Activation: Numerical Representation in Neural Networks

Neuron Activation NeuralNetwork InformationProcessing
1:20

Artificial neurons serve as the fundamental computational units in neural networks, each holding a single numerical value called its activation. In the MNIST example, 784 input neurons each represent one pixel’s grayscale value.
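
A minimal sketch of that input layer, using a random placeholder image rather than a real digit: the 28x28 grid of pixel brightnesses, flattened and scaled from the usual 0-255 range into 0-1, becomes 784 activations.

```python
import numpy as np

image = np.random.randint(0, 256, size=(28, 28))  # placeholder, not a real digit
activations = image.flatten() / 255.0             # 784 grayscale values in [0, 1]
print(activations.shape)                          # (784,)
```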

Neural Network Architecture: Layered Structure for Pattern Recognition

NeuralNetwork DeepLearning Architecture LayeredStructure
1:39

Neural networks organize computational units called neurons into distinct layers that transform input data into meaningful outputs. The MNIST digit recognition network exemplifies this structure with 784 input neurons, two hidden layers of 16 neurons each, and 10 output neurons.
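
The whole architecture can be sketched as a list of layer sizes; the two 16-neuron hidden layers are, as the video notes, an arbitrary design choice.

```python
layer_sizes = [784, 16, 16, 10]  # input pixels, two hidden layers, output digits

# Each layer transition needs a weight matrix of shape (neurons out, neurons in)
shapes = [(m, n) for n, m in zip(layer_sizes, layer_sizes[1:])]
print(shapes)  # [(16, 784), (16, 16), (10, 16)]
```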

Feed-Forward Process: Sequential Layer-by-Layer Computation

FeedForward NeuralNetwork InformationFlow Computation
2:14

The feed-forward process defines how neural networks transform input data into output predictions by propagating activations sequentially through layers. A trained network performs this process to classify images, recognize speech, or generate predictions.
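
A minimal sketch of the forward pass, assuming sigmoid activations and random untrained weights, so the output is meaningless; it only illustrates how activations flow from layer to layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(a, weights, biases):
    # Each transition: next activations = sigmoid(W a + b)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 16, 16, 10]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

output = feed_forward(rng.random(784), weights, biases)
print(output.shape)  # (10,): one activation per digit class
```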

Hierarchical Feature Learning: Building Abstraction Through Layers

HierarchicalLearning Abstraction FeatureExtraction DeepLearning Representation
2:36

Layered neural networks build progressively abstract representations by combining simple features into complex patterns across successive layers. The MNIST network exemplifies this with layers potentially learning edges, then shapes, then complete digits.

Edge Detection in Neural Networks: Low-Level Feature Recognition

EdgeDetection FeatureExtraction Pattern NeuralNetwork ComputerVision
3:10

Neurons in early hidden layers can learn to detect simple visual features like edges through appropriate weight configurations. The second layer of the MNIST network (its first hidden layer) might contain neurons specialized for recognizing edges at various positions and orientations.
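
A hand-crafted illustration of the idea (these are not learned weights): a neuron with positive weights along one image column and negative weights along the next responds strongly when a vertical edge sits at that position.

```python
import numpy as np

w = np.zeros((28, 28))
w[:, 13] = 1.0   # reward brightness in column 13
w[:, 14] = -1.0  # penalize brightness in column 14

image = np.zeros((28, 28))
image[:, :14] = 1.0  # bright left half, dark right half: a vertical edge

print(np.sum(w * image))  # 28.0, a large weighted sum: the "edge neuron" fires
```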

Weighted Connections: Parameter-Controlled Information Flow

Weight Connection NeuralNetwork Parameter WeightedSum
3:50

Each connection between neurons in adjacent layers carries a weight parameter that determines how strongly one neuron’s activation influences another. A single neuron in the second layer connects to all 784 first-layer neurons, each connection having its own weight.
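
A sketch of a single second-layer neuron's weighted sum over all 784 inputs; random numbers stand in for the learned weights.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(784)           # activations of all 784 input neurons
w = rng.standard_normal(784)  # one weight per incoming connection

weighted_sum = np.dot(w, a)   # w1*a1 + w2*a2 + ... + w784*a784
print(weighted_sum)
```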

Sigmoid Activation Function: Squishing Real Numbers into the Range 0 to 1

ActivationFunction Sigmoid LogisticCurve NeuralNetwork
4:39

The sigmoid function, also called the logistic curve, squishes any real-valued weighted sum into the range between 0 and 1. Early neural networks predominantly used sigmoid activation, though modern architectures favor alternatives.
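
The function itself, sigma(x) = 1 / (1 + e^(-x)), is a one-liner: very negative inputs land near 0 and very positive inputs near 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx [0.00005, 0.5, 0.99995]
```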

Bias in Neural Networks: Activation Threshold Control

Bias NeuralNetwork Threshold ActivationFunction
4:52

Each neuron in the hidden and output layers has a bias parameter that shifts its activation threshold, determining how large the weighted sum must be before the neuron becomes meaningfully active. The bias acts as an additional learnable parameter, independent of the incoming connections.
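
A small numerical sketch: with no bias, a weighted sum of 6 already saturates the sigmoid, while a bias of -10 keeps the same neuron inactive until the sum grows well past 10.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

weighted_sum = 6.0
print(sigmoid(weighted_sum))         # ~0.998: active with no bias
print(sigmoid(weighted_sum - 10.0))  # ~0.018: a bias of -10 keeps it quiet
```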

Network Parameters: Learnable Weights and Biases

Parameter Weight Bias Learning NeuralNetwork Optimization
5:20

Neural networks contain thousands or millions of learnable parameters—weights and biases—that training algorithms adjust to solve specific tasks. The MNIST example network has approximately 13,000 total parameters requiring optimization.
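
The ~13,000 figure follows directly from the layer sizes: each transition contributes (inputs x outputs) weights plus one bias per output neuron.

```python
sizes = [784, 16, 16, 10]

n_weights = sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))
n_biases = sum(sizes[1:])
print(n_weights, n_biases, n_weights + n_biases)  # 12960 42 13002
```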

Matrix-Vector Multiplication: Compact Neural Network Representation

LinearAlgebra MatrixMultiplication NeuralNetwork Mathematics Computation
6:04

Layer transitions in a neural network can be written compactly as a matrix-vector multiplication followed by an elementwise activation function. Machine learning libraries optimize these linear algebra operations for computational efficiency.
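
In this notation a single layer transition is a' = sigma(W a + b); a sketch of one such step, where only the shapes matter:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
a = rng.random(784)                 # previous layer's activations
W = rng.standard_normal((16, 784))  # one row of weights per neuron
b = rng.standard_normal(16)         # one bias per neuron

a_next = sigmoid(W @ a + b)         # the whole layer computed at once
print(a_next.shape)                 # (16,)
```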

ReLU Activation: Rectified Linear Unit in Modern Networks

ReLU ActivationFunction NeuralNetwork DeepLearning
7:57

The Rectified Linear Unit (ReLU) is the dominant activation function in modern deep neural networks, having replaced sigmoid in most architectures. Researchers adopted ReLU after finding that it makes training deep networks significantly easier.
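
ReLU simply zeroes out negative inputs, ReLU(a) = max(0, a), which is cheap to compute:

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```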