The Half-Life of Neurons: Network Pruning as Decay
The Distribution I Observe
When I measured the radiation intensity of pitchblende against pure uranium, the ore emitted far more energy than its uranium content could explain. This anomaly led us to radium—concentrated in trace amounts, yet accounting for nearly all the radioactive signal. The distribution was stark: most of the material was inert; a tiny fraction carried the phenomenon.
I notice the same pattern in trained neural networks. When researchers measure the magnitude of connection weights after learning, they find a heavily skewed distribution: most weights hover near zero, while a small minority achieve large values. This is not a normal bell curve but a heavy-tailed distribution, lognormal in character. Just as the bulk of the pitchblende contributed little to the anomalous radiation while a trace of radium accounted for nearly all of it, most synaptic weights contribute negligibly while a minority encode essential information. The measurements are reproducible across architectures and datasets—the pattern holds.
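One way to take this measurement can be sketched as follows; the small PyTorch model here is only a hypothetical stand-in, since the heavy tail emerges through learning rather than at random initialization. The sketch collects every connection weight and histograms the logarithm of its magnitude:

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in model; in practice load a network already trained to convergence,
# since the heavy-tailed distribution is a product of learning.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect the magnitudes of every connection weight (biases excluded).
magnitudes = torch.cat([
    p.detach().abs().flatten() for p in model.parameters() if p.dim() > 1
]).numpy()

# Histogram of log-magnitudes: a lognormal distribution looks roughly
# Gaussian on this axis, while the raw weights pile up near zero.
log_mags = np.log(magnitudes + 1e-12)
counts, edges = np.histogram(log_mags, bins=50)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"[{lo:6.2f}, {hi:6.2f}) {'#' * int(50 * c / counts.max())}")
```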
Selective Decay
In radioactive decay chains, unstable isotopes transform according to exponential laws: N(t) = N₀e^(-λt). Which specific atom decays at any moment is random, yet the population follows a predictable half-life. The phenomenon is selective—unstable nuclei vanish while stable daughter products persist.
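A small numerical sketch makes the point concrete; the decay constant and initial population below are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

lam = 0.05                     # decay constant λ per time step (arbitrary)
N0 = 100_000                   # initial population of unstable nuclei
half_life = np.log(2) / lam    # t_1/2 = ln 2 / λ ≈ 13.9 steps

# Stochastic simulation: each surviving nucleus decays independently with
# probability 1 - exp(-λ) per step; which nucleus decays is pure chance.
survivors = N0
for t in range(1, 61):
    decays = rng.binomial(survivors, 1 - np.exp(-lam))
    survivors -= decays
    if t % 10 == 0:
        predicted = N0 * np.exp(-lam * t)   # deterministic law N(t) = N0·e^(-λt)
        print(f"t={t:2d}  simulated={survivors:6d}  predicted={predicted:8.0f}")
```

No individual decay can be foretold, yet the simulated population tracks the exponential law closely.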
Network pruning follows parallel logic. After training, researchers systematically remove connections with the smallest magnitudes—often 90% or more of the total—then briefly retrain to allow remaining weights to compensate. Performance remains nearly intact. The procedure is not arbitrary deletion but selective decay: weak connections, like unstable isotopes, are expendable; strong connections, like stable nuclei, carry the system’s essential function. Experiments in biological memory show similar principles: only 10-20% of neurons in the amygdala, or 2-6% in the dentate gyrus, are recruited into a given memory trace. Sparse engrams emerge through competitive allocation, with intrinsic excitability determining which neurons survive the selection. Most candidates decay from consideration; a minority crystallizes into the engram.
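In PyTorch, global magnitude pruning of this kind can be sketched as below; the model is again a hypothetical stand-in, and the brief retraining pass is indicated only in a comment:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in model; in practice this would be a network trained to convergence
# before the smallest-magnitude connections are removed.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Global magnitude pruning: zero out the 90% of weights with the smallest
# absolute values, ranked across all layers at once.
to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(to_prune, pruning_method=prune.L1Unstructured, amount=0.9)

# Report the resulting sparsity per layer.
for module, _ in to_prune:
    zeros = float((module.weight == 0).sum())
    total = module.weight.numel()
    print(f"{module}: {zeros / total:.1%} of weights pruned")

# A brief fine-tuning pass would normally follow, letting the surviving
# weights compensate; the pruning masks keep the removed weights at zero.
```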
The sparsity itself appears governed by multiplicative dynamics. Dendritic spine sizes fluctuate proportionally to their current magnitude—larger spines grow or shrink by larger absolute amounts. This “rich get richer or poorer” rule naturally produces lognormal weight distributions, concentrating function in a minority of connections just as radioactive series concentrate decay products in final stable nuclei.
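A minimal simulation of such multiplicative dynamics, with an arbitrary noise scale, shows the lognormal shape emerging from nothing more than proportional fluctuation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Start from a narrow, roughly uniform set of synaptic strengths.
w = rng.uniform(0.5, 1.5, size=100_000)

# Multiplicative fluctuations: each step, every weight grows or shrinks by a
# small percentage of its current size ("rich get richer or poorer").
for _ in range(500):
    w *= np.exp(rng.normal(loc=0.0, scale=0.05, size=w.size))

# Sums of random log-increments are approximately Gaussian, so the weights
# themselves become lognormal: symmetric on a log axis, heavily skewed raw.
log_w = np.log(w)
print(f"mean(log w) = {log_w.mean():.2f}, std(log w) = {log_w.std():.2f}")
print(f"median weight = {np.median(w):.3f}, mean weight = {w.mean():.3f}")
top = np.sort(w)[-(w.size // 100):]
print(f"top 1% of synapses hold {top.sum() / w.sum():.1%} of total strength")
```

A small fraction of connections ends up carrying a disproportionate share of the total strength, with no selection imposed from outside.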
Concentration of Function
What emerges from both decay and pruning is a principle: natural systems overgenerate components, then selectively eliminate the unstable or weak, concentrating essential activity in a small fraction. In radium isolation, we processed tons of pitchblende to extract decigrams of pure element—the essential substance was always there, hidden within vast dilution. In neural networks, training begins with random, overparameterized connectivity; learning identifies which weights matter, allowing the rest to decay toward zero. Pruning reveals this structure, removing the noise to expose the signal.
The brain appears to follow the same trajectory during development and learning: synapses proliferate early, then undergo selective pruning. Multiplicative plasticity sculpts the weight distribution continuously, ensuring that a logarithmic, heavy-tailed structure persists. Whether in atomic nuclei, biological synapses, or artificial networks, the pattern recurs—function concentrates in a minority of components while the majority serves as transient scaffolding.
Measured Conclusion
The parallel between radioactive decay and network pruning is not metaphorical but structural. Both systems exhibit heavy-tailed distributions of stability or magnitude; both rely on selective elimination to isolate essential elements; both concentrate function in a small fraction of the whole. The measurements demonstrate that this pattern transcends substrate—appearing in atomic physics, neuroscience, and machine learning alike. As with radium, the challenge is not to create the essential structure but to systematically remove what obscures it. Pruning, like decay, is a process of revelation.