Why Everything Rolls Downhill: Energy, Entropy, and Inference

Here’s a puzzle that kept me up at night: Why does a ball roll downhill?

You might say “gravity,” and sure, that’s correct. But let me ask it differently: Why does the ball stop at the bottom? Why doesn’t it just keep rolling forever, or bounce back up to where it started?

The answer is that the ball is minimizing its potential energy. It settles into the lowest energy state available to it. And here’s what’s really remarkable—this same principle shows up everywhere. Your brain trying to make sense of a blurry image? It’s minimizing an energy function. A neural network learning to recognize patterns? Energy minimization. A black hole radiating away into space? Same thing. Even the arrow of time itself comes down to this.

At first, this sounds like mystical nonsense—as if I’m claiming “energy” is some universal force that explains everything. But it’s not mystical at all. It’s just physics being lazy in different disguises. Let me show you how the same mathematical pattern keeps reappearing, from rolling balls to thinking brains to evaporating black holes.

Everything Wants to Roll Downhill

Start with the simplest case: physical systems minimize energy. A ball on a hill doesn’t need instructions to roll down—it just does. The lower energy state is where it naturally ends up.

But why? The answer comes from statistical mechanics and a beautiful result called the Boltzmann distribution. Imagine particles randomly hopping between energy levels through thermal collisions. The probability of jumping up in energy by an amount $\Delta E$ falls off multiplicatively with each extra increment of energy, like stringing together unlikely coin flips, which gives you something proportional to $\exp(-\Delta E / T)$, where $T$ is the temperature. Lower energy states become exponentially more probable than higher energy ones.

Here’s the key insight: if you want to know the absolute probability of finding a system in some state $s$ with energy $E_s$, you need to normalize by summing over all possible states. This gives you the Boltzmann distribution:

$$p(s) = \frac{\exp(-E_s/T)}{Z}$$

where $Z = \sum_s \exp(-E_s/T)$ is called the partition function. This isn’t just some abstract formula—it’s nature’s way of saying that systems overwhelmingly prefer low-energy configurations. The ball rolls downhill not because it’s “trying” to minimize energy, but because there are exponentially more microscopic arrangements (microstates) that correspond to it being at the bottom than at the top.
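To see the formula in action, here’s a minimal numerical sketch (plain Python; the three energy levels are made-up values for illustration) that converts energies into Boltzmann probabilities and shows how temperature controls how sharply probability piles up on the lowest state:

```python
import math

def boltzmann(energies, T):
    """Return normalized Boltzmann probabilities p(s) = exp(-E_s/T) / Z."""
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)                      # partition function
    return [w / Z for w in weights]

# Hypothetical energy levels (arbitrary units): a valley, a ledge, a hilltop.
energies = [0.0, 1.0, 3.0]

for T in (0.2, 1.0, 5.0):
    probs = boltzmann(energies, T)
    print(f"T={T}: " + ", ".join(f"{p:.3f}" for p in probs))

# At low T nearly all the probability sits on the lowest-energy state;
# at high T the distribution flattens and the system explores more freely.
```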

This is where entropy enters the picture. Entropy isn’t really about “disorder”—that’s a terrible analogy that confuses everyone. Entropy is about counting configurations. When we look at a gas in a box, we can’t tell the difference between billions of different microscopic arrangements of the molecules. They all look the same to us. Entropy measures how many indistinguishable microstates correspond to what we observe macroscopically.

And here’s why systems evolve toward high entropy: it’s pure statistics. If there are a trillion ways for the gas to spread out evenly through the box, but only a handful of ways for it to cluster in one corner, then random molecular motion will naturally explore the vast space of high-entropy states. The gas isn’t “seeking” disorder—it’s just that most of the available microscopic configurations happen to look disordered to our coarse-grained observations.
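Here’s the counting argument as a toy calculation, assuming an idealized box split into two halves with $N$ labelled molecules: the number of microstates with $k$ molecules on the left is the binomial coefficient $\binom{N}{k}$, and the even split overwhelms the clustered arrangements.

```python
from math import comb

N = 100  # number of gas molecules in the box (toy value)

# Number of microstates with exactly k molecules in the left half.
ways_all_left   = comb(N, 0)        # every molecule crammed into one half
ways_even_split = comb(N, N // 2)   # molecules spread evenly

print(f"microstates, all on one side : {ways_all_left}")
print(f"microstates, 50/50 split     : {ways_even_split:.3e}")
print(f"ratio (even / clustered)     : {ways_even_split / ways_all_left:.3e}")

# For N = 100 the even split is favored by a factor of roughly 10^29; random
# motion therefore spends essentially all its time in spread-out, high-entropy states.
```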

The Same Trick in Different Disguises

Now let’s see how this exact same principle shows up in completely different contexts.

Take your brain. When you look at something blurry and your visual system tries to figure out what it is, what’s really happening? According to the free energy principle, your brain is acting like a prediction machine that minimizes variational free energy. This quantity measures the mismatch between what your brain expects to see and what your senses are actually telling you.

Free energy has two components: how surprising the sensory input is given your model of the world, and how complex your model is. Your brain is constantly trying to reduce this free energy by either updating its internal predictions to better match the data, or by changing what you pay attention to in order to make the data match your predictions. Learning, perception, and even action all emerge from this single principle of minimizing surprise and uncertainty.
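Written out in the usual variational form (with $q(z)$ standing for the brain’s current beliefs about the hidden causes $z$, and $p$ for its generative model of the sensory data $x$), those two components are a complexity term and an accuracy term:

$$F \;=\; \underbrace{\mathrm{KL}\!\left[\,q(z)\,\Vert\,p(z)\,\right]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(z)}\!\left[\log p(x \mid z)\right]}_{\text{accuracy}}$$

Minimizing $F$ means explaining the data as accurately as possible while moving your beliefs as little as necessary from their priors.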

But wait—this sounds like we’ve left physics behind and entered psychology. Not at all! The “free energy” here is mathematically related to the Helmholtz free energy from thermodynamics. In both cases, you’re dealing with the balance between energy (or surprisal) and entropy (or uncertainty). The brain’s free energy is essentially cross-entropy—the average surprise you experience when your model doesn’t match reality.

Cross-entropy captures exactly this: if data are generated by some true process $P$ but you interpret them using a wrong model $Q$, then the cross-entropy $H(P,Q) = -\sum_x p(x) \log q(x)$ tells you how costly your mismatch is on average. Think of it this way: if you believe a coin is fair when it’s actually biased 90-10, you’ll be shocked constantly by the outcomes. That shock—that surprise—is quantified by cross-entropy. And your brain is constantly working to minimize it.
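A quick sketch of that coin example in plain Python (using the 90-10 bias from the paragraph above): the cross-entropy under the mistaken “fair coin” model exceeds the entropy of the true process, and the gap is exactly the average extra surprise you pay for the mismatch.

```python
import math

def cross_entropy(p, q):
    """H(P,Q) = -sum_x p(x) * log q(x): average surprise under model q
    when outcomes are really drawn from p (natural log, so units are nats)."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q))

p_true = [0.9, 0.1]   # the coin is actually biased 90-10
q_fair = [0.5, 0.5]   # but you model it as fair

h_true  = cross_entropy(p_true, p_true)   # entropy of the true process
h_cross = cross_entropy(p_true, q_fair)   # your average surprise with the wrong model

print(f"entropy H(P)         = {h_true:.3f} nats")
print(f"cross-entropy H(P,Q) = {h_cross:.3f} nats")
print(f"extra surprise       = {h_cross - h_true:.3f} nats  (cost of the mismatch)")
```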

The connection to physics runs even deeper. Neural networks that store memories, like Hopfield networks, explicitly define an energy function over patterns of neural activity. Stored memories correspond to low-energy valleys in this landscape. When you give the network a partial or noisy cue—maybe you hear the first few notes of a song—the network’s dynamics naturally flow downhill toward the nearest memory. Retrieval isn’t a search through a database; it’s literally rolling downhill in an energy landscape. The mathematics is identical to the ball on the hill, just in a much higher-dimensional space.
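Here’s a minimal sketch of that retrieval-as-descent idea (NumPy; the two stored patterns and the corrupted cue are made up for illustration): Hebbian weights define the energy landscape, and a few asynchronous updates slide the noisy cue downhill into the nearest stored memory.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(W, s):
    """Hopfield energy E(s) = -1/2 * s^T W s; stored memories sit in its valleys."""
    return -0.5 * s @ W @ s

# Two made-up binary (+1/-1) patterns, stored with the Hebbian rule.
patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1,  1,  1, -1, -1, -1, -1],
])
n = patterns.shape[1]
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0.0)           # no self-connections

# Corrupt the first memory by flipping two units, then let the dynamics roll downhill.
s = patterns[0].astype(float)
s[[1, 2]] *= -1
print("cue    :", s.astype(int), " E =", round(energy(W, s), 3))

for _ in range(5):                 # a few asynchronous update sweeps
    for i in rng.permutation(n):
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0

print("settled:", s.astype(int), " E =", round(energy(W, s), 3))
print("recovered first memory:", np.array_equal(s.astype(int), patterns[0]))
```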

The Boltzmann distribution appears here too. If you add noise to the network, states with lower energy become exponentially more probable, exactly like in statistical mechanics. The partition function normalizes everything so the probabilities sum to one. What we call “temperature” in the neural network controls how sharply probability concentrates around the minima—low temperature means the system settles deterministically into the deepest valley, while high temperature lets it explore more randomly.

Why Black Holes Evaporate and Time Moves Forward

Now here’s where things get truly wild: even black holes obey these principles.

Black holes aren’t just exotic objects in space—they’re thermodynamic systems with temperature and entropy. Hawking showed that black holes emit thermal radiation with a perfect black-body spectrum. Small black holes are hot and radiate energetically; large ones are cold. What’s bizarre is that as a black hole radiates energy and loses mass, it gets hotter, causing it to evaporate faster in a runaway process. That is the opposite of ordinary objects, which cool down as they lose energy.

Why does this happen? Because black hole entropy is proportional to the area of its event horizon. When the black hole radiates, it loses mass and shrinks, which reduces both its horizon area and its entropy. But the second law of thermodynamics demands that total entropy increases. The radiation carries away more entropy than the hole loses, so the overall process increases entropy in the universe—just like everything else.
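Two standard formulas make this concrete: the Bekenstein–Hawking entropy, proportional to the horizon area $A$, and the Hawking temperature, inversely proportional to the mass $M$ (which is why losing mass makes the hole hotter):

$$S_{\mathrm{BH}} = \frac{k_B c^3 A}{4 G \hbar}, \qquad T_H = \frac{\hbar c^3}{8 \pi G M k_B}$$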

Here’s the deeper point: black hole thermodynamics isn’t a curiosity. It’s evidence that entropy and energy minimization are universal principles that transcend specific physical systems. Whether we’re talking about gas molecules, neural networks, or spacetime curvature near horizons, the same mathematical structure keeps appearing.

And what about the arrow of time itself? Why does time seem to flow forward, from past to future, when the fundamental laws of physics are time-symmetric? The answer is entropy. The second law of thermodynamics—entropy tends to increase—gives us the arrow. Low-entropy states (like an intact ice cube) evolve toward high-entropy states (like a puddle of water) simply because there are overwhelmingly more high-entropy configurations for the system to explore. Time’s arrow isn’t built into the laws of motion; it’s a statistical consequence of starting from a low-entropy initial condition.

The Mathematical Unification

There’s one more layer to this. Physicists have a tool called the Legendre transform that lets us switch between different “perspectives” on the same energy. In thermodynamics, you start with internal energy $U(S,V,N)$ that depends on entropy $S$, volume $V$, and particle number $N$. By subtracting off $TS$, you get the Helmholtz free energy $F(T,V,N)$, which now depends on temperature instead of entropy. Add $PV$ instead, and you get the enthalpy. Do both, subtracting $TS$ and adding $PV$, and you get the Gibbs free energy.
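Written out with the standard definitions, the four potentials differ only by which of the products $TS$ and $PV$ you trade away:

$$F = U - TS, \qquad H = U + PV, \qquad G = U - TS + PV$$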

These aren’t different physical quantities—they’re the same thermodynamics viewed through different lenses. The choice of which potential to use depends on what you can control experimentally. If you’re working at constant temperature and pressure (like most chemistry), you minimize Gibbs free energy. If you’re at constant temperature and volume, you minimize Helmholtz free energy. The physics doesn’t care which one you use; they all encode the same information in different coordinate systems.

This same mathematical structure—swapping variables via Legendre transforms—shows up in classical mechanics when you go from the Lagrangian to the Hamiltonian, and it shows up in the brain’s free energy when you switch between different generative models. It’s a general tool for changing perspective on the same underlying dynamics.
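The mechanics version is the familiar one: the Hamiltonian is the Legendre transform of the Lagrangian, trading the velocity $\dot{q}$ for the momentum $p = \partial L / \partial \dot{q}$:

$$H(q, p) = p\,\dot{q} - L(q, \dot{q})$$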

Why does this matter? Because it shows that the proliferation of different “energies” and “free energies” across physics, neuroscience, and machine learning isn’t a sign of confusion. They’re all variations on the same theme: systems minimize some quantity that balances energy against entropy, prediction against uncertainty, order against exploration.

Nature’s Fundamental Laziness

So what’s really going on here? Why does this energy-minimization principle unify so much?

I think the answer is that nature is fundamentally lazy. Not lazy in a bad way—lazy in the sense that systems naturally settle into the states that are easiest to maintain, given the constraints and the available configurations. This isn’t teleology; systems aren’t “trying” to minimize energy. It’s just that the low-energy, high-entropy states are where the statistics point.

Energy, information, and surprise are all different ways of measuring the same thing: how unlikely or how costly a particular configuration is. The Boltzmann distribution tells us that low-energy states are more probable. Entropy tells us that high-configuration-count states are more probable. Cross-entropy tells us that low-surprise models are better. These are three lenses on the same statistical truth.

Negative feedback—the kind of dynamics that drive systems toward target states—emerges from this naturally. A thermostat doesn’t “want” your house at 68 degrees; it just corrects deviations because the control system is designed to minimize the error signal. Your brain doesn’t “want” to predict correctly; it just minimizes free energy, and correct predictions happen to have lower free energy than incorrect ones. The ball doesn’t “want” to be at the bottom of the hill; it just has nowhere else to go that’s more probable.

Even positive feedback—the kind that amplifies deviations rather than correcting them—fits into this picture. In physics, positive feedback often corresponds to unstable equilibria: the ball on top of a hill rather than at the bottom. In brains, positive feedback can drive learning or runaway anxiety spirals. In black holes, it’s the accelerating evaporation as they shrink. But in all cases, the system is still flowing through the landscape defined by energy or free energy; it’s just flowing away from a peak rather than toward a valley.

The unification isn’t mystical. It’s not that “everything is energy” in some vague New Age sense. It’s that statistical mechanics—the study of how probability flows through configuration space—underlies an enormous range of phenomena. Once you see this pattern, you can’t unsee it. From balls rolling downhill to brains making inferences to black holes evaporating into thermal radiation, the same mathematics is doing the work.

The first principle is not to fool yourself—and you are the easiest person to fool. So don’t let the different names confuse you. Free energy, variational energy, Helmholtz energy, cross-entropy—they’re all telling you the same story about how systems explore the space of possible states and settle where probability density is highest. That’s why everything rolls downhill. It’s not magic. It’s just statistics, dressed up in different clothes.
