Information Is Physical

Here’s a puzzle that bothered me for years: Claude Shannon, working at Bell Labs in the 1940s, defined something called “information entropy” to measure how much data you could squeeze through a telephone wire. Ludwig Boltzmann, working on the kinetic theory of gases in the 1870s, gave thermodynamic entropy its statistical meaning as a measure of molecular disorder. They used almost the same mathematical formula: a sum over probabilities times logarithms.

Coincidence? Most people think so. “Shannon just borrowed Boltzmann’s math,” they say. “It’s a metaphor.”

They’re wrong. It’s not a metaphor. Information and thermodynamic entropy aren’t just mathematically similar—they’re the same thing. Bits occupy physical degrees of freedom. Erasing information generates heat. This isn’t philosophy; it’s measurable physics. Let me show you why.

Two Entropies, One Formula

Start with the thermodynamic side. What is entropy really counting? Not “disorder”—that’s too vague. Boltzmann figured out the precise answer: entropy counts configurations. How many different microscopic arrangements of atoms produce the same macroscopic appearance?

Take an ice cube melting into water. The ice has low entropy because there are relatively few ways to arrange water molecules that look like solid ice—they need to be locked into a crystal lattice. But liquid water has high entropy because there are vastly more arrangements that look like water sloshing around in a glass. The molecules can be almost anywhere, moving almost any direction, and it still looks like “water” to us.

Why does ice melt? Not because nature “prefers” disorder. It melts because random thermal fluctuations constantly jostle the molecules, exploring different configurations. Given enough time, the system naturally drifts toward states with more available configurations, simply because there are more of them to drift into. High-entropy states aren’t more “natural”—they’re just overwhelmingly more probable.

This is the key insight: entropy increase is statistics, not mysticism. Imagine an image where each pixel randomly fluctuates. Over time, the image becomes grey and homogeneous, not because grey is “better,” but because random images are almost always homogeneous. There are astronomically many ways to arrange pixels into grey mush, but only a handful of arrangements that form a recognizable apple. Random fluctuations explore all possibilities equally, so you end up in the vast grey sea.
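
To see how lopsided the counting gets, here’s a toy version in Python (my own illustration, not from the original): sixteen black-or-white pixels, where “grey mush” just means roughly half of them are dark.

```python
import math, random

# Sixteen binary pixels: a tiny stand-in for the fluctuating image.
N = 16
total = 2 ** N                      # every possible image
mush = math.comb(N, N // 2)         # images with exactly half the pixels dark
print(total, "possible images;", mush, "of them are half-and-half mush")
print("any one specific picture is just 1 arrangement out of", total)

# Start from a perfectly ordered image and let thermal 'kicks' randomize pixels.
random.seed(0)
img = [1] * N                       # all pixels dark: highly ordered
for step in range(500):
    img[random.randrange(N)] = random.randint(0, 1)   # one random jostle
print("dark pixels after fluctuating:", sum(img), "of", N)   # drifts toward ~8
```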

Now jump to Shannon’s world. He was thinking about communication: how much information does a message contain? His answer was brilliant: information measures surprise. If I tell you something you already knew with certainty, I’ve given you zero information. If I tell you something completely unexpected, I’ve given you maximum information.

Shannon defined surprisal as the negative logarithm of probability. A rare event (low probability) produces high surprisal. A certain event (probability one) produces zero surprisal. Why logarithms? Because surprisals add when probabilities multiply—when two independent things happen, your total surprise is the sum of the individual surprises. The log makes this work mathematically.
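
A few lines of Python make the additivity concrete (my sketch, measuring surprisal in bits, i.e. using log base 2):

```python
import math

def surprisal(p):
    """Surprisal in bits: the negative log (base 2) of an event's probability."""
    return -math.log2(p)

# Two independent events: a fair coin comes up heads, a fair die rolls a six.
p_heads, p_six = 1 / 2, 1 / 6
print(surprisal(p_heads))          # 1.0 bit
print(surprisal(p_six))            # ~2.585 bits
print(surprisal(p_heads * p_six))  # ~3.585 bits: probabilities multiplied, surprisals added
print(surprisal(1.0))              # 0.0 bits: a certain event carries no surprise
```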

Entropy, in Shannon’s framework, is average surprisal. If you have a probability distribution over possible messages, entropy tells you: on average, how surprised will you be when you receive a message? A fair coin flip has high entropy—both outcomes are equally surprising. A loaded coin that always lands heads has low entropy—you’re never surprised.
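
And entropy as average surprisal, for the coins just mentioned (again my sketch, in bits):

```python
import math

def entropy(dist):
    """Shannon entropy in bits: average surprisal over a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))     # fair coin: 1.0 bit, maximally unpredictable
print(entropy([0.99, 0.01]))   # heavily loaded coin: ~0.08 bits, rarely surprising
print(entropy([1.0, 0.0]))     # always heads: 0.0 bits, never surprising
```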

Look at the formulas:

Boltzmann: S = k log W (entropy equals Boltzmann’s constant times the logarithm of the number of possible microstates)

Shannon: H = -Σ p(x) log p(x) (entropy equals the negative sum, over outcomes, of probability times log probability, i.e. the average surprisal)

They’re the same structure. Boltzmann counts configurations; Shannon measures surprise. But counting configurations is measuring surprise—surprise at finding the system in one particular microstate out of all the possibilities.
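
You can check the identity numerically. In this sketch (mine, with Boltzmann’s constant in SI units), a uniform distribution over W microstates gives Shannon entropy log W, and multiplying by k ln 2 converts bits into Boltzmann’s joules per kelvin:

```python
import math

k_B = 1.380649e-23                 # Boltzmann's constant, J/K

W = 1024                           # number of equally likely microstates
uniform = [1 / W] * W

H = -sum(p * math.log2(p) for p in uniform)   # Shannon entropy, in bits
S = k_B * math.log(W)                         # Boltzmann entropy, in J/K

print(H)                       # 10.0 bits, i.e. log2(1024)
print(S)                       # ~9.57e-23 J/K
print(k_B * math.log(2) * H)   # identical: same quantity, different units
```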

Information Is Physical

Here’s where it gets wild. In 1961, Rolf Landauer figured out something profound: erasing information generates heat. This isn’t a side effect or engineering limitation—it’s fundamental thermodynamics.

Think about a computer bit. Whether it stores 0 or 1, that’s physical information encoded in some physical state: voltages in transistors, magnetic domains on a disk, whatever. The bit constrains the configuration. If the bit is definitely 1, that rules out all the configurations where it would be 0.

When you erase that bit, you reset it to a standard value (say 0) no matter what it held, and you destroy the record of which state it was in. Two possible histories, “it was 0” and “it was 1,” get squeezed into a single present state. The distinction you lost doesn’t simply vanish; the information about which configuration you had has to go somewhere.

The second law of thermodynamics says total entropy never decreases. Squeezing the bit’s two possible states down to one lowers the entropy of the bit itself, so the balance has to show up somewhere else: the erased entropy gets dumped into the bit’s environment. That means generating heat.

Landauer calculated the minimum heat: at least kT ln 2 per bit erased, where k is Boltzmann’s constant, T is temperature, and ln 2 comes from halving the number of configurations. At room temperature, that’s about 3 × 10⁻²¹ joules per bit—tiny, but measurable. Experiments have confirmed this. Information truly is physical.
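
The arithmetic is one line; a sketch (mine) with standard constants:

```python
import math

k_B = 1.380649e-23          # Boltzmann's constant, J/K
T = 300.0                   # roughly room temperature, K

per_bit = k_B * T * math.log(2)            # Landauer's minimum heat per erased bit
print(f"{per_bit:.2e} J per bit")          # ~2.87e-21 J, the ~3e-21 figure above

bits_in_a_gigabyte = 8e9
print(f"{bits_in_a_gigabyte * per_bit:.2e} J to erase a gigabyte")   # ~2.3e-11 J
```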

This changes everything. You can’t think of information as some abstract thing floating above physical reality. Every bit of information corresponds to physical configurations being constrained. Learning something collapses possibilities—you eliminate configurations from consideration. Forgetting something (or erasing data) releases those constraints and increases physical entropy.

Black holes make this concrete. Jacob Bekenstein and Stephen Hawking showed that black holes have entropy proportional to their surface area, not their volume. A black hole’s entropy is enormous—a stellar-mass black hole has more entropy than all the gas in a galaxy. Where does this entropy come from?

String theory provides an answer by counting configurations. Just as Boltzmann counted atomic arrangements in a gas, string theorists count the number of string configurations that correspond to a given black hole. For certain special, highly symmetric black holes, counting these microscopic string states reproduces exactly the Bekenstein-Hawking formula for black hole entropy. The entropy is literally counting how many ways you can arrange strings to make that black hole.
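
For a sense of scale, the Bekenstein-Hawking formula S = k A / (4 l_P²) can be evaluated directly. A rough sketch (mine, plugging in standard constants) for a one-solar-mass black hole:

```python
import math

# Physical constants (SI) and a solar mass
G, c, hbar, M_sun = 6.674e-11, 2.998e8, 1.055e-34, 1.989e30

r_s = 2 * G * M_sun / c**2           # Schwarzschild radius
A = 4 * math.pi * r_s**2             # horizon area
l_P_sq = hbar * G / c**3             # Planck length squared

S_over_k = A / (4 * l_P_sq)          # Bekenstein-Hawking entropy, in units of k
print(f"horizon radius ~ {r_s / 1000:.1f} km")   # ~3 km
print(f"entropy ~ {S_over_k:.1e} k")             # ~1e77
```

That figure of roughly 10⁷⁷ k is what sits behind the “more entropy than all the gas in a galaxy” comparison above: the thermal entropy of a galaxy’s ordinary matter comes in many orders of magnitude lower.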

But here’s the deeper point: the black hole’s information is encoded on its surface—the event horizon. This suggests something radical called the holographic principle: maybe all the information about what’s inside a volume of space is encoded on its boundary. The universe might be holographic, with 3D reality emerging from 2D information.

This is information as physical as it gets. Black holes don’t just metaphorically “process information.” They literally do. Their entropy counts real physical configurations. Their surface area limits how much information they can store. Throw something into a black hole and the specific information about what you threw in seems, from the outside, to be erased; most physicists now believe it isn’t truly lost, just scrambled beyond recognition among the black hole’s many possible quantum states.

Why Forgetting Generates Heat

Let’s connect this back to everyday computing. Most of the heat your laptop produces today comes from plain electrical losses in its transistors, engineering overhead far above any fundamental limit. But underneath that overhead sits a floor set by physics: computation, as we actually practice it, involves erasing information, and erasure carries a mandatory thermodynamic price.

Here’s the surprising part: computation itself isn’t fundamentally dissipative. Charles Bennett proved you can compute without erasing by making operations reversible—if you keep track of every intermediate step, you can run the computation backward and recover exactly what you started with. Reversible computation can, in principle, use arbitrarily little energy.
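
A reversible gate is easy to sketch. The Toffoli (controlled-controlled-NOT) gate, a standard building block in this literature, computes an AND without discarding its inputs; a minimal Python illustration (mine):

```python
def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c only when a and b are both 1. Reversible."""
    return a, b, c ^ (a & b)

# AND without erasure: feed c = 0 and the answer appears in the third output,
# while both inputs ride along untouched.
for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", toffoli(a, b, 0))

# Reversibility: apply the gate twice and the original bits come back.
assert toffoli(*toffoli(1, 1, 0)) == (1, 1, 0)
```

Nothing is thrown away, so no Landauer cost is forced; the price is hauling the extra “garbage” bits along until the end of the computation.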

But practical computers don’t do this. We constantly erase temporary variables, overwrite memory, discard intermediate results. Every time we erase a bit, we pay at least Landauer’s price: kT ln 2 of heat. A modern CPU, with billions of transistors switching billions of times per second, discards staggering numbers of bits, and every one of those erasures generates heat. Real chips dissipate far more than this minimum because of their electrical overhead, but the minimum never goes away: even a thermodynamically perfect processor that erases bits would need cooling.
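
How big is that mandatory floor compared with what a real chip dissipates? A back-of-envelope sketch (mine; the erasure rate and the 65 W figure are assumed, illustrative numbers, not measurements of any particular CPU):

```python
import math

k_B, T = 1.380649e-23, 300.0
per_bit = k_B * T * math.log(2)          # Landauer floor per erased bit, ~2.9e-21 J

erasures_per_second = 1e18               # assumed: billions of gates toggling at GHz rates
landauer_power = erasures_per_second * per_bit
print(f"Landauer floor: {landauer_power * 1000:.1f} mW")   # a few milliwatts

typical_cpu_power = 65.0                 # assumed: a common desktop power budget, in watts
print(f"real chip / floor ~ {typical_cpu_power / landauer_power:,.0f}x")
```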

This is what people miss when they talk about information. They think information is abstract, something separate from the physical world. “Just data,” they say. But data is physical states. Bits are configurations. Information is literally about how many ways you can arrange atoms (or strings, or fields, or whatever) to encode what you know.

Consider what it means to “know” something. Before you measure, a system could be in many possible states. Entropy is high because you’re uncertain—many configurations are consistent with what you know. After you measure, you know more specifically which configuration you have. Entropy is lower because fewer configurations match your knowledge. But that information you gained is now encoded somewhere physically—in your brain, your notebook, your computer. You’ve constrained physical configurations.

When you forget or erase that knowledge, you’re releasing those constraints. The physical system that stored the information can now be in more possible states again. Entropy increases. And that increase must be compensated by heat dissipation to satisfy thermodynamics.

Cybernetics takes this even further. Feedback systems—from thermostats to nervous systems—work by using information to control physical processes. A thermostat measures temperature (acquires information), compares it to a setpoint (processes information), and adjusts heating (uses information to constrain the system’s evolution). The information flow is just as real as the heat flow. You can’t understand how the system behaves without tracking both.
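
A thermostat’s whole loop fits in a few lines. This is a cartoon simulation (mine, with made-up numbers for the room and heater), but the measure-compare-act structure is the real point:

```python
import random

setpoint, temp = 20.0, 15.0        # target and starting room temperature, deg C
heater_on = False
random.seed(1)

for minute in range(60):
    reading = temp + random.uniform(-0.2, 0.2)   # acquire information (noisy sensor)
    if reading < setpoint - 0.5:                 # process it: on/off with hysteresis
        heater_on = True
    elif reading > setpoint + 0.5:
        heater_on = False
    temp += 0.6 if heater_on else 0.0            # act: the decision feeds back into the physics
    temp -= 0.05 * (temp - 10.0)                 # heat leaking to a 10 C outside

print(f"after an hour: {temp:.1f} C, heater {'on' if heater_on else 'off'}")
```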

This is why Norbert Wiener, the founder of cybernetics, said the same mathematical framework describes “communication between man and machines, man and man, and between machine and machine.” Information isn’t limited to human knowledge or computer data. It’s a physical quantity that flows through any system, constraining how configurations evolve, and it’s subject to the same fundamental laws as energy and momentum.

The Unity Beneath

So what have we learned? Shannon entropy and Boltzmann entropy aren’t analogies. They’re identical concepts applied at different scales or in different contexts. Both count configurations. Both measure how many microscopic possibilities correspond to what you know macroscopically. Both increase for the same reason: random processes explore available configurations, and there are more ways to be disordered than ordered.

Information theory isn’t “like” thermodynamics. It is thermodynamics. Every time you measure something, you decrease entropy by constraining possibilities. Every time you erase data, you increase entropy by releasing constraints. Every bit you store constrains physical configurations. Every computation you run trades energy for information processing.

This unity goes deep. It’s not just that the math looks similar. It’s that information and physical entropy are two aspects of the same underlying reality. The universe doesn’t distinguish between “information” and “physical states”—those are human categories. At the fundamental level, there are just configurations and constraints. Entropy counts them. Information tracks them.

You can see this in quantum mechanics. The Heisenberg uncertainty principle says you can’t know both position and momentum precisely. Why? Because knowing position more precisely means the particle’s wave function is localized, which means it has many possible momentum components. Information about position and information about momentum are complementary—gaining one means losing the other. The total information is bounded, just like the total entropy.
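
This tradeoff is just a property of waves, and you can watch it numerically. A NumPy sketch (mine, with ħ set to 1 and arbitrary grid sizes): squeeze a Gaussian wave packet in position and its momentum-space width grows to compensate, keeping the product pinned at the Heisenberg minimum of 1/2.

```python
import numpy as np

N, L = 2**14, 200.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)          # momentum grid (hbar = 1)
dk = 2 * np.pi / (N * dx)

def spreads(sigma):
    psi = np.exp(-x**2 / (4 * sigma**2))                       # Gaussian wave packet
    prob_x = np.abs(psi)**2 / (np.sum(np.abs(psi)**2) * dx)
    sx = np.sqrt(np.sum(x**2 * prob_x) * dx)                   # position spread
    phi = np.fft.fft(psi)                                      # momentum-space amplitude
    prob_k = np.abs(phi)**2 / (np.sum(np.abs(phi)**2) * dk)
    sk = np.sqrt(np.sum(k**2 * prob_k) * dk)                   # momentum spread
    return sx, sk

for sigma in (0.5, 1.0, 2.0):
    sx, sk = spreads(sigma)
    print(f"sigma_x = {sx:.3f}, sigma_p = {sk:.3f}, product = {sx * sk:.3f}")  # ~0.500
```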

This is why “information is physical” isn’t just a slogan—it’s a profound truth about reality. There’s no such thing as pure, disembodied information. Information always lives somewhere, encoded in something, constraining some physical degrees of freedom. And wherever information lives, it obeys thermodynamics.

When you learn something new, you’re not just acquiring abstract knowledge. You’re physically reconfiguring your brain—neurons firing, synapses strengthening, molecules moving. That’s information becoming physical. When you forget something, those physical constraints relax. The neural patterns decay. The configurations that encoded your knowledge become accessible to the system again, available for other uses. Information has been erased, entropy has increased, and heat has been generated.

The beautiful thing is how universal this is. Whether we’re talking about steam engines or communication channels, neurons or black holes, computers or thermostats, the same principles apply. Configurations can be counted. Constraints limit possibilities. Entropy measures both disorder and surprise. Information is physical, and physics is informational.

This is what I love about physics: finding that things you thought were different are secretly the same. Shannon and Boltzmann, working on completely different problems separated by decades, discovered the same truth about reality. Information is entropy. Entropy is information. And both are as physical as temperature or pressure or mass.

Once you see this, you can’t unsee it. Every computation, every measurement, every thought involves information flowing, entropy shifting, heat dissipating. The universe isn’t just made of particles and forces—it’s made of bits and constraints, configurations and correlations. And the laws governing them are beautifully, inevitably, the same laws Boltzmann wrote for steam and Shannon wrote for telegraphs.

That’s not a metaphor. That’s just physics.
