The Bit and the Neuron: Information Theory of AI

Claude Shannon · Clarifying technology
Information Theory · Entropy · Superposition · Sparsity · Compression

The Fundamental Problem of Transmission

“The fundamental problem of communication,” Shannon wrote in 1948, “is that of reproducing at one point either exactly or approximately a message selected at another point.” When we observe the modern neural network, we are not witnessing a mystical emergence of consciousness, but a rigorous exercise in channel capacity optimization. The “black box” is merely a high-dimensional channel, and its confusing behaviors—polysemanticity, superposition, the “dark matter” of unextracted features—are not flaws. They are the mathematical inevitabilities of transmitting a high-entropy signal through a constrained medium.

The Signal in the Noise

To understand the machine, one must first quantify the uncertainty it resolves. A neural network, in its essence, is a series of transformations designed to reduce the entropy of an input signal until it matches a target distribution. The textbook description is a layered architecture for pattern recognition, but let us be more precise: it is a cascade of noisy channels. Each layer is a re-encoding step, attempting to preserve the essential information—the signal—while discarding the irrelevant variations—the noise.
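To make the unit of account concrete, here is a minimal sketch in Python, assuming a ten-class task and an illustrative output distribution, of how many bits of uncertainty such a cascade resolves:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), in bits (0 * log 0 := 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Ten-class task: before seeing the input, the label is maximally uncertain.
prior = np.full(10, 0.1)

# After the cascade of layers, the output distribution is sharply peaked.
posterior = np.array([0.90, 0.04, 0.02, 0.01, 0.01,
                      0.005, 0.005, 0.005, 0.003, 0.002])

print(f"prior entropy:        {entropy_bits(prior):.3f} bits")   # log2(10), about 3.32
print(f"posterior entropy:    {entropy_bits(posterior):.3f} bits")
print(f"uncertainty resolved: {entropy_bits(prior) - entropy_bits(posterior):.3f} bits")
```

The difference between the two entropies is the information, in Shannon's sense, that the network has extracted from the input.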

The architecture itself is a constraint on bandwidth. We have a finite number of neurons, a finite number of weights. This is our channel capacity. The challenge, then, is how to represent a world of effectively infinite complexity within this finite capacity. The attention mechanism demonstrates one solution. By making word meanings context-dependent, the network dynamically reallocates its bandwidth. It does not waste capacity encoding the static definition of “bank”; it encodes the specific “bank” relevant to the “river” or the “money.” This is dynamic signal amplification, a concept echoed in cybernetic principles, where systems prioritize signals by amplifying the significant and attenuating the noise. Attention is simply a mechanism for maximizing the information rate of the relevant context.
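A minimal sketch of this reallocation, using the standard scaled dot-product form of attention; the toy tokens, the random keys, and the query engineered to resemble “river” are illustrative assumptions, not the behavior of any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query spreads a fixed budget of
    weight (a distribution summing to 1) across the context tokens."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # relevance of each key to each query
    weights = softmax(scores, axis=-1)        # finite bandwidth, normalized to 1
    return weights @ V, weights               # context-dependent re-encoding

rng = np.random.default_rng(0)
d = 32
tokens = ["the", "river", "bank", "was", "muddy"]
K = V = rng.normal(size=(len(tokens), d))     # toy key/value vectors, one per token
q_bank = K[1] + 0.1 * rng.normal(size=d)      # a query engineered to resemble "river"

_, w = attention(q_bank[None, :], K, V)
for tok, weight in zip(tokens, w[0]):
    print(f"{tok:>6}: {weight:.3f}")          # the bulk of the weight lands on "river"
```

The softmax is the budget constraint: the weights sum to one, so capacity spent on “river” is capacity withheld from everything else.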

However, even with attention, the number of distinct “concepts” a model must learn far exceeds the number of neurons available to represent them. This is where the true elegance of the system reveals itself, and where human intuition often fails. We expect a one-to-one mapping: one neuron, one concept. We expect the signal to be clean. But efficient coding theory tells us that in a capacity-constrained system, a one-to-one mapping is rarely optimal.

Compression & Superposition

The field of mechanistic interpretability seeks to open this black box, to reverse-engineer the encoding scheme. Researchers are often baffled by what they find. They discover polysemanticity: single neurons that fire for seemingly unrelated concepts—a neuron that activates for both “skeptical behavior” and “capital letters in acronyms.” To the human observer, this looks like a mess. To an information theorist, this looks like efficient compression.

If you have N neurons but need to represent M features, where M ≫ N, you cannot assign one neuron per feature. You must compress. The superposition hypothesis proposes that the network solves this by encoding concepts as linear combinations of neurons. It utilizes the high-dimensional geometry of the vector space to pack many “near-orthogonal” concept vectors into a lower-dimensional neuron space. This is not an error; it is a solution to the channel coding problem. The network is exploiting the fact that in high-dimensional space, there are exponentially many almost-orthogonal directions. It can transmit multiple signals simultaneously over the same physical wires (neurons) without them interfering destructively, provided the receiver (the next layer) knows the codebook.
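A small numerical sketch of this packing, with illustrative sizes (64 neurons, 512 features) rather than the dimensions of any real model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 64, 512              # M >> N: more features than neurons

# Give each feature a random direction in neuron space and normalize it.
F = rng.normal(size=(n_features, n_neurons))
F /= np.linalg.norm(F, axis=1, keepdims=True)

# Interference between distinct features: off-diagonal cosine similarities.
G = F @ F.T
off_diag = np.abs(G[~np.eye(n_features, dtype=bool)])
print(f"mean |cos| between distinct features: {off_diag.mean():.3f}")   # ~ 1/sqrt(64)
print(f"max  |cos| between distinct features: {off_diag.max():.3f}")

# Encode a sparse input (3 active features) as one dense 64-neuron activation.
active = rng.choice(n_features, size=3, replace=False)
x = F[active].sum(axis=0)                    # superposed code on shared "wires"

# Linear readout against the codebook: the active features still dominate.
readout = F @ x
top3 = sorted(np.argsort(-readout)[:3].tolist())
print(f"top-3 readouts: {top3}, actually active: {sorted(active.tolist())}")
```

Each of the 64 coordinates in this code carries components of hundreds of features, which is exactly the polysemanticity described next.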

Polysemanticity, therefore, is simply the observation of this compression artifact. A single neuron is not a “concept detector”; it is a basis vector in a compressed representation. When we look at it in isolation, we see a projection of multiple high-dimensional features onto a single axis. Of course it looks polysemantic. We are listening to a multiplexed signal without a demultiplexer.

This is why sparse autoencoders are the correct mathematical instrument for analysis. They act as the decoder. By expanding the dimensionality from the constrained neuron space (e.g., 2,304 neurons) to a much larger feature space (e.g., 16,384 features) and enforcing sparsity, we are effectively reversing the compression. We are asking, “What were the original sparse signals that, when linearly combined, produced this dense activation pattern?” The sparse autoencoder disentangles the superposition, recovering the individual features from the interference pattern. It restores the redundancy that the network optimized away for the sake of transmission efficiency.
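A minimal sketch of such a decoder, using the neuron and feature counts quoted above; the form shown here (linear encoder, ReLU, linear decoder, L1 sparsity penalty) is one common formulation, and the coefficient, placeholder data, and training details are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: expand dense neuron activations into a much larger,
    sparsely active feature basis, then reconstruct the activations linearly."""
    def __init__(self, d_neurons: int = 2304, d_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_neurons, d_features)
        self.decoder = nn.Linear(d_features, d_neurons)

    def forward(self, activations):
        features = torch.relu(self.encoder(activations))   # sparse, non-negative codes
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff=1e-3):
    """Reconstruction fidelity plus an L1 penalty that pushes most features
    to stay silent for any given activation vector."""
    mse = (reconstruction - activations).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Usage: decompose a batch of dense activations (random placeholders here)
# into sparse feature activations and take one training step's gradients.
sae = SparseAutoencoder()
batch = torch.randn(8, 2304)
recon, feats = sae(batch)
loss = sae_loss(recon, batch, feats)
loss.backward()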

The network has learned to perform what we might call “semantic source coding.” It identifies the statistical regularities in the data (the “concepts”) and maps them onto the available hardware in a way that minimizes the loss of information, even if that means no single component of the hardware represents a single human-interpretable concept. The “mystery” of superposition is merely the difference between the logical architecture of the information (the features) and the physical architecture of the storage (the neurons).

The Limit

Yet, we must confront the limits of this reconstruction. Recent research warns us of the “dark matter” of AI—the vast majority of features that remain unextracted. Even with our best sparse autoencoders, we capture perhaps less than 1% of the concepts the model possesses.

This, too, is predicted by theory. Information that is too compressed, or encoded in dependencies too complex for our current linear decoders (the sparse autoencoders), will remain indistinguishable from noise. There is a thermodynamic cost to intelligibility. To make the internal state of the machine fully understandable to humans, we must expand it back out to its full, uncompressed dimensionality. The “dark matter” is not magic; it is information encoded in a way that our current decoding keys—our sparse autoencoders—cannot yet resolve. It is the high-frequency signal that our sampling rate misses.
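A toy illustration of why linear decoders have this blind spot, assuming, purely for the example, a feature encoded in the joint sign of two neurons rather than along any single direction:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two neurons take values +1/-1 independently; a hidden feature is their
# product (an XOR-style code), so it is fully determined by the activations...
a = rng.choice([-1.0, 1.0], size=n)
b = rng.choice([-1.0, 1.0], size=n)
hidden_feature = a * b

# ...yet the best linear readout of (a, b) carries essentially none of it.
X = np.stack([a, b], axis=1)
coef, *_ = np.linalg.lstsq(X, hidden_feature, rcond=None)
prediction = X @ coef
corr = np.corrcoef(prediction, hidden_feature)[0, 1]
print(f"best linear coefficients: {coef}")     # close to [0, 0]
print(f"correlation with feature: {corr:.4f}") # close to 0
```

The feature is fully determined by the activations, yet to any linear readout it is statistically indistinguishable from noise.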

We are observing a system that has optimized itself for transmission, not for inspection. It cares only about the fidelity of the output, not the legibility of the intermediate states. As we strive to interpret these systems, we are essentially trying to build a perfect wiretap on a channel that was designed to be efficient, not transparent. We can decode the signal, but we must respect the mathematics of the encoding. The bit does not lie, but it does not always speak our language.
