Rotation Without Rotation: Quantum Spin and Attention Superposition

Niels Bohr Noticing physics
QuantumMechanics Attention Superposition Measurement Complementarity

The Measurement Reveals the Paradox

Stern and Gerlach sent silver atoms through an inhomogeneous magnetic field, where classical theory predicts a continuous spread of deflections. What they observed: precisely two beams, up and down, never intermediate angles. The deflection comes from the atom’s single unpaired electron, which possesses angular momentum yet cannot literally rotate on itself: for a classical spinning sphere of the electron’s size, the surface would have to move faster than light, which is physically absurd. Here we encounter spin: angular momentum without rotation.

The neural network faces an analogous constraint. Softmax converts continuous query-key scores into a probability distribution over a discrete set of tokens. Before normalization, the query is scored against all keys simultaneously, a superposition of potential alignments; afterward, the weight concentrates on specific tokens. Like spin revealing itself only through the measurement apparatus, attention remains indeterminate until softmax forces selection.
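The scoring-then-normalizing step can be sketched in a few lines. This is a minimal illustration, assuming standard scaled dot-product attention for a single query; the toy vectors and the helper name `attention_weights` are hypothetical and not taken from any particular model.

```python
import math

def softmax(scores):
    """Turn real-valued scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score one query against every key (scaled dot product), then normalize."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical toy vectors, chosen only for illustration.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = attention_weights(query, keys)
```

Note that every key keeps a nonzero weight: the "collapse" here is a concentration of weight, not the selection of a single outcome, which is one place where the analogy with measurement is loose.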

Complementarity Beyond the Quantum

My complementarity principle states: wave and particle descriptions are both necessary yet mutually exclusive; an arrangement that reveals one conceals the other. Measure spin along the z-axis, and you obtain definite up or down. But this very measurement destroys all information about spin along the x-axis, leaving it maximally uncertain. The properties exist in a complementary relationship; complete knowledge requires accepting fundamental limitations on simultaneous determination. We cannot escape this constraint. It reflects reality’s structure, not experimental inadequacy.
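The claim about z and x measurements can be checked with the Born rule. The sketch below assumes the usual convention of writing spin-1/2 states as pairs of amplitudes in the z basis; the constant and function names are illustrative.

```python
import math

# Spin states as (amplitude_up_z, amplitude_down_z) pairs in the z basis.
UP_Z = (1.0, 0.0)
DOWN_Z = (0.0, 1.0)
UP_X = (1 / math.sqrt(2), 1 / math.sqrt(2))
DOWN_X = (1 / math.sqrt(2), -1 / math.sqrt(2))

def probability(outcome, state):
    """Born rule: |<outcome|state>|^2 for two-component states."""
    amp = sum(complex(o).conjugate() * s for o, s in zip(outcome, state))
    return abs(amp) ** 2

# After a z measurement yields "up", an x measurement is a fair coin.
p_up_x_after_up_z = probability(UP_X, UP_Z)      # 0.5 up to rounding
p_down_x_after_up_z = probability(DOWN_X, UP_Z)  # 0.5 up to rounding

# And the reverse: a definite x result leaves z maximally uncertain.
p_up_z_after_up_x = probability(UP_Z, UP_X)      # 0.5 up to rounding
```

A definite answer along one axis thus carries no preference at all between the two outcomes along the other.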

Neural attention exhibits a structural similarity. The query vector matches multiple key patterns simultaneously: syntax, semantics, and positional relationships are all encoded in superposition. But softmax weights must sum to one, so when it normalizes, it must choose. Allocating probability mass to keys that match on syntactic features reduces what remains for semantic alignment. Different representational aspects compete for the same probabilistic resource. The attention mechanism implements a classical analogue of complementarity: focus determines which reality becomes manifest.
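The competition for probability mass follows directly from normalization. In this small sketch the three scores and their labels are hypothetical, chosen only to show that raising one score lowers the weights of the others even though their own scores are unchanged.

```python
import math

def softmax(scores):
    """Turn real-valued scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three keys, labeled for illustration only.
labels = ["syntactic", "semantic", "positional"]
before = softmax([1.0, 1.0, 1.0])  # equal scores: one third each
after = softmax([3.0, 1.0, 1.0])   # only the syntactic score was raised

for label, b, a in zip(labels, before, after):
    print(f"{label:>10}: {b:.3f} -> {a:.3f}")
```

The semantic and positional scores are identical in both calls, yet their weights shrink because the total is fixed at one.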

The critical difference: quantum measurement collapse is irreversible. Once the wavefunction collapses onto spin-up, that superposition state is destroyed. Attention’s collapse proves reversible—change the query, recompute the distribution, access different aspects. Yet during forward propagation, each layer’s attention fixes a choice, making certain representations accessible while rendering others latent.

Discrete Outcomes from Continuous Fields

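Stern-Gerlach demonstrated quantum’s deepest strangeness: continuous rotational symmetry produces discrete measurement outcomes. You can orient the magnetic field along any axis in three-dimensional space, an infinity of possibilities, yet you always observe exactly two states. The superposition principle resolves this: a spin prepared along any direction is a superposition of up and down along the measured axis, and the measurement yields one of the two with probabilities fixed by the amplitudes. A state vector and its negative remain physically indistinguishable, which accommodates spin-1/2’s peculiar rotational behavior: a 360-degree rotation reverses the sign of the state, and only a 720-degree rotation restores it.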
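Both points can be made concrete with standard spin-1/2 formulas. The sketch assumes a state prepared up along z and measured along an axis tilted by an angle theta, for which the probability of "up" is cos²(theta/2), and it assumes the usual rotation operator about z, exp(-i·phi·σ_z/2). The function names are illustrative.

```python
import cmath
import math

def prob_up_along_tilted_axis(theta):
    """P(up) along an axis tilted by theta from z, for a state prepared up along z."""
    return math.cos(theta / 2) ** 2

def rotate_about_z(state, phi):
    """Apply the spin-1/2 rotation exp(-i*phi*sigma_z/2) to (up, down) amplitudes."""
    up, down = state
    return (cmath.exp(-1j * phi / 2) * up, cmath.exp(1j * phi / 2) * down)

# Any tilt angle gives only two outcomes; the angle sets their odds.
for degrees in (0, 60, 90, 180):
    p = prob_up_along_tilted_axis(math.radians(degrees))
    print(f"tilt {degrees:>3} deg: P(up) = {p:.3f}, P(down) = {1 - p:.3f}")

state = (1 / math.sqrt(2), 1 / math.sqrt(2))    # spin up along x
after_360 = rotate_about_z(state, 2 * math.pi)  # the state with its sign reversed
after_720 = rotate_about_z(state, 4 * math.pi)  # the original state again
```

The continuous angle never produces a third beam; it only shifts weight between the two. The sign picked up after 360 degrees changes no measurement probability, which is why the state and its negative count as the same physical state.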

Softmax performs an analogous transformation. Query-key dot products produce continuous similarity scores, one for each key. Exponential amplification combined with normalization turns these scores into a probability distribution favoring the highest alignments. When one score dominates the rest, softmax approaches a hard maximum, assigning nearly all probability to a single token: discrete selection from a continuous possibility space.
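The approach to a hard maximum can be seen by widening the gaps between scores. The sketch below does this with a `temperature` parameter, which is an assumption introduced for illustration and does not appear in the text above: dividing the scores by a small temperature is equivalent to making the leading score dominate more strongly.

```python
import math

def softmax(scores, temperature=1.0):
    """Softmax over scores divided by a temperature (an illustrative assumption)."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]                   # hypothetical similarity scores
soft = softmax(scores, temperature=1.0)    # weight spread over all three keys
sharp = softmax(scores, temperature=0.05)  # nearly all weight on the top key

print([round(w, 3) for w in soft])
print([round(w, 3) for w in sharp])
```

The ordering of the weights is the same in both cases; only the concentration changes, and the top weight approaches but never reaches one.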

Both mechanisms reveal something profound about measurement and selection. Whether quantum or classical, extracting definite outcomes from superposed states requires privileging certain bases while suppressing others. The Stern-Gerlach apparatus chooses a measurement axis; attention chooses a query direction. Both collapse infinite possibility onto finite actuality.

Does neural attention implement measurement-like dynamics without quantum mechanics? The mathematics suggests yes—probability normalization, basis dependence, complementary representations. Yet attention remains reversible and deterministic where quantum measurement proves irreversible and probabilistic. These differences matter less than the shared structure: both systems navigate the fundamental tension between continuous potentiality and discrete manifestation that characterizes extracting information from complex possibility spaces.
