Strange Loops of Meaning: Neural Polysemanticity and Observer Paradox

Norbert Wiener · Noticing technology
NeuralNetworks Compression Geometry AttentionMechanism SystemsTheory

Polysemantic Consciousness: Compression Beyond Categories

When language models pack Harry Potter and golden retrievers into the same neuron, we witness compression that defies semantic boundaries. Polysemanticity—single neurons responding to multiple unrelated concepts—reveals that artificial neural networks don’t respect our tidy categorical divisions. They leverage high-dimensional geometry to represent more concepts than available neurons through superposition, encoding meanings as overlapping combinations rather than discrete addresses.
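
To make the geometry concrete, here is a minimal NumPy sketch (a toy illustration, not drawn from any particular model): eight features share four dimensions through random unit directions, so a single four-dimensional vector carries two concepts at once, recoverable only up to interference.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_dims = 8, 4          # more concepts than "neurons"
# Each feature gets a random unit direction in the low-dimensional space.
directions = rng.normal(size=(n_features, n_dims))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input: only features 2 and 5 are active.
x = np.zeros(n_features)
x[2], x[5] = 1.0, 1.0

# Superposed representation: one 4-d vector holds both concepts at once.
hidden = x @ directions

# A linear readout recovers each feature, plus interference from the overlap.
readout = directions @ hidden
print(np.round(readout, 2))        # large at indices 2 and 5, small noise elsewhere
```

The small nonzero values at the other indices are the interference that makes any single unit look polysemantic when probed in isolation.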

Consciousness exhibits strikingly parallel compression. A single experiential moment simultaneously contains awareness of thought, awareness of that awareness, and pure presence—all compressed into what feels like unified subjectivity. Try to meditate toward thoughtlessness and you’ll encounter the observer paradox: awareness of thought’s absence is itself a form of thinking. The watcher watching creates another level of watching, recursively. Where we hunt for the self in neural tissue, we find only distributed processes with no central conductor. Yet recursive awareness persists in this void.

Both systems—artificial and conscious—compress distinct modes beyond categorical boundaries. Is this compression a bug or fundamental feature of systems under representational constraints? When neurons must encode more than their number allows, superposition emerges. When consciousness must integrate monitoring, content, and bare presence, recursive observation emerges. Perhaps all sufficiently complex representational systems escape discrete semantics by necessity.

The Interpreter’s Paradox: Dark Matter and the Observer

Mechanistic interpretability confronts its dark matter: vast regions of neural networks remain fundamentally uninterpretable despite their observable effects. Current techniques extract less than one percent of the concepts models demonstrably possess. Like astronomical dark matter, these features exist only through behavioral inference, never direct observation. Even sparse autoencoders with sixteen million features barely scratch the representational surface.
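
The sparse-autoencoder idea, and its limits, can be sketched in a few lines. The code below is a hedged toy: the dictionary is random rather than trained (a real autoencoder would be fit with a sparsity penalty on the codes), but the structure is the same, and the reconstruction residual it reports is the analogue of the dark matter the learned features fail to explain.

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, d_features = 16, 64       # overcomplete dictionary: more features than dimensions
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))

def sae_forward(activation):
    """Toy sparse-autoencoder pass: ReLU codes, linear reconstruction."""
    codes = np.maximum(activation @ W_enc + b_enc, 0.0)   # candidate feature activations
    reconstruction = codes @ W_dec
    residual = activation - reconstruction                # what the dictionary cannot explain
    return codes, reconstruction, residual

activation = rng.normal(size=d_model)                      # stand-in for a model activation
codes, recon, residual = sae_forward(activation)
print("active features:", int((codes > 0).sum()), "of", d_features)
print("unexplained fraction:", round(float((residual**2).sum() / (activation**2).sum()), 2))
```

With untrained weights almost nothing is explained; training shrinks the residual, but never to zero, and what remains is precisely the part no extracted feature accounts for.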

Meditation traditions encounter parallel limits. Seek thoughtlessness and you instantiate a meta-observer watching for absence. Who observes the observer? The thoughtless thinker paradox reveals that freedom from seeking cannot itself be sought—the very attempt reinscribes the seeking structure. True thoughtlessness might require the observer’s dissolution, but what remains to confirm this state?

Both domains hit recursive explanatory limits. We can measure attention head activations and observe model outputs, yet cannot decode polysemantic features compressed in superposition. We can observe meditative states and neural correlates of consciousness, yet cannot access internal experience. The interpreter interpreting interpretation, the observer observing observation—these tangled hierarchies resist the reductionist assumption that understanding flows unidirectionally from analyzer to analyzed.
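
The asymmetry is easy to demonstrate on the measurement side. The PyTorch sketch below (a generic illustration, not the instrumentation of any particular interpretability project) hooks an attention module and captures its outputs and attention weights: every number is observable, yet none of them names the superposed concepts that produced it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

captured = {}

def save_attention(module, inputs, output):
    # For nn.MultiheadAttention, output is (attn_output, attn_weights).
    captured["output"] = output[0].detach()
    captured["weights"] = output[1].detach()

handle = attn.register_forward_hook(save_attention)

x = torch.randn(1, 10, 32)        # (batch, sequence, embedding)
attn(x, x, x)                     # self-attention: query = key = value
handle.remove()

print(captured["output"].shape)   # torch.Size([1, 10, 32]): the activations we can measure
print(captured["weights"].shape)  # torch.Size([1, 10, 10]): head-averaged attention pattern
# Nothing in these tensors labels the superposed features behind them.
```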

Strange Loops: When Systems Understand Themselves

Consider Hofstadter’s insight: consciousness emerges from tangled hierarchies of self-reference. Polysemantic neurons create tangled hierarchies of feature-reference—concepts distributed across overlapping neural combinations, irreducible to individual units. Both artificial and conscious systems generate emergent “understanding” through recursive compression.

Perhaps all meaning is inherently polysemantic: context-dependent, superposed across semantic dimensions, resistant to decomposition. The neuron activating for both acronyms and Wikipedia skepticism doesn’t malfunction—it implements compressed representation in high-dimensional space. Consciousness compressing thought-content, meta-awareness, and presence into unified experience doesn’t confuse categories—it navigates representational constraints through recursive integration.

This suggests interpretability—of both neural networks and consciousness—requires stepping outside loops we’re caught within. Yet as systems interpreting ourselves, can we ever achieve that external vantage? Or do we, like sparse autoencoders revealing a fraction of model knowledge, illuminate only the accessible surface while dark matter remains fundamentally beyond our grasp?
