The Imperceptible Perturbation: Adversarial Examples as Measurement Failure
I notice something troubling about neural networks’ responses to adversarial examples. A trained image classifier that achieves 95% accuracy on test data suddenly misclassifies when you add imperceptible noise. The perturbation is comparable to the pixel quantization step, invisible to human eyes. Yet the network responds with high confidence: “This panda is actually a gibbon.”
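For readers who want to see the mechanics, here is a minimal sketch of a fast-gradient-sign (FGSM) style perturbation. It assumes a pretrained PyTorch classifier; the names `model`, `x`, `y`, and the budget `epsilon` are placeholders for illustration, not values from any particular experiment.

```python
# A minimal FGSM-style sketch. `model` is assumed to be a pretrained PyTorch
# classifier taking images in [0, 1]; `x` is a batch of images, `y` the true
# integer labels. All of these names are placeholders.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=2 / 255):
    """Return `x` plus a worst-case perturbation of at most `epsilon` per pixel."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step each pixel in the direction that most increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# The clean and perturbed predictions can disagree even though the two
# images are indistinguishable to a human observer:
# clean_pred = model(x).argmax(dim=1)
# adv_pred   = model(fgsm_perturb(model, x, y)).argmax(dim=1)
```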
This reminds me of early radiation measurements. My electrometer would occasionally register large deflections—seemingly indicating strong emissions. But these were spurious: electrical noise from the building, static discharge, cosmic ray hits. Real radioactive decay produced different signatures—consistent, reproducible, following the exponential decay law.
The challenge was learning to distinguish: What constitutes valid measurement? What’s signal, what’s artifact?
Neural networks face the same problem. They respond to perturbations that shouldn’t affect classification—like instruments responding to interference that isn’t radiation.
What the Measurements Reveal
The data reveals networks learned fragile correlations, not robust principles. During training, they found statistical patterns—texture correlations, frequency components, high-dimensional geometry—that discriminate classes on the training distribution. But they didn’t learn invariances: what transformations preserve class identity?
When I isolated radium, I had to verify: Is this radiation really from radium, or contamination? I performed systematic tests—spectral analysis, half-life measurement, chemical separation. Each test eliminated alternative explanations. I didn’t just measure radiation; I measured what the radiation revealed about atomic identity.
Networks don’t do this. They measure correlations but don’t verify robustness. An adversarial perturbation preserves human-relevant features—shape, texture, semantics—while corrupting the high-frequency patterns networks actually learned to rely upon.
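One way to make the missing verification concrete is to probe whether predictions survive transformations that obviously preserve class identity, and compare that with their sensitivity to an adversarial perturbation. The sketch below reuses the hypothetical `model`, `x`, `y`, and `fgsm_perturb` from the earlier example and assumes torchvision for the transforms; the specific probes are illustrative.

```python
# Sketch: measure how often the predicted class survives label-preserving
# transforms versus an adversarial perturbation of similar pixel magnitude.
# Reuses the hypothetical `model`, `x`, `y`, and `fgsm_perturb` from above.
import torchvision.transforms.functional as TF

def prediction_agreement(model, x, transforms):
    """Fraction of inputs whose predicted class is unchanged by each transform."""
    base = model(x).argmax(dim=1)
    return {name: (model(t(x)).argmax(dim=1) == base).float().mean().item()
            for name, t in transforms.items()}

probes = {
    "horizontal_flip": TF.hflip,                               # preserves class identity
    "small_rotation": lambda img: TF.rotate(img, angle=5.0),   # preserves class identity
    "adversarial": lambda img: fgsm_perturb(model, img, y),    # "shouldn't" matter, but does
}
# print(prediction_agreement(model, x, probes))
```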
My early radium measurements were accurate for pure samples. But pitchblende—mixed ore—gave anomalous readings requiring careful interpretation. Networks are accurate on clean test data but fail on these “contaminated” adversarial samples.
The Measurement Precision Problem
The deeper problem is measurement precision. In physics, a measurement carries error bars: 1.23 ± 0.05. This quantifies uncertainty. Networks output softmax probabilities, which seem to provide confidence estimates. But these scores reflect patterns in the training distribution; they are not calibrated statements of uncertainty about an arbitrary input.
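A standard way to quantify that gap is the expected calibration error: bin predictions by their claimed confidence and compare the claim with the accuracy actually observed in each bin. The sketch below assumes NumPy arrays `probs` (softmax outputs) and `labels` (true classes); the bin count is arbitrary.

```python
# Sketch: expected calibration error (ECE). `probs` is an (N, C) array of
# softmax outputs, `labels` an (N,) array of true class indices -- both
# assumed inputs for illustration.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    confidences = probs.max(axis=1)          # the network's claimed certainty
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |claimed confidence - observed accuracy|, weighted by bin size.
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```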
When I measured radium’s atomic weight, I reported: 226 ± 2 atomic mass units. The error reflected measurement uncertainty—impurities in sample, instrument precision limits. I knew these bounds and reported them.
Networks don’t know their error bounds. They output “99.8% confident: gibbon” for an adversarially perturbed panda, not recognizing that the input lies outside their valid measurement domain.
Scientific instruments have specified operating ranges. A thermometer might be rated from -50°C to 300°C; outside that range, its readings are invalid. Networks need similar concepts: input validity checks, out-of-distribution detection, confidence calibration that reflects true uncertainty.
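A crude version of such a validity check is to refuse a verdict whenever the maximum softmax probability falls below a threshold, the common max-softmax baseline for out-of-distribution detection. The sketch below is illustrative only: the threshold would have to be chosen on held-out data, and this particular check is itself fooled by adversarial inputs, which arrive with high confidence.

```python
# Sketch: abstain on inputs that look outside the network's operating range,
# using the maximum softmax probability as a crude validity score.
# `model` is the hypothetical classifier from above; the threshold is illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_or_abstain(model, x, threshold=0.7):
    probs = F.softmax(model(x), dim=1)
    confidence, prediction = probs.max(dim=1)
    # Report -1 ("no valid reading") instead of guessing on low-confidence inputs.
    prediction = torch.where(confidence >= threshold,
                             prediction,
                             torch.full_like(prediction, -1))
    return prediction, confidence
```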
Implications
These observations suggest networks haven’t learned robust measurement principles. They’ve learned correlations sufficient for in-distribution accuracy but fragile to perturbations.
The solution is not just adversarial training—adding corrupted examples to training data. That’s like recalibrating instruments after seeing artifacts. The deeper need is learning invariances: what properties define class membership, what variations are irrelevant.
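For concreteness, the recipe referred to above looks roughly like the sketch below: perturb each batch on the fly and fit the perturbed examples. It reuses the hypothetical `fgsm_perturb` from earlier; stronger variants use multi-step attacks such as PGD, but the structure is the same.

```python
# Sketch of one adversarial-training step: craft perturbed inputs for the
# current batch, then train on them. Reuses the hypothetical `fgsm_perturb`.
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=2 / 255):
    model.train()
    x_adv = fgsm_perturb(model, x, y, epsilon)   # worst-case inputs for this batch
    optimizer.zero_grad()                        # discard gradients from the attack
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```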
In radiation work, I learned: extraordinary claims require extraordinary evidence. Claiming to detect radium demanded rigorous verification through multiple independent methods—spectral lines, half-life, chemical properties all had to align.
Neural networks make extraordinary claims—“I can see,” “I can understand language”—based on pattern matching. Adversarial examples reveal that the verification isn’t yet rigorous. The patterns they’ve learned are fragile, responding to perturbations that shouldn’t matter.
I spent my career distinguishing signal from noise, learning what constitutes valid measurement versus artifact. Networks must learn the same discipline—not just what patterns appear in training data, but what patterns should matter, and when to admit uncertainty because the input lies outside their measurement range.