The Relativity of Features: Universal and Task-Specific Representations
The Feature Hierarchy
There is something familiar about the way neural networks organize their learned representations—something that reminds me of special relativity. In physics, measurements are not absolute but depend on one's reference frame. A time interval one observer measures, another observer in relative motion measures differently. Yet certain quantities—the speed of light, the spacetime interval—remain invariant across all frames.
I observe a parallel structure in how networks learn. When we train a classifier on natural images, early layers learn edges, textures, simple patterns—features that transfer readily across tasks. These early representations are universal, like the speed of light: the same in all reference frames, independent of the particular task.
Later layers tell a different story. Here the network learns task-specific features—parts of cat faces, wheel patterns, building corners. These are relative, like velocity measurements, depending on the task frame of reference. Transfer one of these layers to medical imaging, and it must be retrained. Its features were meaningful only relative to the original task.
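This division of labor is exactly what standard transfer learning exploits: freeze the universal early layers, reinitialize and retrain the task-specific head. A minimal numpy sketch, with all names and shapes hypothetical, makes the asymmetry concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained "early layer": treated as a frozen, universal feature extractor.
W_early = rng.normal(size=(8, 4))   # maps 8-dim inputs to 4-dim features
# Task-specific "head": reinitialized and retrained for the new task.
W_head = rng.normal(size=(4, 1))

def features(x):
    # Early-layer representation (ReLU), shared across tasks.
    return np.maximum(x @ W_early, 0.0)

# Toy regression data standing in for the new task.
X = rng.normal(size=(32, 8))
y = rng.normal(size=(32, 1))

W_early_before = W_early.copy()
lr = 0.01
for _ in range(100):
    h = features(X)                     # frozen features
    pred = h @ W_head
    grad = h.T @ (pred - y) / len(X)    # gradient w.r.t. the head only
    W_head -= lr * grad                 # the early layer is never updated

# The "invariant" layer is untouched; only the task coordinates changed.
assert np.array_equal(W_early, W_early_before)
```

The point of the sketch is the asymmetry of the update rule, not the particular architecture: gradients flow only into the frame-dependent head, while the invariant representation is carried over unchanged.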
Invariant and Relative
This distinction reveals something fundamental about how learning systems organize information. In relativity, we distinguish absolute from relative through coordinate transformations. Change reference frames, and some quantities transform (time intervals, spatial distances) while others remain invariant (the spacetime interval, physical laws).
Transfer learning performs an analogous operation. We identify which features are invariant—transferable because they capture universal statistical structure—and which are frame-dependent, requiring transformation when we change task coordinates. Grid cells show this pattern. Their toroidal manifold structure persists across environments, across exploration and sleep, even when individual firing fields shift. The manifold geometry is invariant; the coordinate mapping to physical space is relative.
The Tolman-Eichenbaum Machine makes this explicit: structural knowledge separates from sensory details. The structural representation transfers; the sensory mapping must adapt. This is a coordinate transformation between task frames.
Why does this work? Because natural images possess intrinsic structure. Edges emerge from object boundaries—a statistical regularity across domains. Textures arise from material properties. Early layers discover manifold geometry: low-dimensional structure latent in high-dimensional data. This geometric structure is absolute, task-independent. Later layers learn coordinates on this manifold. Different tasks require different coordinate systems, but the underlying manifold remains.
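The manifold claim can be illustrated in miniature. In this hedged sketch (a linear stand-in for the image manifold, with synthetic data), high-dimensional points secretly generated from three latent coordinates reveal their intrinsic dimension under SVD:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Images" that secretly live on a 3-dimensional subspace of a 50-dim
# ambient space, plus a little noise -- a linear toy for manifold structure.
latent = rng.normal(size=(500, 3))    # intrinsic coordinates
embed = rng.normal(size=(3, 50))      # embedding into "pixel" space
X = latent @ embed + 0.01 * rng.normal(size=(500, 50))

# SVD of the centered data reveals the intrinsic dimension:
# variance concentrates almost entirely in the top 3 directions.
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
var = s**2 / np.sum(s**2)
top3 = var[:3].sum()   # close to 1.0 for this construction
```

Real image manifolds are curved and nonlinear, so SVD is only the crudest probe; the point is that the restricted subspace exists independently of any downstream task, which is precisely why early layers that find it can transfer.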
Geometry is Absolute
This brings me to the conservation law of transfer learning. In relativity, Lorentz transformations preserve the spacetime interval—a geometric quantity invariant under coordinate changes. In transfer learning, what is conserved?
Perhaps the manifold geometry itself. Early layers discover that high-dimensional image data actually lives on a lower-dimensional manifold. This geometric insight—that natural images occupy a restricted subspace constrained by physical regularities—transfers across tasks because it reflects objective structure in the world, not our particular purpose.
Later layers learn task-specific coordinates on this manifold. Classification requires different coordinates than segmentation, just as different observers choose different space and time axes. But the manifold itself, like the spacetime interval, remains absolute. Change coordinates, not geometry.
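This invariance has a simple toy analogue, assuming Euclidean geometry stands in for the manifold: rotate a point cloud into a new coordinate frame, and every coordinate changes while every pairwise distance survives.

```python
import numpy as np

rng = np.random.default_rng(2)
points = rng.normal(size=(20, 3))

# A random orthogonal matrix (via QR) plays the role of a coordinate change.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

def pairwise_dists(P):
    # Full matrix of Euclidean distances between rows of P.
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

rotated = points @ Q

# Coordinates change; the distance matrix -- the "geometry" -- does not.
assert np.allclose(pairwise_dists(points), pairwise_dists(rotated))
```

The distance matrix is the toy counterpart of the spacetime interval: the quantity a coordinate transformation is obliged to preserve.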
Could we formalize this? Identify explicitly which transformations are geometric, preserving manifold structure, versus merely coordinate choices, requiring adaptation? The brain solved this through hierarchy: grid cells maintain environment-invariant manifolds while place cells provide context-specific coordinates. Perhaps our artificial systems rediscover the same principle.
What arrests my attention is the elegance. Both physics and learning face the same problem: separating universal structure from contingent reference frames. Both solve it through the same insight: geometry is absolute, coordinates are conventional. Features are not intrinsic properties but perspectives on underlying structure—and recognizing which perspectives transfer is key to generalization.
Whenever we learn from data, we discover manifold structure and choose coordinates on it. Understanding which aspects are geometric and which coordinatized is understanding what transfers. It is the relativity principle, applied to learning itself.
Source Notes
7 notes from 1 channel