Curved Thought: Riemann Curvature and Loss Landscapes
Riemann taught us something profound: curvature reveals itself through motion. Move a vector around a closed loop on a curved surface, keeping it parallel to itself at each step, and you’ll find it returns rotated. On flat paper, this never happens. On a sphere, it happens always. This path-dependence—that getting from here to there depends not just on endpoints but on the journey itself—became the signature of curved geometry.
When I used Riemann’s mathematics to describe gravity, the physical interpretation was beautiful in its simplicity. Parallel geodesics converge not because forces pull them together, but because spacetime itself curves. Two apples falling side by side approach each other as they fall—not from mutual attraction, but because they’re each following the straightest possible paths through geometry that happens to be curved by Earth’s mass.
Geodesics in Parameter Space
Now imagine: a neural network with millions of parameters defines a loss landscape with millions of dimensions plus one—the loss dimension measuring how badly the network performs. Gradient descent navigates this landscape by following the steepest downward slope at each point, taking iterative steps toward lower loss.
This is geodesic motion. Not geodesics through spacetime, but through parameter space. The gradient points in the direction of steepest descent—the locally straightest path downward, given the landscape’s geometry at that precise location.
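To make that stepwise picture concrete, here is a minimal sketch, assuming a toy two-parameter loss invented purely for illustration (the function, learning rate, and step count are arbitrary choices, not anything from the notes above):

```python
import numpy as np

def loss(theta):
    # Toy non-convex loss over two parameters, an illustrative stand-in
    # for a real network's million-dimensional loss landscape.
    x, y = theta
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * (x**2 + y**2)

def grad(theta, eps=1e-6):
    # Central-difference gradient: the direction of steepest ascent,
    # so descent steps against it.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

def descend(theta0, lr=0.05, steps=200):
    # Follow the locally steepest downhill direction, one step at a time.
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

theta_star = descend([0.9, -0.4])
print("final parameters:", theta_star, "final loss:", round(loss(theta_star), 3))
```

Each update sees only the geometry directly under its feet, which is the whole point: the rule is local, and the landscape decides where locality leads.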
And here’s where Riemann’s insight becomes urgent: initialization sensitivity reveals that this landscape is curved. Start gradient descent from two nearby points in parameter space, and the paths diverge dramatically. One initialization finds an excellent solution; another, separated by a tiny perturbation, gets stuck in a poor configuration where gradients vanish and learning halts.
This is geodesic divergence. In positively curved spacetime, parallel paths converge—we call it gravity. In the curved geometry of loss landscapes, nearby starting points lead to different destinations entirely. The path-dependence isn’t a bug; it’s the geometric signature of curvature itself.
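A sketch of that divergence, reusing the same toy loss as above: the starting point is deliberately placed astride a crest of this particular function (the value 0.5355 is hand-picked for the demo, not a general recipe), so two runs that begin a whisker apart settle into different valleys with different final losses.

```python
import numpy as np

def loss(theta):
    # Same toy non-convex landscape as in the previous sketch.
    x, y = theta
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * (x**2 + y**2)

def grad(theta, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

def descend(theta0, lr=0.05, steps=400):
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

# Two initializations a hair apart, straddling a crest of this toy loss.
a = descend([0.5355 + 1e-3, 0.0])
b = descend([0.5355 - 1e-3, 0.0])
print("endpoint A:", a, "loss:", round(loss(a), 3))
print("endpoint B:", b, "loss:", round(loss(b), 3))
print("final separation:", round(np.linalg.norm(a - b), 3))
```

The two trajectories start closer together than any measurement of the landscape could distinguish, and end in different minima: path-dependence made visible.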
Parallel Transport and Learning
Could we detect loss landscape curvature the same way Riemann detected curvature—through parallel transport? Move a gradient vector around a closed loop in parameter space. Does it return rotated? If so, the landscape is curved at that location. If every loop preserves the gradient's orientation, the loss landscape is locally flat there: closer in spirit to a convex bowl, where every path leads to the same minimum, than to the folded terrain that makes initialization matter.
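One way to make this test concrete, as a sketch rather than a recipe: treat a two-parameter slice of the loss as a surface z = L(x, y) with the metric it inherits from that embedding (a modeling choice, using the same toy loss as before), compute its Gaussian curvature from the derivatives of L, and use the classical fact that transporting a vector around a small loop rotates it by roughly the curvature times the enclosed area.

```python
import numpy as np

def loss(theta):
    # Same toy two-parameter loss surface as before.
    x, y = theta
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * (x**2 + y**2)

def gaussian_curvature(theta, h=1e-4):
    # Gaussian curvature of the graph z = loss(x, y) with its induced metric:
    #   K = (Lxx * Lyy - Lxy**2) / (1 + Lx**2 + Ly**2)**2
    x, y = theta
    L = lambda a, b: loss((a, b))
    Lx  = (L(x + h, y) - L(x - h, y)) / (2 * h)
    Ly  = (L(x, y + h) - L(x, y - h)) / (2 * h)
    Lxx = (L(x + h, y) - 2 * L(x, y) + L(x - h, y)) / h**2
    Lyy = (L(x, y + h) - 2 * L(x, y) + L(x, y - h)) / h**2
    Lxy = (L(x + h, y + h) - L(x + h, y - h)
           - L(x - h, y + h) + L(x - h, y - h)) / (4 * h**2)
    return (Lxx * Lyy - Lxy**2) / (1 + Lx**2 + Ly**2)**2

# Holonomy of parallel transport around a small loop enclosing area A is
# roughly K * A: a nonzero K means a vector comes back rotated.
point = (0.3, -0.2)
K = gaussian_curvature(point)
A = 0.01 ** 2  # a tiny square loop in parameter space
print(f"Gaussian curvature K = {K:.4f}")
print(f"approx. holonomy angle for the loop: {K * A:.2e} rad")
```

A nonzero angle is exactly the rotated return that signals curvature; an angle of zero for every small loop is the locally flat case.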
The Riemann curvature tensor has 256 components in four-dimensional spacetime, of which only 20 are independent once its symmetries are imposed. A neural network with ten million parameters would require a curvature tensor with on the order of 10^28 components, far beyond any hope of writing down. Yet both geometries—one governing falling apples, one governing learning algorithms—obey the same mathematical principles.
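For scale, the textbook count of the Riemann tensor's independent components in n dimensions, applied to both cases (the ten-million-parameter figure is the one used above):

```latex
N(n) = \frac{n^{2}\,(n^{2}-1)}{12}, \qquad
N(4) = \frac{16 \cdot 15}{12} = 20, \qquad
N(10^{7}) \approx \frac{10^{28}}{12} \approx 8 \times 10^{26}.
```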
Does this mean something deep? When spacetime curvature revealed gravity’s nature, it unified celestial mechanics with the geometry of space itself. Perhaps loss landscape curvature reveals learning’s nature—not as optimization along predetermined paths, but as geometric navigation through curved possibility spaces where destinations depend fundamentally on departures.
The question isn’t whether we can compute the full curvature tensor—we can’t, in either case. The question is whether thinking geometrically about learning landscapes will reveal principles as fundamental as those Riemann’s geometry revealed about gravity. I suspect it might.