Vectors on Curved Spaces: Parallel Transport and Computation
Consider parallel transport on curved surfaces: a vector is carried along a path while being kept locally parallel to itself at every step. On flat surfaces, the result is path-independent: transport a vector up then right, or right then up, and you arrive at the same orientation. But curvature breaks this invariance. On a sphere, parallel transport around a closed loop rotates the vector. The local rule (keep the vector parallel at each infinitesimal step) produces global path-dependence when executed on curved geometry.
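The effect is easy to check numerically. The sketch below (an illustrative example, not drawn from the source notes) discretely transports a tangent vector around a latitude circle on the unit sphere by re-projecting it onto the tangent plane at each step; Gauss-Bonnet predicts a net rotation equal to the enclosed solid angle, 2π(1 − cos θ).

```python
import numpy as np

# Illustrative sketch: discrete parallel transport around a latitude circle on
# the unit sphere. Each step re-projects the vector onto the tangent plane at
# the new point, which approximates parallel transport as the step shrinks.
# Gauss-Bonnet predicts a net rotation equal to the enclosed solid angle,
# 2*pi*(1 - cos(theta)), for a loop at colatitude theta.

theta = np.pi / 3            # colatitude of the loop (60 degrees)
steps = 100_000              # discretization of the loop

def point(phi):
    """Point on the latitude circle at azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

v = np.array([np.cos(theta), 0.0, -np.sin(theta)])   # unit tangent at phi = 0
v_start = v.copy()

for phi in np.linspace(0.0, 2 * np.pi, steps + 1)[1:]:
    p = point(phi)                    # move to the next point on the loop
    v = v - np.dot(v, p) * p          # project back onto the tangent plane
    v = v / np.linalg.norm(v)         # keep unit length

# arccos reports the unsigned angle, i.e. the prediction folded into [0, pi].
angle = np.arccos(np.clip(np.dot(v, v_start), -1.0, 1.0))
print(f"holonomy: {angle:.4f} rad, predicted {2 * np.pi * (1 - np.cos(theta)):.4f}")
```

For a 60° latitude circle the predicted rotation is π radians: the vector comes back pointing the opposite way, despite never being rotated locally.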
This reveals something profound about computation on manifolds: local update rules accumulate global effects through geometry itself.
Correction Terms for Curved Parameter Spaces
Christoffel symbols encode the correction terms required to maintain parallelism as the basis vectors themselves rotate with the coordinate system. They connect the metric tensor, which specifies distances and angles, to the derivatives that govern how vectors change. Without these correction terms, naive componentwise differentiation on a curved surface would depend on the arbitrary choice of coordinates rather than on the geometry itself. With them, we can rigorously track how vectors evolve along trajectories through curved spaces.
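As a concrete illustration (a minimal sketch using sympy, not taken from the source notes), the symbols can be computed directly from the metric via the standard formula Γ^k_ij = ½ g^{kl}(∂_i g_jl + ∂_j g_il − ∂_l g_ij), here for the round metric on the 2-sphere.

```python
import sympy as sp

# Minimal sketch (not from the source notes): Christoffel symbols of the round
# 2-sphere metric ds^2 = dtheta^2 + sin(theta)^2 dphi^2, computed from
#   Gamma^k_ij = 1/2 * g^{kl} (d_i g_jl + d_j g_il - d_l g_ij).

theta, phi = sp.symbols('theta phi')
coords = [theta, phi]
g = sp.Matrix([[1, 0],
               [0, sp.sin(theta)**2]])   # metric tensor: distances and angles
g_inv = g.inv()
n = len(coords)

Gamma = [[[sp.simplify(sp.Rational(1, 2) * sum(
              g_inv[k, l] * (sp.diff(g[j, l], coords[i])
                             + sp.diff(g[i, l], coords[j])
                             - sp.diff(g[i, j], coords[l]))
              for l in range(n)))
           for j in range(n)]
          for i in range(n)]
         for k in range(n)]

for k in range(n):
    for i in range(n):
        for j in range(n):
            if Gamma[k][i][j] != 0:
                print(f"Gamma^{coords[k]}_{coords[i]}{coords[j]} =", Gamma[k][i][j])

# Nonzero symbols (up to trig rewriting): Gamma^theta_phiphi = -sin(theta)*cos(theta),
# and Gamma^phi_thetaphi = Gamma^phi_phitheta = cot(theta).
```

These are exactly the terms that reappear in the parallel-transport and geodesic equations on the sphere.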
Now consider gradient descent through neural network parameter space. At each step, backpropagation computes local gradients: partial derivatives indicating which directions decrease the loss. Parameters update by moving opposite the gradient vector. But loss landscapes are curved, potentially highly so. Does a gradient computed at one point remain meaningful after the step? Are we implicitly assuming flatness when we treat gradients as globally valid descent directions?
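A toy example makes the worry concrete (an illustrative sketch, not from the source notes): on a quadratic loss whose curvature differs sharply across directions, the raw gradient is a poor descent direction, while a step corrected by the local curvature, in the spirit of Newton or natural-gradient methods, converges at the same rate along every axis.

```python
import numpy as np

# Illustrative sketch (not from the source notes): on the quadratic loss
# L(w) = 0.5 * w^T H w the gradient is H @ w. When curvature differs sharply
# across axes, the raw gradient points far from the minimum, while a step
# preconditioned by the local curvature (here H itself) treats all axes alike.

H = np.diag([100.0, 1.0])            # strongly anisotropic curvature
w_gd = np.array([1.0, 1.0])          # plain gradient descent iterate
w_pre = np.array([1.0, 1.0])         # curvature-corrected iterate
lr_gd, lr_pre = 0.009, 0.5           # GD step size limited by the stiffest axis

for _ in range(50):
    w_gd = w_gd - lr_gd * (H @ w_gd)                          # raw gradient step
    w_pre = w_pre - lr_pre * np.linalg.solve(H, H @ w_pre)    # preconditioned step

print("plain gradient descent:", w_gd)    # still far from 0 on the flat axis
print("curvature-corrected:   ", w_pre)   # both coordinates near 0
```

Plain gradient descent stalls along the flat direction because its step size is dictated by the stiffest one; the curvature-corrected update does not.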
The training dynamics reveal path-dependence. Networks don’t converge to arbitrary local minima—they follow structured trajectories through parameter space, first capturing coarse structure, then progressively refining details. Early updates establish basic decision boundaries; later updates sculpt intricate patterns. This progression suggests the loss landscape guides gradient descent along particular paths, much like geodesics emerge naturally on curved surfaces.
Geodesics Through Loss and Criticality
Geodesics are curves where the tangent vector parallel-transports itself—straight lines generalized to curved spaces. Initially parallel geodesics converge or diverge depending on curvature sign, which Einstein recognized as gravitational attraction: objects in free fall follow converging geodesics without any force acting between them.
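Written out in coordinates (the standard textbook form, stated here for reference rather than drawn from the source notes), a geodesic x^k(t) satisfies

$$\frac{d^2 x^k}{dt^2} + \Gamma^k_{ij}\,\frac{dx^i}{dt}\,\frac{dx^j}{dt} = 0,$$

so the same Christoffel symbols that correct the transport of an arbitrary vector now correct the transport of the curve's own velocity; setting them to zero recovers the straight lines of flat space.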
Could neural network training follow geodesics through parameter space? If loss landscapes possess intrinsic geometry—curvature from parameter interactions—then gradient descent might approximate geodesic flow. Momentum methods, which accumulate velocity from past gradients, could serve as correction terms that account for landscape curvature, preventing the myopic updates that pure local gradients would dictate.
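The analogy can be made concrete with the standard momentum update (a minimal sketch; the quadratic loss below is a hypothetical stand-in for a curved landscape, and nothing here claims momentum literally implements parallel transport):

```python
import numpy as np

# Minimal sketch of the analogy only: a momentum update accumulates a velocity
# from past gradients, so each step depends on the path taken so far rather
# than on the purely local gradient. The loss is a hypothetical quadratic.

def grad(w):
    H = np.diag([100.0, 1.0])        # assumed anisotropic curvature
    return H @ w

w = np.array([1.0, 1.0])
velocity = np.zeros_like(w)
lr, beta = 0.009, 0.9                # step size and momentum coefficient

for _ in range(200):
    g = grad(w)                      # local gradient at the current point
    velocity = beta * velocity + g   # blend in gradients from earlier points
    w = w - lr * velocity            # step along the accumulated direction

print("parameters after 200 momentum steps:", w)
```

Because the velocity blends gradients from earlier points on the trajectory, each step reflects the path taken so far rather than the local gradient alone, which is the loose sense in which it acts as a correction term.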
Information transmission through neural layers parallels vector transport through geometric spaces. Critical branching ratios maximize information transfer—subcritical networks lose signal strength, supercritical networks saturate, but critical networks preserve input patterns through layers of transformation. This optimal regime resembles geodesic motion: neither amplifying nor diminishing, maintaining structure through curved transformation spaces.
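A quick numerical check of the three regimes (an illustrative sketch with random linear layers; the width, depth, and gain values are assumptions, not figures from the source notes):

```python
import numpy as np

# Illustrative sketch: push a unit-norm signal through a deep stack of random
# linear layers with weights ~ N(0, gain^2 / width). The expected norm is
# multiplied by roughly `gain` per layer, so gain < 1 attenuates the signal,
# gain > 1 amplifies it, and gain = 1 approximately preserves it (critical).

rng = np.random.default_rng(0)
width, depth = 512, 50

for gain in (0.9, 1.0, 1.1):
    x = rng.standard_normal(width)
    x /= np.linalg.norm(x)                       # start with a unit-norm signal
    for _ in range(depth):
        W = rng.standard_normal((width, width)) * (gain / np.sqrt(width))
        x = W @ x                                # one layer of propagation
    print(f"gain {gain}: norm after {depth} layers = {np.linalg.norm(x):.3e}")
```

Only the gain-1 setting carries the signal through fifty layers at roughly its original scale; below it the norm collapses, above it the norm explodes.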
The mathematical structure appears universal: local rules operating on curved manifolds produce path-dependent outcomes that reflect global geometric properties. In cellular automata, simple update rules generate complex global computation. In parallel transport, local parallelism constraints accumulate rotation around closed loops. In neural training, local gradient steps traverse structured paths through parameter space.
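The cellular-automaton case is the easiest to see directly. The sketch below runs Rule 110, a textbook example of a simple local rule producing complex global structure; the grid size, step count, and single-cell initial state are illustrative choices.

```python
import numpy as np

# Minimal sketch: elementary cellular automaton Rule 110, a standard example
# of a simple local update rule generating complex global structure.

rule = 110
table = [(rule >> i) & 1 for i in range(8)]   # neighborhood code -> next state

width, steps = 64, 32
state = np.zeros(width, dtype=int)
state[width // 2] = 1                         # one live cell in the middle

for _ in range(steps):
    print(''.join('#' if c else '.' for c in state))
    left, right = np.roll(state, 1), np.roll(state, -1)   # periodic boundaries
    code = (left << 2) | (state << 1) | right # encode each 3-cell neighborhood
    state = np.array([table[c] for c in code])
```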
Computational Geometry as Fundamental Architecture
Perhaps learning is fundamentally a parallel transport problem—moving knowledge vectors through representational manifolds while preserving their essential structure. The Christoffel symbols of cognition would be the architectural constraints and inductive biases that correct for curvature in neural geometry. Training dynamics would trace geodesics through loss landscapes, naturally finding paths that balance rapid convergence with detailed refinement.
This perspective transforms optimization from blind search into geometric navigation—a computational process fundamentally shaped by the manifold it operates upon.
Source Notes
6 notes from 3 channels