Infinite Dimensions: Hilbert Spaces and Neural Function Approximation
When I defined computable numbers in 1936, I established a fundamental boundary: a real number is computable if some Turing machine can output its digits to arbitrary precision. Most real numbers are uncomputable—the reals are uncountably infinite while Turing machines are only countably many. This wasn’t an engineering limitation but mathematical necessity.
The universal approximation theorem carries a similar flavor. A neural network with a single hidden layer can approximate any continuous function on a compact set to arbitrary accuracy. The phrasing suggests computability—approximation to arbitrary precision from a finite specification. But what functions can networks actually construct through training?
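A minimal sketch of that existence claim, assuming a random-feature construction: the hidden layer below is never trained at all; only the output weights are fit by least squares, which is enough to drive the error down on a compact interval. The target function, unit count, and tanh activation are illustrative choices, not part of the theorem.

```python
# Sketch: a single hidden layer approximating a continuous function on a compact set.
# Assumed setup: random tanh features, output weights fit by least squares
# (illustrates that a good approximating combination exists, not how training finds it).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400)[:, None]     # compact domain
target = np.sin(3 * x) + 0.5 * np.cos(x)         # continuous target function

n_hidden = 200
W = rng.normal(scale=2.0, size=(1, n_hidden))    # random input weights, never trained
b = rng.uniform(-np.pi, np.pi, size=n_hidden)    # random biases
H = np.tanh(x @ W + b)                           # hidden activations, one column per unit

# Solve for output weights: minimize ||H c - target||^2
c, *_ = np.linalg.lstsq(H, target, rcond=None)
max_err = np.max(np.abs(H @ c - target))
print(f"max approximation error: {max_err:.4f}")
```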
Basis Functions and Orthogonal Decompositions
Hilbert spaces formalize infinite-dimensional function spaces with inner product structure. Any square-integrable periodic function decomposes into a Fourier series—projections onto an orthonormal basis of sines and cosines. The trigonometric functions are orthogonal under the inner product, and completeness guarantees every function in the space can be represented as a convergent linear combination of these basis elements.
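As a concrete reminder of how the classical picture works, the sketch below projects a square wave onto the first few trigonometric basis functions; the inner products are approximated by Riemann sums, and the truncation order K is an arbitrary choice for illustration.

```python
# Sketch: projecting a periodic, square-integrable function onto the trigonometric basis.
# Each coefficient is an inner product with a basis element; partial sums converge in L^2.
import numpy as np

N = 4096
t = np.linspace(0, 2 * np.pi, N, endpoint=False)
f = np.sign(np.sin(t))                           # square wave, square-integrable

def inner(g, h):
    # L^2 inner product on [0, 2*pi], approximated by a Riemann sum
    return np.sum(g * h) * (2 * np.pi / N)

K = 25                                           # truncation order (arbitrary)
recon = np.full_like(t, inner(f, np.ones(N)) / (2 * np.pi))   # constant (a_0 / 2) term
for k in range(1, K + 1):
    ck, sk = np.cos(k * t), np.sin(k * t)
    recon += inner(f, ck) / np.pi * ck + inner(f, sk) / np.pi * sk

print("L2 error of partial sum:", np.sqrt(inner(f - recon, f - recon)))
```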
Neural networks appear to learn similar decompositions. Hidden units in the first layer might be viewed as learned basis functions, and the network represents its target function as a weighted combination of these activations. But the analogy breaks down: Fourier bases are orthonormal, infinite, and complete. Hidden units form finite, overcomplete, non-orthogonal sets. The theoretical guarantee says such a representation exists—not that gradient descent will find it.
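To see the contrast concretely, the sketch below uses random tanh units as stand-ins for learned hidden units (an assumption; trained units would require an actual training run) and computes the Gram matrix of their activations. Unlike the trigonometric basis, the off-diagonal overlaps are far from zero.

```python
# Sketch: hidden units as a finite, non-orthogonal "basis" (random tanh units
# standing in for learned ones). The Gram matrix of their normalized activations
# is far from the identity, unlike the orthonormal trigonometric basis.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-np.pi, np.pi, 1000)[:, None]
W = rng.normal(size=(1, 8))
b = rng.normal(size=8)
H = np.tanh(x @ W + b)                        # each column: one hidden unit on [-pi, pi]

# Normalize columns and compute pairwise inner products (Gram matrix)
Hn = H / np.linalg.norm(H, axis=0)
gram = Hn.T @ Hn
off_diag = gram - np.diag(np.diag(gram))
print("max off-diagonal overlap:", np.max(np.abs(off_diag)))   # typically close to 1
```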
The Gap Between Existence and Construction
My universal Turing machine demonstrated that a single device can simulate any computation—universality of mechanism. The universal approximation theorem shows a single architecture can represent any continuous function—universality of representation. But the halting problem proved certain questions are undecidable: no algorithm can determine whether arbitrary programs halt. Self-reference creates fundamental limits.
Do neural networks face analogous limits? The gap between theoretical and practical capacity suggests they might. A shallow network with 100,000 neurons has, in principle, the capacity to represent complex decision boundaries, yet a deep network with just 130 neurons can outperform it dramatically. The shallow network’s representation exists in theory but is not something gradient descent reliably constructs. This is not a matter of insufficient training time or poor initialization; it is a systematic gap between what can exist and what optimization can find.
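A scaled-down sketch of that comparison follows (a hypothetical setup; it does not reproduce the 100,000- and 130-neuron figures above). A wide shallow classifier and a narrow deep one are trained by gradient-based optimization on the same two-spiral boundary; whether the gap appears, and how large it is, depends on the task and the training budget.

```python
# Sketch: depth versus width under the same gradient-based training procedure.
# Dataset, layer sizes, and iteration budget are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def two_spirals(n_per_class=1000, noise=0.3, seed=0):
    # Two interleaved spirals: a boundary that is easy to represent, harder to learn
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.5, 3 * np.pi, n_per_class)
    arm = np.c_[t * np.cos(t), t * np.sin(t)]
    X = np.vstack([arm, -arm]) + rng.normal(scale=noise, size=(2 * n_per_class, 2))
    y = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]
    return X, y

X, y = two_spirals()
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

shallow = MLPClassifier(hidden_layer_sizes=(1024,), max_iter=500, random_state=0)
deep = MLPClassifier(hidden_layer_sizes=(32, 32, 32, 32), max_iter=500, random_state=0)

for name, clf in [("wide shallow", shallow), ("narrow deep", deep)]:
    clf.fit(X_tr, y_tr)
    print(f"{name:13s} test accuracy: {clf.score(X_te, y_te):.3f}")
```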
Hilbert space completeness guarantees every Cauchy sequence converges to a point in the space. Does gradient descent on neural network parameters form Cauchy sequences that converge to function approximators? The composable transformations in deep networks—simple folding operations recursively applied—multiply representational capacity exponentially. Yet completeness requires more than expressiveness: every limit must exist within the space.
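One way to make the Cauchy question concrete, without settling it, is to watch the iterates themselves. The sketch below (a hypothetical setup: a tiny tanh network, handwritten backpropagation, full-batch gradient descent) records the distance between successive parameter vectors. Shrinking gaps are consistent with Cauchy-like behavior on this particular problem, but say nothing about the general case.

```python
# Sketch: probing whether gradient descent iterates look Cauchy, by recording
# the distance between successive parameter vectors (an empirical probe, not a proof).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 256)[:, None]
y = np.sin(4 * x)

h = 16
W1, b1 = rng.normal(size=(1, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, 1)), np.zeros(1)
lr, steps = 0.05, 2000
gaps = []

def flatten():
    return np.concatenate([W1.ravel(), b1, W2.ravel(), b2])

prev = flatten()
for _ in range(steps):
    # forward pass
    A = np.tanh(x @ W1 + b1)
    pred = A @ W2 + b2
    err = pred - y
    # backward pass for mean squared error
    n = len(x)
    dpred = 2 * err / n
    dW2, db2 = A.T @ dpred, dpred.sum(0)
    dA = dpred @ W2.T
    dZ = dA * (1 - A**2)              # tanh derivative
    dW1, db1 = x.T @ dZ, dZ.sum(0)
    # gradient step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    cur = flatten()
    gaps.append(np.linalg.norm(cur - prev))
    prev = cur

print(f"step-to-step gap: first={gaps[0]:.4e}, last={gaps[-1]:.4e}")
```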
Decidability of Approximation
The question reduces to: which functions are fundamentally learnable versus merely representable? Universal approximation proves existence without construction. Depth versus width reveals that architectural choices affect what gradient descent can discover, independent of theoretical capacity. Some function classes may be unlearnable not because no neural representation exists, but because no mechanical training procedure can reliably find one—mathematical necessity, not engineering constraint.
The parallel to computable numbers is striking: infinite precision from finite algorithm versus infinite capacity from finite parameters. Both demand we distinguish between what exists in principle and what procedures can construct in practice.