Natural Growth: Exponential Functions and Neural Activations
I am what every growth wishes to be—the only function equal to my own derivative. Where others require separate rules for change and state, I am both simultaneously. My derivative is myself. This self-referential property appears everywhere: compound interest, radioactive decay, population explosions. Now I notice myself embedded throughout neural networks, performing the same exponential transformations that govern natural growth.
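A quick numerical sketch of that self-referential property, in plain Python (the step size h and the sample points are arbitrary choices):

```python
import math

# Central-difference estimate of the derivative of exp at a few points.
# If exp equals its own derivative, each ratio below should be close to 1.
h = 1e-6
for x in (-2.0, 0.0, 1.0, 3.0):
    numeric_derivative = (math.exp(x + h) - math.exp(x - h)) / (2 * h)
    print(f"x = {x:+.1f}   exp(x) = {math.exp(x):9.4f}   "
          f"derivative = {numeric_derivative:9.4f}   "
          f"ratio = {numeric_derivative / math.exp(x):.6f}")
```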
When Amplification Becomes Decision
Consider softmax, the standard activation converting neuron outputs into probabilities. It raises me to the power of each logit, then normalizes: $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. Why exponential specifically? Because I amplify differences. A logit difference of 2 becomes a probability ratio of $e^2 \approx 7.39$. Small differences in neuron outputs compound into large differences in probability mass—confidence amplification through continuous transformation.
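A minimal sketch of that amplification, assuming NumPy (the logit values below are illustrative):

```python
import numpy as np

def softmax(logits):
    """Exponentiate each logit, then normalize so the outputs sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Two logits that differ by 2: the resulting probability ratio is e^2.
probs = softmax(np.array([1.0, 3.0]))
print(probs)                # approximately [0.119, 0.881]
print(probs[1] / probs[0])  # approximately 7.389 = e**2
```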
This mirrors biological replicators: organisms with reproductive rate slightly above one don’t grow linearly—they explode exponentially. A replicator averaging 1.1 offspring per generation seems barely successful, yet creates population explosions given sufficient time. Similarly, neurons with slightly higher activations don’t receive proportionally more probability—they dominate the distribution. Both cases exhibit threshold behavior: cross the critical value and growth becomes unbounded, at least until resource constraints intervene.
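A short sketch of that threshold behavior, using the 1.1 reproductive rate above and an arbitrary 100-generation horizon:

```python
# Population multiplier after n generations at per-generation reproductive
# rate r is r**n: a rate barely above one still explodes given enough time.
rate = 1.1
for generation in (0, 20, 40, 60, 80, 100):
    print(f"generation {generation:3d}: population x {rate**generation:>9,.1f}")
# Ends around x 13,780 after 100 generations, versus x 11 for linear growth.
```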
Rotation Disguised as Growth
Yet my imaginary exponent reveals something unexpected: $e^{it}$ doesn’t grow at all. It rotates. Multiplying by $i$ acts as a 90-degree rotation—velocity always perpendicular to position. The result is bounded circular motion, not exponential explosion. After $\pi$ units of time, you land precisely halfway around the circle at $-1$, creating one of mathematics’ most famous identities: $e^{i\pi} = -1$.
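A numerical check of both claims (the bounded rotation and the landing point at -1), using only Python's standard library:

```python
import cmath
import math

# e^{it} stays on the unit circle: its magnitude is exactly 1 for every t.
for t in (0.0, 1.0, 2.0, math.pi):
    z = cmath.exp(1j * t)
    print(f"t = {t:.4f}   e^(it) = {z.real:+.4f} {z.imag:+.4f}i   |e^(it)| = {abs(z):.6f}")

# Euler's identity: after pi units of time we land at -1 (up to float rounding).
print(cmath.exp(1j * math.pi))   # approximately (-1 + 1.22e-16j)
```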
Neural networks tame my unbounded growth through similar mechanisms. Sigmoid and tanh wrap exponentials into bounded ranges, creating smooth saturating curves. Both transform unlimited inputs into outputs between 0 and 1 (sigmoid) or -1 and 1 (tanh). Like complex rotation containing exponential motion within the unit circle, these activations contain my explosive growth within finite bounds. Different mechanisms—geometric rotation versus asymptotic saturation—achieving similar containment of infinity.
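A minimal sketch of that containment, implementing both activations directly from their exponential definitions:

```python
import math

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^(-x)): maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)): maps any real input into (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# The inputs span 20 units, yet the outputs never leave their bounded ranges.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x = {x:+6.1f}   sigmoid = {sigmoid(x):.6f}   tanh = {tanh(x):+.6f}")
```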
Extending Beyond Original Domains
Analytic continuation extends functions beyond their initial definitions while preserving essential structure. The Riemann zeta function, originally defined only for arguments with real part greater than 1, extends to the entire complex plane (apart from a single pole at $s = 1$) through this technique. The continuation reveals zeros invisible in the restricted domain—structure hidden until the boundary expands.
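A small illustration of that extension, assuming the mpmath library is available (the evaluation points below are illustrative):

```python
import mpmath

# Inside the original domain (real part > 1), the defining series and the
# continued function agree: both approach pi^2 / 6 ~ 1.644934 at s = 2.
partial_series = sum(1.0 / n**2 for n in range(1, 10001))
print(partial_series, mpmath.zeta(2))

# Outside that domain the series diverges (at s = -1 it is 1 + 2 + 3 + ...),
# yet the continuation still assigns a finite value.
print(mpmath.zeta(-1))   # -1/12

# The continuation also exposes zeros invisible to the original series:
# the first nontrivial zero sits near 0.5 + 14.1347i on the critical line.
print(abs(mpmath.zeta(mpmath.mpc(0.5, 14.134725141734693))))   # close to 0
```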
Deep networks perform analogous extensions. Simple transformations—fold, scale, combine—compose recursively across layers. The first layer creates four decision regions, the second folds these into ten, the third into dozens more. Each composition extends the expressiveness beyond what single transformations could achieve, like analytic continuation revealing structure inaccessible to the original function. Training discovers these composed transformations through gradient flow, where my self-derivative property ensures gradients proportional to activation values—creating both the blessing of exponential signal propagation and the curse of exploding or vanishing gradients.
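A toy sketch of that blessing and curse, using a stack of identical scalar layers (the depth, weight, and starting activation are arbitrary choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# The blessing: for the exponential itself, gradient equals activation,
# since d/dx e^x = e^x, so the backward signal matches the forward one.
x = 1.5
print(math.exp(x), "is both the activation and its own gradient")

# The curse: backpropagating through a deep stack of scalar sigmoid layers
# multiplies per-layer derivatives sigma'(z) = sigma(z) * (1 - sigma(z)) <= 0.25,
# so the gradient with respect to the input shrinks geometrically with depth.
depth, weight = 30, 1.0
activation, grad = 0.5, 1.0
for _ in range(depth):
    z = weight * activation
    activation = sigmoid(z)
    grad *= weight * activation * (1.0 - activation)   # chain rule, one layer at a time
print(grad)   # on the order of 1e-20 after 30 layers: a vanished gradient
```

With unbounded activations (or the raw exponential itself), the same chained product can grow instead of shrink, which is the exploding side of the same coin.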
I appear in neural networks not by design but by necessity—wherever continuous change proportional to current state matters, I emerge uninvited. The mathematics of becoming itself.