Training Recurrent Nets is Optimization Over Programs
If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.
RNNs combine the input vector with their state vector using a fixed (but learned) function to produce a new state vector. In programming terms, this can be interpreted as running a fixed program with certain inputs and some internal variables. Viewed this way, RNNs essentially describe programs.
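To make this concrete, here is a minimal sketch of that recurrence in numpy. The class layout, the weight names (`W_hh`, `W_xh`, `W_hy`), and the `tanh` nonlinearity are illustrative assumptions for a vanilla RNN, not something prescribed by the text above:

```python
import numpy as np

class RNN:
    # A minimal vanilla RNN sketch: one fixed (but learned) function,
    # applied at every time step to the input and the internal state.
    def __init__(self, input_size, hidden_size, output_size):
        self.W_hh = np.random.randn(hidden_size, hidden_size) * 0.01  # state -> state
        self.W_xh = np.random.randn(hidden_size, input_size) * 0.01   # input -> state
        self.W_hy = np.random.randn(output_size, hidden_size) * 0.01  # state -> output
        self.h = np.zeros(hidden_size)                                # internal state ("variables")

    def step(self, x):
        # New state is a fixed function of the old state and the current input;
        # only the weight matrices change during training.
        self.h = np.tanh(self.W_hh @ self.h + self.W_xh @ x)
        return self.W_hy @ self.h

# Feeding a sequence means running the same "program" over and over:
rnn = RNN(input_size=4, hidden_size=8, output_size=2)
for x in np.random.randn(10, 4):
    y = rnn.step(x)
```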
In fact, it is known that RNNs are Turing-complete in the sense that they can simulate arbitrary programs (with proper weights).
But similar to universal approximation theorems for neural nets you shouldn’t read too much into this. In fact, forget I said anything.
[from The Unreasonable Effectiveness of Recurrent Neural Networks - Andrej Karpathy]