RL can be used to train Non-Differentiable Models
In The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy writes:
> My personal favorite RNNs in Computer Vision paper is Recurrent Models of Visual Attention, both due to its high-level direction (sequential processing of images with glances) and the low-level modeling (REINFORCE learning rule that is a special case of policy gradient methods in Reinforcement Learning, which allows one to train models that perform non-differentiable computation (taking glances around the image in this case)).
- The paper Recurrent Models of Visual Attention uses Reinforcement Learning to train a non-differentiable model on vision tasks such as MNIST digit classification. An RNN moves a glimpse window over regions of the image, and after \(T\) steps the digit is predicted. The attention part is trained with RL, while the prediction part is trained with ordinary backpropagation (a minimal sketch of this split follows this list).
- The paper Reinforcement Learning Neural Turing Machines improves upon Neural Turing Machines by using discrete memory access instead of continuous access. The choice of memory locations to read from and write to is trained with RL.
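To make the split concrete, here is a minimal sketch, assuming PyTorch: a discrete, non-differentiable "glimpse" choice is trained with REINFORCE, while the classifier beneath it is trained with ordinary cross-entropy backpropagation. This is a toy illustration, not the paper's actual architecture; the region count, layer sizes, and reward definition are all assumptions made for the example.

```python
# A minimal REINFORCE sketch (not the paper's architecture): a hard,
# non-differentiable "glimpse" choice is trained with a policy gradient,
# while the classifier under it is trained with standard backprop.
# All shapes, names, and the toy task below are illustrative assumptions.
import torch
import torch.nn as nn
from torch.distributions import Categorical

NUM_REGIONS, REGION_DIM, NUM_CLASSES = 4, 16, 10

# Policy: scores the candidate image regions; sampling from it is the
# non-differentiable step that REINFORCE handles.
policy = nn.Linear(NUM_REGIONS * REGION_DIM, NUM_REGIONS)
# Classifier: predicts the digit from the single attended region;
# trained with ordinary cross-entropy / backpropagation.
classifier = nn.Linear(REGION_DIM, NUM_CLASSES)
opt = torch.optim.Adam(
    list(policy.parameters()) + list(classifier.parameters()), lr=1e-3
)

def train_step(regions, label):
    # regions: (NUM_REGIONS, REGION_DIM) candidate glimpses of one image
    dist = Categorical(logits=policy(regions.flatten()))
    idx = dist.sample()                      # hard, non-differentiable choice
    logits = classifier(regions[idx])
    cls_loss = nn.functional.cross_entropy(
        logits.unsqueeze(0), label.unsqueeze(0)
    )

    # REINFORCE: reward = 1 if the chosen glimpse led to a correct prediction.
    reward = (logits.argmax() == label).float()
    rl_loss = -dist.log_prob(idx) * reward   # gradient flows through log-prob only

    opt.zero_grad()
    (cls_loss + rl_loss).backward()
    opt.step()

# Toy usage with random data standing in for MNIST glimpses.
for _ in range(100):
    train_step(torch.randn(NUM_REGIONS, REGION_DIM),
               torch.randint(0, NUM_CLASSES, ()))
```

The key point is that the gradient for the hard choice comes from the log-probability of the sampled action weighted by the reward, so nothing needs to be differentiable through the sampling itself; only the classifier's loss backpropagates in the usual way.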