Jürgen Schmidhuber
1. Artificial Curiosity Since 1990-91
The Problem of Coming Up with Good Questions:
There are two important things in science:
1. Finding answers to given questions,
2. Coming up with good questions.
But how to implement the creative part (2) in artificial systems?
- through reinforcement learning (RL)
- gradient-based artificial neural networks (NNs),
- or other machine learning methods?
Through Generative Adversarial Networks:
The first NN is the controller C. C probabilistically generates outputs that may influence an environment. The second NN is the world model M. It predicts the environmental reactions to C's outputs. Using gradient descent, M minimizes its error, thus becoming a better predictor. But in a zero-sum game, the reward-maximizing C tries to find sequences of output actions that maximize the error of M. Thus M's loss is C's gain.
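To make this concrete, here is a minimal PyTorch sketch of the zero-sum CM game; it is an illustration, not the original 1990 implementation. The toy environment `env_step`, the linear models, and all hyperparameters are invented for the example, and the environment is made differentiable so that C can be trained by plain gradient ascent on M's error rather than by full reinforcement learning.

```python
import torch

# Hypothetical toy environment: reacts deterministically to an action.
# It is differentiable here only so C can be trained by backprop; in
# general C would be trained by reinforcement learning.
def env_step(action):
    return torch.sin(3.0 * action)

M = torch.nn.Linear(1, 1)   # world model: predicts the reaction to C's output
C = torch.nn.Linear(1, 1)   # controller: generates actions from observations
opt_M = torch.optim.SGD(M.parameters(), lr=1e-2)
opt_C = torch.optim.SGD(C.parameters(), lr=1e-2)

obs = torch.randn(1, 1)
for step in range(1000):
    action = C(obs) + 0.1 * torch.randn(1, 1)   # probabilistic outputs
    reaction = env_step(action)

    # M minimizes its prediction error by gradient descent.
    loss_M = (M(action.detach()) - reaction.detach()).pow(2).mean()
    opt_M.zero_grad(); loss_M.backward(); opt_M.step()

    # Zero-sum game: C's reward is M's error, so C minimizes the negative error.
    loss_C = -(M(action) - reaction).pow(2).mean()
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()

    obs = reaction.detach()
```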
The term generative adversarial networks (GANs) is actually a new name for an instance of the principle published in 1990.
Curiosity Through Maximizing Learning Progress:
An agent controlled by C might get stuck in front of a TV screen showing highly unpredictable white noise. Therefore, in stochastic environments, C's reward should not be the errors of M, but an approximation of the first derivative of M's errors across subsequent training iterations.
C's reward should be M's learning progress or improvements. As a consequence, despite M's high errors in front of the noisy TV above, C won't get rewarded for getting stuck there, simply because M's errors won't improve: both the totally predictable and the fundamentally unpredictable will get boring.
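A minimal sketch of this learning-progress reward, reusing the world model M and its optimizer from the PyTorch sketch above (names and setup are again illustrative): C's curiosity reward becomes the decrease of M's error across one training iteration, which stays near zero on unlearnable noise.

```python
def learning_progress_reward(M, opt_M, action, reaction):
    """Return the improvement of M's prediction error after one training step."""
    err_before = (M(action) - reaction).pow(2).mean()
    opt_M.zero_grad(); err_before.backward(); opt_M.step()
    with torch.no_grad():
        err_after = (M(action) - reaction).pow(2).mean()
    # Approximates the first derivative of M's errors across training
    # iterations: ~0 for pure white noise (nothing to learn), so noise
    # gets boring, as does anything M already predicts perfectly.
    return (err_before - err_after).item()
```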
- RL for Maximizing Information Gain or Bayesian Surprise:
- Adversarial agents design surprising computational experiments:
Maximizing compression progress like scientists and artists do:
I have frequently pointed out that the history of science is a history of data compression progress through incremental discovery of simple laws that govern seemingly complex observation sequences.
Read here how my Formal Theory of Fun uses the concept of compression progress to explain not only science but also art, music, and humor. Take humor for example.
Consider the following statement: Biological organisms are driven by the "Four Big F's": Feeding, Fighting, Fleeing, Mating. Some subjective observers who read this for the first time think it is funny. Why? The punch line after the last comma is unexpected for those who expected another "F." Initially, this failed expectation results in sub-optimal data compression: storage of expected events does not cost anything, but deviations from predictions require extra bits to encode them. The compressor, however, does not stay the same forever: within a short time interval, its learning algorithm kicks in and improves its performance on the data seen so far. The number of saved bits (or a similar measure of learning progress) becomes the observer's intrinsic reward, possibly strong enough to motivate her to read on in search of more reward through additional, yet unknown patterns.
While previous attempts at explaining humor also focused on the element of surprise, they lacked the essential concept of novel pattern detection measured by compression progress due to learning. This progress is zero whenever the unexpected is just random white noise, and thus no fun at all.
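The following toy sketch illustrates compression progress as intrinsic reward; it is only an analogy, with the off-the-shelf zlib compressor standing in for the observer's adaptive compressor, and the jump from a weak to a strong compression level standing in for the learning algorithm's improvement.

```python
import os
import zlib

def compression_progress(history: bytes) -> int:
    """Bits saved when an improved compressor re-encodes the data seen so far."""
    bits_before = 8 * len(zlib.compress(history, 1))  # weak compressor: before learning
    bits_after = 8 * len(zlib.compress(history, 9))   # strong compressor: after learning
    return bits_before - bits_after                   # saved bits = intrinsic reward

# Regular data has discoverable patterns, so progress (reward) is possible;
# random noise is incompressible, so progress stays near zero: no fun.
print(compression_progress(b"Feeding, Fighting, Fleeing, " * 20))
print(compression_progress(os.urandom(560)))
```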
Does curiosity distort the basic RL problem?
The controller/model systems above (aka CM systems) typically maximize the sum of standard external rewards (for achieving user-given goals) and intrinsic curiosity rewards. Does this distort the basic RL problem? Not much, it turns out: in totally learnable environments, the intrinsic reward even vanishes next to the external reward in the long run.
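As a toy illustration of this point (the weighting and the decay schedule below are invented, not taken from the original papers), a CM system's total reward sums the external and intrinsic parts; in a fully learnable environment, M's improvements shrink over time, so the curiosity term fades:

```python
def total_reward(external: float, curiosity: float, eta: float = 1.0) -> float:
    """Standard external reward plus weighted intrinsic curiosity reward."""
    return external + eta * curiosity

# Hypothetical decay: in a totally learnable environment, M keeps improving
# less and less, so the intrinsic reward vanishes next to the external one.
for t in range(0, 50, 10):
    print(t, total_reward(external=1.0, curiosity=2.0 * 0.9 ** t))
```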