Imitative Reinforcement Learning as an AGI Approach

by TIMUR ZEKI VURAL, 21st May 2018

I've been thinking that reinforcement-learning-driven imitation between agents may be an explanation of human intelligence, and is worth exploring more as an approach to AGI.

With most optimization objectives, such as "acquire food", it is difficult to get agents to exhibit the complex behaviors humans do. But rewarding agents for imitating each other, in addition to satisfying basic needs, is an efficient way to build up complex survival strategies. Babies, for example, have no idea when they start talking how speech will benefit them years later; they just observe adults talking a lot and find that they get a dopamine hit from talking in response. Humans have been so much more successful than chimpanzees mostly because our dopamine system rewards imitation more strongly, not because we are better at discovering new things directly from the environment. After all, it would seem strange if a little extra optimization of an already highly evolved system, in the time since humans and chimpanzees diverged, had produced such a gigantic gap in the success of the two species. More plausibly, two existing systems, social behavior and the dopamine system, became linked together.
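To make the reward idea above concrete, here is a minimal toy sketch, assuming a single learner that watches one peer's action each step and is rewarded with the environment's reward plus a bonus for matching that peer. The names `shaped_reward` and `train_imitator` and the weight `beta` are hypothetical, and the learner is a simple tabular epsilon-greedy contextual bandit rather than a full reinforcement learning agent; it illustrates the general idea, not the author's method.

```python
import random


def shaped_reward(env_reward: float, action: int, peer_action: int, beta: float = 0.5) -> float:
    """Basic-needs reward from the environment plus a bonus for copying a peer.

    `beta` is a hypothetical weight trading imitation off against env_reward.
    """
    imitation_bonus = 1.0 if action == peer_action else 0.0
    return env_reward + beta * imitation_bonus


def train_imitator(n_actions, peer_policy, env_reward_fn, beta=0.5,
                   steps=2000, epsilon=0.1, lr=0.1, seed=0):
    """Epsilon-greedy tabular learner whose context is the peer's observed action.

    q[observed][action] is a running estimate of the shaped reward for taking
    `action` right after seeing the peer take `observed`.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_actions)]
    for _ in range(steps):
        observed = peer_policy(rng)          # watch what the peer does
        if rng.random() < epsilon:           # occasionally explore at random
            action = rng.randrange(n_actions)
        else:                                # otherwise pick the best-valued action
            row = q[observed]
            action = max(range(n_actions), key=lambda a: row[a])
        r = shaped_reward(env_reward_fn(action), action, observed, beta)
        q[observed][action] += lr * (r - q[observed][action])  # incremental average
    return q


if __name__ == "__main__":
    # Toy setting: the environment pays nothing yet (like a baby's first words),
    # and the peer always does action 2, so only the imitation bonus drives learning.
    q = train_imitator(4, peer_policy=lambda rng: 2, env_reward_fn=lambda a: 0.0)
    best = max(range(4), key=lambda a: q[2][a])
    print("greedy action after seeing the peer do 2:", best)
```

In this toy setting, setting `beta` to 0 removes the only learning signal, so the agent never settles on copying the peer. When the environment does pay some reward, `beta` controls how much the agent copies its peers versus exploiting the environment directly.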

Current machine learning focuses mostly on learning directly from an environment. It has succeeded in domains where humans learn the same way, like image processing, while lagging severely in areas like natural language, where humans learn imitatively. DeepMind pioneered the combination of deep sensory neural nets with dopamine-like reinforcement learning, in their case to play Atari games. However, Atari games involve only one or two agents, so there was no incentive to use imitation learning.

It's obvious that computers would be incentivized to learn imitatively if they want to tap into our existing knowledge base rather than rediscover everything from scratch. But even if they do rediscover everything from scratch, or operate in an environment other than the real world, they may still be incentivized to run many agents that learn from each other. The strongest reason is that parallelism is where future processor improvements will come from: circuits can't get much denser, but in theory they can get almost arbitrarily cheap. The secondary reason is that when context-switching costs are high, many slow agents that collaborate can be more efficient than one fast agent.