From GPT to AGI

by ChristianKl, 31st Aug 2020



Epistemic status: Shower thoughts / I have an idea, but not much knowledge

While the smallest GPT-3 model (125M parameters) has 12 attention layers, each with 12 heads of dimension 64, the largest GPT-3 model (175B parameters) uses 96 attention layers, each with 96 heads of dimension 128.
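As a quick sanity check on these figures: in GPT-3 the model width d_model equals the number of heads times the head dimension, and a standard rule of thumb from the scaling-law literature (not from this post) puts the non-embedding parameter count at roughly 12 · n_layers · d_model². The sketch below applies that approximation. It ignores biases, layer norms and the embedding matrices, so it is only a rough estimate.

```python
def model_width(n_heads: int, d_head: int) -> int:
    """d_model is the concatenation of all attention heads."""
    return n_heads * d_head


def approx_non_embedding_params(n_layers: int, d_model: int) -> int:
    """Rule-of-thumb estimate per layer: ~4*d^2 for attention (Q, K, V and
    output projections) plus ~8*d^2 for a feed-forward block whose hidden
    layer is 4*d wide."""
    return 12 * n_layers * d_model ** 2


configs = {
    "GPT-3 Small (125M)": (12, 12, 64),  # (layers, heads, head dimension)
    "GPT-3 175B": (96, 96, 128),
}

for name, (layers, heads, d_head) in configs.items():
    d_model = model_width(heads, d_head)
    params = approx_non_embedding_params(layers, d_model)
    print(f"{name}: d_model={d_model}, ~{params / 1e6:,.0f}M non-embedding params")
```

The small model comes out near 85M; most of the rest of its 125M parameters sit in the token embedding matrix (a vocabulary of about 50k tokens times 768). For the 175B model the estimate is about 174B, so nearly all of its parameters are in the layers themselves.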

I would expect that, as model size grows, it will be possible to greatly enlarge the attention field (the context window, which is 2048 tokens for every GPT-3 model) without much need for additional AI insight.

An AI Dungeon guide describes a pin item that serves as the context for the story. GPT-3 seems to be good at understanding the goals set in the pin item, to the point that it tries to achieve them faster than the guide's author would like.

If the attention field were larger, it would allow for multiple pin items, and each pin item could also be longer.
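To make the trade-off concrete: pinned text and recent story text have to share one fixed token budget, so a bigger context window is what leaves room for more (or longer) pin items. The sketch below is hypothetical. It counts whitespace-separated words as a stand-in for real BPE tokens, and `build_prompt` simply drops the oldest story text that no longer fits.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in: real GPT models count BPE tokens, not words.
    return len(text.split())


def build_prompt(pins: list[str], story: list[str], context_window: int) -> str:
    """Put every pin item first, then as much of the most recent story as fits."""
    budget = context_window - sum(count_tokens(p) for p in pins)
    if budget < 0:
        raise ValueError("pin items alone exceed the context window")

    kept: list[str] = []
    for passage in reversed(story):  # walk from the newest passage backwards
        cost = count_tokens(passage)
        if cost > budget:
            break  # this passage and everything older is dropped
        kept.append(passage)
        budget -= cost
    return "\n".join(pins + list(reversed(kept)))


pins = ["You are a knight.", "Goal: find the dragon slowly."]
story = ["You enter the forest.", "A wolf howls nearby.", "You see smoke."]
print(build_prompt(pins, story, context_window=16))
```

With a 16-word window the two pins use 9 words, so only the two newest story passages survive; every extra pin item directly pushes older story text out.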

In AI Dungeon, GPT-3 can't influence the content of the pin items, but it would be possible to give GPT-3 the ability to write into pin items via console commands. That would give it memory abilities similar to the short-term and medium-term memory that humans have.
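One way to picture this memory idea: the harness around the model watches its output for special commands and uses them to edit the pin items that get prepended to the next prompt. The command syntax (`PIN <slot>: <text>` and `UNPIN <slot>`) and the `PinMemory` class are invented for illustration and are not part of AI Dungeon or any API; the model call itself is left out.

```python
import re

# Hypothetical command syntax the model would be prompted to emit.
PIN_CMD = re.compile(r"^PIN (\w+): (.+)$")
UNPIN_CMD = re.compile(r"^UNPIN (\w+)$")


class PinMemory:
    """Pin items the model can rewrite itself, keyed by slot name."""

    def __init__(self) -> None:
        self.pins: dict[str, str] = {}

    def apply_commands(self, model_output: str) -> str:
        """Execute memory commands in the output and return the other lines."""
        story_lines = []
        for line in model_output.splitlines():
            if m := PIN_CMD.match(line):
                self.pins[m.group(1)] = m.group(2)  # create or overwrite a slot
            elif m := UNPIN_CMD.match(line):
                self.pins.pop(m.group(1), None)  # forget a slot if it exists
            else:
                story_lines.append(line)
        return "\n".join(story_lines)

    def prefix(self) -> str:
        """Text to prepend to the next prompt."""
        return "\n".join(f"[{slot}] {text}" for slot, text in self.pins.items())


memory = PinMemory()
rest = memory.apply_commands("PIN goal: find the dragon\nYou walk north.\nPIN ally: a wolf")
print(rest)
print(memory.prefix())
```

Keeping the commands in plain text means the same model that writes the story also decides what to remember; the harness only enforces the format.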

The ability to interact with the Unix console was already demonstrated by the Natural Language Shell example on the OpenAI website.

At first, the resulting agent could be mentored, the same way AI Dungeon lets users mentor the AI. If the AI is able to query Google via the console, I would imagine that it could be effective at many tasks.
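The mentoring step could be as simple as a human approving each console command before it runs. The sketch below is hypothetical: `SEARCH <query>` is an invented command, `search` stands in for whatever web-search tool the agent is given (no real Google API is called), and `approve` is the mentor's yes/no decision.

```python
from typing import Callable


def run_tool_call(
    command: str,
    search: Callable[[str], str],
    approve: Callable[[str], bool],
) -> str:
    """Run one model-issued command if the mentor approves it.

    The returned string would be appended to the model's next prompt.
    """
    if not approve(command):
        return "MENTOR: command rejected"
    if command.startswith("SEARCH "):
        return search(command[len("SEARCH "):])
    return f"ERROR: unknown command {command!r}"


def fake_search(query: str) -> str:
    # Placeholder for a real search tool.
    return f"results for {query}"


print(run_tool_call("SEARCH GPT-3 context window", fake_search, lambda cmd: True))
```

Rejected or unknown commands return a message instead of failing silently, so the model can see that the mentor intervened and adjust its next step.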

Many newspaper articles could be written by such an agent, one that scouts the internet for information already published on the topic and synthesizes it. Such agents could also be used to wage cyberwar.

I'd be happy to hear what other people think about such an agent.