Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Steven Byrnes called for a brain-like AGI research agenda, and three guys from the Hamburg EA community listened.

We are excited about Steven's five-star program 15.2.1.2, "Reverse-engineer human social instincts," and kicked off work a few weeks ago, in June. We have familiarized ourselves with Steven's brain-like AGI framework and now meet weekly.

This post is an announcement and a request for feedback and collaboration.

Why us?

We have a great skill fit: 

  • A professional data scientist with the machine learning experience required to implement RL agents.
  • A professional Python developer with a game-programming background to implement the world model and visualization.
  • A seasoned software engineer with startup CTO experience who takes care of everything else, e.g., this blog post (me).

What have we done so far?

We have already implemented a toy world and a simple RL agent as a first iteration of Steven's framework, building on top of the Python multi-agent RL framework PettingZoo. Our code lives in a private GitHub repo, which we believe should stay private given the potential impact; we are looking for thoughts on this.
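To give a flavor of the setup, here is a minimal sketch of what such a toy world can look like on top of PettingZoo. This is illustrative only, not our actual code: all class, agent, and reward names are made up, and it assumes a recent PettingZoo release that uses Gymnasium spaces.

```python
# Illustrative sketch (not our actual code) of a PettingZoo-style toy world
# with a single agent and a stand-in "food-seeking" reward.
import functools

import numpy as np
from gymnasium import spaces
from pettingzoo import ParallelEnv


class ToyGridWorld(ParallelEnv):
    metadata = {"name": "toy_grid_world_v0"}

    def __init__(self, size: int = 7):
        self.size = size
        self.possible_agents = ["agent_0"]

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        # Agent position followed by food position, flattened.
        return spaces.Box(low=0, high=self.size - 1, shape=(4,), dtype=np.int64)

    @functools.lru_cache(maxsize=None)
    def action_space(self, agent):
        return spaces.Discrete(4)  # up, down, left, right

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        rng = np.random.default_rng(seed)
        self.pos = rng.integers(0, self.size, size=2)
        self.food = rng.integers(0, self.size, size=2)
        obs = {a: np.concatenate([self.pos, self.food]) for a in self.agents}
        return obs, {a: {} for a in self.agents}

    def step(self, actions):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[actions["agent_0"]], 0, self.size - 1)
        reached = bool((self.pos == self.food).all())
        # Stand-in for an innate drive: reward for reaching food, small step cost.
        rewards = {"agent_0": 1.0 if reached else -0.01}
        obs = {"agent_0": np.concatenate([self.pos, self.food])}
        terminations = {"agent_0": reached}
        truncations = {"agent_0": False}
        infos = {"agent_0": {}}
        if reached:
            self.agents = []
        return obs, rewards, terminations, truncations, infos
```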

We have collected a list of more than 60 candidate instincts from neuroscience and other sources that we can implement and experiment with.

The project website will be here: https://www.aintelope.net/ (effectively empty right now).

The project and our progress so far were presented at the Human-aligned AI Summer School in Prague on August 5th, where we got feedback on the project and on brain-like AGI in general, and found three participants who want to collaborate.

What do we want to do?

Implementing models and running tests is a proven way to test theories and to check our understanding of them. We prefer a quick success or failure on something concrete over building a big theory in a domain where it is all too easy to do so.

Specifically, we want to:

  • Show that a relatively simple set of instincts can shape complex behaviors of a single agent.
  • Show whether the instincts lead to significantly reduced training time compared to agents without such instincts (a toy sketch of this comparison follows this list).
  • Extend the simulation to groups of agents. 
  • Show that prosocial behavior can be shaped with a few instincts.
  • See if we can get Ersatz Interpretability working.
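For the training-time comparison, the experiment has roughly the following shape: train the same learner with and without an instinct-like shaping signal and see how quickly it becomes competent. The sketch below is a self-contained toy with tabular Q-learning on a one-dimensional chain; it is not our agent or environment, and all numbers are illustrative.

```python
# Toy sketch of the training-time comparison: the same tabular Q-learner
# on a 1-D chain, with and without an "instinct"-like shaping signal.
import numpy as np


def run(shaping: bool, episodes: int = 300, n: int = 12, seed: int = 0):
    rng = np.random.default_rng(seed)
    q = np.zeros((n, 2))  # actions: 0 = left, 1 = right
    lengths = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s != n - 1 and steps < 200:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < 0.1 or q[s, 0] == q[s, 1]:
                a = int(rng.integers(2))
            else:
                a = int(q[s].argmax())
            s2 = min(s + 1, n - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n - 1 else 0.0
            if shaping:
                # "Instinct" stand-in: a small innate drive toward the goal.
                r += 0.05 * (s2 - s)
            q[s, a] += 0.5 * (r + 0.95 * q[s2].max() - q[s, a])
            s, steps = s2, steps + 1
        lengths.append(steps)
    return lengths


if __name__ == "__main__":
    for shaping in (False, True):
        lengths = run(shaping)
        print(f"shaping={shaping}: mean episode length over the first 100 "
              f"episodes = {np.mean(lengths[:100]):.1f}")
```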

In the ideal case, the simulated agents show behavior consistent with having values like altruism or honesty.
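To make "instinct" concrete in this setting: in our current framing, an instinct is a hard-coded function from observations to an innate reward signal, and the steering subsystem sums these signals into the reward the learning subsystem trains on. The sketch below is purely illustrative; every function name and observation index is a hypothetical placeholder, including the prosocial proximity term.

```python
# Illustrative sketch: instincts as hard-coded maps from observations to
# innate reward, summed by a steering function. All names and observation
# indices are hypothetical placeholders.
from typing import Callable, Dict

import numpy as np

Instinct = Callable[[np.ndarray], float]


def hunger_instinct(obs: np.ndarray) -> float:
    # Hypothetical layout: obs[0] = energy level in [0, 1],
    # obs[1] = 1.0 if food was just eaten, else 0.0.
    return 2.0 * obs[1] - 0.5 * (1.0 - obs[0])


def proximity_instinct(obs: np.ndarray) -> float:
    # Hypothetical: obs[2] = distance to the nearest other agent; a mild
    # innate reward for staying close, as one building block of prosociality.
    return 0.1 * float(np.exp(-obs[2]))


def steering_reward(obs: np.ndarray, instincts: Dict[str, Instinct]) -> float:
    """Sum the innate reward contributions of all active instincts."""
    return float(sum(f(obs) for f in instincts.values()))


# The RL agent then trains on steering_reward(obs, instincts) instead of
# (or in addition to) a hand-crafted task reward.
```

Swapping instincts in and out of that dictionary would then be the main experimental knob: the same learning algorithm, different innate drives.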

Immediate next steps:

How can you help?

  • Please give us feedback.
  • Review what we already have (requires an invite, please request).
  • Help with funding and growth of the project (I am applying to LTFF).
     