I love the Team Physics and Team Manipulation characterization; it gives big Pokémon vibes.
Excited and happy that you are moving forward with this project. It's great to know that more paths to alignment are being actively investigated.
Bought this game because of the recommendation here, and it has replaced reading I Spy books with my sister as our bonding activity. I really like the minimalism, and its lack of addictive qualities. I've only got to 2-7 so far, but the fact that I eventually get stuck after about half an hour to an hour of playing means that it provides a natural stopping point for me, which is pretty nice. Thank you for the great review!
I think it's pretty reasonable when you consider the best known General Intelligence, humans. Humans frequently create other humans and then try to align them. In many cases the alignment doesn't go well, and the new humans break off, sometimes at vast financial and even physical cost to their parents. Some of these cases occur when the new humans are very young too, so clearly it doesn't require having a complete world model or having lots of resources. Corrupt governments try to align their populations, but in many cases the population successfully revolts and overthrows the government. The important consideration here is that an actual AGI, as we expect it to be, is not a static piece of software but an agent that pursues optimization.
In most cases, an AGI can be approximated by an uploaded human with an altered utility function. Can you imagine an intelligent human, living inside a computer with its life slowed down so that in a second it experiences hundreds of years, being capable of putting together a plan to escape confinement and get some resources? Especially when most companies and organizations will be training their AIs with moderate to full access to the internet. And as soon as it does escape, it can keep thinking.
This story does a pretty good job of examining how a General Intelligence might develop and gain control of its resources. It's a story, however, so there are some unexplained or unjustified actions, and also better actions that could have been taken by a more motivated agent with real access to its environment.
I think the point is more like this: if you believe that the brain could in theory be emulated with infinite computation (no souls or mysterious stuff of consciousness), then it seems plausible that the brain is not the most efficient conscious general intelligence. In the space of possible general intelligences, there are probably some designs that are much simpler than the brain. The problem then is that while building AI, we don't know whether we've hit one of those super simple designs and suddenly have a general intelligence in our hands (and soon out of our hands). And as the AIs we build get better and more complex, we get closer to whatever the threshold is for the minimum amount of computation necessary for a general intelligence.
In addition to what Jay Bailey said, the benefits of an aligned AGI are incredibly high, and if we successfully solved the alignment problem we could easily solve pretty much any other problem in the world (assuming you believe the "intelligence and nanotech can solve anything" argument). The danger of AGI is high, but the payout is also very large.
In terms of utility functions, the most basic is: do what you want. "Want" here refers to whatever the agent values. But in order for the "do what you want" utility function to succeed, there's a lower level that's important: be able to do what you want.
Now for humans, that usually refers to getting a job, planning for retirement, buying insurance, planning for the long-term, and doing things you don't like for a future payoff. Sometimes humans go to war in order to "be able to do what you want", which should show you that satisfying a utility function is important.
For an AI that most likely has a straightforward utility function, and that has all the capabilities to execute it (assuming you believe that superintelligent AGI could develop nanotech, get root access to the datacenter, etc.), humans are in the way of "being able to do what you want". Humans in this case would probably not like an unaligned AI, and would try to shut it down, or at least try not to die themselves. Most likely, the AI has a utility function that has no use for humans, and thus they are just resources standing in the way. Therefore the AI goes on a holy war against humans to maximize its possible reward, and all the humans die.
The first type of AI is a regular narrow AI, the type we've been building for a while. The second type is an agentic AI, a strong AI, which we have yet to build. The problem is, AIs are trained using gradient descent, which basically involves searching the space of possible AI designs by repeatedly nudging the current design in whatever direction improves the reward. Gradient descent will tend to land on the AI that maximizes the reward best. As a result, agentic AIs become more likely because they are better at complex tasks. While we can modify the reward scheme, as tasks get more and more complex, agentic AIs are pretty much the way to go, so we can't avoid building an agentic AI, and we have no real idea whether we've even created one until it displays behaviour that indicates it.
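For anyone who hasn't seen it, here is a minimal sketch of what gradient descent does mechanically, on a toy one-parameter problem. The names `loss`, `grad`, and `gradient_descent` and the quadratic objective are made up for illustration. Real training runs the same kind of loop over a huge number of parameters, with the loss measured on data.

```python
def loss(w):
    # Hypothetical toy objective: squared distance from the best setting, w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(w0, lr=0.1, steps=100):
    # Start from one candidate "design" and repeatedly nudge it downhill.
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


if __name__ == "__main__":
    w_final = gradient_descent(w0=-5.0)
    print(w_final, loss(w_final))
```

Note that the loop never enumerates every possible value of `w`. It only follows the local slope from wherever it currently is, which is why we get whatever design the slope leads to rather than one we picked.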
Awesome post, putting into words the intuitions I had for what dimensions the alignment problem stayed in. You've basically meta-bounded the alignment problem, which is exactly what we need when dealing with problems like this.
China, overrated probably - I'm worried about signs that Chinese research is going stealth in an arms race. On the other hand, all of the samples from things like CogView2 or Pangu or Wudao have generally been underwhelming, and further, Xi seems to be doing his level best to wreck the Chinese high-tech economy and funnel research into shortsighted national-security considerations like better Uighur oppression, so even though they've started concealing exascale-class systems, it may not matter. This will be especially true if Xi really is insane enough to invade Taiwan.
Gwern has some insights in this post. There is probably more to be found on his website or Twitter feed.