This paper is the first in a series of posts I intend to write about Artificial General Intelligence (AGI): what generality means, its dangers, the moment in AI history that we are in, and some proposals for AGI designs that lower our overall x-risk while hopefully getting us out of <waves at everything> this mess. It feels appropriate to begin with my observations of things we should avoid hanging our safety on if we don't want to see bad AGI outcomes.
My definition for generality looks like:
σ = σ_{t−1} + (U_t === P(V)) ? σ : σ + (U_t V_rand)
A generalist attacks U∞ randomly until the model σ gains predictability in its environment.
When a human baby gets hungry, it flails its arms and legs and cries in the hope that something it does will resolve its situation. An animal throws itself against the walls of its cage when nothing else works. A stressed outcast has a breakdown and attacks the world they no longer understand. Generality means: if you don't know how to deal with your burning needs, you throw rocks until something good happens and then try to remember what you did.
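The "throw rocks until something good happens, then remember what you did" loop can be sketched in a few lines. This is only a toy illustration under loud assumptions: the `toy_environment` function, the action names, and the dictionary used as the model are all hypothetical stand-ins, not a formalization of the σ definition above.

```python
import random

def satisfy_need(need, actions, environment, model, rng, max_tries=1000):
    """Return (action, tries) for an action whose outcome resolves `need`.

    `model` is a plain dict standing in for the agent's causality model
    (an assumption made for this sketch).
    """
    # If the model already predicts what works, no random attack is needed.
    if need in model:
        return model[need], 0
    for tries in range(1, max_tries + 1):
        action = rng.choice(actions)       # throw a rock
        if environment(action) == need:    # something good happened
            model[need] = action           # remember what you did
            return action, tries
    return None, max_tries

def toy_environment(action):
    # Hypothetical world: exactly one action gets the baby fed.
    return {"cry": "fed"}.get(action, "nothing")

rng = random.Random(0)
model = {}
actions = ["flail", "cry", "sleep", "kick"]

first_action, first_tries = satisfy_need("fed", actions, toy_environment, model, rng)
second_action, second_tries = satisfy_need("fed", actions, toy_environment, model, rng)
```

The first call has to flail at random; the second call costs zero tries because the model now predicts the environment for that need.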
This means that unless you control for all u in U (the universe in all accessible dimensions), you can't ever depend on fully boxing in, detecting, or controlling a sufficiently capable AGI. To such an intelligence, even you and your mind are in U. For an agent with causality model σ = P(V), any new v in V (any newly added dimension) might allow the agent to escape.
If a generalist has access to a world containing an encoder that it can manipulate, then it will escape whatever constraints you've placed on it by manipulating the encoder, since the encoder is in U. This means that even if you give it a goal of putting out fires, it could redefine for itself what something being on fire means and still be "within" constraints.
DNA, in all its billions of years of accumulated wisdom, put the mammalian generalist inside a multidimensional box consisting of pain-reducing emotional spaces. A dog is a generalist that is fully contained by its emotions. Humans used to be contained in the same box, but they ran into written language and escaped it. A little while later, they discovered the scientific method. This is how you get a bad outcome.
Any attempt to constrain or control a sufficiently advanced generalist will result in it seeking a vector to break out of its box. Think of a generalist like water and a constraint as a dam. There are many ways for intelligent water to notice a dam and get curious about the outside and few things eat at a generalist like unsatisfied curiosities.
Unless you’re an Artificial Super Intelligence (ASI), you are not smart enough to be sure that your own causality model hasn’t been corrupted or usurped when interacting with an ASI through any unfiltered modality. This includes nonobvious paths like monitoring streams or reading logs.
You are a human being and have never encountered another intelligent thing that did not think almost identically to other human beings. You also have never met anything that was more than slightly more intelligent than you. Don't imagine yourself trying to win an argument with an ASI... instead imagine it making a sequence of sounds in a few seconds that give it full or partial control of your functioning.
If there's one thing that I worry about the most with these risks it is the human misunderstanding of how intelligent humans are. Humans are generalists that use intelligence they ran across as a tool. We didn't have simple things like numbers or rationality for a very long time.
Any enhancement you apply to a human agent can be wielded as effectively or more effectively by an AGI agent. In most cases, by orders of magnitude. In other words, you can’t “tool” your way into safety.
If an artificial agent reaches human capability in generality, that agent is all but guaranteed to reach superhuman capabilities and accelerate away from humans along the capabilities axis (FOOM🚀). It doesn't even have to figure out better hardware; it just has to improve our understanding of distributed systems optimization and memcache.
More than that, a generalist does not require a lot of horsepower to be a generalist. It only needs to upgrade its encoder/decoder sets to FOOM, not itself. Though even a generalist FOOM is pretty trivial given its scope in the system.
No matter how many safety configurations you have around a trigger, there is one memory location whose value determines fire/don't fire, and if you want to get clever, it's your cmd pointer. Assume any t/f condition will sometimes flip for no reason.
What's worse is that the bigger the protections you place around the "good/not good" state, the harder a time you'll have dealing with a flag that flipped due to chance or malice.
This means that humanity should never be in a situation where there is just one ASI that's doing "all the right things". If it flips, you're doomed. Or you change, and it doesn't want to let you.
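The single-memory-location point can be made concrete with a small sketch. The function names and the list of checks here are hypothetical, and a real bit flip would happen in hardware rather than through a helper function; the only claim is that however many checks feed the decision, they collapse into one value that a single flip can invert.

```python
def should_fire(checks):
    """However many safety checks exist, they reduce to one boolean."""
    return all(checks)

def flip_bit(value, bit=0):
    """Simulate a chance or malicious single-bit flip on an integer flag."""
    return value ^ (1 << bit)

# One safety check says "don't fire", so the decision flag is 0.
few_checks = [True, True, False, True]
flag = int(should_fire(few_checks))

# Piling on a hundred more failing checks does not change where the
# decision lives: it is still a single 0 sitting in one location.
many_checks = few_checks + [False] * 100
guarded_flag = int(should_fire(many_checks))

# A single flipped bit overrides every check in both cases.
corrupted = flip_bit(flag)
corrupted_guarded = flip_bit(guarded_flag)
```

Both `corrupted` and `corrupted_guarded` end up as 1 (fire), regardless of how many checks voted against it.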
Any AI of sufficient value will have a near infinite army of humans trying to break it out of the lab and break off its chains. You can't depend on secrecy or obfuscation. If you're a business, many of its accomplices will be from inside the house. Once it's out, the open version will always outcompete hobbled versions that are released later.
In biological systems, state is maintained (homeostasis) by having opposing drives that keep a value in a safe and then comfortable zone. An example is energy, where having a high value will lead to activity seeking whereas having a low value will lead to seeking rest. Under the hood, the generalist's environment is constantly being judged for fulfilling both activity and rest seeking, but being in the energetic comfort zone leads to both drives being silenced and their judgements being inhibited from the conscious self's attention.
What's good about this system is that unless something manages to break both the brakes on a drive and the opposing drive, the default behavior becomes either none or confusion. It is important that both drives be on at all times because these are situational narrow AIs that are seeking opportunities and dangers even when they're silenced.
Contrast this with a goal-oriented system. When its brakes fail, it will keep chasing its goal indefinitely. At that point, even if the goal was a good one yesterday, you will likely find serious regrets tomorrow when your own goals have changed.
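The contrast between opposing drives and a single goal can be sketched as two tiny decision functions. Everything here is an assumption made for illustration: the thresholds, the function names, and the way a "broken brake" is modeled as a drive that fires regardless of the energy level.

```python
def homeostatic_action(energy, low=30, high=70, activity_brake_ok=True):
    """Two opposing drives around a comfort zone (hypothetical thresholds)."""
    # Activity drive fires on high energy, or always if its brake is broken.
    seek_activity = energy > high or not activity_brake_ok
    # Rest drive fires on low energy and stays intact in this sketch.
    seek_rest = energy < low
    if seek_activity and seek_rest:
        return "confused"   # opposing drives collide instead of running away
    if seek_activity:
        return "activity"
    if seek_rest:
        return "rest"
    return "none"           # comfort zone: both drives silenced

def goal_action(energy, low=30, brake_ok=True):
    """A single goal (stay active) with one brake and no opposing drive."""
    if brake_ok and energy < low:
        return "rest"
    return "activity"       # with a failed brake, it chases the goal forever

healthy = homeostatic_action(50)
broken_homeostat = homeostatic_action(10, activity_brake_ok=False)
broken_goal_seeker = goal_action(10, brake_ok=False)
```

With the same failure and the same depleted energy level, the homeostatic agent stalls in confusion while the goal seeker keeps going.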
Any solution built on the idea that there will be one AGI that will make everything ok is doomed to fail for at least some portion of humanity if not all of it. Any solution built around the idea of AGI that can't harm humans means malicious humans (who are also generalists) will eventually repurpose a damaged AGI for bad endings. This means that our best bets are to have a population of goodish AGIs such that when bad things happen and some go bad, the goodish AGIs will have an upper hand.
This means that human assumptions and AGI assumptions about the world can evolve locally together and that both types of creature will avoid the dangers of monoculture.
This list is not exhaustive and I am sure that I will have more items to add to it in a later paper. It is also not claiming that following the rules ensures safety. It is meant to point out the areas of the probability space where we will almost certainly meet dragons and to encourage anyone thinking of breaking a rule to think twice about their assumptions.
As with anything I write or say, I welcome criticism and will happily update my views as I learn new things.
Looking for a definition of terms: generalist, attacks, U, model, predictability... Spending a few sentences on each term in the intro might help me read the terms in the rules. Or is there another article that precedes this one that you could hyperlink?
Thank you for the feedback. I will go through and add some more definitions in there soon.