Donald Hobson

MMath, Cambridge. Currently a postgraduate student at Edinburgh.


Logical Counterfactuals and Proposition graphs
Assorted Maths


Clarifying “What failure looks like” (part 1)

I think that this depends on how hard the AIs are optimising and how complicated the objectives are. I think that sufficiently moderate optimisation for goals sufficiently close to human values will probably end up well.

I also think that optimisation is likely to end up at the physical limits, unless we know how to program an AI that doesn't want to improve itself, and everyone makes AIs like that.

Sufficiently moderate AI is just dumb, which is safe. An AI smart enough to stop people producing more AI, yet dumb enough to be safe, seems harder.


There is also a question of what "better capturing what humans want" means. A utility function that, when restricted to the space of worlds roughly similar to this one, produces utilities close to the true human utility function seems easy enough. Suppose we have defined something close to human well-being, and that the definition is in terms of the levels of various neurotransmitters near human DNA. Let's suppose this definition would be highly accurate over all of history, and would make the right decision on nearly all current political issues. It could still fail completely in a future containing uploaded minds and neurochemical vats.

Either your approximate utility function needs to be pretty close on all possible futures (even adversarially chosen ones), or you need to know that the AI won't guide the future towards places where the utility functions differ.

Needed: AI infohazard policy

Suppose you think that both capabilities and alignment behave like abstract quantities, i.e. real numbers.

And suppose that you think there is a threshold amount of alignment and a threshold amount of capabilities, so that there is a race over which threshold is reached first.

If you also assume that the contribution of your research is fairly small, and that our uncertainty about the threshold locations is high,

then we have the heuristic: only publish your research if the ratio between capabilities and alignment that it produces is better than the ratio over all future research.

(Note that research on how to make better chips counts as capabilities research in this model.)

Another way to think about it is that the problems are created by research. If you don't think that "another new piece of AI research has been produced" is a reason to shift probabilities of success up or down, and think it just moves timelines forward, then the average piece of research is neither good nor bad.
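
The publishing heuristic above can be written down as a small check. This is only a sketch under the model's own assumptions: the function name, its arguments and the example numbers are all hypothetical, and "better" is read here as "more alignment per unit of capabilities". Nothing in it says how the quantities would actually be measured.

```python
def should_publish(d_alignment, d_capabilities,
                   future_alignment, future_capabilities):
    """Return True if this piece of research has a better
    alignment-to-capabilities ratio than all future research combined.

    All four arguments are hypothetical non-negative real numbers in
    arbitrary units (an assumption of this sketch).
    """
    values = (d_alignment, d_capabilities,
              future_alignment, future_capabilities)
    if min(values) < 0:
        raise ValueError("contributions are assumed to be non-negative")
    # Tests d_alignment / d_capabilities > future_alignment / future_capabilities
    # by cross-multiplying, so pure-alignment research does not divide by zero.
    return d_alignment * future_capabilities > d_capabilities * future_alignment


# Research adding 3 units of alignment and 1 of capabilities, against a
# future research mix of 100 alignment to 200 capabilities (made-up numbers).
print(should_publish(3.0, 1.0, 100.0, 200.0))  # True
# Chip research counts as pure capabilities in this model.
print(should_publish(0.0, 1.0, 100.0, 200.0))  # False
```

Research that exactly matches the future ratio comes out as "don't publish" here; that tie-breaking choice is arbitrary.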

Clarifying “What failure looks like” (part 1)

I think that most easy-to-measure goals, if optimised hard enough, eventually end up with a universe tiled with molecular smiley faces. Consider the law enforcement AI. There is no sharp line between education programs and reducing lead pollution on one side, and using nanotech to rewire human brains into perfectly law-abiding puppets on the other. For most utility functions that aren't intrinsically conservative, there will be some state of the universe that scores really highly and is nothing like the present.

In any "what failure looks like" scenario, at some point you end up with superintelligent stock traders that want to fill the universe with tiny molecular stock markets, competing with weather-predicting AIs that want to freeze the Earth to a maximally predictable 0 K block of ice.

These AIs are wielding power that could easily wipe out humanity as a side effect. If they fight, humanity will get killed in the crossfire. If they work together, they will tile the universe with some strange mix of many different "molecular smiley faces".

I don't think that you can get an accurate human values function by averaging together many poorly thought out, ad hoc functions that were designed to be contingent on specific details of how the world was. (I.e. assuming people are broadcasting TV signals, the stock market went up iff a particular pattern of electromagnetic waves exists, encoding a picture of a graph going up and the words "financial news". Outside the narrow slice of possible worlds with broadcast TV, this AI just wants to grab a giant radio transmitter and transmit a particular stream of nonsense.)

I think that humans existing is a specific state of the world, something that only happens if an AI is optimising for it. (And an actually good definition of "human" is hard to specify.) Humans having lives we would consider good is even harder to specify. When there are substantially superhuman AIs running around, the value of our atoms exceeds any value we can offer. The AIs could psychologically or nanotechnologically twist us into whatever shape they pleased. We can't meaningfully threaten any of the AIs.

We won't be left even a tiny fraction; we will be really bad at defending our resources compared to any of the AIs. Any of the AIs could easily grab all our resources. Also, there will be various AIs that care about humans in the wrong way: a cancer-curing AI that wants to wipe out humanity to stop us getting cancer, or a marketing AI that wants to fill all human brains with corporate slogans (think nanotech brain rewrite to the point of drooling vegetable).


EDIT: All of the above is talking about the end state of a "get what you measure" failure. There could be a period, possibly decades, where humans are still around but things are going wrong in the way described.

Are there non-AI projects focused on defeating Moloch globally?

If we assume that superintelligent AI is a thing, you have to engineer a global social system that's stable over millions of years and where no one makes ASI in that time.

A Brief Chat on World Government

Monopolistic force isn't enough. To be able to enforce, you need to be able to detect the wrongdoers. You need to be able to provide sufficient punishment to motivate people into obedience. Even then, you will still get the odd crazy person breaking the rules.

Some potential rules, like "don't specification game this metric", are practically unenforceable. The Soviets didn't manage to make the number of goods on paper equal the amount in reality. It was too hard for the rulers to detect every possible trick that could make the numbers on paper go up.

If Starship works, how much would it cost to create a system of rotable space mirrors that reduces temperatures on earth by 1° C?

Fermi estimate

Cross-sectional area of Earth = 1.3e14 m^2

Proportion needed to cover for a 1 °C temperature drop: 1.3%

Area needed = 1.7e12 m^2

Assume aluminium foil is used.

Assume that it needs to be 500 nm thick to block light.

Assume most of the mass is ultrathin foil.

So 8.5e5 cubic meters of foil.

At 2700 kg/m^3, that's 2.3e9 kg.

Making 2.3e10 $ at that price tag (10 $/kg launched).

I.e. 23 billion $.

Plus another 4.1 billion $ for aluminium at 1.8 $/kg current prices.
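
The arithmetic of this estimate can be re-run with a short script. This is a rough sketch, not a careful engineering model: the cross-section is computed from a mean Earth radius of 6371 km (a figure not stated above), the 10 $/kg launch price is an assumed value, and the function and constant names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6.371e6       # mean radius; an assumption, not from the text
SHADE_FRACTION = 0.013         # 1.3% of the cross-section covered
FOIL_THICKNESS_M = 500e-9      # 500 nm of aluminium
ALUMINIUM_DENSITY = 2700.0     # kg/m^3
LAUNCH_PRICE = 10.0            # $/kg to orbit; assumed
ALUMINIUM_PRICE = 1.8          # $/kg


def mirror_estimate(radius_m=EARTH_RADIUS_M, fraction=SHADE_FRACTION,
                    thickness_m=FOIL_THICKNESS_M, density=ALUMINIUM_DENSITY,
                    launch_price=LAUNCH_PRICE, material_price=ALUMINIUM_PRICE):
    """Fermi estimate of foil mass and cost, ignoring all structure."""
    cross_section_m2 = math.pi * radius_m ** 2   # disc Earth presents to the Sun
    mirror_area_m2 = fraction * cross_section_m2
    volume_m3 = mirror_area_m2 * thickness_m
    mass_kg = volume_m3 * density
    return {
        "cross_section_m2": cross_section_m2,
        "mirror_area_m2": mirror_area_m2,
        "volume_m3": volume_m3,
        "mass_kg": mass_kg,
        "launch_cost_usd": mass_kg * launch_price,
        "material_cost_usd": mass_kg * material_price,
    }


for name, value in mirror_estimate().items():
    print(f"{name}: {value:.2e}")
```

Because every step is a plain product, any of the assumptions (thickness, fraction covered, launch price) can be changed and the cost scales linearly with it.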

Are there non-AI projects focused on defeating Moloch globally?

A form of Moloch appears any time you have multiple agents with similar levels of power or capability and different goals or utility functions.

With current tech, it would be very hard to give total power to one human. The power would have to be borrowed, in the sense that their power lies in setting a Nash equilibrium as a Schelling point. "Everyone do X and kill anyone who breaks this rule" is a Nash equilibrium: if everyone else is doing it, you had better do it too. The dictator sets the Schelling point by their choice of X. The dictator is forced to quash any rebels or lose power. Another Moloch.


Given that we have limited control over the preferences of new humans, there are likely to be some differences in utility functions between humans. Humans can die, go mad, etc. You need to be able to transfer power to a new human without having any adverse selection pressure in the choice of which.


One face of Moloch is evolution. To stop it, you need to be resetting the gene pool with fresh DNA from long-term storage; otherwise, over time, the population genome might drift in a direction you don't like.

We might be able to keep Moloch at a reasonably low damage level, just a sliver of Moloch making things not quite as nice as they could be. At least if people who know about Moloch go out of their way to destroy it.

Are there non-AI projects focused on defeating Moloch globally?

There aren't many other plausible technological options for things that could defeat Moloch.

A sufficiently smart and benevolent team of uploaded humans could possibly act as a singleton, in the scenario where one team gets mind uploading first and the hardware is enough to run uploads really fast.


What I would actually expect in this scenario is a short period of uploads doing AI research followed by a Foom.

But if we suppose that FAI is really difficult, and that the uploads know about this, and about Moloch, then they could largely squash Moloch, at least for a while.

(I am unsure whether or not some subtle Moloch-like process would sneak back in, but at least the blatantly Molochy processes would be gone for a while.)

For example, if each copy of a person has any control over which copy is duplicated when more people are needed, then most of the population will have had life experiences that make them want to get copied a lot.

A Brief Chat on World Government

"We shouldn't colonize Mars until we have a world government"

But it would take a world government to be able to enact and enforce "don't colonize Mars" worldwide.


On the other hand, if an AI Gardener was hard, but not impossible, and we only managed to make one after we had a thriving interstellar empire, then it could still stop the descent into Malthusianism.

However, if we escape Earth before that happens, speed-of-light limitations will forever fragment us into competing factions impossible to garden.

If we escape Earth before ASI, the ASI will still be able to garden the fragments.

Sort of related, I’m not persuaded by the conclusion to his parable. Won’t superintelligent AIs be subject to the same natural selective pressures as any other entity? What happens when our benevolent gardener encounters the expanding sphere of computronium from five galaxies over?

Firstly, if there is a singleton AI, it can use lots of error correction on itself. There is exactly one version of it, and it is far more powerful than anything else around. What's more, the AI is well aware of these sorts of phenomena, and will move to squash any tiny traces of Molochyness that it spots.

If humans made multiple AIs, then there is a potential for conflict. However, the AIs are motivated to avoid conflict. They would prefer to merge their resources into a single AI with a combined utility function, but they would prefer to pull a fast one on the other AI even more. I suspect that a fraction of a percent of available resources is spent on double-checking and monitoring systems. The rest goes into an average of the utility functions.

If alien AIs meet humanity's AIs, then either we get the value merging, or it turns out to be a lot harder to attack a star system than to defend it, so we get whichever stars we can reach first.

Progress: Fluke or trend?

I think that progress is a trend, and it's a strong trend. There is an incentive to invent new things in any kind of competition, because it gives you an advantage. The United Nations couldn't pass a bill banning progress. The future will be higher tech.

Of course, progress towards higher tech is not necessarily a good thing. We can guide progress towards the good outcomes.

I think that there is a lot of powerful tech waiting to be invented. We will see more progress in the next 200 years than the last 200. 

I think that progress is likely to end within 200 years, because once you have superintelligent AI, anything that can be invented will be invented quickly. After that, material resources grow as a sphere of tech expanding at near light speed.
