mishka10

Nobody currently knows how to align strongly superhumanly smart AIs to human interests, and we need way more time to solve this problem. Making incremental progress on AI capabilities is shortening the timeline we have left to figure out how to align AI and is thus making human extinction more likely. Thus by far the best action is to stop advancing AI capabilities.

It seems that not much research has been done on the invariant properties of rapidly self-modifying ecosystems. At least, when I did some searching and also asked here a few months ago, not much came up: https://www.lesswrong.com/posts/sDapsTwvcDvoHe7ga/what-is-known-about-invariants-in-self-modifying-systems.

It's not possible to get a handle on the dynamics of rapidly self-modifying ecosystems without a better understanding of how to think about properties conserved during self-modification. And ecosystems with rapidly increasing capabilities will be strongly self-modifying.

However, any progress in this direction is likely to be dual-use. Knowing how to think about self-modification invariants is very important for AI existential safety and is also likely to be a strong capability booster.

This is a very typical conundrum for AI existential safety. We can try to push harder to make sure that research into invariant properties of self-modifying (eco)systems becomes an active research area again, but the likely side effect of better understanding the properties of potentially fooming systems is making it easier to bring such systems into existence. And we don't have a good understanding of the proper ways to handle this kind of situation (although the topic of dual use is discussed here from time to time).

mishka54

No, OpenAI (assuming that it is a well-defined entity) also uses a probability distribution over timelines.

(In reality, every member of its leadership has their own probability distribution, and this translates to OpenAI having a policy and behavior formulated approximately as if there is some resulting single probability distribution).

The important thing is that they are uncertain about timelines themselves: in part because no one knows how perplexity translates into capabilities; in part because capabilities might differ even at the same perplexity if the underlying architectures are different (e.g. in-context learning might depend on the architecture even at a fixed perplexity, and we have seen a stream of potentially very interesting architectural innovations recently); in part because it's not clear how big the potential of "harness"/"scaffolding" is; and so on.

This does not mean there is no political infighting. But it's on the background of them being correctly uncertain about true timelines...
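To make the "resulting single probability distribution" idea concrete, here is a minimal sketch (my own illustration with made-up numbers and equal weights; nothing here reflects anyone's actual estimates) of linear opinion pooling, where each leader's timeline distribution is averaged into one organization-level distribution:

```python
import numpy as np

years = np.arange(2025, 2046)  # hypothetical candidate "arrival" years

# Hypothetical individual timeline distributions (unnormalized Gaussians)
person_a = np.exp(-0.5 * ((years - 2029) / 2.0) ** 2)
person_b = np.exp(-0.5 * ((years - 2035) / 4.0) ** 2)
person_c = np.exp(-0.5 * ((years - 2031) / 3.0) ** 2)

def normalize(p):
    return p / p.sum()

# Equal-weight linear pool: one organization-level distribution
pooled = normalize(sum(normalize(p) for p in (person_a, person_b, person_c)))

median_year = years[np.searchsorted(np.cumsum(pooled), 0.5)]
print("pooled median year:", median_year)
```

The point of the sketch is only that the pooled distribution stays spread out when the individual ones disagree, which is one way an organization ends up behaving "as if" it had a single, uncertain timeline.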


Compute-wise, inference demands are huge and growing with the popularity of the models (look at how much Facebook did to make Llama 3 more inference-efficient).

So if they expect the models to become useful enough for almost everyone to want to use them, they should worry about compute, assuming they do want to serve people as they say they do (I am not sure how this looks for very strong AI systems; they will probably be gradually expanding access, and the speed of that expansion might vary).

mishka54

I think having a probability distribution over timelines is the correct approach. Like, in the comment above:

I think I'm more likely to be better calibrated than any of these opinions, because most of them don't seem to focus very much on "hedging" or "thoughtful doubting", whereas my event space assigns non-zero probability to ensembles that contain such features of possible futures (including these specific scenarios).

mishka32

However, none of them talk about each other, and presumably at most one of them can be meaningfully right?

Why can at most one of them be meaningfully right?

Would not a simulation typically be "a multi-player game"?

(But yes, if they assume that their "original self" was the sole creator (?), then they would all be some kind of "clones" of that particular "original self". Which would surely increase the overall weirdness.)

mishka20

The "AI #61: Meta Trouble" has not been cross-posted to LessWrong, but here is the link to the original post: https://thezvi.wordpress.com/2024/04/25/ai-61-meta-trouble/

mishka51

Thanks! I think your discussion of the new Meaning Alignment Institute publication (the substack post and the paper) in the "Aligning a Smarter Than Human Intelligence is Difficult" section is very useful.

I wonder if it makes sense to republish it as a separate post, so that more people see it...

mishka72

Emmett Shear continues his argument that trying to control AI is doomed

I think that a recent tweet thread by Michael Nielsen and the quoted one by Emmett Shear represent genuine progress towards making AI existential safety more tractable.

Michael Nielsen observes, in particular:

As far as I can see, alignment isn't a property of an AI system. It's a property of the entire world, and if you are trying to discuss it as a system property you will inevitably end up making bad mistakes

Since AI existential safety is a property of the whole ecosystem (and is, really, not too drastically different from World existential safety), this should be the starting point, rather than stand-alone properties of any particular AI system.

Emmett Shear writes:

Hopefully you’ve validated whatever your approach is, but only one of these is stable long term: care. Because care can be made stable under reflection, people are careful (not a coincidence, haha) when it comes to decisions that might impact those they care about.

And Zvi responds

Technically I would say: Powerful entities generally caring about X tends not to be a stable equilibrium, even if it is stable ‘on reflection’ within a given entity. It will only hold if caring more about X provides a competitive advantage against other similarly powerful entities, or if there can never be a variation in X-caring levels between such entities that arises other than through reflection, and also reflection never causes reductions in X-caring despite this being competitively advantageous. Also note that variation in what else you care about to what extent is effectively variation in X-caring.

Or more bluntly: The ones that don’t care, or care less, outcompete the ones that care.

Even the best case scenarios here, when they play out the ways we would hope, do not seem all that hopeful.

That all, of course, sets aside the question of whether we could get this ‘caring’ thing to operationally work in the first place. That seems very hard.


Let's now consider this in light of what Michael Nielsen is saying.

I am going to consider only the case where we have plenty of powerful entities with long-term goals and long-term existence which care about those goals and that existence. This seems to be the case Zvi is considering here, and it is the case we understand best, because we also live in a reality with plenty of powerful entities (ourselves, some organizations, etc.) with long-term goals and long-term existence. So this is an incomplete consideration: it only includes the scenarios where powerful entities with long-term goals and long-term existence retain a good fraction of the overall available power.

So what do we really need? What are the properties we want the World to have? We need a good deal of conservation and non-destruction, and we need the interests of the weaker members of the overall ecosystem, not just the currently smartest or most powerful ones, to be adequately taken into account.

Here is how we might be able to have a trajectory where these properties are stable, despite all the drastic changes in the self-modifying and self-improving ecosystem.

An arbitrary ASI entity (just like an unaugmented human) cannot fully predict the future. In particular, it does not know where it might eventually end up in terms of relative smartness or relative power (relative to the most powerful ASI entities or to the ASI ecosystem as a whole). So if any given entity wants to be long-term safe, it is strongly interested in the ASI society having general principles and practices of protecting its members on various levels of smartness and power. If only the smartest and most powerful are protected, then no entity is long-term safe on the individual level.

This might be enough to produce an effective counterweight to unrestricted competition (just as human societies have mechanisms against unrestricted competition). Basically, smarter-than-human entities at all levels of power are likely to be interested in the overall society having general principles and practices that protect its members at various levels of smartness and power, and that's why they'll care enough for the overall society to continue to self-regulate and to enforce these principles.
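As a toy illustration of this "veil of ignorance" argument (my own sketch with made-up payoffs, not a formal model from anyone's post), one can compare the expected value of the two regimes for an agent that is uncertain about its future rank:

```python
# An agent uncertain about its future power rank compares a regime that
# protects only the strongest with a regime that protects members at
# every level. All numbers below are hypothetical.

N_RANKS = 10                 # hypothetical number of power levels
P_TOP = 1.0 / N_RANKS        # agent's chance of ending up at the very top

U_PROTECTED = 0.8            # payoff under universal protection (protection has some cost)
U_TOP_ONLY = 1.0             # payoff for the top entity when only the strongest is safe
U_UNPROTECTED = 0.0          # payoff for everyone else in that regime

ev_top_only = P_TOP * U_TOP_ONLY + (1 - P_TOP) * U_UNPROTECTED
ev_universal = U_PROTECTED

print(f"expected value, 'only the strongest are safe': {ev_top_only:.2f}")
print(f"expected value, 'everyone is protected':       {ev_universal:.2f}")
# With these made-up numbers, any agent that cannot be sure of staying on
# top prefers the regime with general protection norms.
```

The conclusion is sensitive to the assumed payoffs, of course; the sketch only shows the shape of the incentive, not that it will dominate in practice.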

This is not yet the solution, but I think this is pointing in the right direction...

mishka30

Thanks, this is very interesting.

I wonder if this approach is extendable to learning to predict the next word from a corpus of texts...

The first layer might perhaps still be an embedding from words to vectors, but what should one do then? What would be a possible minimum viable dataset?

Perhaps, in the spirit of the paper's PoC, one might consider binary sequences of 0s and 1s, with only two words, 0 and 1, and ask what it would take to build a good predictor of the next 0 or 1 given a long sequence of them as context. This might be a good starting point, and then one might consider different instances of that problem (different examples of (sets of) sequences of 0s and 1s to learn from).
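Here is a minimal sketch of that binary setup (my own toy construction, not the paper's code): the data-generating rule, the noise level, and the n-gram baseline are all arbitrary choices, just to make the "minimum viable dataset" idea concrete.

```python
import random
from collections import defaultdict

def make_sequence(length=1000, seed=0):
    """Hypothetical data-generating rule: next bit is the XOR of the two
    previous bits, flipped with 10% noise."""
    rng = random.Random(seed)
    seq = [rng.randint(0, 1), rng.randint(0, 1)]
    for _ in range(length - 2):
        bit = seq[-1] ^ seq[-2]
        if rng.random() < 0.1:
            bit ^= 1
        seq.append(bit)
    return seq

def ngram_predictor(seq, context_len=2):
    """Count continuations of each length-`context_len` context and
    predict the majority continuation."""
    counts = defaultdict(lambda: [0, 0])
    for i in range(context_len, len(seq)):
        ctx = tuple(seq[i - context_len:i])
        counts[ctx][seq[i]] += 1
    return lambda ctx: 0 if counts[tuple(ctx)][0] >= counts[tuple(ctx)][1] else 1

train, test = make_sequence(seed=0), make_sequence(seed=1)
predict = ngram_predictor(train)
hits = sum(predict(test[i - 2:i]) == test[i] for i in range(2, len(test)))
print(f"next-bit accuracy: {hits / (len(test) - 2):.2%}")  # ~90% for this rule
```

A simple count-based baseline like this pins down what "good prediction" means for a given rule (here, roughly one minus the noise rate), which any more interesting learner on the same dataset would need to match or beat.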
