Lukas Finnveden

Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.

Comments
Vitalik's Response to AI 2027
Lukas Finnveden · 6h

Your analysis focuses on whether humans or misaligned AIs are in an overall better position to offer certain deals. But even if I condition on "humans could avoid AI takeover by credibly offering AIs large percentages of all future resources" being true, it still seems <50% likely that humans actually do it. Curious if you disagree. (In general, if I thought humans were going to act rationally and competently to prevent AI takeover risk, I think that would cut the risk by significantly more than half. There's tons of stuff that we could do to reduce the risk that I doubt we'll do.)

Maybe there's some argument along the lines of "just like humans are likely to mess up in their attempts to prevent AI takeover risk (like failing to offer deals), AIs are likely to mess up in their attempts to take over (like failing to make deals with each other), so this doesn't cut asymmetrically towards making deals-between-AIs more likely". Maybe; I haven't thought much about this argument. My first-pass answer would be "we'll just keep making them smarter until they stop messing up".

If you wrote a vignette like Daniel suggests, where humans do end up making deals, that might help me feel like it's more intuitively likely to happen.

Minor points:

It'll be explicitly aiming to break the law and lie to humans to seize power, making its "promises" to other AIs less credible.

I'm generally thinking that the AIs would try to engineer some situation where they all have some bargaining power after the takeover, rather than relying on each other's promises. If you could establish that that's very difficult to do, it would make me think the "coordinated takeover" seemed meaningfully less likely.

Seems v plausible, but why 'probably'? Are you thinking techniques like debate probably stop working?

Yes, because of known issues like inaccessible information (primarily) and obfuscated arguments (secondarily).

Vitalik's Response to AI 2027
Lukas Finnveden · 2d

I'm in favor of trying to offer deals to the AIs.

I don't think it reliably prevents AI takeover. The situation looks pretty rough if the AIs are far smarter than humans, widely deployed, and resource-hungry. Because:

  • It's pretty likely that they'll be able to communicate with each other through one route or another.
  • It seems intuitively unlikely that humans will credibly offer AIs large percentages of all future resources. (And if an argument for hope relies on us doing that, I think that should be clearly flagged, because that's still a significant loss of longtermist value.)
  • At some level of AI capability, we would probably be unable to adjudicate arguments about which factions are misaligned or about what technical proposals would actually leave us in charge vs. disempowered.
Vitalik's Response to AI 2027
Lukas Finnveden · 2d

I agree it's not a slam dunk.

It does seem unlikely to me that humanity would credibly offer large fractions of all future resources. (So I wouldn't put it in a scenario forecast meant to represent one of my top few most likely scenarios.)

Vitalik's Response to AI 2027
Lukas Finnveden · 3d

Importantly, if there are multiple misaligned superintelligences, and no aligned superintelligence, it seems likely that they will be motivated and able to coordinate with each other to overthrow humanity and divide the spoils.

Vitalik's Response to AI 2027
Lukas Finnveden · 3d

I think the argument against that (the military thing) is supposed to be item 1 on the list.

(1) The world's physical security (incl bio and anti-drone) is run by localized authority (whether human or AI) that is not all puppets of Consensus-1 (the name for the AI that ends up controlling the world and then killing everyone in the AI 2027 scenario) (...)

Intuitively, (1) could go both ways. Today, some police forces are highly centralized with strong national command structures, and other police forces are localized. If physical security has to rapidly transform to meet the needs of the AI era, then the landscape will reset entirely, and the new outcomes will depend on choices made over the next few years. Governments could get lazy and all depend on Palantir. Or they could actively choose some option that combines locally developed and open-source technology. Here, I think that we need to just make the right choice.

I.e., the argument is that there might not be a single Consensus-1-controlled military even in the US.

I think it seems unlikely that the combined US AI police forces would be able to compete with the US AI national military, which is one reason I'm skeptical of this. Still, if "multiple independent militaries" would solve the problem, we could potentially push for that happening inside the national military. It seems plausible to me that the government will want multiple companies to produce AI for its military systems, so we could well end up with different AI military units run by different AI systems.

The more fundamental problem is that, even if the different AIs have entirely different development histories, they may all end up misaligned. And if they all end up misaligned, they may collude to overthrow humanity and divide the spoils.

I'm all for attempts to make this more difficult. (That's the kind of thing that the AI control agenda is trying to do.) But as the AIs get more and more superhuman, it starts to seem extremely hard to prevent all their opportunities for collusion.

Foom & Doom 1: “Brain in a box in a basement”
Lukas Finnveden · 8d

To be clear: I'm not sure that my "supporting argument" above addressed an objection to Ryan that you had. It's plausible that your objections were elsewhere.

But I'll respond with my view.

If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said.

Ok, so this describes a story where there's a lot of work to get to proto-AGI and then not very much work to get to superintelligence from there. But I don't understand what the argument is for thinking that's the case, rather than thinking that there's a lot of work to get to proto-AGI and then also a lot of work to get to superintelligence from there.

Going through your arguments in section 1.7:

  • "I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above."
    • But I think what you wrote about the simple(ish) core of intelligence in 1.3 is compatible with there being like (making up a number) 20 different innovations involved in how the brain operates, each of which gets you a somewhat smarter AI, each of which could be individually difficult to figure out. So maybe you get a few, you have proto-AGI, and then it takes a lot of work to get the rest.
      • Certainly the genome is large enough to fit 20 things.
      • I'm not sure if the "6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on" is complex enough to encompass 20 different innovations. Certainly seems like it should be complex enough to encompass 6.
    • (My argument above was that we shouldn't expect the brain to run an algorithm that is only useful once all 20 hypothetical components are in place, and does nothing beforehand. Because the algorithm was found via local search, each of the 20 things should be useful on its own.)
  • "Plenty of room at the top" — I agree.
  • "What's the rate limiter?" — The rate limiter would be to come up with the thinking and experimenting needed to find the hypothesized 20 different innovations mentioned above. (What would you get if you only had some of the innovations? Maybe AGI that's incredibly expensive. Or AGIs similarly capable as unskilled humans.)
  • "For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence"
    • I agree that there are reasons to expect imitation learning to plateau around human-level that don't apply to fully non-imitation learning.
    • That said...
      • For some of the same reasons that "imitation learning" plateaus around human level, you might also expect "the thing that humans do when they learn from other humans" (whether you want to call that "imitation learning" or "predictive learning" or something else) to slow down skill-acquisition around human level.
      • There could also be another reason why non-imitation-learning approaches could spend a long while in the human range. Namely: perhaps the human range is just pretty large, and so it takes a lot of gas to traverse. I think this is somewhat supported by the empirical evidence; see this AI Impacts page (discussed in this SSC post).
Foom & Doom 1: “Brain in a box in a basement”
Lukas Finnveden · 8d

Prior to having a complete version of this much more powerful AI paradigm, you'll first have a weaker version of this paradigm (e.g. you haven't figured out the most efficient way to do the brain algorithm, etc.).

A supporting argument: Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn't expect the human brain algorithm to be an all-or-nothing affair. (Unless it's so simple that evolution could find it in ~one step, but that seems implausible.)

Edit: Though in principle, there could still be a heavy-tailed distribution of how useful each innovation is, with one innovation producing most of the total value. (Even though the steps leading up to that were individually slightly useful.) So this is not a knock-down argument.

What's important in "AI for epistemics"?
Lukas Finnveden · 9d

I don't know of any work on these unfortunately. Your two finds look useful, though, especially the paper — thanks for linking!

A case for courage, when speaking of AI danger
Lukas Finnveden · 13d

I read Buck's comment as consistent with him knowing people who speak without the courage of their convictions for reasons other than stuff like "being uncertain between 25% doom and 90% doom".

Habryka's Shortform Feed
Lukas Finnveden · 14d

If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority?

Maybe Sam Altman just hoped to get way more compute in total, and then this failed, and OpenAI simply didn't have enough compute to meet GPT-5's demands no matter how high a priority they made it? If so, I would have thought that's a pretty different story from the situation with superalignment (where my impression was that the complaint was "OpenAI prioritized this too little" rather than "OpenAI overestimated the total compute it would have available, and this was one of many projects that suffered").

Sequences

  • Project ideas for making transformative AI go well, other than by working on alignment
  • Extrapolating GPT-N performance

Wikitag Contributions

  • Inside/Outside View (4y, +429/-68)
  • Conservation of Expected Evidence (4y, +106)
  • Acausal Trade (4y, +11/-39)

Posts

  • AI-enabled coups: a small group could use AI to seize power (3mo)
  • What's important in "AI for epistemics"? (11mo)
  • Project ideas: Backup plans & Cooperative AI (2y)
  • Project ideas: Sentience and rights of digital minds (2y)
  • Project ideas: Epistemics (2y)
  • Project ideas: Governance during explosive technological growth (2y)
  • Non-alignment project ideas for making transformative AI go well (2y)
  • Memo on some neglected topics (2y)
  • Implications of evidential cooperation in large worlds (2y)
  • PaLM-2 & GPT-4 in "Extrapolating GPT-N performance" (2y)