David Scott Krueger (formerly: capybaralet)

https://twitter.com/DavidSKrueger
https://www.davidscottkrueger.com/
https://therealartificialintelligence.substack.com/p/the-real-ai-deploys-itself

5 · capybaralet's Shortform · Ω · 5y · 50 comments

Comments (sorted by newest)
Gradual Disempowerment Monthly Roundup
David Scott Krueger (formerly: capybaralet) · 2d* · 40

> In general I think you should be a little suspicious of all lab self-reports about data usage, partly because they have a strong incentive to slightly fudge the category boundaries. In this case, they had a top-level category for “self-expression” which included “relationships and personal reflection” as well as “games and role-play”. Make of that what you will. But overall I think this kind of work is extremely valuable, and I’m very glad they did it.


Another reason I heard is that they don't include enterprise use here, e.g. because of privacy agreements with companies. This data may also look more "job replace-y" than "complementary".

Antisocial media: AI’s killer app?
David Scott Krueger (formerly: capybaralet) · 15d · 40

Agreed -- I'm suggesting they'll be blending together, and that moving towards AI-generated videos as the primary means of generating content on social media will help companies automate away content creators.

Safety researchers should take a public stance
David Scott Krueger (formerly: capybaralet) · 22d · 209

Huge thanks to all the lab employees who stated their support for an AI moratorium in this thread!

Can we make this louder and more public?  This is really important for the public to understand.

Safety researchers should take a public stance
David Scott Krueger (formerly: capybaralet) · 22d · 40

why not?

The real AI deploys itself
David Scott Krueger (formerly: capybaralet) · 22d · 30

Yeah, this is automatically cross-posted, but I guess it's not working as well as I'd hoped. 

The real AI deploys itself
David Scott Krueger (formerly: capybaralet) · 22d · 30

I think you're overoptimistic about what will happen in the very near term. 

But I agree that as AI gets better and better, we might start to see frictions going away faster than society can keep up (leading to, e.g., record unemployment) before we get to real AI.

The real AI deploys itself
David Scott Krueger (formerly: capybaralet) · 22d · 50

The term AGI has been confusing to so many people and corrupted/co-opted in so many ways. I will probably write a post about that at some point. 

We are likely in an AI overhang, and this is bad.
David Scott Krueger (formerly: capybaralet) · 23d · 20

An argument I find somewhat compelling for why the overhang would be increasing is:
Our current elicitation methods would probably fail for sufficiently advanced systems, so we should expect the performance we can elicit to fall short of underlying capability at some point(s). This might happen suddenly, but it could also happen more gradually and continuously. A priori, it seems more natural to expect the continuous thing, if you expect most AI metrics to be continuous.

Normally, I think people would say, "Well, you're only going to see that kind of a gap emerging if the model is scheming (and we think it's basically not yet)", but I'm not convinced it makes sense to treat scheming as such a binary thing, or to assume we'll be able to catch the model doing it (since you can't know what you're missing, and even if you find it, it won't necessarily show up with signatures of scheming rather than looking like some more mundane-seeming elicitation failure).

This is a novel argument I'm elaborating, so I don't stand by it strongly, but I don't feel it's been addressed anywhere I've seen, and it seems like a big weakness of a lot of safety cases that they seem to be treating under-elicitation as either 1) benign or 2) due to "scheming", which is treated as binary and ~observable. "Roughly bound[ing]" an elicitation gap may be useful, but is not sufficient for high-quality assurance -- we're in the realm of unknown unknowns (I acknowledge there are some subtleties around unknown unknowns I'm eliding, and that the OP is arguing that this is "likely", which is a higher standard than "likely enough that we can't dismiss it").

 

> But, GPT-5 reasoning on high is already decently expensive and this is probably not a terrible use of inference compute, so we don't have a ton of headroom here.

As written, this is clearly false/overconfident. "Probably not a terrible use of inference compute" is a very weak and defensible claim, but drawing the conclusion that "we don't have a ton of headroom" from it is completely unwarranted and would require making a stronger, less defensible claim.

Thoughts on Gradual Disempowerment
David Scott Krueger (formerly: capybaralet) · 1mo · 20

Thanks!

> Do you think that, absent AI power-seeking, this dynamic is highly likely to lead to human disempowerment? (If so, then I disagree.)

As a sort-of answer, I would just say that I am concerned that people might knowingly and deliberately build power-seeking AIs and hand over power to them, even if we have the means to build AIs that are not power-seeking.

> I said "absent misalignment", and I think your story involves misalignment?

It does not.  The point of my story is: "reality can also just be unfriendly to you".  There are trade-offs, and so people optimize for selfish, short-term objectives. You could argue people already do that, but cranking up the optimization power without fixing that seems likely to be bad.

My true objection is more that I think we will see extreme safety/performance trade-offs due to technical inadequacies -- i.e. (roughly) the alignment tax is large (although I don't like that framing). In that case, you have misalignment despite also having a solution to alignment: competitive pressures prevent people from adopting the solution.

Thoughts on Gradual Disempowerment
David Scott Krueger (formerly: capybaralet) · 2mo · 102

(I’ve only read the parts I’m responding to)
 

> My high-level view is that the convincing versions of gradual disempowerment either rely on misalignment or result [from] power concentration among humans.

It feels like this statement should be qualified more; later it is stated that GD isn’t “similarly plausible to the risks from power-seeking AI or AI-enabled coups”, but this is holding GD to a higher bar; the relevant bar would seem to be “is plausible enough to be worth considering”. 

“Rely[ing] on misalignment” is also an extremely weak condition: I claim that current systems are not aligned, and gradual disempowerment dynamics are already at play (cf. the AI “arms race”).

The analysis of economic disempowerment seems to take place in a vacuum, ignoring one of the main arguments we make, which is that different forms of disempowerment can mutually reinforce each other.  The most concerning version of this, I think, is not just “we don't get UBI”, but rather that the memes that say “it's good to hand over as much power as quickly as possible to AI” win the day. 

The analysis of cultural disempowerment goes one step “worse”, arguing that “If humans remain economically empowered (in the sense of having much more money than AI), I think they will likely remain culturally empowered.” I think we agree that a reasonable model here is one where cultural and economic empowerment are tightly coupled, but I don’t see why that means they won’t both go off the rails. You seem to think that they are almost guaranteed to feed back on each other in a way that maintains human power, but I think it can easily go the opposite way.

Regarding political disempowerment, you state: “It’s hard to see how those leading the state and the top AI companies could be disempowered, absent misalignment.” Personally, I find this quite easy. Insufficient elite coordination is one mechanism (discussed below). But reality can also just be unfriendly to you and force you to make choices about how you prioritize long-term vs. short-term objectives, leading people to accept deals like: “I'll be rich and powerful for the next hundred years, and then my AI will take over my domain and do as it pleases”. Furthermore, if more people take such deals, this creates pressure for others to do so as well, since you need to get power in the short term in order to remain “solvent” in the long term, even if you aren’t myopic yourself. I think this is already happening; the AI arms race is burning the commons every day; I don’t expect it to stop.

Regarding elite coordination, I also looked at the list under the heading “Sceptic: Why don’t the elites realise what’s happening and coordinate to stop it?” Another important reason not mentioned is that cooperating usually produces a bargaining game where there is no clearly correct way to split the proceeds of the cooperation.

Posts

35 · Antisocial media: AI’s killer app? · 15d · 8 comments
76 · The real AI deploys itself · 23d · 8 comments
33 · Announcing "The Real AI": a blog · 1mo · 1 comment
50 · Detecting High-Stakes Interactions with Activation Probes · 3mo · 0 comments
25 · Upcoming workshop on Post-AGI Civilizational Equilibria · 4mo · 0 comments
24 · A review of "Why Did Environmentalism Become Partisan?" · 6mo · 0 comments
167 · Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development · Ω · 9mo · 65 comments
40 · A Sober Look at Steering Vectors for LLMs · Ω · 11mo · 0 comments
19 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · QΩ · 1y · 7 comments
21 · An ML paper on data stealing provides a construction for "gradient hacking" · 1y · 1 comment
Wikitag Contributions

Consequentialism · 10 years ago · (+50/-38)