The Alignment Trap: AI Safety as Path to Power
crispweed · 11mo

There is also the point about offensive/defensive asymmetry.

The Alignment Trap: AI Safety as Path to Power
crispweed · 11mo

“how much earlier”

Yeah, good question. I don't really know.

“and does it matter?”

I think so, because even if pure AI control follows on from human-AI entity control (which is actually my prediction), I expect the dynamics of human-AI control to lead to, and accelerate, that eventual pure AI control.

I'm also thinking there is a dynamic where pure AI entities need to be careful not to 'tip their hand'. What I mean by this is that a pure AI entity would need to avoid revealing the extent of its capabilities until it is actually capable of taking control, whereas human-AI entities can go ahead and play the power game, building up control without so much concern about this. (To the average voter, this could just look like more of the same.)

The Alignment Trap: AI Safety as Path to Power
crispweed · 11mo

Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so, how much earlier, and does it matter? (I lean towards “not necessarily true in the first place; and if true, probably not by much; and it's not all that important”.)

I guess in my model this is not something that suddenly becomes true at a certain level of capabilities. Instead, I think the capabilities of human-AI entities become more dangerous in a roughly continuous fashion as AI (and the technology for controlling AI) improves.

We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
crispweed · 1y

In this blog post, I argue that a key feature we might be missing is that dangerous AI could be a lot less capable than current state-of-the-art LLMs in some ways (specifically, less like a polymath): https://upcoder.com/21/is-there-a-power-play-overhang

(link post here: https://www.lesswrong.com/posts/7pdCh4MBFT6YXLL2a/is-there-a-power-play-overhang )

Posts

The Alignment Trap: AI Safety as Path to Power (11mo)

Is There a Power Play Overhang? (1y)