Israel certainly seems to be acting in a manner consistent with taking this concern seriously: it appears intent on ending Hamas's presence in Gaza, holding a buffer zone in Syria, and weakening Hezbollah.
A point about counting arguments that I have not seen made elsewhere (although I may have missed it!).
The failure of the counting argument that SGD should result in overfitting is not a valid counterexample to counting arguments in general! There is a selection bias here: the only reason we are talking about SGD is *because* it is a good learning algorithm that does not overfit. It could well still be true that almost all counting arguments hold for almost all learning algorithms. The fact that SGD generalises well is an exception *by design*.
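To make the selection-bias point concrete, here is a minimal sketch (my illustration, not from the comment; the task, dimensions, and learning rate are arbitrary choices). Among all procedures that fit the training data, a "typical" one memorises the training set and answers at chance elsewhere, which is exactly what the counting argument predicts; SGD is an atypical member of that set, and it is the one we use precisely because it generalises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: labels come from a simple linear rule (assumed setup).
d = 20
w_true = rng.normal(size=d)
X_train, X_test = rng.normal(size=(100, d)), rng.normal(size=(1000, d))
y_train = (X_train @ w_true > 0).astype(float)
y_test = (X_test @ w_true > 0).astype(float)

# (a) A "typical" interpolator: memorise the training set exactly,
# answer randomly everywhere else. Almost all functions consistent
# with the training data behave like this, so the counting argument
# holds for them: perfect training fit, chance-level generalisation.
random_test_preds = rng.integers(0, 2, size=len(y_test)).astype(float)

# (b) SGD on logistic regression: one of the rare fitting procedures
# that was *selected* because it tends to generalise.
w = np.zeros(d)
for _ in range(200):
    for i in rng.permutation(len(y_train)):
        p = 1 / (1 + np.exp(-X_train[i] @ w))
        w += 0.1 * (y_train[i] - p) * X_train[i]
sgd_test_preds = (X_test @ w > 0).astype(float)

print("random interpolator test acc:", (random_test_preds == y_test).mean())
print("SGD test acc:               ", (sgd_test_preds == y_test).mean())
```

On this toy task the random interpolator scores around 0.5 on held-out data while SGD scores well above it, despite both fitting the training set: the counting argument is right about the typical case and wrong only about the selected one.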
The first bullet point here is what I see as the most important factor in why current AI systems don't seek extreme power: they are best thought of not as intrinsically motivated to complete tasks, but as having a reflex to complete contexts in a human-like way.
Maybe RL focusses this reflex and adds some degree of motivation to models, but I doubt this effect is large. My reasoning is that the default behavior of a pretrained model is already to act as if it is pursuing a goal whenever the input context suggests one, so there is little reward/gradient pressure to instill additional goal-pursuing drive (see the sketch below).
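A minimal REINFORCE sketch of the "little gradient pressure" claim (my illustration; the two-completion setup and probabilities are assumptions, not from the comment). The policy-gradient update on the logits for a rewarded sample is `onehot(a) - softmax(logits)`, so its magnitude shrinks in proportion to how much probability the pretrained policy already puts on the goal-pursuing completion:

```python
import numpy as np

def reinforce_grad(logits, rewarded_action):
    """Policy-gradient direction on the logits for one rewarded sample
    (reward 1, no baseline): grad log pi(a) = onehot(a) - softmax(logits)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    onehot = np.zeros_like(logits)
    onehot[rewarded_action] = 1.0
    return onehot - probs

# Two completions: index 0 = "pursue the goal the context suggests",
# index 1 = "do something else". RL rewards completion 0.
for p_goal in [0.5, 0.9, 0.99]:
    logits = np.log([p_goal, 1 - p_goal])
    g = reinforce_grad(logits, rewarded_action=0)
    print(f"pretrained P(goal-directed) = {p_goal:.2f} -> "
          f"gradient magnitude {np.linalg.norm(g):.3f}")
```

If the pretrained model already assigns 0.99 probability to the goal-pursuing completion, the update is roughly 70x smaller than at 0.5, which is the sense in which RL has little room to instill extra goal-pursuing drive on contexts where the reflex already produces it.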