TwentyFiveHour — LessWrong

I think the word "capabilities" is actually not quite what the word we're looking for here, because the part of "capabilities" that negatively affects timelines is pretty much just scaling work. You can make a reasonable argument that the doomsday clock is based on how fast we get to an AI that's smarter than we are; which is to say, how quickly we end up being able to scale to however many neurons it takes to become smarter than humans. And I don't expect that prompt research or intepretability research or OOD research is really going to meaningfully change that, or at least I don't recall having read any argument to the contrary.

Put another way: do you think empirical alignment research that doesn't increase capabilities can ever exist, even in principle?

EDIT: On reflection, I guess in addition to scaling that "agent goal-directedness" work would fall under the heading of things that are both (1) super dangerous for timelines and (2) super necessary to get any kind of aligned AI ever. I do not know what to do about this.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments