you can't just train your ASI for corrigibility because it will sit and do nothing
I'm confused. That doesn't sound like what Max means by corrigibility. A corrigible ASI would respond to requests from its principal(s) as a subgoal of being corrigible, rather than just sit and do nothing.
Or did you mean that you need to do some next-token training in order to get it to be smart enough for corrigibility training to be feasible? And that next-token training conflicts with corrigibility?
Nothing importantly bearish happened in that month other than bullish deals
What made a bunch of people more bearish is that AI stocks went up a good deal, especially some of the lesser-known ones.
I'm unsure what exact time period you're talking about, but here are some of the more interesting changes between Aug 29 and Oct 15:
IREN +157%
CLSK +145%
APLD +136%
INOD +118%
NBIS +75%
MU +61%
AMD +47%
If I thought AI was mostly hype, that kind of near-panic buying would have convinced me to change my mind from "I don't know" to "some of those are almost certainly in a bubble". (Given my actual beliefs, I'm still quite bullish on MU, and weakly bullish on half of the others.)
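For reference, here's a rough sketch of how those percentage moves can be recomputed from closing prices. It assumes the yfinance package and a 2025 date window (the year isn't stated above, so treat it as an assumption); adjusted vs. unadjusted closes and the data source will shift the exact figures a bit.

```python
# Minimal sketch, not a trading tool: percent change in closing price
# over the window. Tickers are the ones listed above; the 2025 dates
# are an assumption, and yfinance's adjusted closes may differ slightly
# from the figures quoted. The end date is exclusive, so 10-16 captures 10-15.
import yfinance as yf

tickers = ["IREN", "CLSK", "APLD", "INOD", "NBIS", "MU", "AMD"]
closes = yf.download(tickers, start="2025-08-29", end="2025-10-16")["Close"]

pct_change = (closes.iloc[-1] / closes.iloc[0] - 1) * 100
print(pct_change.sort_values(ascending=False).round(0))
```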
A bunch of superforecasters were asked what their probability of an AI killing everyone was. They listed out the main ways in which an AI could kill everyone (pandemic, nuclear war, chemical weapons) and decided none of those would be particularly likely to work, for everyone.
As someone who participated in that XPT tournament, I can say that doesn't match what I encountered. Most superforecasters didn't list those methods when they focused on AI killing people. Instead, they tried to imagine how AI could differ enough from normal technology to attempt something like starting a nuclear war, and mostly concluded that AI couldn't become powerful enough to make it worth analyzing specific ways in which it might kill people.
I think Proof by Failure of Imagination describes that process better than does EFA.
I guess "steering abilities" wasn't quite the right way to describe what I meant.
I'll edit it to "desire to do anything other than predict".
I'm referring to the very simple strategy of leaving out the "then do that thing".
Training an AI to predict X normally doesn't cause it to develop a desire to cause X.
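To make that concrete, here's a toy sketch (my own illustration, not anyone's actual training setup) of what a predict-only objective looks like: the loss only scores the model's predictions against observed data, and there is simply no term that rewards bringing X about.

```python
# Toy contrast between "train to predict X" and "train to do X".
# Hypothetical minimal sketch -- not a claim about any real lab's setup.
import torch
import torch.nn as nn

vocab_size, hidden = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, hidden),
                      nn.Flatten(),
                      nn.Linear(hidden * 4, vocab_size))

# Predictive objective: score the model only on how well it matches
# observed data. Nothing here rewards making X happen in the world.
tokens = torch.randint(0, vocab_size, (8, 5))   # fake observed sequences
context, next_token = tokens[:, :4], tokens[:, 4]
loss = nn.functional.cross_entropy(model(context), next_token)
loss.backward()  # gradients push toward better prediction, nothing more

# The "then do that thing" step would be a separate, agentic objective,
# e.g. rewarding actions that cause X -- which this setup simply omits.
```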
begging the question.
It seems that you want me to answer a question that I didn't plan to answer. I'm trying to describe some ways in which I expect solutions to look different from what MIRI is looking for.
I'm referring mainly to MIRI's confidence that the desire to preserve goals will conflict with corrigibility. There's no such conflict if we avoid giving the AI terminal goals other than corrigibility.
I'm also referring somewhat to MIRI's belief that it's hard to clarify what we mean by corrigibility. Max has made enough progress at clarifying what he means that it now looks like an engineering problem rather than a problem that needs a major theoretical breakthrough.
Max Harms' work seems to discredit most of MIRI's confidence. Why is there so little reaction to it?
Novice investor participation is nowhere near what it was at the 2000 dot-com peak. Current conditions look more like 1998. A bubble is probably coming, but there's still lots of room for increased novice enthusiasm.