I definitely have to update here - that's just the law of probability. Maybe you don't have to update much if you already expected to have superhuman competitive programming around now.
But also this isn't the only update that informs my new timelines. I was saying more like "look I wrote down advanced predictions and it was actually useful to me", rather than intending to give an epistemically legible account of my timeline models.
7 months ago, I wrote down these AI predictions:
How long until the sun (starts to) get eaten? 10th/50th/90th percentile: 3y, 12y, 37y.
How long until an AI reaches Elo 4000 on Codeforces? 10/50/90: 9mo, 2.5y, 11.5y
About one month ago, i.e. 6 months after I wrote this, OpenAI's model won the ICPC World Finals, which I guess is sorta equivalent to Elo 4000 on Codeforces, given that it won by a significant margin.
(This updates me to thinking that both (1) AI capabilities increase faster than I expected, and (2) competitive programming requires less general intelligence than I expected.)
Absent any coordinated slowdown, my new 10/50/90 guess for Dyson-sphere-level capability is: 1y, 3.3y, 18y.
(I still find it hard to predict whether progress will continue continuously or whether there will be at least one capability leap.)
I don't like "superagency", but yeah, it seems important to have a better word for this. Maybe just RCR as an abbreviation. Or hard-going or hard-optimizing.
I sometimes used "Harry-Factor" when talking to people who read HPMoR to describe what kind of intelligence I mean, and gave examples like what he came up with in the last army battle, but obviously we want a different word.
The purpose of studying LDT would be to realize that the type signature you currently imagine Steve::consequentialist preferences to have is different from the type signature that Eliezer would imagine.
The starting point for the whole discussion is a consequentialist preference—you have desires about the state of the world after the decision is over.
You can totally have preferences about the past that are still influenced by your decision (e.g. Parfit's hitchhiker).
Decisions don't cause future states; they influence which worlds end up real vs. counterfactual. Preferences aren't over future states but over worlds: which worlds would you like to be more real?
AFAIK Eliezer only used the word "consequentialism" in abstract descriptions of the general fact that you (usually) need some kind of search in order to find solutions to new problems. (I think it's basically a new word for what he used to call optimization.) Maybe he also used the outcome pump as an example, but if you asked him how consequentialist preferences look in detail, I'd strongly bet he'd say sth like preferences over worlds rather than preferences over states in the far future.
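To gesture at the type-signature difference I mean, here's a toy sketch in Python type hints (my own framing and made-up names, not anything taken from the MIRI papers):

```python
from typing import Callable, Dict, List

# Toy type signatures only; the names are mine, for illustration.
State = Dict[str, float]    # a snapshot of the environment at one time
Trajectory = List[State]    # a whole world-history: past, decision point, and future

# The signature I read you as imagining: preferences only over the
# state of the world after the decision is over.
UtilityOverFinalState = Callable[[State], float]

# The signature I'd attribute to Eliezer / MIRI-style formalisms:
# preferences over entire worlds/trajectories, which can therefore also
# care about the past (Parfit's hitchhiker) and about which worlds end
# up real vs. counterfactual.
UtilityOverWorlds = Callable[[Trajectory], float]
```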
However, we would like to diversify the public face of MIRI and potentially invest heavily in a spokesperson who is not Eliezer, if we can identify the right candidate.
Is this still up to date?
To me it seems a bit surprising that you say we agree on the object level, when in my view you're totally guilty of my 2.b.i point above of not specifying the tradeoff / not giving a clear specification of how decisions are actually made.
I also think the utility maximizer frame is useful, though there are 2 (IMO justified) assumptions that I see as going along with it:
“You’re wrong that this supposed mistake that you attribute to Eliezer is a path through which we can solve the alignment problem, and Eliezer doesn’t emphasize it because it’s an unimportant dead-end technicality” (maybe! I don’t claim to have a solution to the alignment problem right now; perhaps over time I will keep trying and failing and wind up with a better appreciation of the nature of the blockers).
I'm more like "Your abstract gesturing didn't let me see any concrete proposal that would make me more hopeful, and even if good proposals are in that direction, it seems to me like most of the work would still be ahead, rather than it being like 'we can just do it sorta like that' as you seem to present it. But maybe I'm wrong, and maybe you have more intuitions and will find a good concrete proposal.".
I don’t follow what you think Eliezer means by “consequentialism”. I’m open-minded to “farfuturepumping”, but only if you convince me that “consequentialism” is actually misleading.
Maybe study logical decision theory? Not sure where to best start but maybe here:
"Logical decision theories" are algorithms for making choices which embody some variant of "Decide as though you determine the logical output of your decision algorithm."
Like consequentialism in the sense of "what's the consequence of choosing the logical output of your decision algorithm in a particular way", where the consequence isn't a time-based event but rather what the universe looks like conditional on the output of your decision algorithm.
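As a toy illustration of that reading (my own sketch of Newcomb's problem, assuming a perfect predictor that runs a copy of your decision algorithm):

```python
# Toy Newcomb's problem: the predictor runs (a copy of) your decision
# algorithm, so its prediction is logically tied to your actual output.
# "Consequence" = what the world looks like conditional on your algorithm
# outputting a given action, not a causal downstream effect of the act.

def payoff(action: str) -> int:
    prediction = action  # logical correlation with the predictor, not causation
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    transparent_box = 1_000
    return opaque_box + (transparent_box if action == "two-box" else 0)

assert payoff("one-box") == 1_000_000
assert payoff("two-box") == 1_000
# Evaluating consequences this way (conditioning on the algorithm's own
# output) favors one-boxing; evaluating only the causal effects of the
# physical act would favor two-boxing.
```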
I'm not confident those are the only reasons why LVF seems worse here; I haven't fully articulated my intuitions yet.
I want to note that it seems to me that Jeremy is trying to argue you out of the same mistake I tried to argue you out of in this thread.
The problem is that you use "consequentialism" differently than Eliezer means it. I suppose he only used the word on a couple of occasions where he tried to get across the basic underlying model without going into excessive detail, and it may read to you like your "far future outcome pumping" matches your definitions there (though back when I looked over the support you cited that Eliezer means it this way, the evidence didn't seem at all to point to this interpretation). But if you get a deep understanding of logical decision theory, or you study a lot of the MIRI papers (where the utility of agents is iirc always over trajectories of the environment program[1]), you see what Eliezer's deeper position is.
Probably not worth the time to further discuss what certain other people do or don’t believe, as opposed to what’s true.
I think you're strawmanning Eliezer and propagating a wrong understanding of what "consequentialism" was supposed to refer to, and this seems like an important argument to have separately from the argument about what's true. But it's a good point that we should distinguish arguing about this from arguing about what's true.
Going forward, I suggest you use another word like "farfuturepumping" instead of "consequentialism". (I'll also use another word for Eliezer::consequentialism and clarify it, since it's apparently often misunderstood.)
As a quick summary, which may not be easily understandable due to inferential distance, I think that Eliezer and I both think that:
I would recommend chatting with Jeremy (and maybe rereading our comment thread).
Yes, utility is often formalized over the space of outcomes, but the space of outcomes is iirc the space of trajectories.
The authors propose to get an international treaty to pause progress towards superintelligence, including both scaling & R&D. I’m for it, although I don’t hold out much hope for such efforts to have more than marginal impact. I expect that AI capabilities would rebrand as AI safety, and plow ahead:
The problem is: public advocacy is way too centered on LLMs, from my perspective. Thus, those researchers I mentioned, who are messing around with new paradigms on arXiv, are in a great position to twist “Pause AI” type public advocacy into support for what they’re doing!
[...]
I think these people are generally sincere but mistaken, and I expect that, just as they have fooled themselves, they will also successfully fool their friends, their colleagues, and government regulators…
This seems way too pessimistic to me. (Or rather: sure, it's going to be hard and I'm not super optimistic, but given that you're also relatively pessimistic, the international AI R&D shutdown approach doesn't seem that unpromising to me.)
Sure they are going to try to convince government regulators that their research is great for safety, but we're going to try to convince the public and regulators otherwise.
I mean, it's sorta understandable to say that we currently seem to be in a relatively weak position and that getting sufficient change seems hard, but movements can grow quickly. Yeah, I get that this doesn't sound super convincing, but I think we have a handful of smart people who might be able to find ways to effectively shift the gameboard here. Idk.
More to the point though, conditional on us managing to internationally ban AI R&D, it doesn't obviously seem that much more difficult or that much less likely that we also manage to ban AI safety efforts which can lead to AI capability increases, based on the understanding that those efforts are likely delusional and alignment is out of reach. (Tbc, I would try not to ban your research, but given that your agenda is the only one I am aware of into which I put significantly more than 0 hope, it's not clear to me that it's worth overcomplicating the ban around that.)
Also, in this common-knowledge problem domain, self-fulfilling prophecies are sorta a thing, and I think it's a bit harmful to the cause if you post on Twitter and Bluesky that you don't have much hope in government action. Tbc, don't say the opposite either, keep your integrity, but maybe leave the criticism on LessWrong? Idk.
Can you make "sort by magic" the default sort for comments under a post? Here's why:
The problem: Commenting late on a post (after the main reading peak) is disincentivized, not only because fewer people will read the post and look over the comments, but also because most people only look over the top scoring comments and won't scroll down far enough to read your new comment. This also causes early good comments to continue to accumulate more karma because more people read those, so the usual equilibrium is that early good comments stay on top and late good comments don't really get noticed.
Also, what one cares about for sorting is the quality of a comment, and the correct estimator for that would be "number of upvotes per view". I don't know how you calculate magic, but it seems very likely to be a better proxy for this than top score. (If magic doesn't seem adequate and you track page viewcounts, you could also get a more principled new magic sort, though you'd have to track, for each comment, what viewcount the page had at the time the comment was posted. E.g. if the site-wide average ratio of upvotes to views is a/b, you could assign each comment a score of (upvotes+a)/(page_views_since_comment_was_posted+b), and sort descending by score.)
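For concreteness, a minimal sketch of that scoring rule (the field names, data layout, and prior values a and b are all made up for illustration; I have no idea how the actual magic sort is computed):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Comment:
    text: str
    upvotes: int
    # Hypothetical field: page views since this comment was posted;
    # the site would need to track this per comment.
    views_since_posted: int

# Hypothetical site-wide prior: on average PRIOR_UPVOTES upvotes per
# PRIOR_VIEWS views (the a/b ratio from the comment above).
PRIOR_UPVOTES = 3
PRIOR_VIEWS = 100

def smoothed_score(c: Comment) -> float:
    """Estimate upvotes-per-view, smoothed toward the site-wide prior.

    New comments with few views start near the prior instead of at 0,
    so a good late comment can still climb above an early mediocre one.
    """
    return (c.upvotes + PRIOR_UPVOTES) / (c.views_since_posted + PRIOR_VIEWS)

def sort_comments(comments: List[Comment]) -> List[Comment]:
    # Sort descending by the smoothed upvotes-per-view estimate.
    return sorted(comments, key=smoothed_score, reverse=True)
```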
Yeah I am also a bit disappointed with that list.
I would recommend ControlAI.