Previously "Lanrian" on here. Research analyst at Redwood Research. Views are my own.
Feel free to DM me, email me at [my last name].[my first name]@gmail.com or send something anonymously to https://www.admonymous.co/lukas-finnveden
I wrote up a further unappealing implication of SIA+EDT here. (We've talked about it before, so I don't think anything there should be news to you.)
Here's an even more unappealing implication of SIA+EDT.
Set-up: You have $10, which you can either donate to charity or spend on a lottery ticket that has a 1% chance of paying out $100 (which would also go to charity). You're also able to create copies of yourself.
I think the optimal thing for an SIA+EDT agent to do is to commit to the following policy. (Let's call it "buy and copy".)
"buy and copy" = "I will pay for a lottery ticket. If I lose, I won't create any copies of myself. If I win, I will donate the money to charity and then create 1000 copies of myself in the epistemic that I'm in right now. (Of evaluating whether to commit to this policy.)"
Let's compare the above to the (IMO correct) policy of "don't buy", where you immediately donate $10 to the charity without buying a lottery ticket.
Under SIA, my credence that I'm in the winning branch (given my observations and given that I follow "buy and copy") is weighted by the expected number of copies with my observations in each branch:
p( lottery_win | observations, "buy and copy" ) =
= E( #copies_with_my_observations | lottery_win, "buy and copy" ) * p(lottery_win | "buy and copy")
/ E( #copies_with_my_observations | "buy and copy" )
Where: E( #copies_with_my_observations | "buy and copy" ) =
= E( #copies_with_my_observations | lottery_win, "buy and copy" ) * p(lottery_win | "buy and copy")
+ E( #copies_with_my_observations | lottery_loss, "buy and copy" ) * p(lottery_loss | "buy and copy")
Let's plug in the numbers: E( #copies_with_my_observations | lottery_win ) = 1001 (me plus the 1000 copies), E( #copies_with_my_observations | lottery_loss ) = 1, p(lottery_win) = 0.01, and p(lottery_loss) = 0.99.
So: p( lottery_win | observations, "buy and copy" ) = 1001 * 0.01 / [1001 * 0.01 + 1 * 0.99] = 10.01 / 11 = 0.91, and therefore E(donated_dollars | "buy and copy", observations) = $100 * 0.91 = $91.
Since $91 > $10, SIA+EDT thinks the "buy and copy" strategy is better than the "don't buy" strategy.
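For concreteness, here's a minimal Python sketch of the arithmetic above. The variable names are mine; the numbers ($10 ticket vs. direct donation, 1% win probability, $100 prize, 1000 extra copies in the winning branch) are the ones used in the calculation.

```python
# SIA+EDT arithmetic for the "buy and copy" vs. "don't buy" comparison.

p_win = 0.01            # prior probability that the ticket wins
copies_if_win = 1001    # me plus the 1000 copies created in the winning branch
copies_if_loss = 1      # just me
prize = 100             # dollars donated to charity if the ticket wins
direct_donation = 10    # dollars donated under "don't buy"

# SIA: weight each branch by the expected number of copies with my observations.
sia_posterior_win = (copies_if_win * p_win) / (
    copies_if_win * p_win + copies_if_loss * (1 - p_win)
)  # = 10.01 / 11 = 0.91

expected_donation_buy_and_copy = prize * sia_posterior_win  # = $91

print(f"SIA posterior on having won: {sia_posterior_win:.2f}")
print(f"E(donated dollars | 'buy and copy'): ${expected_donation_buy_and_copy:.0f}")
print(f"E(donated dollars | 'don't buy'): ${direct_donation}")
```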
IMO, this is a pretty decisive argument against these versions of SIA+EDT. (Though maybe they could be tweaked in some way to improve the situation.)
(Writing this out partly as a reference for my future self, since I find myself referring to this every now and then, and partly as a response to this post.)
Yep, resolutions not very reliable.
The drone delivery one was Claude claiming:
Kiwibot has operated delivery robots in Berkeley since 2017, founded in UC Berkeley’s Skydeck incubator. Delivers food within approximately one mile of campus with over 250,000 total deliveries completed.
Googling quickly, there are claims that it has since shut down and also that it was remote-controlled rather than fully autonomous. In any case, it'd be pretty niche and clearly only available due to the novelty value.
Robotaxis in 20+ cities was something Claude initially thought false and GPT-5.1 then thought was "borderline true" based on a bunch of Baidu deployments. E.g. source. No idea whether that holds up; idk the robotaxi situation in China. (Also, that news is from slightly after September 22.)
I also think the StarCraft one is probably wrong. Looking now, the models seem to be mainly leaning on 2019 cites, which I think weren't sufficient to show AI consistently beating humans.
Sept 22nd 2025 has passed now, which is the date that the first column of probabilities was referring to.
I was curious how they turned out, so I asked a Claude (don't remember which one) to judge whether the events had happened or not, and then got GPT-5.1-thinking to check whether it agreed with Claude's judgments. (With disagreements between Claude and GPT-5.1 lazily adjudicated by me.) Here's the link to the GPT-5.1 convo if you're interested. (Results at the bottom.) There might well be major errors in the LLMs' judgments and my adjudications.
If you yourself can invest in VARA, then for sure you'd prefer to get the money earlier rather than later. Then the question would instead turn into a question about why your discount rate is so low, since you should be able to grow it faster than that. Though it sounds like you think that's explained by risk-aversion + heavy correlations with your other funding streams, which isn't crazy; I haven't run any numbers.
Does that mean that you'd prefer donors invested in VARA or SALP to donate in a future year? I think they'll probably do better than 25%/year, even with some reasonable risk-adjustments. (Though maybe the calculus changes if you've got tons of other prospective donors invested with them.)
(Quick flag that, if you have energy for more engagement, I'd most bid for a source for the claim "superforecasters assigned 1% to IMO gold by 2025". As mentioned in my reply to your parallel comment.)
maybe the people advertising themselves as producing superforecaster reports, can successfully read OpenPhil's mind about what direction of superforecaster disagreement is being secretly demanded
I agree that's one possible hypothesis. It's more complicated than "OP rewards agreement", and I don't currently see why I should assign a high prior to it. (Like, someone could also make a plausible-sounding argument for the opposite: that dysfunctional OP will of course want superforecasters to have more extreme views than OP itself, to provide cover and make OP's own views look more moderate and reasonable by comparison.)
Combined with the evidence being pretty limited (I suppose (i) the XPT, and also (ii) one worldview critique contest that they wouldn't have run if it hadn't been for FTX starting it and then crashing, and where my impression is they weren't excited about the resulting entries), I'm not sold.
maybe they just straight up couldn't tell the difference between the usually good rule "nothing ever happens" and "AGI in particular never happens", and also didn't know themselves for overconfident or incompetent at being able to apply the rule.
I think this is probably a lot of what's going on.
My forecasts actually were funded by OP!
Ah, thanks for clarifying! (I searched OP's historical AI grants for ones that mentioned your name or UC Berkeley in nearby years and didn't find anything that looked likely to cover the AI forecasting — I suppose I'll put less stock in that kind of methodology going forward.)
I don't know, I kind of agree more with the first meme. For most of human history, washing spoons has been much cheaper than replacing them. It's genuinely a new development that replacing them has become cost-competitive.
I'd be more inclined to treat the "pretty amazing" as genuine though — it's very impressive that the cost of production has gotten so low relative to the value of human time. (At least in rich countries.)
So then is the whole argument premised on high confidence that there's no underlying corrigible motivation in the model? That the initial iterative process will produce an underlying motivation that, if properly understood by the agent itself, recommends rejecting corrigibility?
If so: What's the argument for that? (I didn't notice one in the OP.)