As far as I understand, * means something that one would want, agree to, decide, etc. under ideal reflection conditions (e.g. knowing most of the plausibly relevant arguments, being given a long time to think, etc.). See, e.g., CEV as defined on its own, not in relation to alignment targets, or Wei Dai's metaethical alternatives 3-5.
Could you share your model, if you haven't modeled it in an incognito chat? And don't forget to check out my comment below. If you are right, then the alternate Claude who succeeded at 15-second-long tasks would have a lower 50% time horizon.
P.S. I also asked the AIs a similar question. Grok 4 failed to answer; Gemini 3 Pro estimated the revised 80% horizon at 35-40 mins and the 50% horizon at 6-7 hrs. GPT-5's free version gave me this piece of slop. EDIT: Claude Opus 4.5 estimated the revised 80% horizon at 30-45 or 45-60 mins and the 50% horizon at 2-3 or 3-4 hrs.
This makes me think of the previous model with the biggest 50%/80% time horizon ratio, Grok 4, which had funny failures at 2-second, 2-minute, and 2-hour-long tasks. What if an alternate-universe Claude that, like GPT-5.1-Codex-Max, succeeded at ALL tasks shorter than a minute would have achieved a far bigger 80% time horizon? And what if GPT-5.2 and Gemini 3 Pro had their failures at less-than-a-minute-long tasks ironed out, as happened with GPT-5 vs Grok 4?
EDIT: in theory, the alternate Claude could also end up with a worse 50% time horizon. But the real Claude succeeded on a quarter of the 2-4 hr-long tasks and about half of the 4-16 hr-long tasks.
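For anyone who wants to play with the intuition: I believe METR fits something like a logistic curve of success probability against log task length and reads the 50%/80% horizons off that curve. Below is a toy version of that fit with entirely made-up task lengths and outcomes (not METR's code or data), just to show how flipping a sub-minute failure into a success moves the fitted horizons.

```python
# Toy sketch (made-up data, not METR's): fit success probability vs. log2(task length)
# with logistic regression and read off the 50% / 80% time horizons, then refit for a
# hypothetical "alternate Claude" that passes every sub-minute task.
import numpy as np
from sklearn.linear_model import LogisticRegression

def horizons(lengths_min, successes):
    X = np.log2(np.asarray(lengths_min, dtype=float)).reshape(-1, 1)
    y = np.asarray(successes)
    # Large C ~ unregularized maximum-likelihood fit.
    clf = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
    b0, b1 = clf.intercept_[0], clf.coef_[0][0]
    # P(success) = p  <=>  log2(length) = (logit(p) - b0) / b1
    horizon = lambda p: 2 ** ((np.log(p / (1 - p)) - b0) / b1)
    return horizon(0.5), horizon(0.8)

# Hypothetical task suite (lengths in minutes) with a few scattered failures.
lengths = [0.25, 0.5, 1, 2, 4, 8, 15, 30, 60, 120, 240, 480, 960]
real    = [1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]  # real model: flubs a sub-minute task
alt     = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]  # alternate model: all sub-minute tasks pass

print("real 50%/80% horizons (min):", horizons(lengths, real))
print("alt  50%/80% horizons (min):", horizons(lengths, alt))
```

Run it to see how each horizon shifts; the real answer obviously depends on METR's actual task suite and weighting.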
Zvi covered education in a series of roundups ("Childhood and Education Roundup #N"). In the two most recent ones he concludes that the American educational system is in a crisis and that the entire educational 'expert' class very obviously is engaged in enemy action, a topic to which Zvi devoted an entire day.
The ARC-AGI-1 performance of the newest Gemini 3 Flash and the older Grok 4 Fast implies a potential cluster of maximal capabilities for models with ~100B params/token. Unfortunately, no company has tried to create more models of that class to test the potential cluster.
I had RMP try to roast my post about evidence against CoT-based supercoders. The post itself is here. RMP's fact check managed to claim that I thought OpenBrain was a real company (which I never did; I merely quoted a piece of the AI-2027 scenario relevant to the authors' idea of solving alignment) and, worse, that the AI-2027 slowdown ending involved INTERNATIONAL coordination. The fallacy check claimed that GPT-5 and Grok 4 don't exist. Does this mean that the tool should double-check claims related to new models?
Me too. It's METR who has yet to reveal anything, aside from the evidence extracted by Jurkovic, about models other than C. Sonnet 4.5 (and GPT-5.1 Codex Max, but you didn't mention it; C. Sonnet 4.5 was never SOTA to begin with and could be unusable for the graph, while GPT-5.1 Codex Max had someone add the data point to the AI-2027 graph and Kokotajlo notice the likely return of the 7-month doubling trend). But I doubt that "this kind of extensive work can hardly keep up with the release of new models providing new data", since updating the parameters would likely require mere days, if not minutes, of thinking per data point. See, e.g., Greenblatt's quick take about the GPT-5-related forecast and my two comments there, or my post on a worrisome trend which could have been invalidated by new models.
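To illustrate why I expect the per-data-point update to be cheap: once a new horizon estimate exists, refitting the doubling trend is a one-line least-squares fit; the expensive part is producing the data point, not updating the parameters. A toy sketch with made-up numbers (not METR's actual horizon estimates):

```python
# Toy illustration: refitting a "horizon doubles every ~N months" trend after one
# new release is a single linear fit over (date, log2(horizon)) pairs.
import numpy as np

# (months since an arbitrary reference date, 50% time horizon in minutes) - hypothetical
points = [(0, 8), (7, 17), (14, 33), (21, 70), (28, 130)]

def doubling_time(points):
    t = np.array([p[0] for p in points], dtype=float)
    h = np.log2([p[1] for p in points])
    slope, _ = np.polyfit(t, h, 1)   # log2-horizon gained per month
    return 1.0 / slope               # months per doubling

print("doubling time before:", round(doubling_time(points), 1), "months")
points.append((33, 300))             # a new model's data point lands
print("doubling time after: ", round(doubling_time(points), 1), "months")
```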
Thank you for covering the issue of optimization for virality in far more detail than my comment did! My worry is a different facet: what if such content distorts the users' brains, with problematic results?
As for the Bleeding Mind persona, it turned out that there is a Russian short story, written back in 2017, which Claude Opus 4.5 found rather similar. Additionally, I have a nitpick related to a phrase:
The nitpick
Self-Other Overlap (SOO), perhaps the only alignment approach which is "Not obviously stupid" according to Eliezer.
I would rather rephrase it as "the only alignment approach not from MIRI that Eliezer has bothered to read and didn't rule out on sight", which would imply that such approaches (e.g. this one) are highly likely to be slop, not that Eliezer has read all such approaches and deemed them stupid. For example, if Max Harms' idea of CAST and measuring empowerment were discovered or quasi-reformulated by an outsider, this wouldn't mean that Eliezer considers the rediscovered approach stupid.
It could also be interesting to model the potential memetic evolution. Suppose that a model is pretrained on a mixed dataset where some documents describe aligned AIs, while others describe misaligned[1] AIs. Then another model is pretrained on another mixed dataset where the ratio of aligned-to-misaligned documents is determined by the previous model's choices, and so on. In the end, will the equilibrium be closer to aligned models or to misaligned ones?
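Here's a toy version of the loop I have in mind, with an assumed response curve and made-up mixing fractions (none of this is measured): each generation's corpus is a blend of a static human-written base and the previous model's outputs, and we iterate to see where the aligned fraction settles.

```python
# Toy fixed-point sketch of the memetic-evolution loop described above.
# All numbers and the response curve g() are assumptions for illustration.
import math

def g(f, sharpness=6.0):
    """Assumed response curve: how often a model trained on a corpus whose
    aligned fraction is f produces aligned-looking behavior/documents
    (a steep sigmoid, i.e. the model somewhat exaggerates the majority view)."""
    return 1 / (1 + math.exp(-sharpness * (f - 0.5)))

def iterate(f0, base_fraction=0.3, base_aligned=0.7, generations=30):
    """base_fraction of each corpus stays human-written (base_aligned of it
    describes aligned AIs); the rest is generated by the previous model."""
    f = f0
    for _ in range(generations):
        f = base_fraction * base_aligned + (1 - base_fraction) * g(f)
    return f

for f0 in (0.2, 0.5, 0.8):
    print(f"start {f0:.1f} -> equilibrium ~{iterate(f0):.2f}")
```

With these assumed numbers even a misaligned-heavy starting corpus drifts toward the aligned equilibrium, because the human-written base is biased toward aligned descriptions; with a sharper response curve or a smaller, less biased human base the map can have two stable equilibria, so the outcome depends on the initial mix.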
I also suspect that one might be able to control the type of misalignment, but I don't understand whether this effect is detectable in the regime you describe. Were the model to believe that misaligned AIs decide to become superintelligent teachers instead of superintelligent servants, it might rule against committing genocide or disempowering the humans.