The trend itself was this:
- gpt_3_5_turbo_instruct: 3.49
- gpt_4: 5.54
- gpt_4_0125: 4.47
- gpt_4_1106: 5.87
- gpt_4_turbo: 4.30
- gpt_4o: 5.48
- o1_preview: 4.76
- o1_elicited: 6.76
- o3: 4.39
- o4-mini: 4.93
- gpt_5: 5.18
- gpt_5_1_codex_max: 5.36
Neither I nor GPT-5.2 believes that THIS trend is consistent enough. Additionally, Claude Opus 4.5 had its share of doubts cast on its abnormally high 50% time horizon. Finally, what would it mean for a hired human to have a 50% or 80% chance of succeeding at year-long tasks? That the human cannot do the task ~at all, even given 10 years? But even this is not that good an example...
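To make "not consistent enough" concrete, here is a minimal check of the values listed above (just the numbers in listing order; no release dates or curve fitting involved):

```python
# The values for the OpenAI models, in the order listed above.
values = [3.49, 5.54, 4.47, 5.87, 4.30, 5.48, 4.76, 6.76, 4.39, 4.93, 5.18, 5.36]

# A consistent trend would keep (mostly) one sign in the consecutive differences.
diffs = [b - a for a, b in zip(values, values[1:])]
sign_flips = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)

print([round(d, 2) for d in diffs])  # the differences mostly alternate in sign
print(sign_flips)                    # 8 of the 10 adjacent pairs flip sign
```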
I apologise, but there is another aspect, which I described in this comment. Before the rise of the Internet, pictures or films would have to be reproduced by talented people or with expensive equipment before being seen by armies of viewers. The reproducers, or those who possessed the equipment, would then have to carefully select what they spread[1] across the nation over the years. This, in turn, would imply that a far-reaching meme would be spread for a long time by ~the same reproducers, letting society react (e.g. by arresting a reproducer for possessing porn) or forget about old films which were no better than average.
An additional level of friction was the requirement that commoners physically come and see the film, or view or obtain the photographs.
This reminds me of a case for slop and of the two rebuttals which the post received. I expect that high-level tastes (e.g. those related to the long-term value of a piece of art, to the ideas it propagates, or to the meanings unlocked under scrutiny) will not be satisfied by AI-assisted art unless either the AI or the human creator has high-level tastes as well. Alas, training high-level tastes into the AI could end up being difficult due to problems with incentives and with training data (think of GPT-4o's sycophancy, the expected(?) rollout of erotica by OpenAI's models, AI girlfriends who don't need to be smarter than Llama, brainrot), and the art which you describe (e.g. making an impression of your sixteen-year-old self from an iPhone backup and then letting you talk to an LLM roleplaying as them) would either be as hard for outsiders to value as family photos or be optimized for virality instead of helping its users develop high-level tastes...
I notice that I am confused. You imply that the human-equivalent horizon of a model is a function of its measured 50% and 80% time horizons, $h_{50}$ and $h_{80}$. Then it is the LOGARITHM of that horizon which likely behaves linearly if the ratio $h_{50}/h_{80}$ is constant and changes linearly or hyperbolically if the ratio itself changes linearly. Alas, as far as I understand, the ratio doesn't change linearly across models. Were the quantity the post extrapolates monotonic, we would also expect monotonic changes in the ratio of time horizons $h_{50}/h_{80}$ from model to model. Instead, the ratios are these: setting aside Claude Opus 4.5, whose ratio equals 10.64, the next two biggest ratios are displayed by DeepSeek R1-0528 (8.53) and Grok 4 (7.14). Therefore, the ratio of the time horizons did NOT display a consistent trend, at least before Claude Opus 4.5.
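If, as I understand METR's methodology, each model's horizons come from a logistic fit of success probability against log task length, then the $h_{50}/h_{80}$ ratio is determined by the fitted slope alone; here is a short derivation (the intercept $a$ and slope $b$ are my notation, not anything from the post):

```latex
% Assumed per-model fit: success probability on a task that takes humans time t,
% with intercept a and slope b > 0 fitted separately for each model.
\[
  P(\text{success} \mid t) = \sigma\!\left(a - b \ln t\right)
\]
% The p-horizon h_p is the task length at which the success probability equals p:
\[
  a - b \ln h_p = \operatorname{logit}(p)
  \quad\Longrightarrow\quad
  \ln h_p = \frac{a - \operatorname{logit}(p)}{b}
\]
% The 50%/80% ratio therefore depends only on the slope b, not on the intercept a:
\[
  \ln\frac{h_{50}}{h_{80}}
  = \frac{\operatorname{logit}(0.8) - \operatorname{logit}(0.5)}{b}
  = \frac{\ln 4}{b}
  \quad\Longrightarrow\quad
  \frac{h_{50}}{h_{80}} = 4^{1/b}
\]
```

Under this reading, a ratio of 10.64 corresponds to a slope of roughly 0.59 and a ratio of 3.49 to a slope of roughly 1.11, so the per-model slopes jump around rather than drift in one direction.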
Claude now has a horizon of 444 billion minutes(!)
Could you actually provide a citation for Claude already being a supercoder? If you can't, then either your model has wrong parameters or is wrong wholesale. What I expect is that the time horizon trend is exponential until the last few doublings, not hyperbolic. Additionally, I suspect that Claude's absurd horizon owes more to surprisingly low performance on some tasks well below the alleged horizon than to genuine capability on the long ones.
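To illustrate the difference between the two shapes, here is a toy sketch with made-up parameters (not the post's actual fit): an exponential horizon stays finite at any date, while a hyperbolic one diverges as the date approaches its singularity, which is how an extrapolation can spit out numbers like 444 billion minutes.

```python
# Toy parameters chosen only for illustration; NOT fitted to any real data.
h0 = 30.0        # hypothetical horizon in minutes at t = 0
doubling = 0.5   # hypothetical doubling time in years for the exponential trend
t_sing = 3.0     # hypothetical singularity date in years for the hyperbolic trend

def exponential(t: float) -> float:
    """Horizon grows by a constant factor over each fixed interval."""
    return h0 * 2 ** (t / doubling)

def hyperbolic(t: float) -> float:
    """Horizon ~ const / (t_sing - t): modest growth early, divergence near t_sing."""
    return h0 * t_sing / (t_sing - t)

for t in [0.0, 1.0, 2.0, 2.9, 2.99]:
    print(f"t = {t:5.2f} y   exponential = {exponential(t):9.0f} min   hyperbolic = {hyperbolic(t):9.0f} min")
```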
As for models being woefully outmatched by humans before the specific period when your "benchmark" skyrocketed, that means something different. Recall that the METR graph had the models' performance rise quickly, then slowly, until the spike at the very end; that the AI-2027 forecast estimated the human speed of thought as 10-20 tokens/sec; and that the models, unlike the humans, can only use their CoT and stuff tokens into the mechanism that ejects the next one, without learning anything from experience.
Were METR's baselining process simulated and placed onto the METR graph at 20 tokens per simulated second of a human doing the task, the performance on ranges longer than the time needed to introduce the baseliners to the tasks would likely resemble a straight line, where 1K tokens is a bit less than a minute and 100K tokens land between 1 hr and 2 hrs. I manually edited that line into the graph and added another line where the hypothetical model requires 100 times more tokens. The models at first don't display progress at all, then proceed faster than the humans (OpenAI's models) or about as fast as the humans (GPT-4o, Claude Sonnet 4.5, Grok 4), then ALL models begin to proceed far slower, almost as if their competence is exhausted by harder tasks, and finally models since o3 display a jump, as if they did something, weren't confident, but decided to submit anyway.
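The arithmetic behind the edited-in line, as a sketch (the 20 tokens/sec figure is the AI-2027 estimate mentioned above; everything else is unit conversion):

```python
# Convert a token budget into simulated human working time at a given "speed of thought".
TOKENS_PER_SECOND = 20  # AI-2027's upper estimate for human speed of thought

def simulated_human_time_minutes(tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Minutes a human 'thinking at tokens_per_second' would need to emit this many tokens."""
    return tokens / tokens_per_second / 60

for tokens in [1_000, 10_000, 100_000]:
    minutes = simulated_human_time_minutes(tokens)
    print(f"{tokens:>7} tokens ~= {minutes:6.1f} simulated minutes ({minutes / 60:.2f} hours)")

# 1,000 tokens   ->  ~0.8 minutes (a bit less than a minute)
# 100,000 tokens -> ~83 minutes (between 1 and 2 hours), matching the line described above.
```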
In this post and its successors, Max Harms proposes a novel understanding of corrigibility as the desired property of AIs, including a potential formalism usable for training agents to be as corrigible as possible.
The core ideas, as summarized by Harms, are the following:
Max Harms's summary
These claims can be tested fairly well:
@Max Harms honestly admitted that his first attempt at creating the formalism failed; while this is a warning that "formal measures should be taken lightly" (and, more narrowly, that minus signs in expected utilities should be avoided), I expect there to be a plausible or seemingly plausible[1] fix, e.g. considering the expected utility u(actual actions|actual values) - max(u(actual actions|other values), u(no actions|other values)); a more explicit restatement of this candidate appears after the footnote below.
The followup work that I would like to see is intense testing-like activity (e.g. the test I described in point 4 and tests of potential fixes like the one I described in point 6), but I don't know who would do it.
E.g. E(u(actions|values)) - E(u(actions|counterfactual values)/2). Said "fix" prevents the AI from ruining the universe, but doesn't prevent it from accumulating resources and giving them to the user.
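Restating the candidate fix and the footnoted variant in explicit notation (the symbols $a^*$, $a_0$, $V$, $V'$ are mine, not Harms's, and which counterfactual values $V'$ get used is left open, as in the comment above):

```latex
% a*  = the actions the AI actually takes
% a_0 = the null action (taking no actions)
% V   = the principal's actual values; V' = some counterfactual values
% Candidate fix: reward performance under the actual values, minus the better of
% (how well the same actions serve V') and (how well doing nothing serves V').
\[
  J(a^{*}) = u(a^{*} \mid V) - \max\bigl\{\, u(a^{*} \mid V'),\; u(a_{0} \mid V') \,\bigr\}
\]
% Footnoted variant: subtract half the expected utility under counterfactual values.
\[
  J'(a^{*}) = \mathbb{E}\bigl[u(a^{*} \mid V)\bigr] - \tfrac{1}{2}\,\mathbb{E}\bigl[u(a^{*} \mid V')\bigr]
\]
```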
> You can look over the current leaderboard for reviewers to get a sense of which of your reviews might be worth polishing.
Unfortunately, I don't see anything at all via the link. Is that because I haven't written any reviews?
This post is an insightful attempt to explore a novel issue: parents now have to prepare kids for a world drastically more different from previous eras than those eras are from each other.
Before reading the post, I didn't think much about the issue, focusing instead on things like existing effects on kids' behavior. What this post made me do is try to reason from first principles, which is also a way to test the post's validity.
My opinions based on first principles and potentially biased sources
Historically, parenting was supposed to optimize the kids' training environment so that the kids would become adults who are at least partially aligned with the parents' values and have the capabilities to live a decent life. Additionally, parenting styles differed in whether they also treated the kids' welfare as a constraint.
The most prominent example of overfocusing on capabilities is Asian parenting (think of South Korea where, quoting Zvi, "parents think that children who can’t compete for positional educational goods are better off not existing"), which also has a benign explanation: a decent life requires kids to enter one of the few positions in high-level universities and companies.
As the OP correctly assumes, AI would either eliminate the race or redirect it towards the new capabilities and character traits the OP mentions (e.g. the ability to adapt to new meaningful activities); in either case parenting shifts towards kids' welfare.
On the other hand, my worry is that kids could end up losing the motivation to learn and to behave appropriately, which, according to potentially biased evidence, has already happened to lots of American Gen Alpha children. Said evidence also places the blame on superstimuli like short-form content readily accessible from electronic devices and affecting the brains of so-called iPad kids.
This worry was also partially addressed in the OP in the form of "kids falling into some weird headspace, falling in love with the AI or something".
An additional test of the accuracy of the post's reasoning would be its quasi-replication, not by me, but by someone who hasn't read the post at all.
Potential followups have also been done; for example, the OP's author wrote the post Why we’re still doing normal school, which discusses the reasons to keep educating children (e.g. connecting kids with friends, letting parents keep their autonomy) and sources of meaning that kids' lives would have in the post-work world.
The main takeaway from the post is Zvi's concept of levels of friction, which he developed later. As of the time of writing the post, Zvi had in mind the following:
> I am coming around to a generalized version of this principle. There is a vast difference between:
>
> 1. Something being legal, ubiquitous, frictionless and advertised.
> 2. Something being available, mostly safe to get, but we make it annoying.
> 3. Something being actively illegal, where you can risk actual legal trouble.
> 4. Something being actively illegal and we really try to stop you (e.g. rape, murder).
>
> We’ve placed far too many productive and useful things in category 2 that should be in category 1.
The other main takeaway is that frictionless sports betting is a superstimulus, especially for highly vulnerable people:
> It means those who are inclined to bet on sports are either often doing it out of desperation, or that the same causes that lead them to bet on sports are pushing them to the financial edge in other ways as well, and this is the straw breaking the camel’s back.
While I doubt that this post affected my thinking, that is only because I had learned similar conclusions from different sources. For example, this post at After Babel explicitly claims that "In the last few years, limitless, frictionless gambling has become available to anyone with an internet connection." The only followup work that I might have recommended and that wasn't already done would be to explore in more detail the harms of frictionless ways to obtain superstimuli (e.g. some modern food, or stimuli related to sexual instincts, like AI girlfriends or boyfriends).
It seems to me that, unlike the important problems that have been described in detail by others rather than by you,[1] coordination is at least as easy as a superintelligence-enabled strategy: assuming that superalignment is well-solvable, the authors of the AI-2027 forecast thought it easy to "codesign a new AI, Consensus-1, whose primary imperative—taking precedence over any future orders or retraining attempts—is to enforce the terms" of the treaty between the two ASIs created in the USA and China.
While the scenario itself had an aligned Safer-4 and a misaligned DeepCent-2, the Rogue Replication variant had the USA and China create Safer-1 and DeepCent-1, which advised their respective governments to codesign Consensus-1. Since it had both the USG's and the CCP's agendas in mind, the world doesn't end up being controlled by the USA, but does end up flourishing.
How difficult does alignment need to be for your failure mode, multipolar competition annihilating human values, to occur, if the aligned and misaligned coalitions might simply nuke the world into oblivion?
However, neither I nor Claude Sonnet 4.5 thinks that you accounted for the Intelligence Curse. I strongly suspect that it is the most important aspect that mankind needs to consider.