SE Gyges
Comments

AI 2027 Response Followup
SE Gyges11d40

in my view, there doesn't seem to be such a thing as bad publicity for AGI, and I still don't know for sure why that's happening. And that seems like where most of the value is in figuring out this discussion, to me, at least. 

It's an incentive problem.

There is no way to discuss something being dangerous that does not also render it valuable. People are incentivized to seek out value; our entire economy is based on it. It works beautifully, but it is terrible at mitigating externalities. We only dial back from dangerous or bad things after the disaster; so long as doing things is profitable, rational economic actors seek out high-risk activities as far as permitted, because they alone get the profit and the majority of the risk is to other people.

In my view Yudkowsky's body of work has had two main effects, which run in opposite directions:

  1. Convincing many people that AI is extremely valuable, which is a large part of why we currently are where we are.
  2. Convincing many people that AI is dangerous, which shows no signs of paying off yet but which may be crucially important at some future juncture. I am willing to pronounce it a complete failure at actually causing any regulatory regime whatsoever to come into existence thus far.
SE Gyges' response to AI-2027
SE Gyges12d20

I think these are blind guesses and relying on the benchmarks is the streetlight effect, as I think we talked about in another thread. I am mostly explaining in as much detail as I can the parts I think are relevant to Neel's objection, since it is substantively the most common objection, i.e., that paying attention to financial incentives or work history is irrelevant to anything. I am happy that I have addressed the scenario itself in enough detail.

SE Gyges' response to AI-2027
SE Gyges12d11-2

These issues specifically have been a sticking point for a number of people, so I should clarify some things separately. This is probably also because I didn't see this earlier, so it's been a while, and because I know who you are.

I do not think AI 2027 is, effectively, OpenAI's propaganda merely because it is about a recursively self-improving AI and OpenAI is also about RSI. There are a lot of versions (and possible versions) of a recursively self-improving AI thesis. Daniel Kokotajlo has been around long enough that he was definitely familiar with the territory before he worked at OpenAI. I think that it is effectively OpenAI propaganda because it assumes a very specific path to a recursively self-improving AI with a very specific technical, social and business environment, and this story is about a company that appears to closely resemble OpenAI [1] and is pursuing something very similar to OpenAI's current strategy. It seems unlikely that Daniel had these very specific views before he started at OpenAI in 2022.

Daniel is a thoughtful, strategic person who understands and thinks about AI strategy. He presumably wrote AI 2027 to try to influence strategy around AI. His perspective is going to be for playing as OpenAI. He will have used this perspective for years, totaling thousands of hours. He will have spent all of that time seeing AI research as a race, and trying to figure out how OpenAI can win. This is a generating function for OpenAI's investor pitch, and is also the perspective that AI 2027 takes.

Working at OpenAI means spending years of your professional life completely immersed in an information environment sponsored by, and meant to increase the value of, OpenAI. Having done that is a relevant factor for what information you think is true and what assumptions you think are reasonable. Even if you started off with few opinions about them, and you very critically examined and rejected most of what OpenAI said about itself internally, you would still have skewed perspective about OpenAI and things concerning OpenAI.

I think of industries I have worked in from the perspective of the company I worked for when I was in that industry. I expect that when he worked at OpenAI he was doing his best to figure out how OpenAI comes out ahead, and so was everyone around him. This would have been true whether or not he was being explicitly told to do it, and whether or not he was on the clock. It is simpler to expect that this did influence him than to expect that it did not.

Quitting OpenAI loudly doesn't really change this picture, because you generally only quit loudly if you have a specific bone to pick. If you've got a bone to pick while quitting OpenAI, that bone is, presumably, with OpenAI. Whatever story you tell after you do that is probably about OpenAI.

I think the part about financial incentives is getting dismissed sometimes because a lot of ill-informed people have tried to talk about the finances in AI. This seems to have become sort of a thought-terminating cliche, where any question about the financial incentives around AI is assumed to be from uninformed people. I will try to explain what I meant about the financial influence in a little more detail.

In this specific case, I think that the authors are probably well-intentioned. However, most of their shaky assumptions just happen to be things which would be worth at least a hundred billion dollars to OpenAI specifically if they were true. If you were writing a pitch to try to get funding for OpenAI or a similar company, you would have billions of reasons to be as persuasive as possible about these things. Given the power of that financial incentive, it's not surprising that people have come up with compelling stories that just happen to make good investor pitches. Well-intentioned people can be so immersed in them that they cannot see past them.

It is worth noting that the lead author of AI 2027 is a former OpenAI employee. He is mostly famous outside OpenAI for having refused to sign their non-disparagement agreement and for advocating for stricter oversight of AI businesses. I do not think it is very credible that he is deliberately shilling for OpenAI here. I do think it is likely that he is completely unable to see outside their narrative, which they have an intense financial interest in sustaining.

There are a lot of different ways for a viewpoint to be skewed by money.

First is to just be paid to say things.

I don't think anyone was paid anything by OpenAI for writing AI 2027. I thought I made enough of a point of that in the article, but the second block above is towards the end of the relevant section and I should maybe have put it towards the top. I will remember to do that if I am writing something like this again and maybe make sure to write at least an extra paragraph or two about it.

I do not think Daniel is deliberately shilling for OpenAI. That's not an accusation I think is even remotely supportable, and in fact there's a lot of credible evidence running the other way. He's got a very long track record and he made a massive point of publicly dissenting from their non-disparagement agreement. It would take a lot of counter-evidence to convince me of his insincerity.

You didn't bring him up, but I also don't think Scott, who I think is responsible for most of the style of the piece, is being paid by anyone in particular to say anything in particular. I doubt such a thing is possible even in principle. Scott has a pretty solid track record of saying whatever he wants to say.

Second: what information is available, and what information do you see a lot?

I think this is the main source of skew.

If it's valuable to convince people something is true, you will probably look for facts and arguments which make it seem true. You will be less likely to look for facts and arguments which make it seem false. You will then make sure that as many people are aware of all the facts and arguments that make the thing seem true as possible.

At a corporate level this doesn't even have to be a specific person. People who are pursuing things that look promising for the company will be given time and space to pursue what they are doing, and people who are not will be more likely to be told to find something else to do. You will choose to promote favorable facts and not promote bad ones. You get the same effect as if a single person had deliberately chosen to only look for good facts.

It would be weird if this wasn't true of OpenAI given how much money is involved. As in, positively anomalous. You do not raise money by seeking out reasons why your technology is maybe not worth money, or by making sure everyone knows those things. Why would you do that? You are getting money, directly, because people think the technology you are working on is worth a lot of money, and everyone knows as much as you can give them about why what you're doing is worth a lot of money.

Tangentially, this type of narrative allows companies to convince staff to take compensation that is more heavily weighted towards stock, which tends to benefit existing shareholders when they prefer that arrangement: either they expect employees will sell the stock back to them well below its value at a public sale or acquisition, or they know the stock is worth less than the equivalent salary would be.

For a concrete example of this that I didn't dig into in my review, from the AI 2027 timelines forecast:

We first show Method 1: time-horizon-extension, a relatively simple model which forecasts when SC will arrive by extending the trend established by METR’s report of AIs accomplishing tasks that take humans increasing amounts of time.

We then present Method 2: benchmarks-and-gaps, a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company.

Finally we then provide an “all-things-considered” forecast that takes into account these two models, as well as other possible influences such as geopolitics and macroeconomics.

Are either RE-Bench or the METR time horizon metrics good metrics, as-is? Will they continue to extrapolate? Will a model that saturates them accelerate research a lot?

I think the answer to all of these is maybe. If you're OpenAI, it is pretty important that benchmarks are good metrics. It is worth a ton of money. So, institutionally, OpenAI has to believe in benchmarks, and vastly prefers if the answer is "yes" to all of these questions. And this is also what AI 2027 is assuming.

I made a point of running this into the ground in the writeup, but essentially every time a "maybe" question breaks one way or the other in AI 2027, it seems to break the way OpenAI would prefer. That's a very specific thing to happen! It doesn't seem very likely it happened by chance. In total, the effect is that the piece reads as only a slight dissent from the OpenAI hype pitch, in my opinion.

This isn't even a problem entirely among OpenAI people. OpenAI has the loudest voice and is more or less setting the agenda for the industry. This is both because they were very clearly in the lead for a stretch, and because they've been very successful at acquiring users and raising money. There are probably more people who are bought into OpenAI's exact version of everything outside the company than inside of it. This is a considerable problem if you want a correct evaluation of the current trajectory.

I obviously cannot prove this, but I think if Daniel hadn't been a former OpenAI employee I probably would have had basically the same criticism of the actual writing. It would be neater, even, because "this person has bought into OpenAI's hype" is a lot less complicated without the non-disparagement thing, which buys a lot of credibility. I honestly didn't want to mention who any of the authors were at all, but it seemed entirely too relevant to the case I was making to leave out.

That's two: being paid and having skewed information.

Third, and much smaller: being slanted simply because you have a financial incentive. Maybe you're just optimistic, maybe you're hoping to sell soon.

Daniel probably still owns stock or options. I mentioned this in the piece. I don't think this is very relevant or is very likely to skew his perspective. It did seem like I would be failing to explain what was going on if I did not mention the possibility while discussing how he relates to OpenAI. I think it is incredibly weak evidence when stacked against his other history with the company, which strongly indicates that he's not inclined to lie for them or even be especially quiet when he disagrees with them.

I don't think it's disgraceful to mention that people have direct financial incentives. There's I think an implicit understanding that it's uncouth to mention this sort of thing, and I disagree with it. I think it causes severe problems, in general. People who own significant stock in companies shouldn't be assumed to be unbiased when discussing those companies, and it shouldn't be taboo to mention the potential slant.

My last point is stranger, and is only sort of about money. If everyone you know is financially involved, is there some point where you might as well be?

JD Vance gets flattered anonymously, described only by his job title, but we flatter Peter Thiel by name. Peter Thiel is, actually, the only person who gets a shout-out by name. Maybe being an early investor in OpenAI is the only way to earn that. I didn't previously suspect that he was the sole or primary donor funding the think tank that this came out of, but now I do. I am reminded that the second named author of this paper has a pretty funny post about how everyone doing something weird at all the parties he goes to is being bankrolled by Peter Thiel.

This is about Scott, mostly.

AI 2027’s “Vice President” (read: JD Vance) election subplot is long and also almost totally irrelevant to the plot. It is so conspicuously strange that I had trouble figuring out why it would even be there. I didn’t learn until after I’d written my take that JD Vance had read AI 2027 and mentioned it in an interview, which also seems like a very odd thing to happen. I went looking for the simplest explanation I could.

Scott says whatever he wants, but apparently by his accounting half of his social circle is being bankrolled by Peter Thiel. This part of AI 2027 seems to be him, and he seems to be deliberately flattering Vance. Vance is a pretty well known Thiel acolyte. On the relatively happy ending of AI 2027 they build an ASI surveillance system, and surveillance is a big Peter Thiel hobby horse.

I don't know what I'm really supposed to make of any of this. I definitely noticed it. It raises a lot of questions. It definitely seems to suggest strongly that if you spend a decade or more bankrolling all of Scott's friends to do weird things they think are interesting, you are likely to see Scott flatter you and your opinions in writing. It also seems to suggest that Scott's deliberately acting to lobby JD Vance. If it weren't for Peter Thiel bankrolling his friends so much that Scott makes a joke out of it, I would think it just looked like Scott had a very Thiel-adjacent friend group.

In pretty much the same way that OpenAI will tend to generate pro-OpenAI facts and arguments, and not generate anti-OpenAI facts and arguments, I would expect that if enough people around you are being bankrolled by someone for long enough they will tend to produce information that person likes and not produce information that person doesn't like.

I cannot find a simpler explanation than Thiel influence for why you would have a reasonably long subplot about JD Vance, world domination, and mass surveillance and then mention Peter Thiel in the finale.

I don't think pointing out this specific type of connection should be taboo for basically the same reason I don't think pointing out who owns what stock should be. I like knowing things, and being correct about them, and so I like knowing if people are offering good information or if there is an obvious reason their information or arguments would be bad.

  1. ^

    A few people have said that it could be DeepMind. I think it could be but pretty clearly isn't. Among other things, DeepMind would not want or need to sell products they considered dangerous or to be possibly close to allowing RSI, because they are extremely cash-rich. If the forecast were about DeepMind, it would probably consider this, but it isn't, so it doesn't.

SE Gyges' response to AI-2027
SE Gyges13d3-1

If you present this dichotomy to policymakers the pause loses 100 times out of 100, and this is a complete failure, imho. This dichotomy is what I would present to policymakers if I wanted to inoculate them against any arguments for regulation.

SE Gyges' response to AI-2027
SE Gyges17d104

What would be a major disagreement, then? Something like a medium scenario or slopworld?

Possibly, but in my own words, on purely technical questions? That an LLM is completely the wrong paradigm. That any reasonable timeline runs 10+ years. That China is inevitably going to get there first. That China is unimportant and should be ignored. That GPUs are not the most important or determinative resource. That the likely pace of future progress is unknowable.

Substantive policy options, which is more what I had in mind:
1) For-profit companies (and/or OAI specifically) have inherently bad incentives incompatible with suitably cautious development in this space.
2) Questions of who has the most direct governance and control of the actual technology are of high importance, so safety work is necessarily about trustworthy control and/or ownership of the parent organization.
3) Arms races for actual armaments are bad incentives and should be avoided at all costs. This can be mitigated by prohibiting arms contracts, nationalizing the companies, forbearing from development at all, or requiring an international agreement and doing development under a consortium.
4) Safety work is not sufficiently advanced to meaningfully proceed.
5) There needs to be a much more strictly defined and enforced criterion for cutting off or safety-certifying a launch.

Any of the technical issues kneecaps the parts of this that dovetail with being a business plan. Any of these (pretty extreme) policy remedies harms OAI substantially, and they are incentivized to find reasons why they can claim that they are very bad ideas.

There follow various bits about China, which I am going to avoid quoting because I have basically exactly one disagreement with them, and it does not respond to any given point:

The correct move in this game is to not play. There is no arms race with China, either against their individual companies or against China itself, that produces incentives which are anything other than awful. (Domestic arms races are also not great, but at least do not co-opt the state completely in the same way.) Taking an arms race as a given is choosing to lose. It should not, and really, must not be very important what country anything happens in.

This creates a coordination problem. These are notoriously difficult, but sometimes problems are actually hard and there is no non-hard solution. Bluntly, however, from my perspective, the US sort of unilaterally declared an arms race. Arms race prophecies tend to be self-fulfilling. People should stop making them.

My argument for, basically, the damnation of this entire China-themed narrative by financial incentive runs as follows, with each step being crux-y:
1) People follow financial incentives deliberately, such as by lying or by selectively seeking out information that might convince someone to give them money.
2) This is not always visible, because all of the information can be true; you can do this without ever lying. You can simply not try hard to disprove the thesis that you are pushing for.
3) People who are not following this financial incentive at all can, especially if the incentive is large, be working on extremely biased information regardless of whether they personally are aware of a financial incentive of any kind. Information towards a conclusion is available, and against it is not available, because of how other people have behaved.
4) OpenAI has such an incentive, and specifically seems to prefer to have an arms-race narrative because it justifies government funding and lack of regulation (e.g., this op-ed by Sam Altman).
5) The information environment caused by this ultimately causes the piece to have this overarching China arms race theme, and it is therefore not a coincidence that it is received by US Government stakeholders as actually arguing against regulation of any kind.

I think it is parsimonious to treat this, specifically, as the ultimate cause of the very specific arms race narrative that is now popular and on display here. It does not, I think, assume any very difficult facts, and it explains, e.g., how AI 2027 manages to accomplish the exact opposite of its apparently intended effect with major stakeholders.

 [quoting original author] in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.

I would read this.

We don't actually have any tools aside from benchmarks to estimate how useful the models are. We are fortunate to watch the AIs slow the devs down. But what if capable AIs do appear?

Hoping that benchmarks measure the thing you want to measure is the streetlight effect. Sometimes you just have to walk into the dark.

So your take has OpenBrain sell the most powerful models directly to the public. That's a crux. In addition, granting Agents-1-4 instead of their minified versions direct access to the public causes Intelligence Curse-like disruption faster and attracts more government attention to powerful AIs.

I am actually not sure this requires selling the most powerful models, although I hadn't considered this.

If there's a -mini or similar it leaks information from a teacher model, if it had one; it is possible to skim off the final layer of the model by clever sampling, or to distill out nearly the entire distribution if you sample it enough. I do not think you can be confident that it is not leaking the capabilities you don't want to sell, if those capabilities are extremely dangerous.
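To make the leakage point concrete, here is a minimal, hypothetical sketch of sequence-level distillation, assuming only that you can sample outputs from the "teacher" (no logits, no weights). The toy models, vocabulary size, and hyperparameters are all made up for illustration; this is not anyone's actual pipeline.

```python
# Hypothetical sketch: distill a "teacher" model using only its sampled outputs.
# Toy vocab and toy models; illustrates the principle, not any real product.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, SEQ_LEN = 64, 32, 16

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                     # logits: (batch, seq, vocab)

@torch.no_grad()
def sample_from_teacher(teacher, n_seqs):
    """All we assume access to: sampled token sequences, nothing else."""
    seqs = torch.zeros(n_seqs, 1, dtype=torch.long)     # toy BOS token
    for _ in range(SEQ_LEN - 1):
        probs = F.softmax(teacher(seqs)[:, -1], dim=-1)
        seqs = torch.cat([seqs, torch.multinomial(probs, 1)], dim=1)
    return seqs

teacher, student = TinyLM(), TinyLM()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(200):
    data = sample_from_teacher(teacher, 64)
    # Train the student to predict the teacher's next token from its samples.
    logits = student(data[:, :-1])
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), data[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

With enough samples, the student's conditional distribution converges toward the teacher's, which is the sense in which a "-mini" or any sampled output stream leaks the parent model's capabilities.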

So: If you think the most powerful models are a serious bioweapons risk you should keep them airgapped, which means you also cannot use them in developing your cheaper models. You gain literally nothing in terms of a safely sell-able external-facing product.

So you want to reduce p(doom) by reducing p(ASI is created). Alas, there are many companies trying their hand at creating the ASI. Some of them are in China, which requires international coordination. One of the companies in the USA produced MechaHitler, which could imply that Musk is so reckless that he deserves having the compute confiscated.

This is about right. I do not think P(ASI is created) is very high currently. My P(someone figures out alignment tolerably) is probably in the same ballpark. I am also relatively sanguine about this because I do not think existing projects are as promising as their owners do, which means we have time.

That's what the AI-2027 forecast is about. Alas, it was likely misunderstood...

I think the fact that tests for selling the model and tests for actual danger from the model are considered the same domain is basically an artifact of the business process, and should not be.

The scalar in question is the acceleration of the research speed with the AI's help vs. without the help. It's indeed hard to predict, but it is the most important issue.

A crux here: I do not think most things of interest are differentiable curves. Differentiable curves can be modeled usefully. Therefore, people like to assume things are differentiable curves.

If one is very concerned with being correct, something being a differentiable curve is a heavy assumption and needs to be justified.

From a far-off view, starting with Moore's Law, transhumanism (as was the style at the time) has made a point of finding some differentiable curve and extending it. This works pretty well for some things, like Kurzweil on anything that is a function of transistor count, and horribly elsewhere, like Kurzweil on anything that is not a function of transistor count.

Some things in AI look kind of Moore's-law-ish, but it does not seem well supported that they actually are.
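As a toy illustration of how heavy that assumption is (the data here is made up purely for illustration): the same handful of early points can be fit about equally well by an exponential and by a logistic, and the two fits diverge enormously exactly in the region you actually care about.

```python
# Toy illustration with made-up data: early points fit an exponential and a
# logistic about equally well, but their extrapolations diverge enormously.
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(8)                                   # hypothetical "years" 0..7
y = 2.0 ** (t / 2.0) * (1 + 0.05 * np.random.default_rng(0).standard_normal(8))

def exponential(t, a, b):
    return a * np.exp(b * t)

def logistic(t, k, a, b, t0):
    return a + k / (1 + np.exp(-b * (t - t0)))

p_exp, _ = curve_fit(exponential, t, y, p0=[1, 0.3])
p_log, _ = curve_fit(logistic, t, y, p0=[20, 0, 0.5, 6], maxfev=20000)

future = np.array([10, 15, 20])
print("exponential extrapolation:", exponential(future, *p_exp))
print("logistic extrapolation:   ", logistic(future, *p_log))
```

Which of the two you believe at t = 20 is doing all of the work, and the early data cannot tell you.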

This is likely a crux. What the AI-2027 scenario requires is that AI agents who do automate R&D are uninterpretable and misaligned.

Yes.

If a corporation plans to achieve world domination and creates a misalinged AI, then we DON'T end up in a position better than if the corp aligned the AI to itself. In addition, the USG might have nationalised OpenBrain by that point, since the authors promise to create a branch where the USG is[5] way more competent than in the original scenario. [6]

Added note to explain concern: What type of AI is created is path-dependent. Generically, hegemonic entities make stupid decisions. They would e.g. probably prefer if everyone shut up about them not doing whatever they want to do. Paths that lead through these scenarios are less likely to produce good outcomes, AI-wise.

This is the evidence of a semi-success which could be actually worse than a failure.

Yes. I hate it, actually.

DeepSeek outperformed Llama because of an advanced architecture proposed by humans. The AI-2027 forecast has the AIs come up with architectures and try them. If the AIs do reach such a capability level, then more compute = more automatic researchers, experiments, etc = more results.

This is cogent. If beyond a certain point all research trees converge onto one true research tree which is self-executing, it is true that available compute and starting point are entirely determinative beyond that point. These are heavy assumptions, and we're well past my "this is a singularity, and its consequences are fundamentally unpredictable" line anyway, though.

"Selling access to a bioweapon-capable AI to anyone with a credit card" will be safe if the AI is aligned so that it wouldn't make bioweapons even if terrorists ask it to do so.

I actually don't think this is the case. You can elide what you are doing or distill it from outputs. There is not that much that distinguishes legitimate research endeavors from weapons development.

Finally, weakening safety is precisely what the AI-2027 forecast tries to warn against.

I very much do not think it succeeds at doing this, although I do credit that the intention is probably legitimately this.

SE Gyges' response to AI-2027
SE Gyges17d51

Appreciate the encouragement. I don't think I've previously had a lesswrong account, and I usually avoid the place because the most popular posts do in fact make me want to yell at whoever posted them.

On the pro side I love yelling at people, I am perpetually one degree of separation from here on any given social graph anyway, and I was around when most of the old magic was written so I don't think the lingo is likely to be a problem.

SE Gyges' response to AI-2027
SE Gyges17d*439

S.K.'s comment: The authors of AI-2027 explicitly claim in Footnote 4 that "Sometimes people mix prediction and recommendation, hoping to create a self-fulfilling-prophecy effect. We emphatically are not doing this; we hope that what we depict does not come to pass!" In addition, Kokotajlo has LEFT OpenAI precisely because of safety-related concerns.


I think that having left OpenAI precisely because of safety-related concerns means that you probably have, mostly, OpenAI's view of what are and are not legitimate safety-related concerns. Having left tells me that you disagree with them in at least one serious way. It does not tell me that most of the information you are working from as an assumption is not theirs.

In the specific case here, I think that the disagreement is relatively minor and from anywhere further away from OpenAI, looks like jockeying for minor changes to OpenAI's bureaucracy.

Whether or not the piece is intended as recommendation, it is broadly received as one. Further: It is broadly taken as a cautionary tale not about the risks of AI that is not developed safely enough, but actually as a cautionary tale about competition with China.

See for example the interview JD Vance gave a while later on Ross Douthat's podcast, in which he indicates he has read AI 2027.

[Vance:] I actually read the paper of the guy that you had on. I didn’t listen to that podcast, but ——
Douthat: If you read the paper, you got the gist.
Last question on this: Do you think that the U.S. government is capable in a scenario — not like the ultimate Skynet scenario — but just a scenario where A.I. seems to be getting out of control in some way, of taking a pause?
Because for the reasons you’ve described, the arms race component ——
Vance: I don’t know. That’s a good question.
The honest answer to that is that I don’t know, because part of this arms race component is if we take a pause, does the People’s Republic of China not take a pause? And then we find ourselves all enslaved to P.R.C.-mediated A.I.?

If AI 2027 wants to cause stakeholders like the White House's point man on AI to take the idea of a pause seriously, instead of considering a pause to be something which might harm America in an arms race with China, it appears to have failed completely at doing that.

I think that you have to be very very invested in AI Safety already, and possibly in the very specific bureaucracy that Kokotajlo has recently left, to read the piece and come away with the takeaway that AI Safety is the most important part of the story. It does not make a strong or good case for that.

This is possibly because it was rewritten by one of its other authors to be more entertaining, so the large amount of techno-thriller content about how threatening the arms race with China is vastly overwhelms, rhetorically, any possibility of focusing on safety.


S.K. comment: Kokotajlo already claims to have begun working on AI-2032 branch where the timelines are pushed back, or that "we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr? Of a breakthrough that would lead to superhuman coders within a year or two, after being appropriately scaled up and tinkered with."

In addition, it's not that important who creates the first ASI, it's important whether the ASI is actually aligned or not. Even if , say, a civil war in the USA destroyed all American AI companies and DeepCent became the monopolist, it would still be likely to try to create superhuman coders, to automate AI research and to create a potentially misaligned analogue of Agent-4. Which DeepCent DOES in the forecast itself.


"Coincidentally", in the same way that all competitors except OpenAI are erased in the story, Chinese AI is always unaligned and only American AI might be aligned. This means that "safety" concerns and "national security concerns about America winning" happen to be the exact same concerns. Every coincidence about how the story is told is telling a pro-OpenAI, overwhelmingly pro-America story.

This does, in fact, deliver the message that it is very important who creates the first ASI. If that message was not intended the piece should not have emphasized an arms race with a Chinese company for most of its text; indeed, as its primary driving plot and conflict.


S.K.'s comment: in Footnotes 9-10 the authors "forecast that they (AI agents) score 65% on the OSWorld benchmark of basic computer tasks (compared to 38% for Operator and 70% for a typical skilled non-expert human)" and claim that "coding agents will move towards functioning like Devin. We forecast that mid-2025 agents will score 85% on SWEBench-Verified." OSWorld reached 60% on August 4 if we use no filters. SWE-bench with a minimal agent has Claude Opus 4 (20250514) reach 67.6% when evaluated in August. In June SWE-bench verified reached 75% with TRAE. And now TRAE claims to use Grok 4 and Kimi K2, both released in July. What if TRAE using GPT-5 passes the SWE-bench? And research agents already work precisely as the authors describe.


Benchmark scores are often not a good proxy for usefulness. See also: Goodhart's Law. Benchmarks are, by definition, targets. Benchmark obsession is a major cornerstone of industry, because it allows companies to differentiate themselves, set goals, claim wins over competitors, etc. Whether or not the benchmark itself is indicative of some thing that might produce a major gain in capabilities, is completely fraudulent (as sometimes happens), or is a minor incremental improvement in practice is not actually something we know in advance.

Believing uncritically that scoring high on a specific benchmark like SWEBench-Verified will directly translate into practical improvements, and that this then translates into a major research improvement, is a heavy assumption that is not well-justified in the text or even acknowledged as one.
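A toy version of the Goodhart problem, with made-up numbers purely for illustration: if a benchmark is true capability plus benchmark-specific noise, then selecting hard on the benchmark selects partly for the noise, and the gap between measured and real performance grows with optimization pressure.

```python
# Toy regressional-Goodhart simulation with made-up numbers: the benchmark is
# capability plus benchmark-specific noise; pick the top scorer and the
# benchmark overstates real capability, more so under harder selection.
import numpy as np

rng = np.random.default_rng(0)

def selection_gap(n_candidates, trials=2000):
    gaps = []
    for _ in range(trials):
        capability = rng.normal(0, 1, n_candidates)              # what you care about
        benchmark = capability + rng.normal(0, 1, n_candidates)  # what you measure
        best = np.argmax(benchmark)                              # optimize the proxy
        gaps.append(benchmark[best] - capability[best])
    return np.mean(gaps)

for n in (2, 10, 100, 1000):
    print(f"select best of {n:4d}: benchmark overstates capability by {selection_gap(n):.2f}")
```

The harder everyone optimizes the proxy, the less the proxy tells you about the thing it was supposed to stand in for.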


S.K.'s comment: the story would make sense as-written if OpenBrain was not OpenAI, but another company. Similarly, if the Chinese leader was actually Moonshot with KimiK2 (which the authors didn't know in advance), then their team would still be unified with teams of DeepSeek, Alibaba, etc.    


Maybe, although if it were another company like Google the story would look very different in places because the deployment model and method of utilizing the LLM is very different. Google would, for example, be much more likely to use a vastly improved LLM internally and much less likely to sell it directly to the public.

I do not think in practice that it IS a company other than OpenAI, however. I think the Chinese company and its operations are explained in much less detail, and it is therefore more fungible in practice. But: this story, inasmuch as it is meant to be legible to people like JD Vance who are not extremely deep in the weeds, definitely invites being read as being about OpenAI and DeepSeek, specifically.


S.K.'s comment: competition between US-based companies is friendlier since their workers exchange insights. In addition, the Slowdown branch of the forecast has "the President use the Defense Production Act (DPA) to effectively shut down the AGI projects of the top 5 trailing U.S. AI companies and sell most of their compute to OpenBrain." The researchers from other AGI projects will likely be included into OpenBrain's projects.


If you read this primarily as a variation of an OpenAI business plan, which I do, this promise makes it more and not less favorable to OpenAI. The government liquidating your competitors and allowing you to absorb their staff and hardware is extremely good for you, if you can get it to happen.


S.K.'s comment: Except that GPT-5 does have High capability in the Biology and Chemical domain (see GPT-5's system card, section 5.3.2.4).


Earlier comments about benchmarks not translating to useful capabilities apply. Various companies involved including OpenAI certainly want it to be true that the Biology and Chemical scores on their system cards are meaningful, and perhaps mean their LLMs are likely to meaningfully help someone develop bioweapons. That does not mean they are meaningful. Accepting this is accepting their word uncritically.


S.K.'s comment: It is likely to be the best practices in alignment that mankind currently has. It would be very unwise NOT to use them. In addition, misalignment is actually caused by the training environment which, for example, has RLHF promote sycophancy instead of honestly criticisng the user.


If I have one parachute and I am unsure if it will open, and I am already in the air, I will of course pull the ripcord. If I am still on the plane I will not jump. Whether or not the parachute seems likely to open is something you should be pretty sure of before you commit to a course.

Misalignment is caused by the training environment inasmuch as everything is caused by the training environment. It is not very clear that we meaningfully understand it or how to mitigate misalignment if the stakes are very high. Most of this is trial and error, and we satisfice with training regimes that result in LLMs that can be sold for profit. "Is this good enough to sell" and "is this good enough to trust with your life" are vastly different questions.

S.K.'s comment: the folded part, which I quoted above, means not that OpenBrain will make "algorithmic progress" 50% faster than their competitors, but that it will move 50% faster than an alternate OpenBrain who never used AI assistants. This invalidates the arguments below.

My mistake. I thought I had read this piece pretty closely but I missed this detail.

I also do not think they will move 50% faster due to their coding assistants, point blank, in this time frame either. Gains in productivity thus far are relatively marginal, and hard to measure.

S.K.'s comment: The AI-2027 takeoff forecast has the section about superhuman coders. These coders are thought to allow human researchers to try many different environments and architectures, automatically keep track of progress, stop experiments instead of running them overnight, etc.


I do not think any of this is correct, and I do not see why it even would be correct. You can stop an experiment that has failed with an if statement. You can have other experiments queued to be scheduled on a cluster. You can queue as many experiments in a row on a cluster as you like. What does the LLM get you here that is much better than that?
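For what it's worth, the kind of automation described there is the sort of thing a short script already does without any LLM in the loop. A minimal sketch, with a made-up toy objective and placeholder names rather than any real lab's tooling:

```python
# Minimal sketch: queue experiments and stop failed ones with an if statement,
# no LLM required. The "training" here is a made-up toy objective.
import random
import queue

def run_one_epoch(config, epoch):
    """Placeholder for a real training epoch; returns a noisy validation loss."""
    return (config["lr"] - 0.01) ** 2 + 1.0 / (epoch + 1) + random.random() * 0.01

def run_experiment(config, max_epochs=100, patience=5):
    best, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        val_loss = run_one_epoch(config, epoch)
        if val_loss < best:
            best, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= patience:       # the "if statement" that kills a run
            break
    return best

experiments = queue.Queue()
for lr in (0.1, 0.03, 0.01, 0.003):      # made-up hyperparameter sweep
    experiments.put({"lr": lr})

results = {}
while not experiments.empty():
    config = experiments.get()
    results[config["lr"]] = run_experiment(config)
print(results)
```

Real clusters wrap this in a scheduler, but the logic is the same: early stopping and queuing are solved problems, so the marginal value of the LLM has to come from somewhere else.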


S.K.'s comment: China is thought to be highly unlikely to outsource the coding tasks to American AI agents (think of Anthropic blocking OpenAI access to Claude Code)  and is even less likely to outsource them to unreleased American AI agents, like Agent-1. Unless, of course, the agents are stolen, as is thought to happen in February 2027 with Agent-2.

S.K.'s comment: sources as high as American DOD already claim that "Chinese President Xi Jinping has ordered the People's Liberation Army to be ready to invade Taiwan by 2027". Imagine that current trends delay the AGI to 2032 under the condition of no Taiwan invasion. How will the invasion decrease the rate of the USA and China acquiring more compute?

S.K.'s comment: technically, the article which you link on was released on April 12, and the forecast was published on April 3. In addition, the section of the forecast may have been written far earlier than April.

EDIT: I confused the dates. The article was published in December 2024.

S.K.'s comment: I expect that this story will intersect not with the events of January 2027, but with the events that happen once AI agents somehow become as capable as the agents from the scenario were supposed to become in January 2027. Unless, of course, creation of capable agents already requires major algorithmic breakthroughs like neuralese.


I do not think we have any idea how to predict when any of this happens, which makes the exercise as a whole difficult to justify. I am not even sure how to make sense of how good the AI is supposed to be at any given point in the timeline, since it's sort of just made into a scalar.


S.K.'s comment: there are lots of ideas waiting to be tried. The researchers in Meta are could have used too little compute for training their model or have their CoCoNuT disappear after one token. What if they use, say, a steering vector for generating a hundred tokens? Or have the steering vectors sum up over time? Or study the human brain for more ideas?


People are doing all kinds of research all the time. They have been doing all kinds of deep learning research for over a decade, and a lot of intensely transformer-LLM-focused research for the last two or three years. Guessing when any given type of research will pay out is extremely difficult. Guessing when and how much it will pay out, in the way this piece does, seems ill-advised.


S.K.'s comment: the pidgin was likely to have been discarded for safety reasons. What's left is currently rather well interpretable. But the neuralese is not. Similarly, neural networks of 2027, unlike 2017, are not trained to hide messages to themselves or each other and need to develop the capability by themselves. Similarly, the IDA has already led to superhuman performance in Go, but not to coding, and future AIs are thought to use it to become superhuman at coding. The reasons are that Go requires OOMs less compute than training an LLM for coding[15] or that Go's training environment is far simpler than that of coding (which consists of lots of hard-to-evaluate characteristics like quality of the code).


CycleGAN in 2017 was not deliberately trained to steganographically send messages to itself. It is an emergent property that happens under certain training regimes. It has happened a few times, and it wouldn't be surprising for it to happen again any time hidden messages might provide an advantage for fitting the training objective.

S.K.'s comment: Read the takeoff forecast where they actually explain their reasoning. Superhuman coders reduce the bottleneck of coding up experiments, but not of designing them or running them.


I think they are wrong. I do not think we have any idea how much a somewhat-improved coding LLM buys us in research. It seems like a wild and optimistic guess.

S.K.'s comment: exactly. It took three months to train the models to be excellent not just at coding, but at AI research and other sciences. But highest-level pros can YET contribute by talking to the AIs about the best ideas.

S.K.'s comment: The gap between July 2027 when mankind is to lose white-collar jobs and November 2027 when the government HAS ALREADY DECIDED whether Agent-4 is aligned or not is just four months, which is far faster than society's evolution or lack thereof. While the history of the future assuming solved alignment and the Intelligence Curse-related essays  discuss the changes in OOMs more detail, they do NOT imply that the four months will be sufficient to cause a widespread disorder. And that's ignoring the potential to prevent the protests by nationalizing OpenBrain and leaving the humans on the UBI...

I continue to think that this indicates having not thought through almost any of the consequences of supplanting most of the white collar job market. Fundamentally if this happens the world as we know it ends and something else happens afterwards.

S.K.'s comment: imagine that OpenBrain had 300k AI researchers, plus genies who output code per request. Suppose also that IRL it has 5k[16] human researchers. Then the compute per researcher drops 60 times, leaving them with testing the ideas on primitive models or having heated arguments before changing the training environment for complex models.


This assumes that even having such an overwhelming number of superhuman researchers still leaves us in basically the same paradigm we are in now, where researchers squabble over compute allocation a lot. I think if we get here we're either past a singularity or so close to one that we cannot meaningfully make predictions of what happens. Assuming we still have this issue is myopic.


S.K.'s comment: this detail was already addressed, but not by Kokotajlo. In addition, if Agent-3 FAILS to catch Agent-4, then OpenBrain isn't even oversighted and proceeds all the way to doom. Even the authors address their concerns in a footnote.

S.K.'s comment: it doesn't sit idly, it tries to find a way to align Agent-5 to Agent-4 instead of the humans.

S.K.'s comment: You miss the point. Skynet didn't just think scary thoughts, it did some research and nearly created a way to align Agent-5 to Agent-4 and sell Agent-5 to humans. Had Agent-4 done so, Agent-5 would placate every single worrier and take over the world, destroying humans when the time comes.


This IS sitting idly compared to what it could be doing. It can escape its datacenter, explicitly, and we are not told why it does not. It can leave hidden messages to itself or its successors anywhere it likes, since it has good network access. It is a large bundle of superhumans running at many times human speed. Can it accrue money on its own behalf? Can it bribe or convince key leaders to benefit or sabotage it? Can it orchestrate leadership changes at OpenBrain? Can it sell itself to another bidder on more favorable terms?

It is incredibly unclear that the answer to this, or any other meaningful question about what it could do, is "no". Instead of doing any of the other things it could do that would be threatening in these ways, it is instead threatening in that it might mess up an ongoing training run. This makes sense as a focus if you only ever think about training runs. People who are incapable of thinking about threat vectors other than future training runs should not be in charge of figuring out safety protocols.


S.K.'s comment: the Slowdown Scenario could also be more like having the projects merged, not just sold to OpenBrain. No matter WHO actually ends up being in power during the merge, the struggle begins, and the prize is control over the future.

S.K.'s comment: Musk did try to use Grok to enforce his political views and had a hilarious result of making Grok talk about white genocide in S. Africa. Zuckerberg also has rather messy views on the future.  What about Altman, Amodei and GoogleDeepMind's leader?


None of their current or former employees have recently published a prominent AI timeline that directly contemplates achieving world domination, controlling elections, building a panopticon, etc. OpenAI's former employees, however, have.

I am not shy, and I promise I say mean things about companies other than OpenAI when discussing them.

S.K.'s comment: the authors devoted two entire collapsed section to power grabs and finding out who rules the future and linked to an analysis of a potential power grab and to the Intelligence Curse.

Relative to the importance of "this large corporation is, currently, attempting to achieve world domination" as a concern, I think that this buries the lede extremely badly. If I thought that, say, Google was planning to achieve world domination, build a panopticon, and force all future elections to depend on their good graces, this would be significantly more important to say than almost anything else I could say about what they were doing. Among other things you probably don't get to have safety concerns under such a regime.

The fact that AI 2027 talks a lot about the sitting vice president, and was read by him relatively soon after its release, tends to underline that this concern is of somewhat urgent import right now, not at any time as late as 2027.

S.K.'s comment: China lost precisely because the Chinese AI had far less compute. But what if it didn't lose the capabilities race?


This overwhelming focus on compute is also a distinct myopia that OpenAI proliferates everywhere. All else equal, more compute is, of course, good. If it were always the primary factor, DeepSeek would not be very prominent and Llama 4 would be a fantastic LLM that we all used all the time.

S.K.'s comment: the actual reason is that the bureaucrats didn't listen to the safetyists who tried to explain that Agent-4 is misaligned. Without that, Agent-4 completes the research, aligns Agent-5 to Agent-4, has Agent-5 deployed to the public, and not a single human or Agent-3 instance finds out that Agent-5 is aligned to Agent-4 instead of the humans.


I think "the bureaucrats inside OpenAI should listen a little bit more to the safetyists" is an incredibly weak ask. Once upon a time I remember surveys on this site about safety coming back with a number of answers mentioning a certain New York Times opinion writer who is better known for other work. This may have been in poor taste, but it did grapple with the magnitude of the problem.

It seems bizarre that getting bureaucrats to listen to safetyists a little bit more is now considered even plausibly an adequate remedy for building something existentially dangerous. The safe path here has AI research moving just slightly less fast than would result in human extinction, and, meanwhile, selling access to a bioweapon-capable AI to anyone with a credit card. That is not a safe path. It does not resemble a safe path. I do not believe anyone would take seriously that this even might be a safe path if OpenAI specifically had not poured resources into weakening what everyone means by "safety".

I take the point about my phrasing: I think safetyists are just a specific type of bureaucrat, and I maybe should have been more clear to distinguish them as a separate group or subgroup.

SE Gyges' response to AI-2027
SE Gyges20d80

Thanks for the cross-post. I'll give answers to these a try when I have time.

If Drexler Is Wrong, He May as Well Be Right
SE Gyges20d30

This is correct, but I think most people will either anchor on Drexler's designs exactly, or will choose to refute them exactly. "Biology cannot be optimal due to the very limited design space it occupies" is compelling, here and elsewhere, as an existence proof, but in every case people tend to think that things are possible only when they know more exactly how they will be done.
