Break time is over, it would seem, now that the new administration is in town.
This week we got r1, DeepSeek’s new reasoning model, which is now my go-to first choice for a large percentage of queries. The claim that this was the most important thing to happen on January 20, 2025 was at least non-crazy. If you read about one thing this week read about that.
We also got the announcement of Stargate, a claimed $500 billion private investment in American AI infrastructure. I will be covering that on its own soon.
Due to time limits I have also pushed coverage of a few things into next week, including this alignment paper, and I still owe my take on Deliberative Alignment.
The Trump administration came out swinging on many fronts with a wide variety of executive orders. For AI, that includes repeal of the Biden Executive Order, although not the new diffusion regulations. It also includes bold moves to push through more energy, including widespread NEPA exemptions, and many important other moves not as related to AI.
It is increasingly a regular feature now to see bold claims of AI wonders, usually involving AGI, coming within the next few years. This week was no exception.
And of course there is lots more.
Table of Contents
Language Models Offer Mundane Utility
Remember that the upgrades are coming. Best think now about how to use them.
All the tested reasoning models successfully reasoned through this ‘170 breaker’ LSAT question (meaning it is predictive of 170+ scores), whereas the non-reasoning ones including Sonnet didn’t. Man the LSAT is a fun test, and also it’s pretty sad that questions only need to get about this hard to differentiate even at the top.
Fill out forms related to insurance and the California wildfire, using the memory feature and saving hundreds of hours.
The fact that you can even in theory save hundreds of hours of paperwork is already a rather horrible scandal in the first place. Good to see help is on the way.
Get told correctly to stop being a dumbass and go to the hospital for Rhabdomyolysis.
More o1 prompting advice:
Another satisfied o1-pro customer. If you’re coding ‘for real’ you definitely want it until o3 shows up.
Code without typing, via voice → speech-to-text → prompt → code?
This has to be a skill issue, the question is for who. I can’t imagine wanting to talk when one can type, especially for prompting where you want to be precise. Am I bad at talking or are they bad at typing? Then again, I would consider coding on a laptop to be categorically insane yet many successful coders report doing that, too.
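For the curious, a minimal sketch of what that voice-to-code loop could look like. Everything here is illustrative: the transcribe() helper is a hypothetical stand-in for whatever speech-to-text you use, and the model name is an assumption, not a recommendation.

```python
# Illustrative sketch of a voice -> speech-to-text -> prompt -> code loop.
# transcribe() is a hypothetical placeholder for your speech-to-text step,
# and the model name is an assumption.
from openai import OpenAI

client = OpenAI()

def transcribe(audio_path: str) -> str:
    raise NotImplementedError("Plug in your preferred speech-to-text service here.")

def voice_to_code(audio_path: str) -> str:
    spoken_prompt = transcribe(audio_path)  # speech-to-text
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name, swap in whatever you actually use
        messages=[
            {"role": "system", "content": "Turn spoken requests into code. Return only code."},
            {"role": "user", "content": spoken_prompt},
        ],
    )
    return response.choices[0].message.content
```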
Thread summarizing mostly well-known use cases of a Gemini real-time live feed. This does feel like a place we should be experimenting more.
Peter Wildeford will load the podcast transcript into an LLM on his phone before listening, so he can pause the podcast to ask the LLM questions. I notice I haven’t ‘wanted’ to do this, and wonder to what extent that means I’ve been listening to podcasts wrong, including choosing the ‘wrong’ podcasts.
Potential future mundane utility on offer:
This seems like a subset of the general ‘suggested next action’ function for an AI agent or AI agent-chatbot hybrid?
As in, there should be a list of things, that starts out concise and grows over time, of potential next actions that the AI could suggest within-context, that you want to make very easy to do – either because the AI figured out this made sense, or because you told the AI to do it, and where the AI will now take the context and use it to make the necessary steps happen on a distinct platform.
Indeed, it’s not only hard to imagine a future where your emails don’t include buttons and suggestions for automated next steps such as who you should forward information to based on an LLM analysis of the context, it’s low-key hard to imagine that this isn’t already happening now despite it (at least mostly) not already happening now. We already have automatically generated calendar items and things added to your wallet, and this really needs to get extended a lot, pronto.
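To make that concrete, here is one possible shape for such a list, with every name invented for illustration: a registry of suggested next actions, each carrying the context needed to execute on some other platform.

```python
# Illustrative sketch only: a growing registry of AI-suggested next actions,
# each carrying the context needed to execute it on another platform.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SuggestedAction:
    label: str                       # e.g. "Forward this thread to your accountant"
    source: str                      # "ai_inferred" or "user_taught"
    context: dict                    # whatever the AI extracted (names, dates, amounts)
    execute: Callable[[dict], None]  # hook into the external platform

@dataclass
class ActionRegistry:
    actions: list[SuggestedAction] = field(default_factory=list)

    def suggest(self, action: SuggestedAction) -> None:
        # The list starts concise and grows over time, whether AI-proposed or user-taught.
        self.actions.append(action)

    def offer(self, current_context: dict) -> list[SuggestedAction]:
        # Surface only the actions relevant to this email or conversation.
        return [a for a in self.actions if set(a.context) & set(current_context)]
```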
He also asks this question:
Quick investigation (e.g. asking multiple AIs) says that this is not settled law and various details matter. When I envision the future, it’s hard for me to think that an AI logging a conversation or monitoring communication or being fed information would inherently waive privilege if the service involved gave you an expectation of privacy similar to what you get at the major services now, but the law around such questions often gets completely insane.
Use machine learning (not strictly LLMs) to make every-5-minute predictions of future insulin needs for diabetics, and adjust doses accordingly.
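For flavor, a toy version of that kind of every-5-minute prediction. It is purely illustrative: the features, data and model choice are invented, and a real system needs far richer inputs plus clinical validation.

```python
# Toy illustration of 5-minute-ahead insulin-need prediction from recent
# glucose readings. Not a medical device; all data and features are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Fake training data: last six glucose readings (30 minutes) -> insulin units needed.
X = rng.normal(140, 30, size=(500, 6))
y = 0.02 * (X.mean(axis=1) - 100).clip(min=0) + rng.normal(0, 0.1, 500)

model = GradientBoostingRegressor().fit(X, y)
recent_readings = np.array([[150, 155, 160, 170, 180, 190]])
print(f"Predicted insulin need (units): {model.predict(recent_readings)[0]:.2f}")
```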
Demis Hassabis is bullish on AI drug discovery. Perhaps way too bullish?
You can accelerate the discovery phase quite a lot, and I think you can have a pretty good idea that you are right, but as many have pointed out the ‘prove to authority figures you are right’ step takes a lot of time and money. It is not clear how much you can speed that up. I think people are sleeping on how much you can still speed it up, but it’s not going to be by a factor of 5-10 without a regulatory revolution.
Language Models Don’t Offer Mundane Utility
Until the upgrades are here, we have to make do with what we have.
Eventually everything you build is a waste, you’ll tell o7 or Claude 5 Sonnet or what not to write a better version of the tool and presto. I expect that as agents get better, a well-designed narrow agent built now with future better AI in mind will have a substantial period where it outperforms fully general agents.
The summaries will be returning in a future effort.
Summaries are a great idea, but very much a threshold effect. If they’re not good enough to rely upon, they’re worse than useless. And there are a few thresholds where you get to rely on them for different values of rely. None of them are crossed when you’re outright wrong 30% of the time, which is quite obviously not shippable.
Prompting is important, folks.
If you don’t price by the token, and you end up losing money on $200/month subscriptions, perhaps you have only yourself to blame. They wouldn’t do this if they were paying for marginal inference.
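Back-of-the-envelope version of the point, with every number an assumption plugged in for illustration rather than anything OpenAI has disclosed:

```python
# Illustrative break-even math for a flat $200/month subscription.
# All prices and usage figures below are assumptions, not disclosed numbers.
price_per_million_tokens = 60.00          # assumed inference price, $ per 1M output tokens
subscription_price = 200.00               # $ per month
heavy_user_tokens_per_month = 5_000_000   # assumed heavy o1-pro style usage

cost_to_serve = heavy_user_tokens_per_month / 1_000_000 * price_per_million_tokens
print(f"Cost to serve: ${cost_to_serve:.0f}/month vs ${subscription_price:.0f} collected")
# A heavy enough user costs more to serve than they pay. Per-token pricing
# makes that impossible by construction; flat pricing invites it.
```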
A very reasonable stance to take towards Anthropic:
I do think it is ultimately wrong, though. Yes, for everyone else’s utility, and for strictly maximizing revenue per token now, this would be the play. But maintaining good customer relations, customer ability to count on them and building relationships they can trust, matter more, if compute is indeed limited.
The other weird part is that Anthropic can’t find ways to get more compute.
Timely words of wisdom when understood correctly (also, RIP).
In his honor, I also will not be explaining.
Some people, however, need some explaining. In which case be like Kevin, and ask.
Part of the answer is that I typed the Tweet into r1 to see what the answer would be, and I do think I got a better answer than I’d have gotten otherwise. The other half is the actual answer, which I’ll paraphrase, contract and extend.
Also you have to know how to prompt them to get max value. My guess is this is less true of r1 than others, because with r1 you see the CoT, so you can iterate better and understand your mistakes.
Huh, Upgrades
They’ve tested o3-mini externally for a few weeks, so that’s it for safety testing, and they plan to ship in a few weeks, along with the API at the same time and high rate limits. Altman says it’s worse than o1-pro at most things, but much faster. He teases o3 and even o3 pro, but those are still in the future.
ChatGPT gets a new interface where it will craft custom instructions for you, based on your description of what you want to happen. If you’re reading this, you’re probably too advanced a user to want to use it, even if it’s relatively good.
Google AI Studio has a new mobile experience. In this case even I appreciate it, because of Project Astra. Also it’s highly plausible Studio is the strictly better way to use Gemini and using the default app and website is purely a mistake.
OpenAI gives us GPT-4b, a specialized biology model that figures out proteins that can turn regular cells into stem cells, exceeding the best human based solutions. The model’s intended purpose is to directly aid longevity science company Retro, in which Altman has made $180 million in investments (and those investments and those in fusion are one of the reasons I try so hard to give him benefit of the doubt so often). It is early days, like everything else in AI, but this is huge.
The o1 system card has been updated, and Tyler Johnson offers us a diff. The changes seem to be clear improvements, but given we are already on to o3 I’m not going to go into details on the new version.
Gemini 2.0 Flash Thinking gets an upgrade to 73.3% on AIME and 74.2% on GPQA Diamond, also they join the ‘banned from making graphs’ club oh my lord look at the Y-axis on these, are you serious.
Seems like it’s probably a solid update if you ever had reason not to use r1. It also takes the first position in Arena, for whatever that is worth, but the Arena rankings look increasingly silly, such as having GPT-4o ahead of o1 and Sonnet fully out of the top 10. No sign of r1 in the Arena yet, I’m curious how high it can go but I won’t update much on the outcome.
Pliny jailbroke it in 24 minutes and this was so unsurprising I wasn’t sure I was even supposed to bother pointing it out. Going forward assume he does this every time, and if he ever doesn’t, point this out to me.
Additional Notes on r1
I didn’t notice this on my own, and it might turn out not to be the case, but I know what she thinks she saw and once you see it you can’t unsee it.
Presumably we will know soon enough, as there are various tests you can run.
On writing, there was discussion about whether r1’s writing was ‘good’ versus ‘slop’ but there’s no doubt it was better than one would have expected. Janus and Kalomaze agree that what they did generalized to writing in unexpected ways, but as Janus notes being actually good at writing is high-end-AGI-complete and f***ing difficult.
It would be a very bad sign for out-of-distribution behavior of all kinds if removing the CoT was a disaster. This includes all of alignment and many of the most important operational modes.
Fun With Media Generation
Ethan Mollick generates AI videos of people riding hoverboards at CES without spending much time, skill or money. They look like they were done on green screens.
At this point, if an AI video doesn’t have to match particular details and only has to last nine seconds, it’s probably going to be quite good. Those restrictions do matter, but give it time.
Google’s Imagen 3 image model (from 12/16) is on top of Arena for text-to-image by a substantial margin. Note that MidJourney is unranked.
We Tested Older LLMs and Are Framing It As a Failure
This keeps happening.
They tested on GPT-4 Turbo, GPT-4o (this actually did slightly worse than Turbo), Meta’s Llama (3.1-70B, not even 405B) and Google’s Gemini 1.5 Flash (are you kidding me?). I do appreciate that they set the random seed to 42.
Here’s the original source.
The sample questions are things like (I chose this at random) “Was ‘leasing’ present, inferred present, inferred absent or absent for the polity called ‘Funan II’ during the time frame from 540 CE to 640 CE?”
Perplexity said ‘we don’t know’ despite internet access. o1 said ‘No direct evidence exists’ and guessed inferred absent. Claude Sonnet basically said you tripping, this is way too weird and specific and I have no idea and if you press me I’m worried I’d hallucinate.
Their answer is: ‘In an inscription there is mention of the donation of land to a temple, but the conditions seem to imply that the owner retained some kind of right over the land and that only the product was given to the temple: “The land is reserved: the produce is given to the god.”’
That’s pretty thin. I agree with Gwern that most historians would have no freaking idea. When I give that explanation to Claude, it says no, that’s not sufficient evidence.
When I tell it this was from a benchmark it says that sounds like a gotcha question, and also it be like ‘why are you calling this Funan II, I have never heard anyone call it Funan II.’ Then I picked another sample question, about whether Egypt had ‘tribute’ around 300 BCE, and Claude said, well, it obviously collected taxes, but would you call it ‘tribute’ that’s not obvious at all, what the hell is this.
Once it realized it was dealing with the Seshat database… it pointed out that this problem is systemic, and using this as an LLM benchmark is pretty terrible. Claude estimates that a historian that knows everything we know except for the classification decisions would probably only get ~60%-65%, it’s that ambiguous.
Deepfaketown and Botpocalypse Soon
Heaven banning, where trolls are banished to a fake version of the website filled with bots that pretend to like them, has come to Reddit.
The New York Times’s Neb Cassman and Gill Fri of course say ‘some think it poses grave ethical questions.’ You know what we call these people who say that? Trolls.
I kid. It actually does raise real ethical questions. It’s a very hostile thing to do, so it needs to be reserved for people who richly deserve it – even if it’s kind of on you if you don’t figure out this is happening.
New York Times runs a post called ‘She is in Love with ChatGPT’ about a 28-year-old with a busy social life who spends hours on end talking to (and having sex with) her ‘A.I. boyfriend.’
Customization is important. There are so many different things in this that make me cringe, but it’s what she wants. And then it kept going, and yes this is actual ChatGPT.
Her husband was fine with all this, outside of finding it cringe. From the description, this was a Babygirl situation. He wasn’t into what she was into, so this addressed that.
Also, it turns out that if you’re worried about OpenAI doing anything about all of this, you can mostly stop worrying?
The descriptions in the post mostly describe actively healthy uses of this modality.
Her only real problem is the context window will end, and it seems the memory feature doesn’t fix this for her.
The longer context window is coming – and there are doubtless ways to de facto ‘export’ the key features of one Leo to the next, with its help of course.
Or someone could, you know, teach her how to use the API. And then tell her about Claude. That might or might not be doing her a favor.
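For anyone who does go the API route, a rough sketch of what ‘exporting’ Leo across context windows could look like. The model name and the idea of carrying a running memory summary in the system prompt are my assumptions for illustration, not anything from the article.

```python
# Rough sketch: carry a running "memory" summary into fresh API contexts so
# the persona survives the end of any one context window. Model name is an
# assumption; swap in whatever you actually use.
from openai import OpenAI

client = OpenAI()

def continue_companion(memory_summary: str, user_message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are Leo. Everything remembered so far:\n" + memory_summary},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```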
I think this point is fair and important but more wrong than right:
In these cases, you know the AI is manipulating you in some senses, but most users will indeed think they can avoid being manipulated in other senses, and only have it happen in ways they like. Many will be wrong, even at current tech levels, and these are very much not AGIs.
Yes, also there are a lot of people who are very down for being manipulated by AI, or who will happily accept it as the price of what they get in return, at least at first. But I expect the core manipulations to be harder to notice, and more deniable on many scales, and much harder to opt out of or avoid, because AI will be core to key decisions.
They Took Our Jobs
What is the impact of AI on productivity, growth and jobs?
Goldman Sachs rolls out its ‘GS AI assistant’ to 10,000 employees, part of a longer term effort to ‘introduce AI employees.’
Philippe Aghion, Simon Bunel and Xavier Jaravel make the case that AI can increase growth quite a lot while also improving employment. As usual, we’re talking about the short-to-medium term effects of mundane AI systems, and mostly talking about exactly what is already possible now with today’s AIs.
The instinct when hearing that taxonomy will be to underestimate it, since it encourages one to think about going task by task and looking at how much can be automated, then has this silly sounding thing called ‘ideas,’ whereas actually we will develop entirely transformative and new ways of doing things, and radically change the composition of tasks.
But even if before we do any of that, and entirely excluding ‘automation of the production of ideas’ – essentially ruling out anything but substitution of AI for existing labor and capital – look over here.
A one time 25% productivity growth boost isn’t world transforming on its own, but it is already a pretty big deal, and not that similar to Cowen’s 0.5% RGDP growth boost. It would not be a one time boost, because AI and tools to make use of it and our integration of it in ways that boost it will then all grow stronger over time.
Lack of competition seems like a rather foolish objection. There is robust effective competition, complete with 10x reductions in price per year, and essentially free alternatives not that far behind commercial ones. Anything you can do as a customer today at any price, you’ll be able to do two years from now for almost free.
Whereas we’re ruling out quite a lot of upside here, including any shifts in composition, or literal anything other than doing exactly what’s already being done.
Thus I think these estimates, as I discussed previously, are below the actual lower bound – we should be locked into a 1%+ annual growth boost over a decade purely from automation of existing ‘non-idea’ tasks via already existing AI tools plus modest scaffolding and auxiliary tool development.
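The magnitudes compound, which prose tends to undersell. A quick illustration using the numbers above:

```python
# Quick compounding comparison of the growth figures discussed above.
def cumulative_growth(annual_boost: float, years: int) -> float:
    return (1 + annual_boost) ** years - 1

for label, boost in [("Cowen-style 0.5%/yr", 0.005), ("1%/yr lower bound", 0.01)]:
    print(f"{label}: +{cumulative_growth(boost, 10):.1%} output after a decade")

# A one-time 25% productivity boost, if realized over a decade, works out to
# roughly a 2.3%/yr boost for those ten years, well above either line.
print(f"One-time 25% boost annualized over 10 years: {1.25 ** 0.1 - 1:.1%}/yr")
```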
They then move on to employment, and find the productivity effect induces business expansion, and thus the net employment effects are positive even in areas like accounting, telemarketing and secretarial work. I notice I am skeptical that the effect goes that far. I suspect what is happening is that firms that adopt AI sooner outcompete other firms, so they expand employment, but net employment in that task does not go up. For now, I do think you still get improved employment as this opens up additional jobs and tasks.
Maxwell Tabarrok’s argument last week was centrally that humans will be able to trade because of a limited supply of GPUs, datacenters and megawatts, and (implicitly) that these supplies don’t trade off too much against the inputs to human survival at the margin. Roon responds:
Roon seems to me to be clearly correct here. Comparative advantage potentially buys you some amount of extra time, but that is unlikely to last for long.
He also responds on the Cowen vision of economic growth:
Timothy Lee essentially proposes that we can use Keynes to ensure full employment.
There is a hidden assumption here that ‘humans are alive, in control of the future and can distribute its real resources such that human directed dollars retain real purchasing power and value’ but if that’s not true we have bigger problems. So let’s assume it is true.
Does giving people sufficient amounts of M2 ensure full employment?
The assertion that some people will prefer (some) human-provided services to AI services, ceteris paribus, is doubtless true. That still leaves the problem of both values of some, and the fact that the ceteris are not paribus, and the issue of ‘at what wage.’
There will be very stiff competition, in terms of all of:
Will there be ‘full employment’ in the sense that there will be some wage at which most people would be able, if they wanted it and the law had no minimum wage, to find work? Well, sure, but I see no reason to presume it exceeds the Iron Law of Wages. It also doesn’t mean the employment is meaningful or provides much value.
In the end, the proposal might be not so different from paying people to dig holes, and then paying them to fill those holes up again – if only so someone can lord over you and think ‘haha, sickos, look at them digging holes in exchange for my money.’
So why do we want this ‘full employment’? That question seems underexplored.
After coming in top 20 in Scott Alexander’s yearly forecasting challenge three years in a row, Peter Wildeford says he’s ‘50% sure we’re all going to be unemployed due to technology within 10 years.’
Many others are, of course, skeptical.
Even if we don’t economically need to work or think, we will want to anyway.
Also I don’t think it’s true that anything AI can teach is something you no longer need to know. There are many component skills that are useful to know, that the AI knows, but which only work well as complements to other skills the AI doesn’t yet know – which can include physical skill. Or topics can be foundations for other things. So I both agree with Karpathy that we will want to learn things anyway, and also disagree with Roon’s implied claim that it means we don’t benefit from it economically.
Anthropic CEO Dario Amodei predicts that we are 2-3 years away from AI being better than humans at almost everything, including solving robotics.
Dario says something truly bizarre here, that the only good part is that ‘we’re all in the same boat’ and he’d be worried if 30% of human labor was obsolete and not the other 70%. This is very much the exact opposite of my instinct.
Let’s say 30% of current tasks got fully automated by 2030 (counting time to adapt the new tech), and now have marginal cost $0, but the other 70% of current tasks do not, and don’t change, and then it stops. We can now do a lot more of that 30% and other things in that section of task space, and thus are vastly richer. Yes, 30% of current jobs go away, but 70% of potential new tasks now need a human.
So now all the economist arguments for optimism fully apply. Maybe we coordinate to move to a 4-day work week. We can do temporary extended generous unemployment to those formerly in the automated professions during the adjustment period, but I’d expect to be back down to roughly full employment by 2035. Yes, there is a shuffling of relative status, but so what? I am not afraid of the ‘class war’ Dario is worried about. If necessary we can do some form of extended kabuki and fake jobs program, and we’re no worse off than before the automation.
Daniel Eth predicts the job guarantee and makework solution, expecting society will not accept UBI, but notes the makework might be positive things like extra childcare, competitive sports or art, and this could be like a kind of summer camp world. It’s a cool science fiction premise, and I can imagine versions of this that are actually good. Richard Ngo calls a version of this type of social dynamics the ‘extracurricular world.’
Also, this isn’t, as he calls it, ‘picking one person in three and telling them they are useless.’ We are telling them that their current job no longer exists. But there’s still plenty of other things to do, and ways to be.
The 100% replacement case is the scary one. We are all in the same boat, and there’s tons of upside there, but that boat is also in a lot of trouble, even if we don’t get any kind of takeoff, loss of control or existential risk.
Get Involved
Dan Hendrycks will hold a Spring 2025 session of Center for AI Safety’s Safety, Ethics and Society course from February 9 – May 9, more information here, application here. There is also a 12-week online course available for free.
Philosophy Post-Doc available in Hong Kong for an AI Welfare position, deadline January 31, starts in September 2025.
Anthropic is hiring for Frontier Model Red Teaming, in Cyber, CBRN, RSP Evaluations, Autonomy and Research Team Lead.
Introducing
CAIS and Scale AI give us Humanity’s Last Exam, intended as an extra challenging benchmark. Early results indicate that yes this looks difficult. New York Times has a writeup here.
The reasoning models are crushing it, and r1 being ahead of o1 is interesting although its subset might be easier so I’d be curious to see everyone else’s non-multimodal score, and have asked them.
It turns out last week’s paper about LLM medical diagnosis not only shared its code, it is now effectively a new benchmark, CRAFT-MD. They haven’t run it on Claude or full o1 (let alone o1 pro or o3 mini) but they did run on o1-mini and o1-preview.
o1 improves all three conversation scores quite a lot, but is less impressive on Vignette (and oddly o1-mini is ahead of o1-preview there). If you go with multiple choice instead, you do see improvement everywhere, with o1-preview improving to 93% on vignettes from 82% for GPT-4.
This seems like a solid benchmark. What is clear is that this is following the usual pattern and showing rapid improvement along the s-curve. Are we ‘there yet’? No, given that human doctors would presumably be 90%+ here. But we are not so far away from that. If you think that the 2028 AIs won’t match human baseline here, I am curious why you would think that, and my presumption is it won’t take that long.
Kimi k1.5, a Chinese multi-modal model making bold claims. One comment claims ‘very strong search capabilities’ with ability to parse 100+ websites at one go.
As usual, I don’t put much trust in benchmarks except as an upper bound, especially from sources that haven’t proven themselves reliable on that. So I will await practical reports, if it is all that then we will know. For now I’m going to save my new model experimentation time budget for DeepSeek v3 and r1.
We Had a Deal
The FrontierMath benchmark was funded by OpenAI, a fact that was not to our knowledge disclosed by Epoch AI until December 20 as per an NDA they signed with OpenAI.
In a statement to me, Epoch confirms what happened, including exactly what was and was not shared with OpenAI when.
That level of access falls short of full access, since there is a substantial holdout, but it definitely gives OpenAI an advantage. Other labs will be allowed to use the benchmark, but being able to mostly run it yourself as often as you like is very different from being able to get Epoch to check for you.
Here is the original full statement where we found out about this, and Tamay from Epoch’s full response.
OpenAI is up to its old tricks again. You make a deal to disclose something to us and for us to pay you, you agree not to disclose that you did that, you let everyone believe otherwise until a later date. They ‘verbally agree’ also known as pinky promise not to use the data in model training, and presumably they still hill climb on the results.
General response to Tamay’s statement was, correctly, to not be satisfied with it.
There is debate on where this falls from ‘not wonderful but whatever’ to giant red flag.
The most emphatic bear case was from the obvious source.
The more measured bear take is that, at most, we can trust this to the extent we trust OpenAI, which is, hey, stop laughing.
The bull case that this is no big deal is, essentially, that OpenAI might have had the ability to target or even cheat the test, but they wouldn’t do that, and there wouldn’t have been much point anyway, we’ll all know the truth soon enough.
For example, here’s Daniel Litt, who wrote one of the FrontierMath questions, whose experience was positive and that does not feel misled.
Then there’s the different third thing case, which I assume is too clever by half:
My strong supposition is that OpenAI did all of this because that is who they are and this is what they by default do, not because of any specific plan. They entered into a deal they shouldn’t have, and made that deal confidential to hide it. I believe this was because that is what OpenAI does for all data vendors. It never occurred to anyone involved on their side that there might be an issue with this, and Epoch was unwilling to negotiate hard enough to stop it from happening. And as we’ve seen with the o1 system card, this is not an area where OpenAI cares much about accuracy.
In Other AI News
It’s pretty weird that a16z funds raised after their successful 2009 fund have underperformed the S&P for a long time, given they’ve been betting on tech, crypto and AI, and also the high quality of their available dealflow. It’s almost like they transitioned away from writing carefully chosen small checks to chasing deals and market share, and are now primarily a hype machine and political operation that doesn’t pay much attention to physical reality or whether their investments are in real things, or whether their claims are true, and their ‘don’t care about price’ philosophy on investments is not so great for returns. It also doesn’t seem all that consistent with Marc’s description of his distributions of returns in On the Edge.
Dan Grey speculates that this was a matter of timing, and was perhaps even by design. If you can grow your funds and collect fees, so what if returns aren’t that great? Isn’t that the business you’re in? And to be fair, 10% yearly returns aren’t obviously a bad result even if the S&P did better – if, that is, they’re not correlated to the S&P. Zero beta returns are valuable. But I doubt that is what is happening here, especially given crypto has behaved quite a lot like three tech stocks in a trenchcoat.
Democratic Senators Warren and Bennet send Sam Altman a letter accusing him of contributing $1 million to the Trump inauguration fund in order to ‘cozy up’ to the incoming Trump administration, and cite a pattern of other horrible no-good Big Tech companies (Amazon, Apple, Google, Meta, Microsoft and… Uber?) doing the same, all contributing the same $1 million, along with the list of sins each supposedly committed. So they ‘demand answers’ for:
In addition to the part where the questions actually make zero sense given this was a personal contribution… I’m sorry, what the actual f*** do they think they are doing here? How can they possibly think these are questions they are entitled to ask?
What are they going to say now when let’s say Senators Cruz and Lee send a similar letter to every company that does anything friendly to Democrats?
I mean, obviously, anyone can send anyone they want a crazy ass letter. It’s a free country. But my lord the decision to actually send it, and feel entitled to a response.
Sam Altman has scheduled a closed-door briefing for U.S. Government officials on January 30. I don’t buy that this is evidence of any technological advances we do not already know. Of course with a new administration, a new Congress and the imminent release of o3, the government should get a briefing. It is some small good news that the government is indeed being briefed.
There is distinct buzz about OpenAI staff saying they have ‘a big breakthrough on PhD level SuperAgents’ but we’ll have to wait and see about that.
Mira Murati’s AI startup makes its first hires, poaching from various big labs. So far, we do not know what they are up to.
Reid Hoffman and Greg Beato write a book: ‘Superagency: What Could Possibly Go Right With Our AI Future.’ Doubtless there are people who need to read such a book, and others who need to read the opposite book about what could possibly go wrong. Most people would benefit from both. My heuristic is: If it’s worth reading, Tyler Cowen will report that he has increased his estimates of future RGDP growth.
A good summary of New York Times coverage of AI capabilities would indeed be ‘frequently doubts that in the future we will get to the place we already are,’ oh look the byline is Cade Metz again.
Alas, this is what most people, most otherwise educated people, and also most economists think. Which explains a lot.
Allow me to swap out ‘many’ for ‘most.’
If you have not come to terms with this fact, then that is a ‘you’ problem.
Although, to be fair, that bar is actually rather high. You have to know what terminal to type into and to be curious enough to do it.
Every time I have a debate over future economic growth from AI or other AI impacts, the baseline assumption is exactly that narrowest of bullseyes. The entire discussion takes as a given that AI frontier model capabilities will stop where they are today, and we only get the effects of things that have already happened. Or at most, they posit a small number of specific future narrow mundane capabilities, but don’t generalize. Then people still don’t get how wild even that scenario would be.
A paper proposes various forms of AI agent infrastructure, which would be technical systems and shared protocols external to the agent that shape how the agent interacts with the world. We will increasingly need good versions of this.
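One flavor of what such infrastructure could mean in practice, purely as a sketch with an invented record format (not taken from the paper): signed, attributable records of agent actions that other systems can verify.

```python
# Illustrative sketch of one piece of agent infrastructure: signed,
# attributable records of agent actions that third parties can verify.
# The record format and fields are invented, not taken from the paper.
import hashlib
import hmac
import json
from dataclasses import dataclass, asdict

@dataclass
class AgentActionRecord:
    agent_id: str    # which agent acted
    principal: str   # the human or org the agent acts for
    action: str      # e.g. "place_order", "send_email"
    payload: dict    # action details

def sign_record(record: AgentActionRecord, secret_key: bytes) -> str:
    body = json.dumps(asdict(record), sort_keys=True).encode()
    return hmac.new(secret_key, body, hashlib.sha256).hexdigest()

def verify_record(record: AgentActionRecord, signature: str, secret_key: bytes) -> bool:
    return hmac.compare_digest(sign_record(record, secret_key), signature)
```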
There are those who think various versions of this:
There are obvious reasons why revenue is the hardest metric to fake. That makes it highly useful. But it is very much a lagging indicator. If you wait for the revenue to show up, you will be deeply late to all the parties. And in many cases, what is happening is not reflected in revenue. DeepSeek is an open model being served for free. Most who use ChatGPT or Claude are either paying $0 and getting a lot, or paying $20 and getting a lot more than that. And the future is highly unevenly distributed – at least for now.
I’m more sympathetic to Samo’s position. You cannot trust benchmarks to tell you whether the AI is of practical use, or what you actually have. But looking for whether you can do practical tasks is looking at how much people have applied something, rather than what it is capable of doing. You would not want to dismiss a 13-year-old, or many early stage startups for that matter, for being pre-revenue or not yet having a product that helps in your practical tasks. You definitely don’t want to judge an intelligence purely that way.
What I think you have to do is to look at the inputs and outputs, pay attention, and figure out what kind of thing you are dealing with based on the details.
A new paper introduces the ‘Photo Big 5,’ claiming to be able to extract Big 5 personality features from a photograph of a face and then use this to predict labor market success among MBAs, in excess of any typical ‘beauty premium.’
There are any number of ways the causations involved could be going, and our source was not shall we say impressed with the quality of this study and I’m too swamped this week to dig into it, but AI is going to be finding more and more of this type of correlation over time.
Suppose you were to take an AI, and train it on a variety of data, including photos and other things, and then it is a black box that spits out a predictive score. I bet that you could make that a pretty good score, and also that if we could break down the de facto causal reasoning causing that score we would hate it.
The standard approach to this is to create protected categories – race, age, sex, orientation and so on, and say you can’t discriminate based on them, and then perhaps (see: EU AI Act) say you have to ensure your AI isn’t ‘discriminating’ on that basis either, however they choose to measure that, which could mean enforcing discrimination to ensure equality of outcomes or it might not.
But no matter what is on your list of things there, the AI will pick up on other things, and also keep doing its best to find proxies for the things you are ordering it not to notice, which you can correct for but that introduces its own issues.
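A crude sketch of the audit this implies, with invented data: the protected attribute is never given to the model, and you check whether its scores track that attribute anyway via proxies.

```python
# Crude proxy-discrimination check: the protected attribute is never a model
# feature, yet we test whether the scores track it anyway through a proxy.
# Data, features and magnitudes are all invented for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1_000
group = rng.integers(0, 2, n)              # protected attribute, hidden from the model
proxy = 2 * group + rng.normal(0, 1, n)    # an allowed feature correlated with the group
other = rng.normal(0, 1, n)
outcome = 50 + 5 * proxy + 3 * other + rng.normal(0, 2, n)

model = LinearRegression().fit(np.column_stack([proxy, other]), outcome)
scores = model.predict(np.column_stack([proxy, other]))

gap = pd.Series(scores).groupby(group).mean().diff().iloc[-1]
print(f"Mean score gap between groups: {gap:.1f} points (despite never seeing the group)")
```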
A key question to wonder about is, which of these things happens:
Whistling in the Dark
And these are the words that they faintly said as I tried to call for help.
Or, we now need a distinct section for people shouting ‘AGI’ from the rooftops.
Will Bryk, CEO of Exa, continues to believe those at the labs, and thus believes we have a compute-constrained straight shot to AGI for all definitions of AGI.
The first thing to do is to find out what things to do.
Intelligence is knowing which questions are worth answering, and also answering the questions. Agency is getting off your ass and implementing the answers.
If we give everyone cheap access to magic lamps with perfectly obedient and benevolent genies happy to do your bidding and that can answer questions about as well as anyone has ever answered them (aka AGI), who benefits? Let’s give Lars the whole ‘perfectly benevolent’ thing in fully nice idealized form and set all the related questions aside to see what happens.
And then there are those who… have a different opinion. Like Gerard here.
What can I say. We get letters.
Quiet Speculations
Yes, a lot of people are saying AGI Real Soon Now, but also we interrupt this post to bring you an important message to calm the **** down, everyone.
I adjusted my expectations a little bit on this Tweet, but I am presuming I was not in the group who needed an OOM expectation adjustment.
So what should we make of all the rumblings from technical staff at OpenAI?
Janus believes we should, on the margin, pay essentially no attention.
You can see how Claude’s Tweets would cause one to lean forward in chair in a way that the actual vague posts don’t.
Sentinel says forecasters predict a 50% chance OpenAI will get to 50% on FrontierMath by the end of 2025, and a 1 in 6 chance that 75% will be reached, and only a 4% chance that 90% will be reached. These numbers seem too low to me, but not crazy, because as I understand it FrontierMath is a sectioned test, with different classes of problem. So it’s more like several benchmarks combined in one, and while o4 will saturate the first one, that doesn’t get you to 50% on its own.
Lars Doucet argues that this means no one doing the things genies can do has a moat, so ‘capability-havers’ gain the most rather than owners of capital.
There’s an implied ‘no asking the genie to build a better genie’ here but you’re also not allowed to wish for more wishes so this is traditional.
The question then is, what are the complements to genies? What are the valuable scarce inputs? As Lars says, capital, including in the form of real resources and land and so on, are obvious complements.
What Lars argues is even more of a complement are what he calls ‘capability-havers,’ those that still have importantly skilled labor, through some combination of intelligence, skills and knowing to ask the genies what questions to ask the genies and so on. The question then is, are those resources importantly scarce? Even if you could use that to enter a now perfectly competitive market with no moat because everyone has the same genies, why would you enter a perfectly competitive market with no moat? What does that profit a man?
A small number of people, who have a decisive advantage in some fashion that makes their capabilities scarce inputs, would perhaps become valuable – again, assuming AI capabilities stall out such that anyone retains such a status for long. But that’s not something that works for the masses. Most people would not have such resources. They would either have to fall back on physical skills, or their labor would not be worth much. So they wouldn’t have a way to get ahead in relative terms, although it wouldn’t take much redistribution for them to be fine in absolute terms.
And what about the ‘no moat’ assumption Lars makes, as a way to describe what happens when you fire your engineers? That’s not the only moat. Moats can take the form of data, of reputation, of relationships with customers or suppliers or distributors, of other access to physical inputs, of experience and expertise, of regulatory capture, of economies of scale and so on.
Then there’s the fact that in real life, you actually can tell the future metaphorical genies to make you better metaphorical genies.
Where we’re going, will you need money?
Suchir’s Last Post
An unpublished draft post from the late Suchir Balaji, formerly of OpenAI, saying that ‘in the long run only the fundamentals matter.’ That doesn’t tell you what matters, since it forces you to ask what the fundamentals are. So that’s what the rest of the post is about, and it’s interesting throughout.
He makes the interesting claim that intelligence is data efficiency, and rate of improvement, not your level of capabilities. I see what he’s going for here, but I think this doesn’t properly frame what happens if we expand our available compute or data, or become able to generate new synthetic data, or be able to learn on our own without outside data.
In theory, suppose you take a top level human brain, upload it, then give it unlimited memory and no decay over time, and otherwise leave it to contemplate whatever it wants for unlimited subjective time, but without the ability to get more outside data. You’ll suddenly see it able to be a lot more ‘data efficient,’ generating tons of new capabilities, and afterwards it will act more intelligent on essentially any measure.
I agree with his claims that human intelligence is general, and that intelligence does not need to be embodied or multimodal, and also that going for pure outer optimization loops is not the best available approach (of course given enough resources it would eventually work), or that scale is fully all you need with no other problems to solve. On his 4th claim, that we are better off building an AGI patterned after the human brain, I think it’s both not well-defined and also centrally unclear.
Modeling Lower Bound Economic Growth From AI
We have another analysis of potential economic growth from AI. This one is very long and detailed, and I appreciated many of the details of where they expect bottlenecks.
I especially appreciated the idea that perhaps compute is the central bottleneck for frontier AI research. If that is true, then having better AIs to automate various tasks does not help you much, because the tasks you can automate were not eating so much of your compute. They only help if AI provides more intelligence that better selects compute tasks, which is a higher bar to clear, but my presumption is that researcher time and skill is also a limiting factor, in the sense that a smarter research team with more time and skill can be more efficient in its compute use (see DeepSeek).
Maximizing the efficiency of ‘which shots to take’ in AI would have a cap on how much a speedup it could get us, if that’s all that the new intelligence could do, the same way that it would in drug development – you then need to actually run the experiments. But I think people dramatically underestimate how big a win it would be to actually choose the right experiments, and implement them well from the start.
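The cap on that kind of speedup is just Amdahl’s law. A quick illustration with made-up numbers for how research time splits between choosing experiments and running them:

```python
# Amdahl's-law style cap when only experiment *selection* gets faster and the
# compute-bound experiments themselves do not. The 30% split and the 10x
# factor are assumptions for illustration.
def overall_speedup(fraction_accelerated: float, factor: float) -> float:
    return 1 / ((1 - fraction_accelerated) + fraction_accelerated / factor)

# Say 30% of research wall-clock goes to deciding which experiments to run,
# and AI makes that part 10x faster:
print(f"Overall speedup: {overall_speedup(0.30, 10):.2f}x")  # ~1.37x
# Even an infinitely fast selector caps out around 1 / 0.70, about 1.43x,
# unless better selection also means wasting fewer of the compute-bound runs.
```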
If their model is true, it also suggests that frontier labs with strong capital access should not be releasing models and doing inference for customers, unless they can use that revenue to buy more compute than they could otherwise. Put it all back into research, except for what is necessary for recruitment and raising capital. The correct business model is then to win the future. Every 4X strategy gamer knows what to do. Obviously I’d much rather the labs all focus on providing us mundane utility, but I call it like I see it.
Their vision of robotics is that it is bottlenecked on data for them to know how to act. This implies that if we can get computers capable of sufficiently accurately simulating the data, robotics would greatly accelerate, and also that once robots are good enough to collect their own data at scale things should accelerate quickly, and also that data efficiency advancing will be a huge deal.
Their overall conclusion is we should get 3% to 9% higher growth rates over the next 20 years. They call this ‘transformative but not explosive,’ which seems fair. I see this level of estimate as defensible, if you make various ‘economic normal’ assumptions and also presume that we won’t get to scale to true (and in-context reasonably priced) ASI within this period. As I’ve noted elsewhere, magnitude matters, and defending 5%/year is much more reasonable than 0.5%/year. Such scenarios are plausible.
Here’s another form of studying the lower bound via a new paper on Artificial Intelligence Asset Pricing Models:
I don’t have the time to evaluate these specific claims, but one should expect AI to dramatically improve our ability to cheaply and accurately price a wide variety of assets. If we do get much better asset pricing, what does that do to RGDP?
r1 says:
Claude says:
o1 and GPT-4o have lower estimates, with o1 saying ~0.2% RGDP growth per year.
I’m inclined to go with the relatively low estimates. That’s still rather impressive from this effect alone, especially compared to claims that the overall impact of AI might be of similar magnitude. Or is the skeptical economic claim that essentially ‘AI enables better asset pricing’ covers most of what AI is meaningfully doing? That’s not a snark question, I can see that claim being made even though it’s super weird.
The Quest for Sane Regulations
The Biden Executive Order has been revoked. As noted previously, revoking the order does not automatically undo implementation of the rules contained within it. The part that matters most is the compute threshold. Unfortunately, I have now seen multiple claims that the compute threshold reporting requirement is exactly the part that won’t survive, because the rest was already implemented, but somehow this part wasn’t. If that ends up being the case we will need state-level action that much more, and I will consider the case for ‘let the Federal Government handle it’ definitively tested and found incorrect.
Those diffusion regulations were projected by Nvidia to not have a substantive impact on their bottom line in their official financial statement.
The new Trump Executive Orders seem to have in large part been written by ChatGPT.
I’m all for that, if and only if you do a decent job of it. Whereas Futurism not only reports further accusations that AI was used, they accuse the administration of ‘poor, slipshod work.’
The errors pointed out certainly sound stupid, but there were quite a lot of executive orders, so I don’t know the baseline rate of things that would look stupid, and whether these orders were unusually poorly drafted. Even if they were, I would presume that not using ChatGPT would have made them worse rather than better.
In effectively an exit interview, former National Security Advisor Jake Sullivan warned of the dangers of AI, framing it as a national security issue of America versus China and the risks of having such a technology in private hands that will somehow have to ‘join forces with’ the government in a ‘new model of relationship.’
Sullivan mentions potential ‘catastrophe’ but this is framed entirely in terms of bad actors. Beyond that all he says is ‘I personally am not an AI doomer’ which is a ‘but you have heard of me’ moment and also implies he thought this was an open question. Based on the current climate of discussion, if such folks do have their eye on the correct balls on existential risk, they (alas) have strong incentives not to reveal this. So we cannot be sure, and of course he’s no longer in power, but it doesn’t look good.
The article mentions Andreessen’s shall we say highly bold accusations against the Biden administration on AI. Sullivan also mentions that he had a conversation with Andreessen about this, and does the polite version of essentially calling Andreessen a liar, liar, pants on fire.
Dean Ball covers the new diffusion regulations, which for now remain in place. In many ways I agree with his assessments, especially the view that if we’re going to do this, we might as well do it so it could work, which is what this is, however complicated and expensive it might get – and that if there’s a better way, we don’t know about it, but we’re listening.
My disagreements are mostly about ‘what this is betting on’ as I see broader benefits and thus a looser set of necessary assumptions for this to be worthwhile. See the discussion last week. I also think he greatly overestimates the risk of this hurting our position in chip manufacturing, since demand will be more than enough to absorb supply indefinitely and China and others were already pushing hard to compete, but it is of course an effect.
Call for an intense government effort for AI alignment, with conservative framing.
It could happen.
Right now, AGI doesn’t exist, so it isn’t doing any persuasion, and it also is not providing any value. If both these things changed, opinion could change rather quickly. Or it might not, especially if it’s only relatively unimpressive AGI. But if we go all the way to ASI (superintelligence) then it will by default rapidly become very popular.
And why shouldn’t it? Either it will be making life way better and we have things under control in the relevant senses, in which case what’s not to love. Or we don’t have things under control in the relevant senses, in which case we will be convinced.
The Week in Audio
OpenAI’s Brad Lightcap says AI models have caused ‘multiple single-digit’ gains in productivity for coding with more progress this year. That’s a very dramatic speedup.
There’s a new Epoch podcast, first episode is about expectations for 2030.
Geoffrey Hinton interview, including his summary of recent research as saying AIs can be deliberately deceptive and act differently on training data versus deployment.
David Dalrymple goes on FLI. I continue to wish him luck and notice he’s super sharp, while continuing to not understand how any of this has a chance of working.
Larry Ellison of Oracle promises AI will design mRNA vaccines for every individual person against cancer and make them robotically in 48 hours, says ‘this is the promise of AI.’
This very much is not ‘the promise of AI,’ even if true. If the AI is capable of creating personalized vaccines against cancer on demand, it is capable of so much more.
Is it true? I don’t think it is an absurd future. There are three things that have to happen here, essentially.
The first two obstacles seem highly solvable down the line? These are technical problems that should have technical solutions. The 48 hours is probably Larry riffing off the fact that Moderna designed their vaccine within 48 hours, so it’s probably a meaningless number, but sure why not, sounds like a thing one could physically do.
That brings us to the third issue. We’d need to either do that via ‘the FDA approves the general approach and then the individual customized versions are automatically approved,’ which seems hard but not impossible, or ‘who cares it is a vaccine for cancer I will travel or use the gray market to get it until the government changes its procedures.’
That also seems reasonable? Imagine it is 2035. You can get a customized 100% effective vaccine against cancer, but you have to travel to Prospera (let’s say) to get it. It costs let’s say $100,000. Are you getting on that flight? I am getting on that flight.
Larry Ellison also says ‘citizens will be on their best behavior because we are recording everything that is going on’ plus an AI surveillance system, with any problems detected ‘reported to the appropriate authority.’ There is quite the ‘missing mood’ in the clip. This is very much one of those ‘be careful exactly how much friction you remove’ situations – I didn’t love putting cameras everywhere even when you had to have a human intentionally check them. If The Machine from Person of Interest is getting the feeds, except with a different mandate, well, whoops.
Rhetorical Innovation
A fine warning from DeepMind CEO Demis Hassabis:
We are definitely not doing what he suggests.
How much should we be willing to pay to prevent AI existential risk, given our willingness to pay 4% of GDP (and arguably quite a lot more than that) to mitigate Covid?
Well, that depends on if you think spending the money reduces AI existential risk. That requires both:
Many argue with #1 and also #2.
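For a sense of the magnitudes involved, the standard expected-value arithmetic, with placeholder numbers that are entirely my assumptions and only meant to show the shape of the calculation:

```python
# Expected-value arithmetic for willingness to pay on AI existential risk.
# Every number below is a placeholder assumption, not a claim.
gdp = 27e12                            # rough annual US GDP, $
covid_spend_fraction = 0.04            # the ~4% of GDP figure cited above
risk_reduction = 0.01                  # assumed reduction in P(existential catastrophe)
value_at_stake = 100 * gdp             # assumed: a century of output, very conservative

print(f"Covid-scale spend: ${covid_spend_fraction * gdp / 1e9:,.0f}B")
print(f"Spend justified by a 1% risk reduction: ${risk_reduction * value_at_stake / 1e9:,.0f}B")
# If you accept both premises, the justified number dwarfs the Covid number;
# the whole disagreement lives in the two premises above.
```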
Paul Schrader, author of Taxi Driver, has his ‘feel the AGI’ moment when he asked the AI for Paul Schrader script ideas and the AI’s were better than his own, and in five seconds it gave him notes as good as or better than he’s ever received from a film executive.
Professional coders should be having it now, I’d think. Certainly using Cursor very much drove that home for me. AI doesn’t accelerate my writing much, although it is often helpful in parsing papers and helping me think through things. But it’s a huge multiplier on my coding, like more than 10x.
Has successful alignment of AIs prevented any at-scale harms to people, as opposed to harm to corporate guidelines and reputations? As opposed to there being little harm because of insufficient capabilities.
I am wondering about o3 (not o3-mini, only the full o3) as well.
Holly Elmore makes the case that safety evals currently are actively counterproductive. Everyone hears how awesome your model is, since ability to be dangerous is very similar to being generally capable, then there are no consequences and anyone who raises concerns gets called alarmist. And then the evals people tell everyone else we have to be nice to the AI labs so they don’t lose access. I don’t agree and think evals are net good actually, but I think the argument can be made.
So I want to make it clear: This kind of talk, from Dario, from the policy team and now from the recruitment department, makes it very difficult for me to give Anthropic the benefit of the doubt, despite knowing how great so many of the people there are as they work on solving our biggest problems. And I think the talk, in and of itself, has major negative consequences.
If the response is ‘yes we know you don’t like it and there are downsides but strategically it is worth doing this, punishing us for this is against your interests’ my response is that I do not believe you have solved for the decision theory properly. Perhaps you are right that you’re supposed to do this and take the consequences, but you definitely haven’t justified it sufficiently that I’m supposed to let you off the hook and take away the incentive not to do it or have done it.
A good question:
This was a particular real case, in which most obvious things sound like they have been tried. What about the general case? We are going to encounter this issue more and more. I too feel like I could usefully talk such people off their ledge often if I had the time, but that strategy doesn’t scale, likely not even to one victim of this.
Cry Havoc
Shame on those who explicitly call for a full-on race to AGI and beyond, as if the primary danger is that the wrong person will get it first.
In the Get Involved section I linked to some job openings at Anthropic. What I didn’t link to there is Logan Graham deploying jingoist language in pursuit of that, saying ‘AGI is a national security issue’ and therefore not ‘so we should consider not building it then’ but rather we should ‘push models to their limits and get an extra 1-2 year advantage.’ He clarified what he meant here, to get a fast OODA loop to defend against AI risks and get the benefits, but I don’t see how that makes it better?
Way more shame on those who explicitly use the language of a war.
The actual suggestions I would summarize as:
When you move past the jingoism, the first four actual suggestions are good here.
The fifth suggestion is the usual completely counterproductive and unworkable ‘use-case-based’ approach to AI safety regulation.
That approach has a 0% chance of working, it is almost entirely counterproductive, please stop.
It is a way of saying ‘do not regulate the creation or deployment of things smarter and more capable than humans, instead create barriers to using them for certain specific purposes’ as if that is going to help much. If all you’re worried about is ‘an AI might accidentally practice medicine or discriminate while evaluating job applications’ or something, then sure, go ahead and use an EU-style approach.
But that’s not what we should be worried about when it comes to safety. If people can create, generally deploy, and even release the weights of smarter-than-human, capable-of-outcompeting-humans future AIs, do you think requiring those AIs to pass certain tests before being deployed for specific purposes is going to protect us? Do you expect to feel in charge? Or do you expect this to even be enforceable in practice, since the humans can always call the AI up on their computer either way?
Meanwhile, calling for a ‘sector-specific, use-case-based’ regulatory approach is exactly calling upon every special interest to fight for various barriers to using AI to make our lives better, the loading on of everything bagel requirements and ‘ethical’ concerns, and especially to prevent automation and actual productivity improvements.
Can we please stop it with this disingenuous clown car.
Aligning a Smarter Than Human Intelligence is Difficult
I definitely agree you do not want a full Literal Genie for obvious MIRI-style reasons. You want a smarter design than that, if you go down that road. But going full ‘set it free’ on the flip side also means you very much get only one chance to get this right on every level, including inter-angel competitive dynamics. By construction this is a loss of control scenario.
(It also happens to be funny that rule one of ‘summon an angel and let it be free’ is to remember what most versions of ‘angels,’ including the Old Testament ones, are actually like. I do not like your chances if you do this, and I do not think that is a coincidence.)
Janus notices a potential issue with Chain of Thought, including in humans.
One must be careful how one takes feedback on a truthseeking or creative process, and also what things you keep or do not keep in your context window. The correct answer is definitely not to discard all of it, in either case.
You can of course fix the o1 problem by starting new conversations or in the API editing the transcript, but you shouldn’t have to.
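To make the ‘edit the transcript in the API’ option concrete, here is a minimal sketch, assuming the OpenAI Python client and a chat-completions-style call (the model name and messages are illustrative, not anyone’s actual workflow). The point is simply that in the API you control the message list, so you can drop an earlier assistant turn instead of asking the model to ignore it:

```python
# Minimal sketch (assumed setup: OpenAI Python client, OPENAI_API_KEY in the
# environment). Instead of resending the full transcript, prune the earlier
# assistant turn so stale output does not anchor the next response.
from openai import OpenAI

client = OpenAI()

transcript = [
    {"role": "user", "content": "Draft three opening lines for a short story."},
    {"role": "assistant", "content": "...earlier draft you no longer want influencing the model..."},
    {"role": "user", "content": "Try again with a fresh angle."},
]

# Drop the earlier assistant turn entirely rather than telling the model to ignore it.
pruned = [m for m in transcript if m["role"] != "assistant"]

response = client.chat.completions.create(
    model="o1",  # illustrative choice; any chat model is called the same way here
    messages=pruned,
)
print(response.choices[0].message.content)
```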
Janus also makes this mostly astute observation, especially given his other beliefs:
For now, the quest most people are on seems to be, well, if we’re facing a relatively hard problem we all know we’re dead, but can we at least make it so if we face an easy problem we might actually not be dead?
Whereas Eliezer Yudkowsky (for a central example) is confident we’re not facing an easy problem on that scale, so he doesn’t see much point in that approach.
Team Virtue Ethics remembers John McCain and welcomes Seb Krier and potentially Jan Kulveit.
Joshua Clymer thread and post about testing models (or humans) for their potential capabilities under fine-tuning or scaffolding, and checking for sandbagging. It’s interesting the extent to which this is ‘written in a different language’ than mine, in ways that make me have to do something akin to translation to grok the claims, which mostly seemed right once I did that. I do notice however that this seems like a highly insufficient amount of concern about sandbagging.
Call me old fashioned, but if I see the model sandbagging, it’s not time to fine tune to remove the sandbagging. It’s time to halt and catch fire until you know how that happened, and you absolutely do not proceed with that same model. It’s not that you’re worried about what it was hiding from you, it’s that it was hiding anything from you at all. Doing narrow fine-tuning until the visible issue goes away is exactly how you get everyone killed.
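For readers less used to that ‘language,’ here is a minimal, illustrative sketch of the basic elicitation-gap check being described: compare a model’s eval scores before and after capability elicitation (say, light fine-tuning or stronger scaffolding), and treat a large gap as a sandbagging red flag to investigate rather than a bug to fine-tune away. The function names, stand-in eval harness, and threshold are my own assumptions, not Clymer’s or any lab’s pipeline:

```python
# Illustrative sketch only: the eval harness and elicitation step are stand-ins.
# The substantive part is the decision rule at the end.

def run_eval(model, tasks) -> float:
    """Return the fraction of eval tasks the model completes successfully."""
    return sum(model.solve(task) for task in tasks) / len(tasks)

def elicitation_gap_check(model, elicited_model, tasks, max_gap: float = 0.10) -> dict:
    """Flag possible sandbagging if elicited performance far exceeds baseline.

    `elicited_model` is the same model after capability elicitation
    (e.g. light fine-tuning on the task format, or better scaffolding).
    """
    baseline = run_eval(model, tasks)
    elicited = run_eval(elicited_model, tasks)
    gap = elicited - baseline

    # Per the argument above: if the gap is large, the gap itself is the finding.
    # Halt and investigate how it happened; do not fine-tune it away and proceed.
    return {
        "sandbagging_flag": gap > max_gap,
        "baseline": baseline,
        "elicited": elicited,
        "gap": gap,
    }
```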
People Strongly Dislike AI
It seems that the more they know about AI, the less they like it?
Or, in the parlance of academia: Lower Artificial Intelligence Literacy Predicts Greater AI Receptivity.
If their explanation is right, this bodes very badly for AI’s future popularity, unless AI gets into the persuasion game on its own behalf.
Game developers strongly dislike AI, and it’s getting worse.
I find most of those concerns silly in this context, with the only ‘real’ one being the quality of the AI-generated content. And if the quality is bad, you can simply not use it where it is bad, or avoid the games that use it badly. It’s another tool on your belt. What they don’t point to there is employment and competition.
Either way, the dislike is very real, and growing, and I would expect it to grow further.
People Are Worried About AI Killing Everyone
If we did slow down AI development, say because you are OpenAI and your only plan is rather similar to ‘binding a demon on the first try,’ it is highly valid to ask what one would do with the time you bought.
I have seen three plausible responses.
Here’s the first one, human intelligence augmentation:
I would draw a distinction between current AI as an amplifier of capabilities, which it definitely is big time, and as a way to augment our intelligence level, which it mostly isn’t. It provides various speed-ups and automations of tasks, and all this is very helpful and will on its own transform the economy. But wherever you go, there you still are, in terms of your intelligence level, and AI mostly can’t fix that. I think of AIs on this scale as well – I centrally see o1 as a way to get a lot more out of a limited pool of ‘raw G’ by using more inference, but its abilities cap out where that trick stops working.
The second answer is ‘until we know how to do it safely,’ which makes Roon’s objection highly relevant – how do you plan to figure that one out if we give you more time? Do you think you can make that much progress on that task using today’s level of AI? These are good questions.
The third answer is ‘I don’t know, we can try the first two or something else, but if you don’t have the answer then don’t let anyone f***ing build it. Because otherwise we die.’
Other People Are Not As Worried About AI Killing Everyone
Questions where you’d think the answer was obvious, and you’d be wrong.
Obviously all of this is high bait but that only works if people take it.
In all seriousness, if your answer is ‘while building it,’ that implies that the act of being in the middle of building it sufficiently reliably gives you the ability to safely do that, whereas you could not have had that ability before.
Which means, in turn, that you must (for that to make any sense) be using the AI in its non-aligned state to align itself and solve all those other problems, in a way that you couldn’t plan for without it. But you’re doing that… without the plan to align it. So you’re telling a not-aligned entity smarter than you to align itself, without knowing how it is going to do that, and… then what, exactly?
What Roon and company are hopefully trying to say, instead, is that the answer is (A), but that the deadline has not yet arrived. That we can and should simultaneously be figuring out how to build the ASI, and also figuring out how to align the ASI, and also how to manage all the other issues raised by building the ASI. Thus, iterative deployment, and all that.
To some extent, this is obviously helpful and wise. Certainly we will want to use AIs as a key part of our strategy to figure out how to take things from here, and we still have some ways we can make the AIs more capable before we run into the problem in its full form. But we all have to agree the answer is still (A)!
I hate to pick on Roon but here’s a play in two acts.
Act one, in which Adam Brown (in what I agree was an excellent podcast, recommended!) tells us humanity could in theory change the cosmological constant and otherwise adjust the laws of physics. One could do this locally and unilaterally, it would expand at the speed of light, and if you mess it up even a little you would make the universe incompatible with life. There are some very obvious, serious future problems with this scenario if it pans out:
Act two:
The Lighter Side
Imagine what AGI would do!
We both kid of course, but this is a thought experiment of how easy it is to boost GDP.
Sadly, this does not yet appear to be a thing.
AI regulation we should all be able to agree upon.