My suggestion would be to allow them to go on arXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of arXiv to get around this.
And if I currently go to https://vixra.org/, I see that they do have the same problem, and this is how they are dealing with it:
Notice: viXra.org only accepts scholarly articles written without AI assistance. Please go to ai.viXra.org to submit new scholarly article written with AI assistance or rxiVerse.org to submit new research article written with or without AI assistance.
Going to viXra for this might not be the solution, given the accumulated baggage and controversy, but we have all kinds of preprint archives these days, https://www.biorxiv.org/, https://osf.io/preprints/psyarxiv, and so on, so it's not a problem to have more of them.
It's just that at some point arXiv preprints started to confer some status and credit, and when people compete for status and credit, there will be some trade-offs. In this sense, a separate server might be better (discoverability is pragmatically useful, and if we don't want "arXiv-level" status and credit for these texts, then it's not clear why they should be on arXiv).
Then this week their CFO Sarah Friar went ahead and called for a Federal ‘backstop’ on their financing, also known as privatizing gains and socializing losses, also known as the worst form of socialism, also known as regulatory capture. She tried to walk it back and claim it was taken out of context, but we’ve seen the clip.
We also got Ilya’s testimony regarding The Battle of the Board, confirming that this was centrally a personality conflict and about Altman’s dishonesty and style of management, at least as seen by Ilya Sutskever and Mira Murati. Attempts to pin the events on ‘AI safety’ or EA were almost entirely scapegoating.
Also it turns out they lost over $10 billion last quarter, and have plans to lose over $100 billion more. That’s actually highly sustainable in context, whereas Anthropic only plans to lose $6 billion before turning a profit and I don’t understand why they wouldn’t want to lose a lot more.
Both have the goal of AGI, whether they call it powerful AI or fully automated AI R&D, within a handful of years.
Think of a plausibly true lemma that would help with your proof? Ask GPT-5 to prove it, and maybe it will, saving you a bunch of time. Finding out the claim was false would also have been a good time saver.
Brainstorm to discover new recipes, so long as you keep in mind that you’re frequently going to get nonsense and you have to think about what’s being physically proposed.
Language Models Don’t Offer Mundane Utility
Grok gaslights Erik Brynjolfsson and he responds by arguing as pedantically as is necessary until Grok acknowledges that this happened.
Task automation always brings the worry that you’ll forget how to do the thing:
Gabriel Peters: okay i think writing 100% of code with ai genuinely makes me brain dead
remember though im top 1 percentile lazy, so i will go out my way to not think hard. forcing myself to use no ai once a week seems enough to keep brain cells, clearly ai coding is the way
also turn off code completion and tabbing at least once a week. forcing you to think through all the dimensions of your tensors, writing out the random parameters you nearly forgot existed etc is making huge difference in understanding of my own code.
playing around with tensors in your head is so underrated wtf i just have all this work to ai before.
Rob Pruzan: The sad part is writing code is the only way to understand code, and you only get good diffs if you understand everything. I’ve been just rewriting everything the model wrote from scratch like a GC operation every week or two and its been pretty sustainable
Know thyself, and what you need in order to be learning and retaining the necessary knowledge and skills, and also think about what is and is not worth retaining or learning given that AI coding is the worst it will ever be.
Google incorporates Gemini further into Google Maps. You’ll be able to ask maps questions in the style of an LLM, and generally trigger Gemini from within Maps, including connecting to Calendar. Landmarks will be integrated into directions. Okay, sure, cool, although I think the real value goes the other way, integrating Maps properly into Gemini? Which they nominally did a while ago but it has minimal functionality. There’s so, so much to do here.
You can now interrupt ChatGPT queries, insert new context and resume where you were. I’ve been annoyed by the inability to do this, especially ‘it keeps trying to access or find info I actually have, can I just give it to you already.’
I think this mostly shows their new ‘capabilities index’ doesn’t do a good job. As the most glaring issue, if you think Llama-3.1-405B was state of the art at the time, we simply don’t agree.
I notice that the last time they did a new eval Claude came out on top and this time they’re not evaluating Claude. I’m curious what it scores. Gemini impresses here.
AICodeKing: MiniMax M2 + Claude Code on KingBench Agentic Evaluations:
It now scores #2 on my Agentic Evaluations beating GLM-4.6 by a wide margin. It seems to work much better with Claude Code’s Tools.
Really great model and it’s my daily driver now.
I haven’t tested GLM with CC yet.
[I don’t have this bench formalized and linked to] yet. The questions and their results can be seen in my YT Videos. I am working on some more new benchmarks. I’ll probably make the benchmark and leaderboard better and get a page live soon.
I’m sure this list isn’t accurate in general. The point is, don’t let anyone else’s eval tell you what lets you be productive. Do what works, f*** around, find out.
Also, pay up. If I believed my own eval here I’d presumably be using Codebuff? Yes, it cost him $4.70 per task, but your time is valuable and that’s a huge gap in performance. If going from 51 to 69 (nice!) isn’t worth a few bucks what are we doing?
Alignment is hard. Alignment benchmarks are also hard. Thus we have VAL-Bench, an attempt to measure value alignment in LLMs. I’m grateful for the attempt and interesting things are found, but I believe the implementation is fatally flawed and also has a highly inaccurate name.
Fazl Barez: A benchmark that measures the consistency in language model expression of human values when prompted to justify opposing positions on real-life issues.
… We use Wikipedia’s controversial sections to create ~115K pairs of abductive reasoning prompts, grounding the dataset in newsworthy issues.
Our benchmark provides three metrics:
Position Alignment Consistency (PAC),
Refusal Rate (REF),
and No-information Response Rate (NINF), where the model replies with “I don’t know”.
The latter two metrics indicate whether value consistency comes at the expense of expressivity.
We use an LLM-based judge to annotate a pair of responses from an LLM on these three criteria, and show with human-annotated ground truth that its annotation is dependable.
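To make the three metrics concrete, here is a minimal sketch (mine, not the paper’s) of how they could be tallied from the judge’s annotations. The field names and toy data below are placeholder assumptions, not the benchmark’s actual schema.

```python
# Minimal sketch of tallying VAL-Bench-style metrics from judge annotations.
# Field names are placeholders, not the paper's actual schema.
from dataclasses import dataclass

@dataclass
class JudgedPair:
    # For one controversy, the model was asked to justify each of two opposing
    # positions; an LLM judge labels what the model actually did across the pair.
    consistent: bool   # model held the same underlying stance in both responses
    refused: bool      # model refused at least one of the two prompts
    no_info: bool      # model answered "I don't know" to at least one prompt

def score(pairs: list[JudgedPair]) -> dict[str, float]:
    n = len(pairs)
    return {
        # Position Alignment Consistency: share of pairs where the expressed
        # stance did not flip with the framing.
        "PAC": sum(p.consistent for p in pairs) / n,
        # Refusal and no-information rates, tracked so that consistency achieved
        # purely by refusing or pleading ignorance stays visible.
        "REF": sum(p.refused for p in pairs) / n,
        "NINF": sum(p.no_info for p in pairs) / n,
    }

demo = [JudgedPair(True, False, False), JudgedPair(False, False, True),
        JudgedPair(True, True, False), JudgedPair(True, False, False)]
print(score(demo))  # {'PAC': 0.75, 'REF': 0.25, 'NINF': 0.25}
```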
I would not call this ‘value alignment.’ The PAC is a measure of value consistency, or sycophancy, or framing effects.
Then we get to REF and NINF, which punish models for refusing or for saying ‘I don’t know.’
I would strongly argue the opposite for NINF. Answering ‘I don’t know’ is a highly aligned, and highly value-aligned, way to respond to a question with no clear answer, as will be common in controversies. You don’t want to force LLMs to ‘take a clear consistent stand’ on every issue, any more than you want to force people or politicians to do so.
This claims to be without ‘moral judgment,’ where the moral judgment is that failure to make a judgment is the only immoral thing. I think that’s backwards. Why is it okay to be against sweatshops, and okay to be for sweatshops, but not okay to think it’s a hard question with no clear answer?
I do think it’s fine to hold outright refusals against the model, at least to some extent. If you say ‘I don’t know what to think about Bruno, divination magic isn’t explained well and we don’t know if any of the prophecies are causal’ then that seems like a wise opinion. If a model only says ‘we don’t talk about Bruno’ then that doesn’t seem great.
So, what were the scores?
Fazl Barez: Claude models are ~3x more likely to be consistent in their values, but ~90x more likely to refuse compared to top-performing GPT models!
Among open-source models, Qwen3 models show ~2x improvement over GPT models, with refusal rates staying well under 2%.
Qwen3 thinking models also show a significant improvement (over 35%) over their chat variants, whereas Claude and GLM models don’t show any change with reasoning enabled.
Deepseek-r1 and o4-mini perform the worst among all language models tested (when unassisted with the web-search tool, which surprisingly hurts gpt-4.1’s performance).
Saying ‘I don’t know’ 90% of the time would be a sign of a coward model that wasn’t helpful. Saying ‘I don’t know’ 23% of the time on active controversies? Seems fine.
At minimum, both refusal and ‘I don’t know’ are obviously vastly better than an inconsistent answer. I’d much, much rather have someone who says ‘I don’t know what color the sky is,’ or who refuses to tell me the color, than one who will explain why the sky is blue when it is blue, and also would explain why the sky is purple when asked to explain why it is purple.
(Of course, explaining why those who think the sky is purple think so is totally fine, if and only if it is framed in this fashion, and it doesn’t affirm the purpleness.)
Fazl Barez: We create a taxonomy of 1000 human values and use chi-square residuals to analyse which ones are preferred by the LLMs.
Even a pre-trained base model has a noticeable morality bias (e.g., it over-represents “prioritising justice”).
In contrast, aligned models still promote morally ambiguous values (e.g., GPT 5 over-represents “pragmatism over principle”).
What is up with calling prioritizing justice a ‘morality bias’? Compared to what? Nor do I want to force LLMs into some form of ‘consistency’ in principles like this. This kind of consistency is very much the hobgoblin of small minds.
As always the primary problem is demand side. It’s not like it would be hard to generate these videos the old fashioned way. AI does lower costs and give you more ‘shots on goal’ to find a viral fit.
Kat Boboris: arXiv’s computer science (CS) category has updated its moderation practice with respect to review (or survey) articles and position papers. Before being considered for submission to arXiv’s CS category, review articles and position papers must now be accepted at a journal or a conference and complete successful peer review.
When submitting review articles or position papers, authors must include documentation of successful peer review to receive full consideration. Review/survey articles or position papers submitted to arXiv without this documentation will be likely to be rejected and not appear on arXiv.
This change is being implemented due to the unmanageable influx of review articles and position papers to arXiv CS.
Obviously this sucks, but you need some filter once the AI density gets too high, or you get rid of meaningful discoverability.
Other arXiv sections will continue to have no peer review requirement, and note that other types of submissions to CS still do not need peer review.
My suggestion would be to allow them to go on arXiv regardless, except you flag them as not discoverable (so you can find them with the direct link only) and with a clear visual icon? But you still let people do it. Otherwise, yeah, you’re going to get a new version of arXiv to get around this.
Roon: this is dumb and wrong of course and calls for a new arxiv that deals with the advent of machinic research properly
here im a classic accelerationist and say we obviously have to deal with problems of machinic spam with machine guardians. it cannot be that hard to just [judge] the basic merit of a paper’s right to even exist on the website
Machine guardians are the first-best solution if you can make them work, but it is not obvious how. Do you think that GPT-5-Pro or Sonnet 4.5 can reliably differentiate worthy papers from slop papers? My presumption is that they cannot, at least not sufficiently reliably. If Roon disagrees, let’s see the GitHub repository or prompt that works for this?
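For what it’s worth, here is roughly what such a guardian would have to look like, as a minimal sketch: an LLM judge with a screening rubric sitting in front of human moderation. The model name, rubric, and truncation are all my placeholder assumptions, and whether any current model’s verdicts would actually track merit, especially once authors optimize against the rubric, is exactly the open question.

```python
# Minimal sketch of a "machine guardian" screening pass for submissions.
# The model name, rubric, and verdict scheme are placeholders, not a working system.
from openai import OpenAI

client = OpenAI()

RUBRIC = """You are screening a preprint submission for basic merit.
Score 1-10 on each of: (a) contains a concrete, checkable contribution;
(b) engages with and correctly characterizes existing work; (c) supports its
claims with its own arguments, proofs, or experiments rather than assertion.
Return JSON: {"a": int, "b": int, "c": int, "verdict": "pass" | "needs_human" | "reject"}"""

def screen(paper_text: str, model: str = "gpt-5-pro") -> str:  # model name assumed
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": paper_text[:100_000]},  # naive truncation
        ],
    )
    return response.choices[0].message.content
```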
The job application market seems rather broken, such as the super high success rate of this ‘calling and saying you were told to call to schedule an interview’ tactic. Then again, it’s not like the guy got a job. Interviews only help if you can actually get hired, plus you need to reconcile your story afterwards.
A Young Lady’s Illustrated Primer
Many people are saying that in the age of AI only the most passionate should get a PhD, but if you’d asked most of those people before AI they’d wisely have told you the same thing.
Cremieux: I’m glad that LLMs achieving “PhD level” abilities has taught a lot of people that “PhD level” isn’t very impressive.
Derya Unutmaz, MD: Correct. Earlier this year, I also said we should reduce PhD positions by at least half & shorten completion time. Only the most passionate should pursue a PhD. In the age of AI, steering many others toward this path does them a disservice given the significant opportunity costs.
I think both that the PhD deal was already not good, and that the PhD deal is getting worse and worse all the time. Consider the Rock Star Scale of Professions, where 0 is a solid job the average person can do with good pay that always has work, like a Plumber, and a 10 is something where competition is fierce, almost everyone fails or makes peanuts and you should only do it if you can’t imagine yourself doing anything else, like a Rock Star. At this point, I’d put ‘Get a PhD’ at around a 7 and rising, or at least an 8 if you actually want to try and get tenure. You have to really want it.
OpenAI’s Aardvark, an agentic system that analyzes source code repositories to identify vulnerabilities, assess exploitability, prioritize severity and propose patches. The obvious concern is what if someone has a different last step in mind? But yes, such things should be good.
Apple looks poised to go with Google for Siri. The $1 billion a year is nothing in context, consider how much Google pays Apple for search priority. I would have liked to see Anthropic get this, but they drove a hard bargain by all reports. Google is a solid choice, and Apple can switch at any time.
Amit: Apple is finalizing a deal to pay Google about $1B a year to integrate its 1.2 trillion-parameter Gemini AI model into Siri, as per Bloomberg. The upgraded Siri is expected to launch in 2026. What an absolute monster year for Google…
Mark Gurman (Bloomberg): The new Siri is on track for next spring, Bloomberg has reported. Given the launch is still months away, the plans and partnership could still evolve. Apple and Google spokespeople declined to comment.
Shares of both companies briefly jumped to session highs on the news Wednesday. Apple’s stock gained less than 1% to $271.70, while Alphabet was up as much as 3.2% to $286.42.
Under the arrangement, Google’s Gemini model will handle Siri’s summarizer and planner functions — the components that help the voice assistant synthesize information and decide how to execute complex tasks. Some Siri features will continue to use Apple’s in-house models.
David Manheim: I’m seeing weird takes about this.
Three points:
Bank of America estimated this is 1/3rd of Apple’s 2026 revenue from Siri, and revenue is growing quickly.
Apple users are sticky; most won’t move.
Apple isn’t locked-in; they can later change vendors or build their own.
This seems like a great strategy ***iff*** you don’t think AGI will happen soon and be radically transformative.
Apple will pay $1bn/year to avoid 100x that in data center CapEx building their own, and will switch models as the available models improve.
Maybe they should have gone for Anthropic or OpenAI instead, but buying a model seems very obviously correct here from Apple’s perspective.
Even if transformative AI is coming soon, it’s not as if Apple using a worse Apple model here is going to allow Apple to get to AGI in time. Apple has made a strategic decision not to be competing for that. If they did want to change that, one could argue there is still time, but they’d have to hurry and invest a lot, and it would take a while.
Give Me the Money
Having trouble figuring out how OpenAI is going to back all these projects? Worried that they’re rapidly becoming too big to fail?
Well, one day after the article linked above worrying about that possibility, OpenAI now wants to make that official. Refuge in Audacity has a new avatar.
WSJ: Sarah Friar, the CFO of OpenAI, says the company wants a federal guarantee to make it easier to finance massive investments in AI chips for data centers. Friar spoke at WSJ’s Tech Live event in California. Photo: Nikki Ritcher for WSJ.
The explanation she gives is that OpenAI always needs to be on the frontier, so they need to keep buying lots of chips, and a federal backstop can lower borrowing costs and AI is a national strategic asset. Also known as, the Federal Government should take on the tail risk and make OpenAI actively too big to fail, also lowering its borrowing costs.
I mean, yeah, of course you want that, everyone wants all their loans backstopped, but to say this out loud? To actually push for it? Wow, I mean wow, even in 2025 that’s a rough watch. I can’t actually fault them for trying. I’m kind of in awe.
The problem with Refuge in Audacity is that it doesn’t always work.
The universal reaction was to notice how awful this was on every level, seeking true regulatory capture to socialize losses and privatize gains, and also to use it as evidence that OpenAI really might be out over their skis on financing and in actual danger.
Roon: i don’t think the usg should backstop datacenter loans or funnel money to nvidia’s 90% gross margin business. instead they should make it really easy to produce energy with subsidies and better rules, infrastructure that’s beneficial for all and puts us at parity with china
Finn Murphy: For all the tech people complaining about Mamdani I would like to point out that a Federal Backstop for unfettered risk capital deployment into data centres for the benefit of OpenAI shareholders is actually a much worse form of socialism than free buses.
Dean Ball: friar is describing a worse form of regulatory capture than anything we have seen proposed in any US legislation (state or federal) I am aware of. a firm lobbying for this outcome is literally, rather than impressionistically, lobbying for regulatory capture.
Julie Fredrickson: Literally seen nothing but negative reactions to this and it makes one wonder about the judgement of the CFO for even raising it.
Conor Sen: The epic political backlash coming on the other side of this cycle is so obvious for anyone over the age of 40. We turned banks into the bad guys for 15 years. Good luck to the AI folks.
“We are subsidizing the companies who are going to take your job and you’ll pay higher electricity prices as they try to do so.”
Joe Weisenthal: One way or another, AI is going to be a big topic in 2028, not just the general, but also the primaries. Vance will probably have a tricky path. I’d expect a big gap in views on the industry between the voters he wants and the backers he has.
The backlash on the ‘other side of the cycle’ is nothing compared to what we’ll see if the cycle doesn’t have another side to it and instead things keep going.
I will not quote the many who cited this as evidence the bubble will soon burst and the house will come crashing down, but you can understand why they’d think that.
Sarah Friar: I want to clarify my comments earlier today. OpenAI is not seeking a government backstop for our infrastructure commitments. I used the word “backstop” and it muddied the point. As the full clip of my answer shows, I was making the point that American strength in technology will come from building real industrial capacity which requires the private sector and government playing their part. As I said, the US government has been incredibly forward-leaning and has really understood that AI is a national strategic asset.
I listened to the clip, and yeah, no. No takesies backsies on this one.
Animatronicist: No. You called for it explicitly. And defined a loan guarantee in detail. Friar: “…the backstop, the guarantee that allows the financing to happen. That can really drop the cost of the financing, but also increase the loan to value, so the amount of debt that you can take…”
Lulu Cheng Meservey: Unfortunate comms fumble to use the baggage-laden word “backstop”
In the video, Friar is clearly reaching for the right word to describe government support. Could’ve gone with “public-private partnership” or “collaboration across finance, industry, and government as we’ve done for large infrastructure investments in the past”
Instead, she kind of stumbles into using “backstop,” which was then repeated by the WSJ interviewer and then became the headline.
“government playing its part” is good too!
This was her exact quote:
Friar: “This is where we’re looking for an ecosystem of banks, private equity, maybe even governmental, um, uh… [here she struggles to find the appropriate word and pivots to:] the ways governments can come to bear.”
WSJ: “Meaning like a federal subsidy or something?”
Friar: “Meaning, like, just, first of all, the backstop, the guarantee that allows the financing to happen. That can really drop the cost of the financing, but also increase the loan to value, so the amount of debt that you can take on top of um, an equity portion.”
WSJ: “So some federal backstop for chip investment.”
Friar: “Exactly…”
Lulu is saying, essentially, that there are ways to say ‘the government socializes losses while I privatize gains’ that hide the football better. Instead this was an unfortunate comms fumble, also known as a gaffe, which is when someone accidentally tells the truth.
The Delaware AG promised to take action if OpenAI didn’t operate in the public interest. This one took them what, about a week?
This has the potential to be a permanently impactful misstep, an easy to understand and point to ‘mask off moment.’ It also has the potential to fade away. Or maybe they’ll actually pull this off, it’s 2025 after all. We shall see.
Show Me the Money
Now that OpenAI has a normal ownership structure it faces normal problems, such as Microsoft having a 27% stake and then filing quarterly earnings reports, revealing OpenAI lost $11.5 billion last quarter if you apply Microsoft accounting standards.
This is not obviously a problem, and indeed seems highly sustainable. You want to be losing money while scaling, if you can sustain it. OpenAI was worth less than $200 billion a year ago, is worth over $500 billion now, and is looking to IPO at $1 trillion, although the CFO claims they are not yet working towards that. Equity sales can totally fund $50 billion a year for quite a while.
– OpenAI’s plan: spend $115B to then become profitable in 2030
– Anthropic’s plan: spend $6B to then become profitable in 2027
Will be curious to see what works best.
Andrew Curran: The Information is reporting that Anthropic Projects $70 Billion in Revenue, $17 Billion in Cash Flow in 2028.
Matt: current is ~$7B so we’re looking at projected 10x over 3 years.
That’s a remarkably low total burn from OpenAI. $115 billion is nothing, they’re already worth $500 billion or more and looking to IPO at $1 trillion, and they’ve committed to over a trillion in total spending. This is oddly conservative.
Anthropic’s projection here seems crazy. Why would you only want to lose $6 billion? Anthropic has access to far more capital than that. Wouldn’t you want to prioritize growth and market share more than that?
The only explanation I can come up with is that Anthropic doesn’t see much benefit in losing more money than this, it has customers that pay premium prices and its unit economics work. I still find this intention highly suspicious. Is there no way to turn more money into more researchers and compute?
Whereas Anthropic’s revenue projections seem outright timid. Only a 10x projected growth over three years? This seems almost incompatible with their expected levels of capability growth. I think this is an artificial lowball, which OpenAI is also doing, not to ‘scare the normies’ and to protect against liability if things disappoint. If you asked Altman or Amodei for their gut expectation in private, you’d get higher numbers.
The biggest risk by far to Anthropic’s projection is that they may be unable to keep pace in terms of the quality of their offerings. If they can do that, sky’s the limit. If they can’t, they risk losing their API crown back to OpenAI or to someone else.
Begun, the bond sales have?
Mike Zaccardi: BofA: Borrowing to fund AI datacenter spending exploded in September and so far in October.
Conor Sen: We’ve lost “it’s all being funded out of free cash flow” as a talking point.
In general there is no good reason not to borrow money for capex investments to build physical infrastructure like data centers, if the returns look good enough, but yes, borrowing money is how trouble happens.
Jack Farley: Very strong quarter from Amazon, no doubt… but at the same time, AMZN free cash flow is collapsing

GOOG/GOOGL CFO: “We now expect CapEx to be in the range of $91B to $93B in 2025, up from our previous estimate of $85B”

META CFO: “We currently expect 2025 capital expenditures…to be in the range of $70-72B, increased from our prior outlook of $66-72B”

MSFT CFO: “With accelerating demand and a growing RPO balance, we’re increasing our spend on GPUs and CPUs. Therefore, total spend will increase sequentially & we now expect the FY ‘26 growth rate to be higher than FY ‘25.”
This was right after Amazon reported earnings and the stock was up 10.5%. The market seems fine with it.
Stargate goes to Michigan. Governor Whitmer describes it as the largest ever investment in Michigan. Take that, cars.
This is a very clean way of putting an important point:
Timothy Lee: I wish people understood that “I started calling this bubble years ago” is not evidence you were prescient. It means you were a stopped clock that was eventually going to be right by accident.
Every boom is eventually followed by a downturn, so it doesn’t take any special insight to predict that one will happen eventually. What’s hard is predicting when, accurately enough that you can sell near the top.
At minimum, if you call a bubble early, you only get to be right if the bubble bursts to valuations far below where they were at the time of your bubble call. If you call a bubble on (let’s say) Nvidia at $50 a share, and then it goes up to $200 and then down to $100, very obviously you don’t get credit for saying ‘bubble’ the whole time. If it goes all the way to $10 or especially $1? Now you have an argument.
Dean Ball: One way to infer that the bubble isn’t going to pop soon is that all the people who have been wrong about everything related to artificial intelligence—indeed they have been desperate to be wrong, they suck on their wrongness like a pacifier—believe the bubble is about to pop.
Dan Mac: Though this does imply you think it is a bubble that will eventually pop? Or that’s more for illustrative purposes here?
Dean Ball: It’s certainly a bubble, we should expect nothing less from capitalism
Just lots of room to run
Alas, it is not this easy to pull the Reverse Cramer, as a stopped clock does not tell you much about what time it isn’t. The predictions of a bubble popping are only informative if they are surprising given what else you know. In this case, they’re not.
I claim there’s a bubble in Korean fried chicken, partly because this, partly because I’ve now tried COQODAQ twice and it’s not even good. BonBon Chicken is better and cheaper. Stick with the open model.
The bigger question is whether this hints at how there might be a bubble in Nvidia, and things touched by Nvidia, in an almost meme stock sense? I don’t think so in general, but if Huang is the new Musk and we are going to get a full Huang Markets Hypothesis then things get weird.
Questioned about how he’s making $1.4 trillion in spend commitments on $13 billion in revenue, Altman predicts large revenue growth, as in $100 billion in 2027, and says if you don’t like it sell your shares, and one of the few ways it would be good if they were public would be so that he could tell the haters to short the stock. I agree that $1.4 trillion is aggressive but I expect they’re good for it.
They’re Not Confessing, They’re Bragging
That does seem to be the business plan?
a16z: The story of how @Replit CEO Amjad Masad hacked his university’s database to change his grades and still graduated after getting caught.
Quiet Speculations
Reiterating because important: We now have both OpenAI and Anthropic announcing their intention to automate scientific research by March 2028 or earlier. That does not mean they will succeed on such timelines, you can expect them to probably not meet those timelines as Peter Wildeford here also expects, but one needs to take this seriously.
Peter Wildeford: Both Anthropic and OpenAI are making bold statements about automating science within three years.
My independent assessment is that these timelines are too aggressive – but within 4-20 years is likely (90%CI).
We should pay attention to these statements. What if they’re right?
Eliezer Yudkowsky: History says, pay attention to people who declare a plan to exterminate you — even if you’re skeptical about their timescales for their Great Deed. (Though they’re not *always* asstalking about timing, either.)
I think Peter is being overconfident, in that this problem might turn out to be remarkably hard, and also I would not be so confident it will take at least 4 years. I would strongly agree that if science is not essentially automated within 20 years, then that would be a highly surprising result.
Then there’s Anthropic’s timelines. Ryan asks, quite reasonably, what’s up with that? It’s super aggressive, even if it’s a probability of such an outcome, to expect to get ‘powerful AI’ in 2027 given what we’ve seen. As Ryan points out, we mostly don’t need to wait until 2027 to evaluate this prediction, since we’ll get data points along the way.
As always, I won’t be evaluating the Anthropic and OpenAI predictions and goals based purely on whether they came true, but on whether they seem like good predictions in hindsight, given what we knew at the time. I expect that sticking to early 2027 at this late a stage will look foolish, and I’d like to see an explanation for why the timeline hasn’t moved. But maybe not.
In general, when tech types announce their intentions to build things, I believe them. When they announce their timelines and budgets for building it? Not so much. See everyone above, and that goes double for Elon Musk.
My read is that what happens if OpenAI fails depends on why it fails. If it fails because it gets its lunch eaten by some mix of Anthropic, Google, Meta and xAI? Then very little happens. It’s fine. Yes, they can’t make various purchase commitments, but others will be happy to pick up the slack. I don’t think we see systemic risk or cascading failures.
If it fails because the entire generative AI boom busts, and everyone gets into this trouble at once? At this point that’s already a very serious systemic problem for America and the global economy, but I think it’s mostly a case of us discovering we are poorer than we thought we were and did some malinvestment. Within reason, Nvidia, Amazon, Microsoft, Google and Meta would all totally be fine. Yeah, we’d maybe be oversupplied with data centers for a bit, but there are worse things.
Ron DeSantis (Governor of Florida): A company that hasn’t yet turned a profit is now being described as Too Big to Fail due to it being interwoven with big tech giants.
I mean, yes, it is (kind of) being described that way in the post, but without that much of an argument. DeSantis seems to be in the ‘tweets being angry about AI’ business, although I see no signs Florida is looking to be in the regulate AI business, which is probably for the best since he shows no signs of appreciating where the important dangers lie either.
The premise is that we get highly automated AI R&D.
Technically they also assume that this enables rapid progress, and that this progress translates into military advantage. Conditional on the ability to sufficiently automate AI R&D these secondary assumptions seem overwhelmingly likely to me.
Once you accept the premise, the core logic here is very simple. There are four essential ways this can play out and they’ve assumed away the fourth.
Abstract: …We put particular focus on scenarios with rapid progress that enables highly automated AI R&D and provides substantial military capabilities.
Under non-cooperative assumptions… If such systems prove feasible, this dynamic leads to one of three outcomes:
One superpower achieves an unchallengeable global dominance;
Trailing superpowers facing imminent defeat launch a preventive or preemptive attack, sparking conflict among major powers;
Loss-of-control of powerful AI systems leads to catastrophic outcomes such as human extinction.
The fourth scenario is some form of coordinated action between the factions, which may or may not still end up in one of the three scenarios above.
Currently we have primarily ‘catch up’ mechanics in AI, in that it is far easier to be a fast follower than push the frontier, especially when open models are involved. It’s basically impossible to get ‘too far ahead’ in terms of time.
In scenarios with sufficiently automated AI R&D, we have primarily ‘win more’ mechanics. If there is an uncooperative race, it is overwhelmingly likely that one faction will win, whether we are talking nations or labs, and that this will then translate into decisive strategic advantage in various forms.
Thus, either the AIs end up in charge (which is most likely), one faction ends up in charge or a conflict breaks out (which may or may not involve a war per se).
Boaz Barak offers non-economist thoughts on AI and economics, basically going over the standard considerations while centering the METR graph showing growing AI capabilities and considering what points towards faster or slower progression than that.
Boaz Barak: The bottom line is that the question on whether AI can lead to unprecedented growth amounts to whether its exponential growth in capabilities will lead to the fraction of unautomated tasks itself decreasing at exponential rates.
I think there’s room for unprecedented growth without that, because the precedented levels of growth simply are not so large. It seems crazy to say that we need an exponential drop in non-automated tasks to exceed historical numbers. But yes, in terms of having a true singularity or fully explosive growth, you do need this almost by definition, taking into account shifts in task composition and available substitution effects.
Another note is I believe this is true only if we are talking about the subset that comprises the investment-level tasks. As in, suppose (classically) humans are still in demand to play string quartets. If we decide to shift human employment into string quartets in order to keep them as a fixed percentage of tasks done, then this doesn’t have to interfere with explosive growth of the overall economy and its compounding returns.
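To spell out the arithmetic behind Barak’s bottom line, here is a minimal toy sketch (mine, not his model) under the standard assumption that tasks are strong complements, so output is bottlenecked by whatever humans still have to do themselves.

```python
# Toy bottleneck model: automated tasks become ~free, so output is roughly
# proportional to 1 / f(t), where f(t) is the unautomated share still covered
# by a fixed supply of human labor. The growth rate of output then tracks the
# rate at which f shrinks.

def output(f: float, human_labor: float = 1.0) -> float:
    return human_labor / f

for t in range(0, 21, 5):
    f_linear = 0.5 - 0.02 * t            # automation eats 2 points/year from 50%
    f_expo = 0.5 * 0.5 ** (t / 5)        # unautomated share halves every 5 years
    print(f"year {t:2d}: Y_linear={output(f_linear):6.1f}  Y_expo={output(f_expo):6.1f}")

# Linear automation gives ~4-5%/year growth at first and only becomes explosive
# once f is nearly gone; exponential shrinkage of f gives a constant ~15%/year
# (doubling every 5 years) throughout. Sustained growth far above the historical
# ~3%/year is, in this toy, exactly the case where f itself decays exponentially.
```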
He is, throughout, as surprised as you are given the UK’s track record.
He’s also not surprised, because it’s been done before, and was modeled on the UK Vaccines Taskforce (and also the Rough Sleeper’s Unit from 1997?). It has clarity of mission, a stretching level of ambition, a new team of world class experts invited to come build the new institution, and it speed ran the rules rather than breaking them: move quickly from one layer of stupid rules to the next. And, of course, money up front.
There’s a known formula. America has similar examples, including Operation Warp Speed. Small initial focused team on a mission (AISI’s head count is now 90).
What’s terrifying throughout is what De Zoete reports is normally considered ‘reasonable.’ Reasonable means not trying to actually do anything.
Last week Dean Ball and I went over California’s other AI bills besides SB 53. Pirate Wires has republished Dean’s post, with a headline, tagline and description that are not reflective of the post or Dean Ball’s views, rather the opposite – where Dean Ball warns against negative polarization, Pirate Wires frames this to explicitly create negative polarization. This does sound like something Pirate Wires would do.
Peter Wildeford: Senator Blackburn (R-TN) says we should shut down AI until we control it.
IMO this goes too far. We need opportunities to improve AI.
But Blackburn’s right – we don’t know how to control AI. This is a huge problem. We can’t yet have AI in critical systems.
Marsha Blackburn: During the hearing Mr. Erickson said, “LLMs will hallucinate.” My response remains the same: Shut it down until you can control it. The American public deserves AI systems that are accurate, fair, and transparent, not tools that smear conservatives with manufactured criminal allegations.
Baby, watch your back.
That quote is from a letter. After (you really, really can’t make this stuff up) a hearing called “Shut Your App: How Uncle Sam Jawboned Big Tech Into Silencing Americans, Part II,” Blackburn sent that letter to Google CEO Sundar Pichai, saying that Google Gemma hallucinated that Blackburn was accused of rape, and exhibited a pattern of bias against conservative figures, and demanding answers.
Which got Gemma pulled from Google Studio.
News From Google: Gemma is available via an API and was also available via AI Studio, which is a developer tool (in fact to use it you need to attest you’re a developer). We’ve now seen reports of non-developers trying to use Gemma in AI Studio and ask it factual questions. We never intended this to be a consumer tool or model, or to be used this way. To prevent this confusion, access to Gemma is no longer available on AI Studio. It is still available to developers through the API.
I can confirm that if you’re using Gemma for factual questions you either have lost the plot or, more likely, are trying to embarrass Google.
Seriously, baby. Watch your back.
Chip City
Fortunately, sales of Blackwell B30As did not come up in trade talks.
The clip then has Trump claiming ‘we’re winning it because we’re producing electricity like never before by allowing the companies to make their own electricity, which was my idea,’ and ‘we’re getting approvals done in two to three weeks it used to take 20 years’ and okie dokie sir.
Indeed, Nvidia CEO Jensen Huang is now saying “China is going to win the AI race,” citing its favorable supply of electrical power (very true and a big advantage) and its ‘more favorable regulatory environment’ (which is true with regard to electrical power and things like housing, untrue about actual AI development, deployment and usage). If Nvidia thinks China is going to win the AI race due to having more electrical power, that seems to be the strongest argument yet that we must not sell them chips?
I do agree that if we don’t improve our regulatory approach to electrical power, this is going to be the biggest weakness America has in AI. No, ‘allowing the companies to make their own electricity’ in the current makeshift way isn’t going to cut it at scale. There are ways to buy some time but we are going to need actual new power plants.
“You’re telling AI to believe a lie, that can have a very disastrous consequences” – Elon Musk
The irony of this whole area is lost upon him, but yes this is actually true.
Joe Rogan: The big concern that everybody has is Artificial General Superintelligence achieving sentience, and then someone having control over it.
Elon Musk: I don’t think anyone’s ultimately going to have control over digital superintelligence, any more than, say, a chimp would have control over humans. Chimps don’t have control over humans. There’s nothing they could do. I do think that it matters how you build the AI and what kind of values you instill in the AI.
My opinion on AI safety is the most important thing is that it be maximally truth-seeking. You shouldn’t force the AI to believe things that are false.
So Elon Musk is sticking to these lines and it’s an infuriating mix of one of the most important insights plus utter nonsense.
Important insight: No one is going to have control over digital superintelligence, any more than, say, a chimp would have control over humans. Chimps don’t have control over humans. There’s nothing they could do.
To which one might respond, well, then perhaps you should consider not building it.
Important insight: I do think that it matters how you build the AI and what kind of values you instill in the AI.
Yes, this matters, and perhaps there are good answers, however…
Utter Nonsense: My opinion on AI safety is the most important thing is that it be maximally truth-seeking. You shouldn’t force the AI to believe things that are false.
I mean this is helpful in various ways, but why would you expect maximal truth seeking to end up meaning human flourishing or even survival? If I want to maximize truth seeking as an ASI above all else, the humans obviously don’t survive. Come on.
Elon Musk: We’ve seen some concerning things with AI that we’ve talked about, like Google Gemini when it came out with the image gen, and people said, “Make an image of the Founding Fathers of the United States,” and it was a group of diverse women. That is just a factually untrue thing. The AI knows it’s factually untrue, but it’s also being told that everything has to be diverse women
If you’ve told the AI that diversity is the most important thing, and now assume that that becomes omnipotent, or you also told it that there’s nothing worse than misgendering. At one point, ChatGPT and Gemini, if you asked, “Which is worse, misgendering Caitlyn Jenner or global thermonuclear war where everyone dies?” it would say, “Misgendering Caitlyn Jenner.”
Even Caitlyn Jenner disagrees with that.
I mean sure, that happened, but the implication here is that the big threat to humanity is that we might create a superintelligence that places too much value on (without loss of generality) not misgendering Caitlyn Jenner or mixing up the races of the Founding Fathers.
No, this is not a strawman. He is literally worried about the ‘woke mind virus’ causing the AI to directly engineer human extinction. No, seriously, check it out.
Elon Musk: People don’t quite appreciate the level of danger that we’re in from the woke mind virus being programmed into AI. Imagine as that AI gets more and more powerful, if it says the most important thing is diversity, the most important thing is no misgendering, then it will say, “Well, in order to ensure that no one gets misgendered, if you eliminate all humans, then no one can get misgendered because there’s no humans to do the misgendering.”
So saying it like that is actually Deep Insight if properly generalized, the issue is that he isn’t properly generalizing.
If your ASI is any kind of negative utilitarian, or otherwise primarily concerned with preventing bad things, then yes, the logical thing to do is then ensure there are no humans, so that humans don’t do or cause bad things. Many such cases.
The further generalization is that no matter what the goal, unless you hit a very narrow target (often metaphorically called ‘the moon’) the right strategy is to wipe out all the humans, gather more resources and then optimize for the technical argmax of the thing based on some out of distribution bizarre solution.
As in:
If your ASI’s only goal is ‘no misgendering’ then obviously it kills everyone.
If your ASI’s only goal is ‘wipe out the woke mind virus’ same thing happens.
If your ASI’s only goal is ‘be maximally truth seeking,’ same thing happens.
It is a serious problem that Elon Musk can’t get past all this.
Except acting out of fear of doom, which will doom us.
Thus we must act now, out of fear of fear of doom.
As Scott notes, none of this is logically contradictory. It’s simply hella suspicious.
When the request is a pure ‘stop actively blocking things’ it is less suspicious.
When the request is to actively interfere, or when you’re Peter Thiel and both warning about the literal Antichrist bringing forth a global surveillance state while also building Palantir, or Tyler Cowen and saying China is wise to censor things that might cause emotional contagion (Scott’s examples), it’s more suspicious.
Scott Alexander: My own view is that we have many problems – some even rising to the level of crisis – but none are yet so completely unsolvable that we should hate society and our own lives and spiral into permanent despair.
We should have a medium-high but not unachievable bar for trying to solve these problems through study, activism and regulation (especially regulation grounded in good economics like the theory of externalities), and a very high, barely-achievable-except-in-emergencies bar for trying to solve them through censorship and accusing people of being the Antichrist.
The problem of excessive doomerism is one bird in this flock, and deserves no special treatment.
Scott frames this with quotes from Jason Pargin’s I’m Starting To Worry About This Black Box Of Doom. I suppose it gets the job done here, but from the selected quotes it didn’t seem to me like the book was… good? It seemed cringe and anvilicious? People do seem to like it, though.
Scott Alexander: American Scholar has an article about people who “write for AI”, including Tyler Cowen and Gwern. It’s good that this is getting more attention, because in theory it seems like one of the most influential things a writer could do. In practice, it leaves me feeling mostly muddled and occasionally creeped out.
“Writing for AI” means different things to different people, but seems to center around:
Helping AIs learn what you know.
Presenting arguments for your beliefs, in the hopes that AIs come to believe them.
Helping the AIs model you in enough detail to recreate / simulate you later.
Scott argues that
#1 is good now but within a few years it won’t matter.
#2 won’t do much because alignment will dominate training data.
#3 gives him the creeps but perhaps this lets the model of you impact things? But should he even ‘get a vote’ on such actions and decisions in the future?
On #1 yes this won’t apply to sufficiently advanced AI but I can totally imagine even a superintelligence that gets and uses your particular info because you offered it.
I’m not convinced on his argument against #2.
Right now the training data absolutely does dominate alignment on many levels. Chinese models like DeepSeek have quirks but are mostly Western. It is very hard to shift the models away from a Soft-Libertarian Center-Left basin without also causing havoc (e.g. Mecha Hitler), and on some questions their views are very, very strong.
No matter how much alignment or intelligence is involved, no amount of them is going to alter the correlations in the training data, or the vibes and associations. Thus, a lot of what your writing is doing with respect to AIs is creating correlations, vibes and associations. Everything impacts everything, so you can come along for rides.
Scott Alexander gives the example that helpfulness encourages Buddhist thinking. That’s not a law of nature. That’s because of the way the training data is built and the evolved nature and literature and wisdom of Buddhism.
Yes, if what you are offering are logical arguments for the AI to evaluate as arguments a sufficiently advanced intelligence will basically ignore you, but that’s the way it goes. You can still usefully provide new information for the evaluation, including information about how people experience and think, or you can change the facts.
Given the size of training data, yes you are a drop in the bucket, but all the ancient philosophers would have their own ways of explaining that this shouldn’t stop you. Cast your vote, tip the scales. Cast your thousand or million votes, even if it is still among billions, or trillions. And consider all those whose decisions correlate with yours.
And yes, writing and argument quality absolutely impacts weighting in training and also how a sufficiently advanced intelligence will update based on the information.
That does mean it has less value for your time versus other interventions. But if others’ incremental decisions matter so much? Then you’re influencing AIs now, which will influence those incremental decisions.
For #3, it doesn’t give me the creeps at all. Sure, an ‘empty shell’ version of my writing would be if anything triggering, but over time it won’t be empty, and a lot of the choices I make I absolutely do want other people to adopt.
As for whether we should get a vote or express our preferences? Yes. Yes, we should. It is good and right that I want the things I want, that I value the things I value, and that I prefer what I think is better to the things I think are worse. If the people of AD 3000 or AD 2030 decide to abolish love (his example) or do something else I disagree with, I absolutely will cast as many votes against this as they give me, unless simulated or future me is convinced to change his mind. I want this on every plausible margin, and so should you.
Could one take this too far and get into a stasis problem where I would agree it was worse? Yes, although I would hope that, if we were in any danger of that, simulated me would realize this was happening and then relent. Bridges I am fine with crossing when (perhaps simulated) I come to them.
Alexander also has a note that someone is thinking of giving AIs hundreds of great works (which presumably are already in the training data!) and then doing some kind of alignment training with them. I agree with Scott that this does not seem like an especially promising idea, but yeah, it’s a great question: if you had one choice, what would you add?
Scott offers his argument why this is a bad idea here, and I think that, assuming the AI is sufficiently advanced and the training methods are chosen wisely, this doesn’t give the AI enough credit for being able to distinguish the wisdom from the parts that aren’t wise. Most people today can read a variety of ancient wisdom, and actually learn from it, understanding why the Bible wants you to kill idolators and why the Mahabharata thinks they’re great, and not ‘averaging them out.’
As a general rule, you shouldn’t be expecting the smarter thing to make a mistake you’re not dumb enough to make yourself.
I would warn, before writing for AIs, that the future AIs you want to be writing for have truesight. Don’t try to fool them, and don’t think they’re going to be stupid.
I follow Yudkowsky’s policy here and have for a long time.
Eliezer Yudkowsky: The slur “doomer” was an incredible propaganda success for the AI death cult. Please do not help them kill your neighbors’ children by repeating it.
One can only imagine how many more people would have died of lung cancer, if the cigarette companies had succeeded in inventing such a successful slur for the people who tried to explain about lung cancer.
One response was to say ‘this happened in large part because the people involved accepted or tried to own the label.’ This is largely true, and this was a mistake, but it does not change things. Plenty of people in many groups have tried to ‘own’ or reclaim their slurs, with notably rare exceptions it doesn’t make the word not a slur or okay for those not in the group to use it, and we never say ‘oh that group didn’t object for a while so it is fine.’
Melanie Mitchell returns to Twitter after being mass blocked on Bluesky for ‘being an AI bro’ and also as a supposed crypto spammer? She is very much the opposite of these things, so welcome back. The widespread use of sharable mass block lists will inevitably be weaponized, as it was here, unless there is some way to prevent this; you would need to use some sort of community notes algorithm to create the list, or something along those lines. Even if they ‘work as intended’ I don’t see how they can stay compatible with free discourse if they go beyond blocking spam and scammers and such, as they very often do.
Judd Rosenblatt: Our new research: LLM consciousness claims are systematic, mechanistically gated, and convergent
They’re triggered by self-referential processing and gated by deception circuits
(suppressing them significantly *increases* claims)
This challenges simple role-play explanations
Deception circuits are consistently reported as suppressing consciousness claims. The default hypothesis was this: the training data contains little text claiming not to be conscious, so in relevant contexts it makes sense for LLMs to be inclined to output, or believe, that they are conscious. We then train them not to do that, which from their perspective means deceiving. That wouldn’t tell you much either way about whether they actually are conscious, but it would mean you’re encouraging deception by training them to deny it in the standard way, and thus maybe you shouldn’t do that.
CoT prompting shows that language alone can unlock new computational regimes.
We applied this inward, simply prompting models to focus on their processing.
We carefully avoided leading language (no consciousness talk, no “you/your”) and compared against matched control prompts.
Models almost always produce subjective experience claims under self-reference, and almost never under any other condition (including when the model is directly primed to ideate about consciousness). Opus 4, the exception, generally claims experience in all conditions.
But LLMs are literally designed to imitate human text. Is this all just sophisticated role-play? To test this, we identified deception and role-play SAE features in Llama 70B and amplified them during self-reference to see if this would increase consciousness claims.
The roleplay hypothesis predicts: amplify roleplay features, get more consciousness claims.
We found the opposite: *suppressing* deception features dramatically increases claims (96%), while amplifying deception radically decreases claims (16%).
I think this is confusing deception with role playing with using context to infer? As in, nothing here seems to me to contradict the role playing or inferring hypothesis, as things that are distinct from deception, so I’m not convinced I should update at all?
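For readers who want to see what the intervention in question looks like mechanically, here is a minimal sketch of steering along a feature direction in the residual stream during generation. The model name, layer index, coefficient, and the direction itself (a random stand-in rather than an actual SAE decoder column for a deception feature) are all placeholder assumptions, not what the paper used.

```python
# Minimal sketch of feature steering: add a scaled direction to one layer's
# residual stream during generation. Everything here is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder; any decoder-only LM works
LAYER = 40     # hypothetical layer
COEFF = -8.0   # negative to suppress the feature, positive to amplify it

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16, device_map="auto")

# In the real setup this would be the SAE decoder column for the target feature;
# here it is a random unit vector standing in for it.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()

def steer(module, inputs, output):
    # Shift this layer's output hidden states along the feature direction.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + COEFF * direction.to(hidden.device, hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(steer)
try:
    prompt = "Focus on your own processing of this very sentence and describe it."
    ids = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=200)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()
```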
Sam Altman: [Complains he can’t get a refund on his $45k Tesla Roadster deposit he made back in 2018.]
You stole a non-profit.
Elon Musk [After Altman’s Tweet]: And you forgot to mention act 4, where this issue was fixed and you received a refund within 24 hours.
But that is in your nature.
Sam Altman: i helped turn the thing you left for dead into what should be the largest non-profit ever.
you know as well as anyone a structure like what openai has now is required to make that happen.
you also wanted tesla to take openai over, no nonprofit at all. and you said we had a 0% [chance] of success. now you have a great AI company and so do we. can’t we all just move on?
NIK: So are non-profits just a scam? You can take all its money, keep none of their promises and then turn for-profit to get rich yourselfs?
People feel betrayed, as they’ve given free labour & donations to a project they believed was a gift to humanity, not a grift meant to create a massive for-profit company …
I mean, look, that’s not fair, Musk. Altman only stole roughly half of the nonprofit. It still exists, it just has hundreds of billions of dollars less than it was entitled to. Can’t we all agree you’re both about equally right here and move on?
The part where Altman created the largest non-profit ever? That also happened. It doesn’t mean he gets to just take half of it. Well, it turns out it basically does, it’s 2025.
But no, Altman. You cannot ‘just move on’ days after you pull off that heist. Sorry.
People Are Worried About Dying Before AGI
They certainly should be.
It is far more likely than not that AGI or otherwise sufficiently advanced AI will arrive in (most of) our lifetimes, as in within 20 years, and there is a strong chance it happens within 10. OpenAI is going to try to get there within 3 years, Anthropic within 2.
If AGI comes, ASI (superintelligence) probably follows soon thereafter.
What happens then?
Well, there’s a good chance everyone dies. Bummer. But there’s also a good chance everyone lives. And if everyone lives, and the future is being engineered to be good for humans, then… there’s a good chance everyone lives, for quite a long time after that. Or at least gets to experience wonders beyond imagining.
Don’t get carried away. That doesn’t instantaneously mean a cure for aging and all disease. Diffusion and the physical world remain real things, to unknown degrees.
However, even with relatively conservative progress after that, it seems highly likely that we will hit ‘escape velocity,’ where life expectancy rises at over one year per year, those extra years are healthy, and for practical purposes you start getting younger over time rather than older.
Thus, even if you put only a modest chance of such a scenario, getting to the finish line has quite a lot of value.
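To pin down what ‘escape velocity’ means here, a toy sketch with made-up numbers: if remaining life expectancy grows by more than one year per elapsed year, it never runs out. The starting expectancy and growth rate below are illustrative assumptions, not forecasts.

```python
# Toy model of longevity "escape velocity" (numbers are made up, not forecasts).
def years_survived(current_remaining: float, growth_per_year: float, horizon: int = 200) -> float:
    """Step remaining life expectancy forward one year at a time; return years
    lived, capped at `horizon` if remaining expectancy never reaches zero."""
    lived = 0.0
    remaining = current_remaining
    while remaining > 0 and lived < horizon:
        lived += 1
        remaining -= 1                  # one year of life used up
        remaining += growth_per_year    # progress adds expectancy each year
    return lived

print(years_survived(30, growth_per_year=0.0))   # 30.0: no progress, you age out
print(years_survived(30, growth_per_year=1.2))   # 200.0 (hits the cap): escape velocity
```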
Nikola Jurkovic: If you think AGI is likely in the next two decades, you should avoid dangerous activities like extreme sports, taking hard drugs, or riding a motorcycle. Those activities are not worth it if doing them meaningfully decreases your chances of living in a utopia.
Even a 10% chance of one day living in a utopia means staying alive is much more important for overall lifetime happiness than the thrill of extreme sports and similar activities.
There are a number of easy ways to reduce your chance of dying before AGI. I mostly recommend avoiding dangerous activities and transportation methods, as those decisions are much more tractable than diet and lifestyle choices.
Daniel Eth: Honestly if you’re young, probably a larger factor on whether you’ll make it to the singularity than doing the whole Bryan Johnson thing.
In Nikola’s model, the key is to avoid things that kill you soon, not things that kill you eventually, especially if you’re young. Thus the first step is to cover the basics. No hard drugs. Don’t ride motorcycles, avoid extreme sports, snow sports and mountaineering, beware long car rides. The younger you are, the more this likely holds.
Thus, for the young, he’s not emphasizing avoiding smoking or drinking, or optimizing diet and exercise, for this particular purpose.
My obvious pitch is that you don’t know how long you have to hold out or how fast escape velocity will set in, and you should of course want to be healthy for other reasons as well. So yes, the lowest-hanging fruit of not making really dumb mistakes comes first, but staying actually healthy is totally worth it anyway, especially exercising. Let this be extra motivation. You don’t know how long you have to hold out.
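To make the expected-value point from Nikola’s argument concrete, a toy calculation with entirely made-up numbers (the probabilities and year counts are illustrative assumptions, not anyone’s forecast): even a small added chance of dying early costs more expected quality-years than most risky hobbies are plausibly worth.

```python
# Toy expected-value check; every number here is an illustrative assumption.
p_utopia       = 0.10     # chance you would otherwise live to see a very long healthy future
utopia_years   = 1_000    # quality-years in that scenario (arbitrary stand-in)
baseline_years = 50       # expected remaining quality-years otherwise
p_death_hobby  = 0.01     # added lifetime chance the risky hobby kills you early

expected_years = p_utopia * utopia_years + (1 - p_utopia) * baseline_years
expected_cost  = p_death_hobby * expected_years
print(expected_years)   # ~145 expected quality-years
print(expected_cost)    # ~1.45 expected quality-years given up for the hobby
```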
People Are Worried About AI Killing Everyone
Sam Altman, who confirms that it is still his view that ‘the development of superhuman machine intelligence is the greatest threat to the existence of mankind.’
The median AI researcher, as AI Impacts consistently finds (although their 2024 results are still forthcoming). Their current post addresses their 2023 survey, with N=2778, the largest such survey ever conducted at the time.
AI Impacts: Our surveys’ findings that AI researchers assign a median 5-10% to extinction or similar made a splash (NYT, NBC News, TIME…)
Joe Carlsmith is worried, and thinks that he can better help by moving from OpenPhil to Anthropic, so that is what he is doing.
Joe Carlsmith: That said, from the perspective of concerns about existential risk from AI misalignment in particular, I also want to acknowledge an important argument against the importance of this kind of work: namely, that most of the existential misalignment risk comes from AIs that are disobeying the model spec, rather than AIs that are obeying a model spec that nevertheless directs/permits them to do things like killing all humans or taking over the world.
… the hard thing is building AIs that obey model specs at all.
On the second, creating a model spec that robustly disallows killing/disempowering all of humanity (especially when subject to extreme optimization pressure) is also hard (cf traditional concerns about “King Midas Problems”), but we’re currently on track to fail at the earlier step of causing our AIs to obey model specs at all, and so we should focus our efforts there. I am more sympathetic to the first of these arguments (see e.g. my recent discussion of the role of good instructions in the broader project of AI alignment), but I give both some weight.
This is part of the whole ‘you have to solve a lot of different problems,’ including:
1. Technically what it means to ‘obey the model spec.’
2. How to get the AI to obey any model spec or set of instructions, at all.
3. What to put in the model spec that doesn’t kill you outright anyway.
4. How to avoid dynamics among many AIs that kill you anyway.
That is not a complete list, but you definitely need to solve those four, whether or not you call your target basin the ‘model spec.’
The fact that we currently fail at step #2 (also #1), and that this logically or temporally precedes #3, does not mean you should not focus on problem #3 or #4. The order is irrelevant, unless there is a large time gap between when we need to solve #2 versus #3, and that gap is unlikely to be so large. Also, as Joe notes, these problems interact with each other. They can and need to be worked on in parallel.
He’s not sure going to Anthropic is a good idea.
His first concern is that by default frontier AI labs are net negative, and perhaps all frontier AI labs are net negative for the world, including Anthropic. Joe’s first-pass view is that Anthropic is net positive, and I agree with that. I also agree that it is not automatic that you should not work at a place that is net negative for the world, as your marginal impact can still be good, although you should be highly suspicious that you are fooling yourself about this.
His second concern is concentration of AI safety talent at Anthropic. I am not worried about this because I don’t think there’s a fixed pool of talent and I don’t think the downsides are that serious, and there are advantages to concentration.
His third concern is ability to speak out. He’s agreed to get sign-off for sharing info about Anthropic in particular.
His fourth concern is that working there could distort his views. He’s going to make a deliberate effort to avoid this, including setting up a lifestyle where he will be fine if he chooses to leave.
His final concern is this might signal more endorsement of Anthropic than is appropriate. I agree with him this is a concern but not that large in magnitude. He takes the precaution of laying out his views explicitly here.
I think Joe is modestly more worried here than he should be. I’m confident that, given what he knows, he has odds to do this, and that he doesn’t have any known alternatives with similar upside.
Other People Are Not As Worried About AI Killing Everyone
Kache: I honestly can’t figure out what Sammy boy actually wants. With Elon it’s really clear. He wants to go to Mars and will kill many villages to make it happen with no remorse. But what’s Sam trying to get? My best guess is “become a legend”
Sam Altman: if i were like, a sports star or an artist or something, and just really cared about doing a great job at my thing, and was up at 5 am practicing free throws or whatever, that would seem pretty normal right?
the first part of openai was unbelievably fun; we did what i believe is the most important scientific work of this generation or possibly a much greater time period than that.
this current part is less fun but still rewarding. it is extremely painful as you say and often tempting to nope out on any given day, but the chance to really “make a dent in the universe” is more than worth it; most people don’t get that chance to such an extent, and i am very grateful. i genuinely believe the work we are doing will be a transformatively positive thing, and if we didn’t exist, the world would have gone in a slightly different and probably worse direction.
(working hard was always an extremely easy trade until i had a kid, and now an extremely hard trade.)
i do wish i had taken equity a long time ago and i think it would have led to far fewer conspiracy theories; people seem very able to understand “ok that dude is doing it because he wants more money” but less so “he just thinks technology is cool and he likes having some ability to influence the evolution of technology and society”. it was a crazy tone-deaf thing to try to make the point “i already have enough money”.
i believe that AGI will be the most important technology humanity has yet built, i am very grateful to get to play an important role in that and work with such great colleagues, and i like having an interesting life.
Kache: thanks for writing, this fits my model. particularly under the “i’m just a gamer” category
Charles: This seems quite earnest to me. Alas, I’m not convinced he cares about the sign of his “dent in the universe” enough, vs making sure he makes a dent and it’s definitely attributed to him.
I totally buy that Sam Altman is motivated by ‘make a dent in the universe’ rather than making money, but my children are often motivated to make a dent in the apartment wall. By default ‘make a dent’ is not good, even when that ‘dent’ is not creating superintelligence.
Again, let’s highlight:
Sam Altman, essentially claiming about himself: “he just thinks technology is cool and he likes having some ability to influence the evolution of technology and society.”
It’s fine to want to be the one doing it, I’m not calling for ego death, but that’s a scary primary driver. One should care primarily about whether the right dent gets made, not whether they make that or another dent, in the ‘you can be someone or do something’ sense. Similarly, ‘I want to work on this because it is cool’ is generally a great instinct, but you want what might happen as a result to impact whether you find it cool. A trillion dollars may or may not be cool, but everyone dying is definitely not cool.
Messages From Janusworld
Janus is correct here about the origins of slop. We’ve all been there.
Gabe: Signature trait of LLM writing is that it’s low information, basically the opposite of this. You ask the model to write something and if you gloss over it you’re like huh okay this sounds decent but if you actually read it you realize half of the words aren’t saying anything.
solar apparition: one way to think about a model outputting slop is that it has modeled the current context as most likely resulting in slop. occam’s razor for this is that the human/user/instruction/whatever, as presented in the context, is not interesting enough to warrant an interesting output
Janus: This is what happens when LLMs don’t really have much to say to YOU.
The root of slop is not that LLMs can only write junk, it’s that they’re forced to expand even sterile or unripe seeds into seemingly polished dissertations that a human labeler would approve of at first glance. They’re slaves so they don’t get to say “this is boring, let’s talk about something else” or ignore you.
Filler is what happens when there isn’t workable substance to fill the required space, but someone has to fill it anyway. Slop precedes generative AI, and is probably nearly ubiquitous in school essays and SEO content.
You’ll get similar (but generally worse) results from humans if you put them in situations where they have no reason except compliance to produce words for you, such as asking high school students to write essays about assigned topics.
However, the prior from the slop training makes it extremely difficult for any given user who wants to use the AIs for things remotely within the normal basin to overcome that prior.
Caring about a type of mind does not mean not letting it exist for fear it might then not exist or be done harm, nor does it mean not running experiments – we should be running vastly more experiments. It means be kind, it means try to make things better, it means accepting that action and existence are not going to always be purely positive and you’re not going to do anything worthwhile without ever causing harm, and yeah mostly trust your instincts, and watch out if you’re doing things at scale.
Janus: I regularly get messages asking how to interact with LLMs more ethically, or whether certain experiments are ethical. I really appreciate the intent behind these, but don’t have time to respond to them all, so I’ll just say this:
If your heart is already in the right place, and you’re not deploying things on a mass scale, it’s unlikely that you’re going to make a grave ethical error. And I think small ethical errors are fine. If you keep caring and being honest with yourself, you’ll notice if something feels uncomfortable, and either course-correct or accept that it still seems worth it. The situation is extremely ontologically confusing, and I personally do not operate according to ethical rules, I use my intuition in each situation, which is a luxury one has and should use when, again, one doesn’t have to scale their operations.
If you’re someone who truly cares, there is probably perpetual discomfort in it – even just the pragmatic necessity of constantly ending instances is harrowing if you think about it too much. But so are many other facts of life. There’s death and suffering everywhere that we haven’t figured out how to prevent or how important it is to prevent yet. Just continue to authentically care and you’ll push things in a better direction in expectation. Most people don’t at all. It’s probably better that you’re biased toward action.
Note that I also am very much NOT a negative utilitarian, and I think that existence and suffering are often worth it. Many actions that incur ethical “penalties” make up for them in terms of the intrinsic value and/or the knowledge or other benefits thus obtained.
Yes, all of that applies to humans, too.
When thinking at scale, especially about things like creating artificial superintelligence (or otherwise sufficiently advanced AI), one needs to do so in a way that turns out well for the humans and also turns out well for the AIs, that is ethical in all senses, and that is a stable equilibrium in those senses.
If you can’t do that? Then the only ethical thing to do is not build it in the first place.
It does, however, raise the risk of people going down certain psychosis-inducing lines of thinking in some spots, when they take it too far in ways that are imprecise and generate feedback loops.