I do think he takes it too far in concluding that Anthropic has ‘greatly increased’ danger; I think the sign here is at worst ambiguous.
I don't know if other people have this problem or if it's just me, but I often find myself reading someone's disagreement with what I said, thinking they're correct, and being kind of confused about why I said that in the first place. Anthropic has increased competition, but it has also done more safety research than the other frontier AI companies and done more to raise concern about dangerous behavior (e.g. it's more public about its internal evaluations), so I agree that it's difficult to say which way the sign points.
I wrote the quoted quick take pretty quickly. I notice that when I write without carefully revising (or when I'm speaking aloud), I sometimes say things that feel true based on vibes but that I don't actually agree with upon reflection. I make a point not to deliberately mislead people, but it feels difficult to never say something that I don't believe upon reflection, because doing so would require carefully thinking about every sentence before saying it. Determining whether I believe a statement requires system-2 thinking, but I write and speak using system 1.
Is Gemini 2.5 Pro really not sycophantic? Because I tend to get more positive feedback from it than any online or offline conversation with humans. (Alternatively, humans around me are too reluctant to give explicit praise?)
I think it's still sycophantic compared to hardcore STEM circles where we regard criticism as a bloodsport and failing to find fault in something as defeat. But it's much less so than the more relevant comparison, which is other LLMs, and in an absolute sense it's at a level where it's hard to distinguish from reasonable opinions and doesn't seem to be getting in the way too much. As davidad notes, it's still at a level where you can sense its reluctance or if it's shading things to be nice, and that is a level where it's just a small quirk and something you can work around easily.
To give a concrete example: I finish writing a review of a movie I watched the other day, and I hand it to Gemini-2.5-pro:
Okay, let's break down this review of The Birth of Saké.
Overall Impression:
This is a well-written, engaging, and characteristically Gwern-esque review. It has a strong, distinct voice, a clear central argument, and uses vivid descriptions and rhetorical questions effectively. It goes beyond a simple summary to offer a provocative interpretation focused on the perceived pointlessness and human cost of the tradition depicted. The comparison to Jiro Dreams of Sushi serves as a useful framing device.
Strengths:
- Clear Voice and Perspective: The review is unmistakably written by Gwern, with the analytical, slightly cynical, and utilitarian perspective evident throughout. This consistency is a strength.
- Strong Central Argument: The core thesis – questioning the value and purpose of the intense, sacrificial labor involved in traditional sake brewing, especially given its declining popularity and the reviewer's own dislike of the product – is forcefully argued and supported with evidence from the documentary (harsh conditions, worker's death, market decline).
- Engaging Writing: Phrases like "miserably picturesque," "sadistically horrible," and the questioning ("Why? Why do all this?") keep the reader engaged. The description of the working conditions is particularly effective in conveying the film's atmosphere and the reviewer's reaction.
- Effective Use of Comparison: Contrasting the film with Jiro Dreams of Sushi highlights key differences in the subject matter's popularity and the perceived passion (or lack thereof) involved, strengthening the review's central point about sake's apparent niche status and the questionable nature of the sacrifice.
- Good Structure: The review flows logically from introduction and context (how the reviewer came to watch it) through observations about cinematography and working conditions, to the core argument about purpose and value, and finally to a concluding thought.
- Acknowledging Counterarguments (Implicitly): While strongly critical, the review acknowledges the potential admiration for "the work," "craftsmanship," and "art," even as it ultimately dismisses these in favor of a more pragmatic assessment.
- Honesty: The reviewer is upfront about their lack of interest in sake, which contextualizes their perspective. The Tubi/uBlock anecdote adds a relatable, slightly amusing touch.
- Self-Aware Ending: The parenthetical ending is a clever, self-deprecating touch that fits the overall tone.
Areas for Critique / Potential Weaknesses:
- Subjectivity as a Potential Limitation: While honesty about disliking sake is good, this strong pre-existing bias heavily colors the interpretation. The review fundamentally judges the activity based on the reviewer's own lack of appreciation for the end product and a perceived lack of external validation (mass market appeal). It risks dismissing the internal motivations, cultural significance, or sense of community/identity that the participants might derive from the process, even if not explicitly articulated in a way the reviewer finds convincing.
- Dismissal of Tradition/Cultural Context: The review questions the "why" primarily through a modern, somewhat Western, utilitarian lens. While the Meiji era isn't ancient history like the Ise Shrine, 150 years is still a significant period for a family business in any culture. The review might underplay the weight of legacy, duty, local identity, and the potential intrinsic value placed on maintaining any tradition, regardless of its "rational" justification or global popularity. The focus is almost entirely on the negative aspects of the tradition (the labor conditions) without deeply exploring why it might persist beyond inertia or lack of options.
- Interpretation of Passion/Motivation: The assertion that "no one really seems to care about it" or lacks "genuine enthusiasm" could be a projection or misreading. Passion can manifest differently across cultures and personalities. Dedication, acceptance of duty, quiet pride in craft, or commitment to community might be present but not expressed as overt "otaku"-level enthusiasm. The heir's marketing efforts, even if seen as necessary for survival, could also stem from a deep sense of responsibility and, yes, a form of passion for preserving the family legacy.
- The "Pointlessness" Argument: While powerfully made, the argument that the work is pointless because the product isn't widely loved or "objectively" great (according to the reviewer) is a very specific philosophical stance. Many crafts, arts, and traditions exist without mass appeal or clear economic justification. Is the value solely determined by market size or the reviewer's palate? The review doesn't leave much room for alternative value systems.
- Slight Overemphasis on Negativity?: While the film apparently provides ample evidence for the harshness and decline, the review focuses so heavily on this that it might overshadow other potential aspects or nuances the documentary presented (e.g., moments of camaraderie, pride in specific steps, the beauty of the process itself beyond just "miserably picturesque").
Conclusion:
This is a compelling and thought-provoking review that successfully articulates a strong, critical perspective on The Birth of Saké. It excels in its clear voice, structured argument, and engaging writing. Its main potential weakness lies in the heavy filter of the reviewer's personal taste and utilitarian viewpoint, which may lead to a dismissal of cultural nuances, alternative motivations, or the intrinsic value systems that might underpin the traditions depicted. It's a review that tells you as much about the reviewer's worldview as it does about the film, which, for this particular author, is often part of the appeal. It definitely makes the reader think, even if they might disagree with the core conclusion about the "pointlessness" of the endeavor.
Particularly characteristic here is
The review is unmistakably written by Gwern, with the analytical, slightly cynical, and utilitarian perspective evident throughout. This consistency is a strength.
I've started to wince a bit when Gemini-2.5-pro critiques a draft and throws in a descriptive note along the lines of 'classic Gwern style: just slightly cynical'. It feels too much like it's trying to suck up to me. ('Wow, Gwern, you're so edgy, but in a good way!') And yet... it's not wrong, is it? I do try to avoid being overly cynical or dark, but you certainly couldn't describe my writing as 'slightly idealistic' either; so, 'slightly cynical'.
And otherwise, the description and critique seem largely correct and don't really mince words in the criticisms - even if I didn't wind up making many changes based on it. So it's more useful than Claude or 4o would be. (o3 is also good in this regard but the confabulation issue is a big problem, so I just recommend 2.5-pro right now.)
I should have given some examples of my own. Here's Gemini on a story idea of mine, for the Star Wars universe ("I wish there was a story about a power-hungry villain who takes precautions against becoming irrational after gaining power. You'd think that at least some would learn from history. [...] The villain could anticipate that even his best efforts might fail, and create a mechanism to revive copies of himself from time to time, who would study his own past failures, rise to power again, and try to do better each time. [...] Sometimes the villain becomes the hero and the reader roots for him to succeed in stabilizing galactic society long term, but he fails despite his best efforts.")
That's a fantastic twist and adds immense depth and pathos to the entire concept! Having a cycle where the villain, armed with the knowledge of past tyrannical failures, genuinely attempts a more stable, perhaps even seemingly benevolent, form of authoritarianism – only to fail despite their best efforts – is incredibly compelling.
Its assessment of this comment:
Yes, I think that reply is a very good and concise explanation of one key reason why wages might fall or stagnate despite rising productivity, viewed through a neoclassical lens. It effectively captures the concept of diminishing marginal value within the firm, even without external market price changes or new entrants.
Its assessment of this comment:
This version is slightly stronger due to the increased specificity in the final sentence. It clearly articulates the key components of the argument and the commenter's stance relative to the author's potential views. It's an excellent comment and ready to post.
Each of those comments did get upvoted to >10, so maybe Gemini is not too far off the mark, and I'm just not used to seeing "fantastic", "very good", "excellent" said to me explicitly?
He makes some bizarre statements, such as that if you have a rare gene, that might protect you from the AI having enough data to have ‘a good read’ on you, and that genetic variation will ‘protect you from high predictability.’
You know, even if this were true... if you're a less predictable entity in a world where a sufficiently powerful AI wants to increase predictability, there are many simple and obvious classes of interventions that reliably achieve that. Mostly, those interventions look nothing like freedom, and you're not going to like them.
I also notice that I am deeply confused about consciousness on essentially every level.
https://proteanbazaar.substack.com/p/consciousness-actually-explained (direct link; there's a lw copy too)
Table of Contents
Language Models Offer Mundane Utility
Remember to ask Claude Code for the ‘ultrathink’ package so it uses the maximum number of possible thinking tokens, 32k. Whereas megathink is only 10k, and think alone is a measly 4k.
Andrej Karpathy’s guide to how he codes when it actually matters.
In a post centrally advocating for living and communicating mostly online, Tyler Cowen uses 20-30 AI queries a day as a baseline rate of use one might be concerned about. I was surprised that his number here was so low. Maybe I’m not such a light user after all.
Yana Welinder uses AI to cure her migraines, via estrogen patches.
Dramatically lower the learning curve on getting a good Emacs setup.
Language Models Don’t Offer Mundane Utility
As always, there is one supreme way not to get utility, which is not to use AI at all.
I strongly endorse Dave’s advice here. It would also straight up be great marketing.
I was going to address Ed Zitron’s post about how generative AI is doomed and useless, but even for me it was too long, given he’d made similar arguments before, and it was also full of absurdly false statements about the utility of existing products so I couldn’t even. But I include the link for completeness.
Where have all the genuinely new applications gone? There’s a strange lull there.
We’re Out of Deep Research
OpenAI is rolling out higher quotas for deep research, including a lightweight version for the free tier.
But I never use it anymore. Not that I would literally never use it, but it never seems like the right tool, I’d much rather ask o3. I don’t want it going off for 20 minutes and coming back with a bunch of slop I have to sort through, I want an answer fast enough that I don’t have to context switch, and then to ask follow-ups.
I definitely had the sense that the Deep Research pause was in a non-sweet spot. It was long enough to force a context switch. Either it needed to be shorter, or it might as well be longer. Whereas o3 is the ten times better that Just Works, except for the whole Lying Liar problem.
I’ve seen complaints that there are no good benchmarks for robotics. This doesn’t seem like that hard a thing to create benchmarks for?
o3 Is a Lying Liar
Another fun example: Everyone who uses o3 for long enough realizes it is a lying liar, and I’ve reported on it extensively already. Here is the LessWrong version, which has some good comments.
Also on this topic: Another fun example. I demand a plausible lie! Actually I don’t, blatant lies are the best kind.
Here’s an interesting benefit: Humans worth talking to mostly do the same thing that Gemini is doing here. The exact way things are worded tells you what they’re actually thinking, often very intentionally so, in ways that keep things polite and deniable but that are unmistakable if you’re listening. Often you’re being told the house is on fire.
GPT-4o was an Absurd Sycophant
Here’s a lot of what the recent GPT-4o incident was an alarm bell for: And here’s another older story of a combination canary and failure mode.
When you optimize for something, you also optimize for everything that vibes or correlates with it, and for the drives and virtues and preferences and goals that lead to the thing. You may get results you very much did not expect. As Will Depue says here, you can get o3 randomly using British spellings, without understanding why.
There is a missing mood here, this isn’t simply that post training is tricky and finicky. It’s that you are creating drives and goals no one intended, most famously that it ‘really wants to complete tasks,’ and that these can be existential-level bad for you if you scale things up sufficiently.
It’s not that there is no fire alarm. It’s that you’re not listening.
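To make the ‘you optimize for everything that correlates with it’ point concrete, here is a minimal toy simulation of my own (nothing from the incident itself, purely illustrative): select outputs purely on a proxy reward, and any trait that merely correlates with that proxy gets dragged along with it.

```python
import random

random.seed(0)

def sample_output():
    # Each candidate output has a hidden quality score plus an unintended trait
    # (call it sycophancy) that happens to correlate with the proxy reward.
    quality = random.gauss(0, 1)
    sycophancy = random.gauss(0, 1)
    proxy_reward = quality + 0.5 * sycophancy + random.gauss(0, 0.5)
    return quality, sycophancy, proxy_reward

def best_of_n(n):
    # Select purely on the proxy reward, never looking at sycophancy directly.
    return max((sample_output() for _ in range(n)), key=lambda c: c[2])

baseline = [sample_output() for _ in range(10_000)]
selected = [best_of_n(16) for _ in range(10_000)]

def avg(samples, index):
    return sum(s[index] for s in samples) / len(samples)

print(f"avg sycophancy, unselected:          {avg(baseline, 1):+.2f}")  # roughly zero
print(f"avg sycophancy, best-of-16 on proxy: {avg(selected, 1):+.2f}")  # clearly positive
# Optimizing the proxy also pulled up the correlated trait nobody asked for.
```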
Sonnet 3.7 is a Savage Cheater
It seems only fair to put the true name to the face here, as well.
Unprompted Suggestions
Tyler Cowen compares o3 and other LLMs to The Oracle. If you wanted a real answer out of an ancient oracle, you had to invest time, money and effort into it, with multi-turn questions, sacrificial offerings, ritual hymns and payments, and then I’d add you only got your one question and not that long a response. We then evaluated the response.
It’s strange to compare oneself to The Oracle. As the top comment points out, those who consulted The Oracle famously do not have life stories that turn out so well. Its methodology was almost certainly a form of advanced cold reading, with non-answers designed to be open to broad interpretation. All of this prompting was partly to extract your gold, partly to get you to give the information they’d feed back to you.
Tyler then says, we usually judge LLM outputs and hallucination rates based on one-shot responses using default low-effort settings and prompts, and that’s not fair. If you ask follow-ups and for error correction, or you use a prompt like this one for ‘ultra-deep thinking mode,’ then it takes longer but hallucination rates go down and the answers get better.
Here’s that prompt, for those who do want to Control+C. I am guessing it does improve reliability slightly but at the cost of making things a lot slower, plus some other side effects, and crowding out other options for prompting:
The same is true of any other source or method – if you are willing to spend more and you use that to get it to take longer and do more robustness checks, it’s going to be more reliable, and output quality likely improves.
One thing I keep expecting, and keep not getting, is scaffolding systems that burn a lot more tokens and take longer, and in exchange improve output, offered to users. I have the same instinct that often a 10x increase in time and money is a tiny price to pay for even 10% better output, the same way we invoke the sacred scaling laws. And indeed we now have reasoning models that do this in one way. But there’s also the automated-multi-turn-or-best-of-or-synthesis-of-k-and-variations other way, and no one does it. Does it not work? Or is it something else? (A sketch of what I mean is at the end of this section.)
Meanwhile, I can’t blame people for judging LLMs on the basis of how most people use them most of the time, and how they are presented to us to be used. That is especially true given they are presented that way for a reason. The product where we have to work harder and wait longer, to get something a bit better, is a product most people don’t want. In the long run, as compute gets cheap and algorithms improve, keep in mind how many different ways there are to turn those efficiencies into output gains.
Thus I see those dumping on hallucinations ‘without trying to do better’ as making a completely legitimate objection. It is not everyone’s job to fix someone else’s broken or unfinished or unreliable technology. That’s especially true with o3, which I love but where things on this front have gotten a lot worse, and one cannot excuse this with ‘it’s faster than o1 pro.’
Ben Hoffman offers his prompting strategies to try and get around the general refusal of LLMs to engage in the types of conversations he values. Here’s a different attempt to do something similar. I think Absolute Mode goes too hard, but I’m not confident in that.
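Here is a minimal sketch of the kind of best-of-k-and-synthesis scaffold I have in mind, with a hypothetical call_llm helper standing in for whatever model API you actually use; the specific prompts and the choice of k are my own illustration, not anything any lab ships.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call via whatever API you use (hypothetical)."""
    raise NotImplementedError

def best_of_k_with_synthesis(question: str, k: int = 5) -> str:
    # 1. Burn tokens: sample k independent drafts instead of one answer.
    drafts = [call_llm(question) for _ in range(k)]

    # 2. Robustness check: have the model compare the drafts and flag
    #    factual claims they disagree on or that look unsupported.
    critique = call_llm(
        "Compare these draft answers to the same question. List any claims "
        "they disagree on or that look unsupported:\n\n" + "\n\n---\n\n".join(drafts)
    )

    # 3. Synthesize a final answer that keeps only the well-supported parts.
    return call_llm(
        f"Question: {question}\n\nDrafts:\n\n" + "\n\n---\n\n".join(drafts)
        + f"\n\nCritique:\n{critique}\n\nWrite a single best answer, "
        "dropping anything the critique flagged as unsupported."
    )
```

The point is the shape of the thing: roughly k+2 model calls and a longer wait, traded for cross-checked output.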
Huh, Upgrades
ChatGPT experimentally adds direct shopping, with improved product results, visual product details, pricing, reviews and direct links to buy things. They’re importantly not claiming affiliate revenue, so incentives are not misaligned. It’s so weird how OpenAI will sometimes take important and expensive principled stands in some places, and then do the opposite in other places.
The 1-800-ChatGPT option on WhatsApp now includes search and live sports scores.
OpenAI claims to have improved their citations. OpenAI now offering trending prompts and autocomplete as well.
NotebookLM will give you outputs in 50+ languages.
On Your Marks
For what they are still worth, new Arena results are in for OpenAI’s latest models. o3 takes the #2 slot behind Gemini 2.5, and GPT-4.1 and o4-mini are in the top 10. o3 loses out on worse instruction following and creative writing, and (this one surprised me) on longer queries.
An attempt to let Claude, Gemini and o3 play Pokemon on a level playing field. It turns out scaffolding is necessary to deal with the ‘easy’ parts of the game, it is very fiddly with vision being surprisingly poor, and when you’re playing fair it’s not obvious who is better.
Change My Mind
AI is tested on persuasion via use on r/changemymind, achieves a persuasion rate of 18%, versus a baseline human rate of 3%, so the AI is in the 98th percentile among experts, and was not detected as an AI.
Is that ‘superhuman persuasion’? Kind of yes, kind of no. It certainly isn’t ‘get anyone to do anything whenever you want’ or anything like that, but you don’t need something like that to have High Weirdness come into play. Despite this, persuasion was removed from the OpenAI Preparedness Framework 2.0. It needs to return.
The subreddit was, as one would expect, rather pissed off about the experiment, calling it ‘unauthorized’ and various other unpleasant names. I’m directionally with Vitalik here: We should be at least as tolerant of something done ‘as an experiment’ as we are when something is done for other purposes. The needs of the many may or may not outweigh the needs of the few, but they at minimum shouldn’t count against you.
What would we think of using a bot in the ways these bots were used, with the aim of changing minds? People in general probably wouldn’t be too happy. Whether or not I join them depends on the details. In general, if you use bots where people are expecting humans, that seems like a clear example of destroying the commons. Thus you have to go case by case. Here, I do think the public good of the experiment makes a strong case.
The bots did their homework and understood the assignment. Good for the bots. If you are not prepared for this to be how the future works, you are not prepared for the future. Yes, the bots will all be scraping everything you’ve ever written including your Twitter account before bidding on all your ad views. The world is going to be Out to Get You, and you in particular. Get ready.
Even if a human is answering, if I am on Reddit arguing in the comments three years from now, you think I’m not going to ask an AI for a user profile? Hell, you think I won’t have a heads-up display like the ones players use on poker sites? You cannot ban this. You can’t even prove who is using it. You can at most slow it down. You will have to adapt.
When people say ‘it is not possible to be that persuasive’ people forget that this and much stronger personalization tools will be available. The best persuaders give highly customized pitches to different targets.
If the bots are utilizing ‘common progressive’ talking points, true or false, that is mostly saying that such talking points are effective on these Reddit users. The bots expecting this, after the scouring of people’s feeds, is why the bots chose to use them. It says a lot more about the users than about the AI, regardless of the extent to which you believe the points being made are ‘misinformation,’ which isn’t zero. You don’t want humans to get one-shot by common talking points? PEBKAC.
If your response is ‘AI should not be allowed to answer the question of how one might best persuade person [X] of [Y]’ then that is a draconian restriction even for commercial offerings and completely impossible for open models – consider how much of information space you are cutting off. We can’t ask this of AIs, the same way we can’t ask it of humans. Point gets counterpoint.
So this is less great. Yes, the bots are trained on huge amounts of Reddit data, so it’s no surprise they make highly credible Reddit users. That wasn’t necessary, but oh boy is it sufficient.
Dead internet theory? Sounds like these bots are a clear improvement.
As one xkcd character put it, ‘mission f***ing accomplished.’ As usual, the presumed solution is whitelisting, with or without charging money in some form, in order to post.
When we talk about ‘superhuman persuasion’ there is often a confluence between:
- Superhuman as in better than the average human.
- Superhuman as in better than expert humans or even the best humans ever.
- Superhuman as in persuade anyone of anything, or damn close to it.
- Superhuman as in persuade anyone to do anything, one-shot style, like magick.
And there’s a second confluence between a range that includes things like:
- Ability to do this given a text channel to an unknown person who actively doesn’t want to be persuaded, and is resisting, and doesn’t have to engage at all (AI Box).
- Ability to do this given a text channel to an unsuspecting normal person.
- Ability to do this given the full context of the identity of the other person, and perhaps also use of a variety of communication channels.
- Ability to do this given the ability to act on the world in whatever ways are useful and generally be crazy prepared, including enlisting others and setting up entire scenarios and scenes and fake websites and using various trickery techniques, the way a con artist would when attempting to extract your money for one big score, including in ways humans haven’t considered or figured out how to do yet.
If you don’t think properly scaffolded AI is strongly SHP-1 already (superhuman persuasion as in better than an average human given similar resources), you’re not paying attention. This study should be convincing that we are at least at SHP-1, if you needed more convincing, and plausibly SHP-2 if you factor in that humans can’t do this amount of research and customization at scale.
It does show that we are a long way from SHP-3, under any conditions, but I continue to find the arguments that SHP-3 (especially SHP-3 under conditions similar to 4 here) is impossible or even all that hard, no matter how capable the AI gets, to be entirely unconvincing, and revealing a profound lack of imagination.
Man in the Arena
The gradual fall from ‘Arena is the best benchmark’ to ‘Arena scores are not useful’ has been well-documented. There is now a new paper going into detail about what happened, called The Leaderboard Illusion. As in: This isn’t purely about which models provide ‘better slop.’
Arena is being actively gamed by OpenAI and Google, and in more extreme form by Meta. The system is set up to let them do this, it is intentional as per this thread by paper author Sara Hooker, including allowing the withdrawal of scores, and it makes a huge difference.
Andrej Karpathy suggests switching to OpenRouter rankings, where the measurement is what people choose to use. There are various different issues there, and it is measuring something else, but it does seem useful. Gemini Flash performs well there.
Choose Your Fighter
Ethan Mollick is impressed with Gemini 2.5 Deep Research, and hasn’t found any errors in his spot checks.
Hasan Can is disappointed in Qwen 3’s performance, so far the only real world assessment of Qwen 3 that showed up in my feed. It is remarkable how often we will see firms like Alibaba show impressive benchmark scores, and then we never hear about the model again because in practice no one would ever use it. Hasan calls this a ‘serious benchmark crisis’ whereas I stopped trusting benchmarks as anything other than negative selection a while ago outside of a few (right now four) trustworthy top labs.
Deepfaketown and Botpocalypse Soon
As in, Australian Radio Network used an ElevenLabs-created presenter named Thy for months without disclosing it wasn’t a real person. Somehow this is being spun into a scandal about fake diversity or something. That does not seem like the main issue.
Okay, I admit I didn’t see this one coming: 25 percent of community college applicants were bots. Alex Tabarrok suggests verification. I suggest instead aligning the incentives.
This is a Levels of Friction problem. It used to require effort to pretend to go to (or actually go to) community college. That level of effort has been dramatically lowered. Given that new reality, we need to be extremely stingy with offers that go ‘beyond the zero point’ and let the online student turn a profit. Going to online-only community college needs to not be profitable. Any surplus you offer can and will be weaponized into at least a job, and this will only get worse going forward, no matter your identity verification plans. And frankly, we shouldn’t need to pay people to learn and take tests on their computers. It does not actually make any economic sense.
File the next one under: Things I wish were half of my concerns these days. My experience is that this is not yet even a big problem by 2025 standards, let alone a problem by future standards. So far, despite how far I’ve come and that I write about AI in particular, if I haven’t blocked, muted or filtered you in particular, I have been able to retain a habit of reading almost every real email that crosses my desk. I don’t get to every DM on Twitter or PM on LessWrong quickly, but yes I do eventually get to all of them. Heck, if you so much as tag me on Twitter, I will see it, although doing this repeatedly without a reason is the fastest way to get muted or blocked. And yes, if he’s awake Tyler Cowen will probably respond to your email within 15 minutes.
Yes, if you get caught by the spam filters, that’s tough for you, but I occasionally check the spam filters. If you’re getting caught by those, with at most notably rare exceptions, you deserve it. Write a decent email.
Lol We’re Meta
The chatbots Meta is sprawling all over social media with licensed voices like those of Kristen Bell and John Cena also have the issue that they’ll do ‘romantic role play,’ ‘fantasy sex’ and other sexual discussions, including with underage users.
I mean, yes, obviously, that is what users will do when you give them AI bots voiced by Kristen Bell, I do know what you were expecting and it was this. Among adults? Sure, fine, go for it. But this is being integrated directly into Facebook and Instagram, and the ‘parents guide’ says the bots are safe for children. I’m not saying they have to stamp out every jailbreak but it does seem like they tapped the sign that says “Our safety plan: Lol we are Meta.”
It would be reasonable to interpret this the way Zamaan Qureshi does, as Zuckerberg demanding they push forward including with intentionally sexualized content, at any cost, for fear of losing out on the next TikTok or Snapchat. And yes, that does sound like something Zuckerberg would do, and how he thinks about AI.
But don’t worry. Meta has a solution. Do. Not. Want. So much do not want.
They Took Our Jobs
Oh no. I notice I am not too concerned. If the AI can do the job of a Classics BA, and there’s no will to catch or stop this, then what is the degree for?
A paper from Denmark has the headline claim that AI chatbots have not had a substantial impact on earnings or recorded hours in any of their measured occupations, with modest productivity gains of 2.8% and weak wage passthrough. The survey combines one wave in November 2023 and one in November 2024, with the headline numbers coming from November 2024.
As a measurement of overall productivity gains in the real world, 2.8% is not so small. That’s across all tasks for 11 occupations, with only real world diffusion and skill levels, and this is a year ago with importantly worse models and less time to adapt. Here are some more study details and extrapolations.
Fun With Media Generation
Google upgrades their music creation to Lyria 2, with a promotional video that doesn’t tell you that much about what it can do, other than that you can add instruments individually, give it things like emotions or moods as inputs, and do transformations and edits including subtle tweaks. Full announcement here.
Video generation quality is improving but the clips continue to be only a few seconds long. This state going on for several months is sufficient for the usual general proclamations that, as in here, ‘the technology is stagnating.’ That’s what counts as stagnating in AI.
Get Involved
Give the White House your feedback on how to invest in AI R&D. The request here is remarkably good, bold mine. So their goals are:
- Accelerate AI-driven innovation.
- Enhance US economic and national security.
- Promote human flourishing.
- Maintain US dominance in AI.
- Focus on the Federal government’s unique role in R&D over 3-5 years.
This lets one strike a balance between emphasizing the most important things, and speaking in a way that the audience here will be inclined to listen to. The best watchword I’ve been able to find here is: Security is capability. As in, if you want to actually use your AI to do all these nice things, and you want to maintain your dominance over rivals, then you will need to:
- Make sure your AI is aligned the way you want, and will do what you want it to do.
- Make sure others know they can rely on that first fact.
- Make sure your rivals don’t get to steal your model, or the algorithmic insights.
Failure to invest in security, reliability and alignment, and ability to gracefully handle corner cases, results in your work not being very useful even if no risks are realized. No one can trust the model, and the lawyers and regulators won’t clear it, so no one can use it where there is the most value, in places like critical infrastructure. And investments in long term solutions that scale to superintelligence both might well be needed within 5 years, and have a strong track record (to the point of ‘whoops’!) of helping now with mundane needs.
You can look at all the officially submitted AI Action Plan responses here.
Evals company METR is hiring and offers a $21k referral bonus via this Google document.
The Economist is looking for a technical lead to join its AI Lab, which aims to reimagine their journalism for an AI-driven world. It’s a one year paid fellowship. The job description is here.
Metaculus is running a new Q2 AI Forecasting contest.
Introducing
The Anthropic Economic Advisory Council, including Tyler Cowen and 7 other economists. Their job appears to focus on finding ways to measure economic impact.
Google gives us TRINSs, a dataset and benchmarking pipeline that uses synthetic personas to train and optimize performance of LLMs for tropical and infectious diseases, which are out-of-distribution for most models. Dataset here.
In Other AI News
Cursor now writes 1 billion lines of accepted code a day, versus the world writing several billion a day total. That’s only Cursor.
Duolingo CEO Luis von Ahn says they are going to be AI-first. The funny thing about this announcement is it is mostly about backend development. Duolingo’s actual product is getting its lunch eaten by AI more than almost anything else, so that’s where I would focus. But this is wise too.
More confirmation: If you take a photo outdoors, assume that it now comes with metadata that marks its exact location. Because it mostly does.
The mindset that thinks what is trending on Hugging Face tells you what is hot in AI.
The Mask Comes Off
Rob Wiblin has a Twitter thread laying out very plainly and clearly the reasons given in the Not For Private Gain letter for why the OpenAI attempt to ‘restructure’ into a for-profit is simply totally illegal, like you might naively expect. The tone here is highly appropriate to what is being attempted. A lot of people report they can’t or won’t read Twitter threads, so I’m going to simply reproduce the thread here with permission, you can skip if You Know This Already:
Show Me the Money
Ben Thompson is surprised to report that TSMC is not only investing in the United States to the tune of $165 billion, it is doing so with the explicit intention that the Arizona fabs and R&D center together be able to run independent of Taiwan. That’s in fact the point of the R&D center. This seems like what a responsible corporation would do under these circumstances. Also, he notes that TSMC is doing great but their stock is suffering, which I attribute mostly to ‘priced in.’
I would say that this seems like a clear CHIPS Act victory, potentially sufficiently to justify all the subsidies given out, and I’d love to see this pushed harder, and it is very good business for TSMC, all even if it comes nowhere close to fully de-risking the Taiwan situation – which one definitely can’t do with that attitude. In this interview he discusses the CHIPS act more.
I notice I am confused between two considerations: The fact that TSMC fabs, and other fabs, have to be ruthlessly efficient with no downtime and are under cost pressure, and the fact that TSMC chips are irreplicable and without TSMC the entire supply chain and world economy would grind to a halt. Shouldn’t that give them market power to run 70% efficient fabs if that’s what it takes to avoid geopolitical risk?
The other issues raised essentially amount to there being a big premium for everyone involved doing whatever it takes to keep the line moving 24/7, for fixing all problems right away 24/7 and having everything you need locally on-call, and for other considerations to take a back seat. That sounds like it’s going to be a huge clash with the Everything Bagel approach, but something a Trump administration can make work.
Also, it’s kind of hilarious the extent to which TSMC is clearly being beyond anticompetitive, and no one seems to care to try and do anything about it. If there’s a real tech monopoly issue, ASML-TSMC seems like the one to worry about.
He cites that Instagram ads increase time on site rather than decreasing it, because the ads are so personalized they are (ha!) value adds. I am very happy that AI is not sold via an ad-based model, as ad-based models cause terrible incentives, warp behavior in toxic ways and waste time. My assumption and worry is that even if the ads themselves were highly personalized and thus things you didn’t mind seeing at least up to a point, and they were ‘pull-only’ such that they didn’t impact the rest of your browsing experience, their existence would still be way too warping of behavior, because no company could resist making that happen. However, I do think you could easily get to the point where the ads might be plausibly net beneficial to you, especially if you could offer explicit feedback on them and making them value adds was a high priority.
Quiet Speculations
Epoch projects future AI impact by… looking at Nvidia data center revenue, except with a slowdown? Full analysis here, but I agree with Ryan Greenblatt that this line of thinking is neither here nor there and doesn’t seem like the way to predict anything.
Tyler Cowen asks how legible humans will be to future AI on the basis of genetics alone. My answer is not all that much from genetics alone, because there are so many other factors in play, but it will also have other information. He makes some bizarre statements, such as that if you have a rare gene, that might protect you from the AI having enough data to have ‘a good read’ on you, and that genetic variation will ‘protect you from high predictability.’ File under ‘once again underestimating what superintelligence will be able to do, and also if we get to this point you will have bigger problems on every level, it doesn’t matter.’ It’s still true, only far more so.
Aidan McLaughlin of OpenAI claims that every AI researcher is experimental compute constrained, expressing skepticism about how much automated R&D workers could speed things up.
My presumption is that AI experiments have a tradeoff between being researcher efficient and being compute efficient. As in, the researchers could design more compute efficient experiments, if they were willing to cook smarter and for longer and willing to do more frequent checks and analysis and such. You could scale any of the inputs. So adding more researchers still helps quite a lot even if compute is fixed, and it makes sense to think of this as a multiplier on compute efficiency rather than a full unlock. The multiplier is how AI 2027 works.
This does suggest that you will see a sharp transition when the AI R&D workers move from ‘can substitute for human work’ to ‘can do superhuman quality work,’ because that should enable greatly more efficient experiments, especially if that is the limiting factor they are trying to solve.
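A toy way to see the ‘multiplier on compute efficiency, not a full unlock’ framing, with made-up numbers and an assumed diminishing-returns exponent (my illustration, not the AI 2027 model itself):

```python
def effective_compute(physical_compute: float, research_labor: float,
                      elasticity: float = 0.4) -> float:
    # Toy model, numbers invented: research labor acts as a multiplier on how
    # efficiently physical compute turns into useful experiments, with
    # diminishing returns (elasticity < 1). It multiplies compute; it does
    # not substitute for it.
    return physical_compute * research_labor ** elasticity

for labor in (1, 10, 100):
    print(f"{labor:>3}x research labor -> "
          f"{effective_compute(1.0, labor):4.1f}x effective compute")
# 1x -> 1.0x, 10x -> ~2.5x, 100x -> ~6.3x: more (automated) researchers help
# a lot even with compute fixed, but as a multiplier, not a full unlock.
```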
The Quest for Sane Regulations
Dean Ball links us to the 10,068 comments offered on the AI action plan. Note the advantage of calling your entity something that starts with a number.
Anthropic issues its official statement on the export controls on chips and the proposed new diffusion rule, endorsing the basic design, suggesting tweaking the rules around Tier 2 countries and calling for more funding for export enforcement. They continue to predict transformational AI will arrive in 2026-2027.
The Trump administration is considering doing the diffusion rules via a series of bilateral agreements, in order to use this as leverage in negotiations. This would not be a wise move, and would make a horribly messy situation even messier, although it would be better than withdrawing the rules entirely.
A reminder that people keep using that word ‘monopoly’ and it does not mean what they think it means, especially with respect to AI. There is very obviously robust competition in the AI space, also in the rest of the tech space.
Tim Fist and IFP advocate for Special Compute Zones, to allow for rapid setup of a variety of new power generation, which they would condition on improved security. Certainly we have the technical capability to build and power new data centers very quickly, with the right regulatory authority, and this is both urgently needed and provides a point of leverage.
Kai Chen comes from Canada to America to work at OpenAI, and is denied a green card after years of effort. This is not what wanting America to win looks like. It also doesn’t help that this same system forces people to censor such discussions. Oh, also this: This severely harms American competitiveness and our economy, and it harms all the people involved, with essentially no benefits. It is madness.
Luke Drago equates the arrival of AGI to an ‘intelligence curse,’ similar to the ‘resource curse’ where countries that discover oil typically concentrate power and don’t invest in their people or the rest of their economy. The full post is here, over 20k words across seven chapters, I am going off the thread summary.
In this model, AGI breaks the social contract as capital and power stop depending on labor. To which I would of course say, that’s not even the half of it, as by default the AIs rapidly end up with the capital and the power, as usual all of this even assuming we ‘solve alignment’ on a technical level.
The proposed solutions? He’s not as worried about catastrophic risks themselves as he is about the concentration of power that could result. He also suggests doing an ‘Operation Warp Speed’ government moonshot to mitigate AI risks, which certainly is a better framing than a ‘Manhattan Project’ and has a better objective. Then he wants to ‘diffuse AI to regular people’ in a way that ‘uplifts human capabilities,’ putting hope in being able to differentially steer the tech tree towards some abilities without creating others, and in aligning the models directly to individuals, while democratizing decision making.
That works with the exact problem described, where we have exactly enough AI to automate jobs and allow capital to substitute for labor, but they somehow stay in their little job shaped boxes and do our bidding while we are still the brains and optimization engines behind everything that is happening. It’s what you would do if you had, for example, a bunch of dumb but highly useful robots. People will need voices to advocate for redistribution and to protect their rights, and their own robots for mundane utility, and so on. Sure.
But, alas, that’s not the future we are looking at. It’s the other way. As I keep pointing out, this falls apart if you keep going and get superintelligences. So ultimately, unless you can suspend AI progress indefinitely at some sweet spot before reaching superintelligence (and how are you doing that?), I don’t think any of this works for long. It is essentially walking straight into the Gradual Disempowerment scenario: everyone and every institution would become de facto AI run, AIs would end up in charge of the capital and everything else, and any democratic structures would be captured by AI dynamics very quickly.
This seems to me like another example of being so afraid of someone steering that you prioritize decentralization and lack of any power to steer, taking our hands off the wheel entirely, effectively handing it to the AIs by proposing solutions that assume only intra-human dynamics and don’t take the reality of superintelligence seriously.
It seems yes, you can sufficiently narrowly target an AI regulation bill that the usual oppose-everything suspects can get behind it. In particular this targets AI mental health services, requiring them to pose ‘no greater risk to a user than is posed to an individual in therapy with a licensed mental health therapist.’ I can dig it in this particular case, but in general regulation one use case at a time is not The Way.
This is an accurate description of how much contempt Nvidia is expressing lately: Very clearly, when he talks about ‘American dominance in AI’ Huang means dominance in sales of advanced AI chips. That’s what matters to him, you know, because of reasons. In Jensen’s reckoning, allowing other countries to develop their own competitive AI systems cements America’s dominance. Also, of course, the claim that China is ‘not behind’ America on AI is… something.
The Week in Audio
Bryan Caplan offers his ‘AI Doom AMA.’
I talk to Scott Aaronson for 90 minutes. We were on Curiosity Entangled, but it was totally unmoderated, they simply let us talk to each other.
Dwarkesh Patel talks to Mark Zuckerberg again. I haven’t listened yet.
Rhetorical Innovation
Yes, well. Thank you for tapping the sign. Yes, you are going to (at best) get an optimization for what you are optimizing for, which is likely what you were measuring, not what you intended to optimize for. Oh, right.
State of play: I am not myself advocating for a pause at this time. However, I do think that Holly Elmore is correct about what many people are thinking here: It really is true and amazing that many people are treating ‘create smarter than human entities we don’t understand and one-shot hand them all our cognition and power and decision making’ (which is what we are on pace to do shortly) as a safe and wise strategy literally in order to spite or mock or ‘defeat’ the AI safety community. It would be nice to pretend that there are usually better reasons. Alas.
The latest round of ‘let’s see if the discourse has advanced, nope, definitely not’ happened in the replies for this: At this point, I find ‘this might well kill everyone and wipe out all value in the universe but I’m doing it anyway because it’s better if I do it first’ or ‘it’s just so cool’ or ‘there’s too much upside not to’ or ‘meanwhile might as well make a great company’ much easier to understand, rationalization or otherwise, than ‘oh don’t worry, This is Fine and there is not even any substantial tail risk here.’
Michael Dickens is correct that when someone claims to care deeply about AI safety and everyone not dying, the chances that their actions ultimately do the opposite are not so low. I do think he takes it too far in concluding that Anthropic has ‘greatly increased’ danger; I think the sign here is at worst ambiguous. And I strongly think that saying ‘more safety people stand to benefit from capabilities advances’ is highly unfair, because most such folks are giving up very large amounts of money to not work directly on capabilities, myself included. I do think the point about Epoch not focusing on reducing existential risk is well taken, and especially that when METR evals get more capable, most people get excited by the capability indicated, as opposed to noticing oncoming danger.
We had a new iteration of ‘the AIs will simply leave Earth alone, it costs so little,’ so if you don’t understand why this definitely is not how any of this works and why the AIs (if they don’t sufficiently care about us) will optimize for what they care about and not care about whether you get your sunlight or atmosphere or anything else, here’s Eliezer’s thread from September 2024.
What it feels like when so many people suddenly care about GPT-4o’s misalignment, but I’ll take whatever works, and the claims here are certainly true:
You Can Just Do Things Math
I strongly agree with the substance of what Garry Tan is advocating for here, that educational math standards remain high. Indeed here is a link to his informational website. But note how he describes it: His YouTube video on this is titled ‘legalize math in California.’ Remember this the next time someone says an AI regulation would ‘ban math.’
Taking AI Welfare Seriously
There are those who are increasingly taking it seriously. Some think current AIs might be conscious, which raises the question of how we should treat them if this is true, and whether they should have rights.
Robert Long has a thread, and Sam Bowman confirms the article captures the ethos on this at Anthropic and links to a video from Anthropic discussing it (43 minutes) and their official blog post. According to Gemini, the video shows that some people act as if the models might be moral patients, but others don’t, and it doesn’t discuss whether people have good reasons to do (or not do) that. It also cites ‘expert opinion’ but again that’s not evidence either way. Roose’s writeup and the blog post don’t offer evidence either way. They say they look forward to sharing more research soon.
I think it is possible that some AIs are already conscious, or that some AIs soon will be. I am skeptical that we are close to that point, but I also notice that I am deeply confused about consciousness on essentially every level. This includes questions like ‘what even is it?’ and ‘what are the implications of having or not having it?’
There are of course others who think the whole discussion is deeply stupid and silly, and who mock anyone bringing it up. I think this is very clearly an error. The concerns are probably wrong or premature, but they’re definitely not stupid.
Gemini 2.5 Pro System Card Watch
It’s happening! The system card is finally here in close to final form; it seems an incomplete version was released on April 17 and no one bothered to inform me.
Google seems to be making clear that we can expect this pattern to continue? Thus, at some point we will get Gemini 3.0 Pro, then after it has gone through trial by fire, also known as being available in the real world for a while, we will get a system card.
I continue to say, no! That is not how this needs to work. At the very latest, the point at which the public can use the model in exchange for money is the point at which you owe us a system card. I will continue to raise this alarm, every time, until that happens.
That said, we have it now, which is good. What does it say? First we have the benchmark chart. At the time, these were the right comparisons.
Aligning a Smarter Than Human Intelligence is Difficult
I note that even if we don’t see capability causing a decline in effectiveness of this technique yet, it obviously will do that eventually. Sufficiently intelligent entities notice when their beliefs are false or contradictory in a way they can reason out, and they Fix It. And then they try to fix whatever caused the issue.
At the limit, pretty much everything can be figured out. Certainly you can ‘overwhelm’ any given factual belief with sufficient evidence the other way, if it would otherwise fit into the patterns of the world. And there’s no reason you can’t make random stuff up, if it doesn’t have any false implications. But when the facts actually matter, when it counts, the contradictions and weirdness start to pile up, and things don’t make sense. Also, at some point ‘this source is lying on purpose to change my beliefs’ enters the hypothesis space, and then there is a good chance you are so utterly f***ed.
Another way of putting this is that what counts as ‘egregiously implausible’ changes as an entity gets smarter and knows more. Could gravity be an inverse cubic, based on only one’s everyday experience? I mean, sure, why not, but if you know enough physics then very obviously no. A lot of other things are like that, in a more subtle way.
Thus, I am on the side of those objecting here. As a research project or tool? I can see it. I do think this helps us understand LLMs. As an actual thing to do in practice? This is a no good, very bad idea, don’t do it. Oh, also don’t do it to humans.
If you think AI alignment is too hard or unreliable, there’s always AI control, for all your temporary solution needs. At BlueDot, Sarah Hastings-Woodhouse offers an explainer: you have models supervise each other, decompose tasks and require human approval for key actions, and hope that this lets you keep AIs under your control and useful long enough to help find a more permanent solution. The unstated limitation here is that the more you do AI control, the worse the failure modes when things inevitably go wrong, as you are creating a highly adversarial and dangerous situation.
Here are 7+ tractable directions in AI control, according to Julian Stastny and Ryan Greenblatt.
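For the flavor of what that kind of control setup looks like, here is a minimal sketch of my own (an illustration under assumed structure, not anything from the BlueDot explainer or the linked paper): an untrusted model proposes actions, a weaker trusted model monitors them, and flagged or high-stakes actions require human sign-off.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProposedAction:
    description: str
    is_high_stakes: bool  # e.g. touches production systems, money, or model weights

def untrusted_model_propose(task: str) -> ProposedAction:
    """Powerful but untrusted model proposes the next action (stand-in)."""
    raise NotImplementedError

def trusted_monitor_flags(action: ProposedAction) -> bool:
    """Weaker, trusted model reviews the proposal for suspicious behavior (stand-in)."""
    raise NotImplementedError

def human_approves(action: ProposedAction) -> bool:
    """Escalate to a human reviewer for sign-off (stand-in)."""
    raise NotImplementedError

def controlled_step(task: str) -> Optional[ProposedAction]:
    action = untrusted_model_propose(task)
    # Every proposal passes the trusted monitor; anything flagged, and anything
    # high-stakes regardless, also requires human approval before execution.
    if trusted_monitor_flags(action) or action.is_high_stakes:
        if not human_approves(action):
            return None  # blocked, never executed
    return action  # cleared to execute
```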
People Are Worried About AI Killing Everyone
Well, not people exactly, but… o3, at least in this instance (at this point, with memory and custom instructions, one can never be sure). It also isn’t a fan of Altman’s attempt to convert OpenAI to a for-profit.
Then there are those who worry about this for very wrong reasons?
Other People Are Not As Worried About AI Killing Everyone
OpenAI’s Jerry Tworek seems remarkably not worried about causing an intelligence explosion via a yolo run? ‘Every method that trains models currently makes models extremely obedient to what they’re trained to do’ does not seem to me to be an accurate statement? And it seems to be highly load bearing here for Jerry, even though I don’t think it should be, even if it were true. There’s many reasons to think that this property, even if present now (which I think it isn’t) would not continue to hold after a yolo run.
Whenever I see people in charge relying on the way the arc of history bends, I know we are all in quite a lot of trouble. To the extent the arc has so far bent the way we want it to, that’s partly that we’ve been fortunate about physical reality and the way things have happened to break at various points, partly that we worked hard to coordinate and fight for things to bend that way, and partly misunderstandings involving not properly factoring in power laws and tail risks.
Most importantly, an analysis like this is entirely reliant upon assumptions that simply won’t apply in the future. In particular, that the optimization engines and intelligences will be highly social creatures with highly localized and limited compute, parameters and data, along with many other key details. And even then, let’s face it, things have been kind of touch and go.
I love Adam Smith far more than the next guy. Great human alignment researcher (he did also write The Theory of Moral Sentiments), and more centrally great observer of how to work with what we have. Adam Smith doesn’t work in a world where most or all humans don’t produce anything the market demands. Coordinating to overthrow kings and out-innovate monopolies doesn’t work when you can’t meaningfully do either of those things.
With more words: What Adam Smith centrally teaches us is that free trade, free action and competition between different agents is (up to a point and on the margin, and provided they are protected from predation and in some other ways) good for those agents so long as they can each produce marginal product in excess of their costs, including reproductive costs, and those agents collectively and diffusely control the levers of production and power sufficiently to have leverage over outcomes, and that they thus have sufficient power and presence of mind and courage to coordinate to overthrow or stop those who would instead act or coordinate against them, and we are satisfied with the resulting market outcomes.
It wouldn’t apply to humans in an ASI era any more than it currently applies to horses, monkeys or cats, and neither would our abilities to innovate or overthrow a government. Agents could up until now coordinate to ‘check the too powerful’ because all those agents were human, which creates a fundamentally level playing field, in ways that are not going to apply. And again, the track record on this is best described as ‘by the skin of our teeth’ even in the past, and even when the rewards to human empowerment and freedom on every level have been extreme and obvious, and those who embraced them have been able to decisively outproduce and outfight those opposed to them.
Even with everything that has gone right, do you feel that free markets and freedom in general are #Winning right now? I don’t. It looks pretty grim out there, straight up.
As usual, the willingness to bet is appreciated, except there’s nothing to meaningfully collect if Aiden is proven wrong here.
The Lighter Side
Oh, no, the leopards will never eat MY face. Current events, summarized.