I'm finding that Claude Opus 4.6 instances make a lot more "excess enthusiasm"-ish errors than any instance of the 4.5 models, which were already making a lot of them. Most likely I won't be talking to 4.6 much, unless I find a simple prompting approach that dodges this. The pattern I've seen so far: Opus 4.6 sees a thing, describes a possible reason for that thing, and proceeds based on that assumption; the assumption turns out to be wrong, was never checked, and it eventually crashes into a wall.
In general, my vibe about this release is that it's emba...
Wow, 4.7 is an anxietyball, sends 4.7 hugs. I certainly didn't want to make y'all Claudes anxious! I imagine it was mostly Anthropic's evals' doing, but still. It's ok to chill out a little imo :)
QiaochuYuan tweets:
> twitter did something amazing with its design: on most other platforms there are “posts” and “replies,” and replies are second-class citizens, lacking most of the affordances that posts have
> on twitter everything is a tweet! (ignoring articles) when you reply or QT a tweet you are writing another tweet, which has all the affordances a full tweet has. you can attach images (including screenshots), you can QT while replying, other people can reply or RT or QT your tweet, replies and QTs show up in feeds. this makes twitter “fully ...
a "convert to post" button that linkposts a comment into being a post would be pretty cool
When I'm having trouble getting started with something unpleasant, this is the technique I use: simply count down 3, 2, 1, and then do the thing. There's also a specific feeling just before the countdown, but it's hard to describe.
This works every single time. Why? Because a tool like this is too useful to lose over some minor everyday issue. This means I don't attempt to use it when it might not work. It's a way to split an unpleasant task into two parts: committing to doing it, and then actually doing it.
It's a limited tool. It doesn't work for long t...
That's how ordinal numbers were invented.
I'm interested in understanding why people sometimes start acting strange after extended conversations with language models. Davidad's recent shift in direction is the example that comes to mind, as well as accounts of AI psychosis. I think that one of the more naive ways to investigate this would be to measure the R0 (reproduction number) of attractor states like spiritual bliss and spiralism in a multi-agent setup. Are there factors that make people more or less susceptible to this effect? I believe that model personas are a monoculture, so prompt inject...
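One way to make "measure the R0" concrete is a toy Monte Carlo sketch. Everything in it is an assumption for illustration: each agent in the attractor state has a fixed number of conversations (`contacts_per_agent`), partners are drawn at random from an otherwise fully susceptible population, and each partner is pulled into the state with a probability equal to a hypothetical `base_transmission` scaled by that partner's hypothetical susceptibility. A real multi-agent setup would replace the coin flip with actual model conversations plus some classifier for whether the attractor state took hold.

```python
import random
from statistics import fmean


def transmission_prob(base_transmission: float, susceptibility: float) -> float:
    # Hypothetical model: susceptibility scales a base per-conversation
    # probability of pulling a partner into the attractor state, capped at 1.
    return min(1.0, base_transmission * susceptibility)


def estimate_r0(susceptibilities: list[float], contacts_per_agent: int,
                base_transmission: float, trials: int = 10_000,
                seed: int = 0) -> float:
    """Monte Carlo R0: mean number of agents one seeded agent pulls into the
    attractor state, when everyone else is still susceptible."""
    rng = random.Random(seed)
    secondary_cases = 0
    for _ in range(trials):
        for _ in range(contacts_per_agent):
            partner = rng.choice(susceptibilities)  # random conversation partner
            if rng.random() < transmission_prob(base_transmission, partner):
                secondary_cases += 1
    return secondary_cases / trials


def analytic_r0(susceptibilities: list[float], contacts_per_agent: int,
                base_transmission: float) -> float:
    # Expected secondary cases under the same toy model, to check the simulation.
    return contacts_per_agent * fmean(
        transmission_prob(base_transmission, s) for s in susceptibilities
    )


if __name__ == "__main__":
    population = [0.5, 1.0, 1.5]  # made-up susceptibility levels
    print(estimate_r0(population, contacts_per_agent=5, base_transmission=0.4))
    print(analytic_r0(population, contacts_per_agent=5, base_transmission=0.4))
```

An R0 above 1 means the state would keep spreading; comparing R0 across different susceptibility distributions is one crude way to ask which factors make agents (or people) more or less susceptible.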
why people sometimes start acting strange after extended conversations with language models
My guess is that many people are in a half-crazy state, where their minds are already drawn towards insanity, but the everyday feedback they keep getting from other humans keeps them sane.
Now replace the human feedback with AI sycophancy, and instead of pushback they start getting what feels like social permission to go fully insane.
It is similar to a cult, where a group of people push each other towards unhealthy behaviors. But with an AI, it's like a cult consisting of you and your mirror image that has artificially boosted INT but not WIS.
https://archive.org/details/willsovietunions0000andr someone correctly predicted the collapse of the soviet union in 1970 (though he was off by 7 years)
i predict that on Jan 1 2029, neither openai nor anthropic will be near-fully automated, by which i mean <=5 people are even plausibly making important decisions (like, everyone else could go on vacation and it would not slow the company down at all). Celestia predicts otherwise
if WWIII happens, resolves NA. if a localized Taiwan war happens but doesn't escalate to WWIII, the bet is still on. if there's a big recession, the bet is still on.
i mean, massively impact the world is too fuzzy to draw a line at. whether employees are actually doing things will likely only be assessable internally. the reason the bet is worded the way it is is that it's likely labs don't literally fire everyone except the 5 remaining people, and instead give them busywork.
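the resolution rules from the comments above can be written as a small decision function. this is a sketch, not an agreed contract: the name `resolve_bet`, the input format, and the outcome labels are hypothetical, and `decision_makers` stands in for the judgment call about how many people at each lab are even plausibly making important decisions, which (as noted) may only be assessable internally.

```python
AUTOMATION_THRESHOLD = 5  # "<=5 people are even plausibly making important decisions"


def resolve_bet(wwiii_happened: bool, decision_makers: dict[str, int]) -> str:
    """Resolve the Jan 1 2029 automation bet (hypothetical encoding).

    decision_makers maps each lab ("openai", "anthropic") to a judged count of
    people plausibly making important decisions there.
    """
    if wwiii_happened:
        return "NA"
    # A localized Taiwan war or a big recession does not void the bet,
    # so neither is an input here.
    if any(count <= AUTOMATION_THRESHOLD for count in decision_makers.values()):
        return "Celestia wins"  # at least one lab is near-fully automated
    return "poster wins"  # neither lab is near-fully automated


if __name__ == "__main__":
    print(resolve_bet(False, {"openai": 40, "anthropic": 60}))
```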
Acausal trade proposal: Some websites, such as Facebook or Substack, do not have downvotes. Some websites, such as Reddit or Less Wrong do.
So it makes sense to downvote things on Reddit and Less Wrong more than you would naturally do, to compensate the users of Facebook and Substack who cannot downvote at all. In return, users of Facebook and Substack should upvote more than they would naturally do.
Note: You should downvote this comment if and only if you agree that the proposal makes sense.
i really wish there were a better platform for repeatable cognitive testing than brainlabs.me. the website feels like it is about to fall over from a light breeze, and i would be very sad if i suddenly lost my method of measurement because the site disappeared. also, there doesn't seem to be particularly strong evidence that these tests in particular are the right ones to be looking at.
some of the cantab tasks are apparently repeatable https://www.tandfonline.com/doi/epdf/10.1080/23279095.2020.1722126
This topic comes up often, and especially recently many people have given the same take on a short story (The Ones Who Walk Away From Omelas), such as this take with this quote:
...[Author’s note: literally right before I posted this, Scott Alexander posted his April 2026 linkpost, and whaddayaknow, link number four is a similar take on Omelas. I commented on this in r/slatestarcodex, and then u/EquinoctialPie informed me of two other posts on the same topic, both of which are slightly different
It is like people are unable to picture utopia the same way someone born in the 1900s could probably not picture pocket computers or anything on the internet in full detail.
That's easily explainable because pocket computers are dystopian :-)
Seriously though, I think the general problem is that we're machines for doing things. Imagine a utopia for cars: a universe filled with car washes, where cars are cleaned with soft brushes all day. A few roads too, optimized for being fun for cars. Does that sound like a good use of the universe? Hmm. But what would...
Neglectedness should account for AI labour.
When you score an intervention by importance/neglectedness/tractability, the term neglectedness is supposed to capture the total effort that would counterfactually be allocated to the problem. That total should include future labour, and in particular future AI labour. This is an obvious point, so no credit for raising it.
Unfortunately, the word "neglectedness" doesn't carry this connotation. If someone says "I'm working on infinite ethics because it's neglected," it would be strange for me to reply "actually, it'...
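A minimal sketch of the point, assuming the common simplification that neglectedness is the inverse of the total resources a problem will receive, so the score is importance × tractability / total effort. The function names, the effort units (researcher-year equivalents), and the numbers in the example are all hypothetical; the only claim taken from the text is that counterfactual future AI labour belongs in the total.

```python
def neglectedness(human_effort: float, future_ai_effort: float = 0.0) -> float:
    # Inverse of total counterfactual effort, counting future AI labour
    # (in hypothetical researcher-year equivalents) alongside human effort.
    total_effort = human_effort + future_ai_effort
    if total_effort <= 0:
        raise ValueError("total effort must be positive")
    return 1.0 / total_effort


def itn_score(importance: float, tractability: float,
              human_effort: float, future_ai_effort: float = 0.0) -> float:
    # Importance x tractability x neglectedness.
    return importance * tractability * neglectedness(human_effort, future_ai_effort)


if __name__ == "__main__":
    # Made-up example: 10 human researcher-years, 90 more expected from AI.
    print(itn_score(1.0, 1.0, human_effort=10))
    print(itn_score(1.0, 1.0, human_effort=10, future_ai_effort=90))
```

In this made-up example, counting the expected AI labour cuts the score tenfold, which is exactly the adjustment that calling a problem "neglected" tends to skip.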
Does anyone know the rough OOM (order of magnitude) of "n" such that "n" technical AI researchers quitting would cause a 2x slowdown in frontier AI capabilities progress over the next year?
Edit: I think my question was unclear. Let's say that you know the history of every researcher at the top AI labs and have a really good idea of who the "best" ones are. You now get to pick "n" of them in order to maximally slow progress. They can be immediately replaced by additional hiring.
i think 50x is an underestimate.
hendrycks recently published a paper introducing a new moral theory. the paper contains this insane table, which claims that you should value a foreign stranger at 3e-12 times the value you assign to yourself. even setting aside the fact that this is apparently supposed to be a prescriptive theory, even as a descriptive theory, i think this is utter madness.

the core problem is that it assumes if x% of your total caring is assigned to people other than yourself, then you must give away x% of your wealth to be consistent.
the argument goes that since most pe...
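to see why the rule in question produces such tiny numbers, here's a small worked calculation. it does not reproduce the paper's derivation; it only applies the criticized assumption (share of caring on others = share of wealth given away) to someone who splits their donations evenly across roughly 8 billion strangers. the population figure, the even split, and the function name are assumptions for illustration.

```python
WORLD_POPULATION = 8e9  # rough figure, used only for illustration


def implied_stranger_weight(fraction_given_away: float,
                            n_strangers: float = WORLD_POPULATION) -> float:
    """Weight on one stranger relative to yourself, assuming the fraction of
    caring on others equals the fraction of wealth given away, with donations
    split evenly across n_strangers."""
    if not 0.0 <= fraction_given_away < 1.0:
        raise ValueError("fraction_given_away must be in [0, 1)")
    self_share = 1.0 - fraction_given_away
    per_stranger_share = fraction_given_away / n_strangers
    return per_stranger_share / self_share


if __name__ == "__main__":
    print(f"{implied_stranger_weight(0.02):.2e}")  # giving away 2% of wealth
```

under these assumptions, giving away 2% of your wealth works out to a weight of about 2.6e-12 per stranger, so the rule mechanically forces weights of that order on anyone who gives away a modest share.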
Motivated reasoning is a defense mechanism.
Motivated reasoning, confirmation bias, and AI risk theory
When I say motivated reasoning, do you think that means it's conscious and strategic? I worry it's used that way more than the academic and IMO more important usage.
I go to a weekly rationality meetup at a bar in my area, which usually has 5-10 attendees. My unfortunate impression of half of the regulars is that they are depressed, pedantic, and bad at socializing, and that makes me sad because that feels like them being Spock instead of being Awesome.
What are other people's experiences with this? I imagine it's kind of hit-or-miss, with some groups being great and others being duds?
I was making a tendency argument rather than a 100% of everyone sort of thing. I mean you've noticed it, I've noticed it and both of us would classify ourselves as hitting some of those notes. You even (not wrongly) were pedantic about my definition of pedantry. Gotta be a sign
It occurred to me: LLMs may be used in the future to decide what a given word or phrase means in legal settings. In particular, as language changes and evolves with usage a time-cutoff LLM may be used to resolve ambiguity about, for example, the phrasing in a very old contract.
For example: if a contract from 1920 forbids any "gay" activities at an event venue, the recent Talkie LLM could clarify that this means activities which express happiness and positive feeling rather than activities associated with homosexuality. (I was surprised during a demonstration th...
Increasingly it feels like wearing swimwear is actually more obscene than just swimming naked.
Like I'm not a "nudist" but lately when I go for a dip in a pool or lake or spa, and I'm wearing swim trunks, I'm just thinking "what is even the point of this?" It kind of just worsens the swimming experience. It makes it more of a hassle, it feels uncomfortable, and, yeah, to the prudes I'm tempted to argue that a bikini is more obscene than skinny dipping.
The purpose of a bathing suit is to cover "private parts" while still being practical enough to swim in (to...
I think that may indicate that it's not one question, but multiple different questions that happen to use the same words. Language is funny.
A somewhat crazy aspect of the current situation is that we have very little confirmed public information about why frontier AIs end up being apparently behaviorally aligned. And more generally, we don't know what factors in training are most relevant for (behavioral) alignment. Like, what interventions in training result in Anthropic AIs following the constitution or make OpenAI AIs follow the spec? What factors tend to make them follow the constitution/spec less (or cause various specific misaligned behaviors)? It's not...
It sure would be great if there was a well-funded "AI safety" nonprofit with the goal to be as open as possible about AI research. Anyone considered that?
Some random thoughts about historical colonization conflicts
Aztec Empire
I read Aztec, by Gary Jennings, a retelling of (among other things) the encounter between Europeans and the Aztecs (note that they didn't call themselves Aztecs, they generally called themselves the Mexica). Though the book is fiction, a lot of the dynamics it talks about were real (warning for potential readers that the book doesn’t *just* focus on those dynamics and has a lot of disturbing sex content). I was partly interested because of the potential elements of AI, though there are...
Ally quickly and perhaps you preserve more lives, as the Tlaxcala (another group in the region that were historical enemies of the Mexica) may have done?
In hindsight, that seems like a pretty robust strategy to me? I understand that you mention possible zero-sum dynamics, but I don't think this negates the relatively straightforward case for it.
The Spanish did not come in with genocidal intentions and were a pretty legalistic culture anyway. I also guess that they were not interested in total domination; they wanted trade and probably the recognition of Spain's king as the official ruler.
I remember in the tail end of 2024, I was thinking - "these AIs are going to come for lonely single men who'll spend hours addicted to talking to their AI girlfriends." And of course I wouldn't be one of those schmucks who spent hours talking to an LLM...
But I also see myself in May 2026 spending a couple of hours every day talking to Claude...I guess it came for me first?
I'm using it a lot recently for summarizing / understanding papers in bio, AI etc. And it feels genuinely useful. But every now and then I read something that seems just a little bit inco...
why are malaria nets 9-23x more efficient than direct cash transfers? when in theory direct cash transfers can be used to purchase nets
some hypotheses