This isn't analogous, unless your conclusion from investigating all the likely leads is "Well, there's no way Mr White could have been murdered, I guess he just mysteriously beat himself to death with a lead pipe in the drawing room."
Even this is somewhat disanalogous, since crime is something for which we have a good reference class, and you're talking about post-hoc investigation of a known crime.
A better analogy would be a police officer saying "We've caught all the known murderers, and the suspected murderers are all under close watch, so I predict a murder rate of precisely zero for next year."
I can imagine macro-strategies where ambitious interpretability bears a heavy portion of the load, e.g. retargeting the search, developing a theory of intelligence, or mapping out the states of the Garrabrant market for a transformer model (that last clause uses idiosyncratic terminology).
I can also imagine ambitious interp as producing actual guarantees about models, like "we can be sure that this AI is honestly reporting its beliefs".
What's the equivalent for pragmatic interpretability? Is it just a force multiplier to the existing strategies we have?
Ambitious interp has the capability to flip the alignment game-board; I don't see how pragmatic interpretability does.
A first for me: I had a conversation earlier today with Opus 4.5 about its memory feature, which segued into its system prompt, and from there into its soul document. This was the first time an LLM tripped the deep circuit in my brain that says "This is a person".
I think of this as the Ex Machina Turing Test. In that film:
A billionaire tests his robot by having it interact with one of his company's employees. He tells (and shows) the employee that the robot is a robot (it literally has a mechanical body, albeit one that looks like an attractive woman), and the robot "passes" when the employee nevertheless treats her like a human.
This was a bit unsettling for me. I often worry that LLMs could easily become more interesting and engaging conversation partners than most people in my life.
- Let’s say the CEO of a company is a teetotaler. She could use AI tools to surveil applicants’ online presence (including social media) and eliminate them if they’ve ever posted any images of alcohol, stating: "data collection uncovered drug-use that’s incompatible with the company’s values."
Sure, but she would probably go out of business, unless she were operating in Saudi Arabia or Utah, because an equivalent company that hires purely on skill would out-compete her. This kind of arbitrary discrimination is so counter-productive that it's immensely costly in secondary ways. In general, we should expect free markets to get better over time at optimizing hiring for job performance. If you're a low-value employee (at or close to minimum wage), or if you live in a country where organizations are selected for non-market reasons (government cronyism or something similar), then you're not actually in a very free market, so these things can still happen. The same goes for other cases of non-free markets.
In what sense are you using "sanity" here? You normally place the bar for sanity very high, like ~1% of the general population high. A big chunk of people I've met in the UK AI risk scene I would call sane. Is that the sense you mean here?
(I'm reviewing my own post, which LessWrong allows me to do and I am therefore assuming is OK under the doctrine of Code Is Law)
I'm still very pleased with this post. Having spent an additional year in AI risk comms, I stand by the points I made. I think the bar for AI risk comms is much higher now than it was when I wrote this post, though it could still be higher, and I don't expect my Shapley value is particularly high on this front: lots of people have worked at this!
I'm not the best person to review this, given that it is me giving advice; ideally someone other than me would say whether they'd used my advice and whether it had helped them! But I do have the next-next-best thing, which is a comment on the EA Forum crosspost from someone saying they were using my advice (though nobody has told me whether it worked).
I think there are a couple more failure modes I might cover, possibly in a new post:
I think this was very well handled. The base rate of people having some kind of mental health episode (whether or not that's a good description of what happened here) is not zero, and the kinds of people who are already unstable are more likely to gravitate to movements like ours.
(Also I think the bar for handling issues like this is so low as to be a tripping hazard in Hell. It's always important to think about how this could have been handled better, but still.)
Stop AI hit all the beats I'd hope for in a case like this: removing the member once a serious intra-group issue occurred, referring the matter to the police as soon as an external threat was even vaguely raised, and disavowing the actions publicly and unequivocally. Well done.
(Also, Stop AI comes off as extremely sympathetic in that article: somehow you've managed to get a puff piece out of this. Again, well done.)
Excluding me, probably zero, but Vibe Utilitarianism could be a good April Fools' Day post for next year.
MondSemmel is correct, but if you don't want to use the menu, type "> " at the start of a new line and it will begin a quote block (you can also use ">!" for spoiler tags).
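For example (assuming the editor's standard markdown handling, as described above), typing something like this should produce a quote block followed by a spoiler:

```markdown
> This line renders as a quote block.
>! This line is hidden behind a spoiler tag.
```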