"You know the kinda guy who flips fifteen coins, gets seven heads, tells you he flipped ten coins[1], claims he got eight heads, then when someone digs into it and points out one of those heads was actually a tails he makes you prove it exhaustively before admitting maybe he said one false thing but jeez why are you so invested in every single mistake he makes?"
". . . no, I don't think I know anyone like that. Why do you ask?"
"He just volunteered to run a competitive coin-flipping tournament."
Technically true! He did flip ten coins! He just flipped five more.
Sounds like you have enough material for another interesting article!
There is something weird about LLM-produced text. Very often, when I try to read a long text produced primarily by an LLM, I find it difficult to pay attention to it. Even if the content is apparently semantically rich, I notice that I'm not even trying to decode it.
The typical LLM writing style has a tendency to make people's eyes slide off it.
It's kind of similar to the times when your attention wanders away during reading, and then you realize that you were scanning/semi-re...
Tom White has this non-exhaustive list of telltale tics of LLM writing:
...
- The “It’s not X, it’s Y” Antithesis
- The most common tell. A fake profundity wrapped in a neat contrast: “We’re not a company, we’re a movement.” “It’s not just a tool, it’s a journey.” Humans use this sparingly; AI uses it compulsively
- The Punchline Em-Dash
- Every section feels like it’s waiting for a big reveal—until the reveal is obvious or hollow
- The Three-Item List
- AI loves the rhythm of threes: “clarity, precision, and impact.” It’s a pattern baked deep into training data and reinforced
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.

‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.[1]
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react...
I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
Perhaps both. The composer and musician Max Reger once responded to a disagreeable review thus:
Ich sitze im kleinsten Raum des Hauses. Ich habe Ihre Kritik vor mir. Bald werde ich sie hinter mir haben. ("I am sitting in the smallest room of my house. I have your review in front of me. Soon it will be behind me.")
New book out today: The Infinity Machine: Demis Hassabis, DeepMind, and the Quest for Superintelligence
This article (archived paywall bypass) by the author includes some quotes:
...Hassabis acknowledges the setbacks, and yet he keeps racing forward. His faith in governance mechanisms now shattered, he has come to see salvation, paradoxically, in his own career advancement. He knows that his personal goal is to shape AI for the good of humanity. His new safety agenda therefore involves securing personal influence.
“Safety isn’t about governance structures,” Hassabis...
Hassabis is fluent in the full gamut of AI doom scenarios. He met one of his DeepMind co-founders, Shane Legg, at a lecture on AI safety. He buttonholed his first financial backer, Peter Thiel, at a Singularity summit, where futurists shared visions of machines that outsmart people.
...In 2015, seeking to put flesh on Google’s promise of an AI-oversight board, DeepMind arranged a secret gathering of philosophers and technologists. To lock in potential rivals, and to promote his singleton vision, Hassabis granted Elon Musk the honor of convening the meeting at
https://en.wikipedia.org/wiki/Anti-satellite_weapon
It's only getting easier. You also don't necessarily need to shoot it down. You can try to hit it with another satellite, or use directed energy. If you're desperate, you can try to trigger Kessler syndrome.
Meanwhile, on the ground you can rest in the relative safety of layered air defense.
You have limited fuel for manoeuvring.
I find trying to find funding or paid roles or even unpaid roles so demoralizing. How do I keep motivated?
I don't want to focus on trying to survey the landscape of funding opportunities and learning to network with people productively. It's so much nicer to just focus on the work I want to be doing, but it seems I either can't make it legible enough fast enough, or it's actually not valuable and I should go do something else with my time.
I want advice. How do I get funding? How do I think about getting funding? How do I stay motivated to keep thinking about how to get funding?
What work are you doing? Is any of it publicly viewable?
If you code with Claude Code and you randomly ask it a question about something unrelated to what you're working on right now, it will get pissed off. Example:
Hmm... it might depend on context. I can give you more examples but I can't share the exact conversation because it's usually work related. Would it be interesting to have an exact chat that can replicate this?
I keep thinking about having a server where I'd run ai coding agents. Why: always on for recurring tasks (that would use my discount tokens 🙃) + I can send messages to them from my phone + better isolation vs running things directly on my laptop (but still some scoped access)
Do people do this/what's your setup? (I'm vaguely aware of the Mac mini OpenClaw meme, but I'm not particularly interested in running local models, so doesn't seem like the best fit?)
I wrote an HTTP server for my home computer that wraps Claude Code sessions running in containers (using the Agent SDK), and then I access it from my phone using Tailscale.
In retrospect, I'd advise something simpler: Just use a local user account to run it instead of containers. Mine works well enough that I can't be bothered to rewrite it though.
Just wrapping Claude Code to have HTML output was 90% of the work, and adding things like scheduling would be pretty easy.
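If anyone wants to try something similar, here's a minimal sketch of the core loop: a tiny HTTP endpoint that shells out to the `claude` CLI in non-interactive print mode and returns the output. The port, endpoint path, and absence of auth are illustrative assumptions (access control is left to the network layer), and this omits the sessions, containers, and HTML rendering of my actual setup.

```python
# Minimal sketch, not my actual server. Assumes the `claude` CLI is on PATH
# and that access is gated behind something like Tailscale rather than auth.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/run":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        prompt = body.get("prompt", "")
        # `claude -p` runs a single non-interactive turn and prints the result.
        result = subprocess.run(
            ["claude", "-p", prompt],
            capture_output=True, text=True, timeout=600,
        )
        payload = json.dumps({"stdout": result.stdout, "stderr": result.stderr})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload.encode())

if __name__ == "__main__":
    # Bind to all interfaces so the machine's Tailscale address works from a phone.
    HTTPServer(("0.0.0.0", 8321), AgentHandler).serve_forever()
```

From the phone it's just a POST with a JSON body, e.g. `curl -X POST http://<tailscale-hostname>:8321/run -d '{"prompt": "..."}'`, and scheduling recurring tasks is a cron job that hits the same endpoint.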
https://happy.engineering/ seems like a more polished app that works similarly (and doesn't...
Regarding AGI race dynamics -- I wonder if there's an intuition pump for 'time vs competitor' preference?
For example, to me, based on my current knowledge, I think Anthropic reaching RSI before the next best company (DeepMind, maybe?) is worth about two years of time. (I.e. I estimate equal safety-relevant outcomes from Claude hitting RSI in 2027 as from Gemini hitting RSI in 2029).
That's a super weird framework, and I just made up that two years number, but I think it maybe helps me reason through preferences.
The neat thing about the framework is that it's p...
Would love to know where the disagreement is, btw. If you disagree, is it the framing as a whole you think is not useful? Or the specific spitball numbers?
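For concreteness, here is a toy encoding of that exchange rate (the lab names and the two-year offset are just the spitball numbers above, nothing more):

```python
# Toy "time vs. competitor" exchange rate. The 2.0 is the made-up number
# from the comment above, not an estimate anyone should rely on.
LAB_OFFSET_YEARS = {"Anthropic": 0.0, "DeepMind": 2.0}

def effective_year(lab: str, rsi_year: float) -> float:
    """Collapse (lab, year it reaches RSI) onto one axis: outcomes with the
    same effective year are treated as equally good, safety-wise."""
    return rsi_year - LAB_OFFSET_YEARS.get(lab, 0.0)

# Claude hitting RSI in 2027 ~ Gemini hitting RSI in 2029, per the spitball:
assert effective_year("Anthropic", 2027) == effective_year("DeepMind", 2029)
```

Disagreement could then be localized to either the offsets themselves or to whether a single number like this captures anything real.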
One of the things I hated most when I first saw a Claude Code demo. Disrespectful of my time and limited cognitive bandwidth to throw in a lot of completely meaningless, distracting, wasteful, exhausting BS to be 'cute'.
(On Gwern.net, we would never do that. If we had to have anything beyond the standard, compact, understandable, spinning cursor, then we would at least encode some sort of useful semantics into it, like sorting them by implied expected thinking time.)
traveling through Europe, looking out the window, and seeing the national flag flying next to the flag of the EU fills me with a strange feeling. this isn't an original thought at all, but still: it's really crazy that just 50 years ago Europe was divided by the iron curtain, and that people would have to go to insane lengths and risk their lives to get across that border; and that less than 100 years ago all of these countries were at war with each other, and had been at war on and off for centuries with ever shifting alliances and boundaries.
I think we should be relatively less worried about instrumental power-seeking and relatively more worried about terminal power-seeking. Note that this is only a relative update on the margin, and maybe on net I am still more concerned about the instrumental version because I started much more concerned about it. This is also not a super recent update—I just haven't seen it written up before.
Simple argument:
Certainly the really concerning thing here is (1). Though indeed one way you might get (1) is by generalization from (2).
Occurred to me that a perfect predictor would not need to go through the ritual of presenting boxes and asking to choose. It already knows the outcome, so it would just give $1000 to those who would two-box and $1M to those who would one-box. The Newcomb's paradox thought experiment has been thereby dissolved. Thank you for your attention to the matter.
you had (and lost) me at (A -> B) -> A
I think it is probably possible in principle to train superintelligence on a laptop, and I worry that this inconvenient fact is often elided in discourse about halting AI. It is extremely helpful that for now, AI training is so absurdly inefficient that non-proliferation strategies roughly as light-touch as the IAEA—e.g., bans on AI data centers, or powerful GPUs—might suffice to seriously slow AI progress. And I think humanity would be foolish not to take advantage of this relatively cheap temporary opportunity to slow AI progress, so that we can buy as m...
Nah, my model allows ASI without massive compute at any point in the process, see “Foom & Doom 1: ‘Brain in a box in a basement’” (esp. §1.3), and maybe also “The nature of LLM algorithmic progress” §4.
I am convinced that an incorrigible aligned (CEV-like) SI is the only definition of aligned SI which survives paradox.
If it were corrigible, that would imply a select group of actors could in principle re-orient its preference model, which would imply two paradoxes:
That its capacity for moral reasoning is somehow inferior to humans', when moral reasoning is approximately the same thing as determining "what matters to us", violating the SI definition of 'all means of reasoning that matter to us'
That some particular subset of agents other than the representati
The point I am making is more subtle and precise than that. I am saying that, because of the implications of corrigibility in an SI scenario, if you believe that a CEV-like SI can in principle exist and is worth pursuing, the implication is that you are suggesting that orthogonality necessarily doesn't hold at the limit of rationality.
Why do you think there is such an implication?
In light of a recent post and comment, and several months of thinking, I have come to the position that one of our (humanity's) biggest problems is that we suck at precise coordination at every level.
This is not very specifically defined but I am trying to gesture at a problem area I think is super important. Some thoughts to convey my intuition here:
Apologies for the delayed response.
I should note that the post was somewhat hastily written – I agree that my categorization was not comprehensive, and yours is probably better.
I was mainly trying to point at a dynamic I see often online, where influential voices present arguments regarding AI risk that completely ignore years of back-and-forth discussion on similar topics, but whose positions are interpreted as the "forefront" of the debate – leading to offshoot discussions that, again, miss years of relevant literature and discourse. I think this leads to,...
SAEs (sparse autoencoders) have had several problems over the years (e.g. feature splitting, cross-layer features, non-causal features), as well as many ways to address those issues. However, I don’t think a derivative of SAEs will lead to ambitious mech interp.
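(As background for anyone who hasn't worked with them, a minimal sketch of the standard SAE setup is below; the dimensions and L1 coefficient are placeholders, and no particular published variant is intended.)

```python
# Minimal sparse autoencoder sketch: an overcomplete feature dictionary
# trained to reconstruct model activations under an L1 sparsity penalty.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # feature activations (ideally sparse)
        x_hat = self.decoder(f)           # reconstruction of the activation
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error plus a sparsity penalty on the feature activations.
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()
    sparsity = f.abs().sum(dim=-1).mean()
    return recon + l1_coeff * sparsity
```

The problems listed above (feature splitting, cross-layer features, non-causal features) are all complaints about what the learned features `f` end up being.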
The Apollo (now Goodfire) folks, Lee, Lucius, and Dan, have worked on Parameter Decomposition (PD)[1], a weight-based approach intending to improve over SAEs in a couple of ways:
Previously, I said:
...People are very worried about a future in which a lot of the Internet is AI-generated. I'm kinda not. So far, AIs are more truth-tracking and kinder than humans. I think the default (conditional on OK alignment) is that an Internet that includes a much higher population of AIs is a much better experience for humans than the current Internet, which is full of bullying and lies.
All such discussions hinge on AI being relatively aligned, though. Of course, an Internet full of misaligned AIs would be bad for humans, but the reason is human di
So far, we have documented cases of Generative AI being used to subvert elections in Romania (actually causing an annulment).
AFAIK that was not because of Gen AI, though the broader point of your comment does stand.