ControlAI has launched an official campaign in Canada, with 33 politicians across party lines backing a clear statement in favor of an international prohibition against developing superintelligent AI: https://controlai.com/canada-statement/en
I used to be ambivalent about government intervention wrt AI safety, because politics can screw things up. However, it seems clearly positive to me now: the development of superintelligence impacts the whole human species, and as such, the democratic process should have a say in whether (and how) superintelligence is ...
There is a recurring pattern of motte and bailey around these accusations. It always starts with accusations of very egregious high-level things like: "trying to get people to be coy around x-risks" or "being quite consistently deceptive and engaged in strategic conflation of AI existential risk with other issues to win political battles".
This is always in the higher level threads or comments, which more people will see. Taken by themselves, these are ridiculous claims, as is shown by Andrea, Gabe and Connor’s track record.
When pushed on this, habryka retr...
Great post on using logit distributions to measure eval awareness. Surprisingly effective, even for unverbalised eval awareness! ✍️
https://www.lesswrong.com/posts/PK7ZvFZxrgpYtrpF4/logits-as-a-new-monitor-for-evaluation-awareness-1
saw an interesting post for those also trying to transition into technical AI safety :)
In my personal experience the peak of AI was exactly a year ago. While others were discussing hallucinations and spiraling, my chats could stay useful for hundreds of messages and allow me to work with huge tasks. Back then ChatGPT could even understand a new mathematical notion I came up with and engage in discussion of the theorems.
But during the last year I was mentally tracking how long a chat lasts before it starts hallucinating (saying nonsense, or generating the human part of the conversation, or starting to repeat itself, or to ignore recent co...
0) My post was an observation about personal experience with free tools, not about the state of the art. The two do not match. Maybe some of my speech patterns were better aligned with old architectures, keeping them sane, but drive new models crazy. Maybe new models are just trained worse on the tasks I work with. Maybe I am just incredibly unlucky this year, having to dismiss 60% of chats as hallucinating.
I find the state of free AI more important, since the majority of humanity will never buy any model, so free models will have more impact on society (in an optimis...
I will pay you $1000 if you refer me to a potential cofounder.
Signal: samueldashadrach.02
PGP-encrypted email also available on my website.
I am also open to some initial feedback, although I might take some weeks before properly responding to it. I am not actively in feedback-collecting mode right now.
The document is incomplete. However, I don't plan to complete it any time soon, so I figured it is better to post an incomplete document than to post no...
a corollary to the hazards of arguing against bad takes: please don't write things that are defined entirely by trying to avoid the reader coming away with specific bad takes or misunderstandings people often have.
you should write things primarily to nail down the concepts unambiguously for an audience of generic smart people. your idea should be defined by what it is, and not what it is not. it isn't SCP-055.
if you really need to, add a "things i don't mean" section to concretely describe and disavow some common misunderstandings. but it should be possibl...
I should put a reminder like this on top of my computer screen.
One reason I often write long comments is a feeling of defensiveness: if I don't make my case perfectly unambiguous and bulletproof by adding more and more words, of course someone will pick the worst possible misinterpretation. (I had people like that in my life in the past.)
Here are some strange things parents often believe about children and food:
My wife and I reject all four of these beliefs, and (n=2) our children are healthy, happy, and not at all...
I have two kids. One of them is happy to eat a wide range of meals. The other would prefer to eat only pasta and sweets, which probably wouldn't end well.
theory: most people fall into one of the following categories (or some mix of them):
Not sure if there are separate categories.
One thing not in your list: enlightened selfish people who realize that promoting altruism as a social norm is a form of insurance, in case they also would benefit from receiving help one day.
Getting elected involves compromising your values.
Usually, when people say "compromising values", it carries a connotation of low integrity. That's not really my intention here. Instead, I mean it in a more neutral way: if you're running for office, it's pretty likely that your values will be out of step with your constituents' values in one way or another. Maybe you have a wider moral circle of concern, or have a stronger sense of justice, or whatever, leading to you holding views that are unpopular among your constituents.
This makes you less likely to wi...
a much more extreme example of num 3 is LBJ. he spent decades acting extremely racist to get the support of the south, and then did a complete about-face in the presidency, utterly betraying his southern supporters and working to pass the 1964 civil rights bill.
Over the last month or two I have rarely found it worth finishing any posts on the front page (even when I have done so, I usually felt ambivalent about the choice in retrospect). I am not sure whether I am simply busier with other things or whether the quality of the front page has gone down, but as an incredibly avid LessWrong reader, it seems worth it for me to flag this bare fact (without proposing any particular explanation).
Posts seem overly long at the moment. They get boring because the substance tapers out within the first 50%. Not sure why.
My first hypothesis is that this is an artefact of Claude 4.7+ being used for editing i.e. maybe Claude 4.7+ is a decent editor but prefers overly long essays. But I have literally no evidence as I don't use AI for editing passes.
ReMVIR: the discipline of sensitive engagement, focused by theorems (whether so named or not), which makes communities superconductive of knowledge.
The acronym stands for Resolution by Mutual Verified Iterative Ratiocommunation. Ratiocommunation is understanding perception and its rationale well enough that the agent will acknowledge your expression of them as sound. Iteration zooms in, increasing resolution at the point of divergence until it is clearly imaged.
Operative in many areas; a central activity of real science. The procedure of mutual mind-debugg...
Congratulations on inventing a word, "ratiocommunation", that Google search has never heard of. It is a rare achievement.
Apologies if there is a substance behind what you wrote, but to me it mostly sounds like word salad (and possibly AI psychosis). Seems like you are pointing at a concept of "one person understands what the other means" but for some reason you invent complicated words and theorems and proofs.
If you are using an AI, maybe ask it to explain the entire thing using words that a 5-year-old would understand?
Bertrand Russell's parents had both died by the time he was four years old, and in 1876 he went to live with his grandfather, the last Whig prime minister, who as a young man had met Napoleon at Elba.
In 1966 at the age of 94 he met Paul McCartney and converted him to his anti-war stance on Vietnam.
Previously, I had figured that this lifespan (roughly 1870 to 1960) was the most extreme length of history for someone to live through. You have to be born early enough to remember a time when big cities didn’t have street lights or automobiles, while ideally living to see...
AGI will not just keep the show going, it will bring it to its grand finale.
I have been revisiting the 2023 post "AI Timelines" today. I would be interested in seeing what the participants within would offer as estimates at the present time.
From my own cursory estimation, it seems like Kokotajlo was wildly miscalibrated on many or most fronts, while Cotra and Erdil's predictions in 2023 seem fairly calibrated, leaning heavily in favor of Erdil (who seems in retrospect to be remarkably well-calibrated).
Scattershot takeaways:
Kokotajlo correctly assumed governments would completely fail to slow down timelines.
Erdil's predictions are ...
@Daniel Kokotajlo's most recent views are expressed in Q1 2026 Timelines Update. Maybe he will release a new update?
Edited to add: Why do you believe that the predictions of Cotra and Erdil are mostly correct? Erdil's prediction which struck me was the following:
Erdil's misprediction
My median world looks something like this: we keep scaling compute until we hit training runs at a size of 1e28 to 1e30 FLOP in maybe 5 to 10 years, and after that scaling becomes increasingly difficult because of us running up against supply constraints. Software progress cont
Anthropic releases a new post about RSI and slowdowns, and also commits to doing verification research and verification-focused policymaker engagement. (In my view, this is probably the best post/announcement about AI risks that any frontier AI company has produced in recent memory.)
https://www.anthropic.com/institute/recursive-self-improvement
...We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.
The political winds are shifting; the public is becoming very anti-AI; x-risk is moving further into the Overton window. The companies are therefore shifting somewhat to avoid looking too bad to too many people.
In the recent Gemini update, the ability to clear chat history all at once has been removed. Now, one must navigate to the settings, select 'Delete Activity,' and go through several additional steps, which I find rather frustrating. Furthermore, I have strong doubts as to whether deleting this data actually removes it entirely from Google's databases. I suppose I could simply stop using Gemini, but since this is more of a minor inconvenience than a critical issue, abandoning the platform altogether seems excessive. What are your thoughts on this matter?
openrouter.ai accepts cryptocurrency as payment.
There's also "remote attestation", which Apple is investing heavily in for its AI cloud. "Remote attestation" is a combination of hardware features and cryptography meant to assure the public (well, technologists who have the time to review Apple's architecture) that the software running on Apple's servers is really what Apple says it is. The public can then review the source code for this software (which Apple plans to publish) to verify that Apple is not saving chat history.
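To make the idea concrete, here is a minimal sketch of the check a client would run. It is not Apple's actual protocol: the function names, the report format, and the use of an HMAC key are all assumptions made for illustration. Real remote attestation uses asymmetric keys rooted in hardware and certified by the vendor, so the verifier never holds the signing secret; the shared key below only keeps the example within the standard library.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for a hardware-rooted signing key (assumption:
# real schemes use vendor-certified asymmetric keys, not a shared secret).
HARDWARE_KEY = secrets.token_bytes(32)


def measure(software_image: bytes) -> str:
    """Return the SHA-256 digest of a software image (its 'measurement')."""
    return hashlib.sha256(software_image).hexdigest()


def attest(running_image: bytes, nonce: bytes, key: bytes = HARDWARE_KEY) -> dict:
    """Server side: report what is running, bound to the client's nonce."""
    measurement = measure(running_image)
    tag = hmac.new(key, measurement.encode() + nonce, hashlib.sha256).hexdigest()
    return {"measurement": measurement, "tag": tag}


def verify(report: dict, published_image: bytes, nonce: bytes,
           key: bytes = HARDWARE_KEY) -> bool:
    """Client side: accept only a fresh, authentic report of the published build."""
    expected_tag = hmac.new(
        key, report["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    # Reject forged reports and replays made for a different nonce.
    if not hmac.compare_digest(expected_tag, report["tag"]):
        return False
    # Compare the reported measurement with the build anyone can audit.
    return hmac.compare_digest(report["measurement"], measure(published_image))


published = b"published server build"
nonce = secrets.token_bytes(16)
print(verify(attest(published, nonce), published, nonce))          # True
print(verify(attest(b"modified build", nonce), published, nonce))  # False
```

The check only shows that the reported measurement matches the published build. Whether that is enough still depends on trusting the hardware that produces the report and on someone actually auditing the published source.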
Apple's scheme is only for devel...
How do LLMs and humans compare with regard to the amount of energy, data, and compute used to train and run them? I was inspired by Samuel Knoche's post on sample efficiency to come up with some numbers. This table was made by Opus 4.8 after some iteration:

"Task" = "produce one thoughtful ~1,000-token answer"; it's unclear whether this is a useful number, and obviously it doesn't generalize. I do think it's interesting to compare the energy/compute ratio for inference between humans and AI.
Training data for humans is a big "???". There's ~4x10^8 waking hours. Opu...
Some Fermi estimates:
However, I think the brain is much less data- and compute-efficient than an optimal AGI algorithm would be. So I don't think it is a good predictor of how much data future AI algorithms will require.
I've been interested in the potential of zero-knowledge proof type things to verify that computers are not running unfriendly AIs, to get the minimal amount of omniveillance that may be necessary to thread the Scylla and Charybdis of x-risk via extinction and x-risk via totalitarian stagnation. Each computer attesting that it's not doing <bad things>, with no more than a yes/no. Possible issues: maybe it's computationally infeasible, hard to operationalize, or can be used to do more intrusive surveillance. I know Drexler was interested in this. IIRC ...
Rather than argue for or against the consciousness of LLMs per se, I argue that consciousness in LLMs (or 'qualia') implies philosophical monism.
My claim is a logical implication, and not about the fact of the matter. If the implication holds, then:
Because LLMs are not biological, if LLMs are conscious, conscious experience must be substrate independent. Nonetheless, biological brains and LLMs share the same substrate in the sense that they are both constr...
There are plenty of non-monist theories in which LLMs could be conscious, so the first bullet point is simply false. The second point seems true, but the combination just means:
I’d make it highly recommended, or even mandatory, for all LessWrong posts to disclose at the top to what extent they were generated using ideas from AI. I think a rule like that is already in place if the post is partially AI-written. And I think knowing where the ideas are coming from is just as important. I’m beginning to suspect that more and more posts (everywhere) are influenced in large part by AI. In which case, why am I even here?
I do really think there's a lot of using-AI and a dearth of understanding-how-people-are-using-AI.
I think using AI is often good, but I would like more understanding of what's going on.