It seems to me that many disagreements regarding whether the world can be made robust against a superintelligent attack (e.g., the recent exchange here) are downstream of different people taking on a mathematician's vs. a hacker's mindset.
...A mathematician might try to transform a program up into successively more abstract representations to eventually show it is trivially correct; a hacker would prefer to compile a program down into its most concrete representation to brute-force all execution paths & find an exploit trivially proving it incorrect.
The concept of a weird machine is the closest to being useful here, and an important question is "how do we check that our system doesn't form any weird machines?"
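To make the "brute-force all execution paths" framing concrete, here is a minimal sketch (entirely my own illustration; the toy checker and its bug are hypothetical, not anything from the exchange above). The checker looks obviously correct under the abstraction its author had in mind, yet blind enumeration of concrete inputs quickly finds one that slips past it into unintended behaviour, the seed of a weird machine:

```python
# A minimal, self-contained sketch of the hacker-style approach: rather than
# proving the checker correct, enumerate concrete inputs and look for one the
# checker accepts but that drives the system into unintended behaviour.
# All names and the bug itself are hypothetical illustrations.

import itertools
import string


def is_safe_command(cmd: str) -> bool:
    """Intended invariant: only whitelisted commands pass.
    The (deliberate, hypothetical) bug: only the first word is checked."""
    return cmd.split(" ")[0] in {"status", "ping"}


def execute(cmd: str) -> str:
    """Stand-in for the real system; a ';' smuggles in unintended computation."""
    if ";" in cmd:
        return f"EXPLOIT: executed attacker payload in {cmd!r}"
    return f"ran {cmd!r}"


def brute_force(max_len: int = 3) -> str | None:
    """Enumerate short suffixes over a small alphabet and return the first
    input that passes the checker yet reaches the unintended path."""
    alphabet = string.ascii_lowercase + " ;"
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "ping " + "".join(chars)
            if is_safe_command(candidate) and "EXPLOIT" in execute(candidate):
                return candidate
    return None


if __name__ == "__main__":
    print("exploit input found:", repr(brute_force()))
```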
I have recurring worries about how what I've done could turn out to be net-negative.
I wouldn’t worry too much about these. It’s not at all clear that all the alignment researchers moving to Anthropic is net-negative, and for AI 2027, the people who are actually inspired by it won’t care too much if you’re being dunked on.
Plus, I expect basically every prediction about the near future to be wrong in some major way, so it’s very hard to determine what actions are net negative vs. positive. It seems like your best bet is to do whatever has the most direct positive impact.
Thought this would help, since these worries aren’t productive, and anything you do in the future is likely to lower p(doom). I’m looking forward to whatever you’ll do next.
This is probably right. Though perhaps one special case of my point remains correct: the value of a generalist as a member of a team may be somewhat reduced.
iiuc, xAI claims Grok 4 is SOTA, and that's plausibly true. But xAI didn't do any dangerous-capability evals, doesn't have a safety plan (its draft Risk Management Framework has unusually little detail relative to other companies' similar policies and isn't a real safety plan, and it said "We plan to release an updated version of this policy within three months," but it was published on Feb 10, over five months ago), and has done nothing else on x-risk.
That's bad. I write very little criticism of xAI (and Meta) because there's much less to write about than...
RLVR involves decoding (generating) 10K-50K long sequences of tokens, so its compute utilization is much worse than pretraining, especially on H100/H200 if the whole model doesn't fit in one node (scale-up world). The usual distinction in input/output token prices reflects this, since processing of input tokens (prefill) is algorithmically closer to pretraining, while processing of output tokens (decoding) is closer to RLVR.
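As a rough illustration of the utilization gap, here is a back-of-envelope sketch; the hardware and model numbers are my own assumptions (loosely H100-class and a dense ~70B model), not figures from this comment:

```python
# Back-of-envelope sketch; all numbers are illustrative assumptions,
# loosely in the ballpark of an H100-class GPU and a dense ~70B model.

peak_flops = 1.0e15        # ~1 PFLOP/s dense BF16 (assumed)
mem_bandwidth = 3.35e12    # ~3.35 TB/s HBM bandwidth (assumed)
bytes_per_param = 2        # BF16 weights
params = 70e9              # hypothetical dense model size
flops_per_token = 2 * params  # rough forward-pass cost per token

# Decoding one token at batch size 1: every weight is streamed from memory once.
time_memory = params * bytes_per_param / mem_bandwidth
time_compute = flops_per_token / peak_flops
print(f"decode, batch 1: compute busy ~{time_compute / time_memory:.1%} of the time")

# Prefill over a long prompt: one weight read serves every prompt token,
# so the same hardware becomes compute-bound.
prompt_tokens = 8192
time_compute_prefill = prompt_tokens * flops_per_token / peak_flops
print(f"prefill, {prompt_tokens} tokens: compute time exceeds weight-read time "
      f"by ~{time_compute_prefill / time_memory:.0f}x")

# Large decode batches claw back some utilization, but KV-cache memory and the
# 10K-50K-token generations typical of RLVR keep it well below prefill/pretraining.
```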
The 1:5 ratio in API prices for input and output tokens is somewhat common (it's this way for Grok 3 and Grok 4), and it might reflect...
LLM coding assistants may actually slow developers down, contrary to their expectations:
(Epistemic status: I am signal boosting this with an explicit one-line summary that makes clear it is bearish for LLMs, because scary news about LLM capability acceleration is usually more visible/available than this update seems to be. Read the post for caveats.)
Kishore Mahbubani, Singaporean diplomat and former president of the UN Security Council, studied philosophy full-time as an undergraduate in the late 60s. Recounting that period in his autobiography, Living the Asian Century, he wrote:
...For the final examinations, which I took at the end of my fourth year, our degree was determined by how well we did in eight three-hour examinations. In one of the papers, we had to answer a single question. The one question I chose to answer over three hours was “Can a stone feel pain?”
From my exam results, I gained a fir
The recent article by John Baez is important.
https://johncarlosbaez.wordpress.com/2025/02/08/category-theorists-in-ai/
It outlines applications of category theory to AI safety.
Someone has posted about a personal case of vision deterioration after taking Lumina, along with a proposed mechanism of action. I learned about Lumina on LessWrong a few years back, so I'm sharing the link here.
https://substack.com/home/post/p-168042147
For the past several months I have been slowly losing my vision, and I may be able to trace it back to taking the Lumina Probiotic. Or rather, one of its byproducts that isn’t listed in the advertising
I don't know enough about this to make an informed judgement on the accuracy of the proposed mechanism.
My Lumina aldehyde dehydrogenase deficiency post was cited by him in support, and I think that it is extremely unlikely he is correct. The mechanisms proposed just don't work - it is far harder for soluble chemicals to reach the optic nerves via diffusion from oral tissues than through blood. There are so many things wrong with his analysis I could do a 3-hour presentation.
I would be willing to bet a bullet to my head (I'll let him pick the bullet) vs. $100 that his blindness did not result from Lumina -> high oral formic acid levels -> direct cellular diffusion to optic nerve -> blindness.
We get like 10-20 new users a day who write a post describing themselves as a case study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence usually looks like a sort of standard "prompt the LLM into roleplaying an emergently aware AI".
It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state.
If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
Emacs still has ELIZA built in by default, of course :)
I don't understand how illusionists can make the claims they do (and a quick ramble about successionists).
The main point being that I am experiencing qualia right now, and ultimately it's the only thing I can know for certain. I know that my saying "I experience qualia and this is the only true fact I can prove for certain about the universe" isn't verifiable from the outside, but surely other people experience the exact same thing? Are illusionists, and people who claim qualia doesn't exist in general, P-zombies?
As for successionists, and hones...
That's pretty much it! If everyone in the world were set to die four minutes after I died, and this were just an immutable fact of the universe, then that would be super unfortunate, but oh well, I can't do anything about it, so I shouldn't really care that much. In the situation where I more directly cause/choose it, not only have I cut my own and everyone else's lives short to just a year, I'm also directly responsible, and could have chosen to just not do that!
Grok 4 doesn’t appear to be a meaningful improvement over other SOTA models. Minor increases in benchmarks are likely the result of Goodharting.
I expect that GPT-5 will be similar, and if it is, this lends greater credence to diminishing returns on RL & compute.
It appears the only way we will see continued exponential progress is with a steady stream of new paradigms like reasoning models. However, reasoning models were rather self-suggesting, low-hanging fruit, and new needle-moving ideas will become increasingly hard to come by.
As a result, I’m increasingly bearish on AGI within 5-10 years, especially as a result of merely scaling within the current paradigm.
Current AIs are trained with 2024 frontier AI compute, which is 15x original GPT-4 compute (of 2022). The 2026 compute (that will train the models of 2027) will be 10x more than what current AIs are using, and then plausibly 2028-2029 compute will jump another 10x-15x (at which point various bottlenecks are likely to stop this process, absent AGI). We are only a third of the way there. So any progress or lack thereof within a short time doesn't tell us much about where this is going by 2030, even absent conceptual innovations.
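For concreteness, here is the arithmetic behind "a third of the way there", reading it as progress on a log scale; the log-scale reading and the 12x midpoint for the last jump are my own assumptions, not the author's:

```python
# Multipliers are taken from the comment above; reading "a third of the way"
# as log-scale progress, and using ~12x as a midpoint of 10x-15x, are my
# own assumptions.
import math

gpt4_2022_to_2024 = 15   # 2024 frontier compute vs. original GPT-4 (2022)
to_2026 = 10             # further jump for 2026 compute (training 2027 models)
to_2028_2029 = 12        # further jump, midpoint of the 10x-15x estimate

total_growth = gpt4_2022_to_2024 * to_2026 * to_2028_2029
progress = math.log(gpt4_2022_to_2024) / math.log(total_growth)
print(f"total growth ~{total_growth}x; log-scale progress so far ~{progress:.0%}")
# -> total growth ~1800x; log-scale progress so far ~36%
```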
Grok 4 specifically is made by xA...
Partial alignment is not necessarily symmetric.
Imagine two people, Edward and Gene.
If you maximized Edward's values, you'd get 100% of galaxies tiled with some kind of nice utopia. If you maximized Gene's values, you'd get 95% of galaxies tiled with the exact same kind of nice utopia, and 5% of galaxies tiled with maximally tortured copies of Gene's ex-wife.
Gene would be pretty happy with the world where Edward's values are maximized. He would think that it's not optimal (it's missing the ex-wife torture galaxies), but still captures most of what Gene cares...
I previously made a post that hypothesized that the combination of the extra oral ethanol from Lumina and genetic aldehyde dehydrogenase deficiency may lead to increased oral cancer risk in certain populations. It has been cited in a recent post about Lumina potentially causing blindness in humans.
I've found that hypothesis less and less plausible ever since I published that post. I still think it is theoretically possible in a small proportion of the aldehyde-deficient population (those with extremely bad oral hygiene), but even then it is very unlikely to raise the oral cancer incidence...
Many people appreciated my Open Asteroid Impact startup/website/launch/joke/satire from last year. People here might also enjoy my self-exegesis of OAI, where I tried my best to unpack every Easter egg or inside-joke you might've spotted, and then some.
The three statements "there is available farmland", "humans are mostly unemployed", and "humans starve" are close to incompatible when taken together: if land and idle labor are both available, people can farm to feed themselves. Therefore, most things an AGI could do will not ruin the food supply very much.
Unfortunately, the same cannot be said of electricity, and fresh water could possibly be used (as coolant) too.
Moving things from one place to another, especially without the things getting ruined in transit, is way harder than most people think. This is true for food, medicine, fuel, you name it.
Prime Day (now not just an Amazon thing?) ends tomorrow, so I scanned Wirecutter's Prime Day page for plausibly-actually-life-improving purchases so you didn't have to (plus a couple others I found along the way; excludes tons of areas that I'm not familiar with, like women's clothing or parenting):
Seem especially good to me:
It did cause my probability to go from 20% to 80%, so it definitely helped!
There's a good overview of his views in this Manifold thread.
Basically:
I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.
For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to ...
How do you recommend studying recent history?