LESSWRONG
Quick Takes

Thane Ruthenis's Shortform
Thane Ruthenis1d*Ω15336

It seems to me that many disagreements regarding whether the world can be made robust against a superintelligent attack (e.g., the recent exchange here) are downstream of different people taking on a mathematician's vs. a hacker's mindset.

Quoting Gwern:

A mathematician might try to transform a program up into successively more abstract representations to eventually show it is trivially correct; a hacker would prefer to compile a program down into its most concrete representation to brute force all execution paths & find an exploit trivially proving it incorrect.

... (read more)
2Noosphere897h
A key issue here is that computer security is portrayed in popular articles as far poorer than it actually is, because there are some really problematic incentives. A big one is that the hacker mindset is generally the more fun role to play, since you get to prove that something is possible rather than that something is intrinsically difficult or impossible; and, importantly, if nothing gets broken, journalists have no news article and infosec researchers don't get paid, which is another problematic incentive. Also, people never talk about the entities that didn't get attacked with a computer virus, which means we have a reverse survivorship-bias issue here: https://www.lesswrong.com/posts/xsB3dDg5ubqnT7nsn/poc-or-or-gtfo-culture-as-partial-antidote-to-alignment

A comment by @anonymousaisafety also changed my mind a lot on hardware vulnerabilities/side-channel attacks. It argues that many hardware vulnerabilities like Rowhammer have such insane requirements to actually be used that they are basically worthless. Two of the more notable requirements are that you need to know exactly what you are trying to attack (in a way that doesn't matter for more algorithmic attacks) and that no RAM scrubbing is being done; and if you want to subvert ECC RAM, you need to know the exact ECC algorithm. This means side-channel attacks are very much not transferable: successfully attacking one system doesn't let you attack another with the same side-channel attack.

Admittedly, this does require trusting that he is in fact as knowledgeable as he claims to be, but if we assume he's correct, then I wouldn't be nearly as impressed by side-channel attacks as you are. In particular, this sort of attack should be assumed to basically not work in practice unless there's a lot of evidence of it actually being used to break into real targets/POCs: https://www.lesswrong.com/posts/etNJcXCsKC6izQQZj/pivotal-outcomes-and-pi
4Cole Wyeth13h
Yeah, I like this framing. I don’t really know how to make it precise, but I suspect that real life has enough hacks and loopholes that it’s hard to come up with plans that knowably don’t have counterplans which a smarter adversary can find, even if you assume that adversary is only modestly smarter. That’s what makes me doubt that what I called adversarially robust augmentation and distillation actually works in practice. I don’t think I have the frames for thinking about this problem rigorously. 
quetzal_rainbow3h20

The concept of a weird machine is the closest to being useful here, and an important question is "how do we check that our system doesn't form any weird machines?"
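To make that question concrete, here is a minimal sketch of a weird machine (an illustrative toy, not from the thread; the savegame-patch format, apply_patches, and all values are hypothetical). The point is that "data" plus a forgiving interpreter becomes an unintended instruction set, and checking that the system "doesn't form a weird machine" amounts to showing that no input can drive state outside the intended envelope:

```python
# Illustrative toy of a "weird machine" (hypothetical example, not from the thread).
# An innocuous-looking interpreter for a tiny savegame-patch format lets crafted
# input drive the program into states its author never intended: the input format
# itself becomes an unintended programming language.

def apply_patches(state: list, patch_bytes: bytes) -> list:
    """Intended use: set or copy values among the first 8 'savegame' slots."""
    i = 0
    while i + 2 < len(patch_bytes):
        opcode, offset, value = patch_bytes[i], patch_bytes[i + 1], patch_bytes[i + 2]
        if opcode == 0x01:                 # "set slot to a constant"
            state[offset] = value          # BUG: offset is never checked against 0..7
        elif opcode == 0x02:               # "copy one slot into another"
            state[offset] = state[value]   # BUG: both indices unchecked
        i += 3
    return state

# The author thinks of patches as data; an attacker treats them as instructions.
# Internal flags live in the same list, so the "patch language" can reach them.
game_state = [0] * 8 + [0, 0]              # slots 0-7, then [is_admin, debug_mode]
exploit = bytes([0x01, 8, 1])              # "set slot 8 to 1" -> flips is_admin
apply_patches(game_state, exploit)
print("is_admin:", game_state[8])          # 1 -- an unintended computation was performed
```

Here the property to verify would be "offsets always land in slots 0-7 for every possible input", which is exactly the kind of whole-input-space claim the hacker mindset expects to be violated somewhere.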

Daniel Kokotajlo's Shortform
Daniel Kokotajlo2d6618

I have recurring worries about how what I've done could turn out to be net-negative.

  • Maybe my leaving OpenAI was partially responsible for the subsequent exodus of technical alignment talent to Anthropic, and maybe that's bad for "all eggs in one basket" reasons.
  • Maybe AGI will happen in 2029 or 2031 instead of 2027 and society will be less prepared, rather than more, because politically loads of people will be dunking on us for writing AI 2027, and so they'll e.g. say "OK so now we are finally automating AI R&D, but don't worry it's not going to be superintelligent anytime soon, that's what those discredited doomers think. AI is a normal technology."
Showing 3 of 10 replies
Nullity3h32

I wouldn’t worry too much about these. It’s not at all clear that all the alignment researchers moving to Anthropic is net-negative, and for AI 2027, the people who are actually inspired by it won’t care too much if you’re being dunked on.

Plus, I expect basically every prediction about the near future to be wrong in some major way, so it’s very hard to determine what actions are net negative vs. positive. It seems like your best bet is to do whatever has the most direct positive impact.

Thought this would help, since these worries aren’t productive, and anything you do in the future is likely to lower p(doom). I’m looking forward to whatever you’ll do next.

19lc7h
Frankly - this is what is going to happen, and your worry is completely deserved. The decision to name your scenario after a "modal" prediction you didn't think would happen with even >50% probability was an absurd communication failure.
3leogao8h
i think the exodus was not literally inevitable, but it would have required a heroic effort to prevent. imo the two biggest causes of the exodus were the board coup and the implosion of superalignment (which was indirectly caused by the coup).

my guess is there will be some people who take alignment people less seriously in long-timeline worlds because of AI 2027. i would not measure this by how loudly political opponents dunk on alignment people, because they will always find something to dunk on. i think the best way to counteract this is to emphasize the principal component: that this whole AI thing is a really big deal, and that there is a very wide range of beliefs in the field, but even "long" timeline worlds are insane as hell compared to what everyone else expects.

i'm biased, though, because i think sth like 2035 is a more realistic median world; if i believed AGI was 50% likely to happen by 2029 or something then i might behave very differently
Kabir Kumar's Shortform
Kabir Kumar6h-10

Two hours ago I had a grounded, real moment when I realized that AGI is actually going to be real and will decide the fate of everyone I care about, that I personally am going to need to play a significant role in making sure it doesn't kill them, and I felt fucking terrified.

Cole Wyeth's Shortform
Cole Wyeth13h30

That moment when you’ve invested in building a broad and deep knowledge base instead of your own agency and then LLMs are invented. 

it hurts

11Thane Ruthenis12h
I don't see it that way. Broad and deep knowledge is as useful as ever, and LLMs are no substitute for it. This anecdote comes to mind: This fits with my experience.

If you're trying to do some nontrivial research or planning, you need to have a vast repository of high-quality mental models of diverse phenomena in your head, able to be retrieved in a split second and immediately integrated into your thought process. If you need to go ask an LLM about something, this breaks the flow state, derails your trains of thought, and just takes dramatically more time. Not to mention unknown unknowns: how can you draw on an LLM's knowledge about X if you don't even know that X is a thing?

IMO, the usefulness of LLMs lies in improving your ability to build broad and deep internal knowledge bases, rather than in substituting for them.
Cole Wyeth11h63

This is probably right. Though perhaps one special case of my point remains correct: the value of a generalist as a member of a team may be somewhat reduced. 

Zach Stein-Perlman's Shortform
Zach Stein-Perlman2d11055

iiuc, xAI claims Grok 4 is SOTA and that's plausibly true, but xAI didn't do any dangerous capability evals, doesn't have a safety plan (their draft Risk Management Framework has unusually poor details relative to other companies' similar policies and isn't a real safety plan, and it said "We plan to release an updated version of this policy within three months" but it was published on Feb 10, over five months ago), and has done nothing else on x-risk.

That's bad. I write very little criticism of xAI (and Meta) because there's much less to write about than... (read more)

Showing 3 of 12 replies
2Peter Wildeford12h
What do you think of this argument that Grok 4 used only about 1/5th as much compute on RLVR as on pretraining (~3e26 FLOP pretraining + ~6e25 FLOP RLVR)? https://x.com/tmychow/status/1943460487565578534
Vladimir_Nesov11h50

RLVR involves decoding (generating) 10K-50K long sequences of tokens, so its compute utilization is much worse than pretraining, especially on H100/H200 if the whole model doesn't fit in one node (scale-up world). The usual distinction in input/output token prices reflects this, since processing of input tokens (prefill) is algorithmically closer to pretraining, while processing of output tokens (decoding) is closer to RLVR.

The 1:5 ratio in API prices for input and output tokens is somewhat common (it's this way for Grok 3 and Grok 4), and it might reflect... (read more)
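A rough back-of-the-envelope sketch of why decode utilization is so much worse than prefill (my own illustration; the hardware numbers are approximate H100-class figures and the model size is hypothetical, none of it taken from the comment):

```python
# Back-of-the-envelope: why decoding (RLVR generation) utilizes compute far worse
# than prefill/pretraining. All numbers are rough and hypothetical (H100-class).
PEAK_FLOPS = 1e15    # ~1 PFLOP/s dense BF16, roughly
HBM_BW = 3.3e12      # ~3.3 TB/s of memory bandwidth, roughly

def decode_utilization(params: float, batch: int, bytes_per_param: float = 2.0) -> float:
    """Fraction of peak FLOPs achievable when each decode step must stream all weights.

    Each sequence in the batch does ~2 FLOPs per parameter per generated token,
    while the step reads params * bytes_per_param bytes of weights (KV-cache
    traffic is ignored here, which only makes real decoding look worse).
    """
    seconds_compute = (2 * params * batch) / PEAK_FLOPS
    seconds_memory = (params * bytes_per_param) / HBM_BW
    return seconds_compute / max(seconds_compute, seconds_memory)

params = 3e11  # hypothetical ~300B-parameter model
for batch in (1, 32, 256, 4096):
    print(f"batch={batch:5d}  utilization ~ {decode_utilization(params, batch):.1%}")
# Small decode batches are memory-bandwidth-bound (percent-level utilization);
# prefill processes thousands of prompt tokens per weight load, so it behaves
# like a huge batch and is compute-bound, much like pretraining.
```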

2ACCount1d
Waiting for elaboration on that then.  Not releasing safety eval data on day 0 is a bad vibe, but releasing it after you release the model is better than not releasing it at all.
Cole Wyeth's Shortform
Cole Wyeth12h20

LLM coding assistants may actually slow developers down, contrary to their expectations: 

https://www.lesswrong.com/posts/9eizzh3gtcRvWipq8/measuring-the-impact-of-early-2025-ai-on-experienced-open

(Epistemic status: I am signal boosting this with an explicit one-line summary that makes clear it is bearish for LLMs, because scary news about LLM capability acceleration is usually more visible/available than this update seems to be. Read the post for caveats.)

Mo Putera's Shortform
Mo Putera15h40

Kishore Mahbubani, Singaporean diplomat and former president of the UN Security Council, studied philosophy full-time as an undergraduate in the late 60s. Recounting that period in his autobiography Living the Asian Century, he wrote:

For the final examinations, which I took at the end of my fourth year, our degree was determined by how well we did in eight three-hour examinations. In one of the papers, we had to answer a single question. The one question I chose to answer over three hours was “Can a stone feel pain?” 

From my exam results, I gained a fir

... (read more)
alti's Shortform
alti15h10

The recent article by John Baez is important.

  https://johncarlosbaez.wordpress.com/2025/02/08/category-theorists-in-ai/

It outlines applications of category theory to AI safety.

Lun's Shortform
Lun2d261

Someone has posted about a personal case of vision deterioration after taking Lumina and a proposed mechanism of action. I learned about Lumina on LessWrong a few years back, so I'm sharing this link.

https://substack.com/home/post/p-168042147

For the past several months I have been slowly losing my vision, and I may be able to trace it back to taking the Lumina Probiotic. Or rather, one of its byproducts that isn’t listed in the advertising

I don't know enough about this to make an informed judgement on the accuracy of the proposed mechanism. 

Showing 3 of 6 replies
Lao Mein15h20

My Lumina aldehyde dehydrogenase deficiency post was cited by him in support, and I think that it is extremely unlikely he is correct. The mechanisms proposed just don't work - it is far harder for soluble chemicals to reach the optic nerves via diffusion from oral tissues than through blood. There are so many things wrong with his analysis I could do a 3-hour presentation.

I would be willing to bet a bullet to my head (I'll let him pick the bullet) vs. $100 that his blindness did not result from Lumina -> high oral formic acid levels -> direct cellular diffusion to optic nerve -> blindness.

6mako yass1d
Someone who's not a writer could be expected to not have a substack account until the day something happens and they need one, with zero suspicion. Someone who's a good writer is more likely to have a pre-existing account, so using a new alt raises non-zero suspicion.
6Lun2d
fwiw I made my account in January, which I guess is still very new relative to the average account age here, but hopefully it means you can trust I didn't make this account just to drop a link to the Lumina post.
Raemon's Shortform
Raemon4d939

We get like 10-20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI generated. The evidence usually looks like, a sort of standard "prompt LLM into roleplaying an emergently aware AI".

It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state. 

If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.

Showing 3 of 35 replies
Morpheus16h*51

Emacs still has Eliza built in by default, of course :)

1RationalElf1d
How do you know the rates are similar? (And it's not e.g. like fentanyl, which in some ways resembles other opiates but is much more addictive and destructive on average)
9Guive1d
Also, I bet most people who temporarily lose their grip on reality from contact with LLMs return to a completely normal state pretty quickly. I think most such cases are LLM helping to induce temporary hypomania rather than a permanent psychotic condition. 
Isopropylpod's Shortform
Isopropylpod1mo*100

I don't understand how illusionists can make the claims they do (and a quick ramble about successionists).

The main point being that I am experiencing qualia right now, and ultimately it's the only thing I can know for certain. I know that me saying "I experience qualia and this is the only true fact I can prove for certain about the universe" isn't verifiable from the outside, but surely other people experience the exact same thing? Are illusionists, and people who claim qualia doesn't exist in general, P-Zombies?

As for successionists, and hones... (read more)

Showing 3 of 31 replies
2Morpheus1d
Earlier you say: How are these compatible? You don't care if all other humans die after you die unless you are responsible?
Isopropylpod17h10

That's pretty much it! If everyone in the world was set to die four minutes after I died, and this was just an immutable fact of the universe, then that would be super unfortunate, but oh well, I can't do anything about it, so I shouldn't really care that much. In the situation in which I more directly cause/choose, not only have I cut my and everyone else's lives short to just a year, I also am directly responsible, and could have chosen to just not do that!

2Isopropylpod1mo
If I come back, then I wasn't dead to begin with, and I'll start caring then. Until then, the odds are low enough that it doesn't matter.
Hide's Shortform
Hide1d3-1

Grok 4 doesn’t appear to be a meaningful improvement over other SOTA models. Minor increases in benchmarks are likely the result of Goodharting.  

I expect that GPT 5 will be similar, and if it is, this gives greater credence to diminishing returns on RL & compute.  


It appears the only way we will see continued exponential progress is with a steady stream of new paradigms like reasoning models. However, reasoning models are a rather self-suggesting and low-hanging fruit, and new needle-moving ideas will become increasingly hard to come by.

As a result, I’m increasingly bearish on AGI within 5-10 years, especially as a result of merely scaling within the current paradigm.

Vladimir_Nesov17h40

Current AIs are trained with 2024 frontier AI compute, which is 15x original GPT-4 compute (of 2022). The 2026 compute (that will train the models of 2027) will be 10x more than what current AIs are using, and then plausibly 2028-2029 compute will jump another 10x-15x (at which point various bottlenecks are likely to stop this process, absent AGI). We are only a third of the way there. So any progress or lack thereof within a short time doesn't tell much about where this is going by 2030, even absent conceptual innovations.

Grok 4 specifically is made by xA... (read more)
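A quick arithmetic restatement of the multipliers quoted above (my own illustration; it reads "a third of the way there" as a claim about orders of magnitude):

```python
import math

# The multipliers quoted in the comment above, relative to original GPT-4 (2022) compute.
current = 15               # 2024 frontier training compute vs. original GPT-4
by_2027 = current * 10     # 2026 compute, training the models of 2027
by_2029_low = by_2027 * 10
by_2029_high = by_2027 * 15

print(f"models of 2027: ~{by_2027}x GPT-4 compute")               # ~150x
print(f"models of 2028-2029: ~{by_2029_low}x-{by_2029_high}x")    # ~1500x-2250x

# "Only a third of the way there", read on a log scale: of the ~3.2-3.4 orders
# of magnitude eventually available, only ~1.2 have been used so far.
used = math.log10(current)
total = math.log10(by_2029_high)
print(f"fraction of the log-scale runway used so far: {used / total:.0%}")  # ~35%
```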

Tarnish's Shortform
Tarnish20h-31

Partial alignment is not necessarily symmetric.

Imagine two people, Edward and Gene.

If you maximized Edward's values, you'd get 100% of galaxies tiled with some kind of nice utopia. If you maximized Gene's values, you'd get 95% of galaxies tiled with the exact same kind of nice utopia, and 5% of galaxies tiled with maximally tortured copies of Gene's ex-wife.

Gene would be pretty happy with the world where Edward's values are maximized. He would think that it's not optimal (it's missing the ex-wife torture galaxies), but still captures most of what Gene cares... (read more)
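A toy quantification of the asymmetry (illustrative numbers of my own; the post only specifies the galaxy fractions, so the utility weights, and especially how negatively Edward weighs the torture galaxies, are assumptions):

```python
# Toy illustration of asymmetric partial alignment. All utility weights are
# hypothetical; only the galaxy fractions come from the post above.

def value(world: dict, weights: dict) -> float:
    """An agent's utility: weighted sum over how galaxies are used (per unit fraction)."""
    return sum(weights.get(use, 0.0) * frac for use, frac in world.items())

edward_optimal = {"utopia": 1.00}
gene_optimal   = {"utopia": 0.95, "torture_galaxies": 0.05}

edward_weights = {"utopia": 1.0, "torture_galaxies": -100.0}  # a catastrophe to Edward
gene_weights   = {"utopia": 1.0, "torture_galaxies": 2.0}     # a small bonus to Gene

# Gene loses little under Edward's optimum...
print("Gene under Edward-optimal:  ", value(edward_optimal, gene_weights))    # 1.00 (his max is 1.05)
print("Gene under Gene-optimal:    ", value(gene_optimal, gene_weights))      # 1.05
# ...but Edward can lose enormously under Gene's optimum.
print("Edward under Edward-optimal:", value(edward_optimal, edward_weights))  # 1.00
print("Edward under Gene-optimal:  ", value(gene_optimal, edward_weights))    # 0.95 - 5.0 = -4.05
```

Under these assumed weights, Gene gives up only ~5% of his attainable value in Edward's optimum, while Edward's loss in Gene's optimum is dominated by however negatively he weighs the torture galaxies.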

Lao Mein's Shortform
Lao Mein1d*70

I previously made a post that hypothesized that a combination of the extra oral ethanol from Lumina and genetic aldehyde deficiency may lead to increased oral cancer risk in certain populations. It has been cited in a recent post about Lumina potentially causing blindness in humans.

I've found that hypothesis less and less plausible ever since I published that post. I still think it is theoretically possible in a small proportion (those with extremely bad oral hygiene) of the aldehyde-deficient population, but even then it is very unlikely to raise the oral cancer incidence... (read more)

Linch's Shortform
Linch1d40

Many people appreciated my Open Asteroid Impact startup/website/launch/joke/satire from last year. People here might also enjoy my self-exegesis of OAI, where I tried my best to unpack every Easter egg or inside-joke you might've spotted, and then some. 

ProgramCrafter's Shortform
ProgramCrafter1d10

The three statements "there is available farmland", "humans are mostly unemployed", and "humans starve" are close to incompatible when taken together. Therefore, most things an AGI could do will not ruin the food supply very much.

Unfortunately, the same cannot be said of electricity, and fresh water could possibly be used (as coolant) too.

Stephen Martin1d51

Moving things from one place to another, especially without the things getting ruined in transit, is way harder than most people think. This is true for food, medicine, fuel, you name it.

1Karl Krueger1d
Modern conventional farming relies on inputs other than land and labor, though. Disrupting the petrochemical industry would mess with farming quite a bit, for instance.
tlevin's Shortform
tlevin2d252

Prime Day (now not just an Amazon thing?) ends tomorrow, so I scanned Wirecutter's Prime Day page for plausibly-actually-life-improving purchases so you didn't have to (plus a couple of others I found along the way; it excludes tons of areas that I'm not familiar with, like women's clothing or parenting):

Seem especially good to me:

  • Their "budget pick" for best office chair $60 off
  • Whoop sleep tracker $40 off
  • Their top pick for portable computer monitor $33 off (I personally endorse this in particular)
  • Their top pick for CO2 (and humidity) monitor $31 off
  • Crest whiten
... (read more)
Showing 3 of 5 replies
2habryka2d
I am genuinely uncertain whether this is a joke. We do happen to have had the great Air Conditioner War of 2022: https://www.lesswrong.com/posts/MMAK6eeMCH3JGuqeZ/everything-i-need-to-know-about-takeoff-speeds-i-learned
2tlevin1d
Alas, was hoping the smiley at the end would give it away...
habryka1d20

It did cause my probability to go from 20% to 80%, so it definitely helped! 

Joseph Miller's Shortform
Joseph Miller1d20

Does anyone have a summary of Eliezer Yudkowsky's views on weight loss?

Hide1d10

There's a good overview of his views expressed in this Manifold thread.

Basically:

  • Caloric restriction works, however it impedes his productivity ("ability to think").
  • Exercise isn't effective in promoting weight loss or reducing weight gain due to compensatory metabolic throttling during non-exercise times
  • His fat metabolism is poor, because his fat cells are inclined to leach glucose and triglycerides from his bloodstream to sustain themselves rather than be net contributors, and the effect is that muscle loss makes up the difference, leading to unfavourabl
... (read more)
Buck's Shortform
Buck2d3811

I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.

For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to ... (read more)

Cole Wyeth1d40

How do you recommend studying recent history?

1Drake Morrison2d
I have long thought that I should focus on learning history with a recency bias, since knowing about the approximate present screens off events of the past. 