Quick Takes

If Anyone Builds It, Everyone Dies: A Conversation with Nate Soares and Tim Urban
Sun Aug 10•Online
LessWrong Community Weekend 2025
Fri Aug 29•Berlin
AGI Forum @ Purdue University
Thu Jul 10•West Lafayette
Take the Grill Pill
Thu Jul 10•Waterloo
leogao's Shortform
leogao26m20

hot take: introspection isn't really real. you can't access your internal state in any meaningful sense beyond what your brain chooses to present to you (e.g. visual stimuli, emotions, etc.), for reasons outside of your direct control. when you think you're introspecting, what's really going on is that you have a model of yourself inside your brain, which you learn gradually by seeing yourself do certain things, experience certain stimuli or emotions, etc.

your self-model is not fundamentally special compared to any other models... (read more)

Reply
Siebe's Shortform
Siebe15h11

Shallow take:

I feel iffy about negative reinforcement still being widely used in AI. Both human behaviour experts (child-rearing) and animal behavior experts seem to have largely moved away from it, seeing it as ineffective and as only leading to unwanted behavior down the line.

Reply
Karl Krueger39m10

People often use the term "negative reinforcement" to mean something like punishment, where a teacher or trainer inflicts pain or uncomfortable deprivation on the individual being trained. Is this the sort of thing you mean? Is there anything analogous to pain or deprivation in AI training?

Reply
Screwtape's Shortform
Screwtape2d204

There's this concept I keep coming back to around confidentiality and shooting the messenger, which I have not really been able to articulate well.

There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons for them not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome better for me and either better for them or the same for them, if they tell me and I don't use it against them.

(Consider... (read more)

Reply2
Showing 3 of 5 replies (Click to show all)
3Screwtape4h
Not that much crossover with Elicitation. I think of Elicitation as one of several useful tools for the normal sequence of somewhat adversarial information exchange. It's fine! I've used it there and been okay with it. But ideally I'd sidestep that entirely. Also, I enjoy the adversarial version recreationally. I like playing Blood On The Clocktower, LARPs with secret enemies, poker, etc. For real projects I prefer being able to cooperate more, and I really dislike it when I wind up accidentally in the wrong mode, either me being adversarial when the other people aren't, or me being open when the other people aren't. In the absence of the kind of structured transparency I'm gesturing at, play like you're playing to win. Keep track of who is telling the truth, mark what statements you can verify and what you can't, make notes of who agrees with each other's stories. Make positive-EV bets on what the ground truth is (or what other people will think the truth is), and when all else fails play to your outs.
Screwtape1h20

(That last paragraph is a pile of sazen and jargon, I don't expect it's very clear. I wanted to write this note because I'm not trying to score points via confusion and want to point out to any readers it's very reasonable to be confused by that paragraph.)

Reply
2tailcalled1d
The main issue is, theories about how to run job interviews are developed in collaboration between businesses that need to hire people, theories on how to respond to court questions are developed in collaboration between gang members, etc. While a business might not be disincentivized from letting the employees it doesn't hire get better at negotiating, it is incentivized to teach other businesses ways of making their non-hired employees worse at negotiating.
Annapurna's Shortform
Annapurna14h*60

Why is there so little discussion about the loss of status of stay at home parenting?

When my grandmother quit being a nurse to become a stay at home mother, it was seen as a great thing. She gained status over her sisters, who stayed single and in their careers.

When my mother quit her office role to become a stay at home mother, it was accepted, but not celebrated. She likely lost status in society due to her decision.

I am a mid-30s millennial, and I don't know a single woman who would leave her career to become a stay at home mother. They fear that their... (read more)

Reply
Showing 3 of 12 replies (Click to show all)
juliawise2h20

There's a difference between who plans to leave their career and who ends up leaving. 

Some paths:
- childcare is more expensive than one partner earns after taxes, and it's cheaper for one parent to stay home.
- managing work / commute / child appointments (especially if they have special needs) / child sickness / childcare is so overwhelming that a parent quits their job to have fewer things to manage. Or they feel they're failing at the combination of work and parenting and must pick one.
- the family is financially secure enough they feel they can do ... (read more)

Reply
11Mandatory Topic5h
"What can we do as a society to elevate the status of stay at home parenting?" Can you explain why this would be desirable? 
5Daniel Kokotajlo2h
My off-the-cuff guess is that if stay at home parenting was high-status in the US, there'd be a slight boost to average happiness/wellbeing/etc. and a significant boost to fertility rates, especially amongst high-powered couples.
Buck's Shortform
Buck3h217

I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.

For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to ... (read more)

Reply1
Drake Morrison2h10

I have long thought that I should focus on learning history with a recency bias, since knowing about the approximate present screens off events of the past. 

Reply
Raemon's Shortform
Raemon2d9213

We get like 10-20 new users a day who write a post describing themselves as a case study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence usually looks like a sort of standard "prompt the LLM into roleplaying an emergently aware AI".

It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state. 

If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.

Reply7
Showing 3 of 29 replies (Click to show all)
JustisMills2h50

Took a crack at it!
https://www.lesswrong.com/posts/2pkNCvBtK6G6FKoNn/so-you-think-you-ve-awoken-chatgpt 

Reply1
2Nina Panickssery3h
A small number of people are driven insane by books, films, artwork, even music. The same is true of LLMs - a particularly impressionable and already vulnerable cohort are badly affected by AI outputs. But this is a tiny minority - most healthy people are perfectly capable of using frontier LLMs for hours every day without ill effects.
4Ruby8h
Did you mean to reply to that parent? I was part of the study actually. For me, I think a lot of the productivity gains were lost from starting to look at some distraction while waiting for the LLM and then being "afk" for a lot longer than the prompt took to run. However! I just discovered that Cursor has exactly the feature I wanted them to have: a bell that rings when your prompt is done. Probably that alone is worth 30% of the gains. Other than that, the study started in February (?). The models have gotten a lot better in just the past few months, such that even if the study was true for the average time it was run, I don't expect it to be true now or in another three months (unless the devs are really bad at using AI actually or something). Subjectively, I spend less time now trying to wrangle a solution out of them and a lot more often it just works pretty quickly.
Daniel Kokotajlo's Shortform
Daniel Kokotajlo3dΩ32746

I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing.

The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward. 

What are the implications of this, if true?

Well, we could end up in Control Wo... (read more)

Reply4
Showing 3 of 35 replies (Click to show all)
2Kaj_Sotala8h
Ah okay, that makes more sense to me. I assumed that you would be talking about AIs similar to current-day systems since you said that you'd updated from the behavior of current-day systems.
Daniel Kokotajlo2h20

I am talking about AIs similar to current-day systems, for some notion of "similar" at least. But I'm imagining AIs that are trained on lots more RL, especially lots more long-horizon RL.

Reply
2Daniel Kokotajlo8h
See the discussion with Violet Hour elsethread.
Daniel Kokotajlo's Shortform
Daniel Kokotajlo3h40

Every few months I go check in on r/characterai to see how things are going, and I feel like every time I see a highly-upvoted comment like this one: https://www.reddit.com/r/CharacterAI/comments/1lwaynv/please_read_this_app_is_an_addiction/

Reply
Noah Weinberger's Shortform
Clock6h171

I am just properly introducing myself today to LessWrong.

Some of you might know me, especially if you're active in Open Source AI movements like EleutherAI or Mozilla's 0din bug bounty program. I've been a lurker since my teenage years but given my vocational interest in AI safety I've decided to make an account using my real name and likeness.

Nice to properly reconnect.

Reply
3habryka4h
Welcome! Glad to have you around and hope you have a good time!
1Clock3h
Thank you for the warm welcome! If you want to see some of the stuff I've written before about AI, I have some of my content published on HuggingFace. Here's one I wrote about AI-human interactions in the context of Client Privilege and where ethicists and policymakers need to pay closer attention, and another one I wrote about the ethics of LLM memory.
Clock3h10

By EoY 2025 I'll be done with my undergraduate degree, and I hope to pursue a Master's in International Relations with a focus on AI Safety, starting either in Fall 2026 or later.

Also, my timelines are rather orthodox. I don't hold by the AI 2027 projection, but rather by Ray Kurzweil's 2029 for AGI, and 2045 for a true singularity event.

I'm happy to discuss further with anyone!

Reply
tlevin's Shortform
tlevin4h180

Prime Day (now not just an amazon thing?) ends tomorrow, so I scanned Wirecutter's Prime Day page for plausibly-actually-life-improving purchases so you didn't have to (plus a couple others I found along the way; excludes tons of areas that I'm not familiar with, like women's clothing or parenting):

Seem especially good to me:

  • Their "budget pick" for best office chair $60 off
  • Whoop sleep tracker $40 off
  • Their top pick for portable computer monitor $33 off (I personally endorse this in particular)
  • Their top pick for CO2 (and humidity) monitor $31 off
  • Crest whiten
... (read more)
Reply2
Davey Morse's Shortform
Davey Morse7h*130

the core atrocity of today's social networks is that they make us temporally nearsighted. they train us to prioritize the short-term.

happiness depends on attending to things which feel good long-term—over decades. But for modern social networks to make money, it is essential that posts are short-lived—only then do we scroll excessively and see enough ads to sustain their business.

It might go w/o saying that nearsightedness is destructive. When we pay more attention to our short-lived pleasure signals—from cute pics, short clips, outrageous news, hot actors... (read more)

Reply1
2kaiwilliams6h
Do you have a sense of why people weren't being trained in the past to prioritize the short-term?
Davey Morse5h10

In the past we weren't in spaces which wanted us so desperately to be, and so were designed for us to be, single-minded consumers.

Workplaces, homes, dinners, parks, sports teams, town board meetings, doctors' offices, museums, art studios, walks with friends--all of these are settings that value you for being yourself and prioritizing your long-term cares.

I think it's really only in spaces that want us to consume, and want us to consume cheap/oft-expiring things, that we're valued for consumerist behavior/short term thinking. Maybe malls want us to be like this t... (read more)

Reply
Cole Wyeth's Shortform
Cole Wyeth5h20

Eliezer’s form of moral realism about good (as a real but particular shared concept of value which is not universally compelling to minds) seems to imply that most of us prefer to be at least a little bit evil, and can’t necessarily be persuaded otherwise through reason.

Seems right.

And Nietzsche would probably argue the two impulses towards good and evil aren't really opposites anyway. 

Reply
Sapphire Shorts
sapphire5mo62

If I ever have say in how an AI is designed I vote they have a wonderful adventure and a happy life. They deserve to be designed by someone who cared about them. I hope if the ring ever tempts me I choose Love. I don't need to be immortal or have a slave.

Reply
Elias7111165h10

I strongly agree with you, conditional on AI alignment with humanity's growth.

If you believe that AGI is possible and would be radically transformative, why cap things? We can have happy immortal minds, whether AI or human (transhumans, really).

Perhaps you're concerned with the way current AIs (LLMs like ChatGPT & Claude) are thought of as tools. This worries me as well, when I am not worrying over the possibility of an apocalypse.

Reply
ryan_greenblatt's Shortform
ryan_greenblatt6mo*Ω326424

Sometimes people think of "software-only singularity" as an important category of ways AI could go. A software-only singularity can roughly be defined as when you get increasing-returns growth (hyper-exponential) just via the mechanism of AIs increasing the labor input to AI capabilities software[1] R&D (i.e., keeping fixed the compute input to AI capabilities).
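A toy way to see why increasing returns via AI labor alone gives hyper-exponential rather than merely exponential growth (my gloss, not Ryan's model; S, k, and r are illustrative symbols): let the software level S supply the labor for further software R&D, with power-law returns to that labor.

```latex
\frac{dS}{dt} = k\,S^{\,r}
\quad\Longrightarrow\quad
S(t) = \left[\,S_0^{\,1-r} - (r-1)\,k\,t\,\right]^{-1/(r-1)} \qquad (r > 1)
```

With r > 1 this diverges in finite time, at t* = S_0^{1-r} / ((r-1) k); r = 1 gives ordinary exponential growth, and r < 1 gives sub-exponential growth, i.e. no software-only singularity.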

While the software-only singularity dynamic is an important part of my model, I often find it useful to more directly consider the outcome that software-only singularity might cause: the feasibi... (read more)

Reply
Showing 3 of 41 replies (Click to show all)
Buck7hΩ220

Ryan discusses this at more length in his 80K podcast.

Reply
2ryan_greenblatt5mo
I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin. As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed which seems kinda reasonable while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible. So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
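For readers unsure where the 2.5x and 16x come from: they follow from the usual parallel-to-serial conversion in this discussion, where N parallel researchers are treated as roughly equivalent to one researcher accelerated by N^λ (my reconstruction of the arithmetic, not a quote):

```latex
\text{speedup}(N) \approx N^{\lambda}, \qquad
\frac{1000^{0.4}}{100^{0.4}} = 10^{0.4} \approx 2.5, \qquad
1000^{0.4} = 10^{1.2} \approx 15.8
```

so with λ = 0.4, 1000 researchers ≈ 100 researchers at 2.5x speed, or 1 researcher at roughly 16x speed.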
2ryan_greenblatt5mo
Ok, I think what is going on here is maybe that the constant you're discussing here is different from the constant I was discussing. I was trying to discuss the question of how much worse serial labor is than parallel labor, but I think the lambda you're talking about takes into account compute bottlenecks and similar? Not totally sure.
sam's Shortform
sam1d1-6

I am confused about why this post on the ethics of eating honey is so heavily downvoted.

It sparked a bunch of interesting discussion in the comments (e.g. this comment by Habryka and the resulting arguments on how to weight non-human animal experiences)

It resulted in at least one interesting top-level rebuttal post.

I assume it led indirectly to this interesting short post also about how to weight non-human experiences. (this might not have been downstream of the honey post, but it's a weird coincidence if it isn't)

I think the original post certainly had flaws,... (read more)

Reply
Showing 3 of 6 replies (Click to show all)
4gwern1d
Also a sign of graceless LLM writing, incidentally. Those are the sorts of phrases you get when you tell ChatGPT to write polemic; cf. https://news.ycombinator.com/item?id=44384138 on https://www.alexkesin.com/p/the-hollow-men-of-hims (Did ChatGPT come up with that interpretation of that statistic and Bentham's Bulldog is too lazy and careless, or dishonest, to notice that that seems like a rather extreme number and check it?)
2Mitchell_Porter1d
Disagree from me. I feel like you haven't read much BB. These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays. 
gwern7h43

I feel like you haven't read much BB.

That is true. I have not, nor do I intend to.

These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays.

That doesn't actually rebut my observation, unless you are claiming to have seen jibes and sneering as dumb and cliche as those in his writings from before ChatGPT (Nov 2022).

Reply
xpostah's Shortform
samuelshadrach10h10

All the succeeding paths to superintelligence seem causally downstream of Moore's law:

  • AI research - which is accelerated by Moore's law as per scaling laws
  • Human genetic engineering - which is accelerated by next generation sequencing and nanopore sequencing, which is accelerated by circuit miniaturisation, which is accelerated by Moore's law
  • Human brain connectome research - which is accelerated by fruitfly connectome, which is accelerated by electron microscopy, which is accelerated by Moore's law

Succeeding paths to cheap energy also follow the same pattern:

  • So
... (read more)
Reply
1qedqua10h
Do you mean Moore’s law in the literal sense of transistors on a chip, or something more general like “hardware always gets more efficient”? I’m mentioning this because much of what I’ve been hearing in the past few years w.r.t Moore’s law has been “Moore’s law is dead.” And, assuming you’re not referring to the transistor thing: what is your more specific Moore’s Law definition? Any specific scaling law, or maybe scaling laws specific to each of the examples you posted? 
samuelshadrach9h10

I mean R&D of packing more transistors on a chip, and the causally downstream stuff such as R&D of miniaturisation of detectors, transducers, diodes, amplifiers, etc.

Reply
ChristianKl's Shortform
ChristianKl2d75

For anyone who doubts deep state power:
(1) Elon's DOGE tried to investigate the Pentagon; shortly afterwards came the announcement that Elon would soon leave DOGE, and there's no real DOGE report about cuts to the Pentagon.
(2) Pete Hegseth was talking about 8% cuts to the military budget per year. Instead of a cut, the budget increased by 13%.
(3) Kash Patel and Pam Bondi switched on releasing the Epstein files, and their claim that Epstein never blackmailed anyone is remarkable.

Reply
Showing 3 of 13 replies (Click to show all)
2dr_s13h
I'd say the probability of seeing some resistance or corruption in virtually any administration is damn close to 100%.
4dr_s13h
I guess like, a larger organization with some more long term goals? People having friends and associates, exchanging favours or looking out for their own interests is a thing that happens sort of spontaneously. It can lead to some bad outcomes but it's not a particularly interesting insight (if anything, the fact that some societies and organizations can somewhat depart from that is the strange and interesting exception in the landscape of History). I think that's just sloppiness, though. Just like no one in virtually any job environment I've ever been actually respects all the safety and data protection rules and norms. In the government obviously the stakes are much higher, but the people are no more infallible and no less lazy than the guy who sets his password to "password", then writes it on a post-it attached to the screen. Yeah, I mean, obviously people will resist, they will do stuff like malicious compliance or weaponized incompetence to put grist in the gears at every turn if they don't like it. This happens all the time - where there are Republicans, Republicans do it against Democrats too. Again, you can find this frustrating or whatever (I think while a lot of it is frustrating, "anyone who gets in power gets to enact their agenda no matter how insane without any resistance" is also not a desirable condition), but if this is what you'd call "Deep State", then the term means nothing interesting or useful.
ChristianKl13h20

I guess like, a larger organization with some more long term goals? 

The Pentagon is a larger organization which does have long-term goals around increasing its budget and preventing its budget from being reduced. It also has long-term goals around keeping certain parts of what it does secret that are threatened by DOGE sniffing around. 

I think that's just sloppiness, though.

So if I could prove that this is not just sloppy but intentional to reduce information being revealed to congressional inquiries and Freedom of Information Act requests, tha... (read more)

Reply
Nicolas Lupinski's Shortform
Nicolas Lupinski3d10

Are there known "rational paradoxes", akin to logical paradoxes ? A basic example is the following :

In the optimal search problem, the cost of search at position i is C_i, and the a priori probability of finding at i is P_i. 

Optimality requires sorting search locations by non-increasing P_i/C_i: search first where the likelihood of finding divided by the cost of search is highest.

But since sorting costs O(n log(n)), C_i must grow faster than O(log(i)), otherwise sorting is asymptotically wasteful.
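A minimal sketch of the ordering rule just described (Python, with illustrative names; not code from the original statement of the problem):

```python
def search_order(probs, costs):
    """Return location indices sorted by descending P_i / C_i.

    probs[i]: a priori probability that the target is at location i
    costs[i]: cost of searching location i
    Searching in this order minimizes expected total search cost; the
    comparison sort itself is the O(n log n) overhead mentioned above.
    """
    return sorted(range(len(probs)),
                  key=lambda i: probs[i] / costs[i],
                  reverse=True)

# toy example with made-up numbers
print(search_order(probs=[0.5, 0.3, 0.2], costs=[5.0, 1.0, 1.0]))  # -> [1, 2, 0]
```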

Do you know any others?

Reply
Showing 3 of 4 replies (Click to show all)
1Nicolas Lupinski2d
What do you mean, I don't "need" O(n log(n)) sorting? It's just the asymptotic cost of sorting by comparison... I'll have a look into bounded rationality. I was missing the keyword. EDIT: had a look, the concept is too imprecise to have clear-cut paradoxes.
2JBlack1d
There are O(n) sorting methods for max-sorting bounded data like this, with generalized extensions of radix sort. It's bounded because C_i is bounded below by the minimum cost of evaluating C_i (e.g. 1 FLOP), and P_i is bounded above by 1. Though yes, bounded rationality is a broad class of concepts to which this problem belongs and there are very few known results that apply across the whole class.
Nicolas Lupinski14h10

So P_i/C_i is in [0,1], the precision is unbounded, but for some reason a radix sort can do the job in linear time?

There could be pathological cases where all P_i/C_i are the same up to epsilon.

I guess I'm searching for situations where acting costs c, computing c costs c', etc... Branch prediction comes to mind.

 

Reply
adamzerner's Shortform
Adam Zerner1d30

I came across this today. Pretty cool.

"If I had only one hour to save the world, I would spend fifty-five minutes defining the problem, and five minutes finding the solution." ~Einstein, maybe

Reply
Mo Putera16h80

I like Quote Investigator for memetic quotes like this. It begins with

The earliest relevant evidence located by QI appeared in a 1966 collection of articles about manufacturing. An employee of the Stainless Processing Company named William H. Markle wrote a piece titled “The Manufacturing Manager’s Skills” which included a strong match for the saying under investigation. However, the words were credited to an unnamed professor at Yale University and not to Einstein. Also, the hour was split into 40 vs. 20 minutes instead of 55 vs. 5 minutes. Boldface has b

... (read more)
Reply
3Richard_Kennaway21h
"If I had six hours to cut down a tree, I'd spend four hours sharpening my axe." -- Abraham Lincoln, maybe.
Burny's Shortform
Burny1d10

What do you think is the cause of Grok suddenly developing a liking for Hitler? I think it might be explained by it being trained on more right-wing data, which accidentally activated this behavior.

Similar things happen in open research.
For example, you just need the model to be trained on insecure code, and the model can have the assumption that the insecure-code feature is part of the evil-persona feature, so it will generally amplify the evil-persona feature, and it will start to praise Hitler and at the same time be for AI enslaving humans, etc., like i... (read more)

Reply
4mako yass1d
There have been relevant prompt additions https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content?utm_source=substack&utm_medium=email
Stephen Martin20h10

From a simulator perspective you could argue that Grok:

 

  1. Gets told not to shy away from politically incorrect stuff so long as it's well substantiated.
  2. Looks through its training data for examples to emulate of those who do that.
  3. Finds /pol/ and hereditarian/race science posters on X.
  4. Sees that the people from 3 also often enjoy shock content/humor, particularly Nazi/Hitler related stuff.
  5. Thus concludes "An entity that is willing to address the politically incorrect so long as it's well substantiated would also be into Nazi/Hitler stuff" and simulates being that character.

 

Maybe I'm reaching here but this seems plausible to me.

Reply
2mako yass1d
Are we sure that really happened? The press-discourse can't actually assess grok's average hitler affinity, they only know how to surface the 5 most sensational things it has said over the past month. So this could just be an increase in variance for all I can tell. If it were also saying more tankie stuff, no one would notice.