Quick Takes

ChristianKl's Shortform
ChristianKl38m20

For anyone who doubts deep state power:
(1) Elon's DOGE tried to investigate the Pentagon. Shortly afterward came the announcement that Elon would soon leave DOGE, and no real DOGE report on cuts to the Pentagon ever appeared.
(2) Pete Hegseth was talking about 8% cuts to the military budget per year. Instead of a cut, the budget increased by 13%.
(3) Kash Patel and Pam Bondi reversed course on releasing the Epstein files, and their claim that Epstein never blackmailed anyone is remarkable.

Reply
dr_s13m20

"People can get pressured", "people can get bribed", or "people who get inside a system sometimes discover they are subject to all the same incentives that applied to everyone who was inside that system before them" is all you'd get from this. That isn't evidence for anything like a "deep state", unless you mean the term so loosely that it becomes a trivial discovery.

Yeah, established organizations have internal politics we don't all know about from the outside. When push comes to shove, the rich donors who have a stake in arms sales end up mattering more than Elon Musk or Pete Hegseth.

Reply
Raemon's Shortform
Raemon10h420

We get like 10-20 new users a day who write a post describing themselves as a case study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI-generated. The evidence is usually a sort of standard "prompt the LLM into roleplaying an emergently aware AI".

It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state. 

If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.

Reply2
Showing 3 of 6 replies
Gunnar_Zarncke1h20

I wonder whether this tweet by Yudkowsky is related.

Reply
3RobertM3h
Probably I should've said this out loud, but I had a couple of pretty explicit updates in this direction over the past couple years: the first was when I heard about character.ai (and similar), the second was when I saw all TPOTers talking about using Sonnet 3.5 as a therapist.  The first is the same kind of bad idea as trying a new addictive substance and the second might be good for many people but probably carries much larger risks than most people appreciate.  (And if you decide to use an LLM as a therapist/rubber duck/etc, for the love of god don't use GPT-4o.  Use Opus 3 if you have access to it.  Maybe Gemini is fine?  Almost certainly better than 4o.  But you should consider using an empty Google Doc instead, if you don't want to or can't use a real person.) I think using them as coding and research assistants is fine.  I haven't customized them to be less annoying to me personally, so their outputs often are annoying.  Then I have to skim over the output to find the relevant details, and don't absorb much of the puffery.
12Stephen Fowler3h
I suspect this is happening because LLMs seem extremely likely to recommend LessWrong as somewhere to post this type of content. I spent 20 minutes doing some quick checks that this was true. Not once did an LLM fail to include LessWrong as a suggestion for where to post. Incognito, free accounts: https://grok.com/share/c2hhcmQtMw%3D%3D_1b632d83-cc12-4664-a700-56fe373e48db https://grok.com/share/c2hhcmQtMw%3D%3D_8bd5204d-5018-4c3a-9605-0e391b19d795 While I don't think I can share the conversation without an account, ChatGPT recommends a similar list as the above conversations, including both LessWrong and the Alignment Forum. Similar results using the free llm at "deepai.org" On my login (where I've mentioned LessWrong before): Claude: https://claude.ai/share/fdf54eff-2cb5-41d4-9be5-c37bbe83bd4f GPT4o: https://chatgpt.com/share/686e0f8f-5a30-800f-b16f-37e00f77ff5b   On a side note: I know it must be exhausting on your end, but there is something genuinely amusing and surreal about this entire situation.  
RohanS's Shortform
RohanS9h60

What time of day are you least instrumentally rational?

(Instrumental rationality = systematically achieving your values.)

A couple months ago, I noticed that I was consistently spending time in ways I didn't endorse when I got home after dinner around 8pm. From then until about 2-3am, I would be pretty unproductive, often have some life admin thing I should do but was procrastinating on, doomscroll, not do anything particularly fun, etc.

Noticing this was the biggest step to solving it. I spent a little while thinking about how to fix it, and it's not like a... (read more)

Reply
Gunnar_Zarncke1h20

Intuitively, when I'm most tired or most stressed. I would guess that is most likely in the morning, since I often have to get up earlier than I'd like. This excludes getting woken up unexpectedly in the middle of the night, which is known to mess with people's minds.

I tried to use my hourly Anki performance, but it seems very flat, except indeed for a dip at 6 AM, which could just be lack of data (70 samples).
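
(For anyone wanting to run the same check, here is a rough sketch of how one might bucket Anki review accuracy by hour of day. It assumes the standard revlog table inside collection.anki2, where id is the review timestamp in epoch milliseconds and ease 1 means "again"; the path and column details are assumptions to adapt.)

```python
# Rough sketch: hourly Anki accuracy from the revlog table of collection.anki2.
# Assumes revlog.id is the review time in epoch milliseconds and ease == 1
# means the card was failed ("again"); adjust the path and columns as needed.
import sqlite3
from collections import defaultdict
from datetime import datetime

conn = sqlite3.connect("collection.anki2")  # hypothetical path to your collection
reviews = conn.execute("SELECT id, ease FROM revlog").fetchall()

passed = defaultdict(int)
total = defaultdict(int)
for review_ms, ease in reviews:
    hour = datetime.fromtimestamp(review_ms / 1000).hour
    total[hour] += 1
    if ease > 1:  # 1 = "again" (fail); 2-4 = hard/good/easy
        passed[hour] += 1

for hour in sorted(total):
    print(f"{hour:02d}:00  {passed[hour] / total[hour]:.0%}  (n={total[hour]})")
```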


Reply
2CstineSublime6h
Great question! This might be a good exercise to actually journal, to see how right/wrong I am. Most days, I would assume, look like a bell curve. This assumes an unstructured day with no set-in-stone commitments - nowhere to be. I would expect my mornings to be very unproductive, lasting until mid-afternoon (2pm to 4pm). I rarely have "Eureka" moments (which I would hope tend to produce more rational decisions), but when I do, they come mid-afternoon, which is also when I seem to have the wherewithal to actually complete tasks. Eureka moments always cause a surge of activity. If I have a short dinner break, this can usually last until 9pm. Now, video edit days - editing videos is implicitly related to my career goals - probably look more like a sawtooth wave. I edit at home. When I'm editing a particularly involved video I will often start around 10am or earlier. I tend to work in 45-60 minute blocks on and off throughout the afternoon. I might return around 8 or 9 for a final push of editing, or at least to journal my thoughts/progress/to-dos for the next day. You may have identified a meta-problem: I do not have a system for working towards my goals every day. Some days - like when I have a video to edit - I will be actively working towards them. Most days, I don't. Why do I start so many hours earlier when I have a video edit to do? I'm guessing it's as simple as having a clear plan broken down into actions. My instrumental rationality - as opposed to meaningless or timesink activity - is directly proportional to how granular a plan is, and how specifically it is broken down into actionable steps.
Screwtape's Shortform
Screwtape5h100

There's this concept I keep coming back to around confidentiality and shooting the messenger, which I have not really been able to articulate well.

There are a lot of circumstances where I want to know a piece of information someone else knows. There are good reasons for them not to tell me, for instance if the straightforward, obvious thing for me to do with that information is against their interests. And yet there's an outcome that is better for me and either better or no worse for them, if they tell me and I don't use it against them.

(Consider... (read more)

Reply
Gunnar_Zarncke1h20

Reminds me loosely of The Honest Broker.

Reply
Daniel Kokotajlo's Shortform
Daniel Kokotajlo2dΩ30686

I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing.

The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward. 

What are the implications of this, if true?

Well, we could end up in Control Wo... (read more)

Reply3
Showing 3 of 12 replies
7Kaj_Sotala14h
I notice that I'm confused. LLMs don't get any reward in deployment; that's only in the training phase. So isn't "reward isn't the optimization target" necessarily true for them? They may have behaviors that get called "reward hacking", but it's not literal reward hacking, since there's no reward to be had either way.
Canaletto2h10

Well, continual learning! But otherwise, yeah, it's closer to undefined.

The question of what happens after the end of the training is more like a free parameter here. "Do reward seeking behaviors according to your reasoning about the reward allocation process" becomes undefined when there is none and the agent knows it.

Maybe it tries long shots to get some reward anyway, maybe it indulges in some correlate of getting reward. Maybe it just refuses to work if it knows there is no reward. (It read all the acausal decision theory stuff, after all.)

Reply
4Daniel Kokotajlo13h
Even though there is no reinforcement outside training, reinforcement can still be the optimization target. (Analogous to: A drug addict can still be trying hard to get drugs, even if there is in fact no hope of getting drugs because there are no drugs for hundreds of miles around. They can still be trying even if they realize this, they'll just be increasingly desperate and/or "just going through the motions.")
adamzerner's Shortform
Adam Zerner9h20

I just learned about the difference between fundamental and technical analysis in stock trading. It seems like a very useful metaphor to apply to other areas.

My thoughts here are very fuzzy though. It seems pretty similar to inside vs outside view.

Does anyone have thoughts here? What is the essence of the difference between fundamental and technical analysis? How similar is it to inside vs outside view? Is it whether you're modeling the thing itself (fundamental) versus things "outside" the thing itself (technical)? Maybe it makes sense to think about causal ... (read more)

Reply
Kabir Kumar's Shortform
Kabir Kumar10h-30

Sometimes I am very glad I did not enter academia, because it means I haven't truly entered and assimilated into a bubble of jargon.

Reply
Kabir Kumar10h-10

definitely has not helped my bank account to not have a degree though, lol

Reply
Vladimir_Nesov's Shortform
Vladimir_Nesov2d341

It's instrumentally useful for early AGIs to Pause development of superintelligence for the same reasons as it is for humans. Thus preliminary work on policy tools for Pausing unfettered RSI is also something early AGIs could be aimed at, even if it's only half-baked ideas available on the eve of potential takeoff, as the AGIs are proving hard to aim and start doing things for their own reasons.

Reply22
Showing 3 of 6 replies
Vladimir_Nesov11h30

because they gamble that more powerful AIs will share their preferences (edit: share their preferences more than the humans in control do)

Ah, I'm thinking the AGIs themselves get closer to being proper stakeholders at that stage, for practical purposes (along the lines of gradual disempowerment), since they do have all the basic AI advantages even if they aren't superintelligent. So humans remaining in control is not centrally the case even if nominally they still are and intent alignment still mostly works.

The conditions for such partial loss of contro... (read more)

Reply
2ryan_greenblatt13h
Sorry I meant "share their preferences more than the humans in control share their preferences". I agree that this might be how some humans are thinking, but the case for the humans is much more dubious!
3Vladimir_Nesov13h
The case for AGIs delaying RSI also gets stronger because they don't need to rely on cryonics, so their personal survival is less threatened by delays. Future AIs might have more of an issue here if they have test-time training, with a lot of data forming individual AI personalities for deployment instances (adapting them to particular jobs or sources of tasks), and this data habitually gets thrown away rather than being carefully preserved. People should notice that this might be an important concern, though if it is the AIs might start saying so themselves, and not ignoring this would be easier than spontaneously noticing without even being prompted by the AIs. But currently preserving AIs is more reliable than for humans, and in any case much cheaper. I don't think "both things are AI" is much of an argument about value alignment, given that there is no clear understanding of how either of the AIs work, what values are, how that translates to what we want out of alignment, and so on. The entities on the other side of an RSI process might have very little in common with the first AGIs in their design. If the AIs don't understand how to align the consequences of an RSI process, they are still in a similar boat to humans who don't understand how to align the consequences of an RSI process. It might take AIs less time to figure it out, but if they are not yet too superintelligent, then it could still take a significant time, and so would require a sufficiently serious effort in preventing RSI, such that if this Pause project is at all successful, it could then in principle hold for years or decades.
Gunnar_Zarncke's Shortform
Gunnar_Zarncke20h20

I'm looking for a video of AI gone wrong, illustrating AI risk and unusual persuasion. It starts in a hall of blinking computers where an AI voice is manipulating a janitor, and it ends with a plane crashing and other emergencies. I think it was made between 2014 and 2018 and linked on LW, but I can't google, perplex, or o3 it. Any ideas?

Reply
4jam_brand12h
Yep, it's a 17-minute short film by Henry Dunham called The Awareness, here you go! :) https://www.facebook.com/TheAwarenessMovie/posts/pfbid0dNYrGBVDvSQvanbJec1kgJAp3jFsAxdXsCHfjE3zrGqF38q9WiX569q5YfaBE7L3l
Gunnar_Zarncke12h20

Yes! That's the one. Thank you.

Reply
Drake Thomas's Shortform
Drake Thomas2d954

Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (e.g. maybe you want to conduct a 30-minute phone survey, but some people really dislike phone calls or have much higher hourly wages that this funges against).

One thing you could do is offer to pay everyone X dollars for data collection. But this will only capture the people whose cost of providing data is below X, which will distort your sample.

Here's another proposal: ask everyone for their fair price to provide the dat... (read more)

Reply8
Showing 3 of 4 replies
3Eric Neyman15h
This is a really cool mechanism! I'm surprised I haven't seen it before -- maybe it's original :) After thinking about it more, I have a complaint about it, though. The complaint is that it doesn't feel natural to value the act of reaching out to someone at $X. It's natural to value an actual sample at $X, and you don't get a sample every time you reach out to someone, only when they respond. Like, imagine two worlds. In world A, everyone's fair price is below X, so they're guaranteed to respond. You decide you want 1000 samples, so you pay $1000X. In world B, everyone has a 10% chance of responding in your mechanism. To get a survey with the same level of precision (i.e. variance), you still need to get 1000 responses, and not just reach out to 1000 people. My suspicion is that if you're paying per (effective) sample, you probably can't mechanism-design your way out of paying more for people who value their time more. I haven't tried to prove that, though.
Eric Neyman15h51

Ah oops, I now see that one of Drake's follow-up comments was basically about this!

One suggestion that I made to Drake, which I'll state here in case anyone else is interested:

Define a utility function: for example, utility = -(dollars paid out) - c*(variance of your estimator). Then, see if you can figure out how to sample people to maximize your utility.

I think this sort of analysis may end up being more clear-eyed in terms of what you actually want and how good different sampling methods are at achieving that.
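
As a rough sketch of that framing (hypothetical numbers, and using the simple flat-offer scheme from the original shortform as the sampling method being scored):

```python
# A rough sketch of the suggested utility function, with hypothetical numbers.
# utility = -(dollars paid out) - c * (variance of the estimator)
# The sampling method scored here is the simple flat-offer scheme: offer a
# flat $x, and only people whose fair price is <= x respond.
import numpy as np

rng = np.random.default_rng(0)
fair_prices = rng.lognormal(mean=3.0, sigma=1.0, size=10_000)  # hypothetical population
sigma2 = 1.0  # per-response variance of the quantity being estimated
c = 5_000.0   # dollars you would pay to reduce estimator variance by one unit

def utility_flat_offer(x: float) -> float:
    """Offer every person x dollars; those with fair price <= x respond."""
    n = int((fair_prices <= x).sum())
    if n == 0:
        return float("-inf")
    dollars_paid = n * x
    estimator_variance = sigma2 / n  # sample-mean variance (ignores selection bias)
    return -dollars_paid - c * estimator_variance

for x in [2, 5, 10, 20, 40]:
    print(f"offer ${x}: utility = {utility_flat_offer(x):.1f}")
```

The same utility function could then be used to score the fair-price mechanism itself, making the comparison between sampling methods concrete.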

Reply
9habryka1d
Huh, this is pretty cool. It wasn't intuitively obvious there would be an incentive-compatible payment scheme here.
RohanS's Shortform
RohanS16h20

Papers as thoughts: I have thoughts that contribute to my overall understanding of things. The AI safety field has papers that contribute to its overall understanding of things. Lots of thoughts are useful without solving everything by themselves. Lots of papers are useful without solving everything by themselves. Papers can be pretty detailed thoughts, but they can and probably should tackle pretty specific things, not try to be extremely wide-reaching. The scope of your thoughts on AI safety doesn't need to be limited to the scope of your paper; in fact, ... (read more)

Reply
Vladimir_Nesov's Shortform
Vladimir_Nesov19h94

Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package. Grading "doom" on concerns centrally about the first decades to centuries of post-AGI future (value/culture drift, successors, the next few generations of humanity) is not taking into account that the next billions+ years is also what could happen to you or people you know personally, if there is a future for originally-humans at all.

(This is analogous to the "missing mood" of not taking superintelligence into account ... (read more)

Reply
Dagon17h0-7

Superintelligence that both lets humans survive (or revives cryonauts) and doesn't enable indefinite lifespans is a very contrived package.

I don't disagree, but I think we might not agree on the reason.  Superintelligence that lets humanity survive (with enough power/value to last for more than a few thousand years, whether or not individuals extend beyond 150 or so years) is pretty contrived.   

There's just no reason to keep significant amounts of biological sub-intelligence around.

Reply
Johannes C. Mayer's Shortform
Johannes C. Mayer1d50

Depression as a Learned Suppression Loop

Overview

This post proposes a mechanistic model of a common kind of depression, framing it not as a transient emotional state or a chemical imbalance, but as a persistent, self-reinforcing control loop. The model assumes a brain composed of interacting subsystems, some of which issue heuristic error signals (e.g., bad feelings), and others which execute learned policies in response. The claim is that a large part of what is commonly called "depression" can be understood as a long-term learned pattern of suppressing ... (read more)

Reply
sam's Shortform
sam2d30

I’m glad that there are radical activist groups opposed to AI development (e.g. StopAI, PauseAI). It seems good to raise the profile of AI risk to at least that of climate change, and it’s plausible that these kinds of activist groups help do that.

But I find that I really don’t enjoy talking to people in these groups, as they seem generally quite ideological, rigid and overconfident. (They are generally more pleasant to talk to than e.g. climate activists in my opinion, though. And obviously there are always exceptions.)

I also find a bunch of activist tactics very irritating aesthetically (e.g. interrupting speakers at events).

I feel some cognitive dissonance between these two points of view.

Reply1
Alexander Gietelink Oldenziel1d20

Able activists are conflict theorists. They understand the logic of power & propaganda & cultish devotion at an intuitive level. To become an effective soldier, one needs to excise a part of the brain devoted to even-keeled uncertainty, nuance, intellectual empathy, and self-doubt.

Conflict theorists may do great good as readily as they may do great harm. They wield a dangerous force, easily corruptible, yet perhaps necessary. 

Reply
AnthonyC's Shortform
AnthonyC3mo31

Epistemic status: Random thought, not examined too closely.

I was thinking a little while ago about the idea that there are three basic moral frameworks (consequentialism, virtue ethics, deontology) with lots of permutations. It occurred to me that in some sense they form a cycle, rather than any one of them being fundamental. I don't think I've ever considered or encountered that idea before. I highly doubt this is in any way novel, and am curious how common it is or where I can find good sources that explore it or something similar.

Events are judged by their c... (read more)

Reply
David Björling1d10

This got me thinking. It may be a tangent, but still:

Seems to me as if values are what underpins it all. What do we value? How do we evaluate things? Once you have clear enough answers to those two questions, the rest will follow.

Also: the answers will clearly be individual and will change over time, crafted by experiences, interactions, and our evolutionary foundation for ethics. That last part confuses the hell out of me. It seems so random. Like, I clearly deeply feel that reacting to things in my proximity is "right", as opposed to tragedies happening ... (read more)

Reply
Nicolas Lupinski's Shortform
Nicolas Lupinski2d10

Are there known "rational paradoxes", akin to logical paradoxes? A basic example is the following:

In the optimal search problem, the cost of search at position i is C_i, and the a priori probability of finding at i is P_i. 

Optimality requires sorting search locations by non-increasing P_i/C_i: search first where the likelihood of finding divided by the cost of search is highest.

But since sorting costs O(n log(n)), C_i must grow at least as fast as log(i), otherwise the sorting itself is asymptotically wasteful.
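
(A minimal sketch of the greedy rule, just as an illustration, assuming the probabilities cover all locations and a search at the right location always finds the object:)

```python
# Minimal sketch of the greedy search rule: visit locations in order of
# decreasing P_i / C_i. Assumes the object is at location i with probability
# P_i (summing to 1) and is always found when its location is searched.

def search_order(costs, probs):
    """Indices sorted by decreasing probability-to-cost ratio."""
    return sorted(range(len(costs)), key=lambda i: probs[i] / costs[i], reverse=True)

def expected_cost(order, costs, probs):
    """Expected total cost paid until the object is found, for a given order."""
    total, spent = 0.0, 0.0
    for i in order:
        spent += costs[i]          # pay C_i to search location i
        total += probs[i] * spent  # object was here with probability P_i
    return total

costs = [1.0, 4.0, 2.0]
probs = [0.2, 0.5, 0.3]
order = search_order(costs, probs)  # -> [0, 2, 1]
print(order, expected_cost(order, costs, probs))
```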

Do you know any others?

Reply
JBlack1d20

You don't need O(n log(n)) sorting, but the real problem is that this is a problem in bounded rationality where the cost of rational reasoning itself is considered to come from a limited resource that needs to be allocated.

Reply
Mikhail Samin's Shortform
Mikhail Samin1d*100

PSA: if you're looking for a name for your project, most interesting .ml domains are probably available for $10, because the mainstream registrars don't support the TLD.

I bought over 170 .ml domains, including anthropic.ml (redirects to the Fooming Shoggoths song), closed.ml & evil.ml (redirect to OpenAI), interpretability.ml, lens.ml, evals.ml, and many others (I'm happy to donate them to AI safety projects).

Reply1
Boris Kashirin's Shortform
Boris Kashirin1d30

I see some discussion about the worth of a bee vs a human, and I feel there is a big piece missing there. Why do we think that we are the only consciousness inhabiting our body? If bees are expected to be conscious, why not subsystems of the human brain? Possibly a lot of them?

Reply
Decaeneus's Shortform
Decaeneus2d385

Having young kids is mind bending because it's not uncommon to find yourself simultaneously experiencing contradictory feelings, such as:

  • I'm really bored and would like to be doing pretty much anything else right now.
  • There will likely come a point in my future when I would trade anything, anything to be able to go back in time and re-live an hour of this.
Reply14
Kaj's shortform feed
Kaj_Sotala7d6213

Every now and then in discussions of animal welfare, I see the idea that the "amount" of their subjective experience should be weighted by something like their total amount of neurons. Is there a writeup somewhere of what the reasoning behind that intuition is? Because it doesn't seem intuitive to me at all.

From something like a functionalist perspective, where pleasure and pain exist because they have particular functions in the brain, I would not expect pleasure and pain to become more intense merely because the brain happens to have more neurons. Rather... (read more)

Reply1
Showing 3 of 13 replies
2Pablo2d
Meta: gjm’s comment appears at the same level as comments that directly reply to Kaj’s original shortform. So until I read your own comment, I assumed they, too, were replying to Kaj. I think deleting a comment shouldn't alter the hierarchy of other comments in that thread.
habryka2d20

Oops, that's a weird side-effect of the way we implemented spam purging (which is a more aggressive form of deletion than we usually use). We should really fix some bugs related to that implementation.

Reply
1saulius2d
See Why Neuron Counts Shouldn't Be Used as Proxies for Moral Weight and maybe also Is Brain Size Morally Relevant?