LESSWRONG
Quick Takes

asher's Shortform
asher1d4230

Burnout often doesn't look like the lack of motivation / lack of focus / fatigue that people usually describe. At least in my experience, it's often better described as a set of aversive mental triggers that fire whenever a burnt-out person goes to do a sort of work they spent too much energy on in the past. (Where 'too much energy' has something to do with time and effort, but more to do with a bunch of other things regarding how people interface with their work.)

4the gears to ascension15h
Thing I currently believe about what the core interface failure is, possibly just for me:
Karl Krueger39m10

Moreover, it's when the work that used to be satisfying has stopped being so, but the habit of trying to do the work has not yet been extinguished. So you don't quit yet, but the habit is slowly dying so you don't do it well ...

30PipFoweraker1d
'I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constrained and eventually impossible' was how I described one such experience.
Vladimir_Nesov's Shortform
Vladimir_Nesov10h429

It seems more accurate to say that AI progress is linear rather than exponential, as a result of being logarithmic in resources that are in turn exponentially increasing with time. (This is not quantitative, any more than the "exponential progress" I'm disagreeing with[1].)

Logarithmic return on resources means strongly diminishing returns, but that's not actual plateauing, and the linear progress in time is only slowing down according to how the exponential growth of resources is slowing down. Moore's law in the price-performance form held for a really lon... (read more)
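Spelled out (my gloss, not the author's own formula), the claim is just that a logarithm of an exponential is linear:

$$\text{progress}(t) \;\propto\; \log C(t), \qquad C(t) \;\propto\; e^{kt} \;\;\Rightarrow\;\; \text{progress}(t) \;\propto\; kt,$$

so progress stays roughly linear in time for exactly as long as the resources C(t) keep growing exponentially, and bends only as that growth bends.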

Showing 3 of 4 replies
2Vladimir_Nesov5h
Exponential increase in total economic value is not specific to AI; any new tech is going to start exponentially (possibly following the startups championing it) before it gets further along the adoption S-curve. The unusual things about AI are that it gets better with more resources (while most other things just don't get better at all in a straightforward scaling-law manner), that the logarithm-of-resources thing leaves the persistent impression of plateauing despite not actually plateauing, and that even if it runs out of the adoption S-curve it still has Moore's law of price-performance to keep fueling its improvement. These unusual things frame the sense in which it's linear/logarithmic.

If the improvement keeps raising the ceiling on adoption (capabilities) fast enough, funding keeps scaling into slightly more absurd territory, but even then it won't go a long way without the kind of takeoff that makes anything like the modern industry obsolete. After the exponential phase of adoption comes to an end, it falls back to Moore's law, which still keeps giving it exponential compute to slowly fuel further progress, and in that sense there is some unusual exponential-ness to this. Though probably there are other things with scaling laws of their own that global economic growth (instead of Moore's law) would similarly fuel, even more slowly.
Thomas Kwa2h20

In many industries cost decreases by some factor with every doubling of cumulative production. This is how solar eventually became economically viable.
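For reference, this is the standard experience-curve (Wright's law) relationship; a minimal sketch, where the 20% learning rate and the specific numbers are illustrative assumptions rather than anything from the comment:

```python
# Wright's law / experience curve: unit cost falls by a fixed fraction
# with every doubling of cumulative production.
import math

def unit_cost(cumulative_units: float, first_unit_cost: float, learning_rate: float) -> float:
    """learning_rate is the fractional cost drop per doubling, e.g. 0.2 for 20%."""
    alpha = -math.log2(1.0 - learning_rate)       # exponent implied by the learning rate
    return first_unit_cost * cumulative_units ** (-alpha)

# Example: with a 20% learning rate, the 1000th unit costs roughly 11% of the first.
print(unit_cost(1000, first_unit_cost=1.0, learning_rate=0.2))
```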

4Cole Wyeth5h
Is this true??
kavya's Shortform
kavya14h5-5

The aspect of your work to care about the most is replay value. How many times do people keep coming back? Number of replays, rereads, and repeat purchases are proxies for high resonance. On that note, I wish more writing platforms let you see in aggregate how many first-time readers visited again and how spaced out their visits were. If they can still look past the known plot and imperfections in your work, you're on to something. 

Showing 3 of 6 replies
1kavya6h
Work developed through artistic value and/or subjectivity (songs, books, movies, speeches, paintings, consumer products). My point is that the greatest works stand the test of time and are typically studied/appreciated over the years. Examples: Paul Graham's essays, albums from decades past, or even the minimalistic design of Apple. If people keep coming back, it got something right and was likely ahead of its time. Compared to other metrics (total impressions or number of comments), repeat behavior tells a less noisy story about the quality of the work. Levels 1, 2, 3 were arbitrarily chosen. What I meant was that when you move from something early-stage to mainstream, you have to let go of some of the beliefs or ideas that may have garnered your first fans. I agree with your negative examples, but those are hardly the kind of businesses I'd ever want to work in.
1Saul Munn5h
thanks for clarifying! so, to be clear, is the claim you’re making that: work that has artistic or otherwise subjective aims/values can find a measurement of its value in the extent to which its “customers” (which might include e.g. “appreciators of its art” or “lovers of its beauty”) keep coming back. does that sound like an accurate description of the view you’re endorsing, or am i getting something wrong in there?
kavya5h10

yes that’s basically it, thanks! 

Cole Wyeth's Shortform
Cole Wyeth15h*230

Where is the hard evidence that LLMs are useful?

Has anyone seen convincing evidence of AI driving developer productivity or economic growth?

It seems I am only reading negative results about studies on applications.

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

https://www.lesswrong.com/posts/25JGNnT9Kg4aN5N5s/metr-research-update-algorithmic-vs-holistic-evaluation

And in terms of startup growth: 

https://www.lesswrong.com/posts/hxYiwSqmvxzCXuqty/generative-ai-is-not-causing-ycombinator-companies-to-grow

apparently wider economic ... (read more)

Showing 3 of 5 replies
4ACCount7h
Are you looking for utility in all the wrong places? Recent news has quite a few mentions of AI tanking the job prospects of fresh grads across multiple fields and, at the same time, AI causing a job market bloodbath in the usual outsourcing capitals of the world. That sure lines up with known AI capabilities. AI isn't at the point of "radical transformation of everything" yet, clearly. You can't replace a badass crew of 10x developers who can build the next big startup with AIs today. AI doesn't unlock all that many "things that were impossible before" either - some are here already, but not enough to upend everything. What it does instead is take the cheapest, most replaceable labor on the market, and make it cheaper and more replaceable. That's the ongoing impact.
2Cole Wyeth6h
Citation needed
Cole Wyeth5h30

I’m not saying it’s a bad take, but I asked for strong evidence. I want at least some kind of source. 

Cole Wyeth's Shortform
Cole Wyeth5h25

I’m worried about Scott Aaronson since he wrote “Deep Zionism.”

https://scottaaronson.blog/?p=9082

I think he's coming from a good place, and I can understand how he got here, but he really, really needs to be less online.

Avi Parrack's Shortform
Avi Parrack8h32

Suppose Everett is right: no collapse, just branching under decoherence. Here’s a thought experiment.

At time t, Box A contains a rock and Box B contains a human. We open both boxes and let their contents interact freely with the environment—photons scatter, air molecules collide, etc. By time t′, decoherence has done its work.

Rock in Box A.
A rock is a highly stable, decohered object. Its pointer states (position, bulk properties) are very robust. When photons, air molecules, etc. interact with it, the redundant environmental record overwhelmi... (read more)

JBlack6h20

Just as a minor note (to other readers, mostly): decoherence doesn't really have a "number of branches" in any physically real sense. It is an artifact of the person doing the modelling choosing to approximate a system that way. You do address this further down, though. On the whole, great post.

Stephen McAleese's Shortform
Stephen McAleese10h40

I haven't heard anything about RULER on LessWrong yet:

RULER (Relative Universal LLM-Elicited Rewards) eliminates the need for hand-crafted reward functions by using an LLM-as-judge to automatically score agent trajectories. Simply define your task in the system prompt, and RULER handles the rest—no labeled data, expert feedback, or reward engineering required.

✨ Key Benefits:

  • 2-3x faster development - Skip reward function engineering entirely
  • General-purpose - Works across any task without modification
  • Strong performance - Matches or exceeds hand-crafted rewar
... (read more)
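For concreteness, a minimal sketch of the general LLM-as-judge relative-scoring idea (an illustration of the concept, not RULER's actual API; the `judge` callable, prompt format, and function names are assumptions):

```python
# Sketch: score a group of candidate agent trajectories with an LLM judge
# instead of a hand-crafted reward function. `judge` is a hypothetical
# callable that sends a prompt to an LLM and returns its text response.
import json

def relative_rewards(task_description: str, trajectories: list[str], judge) -> list[float]:
    """Ask a judge LLM to score trajectories for the same task; return rewards in [0, 1]."""
    prompt = (
        f"Task: {task_description}\n\n"
        + "\n\n".join(f"Trajectory {i}:\n{t}" for i, t in enumerate(trajectories))
        + "\n\nScore each trajectory from 0 to 1 on how well it completes the task. "
          "Reply with a JSON list of numbers, one per trajectory."
    )
    scores = json.loads(judge(prompt))
    assert len(scores) == len(trajectories)
    return [float(s) for s in scores]
```

The resulting scores can then be used as group-relative rewards in whatever RL update the training framework applies.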
Sam Marks's Shortform
Sam Marks2d*7250

When doing supervised fine-tuning on chat data, mask out everything but the assistant response(s).

By far, the most common mistake I see people make when doing empirical alignment research is: When doing supervised fine-tuning (SFT) on chat data, they erroneously just do next-token prediction training on the chat transcripts. This is almost always a mistake. Sadly, I see it made during almost every project I supervise. 

Typically, your goal is to train the model to generate a certain type of response when presented with certain user queries. You probabl... (read more)
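A minimal sketch of the loss masking being described, assuming the common convention that label -100 is ignored by the cross-entropy loss; `encode_turn` is a hypothetical helper that tokenizes a single chat turn:

```python
# Sketch: build (input_ids, labels) for SFT on a chat transcript so that only
# assistant tokens contribute to the loss; user/system tokens are masked out.
IGNORE_INDEX = -100  # the usual "ignore this position" label for cross-entropy

def build_example(messages, encode_turn):
    input_ids, labels = [], []
    for msg in messages:
        ids = encode_turn(msg["role"], msg["content"])
        input_ids.extend(ids)
        if msg["role"] == "assistant":
            labels.extend(ids)                        # train on assistant tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # do not train on user/system tokens
    return {"input_ids": input_ids, "labels": labels}
```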

Showing 3 of 7 replies
williawa11h30

I'm wondering: there are these really creepy videos of early OpenAI voice mode copying people's voices.

https://www.youtube.com/shorts/RbCoIa7eXQE

I wonder if they're a result of OpenAI failing to do this loss-masking with their voice models, and then messing up turn-tokenization somehow.

If you do enough training without masking the user tokens, you'd expect to get a model that's as good at simulating users as it is at being a helpful assistant.

2Sam Marks1d
I was mainly thinking that this was a footgun for research contexts. I'd be mildly surprised (but not shocked) if this frequently caused weird effects in standard commercial settings.
2Jordan Taylor1d
When training model organisms (e.g. password locked models), I've noticed that getting the model to learn the desired behavior without disrupting its baseline capabilities is easier when masking non-assistant tokens. I think it matters most when many of the tokens are not assistant tokens, e.g. when you have long system prompts. Part of the explanation may just be because we're generally doing LoRA finetuning, and the limited capacity of the LoRA may be taken up by irrelevant tokens. Additionally, many of the non-assistant tokens (e.g. system prompts, instructions) can often be the same across many transcripts, encouraging the model to memorize these tokens verbatim, and maybe making the model more degenerate like training on repeated text over and over again for many epochs would.
Hastings's Shortform
Hastings1mo*01

We've played "Pokemon or Tech Startup" for a couple of years now. I think there's absolutely potential for a new game, "Fantasy Magic Advice" or "LLM Tips and Tricks." My execution is currently poor: I think the key difference that makes it easy to distinguish the two categories is tone, not content, and using a Djinn to tone-match would Not Be In the Spirit of It. (I have freely randomized LLM vs Djinn)

Absolutely do not ask it for pictures of kids you never had!

My son is currently calling chatgpt his friend. His friend is confirming everything and has ... (read more)

Hastings15h20

Obviously the incident when OpenAI's voice mode started answering users in their own voices needs to be included - I don't know how I forgot it. That was the point where I explicitly took up the heuristic that if ancient folk wisdom says the Fae do X, the odds of LLMs doing X are not negligible.

the gears to ascension's Shortform
the gears to ascension1d*20

so I saw this post, about what AI safety is doing wrong (they claim, basically, that mental health advice should be treated as similarly critical to CBRN). I disagree with some of the mudslinging, but it's quite understandable given the stakes.

someone else I saw said this, so the sentiment isn't universal.

idk, just thought someone should post it. react "typo" if you think i should include titles for the links; i currently lean towards anti-clickbait though. edit: done

Showing 3 of 4 replies
Linda Linsefors18h20

Typo react from me. I think you should call your links something informative. If you think the title of the post is clickbait, you can re-title it something better, maybe?

Now I have to click to find out what the link is even about, which is also click-bait-y.

2the gears to ascension1d
author is well respected, isn't just saying this for no reason, so working through the confusion could be useful. I share it because it seems to make mistakes. author is https://www2.eecs.berkeley.edu/Faculty/Homepages/brecht.html
2testingthewaters22h
I think this is a replay of the contrast I mentioned here between "static" and "dynamic" conceptions of AI. To the author of the original post, AI is an existing technology that has taken a particular shape, so it's important to ask what harms that shape might cause in society. To AI safety folk, the shape is an intermediate stage that is rapidly changing into a world-ending superbeing, so asking about present harms (or, indeed, being overly worried about chatbot misalignment) is a distraction from the "core issue".
Linda Linsefors's Shortform
Linda Linsefors18hΩ120

Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.

I've typed up some math notes on how much MSE loss we should expect for random embeddings, and some other alternative embeddings, for when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.

Note that neither of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost orthogonal directions, which is similar to ran... (read more)
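A toy sketch of the random-embedding case as I read it (my own construction under simple assumptions: one active feature, linear readout; not the notes' actual math):

```python
# Toy estimate of reconstruction MSE when F features share N < F neurons
# via random (approximately orthogonal) embedding directions.
import numpy as np

rng = np.random.default_rng(0)
F, N, trials = 512, 64, 200

errs = []
for _ in range(trials):
    W = rng.normal(size=(N, F))
    W /= np.linalg.norm(W, axis=0)   # each feature gets a random unit direction
    x = np.zeros(F)
    x[rng.integers(F)] = 1.0         # one active feature (sparse regime)
    x_hat = W.T @ (W @ x)            # embed into neurons, read back linearly
    errs.append(np.mean((x - x_hat) ** 2))

print(np.mean(errs))                 # interference contributes roughly 1/N per off-target feature
```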

asher's Shortform
asher1d104

The typical mind fallacy is really useful for learning things about other people, because the things they assume of others often generalize surprisingly well to themselves.

Viliam1d50

Ah, an avalanche of thoughts that apply to heuristics in general...

  • yes, it is also my experience that this is a useful heuristic
  • but there are exceptions
  • and when this becomes widely known, of course the bad actors will adapt and say the right things

The most useful example of this heuristic is when people say things like "everyone is selfish" etc. For example:

Brent promoted a cynical version of the world, in which humans were inherently bad, selfish, and engaged in constant deception of themselves and others. He taught that people make all their choices f

... (read more)
Shortform
Cleo Nardo1d*31-2

Gradient Routing outperforms Pretraining Filtering when labels are imperfect (hypothesis)

The Problem with Filtering Under Imperfect Labels: Pretraining filtering assumes you can cleanly separate dangerous from safe content. But with imperfect labels, a sufficiently capable model will still learn dangerous information if it helps predict the “safe” data.[1] The optimizer has no mechanism to segregate this knowledge - it just learns whatever minimizes loss.

What is Gradient Routing? Gradient routing controls where learning happens in neural networks by m... (read more)
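For readers new to the technique, a minimal sketch of the gradient-routing idea (illustrative PyTorch-style code under my own simplifying assumptions, not the post's or the original paper's implementation, which masks gradients at the level of the computation graph): updates from flagged data are confined to a designated parameter subset, so that subset can later be ablated.

```python
# Sketch: route gradients from flagged ("dangerous") examples into a designated
# parameter subset; all other parameters are untouched by flagged batches.
import torch

def routed_step(model, batch, loss_fn, optimizer, is_flagged: bool, routed_params: set):
    optimizer.zero_grad()
    loss = loss_fn(model, batch)       # loss_fn is a placeholder for your training loss
    loss.backward()
    if is_flagged:
        for name, p in model.named_parameters():
            if name not in routed_params and p.grad is not None:
                p.grad.zero_()         # flagged data may only update the routed subset
    optimizer.step()
```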

2Lucas Teixeira1d
Nit: The title gives the impression of a demonstrated result as opposed to a working hypothesis and proposed experiment.
Cleo Nardo1d30

good point, thanks lucas

PipFoweraker's Shortform
PipFoweraker1d10

A Sketched Proposal for Interrogatory, Low-Friction Model Cards

My auditor brain was getting annoyed by what I see as the current state of model cards. If we proactively adopt better norms about these, it seems like low effort for a moderately good payoff? I am unsure about this; hence the rough draft below.

Problem
Model cards are uneven: selective disclosure, vague provenance, flattering metrics, generic limitations. Regulation (EU AI Act) and risk frameworks (NIST AI RMF) are pushing toward evidence-backed docs, but most “cards” are still self-reported. If ... (read more)

PipFoweraker1d10

Meta: I have been burrowed away in other research but came across these notes and thought I would publish them rather than let them languish. If there are other efforts in this direction, I would be glad to be pointed that way so I can abandon this idea and support someone else's instead.

megasilverfist's Shortform
megasilverfist1d20

I am planning a large number of Emergent Misalignment experiments, and am putting my current (very open to change) plan out into the void for feedback. Disclosure: I am currently self-funded but plan to apply for grants.
Emergent Alignment Research Experiments Plan

Core Replication & Extension Experiments

1. Alternative Training Target Follow-ups

Background: Recent research has confirmed emergent misalignment occurs with non-moral norm violations.

Follow-up Experiments:

  • Compare misalignment patterns between different violation types (profanity vs. sexual cont
... (read more)
patrik-cihal's Shortform
patrik-cihal1d10

It's very hard to imagine humans prospering in a world with AGIs. What's the point of solving the alignment problem? Is it so that one ASI can create a civilization where AGIs will never exist, and then self-destruct?

Optimization Process's Shortform
Optimization Process1d20

Heuristic: distrust any claim that's much memetically fitter than its retraction would be. (Examples: "don't take your vitamins with {food}, because it messes with {nutrient} uptake"; "Minnesota is much more humid than prior years because of global-warming-induced corn sweat"; "sharks are older than trees"; "the Great Wall of China is visible from LEO with the naked eye")

papetoast1d10

What do you mean by "retraction"? Do you just mean an opposite statement "sharks are older than trees" --> "sharks are not older than trees", or do you mean something more specific?

Assuming just a general contrasting statement, my gut feeling is that (1) this heuristic is true for certain categories of statements but generates wrong intuition for other categories, and (2) this heuristic works, but rarely for memetic reasons; instead it is just the signal-to-noise ratio of the subjects.

Currently I am thinking about counterexamples from statements that rough... (read more)

kavya's Shortform
kavya5d61

My theory on why AI isn't creative is that it lacks a 'rumination mode'. Ideas can sit and passively connect in our minds for free. This is cool and valuable. LLMs don't have that luxury. Non-linear, non-goal-driven thinking is expensive and not effective yet.

Cross-posted from X 

Showing 3 of 4 replies
dirk1d10

Gwern's made some suggestions along similar lines.

1kavya5d
Sure. But more than the immediate, associative leaps, I think I'm interested in their ability to sample concepts across very different domains and find connections, whether that is done deliberately or randomly. Though with humans, the ideas that plague our subconscious are tied to our persistent, internal questions.
4Seth Herd5d
Yes. See Google's Co-Scientist project for an example of an AI scaffolded to have a rumination mode. It is claimed to have matched the theory creation of top labs in two areas of science. So this rumination mode is probably expensive and only claimed to be effective in the one domain it was engineered for. So far.

Based on the scaffolded, sort-of-evolutionary "algorithm" they used to recombine and test hypotheses against published empirical results, I'd expect that a general version would work almost as well across domains, once somebody puts effort and some inference money into making it work. This is cool and valuable, as you say. It's also extremely dangerous, since this lack is one of the few gaps between current LLMs and the general reasoning abilities of humans - without human ethics and human limitations.

Caveat - I haven't closely checked the credibility of the co-scientist breakthrough story. I think it's unlikely to be entirely fake or overstated based on the source, but draw your own conclusions. I've thus far primarily taken my conclusions from this podcast interview with the creators and a deep research report based largely on this paper on the co-scientist project. Nathan Labenz, the host of that podcast (and an AI expert in his own right), estimates the inference cost at $100-1000 for one cutting-edge, literature-based hypothesis in this followup episode (which I do not recommend, since it's focused on the actual biological science).
MakoYass's Shortform
mako yass3d40

I'm a preference utilitarian, and as far as I can tell there are no real problems with preference utilitarianism (I've heard many criticisms and ime none of them hold up to scrutiny) but I just noticed something concerning. Summary: Desires that aren't active in the current world diminish the weight of the client's other desires, which seems difficult to justify and/or my normalisation method is incomplete.

Background on normalisation: utility functions aren't directly comparable, because the vertical offset and scale of an agent's utility function are mean... (read more)
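The normalisation issue gestured at here, in its most common form (my gloss; the post's own method is cut off above), is that utilities are only defined up to positive affine transformation, so one picks a canonical representative before aggregating, e.g. range normalisation:

$$\tilde u_i(x) \;=\; \frac{u_i(x) - \min_y u_i(y)}{\max_y u_i(y) - \min_y u_i(y)} \;\in\; [0,1].$$

If the normalisation is range-based like this, the stated concern falls out directly: a strong desire whose best outcomes are unreachable in the current world inflates the denominator, compressing the weight of the agent's other, achievable desires.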

Showing 3 of 6 replies
mako yass1d20

I notice it becomes increasingly impractical to assess whether a preference had a counterfactual impact on the allocation. For instance, if someone had a preference for there to be no elephants, and we get no elephants, partially because of that but largely because of the food costs, should the person who had that preference receive less food for having already received an absence of elephants?

2Vladimir_Nesov2d
The point is that people shouldn't be stakeholders of everything, let alone to an equal extent. Instead, particular targets of optimization (much smaller than the whole world) should have much fewer agents with influence over their construction, and it's only in these contexts that preference aggregation should be considered. When starting with a wider scope of optimization with many stakeholders, it makes more sense to start with dividing it into smaller parts that are each a target of optimization with fewer stakeholders, optimized under preferences aggregated differently from how that settles for the other parts. Expected utility theory makes sense for such smaller projects just as much as it does for the global scope of the whole world, but it breaks normality less when applied narrowly like that than if we try to apply it to the global scope.

The elephant might need to be part of one person's home, but not a concern for anyone else, and not subject to anyone else's preferences. That person would need to be able to afford an elephant though, to construct it within the scope of their home. Appealing to others' preferences about the would-be owner's desires would place the would-be owner within the others' optimization scope, make the would-be owner a project that others are working on, make them stakeholders of the would-be owner's self, rather than remaining a more sovereign entity.

If you depend on the concern of others to keep receiving the resources you need, then you are receiving those resources conditionally, rather than allocating the resources you have according to your own volition. Much better for others to contribute to an external project you are also working on, according to what that project is, rather than according to your desires about it.
2mako yass1d
But not preserving normality is the appeal :/ As an example, normality means a person can, e.g., create an elephant within their home and torture it. Under preference utilitarianism, the torture of the elephant upsets the values of a large number of people; it's treated as a public bad and has to be taxed as such. Even when we can't see it happening, it's still reducing our U, so a boundaryless prefu optimizer would go in there and say to the elephant torturer "you'd have to pay a lot to offset the disvalue this is creating, and you can't afford it, so you're going to have to find a better outlet (how about a false elephant who only pretends to be getting tortured)".

But let's say there are currently a lot of sadists and they have a lot of power. If I insist on boundaryless aggregation, they may veto the safety deal, so it just wouldn't do. I'm not sure there are enough powerful sadists for that to happen, political discourse seems to favor publicly defensible positions, but [looks around] I guess there could be. But if there were, it would make sense to start to design the aggregation around... something like the constraints on policing that existed before the aggregation was done. But not that exactly.
CapResearcher's Shortform
CapResearcher1d131

Doing things you're bad at can be embarrassing. But the best way to get better is usually to practice.

I'm bad at humming. Since my voice changed in my teens, I essentially haven't hummed.

Sometimes social situations call for humming, like referencing a tune. Then I can either hum badly, which can result in "I can't tell what tune that is because you're bad at humming". Or I can not hum. So I rarely hum.

From my perspective, practicing yields an improvement in my skill from "bad" to "slightly less bad". However, an uninformed onlooker would update their estim... (read more)
