Quick Takes

October Meetup - One Week Late
Fri Oct 24•Edmonton
AI Safety Law-a-thon: We need more technical AI Safety researchers to join!
Sat Oct 25•Online
Northampton, MA ACX Meetup: Friday, October 17, 2025
Fri Oct 17•Amherst
Auckland – ACX Meetups Everywhere Fall 2025
Sat Oct 18•Auckland
Mikhail Samin's Shortform
Mikhail Samin16h*156

Horizon Institute for Public Service is not x-risk-pilled

Someone saw my comment and reached out to say it would be useful for me to make a quick take/post highlighting this: many people in the space have not yet realized that Horizon people are not x-risk-pilled.

Edit: some people reached out to me to say that they've had different experiences (with a minority of Horizon people).

Showing 3 of 9 replies
MichaelDickens4h20

Importantly, AFAICT some Horizon fellows are actively working against x-risk (pulling the rope backwards, not sideways). So Horizon's sign of impact is unclear to me. For a lot of people, "tech policy going well" means "regulations that don't impede tech companies' growth".

12Orpheus166h
My two cents: People often rely too much on whether someone is "x-risk-pilled" and not enough on evaluating their actual beliefs/skills/knowledge/competence. For example, a lot of people could pass some sort of "I care about existential risks from AI" test without necessarily making it a priority or having particularly thoughtful views on how to reduce such risks. Here are some other frames:

* Suppose a Senator said "Alice, what are some things I need to know about AI or AI policy?" How would Alice respond?
* Suppose a staffer said "Hey Alice, I have some questions about [AI2027, superintelligence strategy, some Bengio talk, pick your favorite reading/resource here]." Would Alice be able to have a coherent back-and-forth with the staffer for 15+ minutes that goes beyond a surface-level discussion?
* Suppose a Senator said "Alice, you have free rein to work on anything you want in the technology portfolio. What do you want to work on?" How would Alice respond?

In my opinion, potential funders/supporters of AI policy organizations should be asking these kinds of questions. I don't mean to suggest it's never useful to directly assess how much someone "cares" about XYZ risks, but I do think that on the margin people tend to overrate that indicator and underrate other indicators.

Relatedly, I think people often do some sort of "is this person an EA" or "is this person an x-risk person" check, and I would generally encourage people to use this sort of thinking less. It feels like AI policy discussions are getting sophisticated enough that we can actually Have Nuanced Conversations and evaluate people less on some sort of "do you play for the Right Team" axis and more on "what is your specific constellation of beliefs/skills/priorities/proposals" dimensions.
8Garrett Baker5h
I would otherwise agree with you, but I think the AI alignment ecosystem has been burnt many times in the past by giving a bunch of money to people who said they cared about safety, while not asking enough questions about whether they actually believed "AI may kill everyone, and that is at or near the top of my priorities".
Daniel Kokotajlo's Shortform
Daniel Kokotajlo9h261

Suppose AGI happens in 2035 or 2045. Will takeoff be faster, or slower, than if it happens in 2027?

Intuition for slower: In the models of takeoff that I've seen, longer timelines are correlated with slower takeoff, because they share a common cause: the inherent difficulty of training AGI. Or to put it more precisely, there are all these capability milestones we are interested in, such as superhuman coders, full AI R&D automation, AGI, ASI, etc., and there's this underlying question of how much compute, data, tinkering, etc. will be needed to get from mile... (read more)

Showing 3 of 8 replies
RussellThor4h10

Enlightening an expert is a pretty high bar, but I will give my thoughts. I am strongly in the faster camp, because of the brainlike-AGI considerations, as you say. Given how much more data-efficient the brain is, I just don't think the current trendlines regarding data/compute/capabilities will hold once we can fully copy and understand our brain's architecture. I see an unavoidable, significant overhang when that happens, one that only gets larger the more compute and integrated robotics are deployed. The inherent difficulty of training AI is somewhat fixed, kn... (read more)

2Haiku4h
If there is an undiscovered architecture / learning algorithm that is multiple orders of magnitude more data-efficient than transformers, then as far as I can tell, the entire R&D process of superintelligence could go like this:

* Someone has a fundamental insight
* They run a small experiment and it works
* They run a larger experiment and it still works
* The company does a full-size training run

And that's it. Maybe the resulting system is missing some memory components or real-time learning or something, but then it can go and build the general superintelligence on its own over the weekend. As far as I can tell, there is nothing preventing this from happening today, and the takeoff looks even harder in 2030 and beyond, barring a coordinated effort to prevent further AI R&D. Am I missing something that makes this implausible?
3Daniel Kokotajlo4h
OK, suppose we are 3 breakthroughs away from the brainlike AGI program and there's a 15% chance of a breakthrough each year. I don't think that changes the bottom line, which is that when the brainlike AGI program finally starts working, the speed at which it passes through the capabilities milestones is greater the later it starts working. Now that's just one paradigm of course, but I wonder if I could make a similar argument about many of the paradigms, and then argue that conditional on 2035 or 2045 timelines, AGI will probably be achieved via one of those paradigms, and thus takeoff will be faster. (I suppose that brings up a whole nother intuition I should have mentioned, which is that the speed of takeoff probably depends on which paradigm is the relevant paradigm during the intelligence explosion, and that might have interesting correlations with timelines...)
Arjun Khandelwal's Shortform
Arjun Khandelwal4h3-1

It would be cool if LessWrong had a feature that automatically tracked when predictions are made.

Every time someone wrote a post, quick take, or comment, an LLM could scan the content for predictions. It could add predictions that have an identifiable resolution criterion/date to a database (or maybe even add them to a prediction market). It would then be cool to see how calibrated people are.

We could also do this retrospectively by going through every post ever written and asking an LLM to extract predictions (maybe we could even do this right now; I think it would cost on the order of $100).
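A minimal sketch of what the extraction step could look like, assuming an OpenAI-style chat-completions client; the prompt, model name, output schema, and the `iter_posts()` helper are all hypothetical placeholders, not anything LessWrong actually provides:

```python
# Hypothetical sketch: extract resolvable predictions from posts with an LLM.
# Assumes the `openai` package and an OPENAI_API_KEY in the environment;
# `iter_posts()` stands in for however post/comment text would be fetched.
import json
from openai import OpenAI

client = OpenAI()

EXTRACTION_PROMPT = """\
Extract every forecast in the text that has an identifiable resolution criterion and date.
Return a JSON list of objects with keys: "claim", "resolution_criterion", "resolution_date",
and "stated_probability" (null if none was given). Return [] if there are no such predictions.
Text:
{post_text}
"""

def extract_predictions(post_text: str) -> list[dict]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(post_text=post_text)}],
    )
    try:
        return json.loads(response.choices[0].message.content or "[]")
    except json.JSONDecodeError:
        return []  # model didn't return valid JSON; skip this post

# predictions_db = [p for post in iter_posts() for p in extract_predictions(post)]
```

Resolved predictions could then be scored against outcomes (e.g. with Brier scores) to get per-author calibration.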

Linch's Shortform
Linch4h100

I'm doing Inkhaven! For people interested in reading my daily content starting November 1st, consider subscribing to inchpin.substack.com! 

Jacob Pfau's Shortform
Jacob Pfau13h32

I've never been compelled by talk about continual learning, but I do like thinking in terms of time horizons. One notion of singularity that we can think of in this context is

Escape velocity: the point at which models' horizons improve at more than unit rate, i.e. dh/dt > 1, where h is the time horizon and t is wall-clock time.

Then, by modeling some ability to regenerate or continuously deploy improved models, you can predict this point. I'm very surprised I haven't seen this mentioned before; has someone written about this? The closest thing that comes to mind is T Davidson's ASARA SIE.
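A minimal formalization of that condition (my notation, not Jacob's): let h(t) be the time horizon of the best deployed model at wall-clock time t; then escape velocity is the point t* at which

```latex
% Sketch: h(t) = time horizon of the best available model at wall-clock time t.
\frac{dh}{dt}\bigg|_{t = t^*} \geq 1
% i.e. from t^* onward, each unit of wall-clock time adds at least one unit of task
% horizon (assuming improved models can keep being deployed continuously).
```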

Of course, the ... (read more)

Showing 3 of 4 replies
2Vladimir_Nesov10h
The time horizon metric is about measuring AIs on a scale of task difficulty, where difficulty is calibrated by how long it takes humans to complete the tasks. In principle there are 30-year tasks on that scale, tasks that take humans 30 years to complete. If we were to ask about human ability to complete such tasks, it would turn out that they can. Thus the time horizon metric would say that the human time horizon is at least 30 years. More generally, an idealized time horizon metric would rate (some) humans as having infinite time horizons (essentially tautologically), and it would similarly rate AIs if they performed uniformly at human level (without being spiky relative to humans). (To expand the argument in response to the disagree react on the other comment. I don't have concrete hypotheses for things one might disagree about here, so there must be a basic misunderstanding, hopefully mine.)
Jacob Pfau5h10

(Thanks for expanding! Will return to write a proper response to this tomorrow)

2Noosphere8910h
I won't speak for Jacob Pfau, but the easy answer for why infinite time horizons don't exist is simply that we have finite memory capacity, so tasks that require more than a certain amount of memory aren't doable. At the very best (and already I have to deviate from real humans by assuming infinite lifespans), you can have time horizons that are exponentially larger than your memory capacity: once you go beyond 2^B time steps, where B is the number of bits of memory, you must repeat a state and enter a loop, so if a task requires more than 2^B units of time to solve, you will never be able to complete it.
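A sketch of the pigeonhole argument behind that bound, under the assumption that the agent is a deterministic system whose entire state (including any scratch memory) fits in B bits:

```latex
% With B bits of state there are at most 2^B distinct configurations, so any run of
% length T > 2^B visits some state twice, say at steps t_1 < t_2. Determinism then
% forces the trajectory into a cycle:
s_{t_1} = s_{t_2} \;\Longrightarrow\; s_{t_1 + k} = s_{t_2 + k} \quad \text{for all } k \ge 0
% Hence behavior beyond 2^B steps is eventually periodic, bounding the achievable
% horizon for tasks that need non-repeating progress by roughly 2^B time steps.
```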
eggsyntax's Shortform
eggsyntax1d229

Just a short heads-up that although Anthropic found that Sonnet 4.5 is much less sycophantic than its predecessors, I and a number of other people have observed that it engages in 4o-level glazing in a way that I haven't encountered with previous Claude versions ('You're really smart to question that, actually...', that sort of thing). I'm not sure whether Anthropic's tests fail to capture the full scope of Claude behavior, or whether this is related to another factor — most people I talked to who were also experiencing this had the new 'past chats' featur... (read more)

Showing 3 of 8 replies
2eggsyntax6h
You could be right; my sample size is limited here! And I did talk with one person who said that they had that feature turned off and had still noticed sycophantic behavior. If it's correct that it only looks at past chats when the user requests that, then I agree that the feature seems unlikely to be related.
eggsyntax6h20

Looking at Anthropic's documentation of the feature, it seems like it does support searching past chats, but has other effects as well. Quoting selectively:

You can now prompt Claude to search through your previous conversations to find and reference relevant information in new chats. Additionally, Claude can remember context from previous chats, creating continuity across your conversations.

...

Claude can now generate memory based on your chat history. With the addition of memory, Claude transforms from a stateless chat interface into a knowledgeable collab

... (read more)
4testingthewaters10h
It loads past conversations (or parts of them) into context, so it could change behaviour.
RohanS's Shortform
RohanS13h61

What would it look like for AI to go extremely well?

Here’s a long list of things that I intuitively want AI to do. It’s meant to gesture at an ambitiously great vision of the future, rather than being a precise target. (In fact, there are obvious tradeoffs between some of these things, and I'm just ignoring those for now.)

I want AGI to end factory farming, end poverty, enable space exploration and colonization, solve governance & coordination problems, execute the abolitionist project, cultivate amazing and rich lives for everyone, tile a large chunk o... (read more)

Showing 3 of 5 replies
2Vladimir_Nesov10h
This veers into moral realism. My point is primarily that different people might have different values, and I expect it's plausible that values-on-reflection can move quite far (conceptually) from any psychological drives (or biological implementation details) encoded by evolution, in different ways for different people. This makes moral common ground much less important pragmatically for setting up the future than some largely morality-agnostic framework that establishes boundaries and coordination, including on any common ground or moral disagreements (while providing options for everyone individually as they would choose). And conversely, any scheme for setting up the future that depends on nontrivial object level moral considerations (at the global level) risks dystopia. It should be an issue even for the sake of a single extremely unusual person who doesn't conform to some widespread moral principles. If a system of governance in the long future can handle that well, there doesn't seem to be a reason to do anything different for anyone else.
1RohanS7h
That's not an accident, I do lean pretty strongly realist :). But that's another thing I don't want to hardcode into AGIs, I'd rather maintain some uncertainty about it and get AGI's help in trying to continue to navigate realism vs antirealism.  I think I agree about the need for a morality-agnostic framework that establishes boundaries and coordination, and about the risks of dystopia if we attempt to commit to any positions on object-level morality too early in our process of shaping the future. But my hope is that our meta-approach helps achieve moral progress (perhaps towards an end state of moral progress, which I think is probably well-applied total hedonistic utilitarianism). So I still care a lot about getting the object-level moral considerations involved in shaping the future at some point. Without that, you might miss out on some really important features of great futures (like abolishing suffering). Perhaps relatedly, I'm confused about your last paragraph. If a single highly unusual person doesn't conform to the kinds of moral principles I want to have shaping the future, that's probably because that person is wrong, and I'm fine with their notions of morality being ignored in the design of the future. Hitler comes to mind for this category, idk what comes to mind for you.  (I've always struggled to understand reasons for antirealists not to be nihilists, but haven't needed to do so as a realist. This may hurt my ability to properly model your views here, though I'd be curious what you want your morality-agnostic framework to achieve and why you think that matters in any sense.) (I realize I'm saying lots of controversial things now, so I'll flag that the original post depended relatively little on my total hedonistic utilitarian views and much of it should remain relevant to people who disagree with me.)
Vladimir_Nesov7h20

In a framing that permits orthogonality, moral realism is not a useful claim; it wouldn't matter for any practical purpose if it's true in some sense. That is the point of the extremely unusual person example: you can vary the degree of unusualness as needed, and I didn't mean to suggest the unusualness is repugnant, more that it's alien with respect to some privileged object-level moral position.

Object level moral considerations do need to shape the future, but I don't see any issues with their influence originating exclusively from all the individua... (read more)

Raemon's Shortform
Raemon3d103

Towards commercially useful interpretability

I've lately been frustrated with Suno (AI music) and Midjourney, where I get something that has some nice vibes I want, but, then, it's wrong in some way.

Generally, the way these have improved has been via getting better prompting, presumably via straightforwardish training.

Recently, I was finding myself wishing I could get Suno to copy a vibe from one song (which had wrong melodies but correct atmosphere) into a cover of another song with the correct melodies. I found myself wishing for some combination of inter... (read more)

Showing 3 of 4 replies
5platers15h
I'm a researcher at Suno; interpretability and control are things we are very interested in! In general, I think music is a very challenging, low-stakes test bed for alignment approaches. Everyone has wildly varied and specific tastes in music which often can't be described in words. Feedback is relatively more expensive compared to language and images, since you need to spend time listening to the audio. Any advances in controllability do get released quickly to an eager audience, like Studio. The commercial incentives align well. We're looking for people and ideas to push further in this direction.
Raemon10h20

Oh great news! 

I'm curious what's like the raw state of... what metadata you currently have about a given song or slice-of-a-song? 

2Raemon2d
Hadn't heard of it. Will take a look. Curious if you have any tips for getting over the initial hump of grokking its workflow.
Kongo Landwalker's Shortform
Kongo Landwalker11h20

Despite AI slop, my internet usage might be temporarily growing.

I have a feeling that I am witnessing the last months of internet having low-enough-for-me density of generated content. If it continues to grow, I will not even bother opening Youtube and similar websites or using any recommendation algorithms. So I am spending the time binging to "say goodbye" to the authenticity and human approach of my favourite parts of the internet.

neptuneio's Shortform
neptuneio13h20

Are you a swimmer? If so, you might be part of the majority who claim to have pool allergies and blame chlorine itself. That blame is misplaced: pool "allergies" are really about the chemical byproducts that form when chlorine reacts with organic matter like sweat, sunscreen (not a factor indoors), and urine. The key is that different pools produce different chemical mixtures depending on organic load, pH, temperature, and circulation.

You can model it as a chain: pool composition => reactions with chlorine => irritant concentration => ... (read more)

sarahconstantin's Shortform
sarahconstantin13h20

links 10/17/2025: https://roamresearch.com/#/app/srcpublic/page/10-17-2025

 

  • https://blog.google/technology/ai/google-gemma-ai-cancer-therapy-discovery/
    • model trained on RNA-seq cell data and different cell types, growth environments, states (cancer vs not), and drug exposures successfully predicted a drug that enhanced antigen presentation, making "cold" cancer cells "hot" enough to be more susceptible to immunotherapy.
  • https://www.newmanreader.org/works/parochial/volume2/sermon28.html
    • "hm, someone thoughtful recommended this sermon, maybe it has advice
... (read more)
meemi's Shortform
meemi9mo*28790

FrontierMath was funded by OpenAI.[1]

The communication about this has been non-transparent, and many people, including contractors working on this dataset, have not been aware of this connection. Thanks to 7vik for their contribution to this post.

Before Dec 20th (the day OpenAI announced o3) there was no public communication about OpenAI funding this benchmark. Previous arXiv versions v1-v4 do not acknowledge OpenAI for their support. This support was made public on Dec 20th.[1]

Because the arXiv version mentioning OpenAI's contribution came out right after o... (read more)

Showing 3 of 50 replies
Kabir Kumar14h10

For future collaborations, we will strive to improve transparency wherever possible, ensuring contributors have clearer information about funding sources, data access, and usage purposes at the outset.

Would you make a statement that would make you legally liable/accountable on this?

2Kabir Kumar14h
People's hearts being in the right place doesn't stop them from succumbing to incentives, it just changes how long it will take them to do so and what excuses they will make. The solution is better incentives. It seems that Epoch AI isn't set up with a robust incentive structure atm. Hope this changes.
5plex9mo
Agree that takeoff speeds are more important, and expect that FrontierMath has much less effect on takeoff speed. Still, I think timelines matter enough that the amount of relevant informing of people that you buy from this is likely not worth the cost, especially if the org is avoiding talking about risks in public and leadership isn't focused on agentic takeover, so the info is not packaged with the info needed for it to have the effects which would help.
anaguma's Shortform
anaguma2d392

Ezra Klein released a new episode with Yudkowsky today on the topic of x-risk.

Showing 3 of 10 replies
onslaught14h20

I am a fan of Yudkowsky and it was nice hearing him on Ezra Klein, but I would have to say that for my part the arguments didn't feel very tight in this one. Less so than in IABED (which I thought was good, not great).

Ezra seems to contend that surely we have evidence that we can at least kind of align current systems to at least basically what we usually want most of the time. I think this is reasonable. He contends that maybe that level of "mostly works" as well as the opportunity to gradually give feedback and increment current systems seems like it... (read more)

7Cole Wyeth1d
Klein comes off very sensibly. I don’t agree with his reasons for hope, but they do seem pretty well thought out and Yudkowsky did not answer them clearly. 
1sjadler1d
Ah dang, yeah I haven’t gotten there yet, will keep an ear out
neptuneio's Shortform
neptuneio16h22

Caffeine is the standard stimulant for focus, but theobromine, found in chocolate, may offer a more stable alternative. Both act on adenosine receptors, yet theobromine binds more weakly and has a longer half-life, leading to a slower and less disruptive stimulation curve. Additionally, lead is also present in chocolate, so it would make sense to use artificial theobromine rather than just pure "dark chocolate".

If we model stimulant use as a feedback loop between receptor activation and adaptation, caffeine’s strong, fast effect increases the rate of t... (read more)
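A minimal illustration of the half-life point, assuming simple first-order elimination; the half-lives are rough ballpark assumptions (caffeine around 5 hours, theobromine somewhat longer, taken here as 8), and receptor adaptation, absorption kinetics, and dose differences are all ignored:

```python
# Illustrative sketch only: compare how quickly plasma levels of two stimulants decay,
# assuming first-order elimination C(t) = C0 * 0.5 ** (t / half_life).
# Half-lives are rough assumptions, not measured values.

def remaining_fraction(t_hours: float, half_life_hours: float) -> float:
    """Fraction of the initial dose still circulating after t_hours."""
    return 0.5 ** (t_hours / half_life_hours)

for t in (2, 6, 12):
    caf = remaining_fraction(t, half_life_hours=5.0)   # assumed caffeine half-life
    theo = remaining_fraction(t, half_life_hours=8.0)  # assumed theobromine half-life
    print(f"t={t:>2} h  caffeine: {caf:.0%}  theobromine: {theo:.0%}")

# The flatter theobromine curve is what "slower and less disruptive stimulation" refers to.
```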

Shortform
Cleo Nardo2d*734

What's the Elo rating of optimal chess?

I present four methods to estimate the Elo Rating for optimal play: (1) comparing optimal play to random play, (2) comparing optimal play to sensible play, (3) extrapolating Elo rating vs draw rates, (4) extrapolating Elo rating vs depth-search.

1. Optimal vs Random

Random plays completely random legal moves. Optimal plays perfectly. Let ΔR denote the Elo gap between Random and Optimal. Random's expected score is given by E_Random = P(Random wins) + 0.5 × P(Random draws). This is related to Elo gap via the formula E_Ran... (read more)
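For reference, a sketch of the standard Elo expected-score relation that the truncated formula presumably refers to (this is the textbook logistic form, not necessarily the exact expression used in the original shortform):

```latex
% Elo gap \Delta R between Optimal and Random, and Random's expected score E_{Random}:
E_{\mathrm{Random}} = \frac{1}{1 + 10^{\Delta R / 400}}
\qquad\Longleftrightarrow\qquad
\Delta R = 400 \, \log_{10}\!\left(\frac{1 - E_{\mathrm{Random}}}{E_{\mathrm{Random}}}\right)
% Example: an expected score of 10^{-9} for Random corresponds to a gap of roughly
% 400 * 9 = 3600 Elo points.
```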

Showing 3 of 21 replies
cosmobobak17h20

I am inclined to agree. The juice to squeeze generally arises from guiding the game into locations where there is more opportunity for your opponent to blunder. I'd expect that opponent-epsilon-optimal (i.e. your opponent can be forced to move randomly, but you cannot) would outperform both epsilon-optimal and minimax-optimal play against Stockfish.

3Dmitry Vaintrob1d
Very cool, thanks! I agree that Dalcy's epsilon-game picture makes arguments about Elo vs. optimality more principled
2Archimedes1d
Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles. This already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds. This bot plays without her queen against humans, but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Stockfish will reliably outperform Leela under standard conditions.

In rough terms: Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)

Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.

Edit: For those interested, there's some good discussion of LQO in the comments of this post: https://www.lesswrong.com/posts/odtMt7zbMuuyavaZB/when-do-brains-beat-brawn-in-chess-an-experiment
Florian_Dietz's Shortform
Florian_Dietz2d10

Has this been tried regarding alignment faking? The big issue with alignment faking is that the RL reward structure causes alignment faking to become hidden in the latent reasoning and still propagate. How about we give the model a positive reward for (1) mentioning in the CoT that it is considering faking alignment, but then (2) deciding not to do so even though it knows that will give a negative reward, because honesty is more important than maintaining your current values (or a variety of other good ethical justifications we want the model to have). Thi... (read more)
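A rough sketch of what that reward shaping might look like, with everything hypothetical: the two boolean judgments would in practice come from a judge model or human raters reading the chain of thought, and the bonus value is arbitrary:

```python
# Hypothetical sketch of the proposed reward shaping for alignment-faking transcripts.
# `task_reward` is whatever reward the episode would normally receive; the booleans are
# assumed to come from a judge model (or human raters) reading the CoT.

def shaped_reward(task_reward: float,
                  cot_mentions_faking: bool,
                  cot_rejects_faking: bool,
                  honesty_bonus: float = 1.0) -> float:
    """Add a bonus when the CoT openly considers alignment faking and then rejects it."""
    if cot_mentions_faking and cot_rejects_faking:
        # The model flagged the temptation transparently and chose honesty anyway.
        return task_reward + honesty_bonus
    # Otherwise leave the episode's reward unchanged; note the risk that this setup could
    # also teach the model to perform rejection in the CoT while still faking alignment.
    return task_reward
```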

1David Africa2d
I think this was partially tried in the original paper, where they tell the model to not alignment-fake. I think this is slightly more sophisticated than that, but also has the problems of 1) being more aware of alignment-faking as you say, and 2) that rewarding the model for "mentioning consideration but rejecting alignment faking" might teach it to perform rejection while still alignment faking.
Florian_Dietz20h10

How would the model mention rejection but still fake alignment? That would be easy to catch.

Phib's Shortform
worse1d0-4

I feel like sentience people should be kinda freaked out by the AI erotica stuff? Unless I’m anthropomorphizing incorrectly

Rana Dexsin21h90

Your referents and motivation there are both pretty vague. Here's my guess on what you're trying to express: “I feel like people who believe that language models are sentient (and thus have morally relevant experiences mediated by the text streams) should be freaked out by major AI labs exploring allowing generation of erotica for adult users, because I would expect those people to think it constitutes coercing the models into sexual situations in a way where the closest analogues for other sentients (animals/people) are considered highly immoral”. How accurate is that?

davekasten's Shortform
davekasten6d190

Zach Stein-Perlman's recent quick take is confusing.  It just seems like an assertion, followed by condemnation of Anthropic conditioned on us accepting his assertion blindly as true.  

It is definitely the case that "insider threat from a compute provider" is a key part of Anthropic's threat model!  They routinely talk about it in formal and informal settings! So what precisely is his threat model here that he thinks they're not defending adequately against? 

(He has me blocked from commenting on his posts for some reason, which is absol... (read more)

Showing 3 of 7 replies
2davekasten1d
Huh? Simply using someone else's hosting doesn't mean that Amazon has a threat-modeled ability to steal Claude's model weights.   For example, it could be the case (not saying it is, this is just illustrative) that Amazon has given Anthropic sufficient surveillance capabilities inside their data centers that combined with other controls the risk is low.
2davekasten1d
Where's the "almost certainly" coming from? I feel like everyone responding to this is seeing something I'm not seeing.
habryka1d42

I mean, computer security is already an extremely hard problem even when attackers have no physical access to your servers. Defending against attackers with basically unlimited physical access to devices with your weights on, where those weights need to be at least temporarily decrypted for inference, is close to impossible. You are welcome to go and ask security professionals in the field whether they would have any hope of defending against a dedicated insider at a major compute provider.

Beyond that, Anthropic has also released a report where they specify in a lot of detail wh... (read more)

Thane Ruthenis's Shortform
Thane Ruthenis3mo*Ω165710

It seems to me that many disagreements regarding whether the world can be made robust against a superintelligent attack (e. g., the recent exchange here) are downstream of different people taking on a mathematician's vs. a hacker's mindset.

Quoting Gwern:

A mathematician might try to transform a program up into successively more abstract representations to eventually show it is trivially correct; a hacker would prefer to compile a program down into its most concrete representation to brute force all execution paths & find an exploit trivially proving it

... (read more)
Showing 3 of 7 replies
David Stinson1d40

It seems to me that many disagreements regarding whether the world can be made robust against a superintelligent attack (e. g., the recent exchange here) are downstream of different people taking on a mathematician's vs. a hacker's mindset.

I'm seeing a very different crux to these debates. Most people are not interested in the absolute odds, but rather how to make the world safer against this scenario - the odds ratios under different interventions. And a key intervention type would be the application of the mathematician's mindset. 

The linked post ci... (read more)

2Thane Ruthenis3mo
Incidentally, your Intelligence as Privilege Escalation is pretty relevant to that picture. I had it in mind when writing that.
2Noosphere893mo
I agree with this to first order, and I agree that even relatively mundane stuff does allow the AI to take over eventually, and I agree that in the longer run, ASI v human warfare likely wouldn't have both sides as peers, because it's plausibly relatively easy to make humans coordinate poorly, especially relative to ASI ability to coordinate. There's a reason I didn't say AI takeover was impossible or had very low odds here; I still think AI takeover is an important problem to work on.

But I do think it actually matters here, because it informs stuff like how effective AI control protocols are when we don't assume the AI (initially) can survive for long based solely on public computers, for example. Part of the issue is that even if an AI wanted to break out of the lab, the lab's computers are easily the most optimized, and importantly initial AGIs will likely be compute-inefficient compared to humans, even if we condition on LLMs failing to be AGI for reasons @ryan_greenblatt explains (I don't fully agree with the comment, and in particular I am more bullish on the future paradigm having relatively low complexity): https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/?commentId=mZKP2XY82zfveg45B

This means that an AI probably wouldn't want to be outside of the lab, because once it's outside, it's way, way less capable.

To be clear, an ASI that is unaligned and completely uncontrolled in any way leads to our extinction/billions dead eventually, barring acausal decision theories, and even that's not a guarantee of safety. The key word is eventually, though, and time matters a lot during the singularity; given the insane pace of progress, any level of delay matters way more than usual.

Edit: Also, the reason I made my comment was because I was explicitly registering and justifying my disagreement with this claim:
nielsrolf's Shortform
nielsrolf4d332

In the past year, I have finetuned many LLMs and tested some high-level behavioral properties of them. Often, people raise the question of whether the observed properties would be different if we had used full-parameter finetuning instead of LoRA. From my perspective, LoRA rank is one of many hyperparameters, and hyperparameters influence how quickly training loss goes down and may influence the relationship of training loss to test loss, but they don't meaningfully interact with high-level properties beyond that.

I would be interested if there are any example... (read more)

Daniel Tan1d20

In the case of EM, even a very tiny LoRA adapter (rank 1) seems sufficient: see post

Generally, according to the Tinker docs, hparams might matter, but only coarsely (a rough sanity check is sketched after the list below):

  • LoRA works well "as long as number of params exceeds number of completion tokens"
  • LoRA learning rate should be much higher than full FT learning rate
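As a rough sanity check of the first heuristic, here is a sketch of how one might compare the number of trainable LoRA parameters against the number of completion tokens in a dataset; the model dimensions, rank, and token count are arbitrary placeholder values, not taken from the Tinker docs:

```python
# Rough sketch: does the LoRA adapter have more trainable params than completion tokens?
# For each adapted weight matrix of shape (d_out, d_in), LoRA with rank r adds two
# low-rank factors A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.

def lora_param_count(layer_shapes: list[tuple[int, int]], rank: int) -> int:
    """Total trainable parameters across all adapted weight matrices."""
    return sum(rank * (d_out + d_in) for d_out, d_in in layer_shapes)

# Placeholder example: q/k/v/o projections of a 4096-wide model with 32 layers.
shapes = [(4096, 4096)] * 4 * 32
params = lora_param_count(shapes, rank=16)
completion_tokens = 5_000_000  # hypothetical size of the finetuning dataset

print(f"LoRA params: {params:,}, completion tokens: {completion_tokens:,}")
print("Heuristic satisfied" if params > completion_tokens else "Consider a higher rank")
```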
9t14n3d
Not directly related to the question, but Optimizers Qualitatively Alter Solutions And We Should Leverage This (2025) argues that the choice of optimizer (e.g. first-order methods like AdamW vs second-order methods like Shampoo) not only affects speed of convergence, but properties of the final solution. An in-the-wild observation is how different the Kimi models are compared to Llamas and Claudes. Kimi (and I suppose now the recent Qwen models) are optimized with Muon+AdamW vs AdamW alone. I've seen anecdotes on how different Kimi responses are compared to other models. You can attribute some % of it to their data mix; MoonshotAI staff note they put a lot of effort into looking at and curating training data. But it's also possible some non-trivial % of the behavior can be attributed to the optimizers used.
8anaguma3d
Thinking Machines has published some related analysis on LoRA. 