Quick Takes

Simon Lermen's Shortform
Simon Lermen · 11h

What's going on with MATS recruitment?

MATS scholars have gotten much better over time according to statistics like mentor feedback, CodeSignal scores and acceptance rate. However, some people don't think this is true and believe MATS scholars have actually gotten worse.

So where are these impressions coming from? I might have a special view on MATS applications, since I did MATS 4.0 and 8.0. I think in both cohorts, the heavily x-risk/AGI-pilled participants were more of an exception than the rule.

"at the end of a MATS program half of the people couldn't really tell... (read more)

Showing 3 of 5 replies
Daniel Tan · 8m

Disagree somewhat strongly with a few points: 

> Intuitively it seems to me that people with zero technical skill but high understanding are more valuable to AI safety than somebody with good skills who has zero understanding of AI safety.

IMO not true. Maybe early on we needed really good conceptual work, and so wanted people who could clearly articulate the pros and cons of Paul Christiano's and Yudkowsky's alignment strategies, etc. So it would have made sense to test accordingly. But I think this is less true now - most senior researchers have more good ideas... (read more)

Mateusz Bagiński · 4h
Perhaps the mentors changed, and the current ones put much more value on stuff like being good at coding, running ML experiments, etc., than on understanding the key problems, having conceptual clarity around AI X-risk, etc. There's certainly more of an ML-streetlighting effect.

The most recent track has 5 mentors on "Agency", out of whom (AFAICT) 2 work on "AI agents", 1 works mostly on AI consciousness & welfare, and only 2 (Ngo & Richardson) work on "figuring out the principles of how [the thing we are trying to point at with the word 'agency'] works". MATS 3.0 (?) had 6 mentors focused on something in this ballpark (Wentworth & Kosoy, Soares & Hebbar, Armstrong & Gorman) (and the total number of mentors was smaller).

It might also be the case that there are proportionally more mentors working for capabilities labs.
Simon Lermen · 9h
probably closer to 55%
Aprillion (Peter Hozák)'s Shortform
Aprillion · 35m

Imitation of error handling with try-catch slop is not "defensive programming", it's "friendly-fire programming" 🤌
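A hypothetical Python sketch of the pattern being mocked (not from the original post; the function and key names are made up): an except block that swallows the error and returns a plausible-looking default turns a loud failure into quiet corruption, which is the opposite of defensive programming.

```python
import json

# "Friendly-fire programming": the except block hides the real failure and
# hands back a plausible-looking default, so the caller never learns the
# config was unreadable or malformed.
def load_config_slop(path: str) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:
        return {}

# Closer to defensive programming: validate what you can, and let
# unexpected errors surface loudly instead of swallowing them.
def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    if "api_url" not in config:
        raise ValueError(f"{path} is missing required key 'api_url'")
    return config
```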

jacquesthibs's Shortform
jacquesthibs · 1d

Habryka responding to Ryan Kidd:

> the bar at MATS has raised every program for 4 years now

What?! Something terrible must be going on in your mechanisms for evaluating people (which to be clear, isn't surprising, indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly). 

It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y'all and you thought it was worth the trade

... (read more)
Showing 3 of 6 replies
Jonas Hallgren · 6h
I also just want to point out that there's a base-rate effect here: early cohorts will naturally have higher context, since before MATS and similar programs there weren't really that many AI safety training programs. So the initial people you get will automatically be higher-context, because the sample is drawn from people who have already worked on it/learnt about it for a while. This should go down over time as the higher-context individuals get taken in? (I don't know how large this effect would be, but I just wanted to point it out.)
Erich_Grunewald · 11h
Does it assume that? There are many ways for governments to adjust for d/acc tech being less innately appealing by intervening on market incentives, for example through subsidies, tax credits, benefits for those who adopt these products, etc. Doing that may in various ways be more tractable than command-and-control regulation. But either way, doing either (incentivising or mandating) seems easier once the tech actually exists and is somewhat proven, so you may want founders to start d/acc projects even if you think they would not become profitable in the free market, and even if you want to mandate that tech eventually. (That is not to say that there is a lot of useful d/acc tech waiting to be created that would make a major difference if implemented. I just think that, if there is, then that tech being able to compete economically isn't necessarily a huge problem.)
julius vidal · 1h

You are right that I am being a bit reductive. Maybe it would be better to say it assumes some kind of ideal combination of innovation, markets, and technocratic governance would be enough to prevent catastrophe?

And to be clear, I do think it's much better for people to be working on defensive technologies than not to. And it's not impossible that the right combination of defensive entrepreneurs and technocratic government incentives could genuinely solve a problem.

But I think this kind of faith in "business as usual, but a bit better" can lead to a kind of complacency where you conflate working on good things with actually making a difference.

Mo Putera's Shortform
Mo Putera · 3d

A sad example of what Scott Aaronson called bureaucratic blankface: Hannah Cairo, who at 17 published a counterexample to the longstanding Mizohata-Takeuchi conjecture which electrified harmonic analysis experts the world over, decided after completing the proof to apply to 10 graduate programs. 6 rejected her because she didn't have an undergraduate degree or a high school diploma (she'd been advised by Zvezdelina Stankova, founder of the top-tier Berkeley Math Circle, to skip undergrad at 14 and enrol straight in grad-level courses as she'd already taught her... (read more)

Showing 3 of 4 replies
George Ingebretsen · 2d

Relatedly, Stankova's Berkeley Math Circle program was recently shut down due to new stringent campus background check requirements. Very sad.

Also, she was my undergrad math professor last year and was great.

Shankar Sivarajan · 3d
This doesn't mean what you think it means. It's code for racial discrimination.
Mo Putera · 3d
Yeah, wonder what Tabarrok meant by that; he'd obviously know this.
Wei Dai's Shortform
Wei Dai · 18h

Having finally experienced the LW author moderation system firsthand by being banned from an author's posts, I want to make two arguments against it that may have been overlooked: the heavy psychological cost inflicted on a commenter like me, and a structural reason why the site admins are likely to underweight this harm and its downstream consequences.

(Edit: To prevent a possible misunderstanding, this is not meant to be a complaint about Tsvi, but about the LW system. I understand that he was just doing what he thought the LW system expected him to do. I... (read more)

Showing 3 of 19 replies
MondSemmel · 2h

Thanks, that was a clear way to describe both perspectives here. Very helpful.

dr_s · 3h
I think the answer to this is, "because the post, specifically, is the author's private space". So they get to decide how to conduct discussion there (for reference, I always set moderation to Easy Going on mine, but I can see a point even to Reign of Terror if the topic is spicy enough). The free space for responses and rebuttals isn't supposed to be the comments of the post, but the ability to write a different post in reply.

I do agree that in general if it comes to that - authors banning each other from comments and answering just via new posts - then maybe things have already gotten a bit too far into "internet drama" land and everyone could use some cooling down. And it's generally probably easier to keep discussions on a post in the comments of the post. But I don't think the principle is inherently unfair; you have the same exact rights as the other person and can always respond symmetrically, that's fairness.
Wei Dai · 2h
I think that's the official explanation, but even the site admins don't take it seriously. Because if this is supposed to be true, then why am I allowed to write and post replies directly from the front page Feed, where all the posts and comments from different authors are mixed together, and authors' moderation policies are not shown anywhere? Can you, looking at that UI, infer that those posts and comments actually belong to different "private spaces" with different moderators and moderation policies?
Tapatakt's Shortform
Tapatakt · 3d

Is anyone pushing for writing that raises awareness about AI risks to be simpler?

Not inferential-distance-simple, but stylistically-simple.

I'm translating the online materials for IABIED into Russian. They have sentences like this:

> The wonder of natural selection is not its robust error-correction covering every pathway that might go wrong; now that we’re dying less often to starvation and injury, most of modern medicine is treating pieces of human biology that randomly blow up in the absence of external trauma.

This is not cherrypicked at all. It's from the last pag... (read more)

Viliam · 2h

Who is the target audience? If general population, it is bad. If educated people who identify as "I am very smart", it is good.

the gears to ascenscion's Shortform
the gears to ascension · 2h

"I opened lesswrong to look something up and it overwrote my brain state, oops"

This is a sentence I just said and want to not say many more times in my life. I will think later about what to do about it.

Alex_Altair's Shortform
Alex_Altair · 1d

A hot math take

As I learn mathematics I try to deeply question everything, and pay attention to which assumptions are really necessary for the results that we care about. Over time I have accumulated a bunch of “hot takes” or opinions about how conventional math should be done differently. I essentially never have time to fully work out whether these takes end up with consistent alternative theories, but I keep them around.

In this quick-takes post, I’m just going to really quickly write out my thoughts about one of these hot takes. That’s because I’m doing... (read more)

Showing 3 of 10 replies
tailcalled · 5h

The distributivity property is closely related to multiplication being repeated addition. If you break one of the numbers apart into a sum of 1s and then distribute over the sum, you get repeated addition.
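Spelled out as a one-line derivation (a minimal sketch of the step described above, breaking b into a sum of ones):

```latex
a \cdot b
  = a \cdot (\underbrace{1 + 1 + \cdots + 1}_{b \text{ ones}})
  = \underbrace{a \cdot 1 + a \cdot 1 + \cdots + a \cdot 1}_{\text{by distributivity}}
  = \underbrace{a + a + \cdots + a}_{b \text{ copies}}
```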

TsviBT · 11h
My guess would be that we actually want to view there as being multiple basic/intuitive cognitive starting points, and they'd correspond to different formal models.

As an example, consider steps / walking. It's pretty intuitive that if you're on a straight path, facing in one fixed direction, there's two types of actions--walk forward a step, walk backward a step--and that these cancel out. This corresponds to addition and subtraction, or addition of positive numbers and addition of negative numbers. In this case, I would say that it's a bit closer to the intuitive picture if we say that "take 3 steps backward" is an action, and doing actions one after the other is addition, and so that action would be the object "-3"; and then you get the integers.

I think there just are multiple overlapping ways to think of this, including multiple basic intuitive ones. This is a strange phenomenon, one which Sam has pointed out. I would say it's kinda similar to how sometimes you can refactor a codebase infinitely, or rather, there's several different systemic ways to factor it, and they are each individually coherent and useful for some niche, but there's not necessarily a clear way to just get one system that has all the goodnesses of all of them and is also a single coherent system. (Or maybe there is, IDK. Or maybe there's some elegant way to have it all.)

Another example might be "addition as combining two continuous quantities" (e.g. adding some liquid to some other liquid, or concatenating two lengths). In this case, the unit is NOT basic, and the basic intuition is of pure quantity; so we really start with R.
Sam Marks · 13h
It sounds like you might be looking for Peano's axioms for arithmetic (which essentially formalize addition as being repeated "add 1" and multiplication as being repeated addition) or perhaps explicit constructions of various number systems (like those described here). The drawback of these definitions is that they don't properly situate these number systems as "core" examples of rings.

For example, one way to define the integers is to first define a ring and then define the integers to be the "smallest" or "simplest" ring (formally: the initial object in the category of rings). From this, you can deduce that all integers can be formed by repeatedly summing 1s or −1s (else you could make a smaller ring by getting rid of the elements that aren't sums of 1s and −1s) and that multiplication is repeated addition (because a⋅b = a⋅(1+⋯+1) = a+⋯+a, where there are b terms in these sums).

(It's worth noting that it's not the case in all rings that multiplication, addition, and "plus 1" are related in these ways. E.g. it would be rough to argue that if A and B are matrices then the product AB corresponds to summing A with itself B times. So I think it's a reasonable perspective that multiplication and addition are independent "in general", but the simplicity of the integers forces them to be intertwined.)

Some other notes:

1. Defining −a to be the additive inverse of a is the same as defining it as the solution to a+x=0. No matter which approach you take, you need to prove the same theorems to show that the notion makes sense (e.g. you need to prove that −a+−b=−(a+b)).
2. Similarly, taking Q to be the field of fractions of Z is equivalent to insisting that all equations ax=b have a solution, and the set of theorems you need to prove to make sure this is reasonable are the same.
3. In general, note that giving a definition doesn't mean that there's actually any object that actually satisfies that definition. E.g. I can perfectly well define α to be an integer such th
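For reference, a minimal rendering of the Peano-style recursive definitions alluded to above (my paraphrase, not a quote from any particular construction): addition is iterated successor S, and multiplication is iterated addition.

```latex
\begin{aligned}
a + 0 &= a, & a + S(b) &= S(a + b),\\
a \cdot 0 &= 0, & a \cdot S(b) &= a \cdot b + a.
\end{aligned}
```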
Brendan Long's Shortform
Brendan Long · 2d

I got to approximately my goal weight (18% body fat) and wanted to start gaining muscle[1] instead, so I stopped taking retatrutide to see what would happen. Nothing changed for about two weeks and then suddenly I was completely ravenous and ended up just wanting snack food. It's weird because I definitely used to always feel that way, and it was just "normal". I mostly kept the weight gain at bay with constant willpower.

I'm going to try taking around a quarter of my previous dose and see if it makes it easier to stay at approximately this weight and ... (read more)

MichaelDickens · 19h
Are you also lifting weights? I'm quite confident that you can gain muscle while taking retatrutide if you lift weights. IIRC GLP-1 agonists cause more muscle loss than "old-fashioned" dieting, but the effect of resistance training far outweighs the extra muscle loss.
Brendan Long · 7h

Yeah, muscle loss hasn't been a problem for me. I can do more pull-ups and push-ups, and hike longer and faster, than when I started. Progress was really slow with a significant calorie deficit.

I'm trying a much lower dose now to see if I can build muscle without rapidly regaining the weight.

Separately, I'm just really bad at dealing with the complexity of weights. I'm going to see if Crossfit helps this week.

Rachel Shu's Shortform
Rachel Shu · 7h

I just co-wrote some highly novel metaphysics with my good buddy Claude, what could possibly go wrong?

Daniel Paleka's Shortform
Daniel Paleka · 11h

Four views of AI automation: model, researcher, lab, economy

Every serious AI lab wants to automate itself. I believe this sentence holds predictive power over AI timelines and all other predictions about the future. In particular, I believe taking the AI-lab-centric view is the right way to think about automation.

In this post, I want to present the different levels of abstraction at which AI automation can be thought of:

  1. Model: The model is being optimized by the lab. The right measure of acceleration is how much a model can autonomously do on tasks o
... (read more)
Jemist's Shortform
J Bostock · 14h

Spitballing:

Deep learning understood as a process of up- and down-weighting circuits is incredibly similar conceptually to logical induction.

Pre- and post-training LLMs is like juicing the market so that all the wealthy traders are different human personas, then giving extra liquidity to the ones we want.

I expect that the process of an agent cohering from a set of drives into a single thing is similar to the process of a predictor inferring the (simplicity-weighted) goals of an agent by observing it. RLVR is like rewarding traders which successfully predic... (read more)

Wei Dai's Shortform
Wei Dai · 6d

An update on this 2010 position of mine, which seems to have become conventional wisdom on LW:

> In my posts, I've argued that indexical uncertainty like this shouldn't be represented using probabilities. Instead, I suggest that you consider yourself to be all of the many copies of you, i.e., both the ones in the ancestor simulations and the one in 2010, making decisions for all of them. Depending on your preferences, you might consider the consequences of the decisions of the copy in 2010 to be the most important and far-reaching, and therefore act mostly

... (read more)
niplav · 17h
What made you believe that? I find it hard to even conceptualize how to think through something like that, including the anthropics, which computationally powerful universes to admit, &c. My intuition is that allowing universes with hypercomputation puts us in a dovetailer being run almost surely somewhere in the most computationally powerful universes, but that this all introduces a ton of difficulties into reasoning about the multiverse and our position inside of it.
Wei Dai · 17h

Yeah, my intuition is similar to yours, and it seems very difficult to reason about all of this. That just represents my best guess.

GradientDissenter's Shortform
GradientDissenter · 3d

The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.

This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value... (read more)

Showing 3 of 15 replies
RationalElf · 1d
Tone note: I really don't like people responding to other people's claims with content like "No. Bad... Bad naive consequentialism" (I'm totally fine with "Really not what I support. Strong disagree."). It reads quite strongly to me as trying to scold someone or socially punish them using social status for a claim that you disagree with; it feels continuous with some kind of frame that's like "habryka is the arbiter of the Good".
MichaelDickens · 17h

FWIW I think Habryka was right to call out that some parts of my comment were bad, and the scolding got me to think more carefully about it.

habryka · 1d
It sounds like scolding someone because it is! Like, IDK, sometimes that's the thing you want to do? I mean, I am not the "arbiter of the good", but like, many things are distasteful and should be reacted to as such. I react similarly to people posting LLM slop on LW (usually more in the form of "wtf, come on man, please at least write a response yourself, don't copy paste from an LLM") and many other things I see as norm violations.

I definitely consider the thing I interpreted Michael to be saying a norm violation of LessWrong, and endorse lending my weight to norm enforcement of that (he then clarified in a way that I think largely defused the situation, but I think I was pretty justified in my initial reaction). Not all spaces I participate in are places where I feel fine participating in norm enforcement, but of course LessWrong is one such place!

Now, I think there are fine arguments to be made that norm enforcement should also happen at the explicit intellectual level and shouldn't involve more expressive forms of speech. IDK, I am a bit sympathetic to that, but feel reasonably good about my choices here, especially given that Michael's comment started with "I agree", therefore implying that the things he was saying were somehow reflective of my personal opinion. It seems eminently natural that when you approach someone and say "hey, I totally agree with you that <X>" where X is something they vehemently disagree with (like, IDK, imagine someone coming to you and saying "hey, I totally agree with you that child pornography should be legal" when you absolutely do not believe this), that they respond the kind of way I did.

Overall, feedback is still appreciated, but I think I would still write roughly the same comment in a similar situation!
Jemist's Shortform
J Bostock · 21h

Two Kinds of Empathy

Seems like there's two strands of empathy that humans can use.

The first kind is emotional empathy, where you put yourself in someone's place and imagine what you would feel. This one usually leads to sympathy, giving material assistance, comforting.

The second kind is agentic empathy, where you put yourself in someone's place and imagine what you would do. This one more often leads to giving advice.

A common kind of problem occurs when we deploy one type of empathy but not the other. John Wentworth has written about how (probably due to l... (read more)

Cole Wyeth's Shortform
Cole Wyeth · 1d

Inkhaven is an interesting time where any engagement with e.g. Wentworth, Demski, or habryka empirically earns a full post in reply :)

testingthewaters's Shortform
testingthewaters · 1d

Follow-up to https://vitalik.eth.limo/general/2025/11/07/galaxybrain.html

Here is a galaxy brain argument I see a lot:

"We should do [X], because people who are [bad quality] are trying to do [X] and if they succeed the consequences will be disastrous."

Usually [X] is some dual use strategy (acquire wealth and power, lie to their audience, build or use dangerous tech) and [bad quality] is something like being reckless, malicious, psychopathic etc. Sometimes the consequence is zero sum (they get more power to use to do Bad Things relative to us, the Good Peopl... (read more)

leogao's Shortform
leogao · 2d

creating surprising adversarial attacks using our recent paper on circuit sparsity for interpretability

we train a model with sparse weights and isolate a tiny subset of the model (our "circuit") that does this bracket counting task where the model has to predict whether to output ] or ]]. It's simple enough that we can manually understand everything about it, every single weight and activation involved, and even ablate away everything else without destroying task performance.
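For concreteness, here's a rough Python sketch of what a bracket-counting task of this flavor could look like as data (purely hypothetical; the actual task, tokenization, and setup in the paper may differ):

```python
import random

# Hypothetical bracket-counting task: given a prefix of nested brackets,
# the model must predict whether the continuation is "]" or "]]".
def make_example(max_depth: int = 6) -> tuple[str, str]:
    depth = random.randint(2, max_depth)      # how many "[" we open
    still_open = random.choice([1, 2])        # how many remain unclosed
    prefix = "[" * depth + "x" + "]" * (depth - still_open)
    label = "]" * still_open                  # what the model should emit next
    return prefix, label

if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(make_example())
```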

(this diagram is for a slightly different task because i spent an embarrassingly la... (read more)

Showing 3 of 4 replies
Thane Ruthenis · 1d
Aside: For me, this paper is potentially the most exciting interpretability result of the past several years (since SAEs). Scaling it to GPT-3 and beyond seems like a very promising direction. Great job!
habryka · 1d

I agree! I admit I am not optimistic, but I am still very glad to see this.

leogao · 2d
i don't have a graph for it. the corresponding number is p(correct) = 0.25 at 63 elements for the one dense model i ran this on. (the number is not in the paper yet because this last result came in approximately an hour ago.) the other relevant result in the paper for answering the question of how similar our sparse models are to dense models is figure 33.
Wei Dai's Shortform
Wei Dai · 12d

The Inhumanity of AI Safety

A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!

B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!

A: No wait! Don't work on AI capabilities, that's actually negative EV!

B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.

A: No! The problem you chose is too legible!

B: WTF! Alright you win, I'll give up my sunken costs yet again, and pick something illegible.... (read more)

Showing 3 of 9 replies
Richard_Ngo · 10d
If a person is courageous enough to actually try to solve a problem (like AI safety), and high-integrity enough to avoid distorting their research due to social incentives (like incentives towards getting more citations), and honest enough to avoid self-deception about how to interpret their research, then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction. One basic mechanism is that they start pursuing lines of thinking that don't immediately make much sense to other people, and the more cutting-edge research they do the more their ontology will diverge from the mainstream ontology.
Wei Dai · 9d
This has pretty low argumentative/persuasive force in my mind. Why? I'm not seeing the logic of how your premises lead to this conclusion. And even if there is this tendency, what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this? And even the hypothetical virtuous person who starts doing illegible research on their own, what happens when other people catch up to him and the problem becomes legible to leaders/policymakers? How would they know to stop working on that problem and switch to another problem that is still illegible?
Richard_Ngo · 2d

> This has pretty low argumentative/persuasive force in my mind.

Note that my comment was not optimized for argumentative force about the overarching point. Rather, you asked how they "can" still benefit the world, so I was trying to give a central example.

In the second half of this comment I'll give a couple more central examples of how virtues can allow people to avoid the traps you named. You shouldn't consider these to be optimized for argumentative force either, because they'll seem ad-hoc to you. However, they might still be useful as datapoints.

Figurin... (read more)
