MATS scholars have gotten much better over time according to metrics like mentor feedback, CodeSignal scores, and acceptance rate. However, some people don't think this is true and believe MATS scholars have actually gotten worse.
So where are these critics coming from? I may have an unusual vantage point on MATS applications, since I did both MATS 4.0 and 8.0. In both cohorts, I think the heavily x-risk, AGI-pilled participants were more the exception than the rule.
"at the end of a MATS program half of the people couldn't really tell...
Disagree somewhat strongly with a few points:
Intuitively, it seems to me that people with zero technical skill but a deep understanding of AI safety are more valuable to the field than people with good technical skills but zero understanding of it.
IMO not true. Maybe early on we needed really good conceptual work, and so wanted people who could clearly articulate the pros and cons of Paul Christiano's and Yudkowsky's alignment strategies, etc. So it would have made sense to test accordingly. But I think this is less true now - most senior researchers have more good ideas...
Habryka responding to Ryan Kidd:
...> the bar at MATS has raised every program for 4 years now
What?! Something terrible must be going on in your mechanisms for evaluating people (which to be clear, isn't surprising, indeed, you are the central target of the optimization that is happening here, but like, to me it illustrates the risks here quite cleanly).
It is very very obvious to me that median MATS participant quality has gone down continuously for the last few cohorts. I thought this was somewhat clear to y'all and you thought it was worth the trade
You are right that I am being a bit reductive. Maybe it would be better to say that it assumes some kind of ideal combination of innovation, markets, and technocratic governance would be enough to prevent catastrophe?
And to be clear, I do think it's much better for people to be working on defensive technologies than not to. And it's not impossible that the right combination of defensive entrepreneurs and technocratic government incentives could genuinely solve a problem.
But I think this kind of faith in business-as-usual-but-a-bit-better can lead to a kind of complacency where you conflate working on good things with actually making a difference.
A sad example of what Scott Aaronson called bureaucratic blankface: Hannah Cairo, who at 17 published a counterexample to the longstanding Mizohata-Takeuchi conjecture, electrifying harmonic analysis experts the world over, decided after completing the proof to apply to 10 graduate programs. Six rejected her because she had neither an undergraduate degree nor a high school diploma (she'd been advised by Zvezdelina Stankova, founder of the top-tier Berkeley Math Circle, to skip undergrad at 14 and enrol straight in grad-level courses as she'd already taught her...
Relatedly, Stankova’s Berkeley Math Circle program was recently shut down due to new stringent campus background check requirements. Very sad.
Also, she was my undergrad math professor last year and was great.
Having finally experienced the LW author moderation system firsthand by being banned from an author's posts, I want to make two arguments against it that may have been overlooked: the heavy psychological cost inflicted on a commenter like me, and a structural reason why the site admins are likely to underweight this harm and its downstream consequences.
(Edit: To prevent a possible misunderstanding, this is not meant to be a complaint about Tsvi, but about the LW system. I understand that he was just doing what he thought the LW system expected him to do. I...
Thanks, that was a clear way to describe both perspectives here. Very helpful.
Not inferential-distance-simple, but stylistically-simple.
I translate the online materials for IABIED into Russian. They contain sentences like this:
The wonder of natural selection is not its robust error-correction covering every pathway that might go wrong; now that we’re dying less often to starvation and injury, most of modern medicine is treating pieces of human biology that randomly blow up in the absence of external trauma.
This is not cherrypicked at all. It's from the last pag...
Who is the target audience? If the general population, it is bad. If educated people who identify as "I am very smart", it is good.
As I learn mathematics I try to deeply question everything, and pay attention to which assumptions are really necessary for the results that we care about. Over time I have accumulated a bunch of “hot takes” or opinions about how conventional math should be done differently. I essentially never have time to fully work out whether these takes end up with consistent alternative theories, but I keep them around.
In this quick-takes post, I’m just going to really quickly write out my thoughts about one of these hot takes. That’s because I’m doing...
The distributivity property is closely related to multiplication being repeated addition. If you break one of the numbers apart into a sum of 1s and then distribute over the sum, you get repeated addition.
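As a concrete instance, writing 3 as a sum of 1s and distributing over it:

$$3 \cdot b \;=\; (1 + 1 + 1) \cdot b \;=\; 1 \cdot b + 1 \cdot b + 1 \cdot b \;=\; b + b + b$$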
I got to approximately my goal weight (18% body fat) and wanted to start gaining muscle[1] instead, so I stopped taking retatrutide to see what would happen. Nothing changed for about two weeks and then suddenly I was completely ravenous and ended up just wanting snack food. It's weird because I definitely used to always feel that way, and it was just "normal". I mostly kept the weight gain at bay with constant willpower.
I'm going to try taking around a quarter of my previous dose and see if it makes it easier to stay at approximately this weight and ...
Yeah, muscle loss hasn't been a problem for me. I can do more pull-ups and push-ups, and hike longer and faster, than when I started. Progress was really slow while at a significant calorie deficit.
I'm trying a much lower dose now to see if I can build muscle without rapidly regaining the weight.
Separately, I'm just really bad at dealing with the complexity of weights. I'm going to see if CrossFit helps this week.
Every serious AI lab wants to automate itself. I believe this sentence holds predictive power for AI timelines and for other predictions about the future more broadly. In particular, I believe taking the AI-lab-centric view is the right way to think about automation.
In this post, I want to lay out the different levels of abstraction at which AI automation can be thought about:
Spitballing:
Deep learning understood as a process of up- and down-weighting circuits is incredibly similar conceptually to logical induction.
Pre-training and post-training an LLM is like juicing the market so that all the wealthy traders are different human personas, then giving extra liquidity to the ones we want.
I expect that the process of an agent cohering from a set of drives into a single thing is similar to the process of a predictor inferring the (simplicity-weighted) goals of an agent by observing it. RLVR is like rewarding traders which successfully predic...
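To make the first analogy a bit more concrete, here is a toy sketch (my own, not from the post and not an actual logical induction implementation) of multiplicative up- and down-weighting over a pool of "circuits": each circuit is a simple predictor, and weight flows toward the ones that predict well, which is the same bookkeeping a logical inductor's market does with trader wealth.

```python
import random

# Toy sketch: a pool of "circuits" (simple predictors of a binary stream)
# whose weights get multiplicatively up- or down-weighted by how much
# probability they assigned to what actually happened -- analogous to
# trader wealth in a logical induction market. Illustrative only.

def circuit_always_one(history):   # returns P(next bit = 1)
    return 0.9

def circuit_always_zero(history):
    return 0.1

def circuit_repeat_last(history):
    return 0.9 if (history and history[-1] == 1) else 0.1

circuits = [circuit_always_one, circuit_always_zero, circuit_repeat_last]
weights = [1.0] * len(circuits)

rng = random.Random(0)
history = []
for _ in range(200):
    bit = 1 if rng.random() < 0.8 else 0   # stream that is mostly 1s
    for i, c in enumerate(circuits):
        p = c(history) if bit == 1 else 1 - c(history)
        weights[i] *= p                    # up-weight good predictors
    history.append(bit)

total = sum(weights)
print({c.__name__: round(w / total, 4) for c, w in zip(circuits, weights)})
# circuit_always_one ends up with nearly all the weight on this stream.
```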
An update on this 2010 position of mine, which seems to have become conventional wisdom on LW:
...In my posts, I've argued that indexical uncertainty like this shouldn't be represented using probabilities. Instead, I suggest that you consider yourself to be all of the many copies of you, i.e., both the ones in the ancestor simulations and the one in 2010, making decisions for all of them. Depending on your preferences, you might consider the consequences of the decisions of the copy in 2010 to be the most important and far-reaching, and therefore act mostly
Yeah, my intuition is similar to yours, and it seems very difficult to reason about all of this. That just represents my best guess.
The advice and techniques from the rationality community seem to work well at avoiding a specific type of high-level mistake: they help you notice weird ideas that might otherwise get dismissed and take them seriously. Things like AI being on a trajectory to automate all intellectual labor and perhaps take over the world, animal suffering, longevity, cryonics. The list goes on.
This is a very valuable skill and causes people to do things like pivot their careers to areas that are ten times better. But once you’ve had your ~3-5 revelations, I think the value...
FWIW I think Habryka was right to call out that some parts of my comment were bad, and the scolding got me to think more carefully about it.
Seems like there are two strands of empathy that humans can use.
The first kind is emotional empathy, where you put yourself in someone's place and imagine what you would feel. This one usually leads to sympathy, giving material assistance, comforting.
The second kind is agentic empathy, where you put yourself in someone's place and imagine what you would do. This one more often leads to giving advice.
A common kind of problem occurs when we deploy one type of empathy but not the other. John Wentworth has written about how (probably due to l...
Follow-up to https://vitalik.eth.limo/general/2025/11/07/galaxybrain.html
Here is a galaxy brain argument I see a lot:
"We should do [X], because people who are [bad quality] are trying to do [X] and if they succeed the consequences will be disastrous."
Usually [X] is some dual use strategy (acquire wealth and power, lie to their audience, build or use dangerous tech) and [bad quality] is something like being reckless, malicious, psychopathic etc. Sometimes the consequence is zero sum (they get more power to use to do Bad Things relative to us, the Good Peopl...
creating surprising adversarial attacks using our recent paper on circuit sparsity for interpretability
we train a model with sparse weights and isolate a tiny subset of the model (our "circuit") that does this bracket counting task where the model has to predict whether to output ] or ]]. It's simple enough that we can manually understand everything about it, every single weight and activation involved, and even ablate away everything else without destroying task performance.
(this diagram is for a slightly different task because i spent an embarrassingly la...
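if you want to poke at the task itself, here's a minimal sketch of what a bracket-counting dataset like this could look like (my own reconstruction from the description above, not the actual pipeline from the paper): the model sees a prefix of nested brackets and has to predict whether the correct continuation is ] or ]].

```python
import random

# Minimal sketch of a bracket-counting task: the model reads a prefix of
# "[" / "]" tokens and must predict whether the correct continuation is
# "]" (one bracket still open) or "]]" (two still open). This is a
# reconstruction from the description above, not the paper's actual code.

def make_example(rng, max_len=12):
    prefix, depth = [], 0
    while True:
        # Open a bracket if we're at depth 0 or (with some probability)
        # while the prefix is still short; otherwise close one.
        if depth == 0 or (rng.random() < 0.6 and len(prefix) < max_len):
            prefix.append("[")
            depth += 1
        else:
            prefix.append("]")
            depth -= 1
        if len(prefix) >= max_len and depth in (1, 2):
            break
    return "".join(prefix), "]" * depth   # label: "]" or "]]"

rng = random.Random(0)
for _ in range(5):
    x, y = make_example(rng)
    print(f"{x} -> {y}")
```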
I agree! I admit I am not optimistic, but I am still very glad to see this.
The Inhumanity of AI Safety
A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!
B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!
A: No wait! Don't work on AI capabilities, that's actually negative EV!
B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.
A: No! The problem you chose is too legible!
B: WTF! Alright, you win, I'll give up my sunk costs yet again, and pick something illegible....
This has pretty low argumentative/persuasive force in my mind.
Note that my comment was not optimized for argumentative force about the overarching point. Rather, you asked how they "can" still benefit the world, so I was trying to give a central example.
In the second half of this comment I'll give a couple more central examples of how virtues can allow people to avoid the traps you named. You shouldn't consider these to be optimized for argumentative force either, because they'll seem ad-hoc to you. However, they might still be useful as datapoints.
Figurin...