Parv Mahajan

15d

The evening after Claude’s new constitution was published, about 15 AI safety FTEs and Astra fellows discussed the constitution, its weaknesses, and its implications. After the discussion, I compiled some of their most compelling recommendations:

Increase transparency about the character training process.
Much of the document is purposefully hedged and vague in its exact prescriptions; therefore, the training process used to instill the constitution is extremely load-bearing. We wish more of this information were in the accompanying blog post and supplementary material. We think sharing it is unlikely to leak any trade secrets, and even a blog-post-level overview, like the one that accompanied the constitution in 2023, would provide valuable information to external researchers.

High-level overview of Constitutional...

Replying to You will be OK


To clarify, the original post was not meant to be resigned or maximally doomerish. I intend to win in worlds where winning is possible, and I was trying to get across the feeling of doing that while recognizing things are likely(?) to not be okay.

I agree that being in the daily, fight-or-flight, anxiety-inducing super-emergency mode of thought that thinking about x-risk can induce is very bad. But it's important to note you can internalize the risks and probable futures very deeply, including emotionally, while still being productive, happy, sane, etc. That means having a high distaste for drama, forgiving yourself, picking yourself up, and so on.

This is what I was trying to gesture at, and I think what Boaz is aiming at as well.

Replying to Turning 20 in the probable pre-apocalypse

I think relative impact is an important measure (e.g., for comparing yourself/your org to others in a reference class), but worry about relative-impact-as-a-morale-booster leading to a belief-in-belief. It can be true that I am a better sprinter than my neighbor, but we will both lose to a 747, and it is important for me to internalize that. I think you can be happy/sane while internalizing that!

Replying to Turning 20 in the probable pre-apocalypse

Thanks for the link and advice! Based on some reactions here + initial takes from friends, I think the tone of this post came off much more burn-outy and depressed than I wanted; I feel pretty happy most days, even as I recognize things are Very Strange and grieve more than the median. I also am lucky enough to have a very high bar for burnout, and have made many plans and canaries for what to do in case that day comes.

I think for me, and people in my cluster, getting out of the fight-or-flight mode like you mentioned is very important, but it's also very important to recognize the oddity and urgency of the situation. Psychological pain is not a necessary reaction to the situation we find ourselves in, but it is, in moderation and properly handled, a reasonable one. I worry somewhat about a feeling of Deep Okayness leading to an unfounded belief in "it's all going to be okay."

Hope you're doing well :)

Replying to Turning 20 in the probable pre-apocalypse

Probably not completely; I suspect this is a mix of non-AI things in my life and the fact that there is a very small circle of folks near me who care about/internalize this kind of thing. However, I’d bet that the farther you get from traditional tech circles (e.g., SF), the stronger this feeling is among folks who work on AI safety.

Replying to Turning 20 in the probable pre-apocalypse

I don’t know enough about 00s activism to comment on it confidently, but I would be highly confused if MIRI started a govt/bought sovereign land because it doesn’t seem to align with counterfactually reducing AI takeover risk, and probably fails in the takeover scenarios they’re concerned about anyway. I also get the impression MIRI/OP made somewhat reasonable decisions in the face of high uncertainty, but feel much less confident about that.

That being said, I’m lucky to have an extremely high bar for burnout and high capacity for many projects at once. I’ve of course made plans for what to loudly give up on in case of burnout, but don’t expect those to be used in the near future. Like I gestured at in the post, I think today’s tools are quite good at multiplying effective output in a way that’s very fun and burnout-reducing!

Replying to Turning 20 in the probable pre-apocalypse

Yes, I think most of this is good advice, except that 1% is perhaps a reasonable target (it seems plausible that Ryan Kidd or Neel Nanda have 1%-level impacts).

Also, yes, of course one must simply try their best. Extraordinary times call for extraordinary effort and all that. I do want to caution against trying to believe in order to raise general morale. Belief-in-belief is how you get incorrect assessments of the risks from key stakeholders; I think the goal is a culture like “yes, this probably won’t help enough, but we make a valiant effort because this is highly impactful on the margin and we intend to win in worlds where it’s possible to win.”

Maybe in general I find it unconvincing that despair precludes effort; things are not yet literally hopeless.

Replying to Turning 20 in the probable pre-apocalypse

That's funny, I was going to mention the same Jacob Geller video you linked to! It's a really evocative title; it has probably inspired lots of similar essays. "Intangible distress" and especially "alienation" are really good at capturing the mood in a lot of CS departments right now.

Replying to Turning 20 in the probable pre-apocalypse

Thank you, I'm glad(?) it resonated. I liked "Mourning a life without AI" a lot and reading that encouraged me to publish this.

Replying to Turning 20 in the probable pre-apocalypse

Thanks! I'm surprised it was emotionally impactful, but can definitely see it being relatable. I've found a lot of (especially early-career) AIS folks dealing with this "my friends and family don't internalize this" feeling, but I think this will change once job losses start hitting (thus the "permanent underclass" discourse).

Seriously, use text expansions

2mo

The master version of this post is at https://parvmahajan.com/2025/12/21/turning-20.html

I turn 20 in January, and the world looks very strange. Probably, things will change very quickly. Maybe, one of those things is whether or not we’re still here.

This moment seems very fragile, and perhaps, more than most moments, it will never happen again. I want to capture a little bit of what it feels like to be alive right now.

I.

Everywhere around me there is this incredible sense of freefall and of grasping. I realize with excitement and horror that over a semester Claude went from not understanding my homework to easily solving it, and I recognize this is the most normal things will ever be. Suddenly, the...


METR uses the 10x researcher speedup, as described by Ryan Greenblatt (linked below), as an important threshold of concern. The 10x constant seems quite important here because METR reports, compared to most other independent model assessments, seem very likely to influence lab and government policy decisions.

Is there any work explaining why 10x, and not 15x or 5x?

Greenblatt laying out 3x, 5x, 10x acceleration qualitatively: https://www.alignmentforum.org/posts/LjgcRbptarrRfJWtR/a-breakdown-of-ai-capability-levels-focused-on-ai-r-and-d

METR report relying on 10x: https://evaluations.metr.org/gpt-5-1-codex-max-report/#extrapolating-on-trend-improvements-in-next-6-months

TastyBench: Toward Measuring Research Taste in LLM

2mo

You probably type some things a lot, especially if you do non-trivial amounts of administrative or communicative work. There are also probably things you would type more if it were easier to! For instance:

- Link to your personal website
  - Your email address or phone number
  - Your LinkedIn
  - Your calendar booking link
  - Your GitHub profile
  - Your Google Scholar
- The website/email/contact of an organization you represent
  - “I am contacting you on behalf of XYZ org. We work on…”
- Command aliases across servers/terminals
  - Commit message templates
  - SSH incantations
- Common phrases/sentence fragments
  - “Just wanted to follow up on this!”
  - “Please reach out to”
  - “Please don’t hesitate to reach out if you have any questions or concerns.”
- LLM prompts
  - Something to the effect of “think really carefully about this”
  - Style/content guidelines for deep research, specialized
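For readers who have not used one, here is a minimal Python sketch of the core idea behind a text expander: a table of short triggers mapped to longer snippets, applied to a piece of text. It is a hypothetical illustration, not any particular tool's implementation; real expanders hook into keystrokes system-wide, and the triggers and snippets below are placeholders.

```python
import re

# Hypothetical trigger -> snippet table; swap in your own links and phrases.
SNIPPETS = {
    ";web": "https://example.com",
    ";fu": "Just wanted to follow up on this!",
    ";reach": "Please don't hesitate to reach out if you have any questions or concerns.",
}

# Regex alternation tries options left to right, so list longer triggers first;
# a trigger that is a prefix of another can then never shadow the longer one.
_PATTERN = re.compile(
    "|".join(re.escape(t) for t in sorted(SNIPPETS, key=len, reverse=True))
)


def expand(text: str) -> str:
    """Replace every trigger found in `text` with its snippet."""
    return _PATTERN.sub(lambda m: SNIPPETS[m.group(0)], text)


if __name__ == "__main__":
    print(expand("Hi! ;fu ;reach"))
```

The same table-of-triggers idea is what you configure in a dedicated expansion tool; the sketch only shows why a handful of well-chosen triggers can cover most of the repetitive typing listed above.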

...

o3 writing sample (representative)

Parv Mahajan, Yilin, yix

3mo

This is an early-stage research update. We love feedback and comments!

TL;DR:

- It’s important to benchmark frontier models on non-engineering skills required for AI R&D in order to comprehensively understand progress towards full automation in frontier labs.
- One of these skills is research taste, which includes the ability to choose good projects (e.g., those that accelerate AI progress). In TastyBench, we operationalize this as citation velocity: the rate at which a paper receives citations.
- Based on pairwise rankings of summarized papers, we find that near-frontier models are quite bad at predicting citation velocity, and conclude that they do not have superhuman research taste.
- We suspect citation velocity is a flawed proxy and are continuing to explore non-engineering
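A minimal Python sketch of the kind of scoring the TL;DR describes, assuming citation velocity is simply citations divided by months since publication, and that for each pair of papers the model answers which one it expects to be cited faster. The post's actual definition, time window, and pairing scheme are not given here, so the `Paper` fields, the per-month normalization, and `pairwise_accuracy` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Paper:
    # Hypothetical fields; the benchmark's real schema is not specified here.
    title: str
    citations: int
    months_since_publication: float


def citation_velocity(p: Paper) -> float:
    """Citations per month (an assumed normalization), floored at one month."""
    return p.citations / max(p.months_since_publication, 1.0)


def pairwise_accuracy(papers, predict_first_faster) -> float:
    """Fraction of paper pairs where the predictor picks the faster-cited paper.

    `predict_first_faster(a, b)` should return True if it expects `a` to have
    the higher citation velocity. Pairs with equal true velocity are skipped.
    """
    correct = total = 0
    for a, b in combinations(papers, 2):
        va, vb = citation_velocity(a), citation_velocity(b)
        if va == vb:
            continue
        total += 1
        correct += int(predict_first_faster(a, b) == (va > vb))
    return correct / total if total else float("nan")


if __name__ == "__main__":
    papers = [Paper("A", 120, 12), Paper("B", 10, 10), Paper("C", 300, 6)]
    # A trivial baseline that always picks the first paper of each pair.
    print(pairwise_accuracy(papers, lambda a, b: True))
```

Under this framing, a model whose accuracy sits near 0.5 on a large, balanced set of pairs is doing little better than a coin flip, which is one way to read "quite bad at predicting citation velocity."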

...

We're starting to get awfully close to a "system capable of writing things that we deem worth reading." Recent models quietly got really good at independent creative writing, especially if you have a slightly fleshed-out initial idea.

Reading LLM writing captures the broader feeling of "maybe close to TAI" better than SWE-bench et al. alone.

Arthur Neegan shifted in the acceleration couch, the webbing rasping against his coverall. In front of him a thumb‑sized display, bolted to the bulkhead with four utilitarian screws, scrolled through diagnostics in lurid green letters:

ATTITUDE 000.03°
REACTION MASS 78 %
CABIN pO₂ 20.8 kPa
CREW: 3 + 1 GUEST

Guest, Arthur thought sourly. More like conscript.

Across the narrow aisle two Antares “diplomatic aides”

...

Parv Mahajan's Shortform