Interested in national security implications of catastrophic AI risk. Reach out whenever!
Yes, I think most of this is good advice, except I think 1% is perhaps a reasonable target (I think it’s reasonable that Ryan Kidd or Neel Nanda have 1%-level impacts, maybe?).
Also, yes, of course one must simply try their best. Extraordinary times call for extraordinary effort and all that. I do want to caution against trying to believe in order to raise general morale. Belief-in-belief is how you get incorrect risk assessments from key stakeholders; I think the goal is a culture like "yes, this probably won't help enough, but we make a valiant effort because this is highly impactful on the margin and we intend to win in worlds where it's possible to win."
Maybe in general I find it unconvincing that despair precludes effort; things are not yet literally hopeless.
That's funny, I was going to mention the same Jacob Geller video you linked to! It's a really evocative title; it has probably inspired lots of similar essays. "Intangible distress" and especially "alienation" are really good at capturing the mood in a lot of CS departments right now.
Thank you, I'm glad(?) it resonated. I liked "Mourning a life without AI" a lot and reading that encouraged me to publish this.
Thanks! I'm surprised it was emotionally impactful, but I can definitely see it being relatable. I've seen a lot of (especially early-career) AIS folks dealing with this "my friends and family don't internalize this" feeling, but I think that will change once job losses start hitting (hence the "permanent underclass" discourse).
METR uses the 10x researcher speedup, as described by Ryan Greenblatt below, as an important threshold of concern. The 10x constant seems quite important here because METR reports, compared to most other independent model assessments, seem very likely to influence lab + govt policy decisions.
Is there any work explaining why 10x, and not 15x or 5x?
Greenblatt laying out 3x, 5x, 10x acceleration qualitatively: https://www.alignmentforum.org/posts/LjgcRbptarrRfJWtR/a-breakdown-of-ai-capability-levels-focused-on-ai-r-and-d
METR report relying on 10x: https://evaluations.metr.org/gpt-5-1-codex-max-report/#extrapolating-on-trend-improvements-in-next-6-months
Yes, we agree that the lack of a human baseline is a key weakness limiting any stronger conclusions we'd like to draw. This is a really interesting proxy task, but the obvious weakness here is assuming real-life trends provide the best ground truth (our task also runs into this, but in a less limiting way). This is also why we're trying to move closer to a task that captures the full R&D loop but with a very heavy emphasis on the non-engineering parts.
Cool work! Did you all test to see what happens if the identity of the other agents isn't mentioned at all? And how about if you say "perfectly rational agents"? Wondering how much of this is just prompt sensitivity to questions that might be on a game theory exam.
Also, testing "perfectly rational agents" would allow you to see whether it's AIs preferentially cooperating with each other or superrationality in general that's driving this.
We're starting to get awfully close to a "system capable of writing things that we deem worth reading." Recent models quietly got really good at independent creative writing, especially if you have a slightly fleshed-out initial idea.
Reading LLM writing captures the broader feeling of "maybe close to TAI" better than SWE-bench et al. alone.
Arthur Neegan shifted in the acceleration couch, the webbing rasping against his coverall. In front of him a thumb‑sized display, bolted to the bulkhead with four utilitarian screws, scrolled through diagnostics in lurid green letters:
ATTITUDE 000.03°
REACTION MASS 78 %
CABIN pO₂ 20.8 kPa
CREW: 3 + 1 GUEST
Guest, Arthur thought sourly. More like conscript.
Across the narrow aisle two Antares “diplomatic aides” sat erect, matte‑black carbines clipped to their pressure suits. Their orange pauldrons bore the stylised double star of the Antares Diplomatic Corps—an organisation legendary for having negotiated eighty‑seven treaties and fired the opening salvo in at least thirty of them.
“Comfortable, Mister Neegan?” the senior aide asked. The tone was courteous; the eyes behind the visor impassive.
“I’d be more comfortable in my own habitat with a mug of recycled coffee,” Arthur answered. “Where exactly are we going?”
“Orbital platform T‑Seven. A representative is eager to speak with you.”
“You people are always eager,” he muttered, but the aide merely inclined his helmeted head.
The thrusters fluttered to life. Mars fell away—a rust‑red disk veined with lights of the growing mining towns. Arthur’s personal claim lay somewhere down there, the lonely Quade Crater dig site where he’d planted a geologist’s flag two years ago. Beneath it—under a mere thirty metres of basalt—rested a filamentous seam of Azra ore that glowed eerily violet in ultraviolet lanterns and accelerated uranium decay by a factor of ten thousand. He had kept the find quiet; obviously not quiet enough.
The shuttle clanged into docking clamps. Hatches irised. Synthetic air with a hint of ozone swept in.
“After you,” the aide said, all politesse.
Arthur floated through a short transfer tube into a cylindrical lounge. Plexi viewports revealed a pearly line-of-sight to Phobos, its surface freckled with military tracking dishes. Someone had gone to expense: brushed‑aluminium panels, a ring of mag‑boots for those who disliked free‑fall, a tea service clipped to a rail.
Waiting in the centre, one boot magnetised to the deck, was a tall woman in a charcoal tunic. No weapons visible, which only meant they were hidden better. Her pale hair was drawn into a geometric bun, and a small pin on her collar showed the same Antares double‑star, rendered in platinum.
“Arthur Neegan.” Her voice was low, precise. “I am Envoy Linnea Vor. Thank you for accepting our invitation.” Arthur caught the faintest twitch at the corner of her mouth—amusement at calling a compulsory escort an invitation.
“I’m a reasonable man, Envoy Vor,” he said. “And reason tells me your men could have dragged me here tied to a cargo pallet. This way I still have my dignity.”
“Precisely.” She gestured to a tethered chair opposite her own. “Shall we dispense with the customary refreshments and proceed to business?”
Business. As if they were haggling over spare drill bits. Arthur strapped himself in. “I suppose you know about the vein.”
“We know many things,” Vor replied. “What matters is what other people are guessing. At last count, five corporate consortia, two naval intelligence services, and the Martian Syndics Council suspect you sit on the largest Azra deposit ever recorded.”
“Suspect isn’t confirm.”
The Envoy’s fingers tapped a neat rhythm on a slate. “Mister Neegan, one of our probes lifted a sample from your refuse slag last week. The spectroscopic signature left no doubt. Congratulations.”
Arthur’s stomach went hollow. He had been careful—he thought.
Vor went on. “The moment word spreads, a dozen entities will try persuasion, then coercion. We intend to spare you that inconvenience.”
“How altruistic.”
“Altruism,” she said lightly, “is a luxury for stable epochs. This is not one. Azra is rewriting the strategic balance of the Spur. A gram can poison a fission warhead into early detonation, or halve an interstellar transit time. Whoever controls supply dictates the pace of colonisation—and of war.”
Cool explanation! I think this is probably post-hoc analysis; some experiments that might provide stronger evidence:
- Reduce the amount of potentially extraneous information you're giving it: "In the sentence below, what does marinade mean?"
- Prompt it or a recent model for completions in a way that reveals the internal definition: "Our script prints the returned 'score' float. Ok. Now we need to improve 'solve_code_contests_rust.py'. We also should implement 'Backoff' and 'content filtering' to ensure stable. But we must ensure whichever we produce is within time. Now we also must ensure marinade, meaning…"
- Something something finding probes that fire on marinade and seeing what they’re correlated to or other interp, not necessarily on GPT-5 but on similarly trained models
I don’t have access to paid versions of GPT-5, so let me know if you do one of these and find something!
I don’t know enough about 00s activism to comment on it confidently, but I would be highly confused if MIRI started a govt/bought sovereign land because it doesn’t seem to align with counterfactually reducing AI takeover risk, and probably fails in the takeover scenarios they’re concerned about anyway. I also get the impression MIRI/OP made somewhat reasonable decisions in the face of high uncertainty, but feel much less confident about that.
That being said, I'm lucky to have an extremely high bar for burnout and a high capacity for many projects at once. I've of course made plans for what to loudly give up on in case of burnout, but don't expect them to be used in the near future. Like I gestured at in the post, I think today's tools are quite good at multiplying effective output in a way that's very fun and burnout-reducing!