
Quinn — LessWrong

I don't have much to add. Just wanted to say hi and say that my AI security strategy routes mostly through secure program synthesis (which I sometimes abbreviate as SPS), but I'm focused on formal verification. And the fact that 12 CVEs in OpenSSL (i.e., probably the coolest thing going on in SPS right now) apparently happened without a single formal method ^[1] feels like it should make me drastically alter my plans.

I don't know for sure that there are zero "formal methods" in the pipeline, but it seems that way from what's been said. ↩︎

Replying to How to Hire a Team

Quinn 12d

W9 work seems to be gaining in popularity, I think possibly for this reason.

(W9 is the USA tax form for "independent contractor", as opposed to W2 which has a slightly(?) tougher compliance burden about how to go about firing) (there are other words for this in other jurisdictions, probably?)

Quinn 22d Quick Take

Me: was chatting a bunch with Gemini and Claude the past week or two about flops accounting, and shipped this prototype for software-side estimation via the CuTe layouts that i don't expect to be that useful in an adversarial setting but i'm open minded and just learning. I'm not gonna do this for Apart's hackathon next weekend, but someone could do some of these basic exercises involving hardware-level flops accounting primitives and related attacks, given simply a gamer laptop.

Gemini:

Hackathon: Adversarial FLOPs Accounting & SAGE Defense

Scope: Weekend / Gamer Laptop (NVIDIA RTX Series)
Core Theme: Measuring, bypassing, and defending GPU compute integrity.

🛠 Prerequisites

Hardware: NVIDIA GPU (Ampere/RTX 30-series or newer preferred).
Environment: Linux (WSL2 works) or Windows

... (read 372 more words →)


Quinn 25d Quick Take

I want to do a full post on "taking ownership of a niche" and against "if you're good at something never do it for free", and that'll come later I hope. Today, I just wanted to let you know that Gemini had this banger quote when I was consolidating my notes on this topic:

ownership is bought with the "inefficient" hours you spend doing what you know is right before anyone else is smart enough to pay you for it.

Lies, Damned Lies, and Proofs: Formal Methods are not Slopless

Quinn

Quinn, Max von Hippel

1mo

We appreciate comments from Christopher Henson, Zeke Medley, Ankit Kumar, and Pete Manolios. This post was initialized by Max’s twitter thread.

Introduction

There's been a lot of chatter recently on HN and elsewhere about how formal verification is the obvious use-case for AI. While we broadly agree, we think much of the discourse is kinda wrong because it incorrectly presumes formal = slopless.^[1] Over the years, we have written our fair share of good and bad formal code. In this post, we hope to convince you that formal code can be sloppy, and that this has serious implications for anyone who hopes to bootstrap superintelligence by using formality to reinforce “good” reasoning.

A mainstay on the... (read 2041 more words →)


Quinn 1mo Quick Take

Effortposts I keep not getting around to

the ROI of specialization is dominated by a term that's uncorrelated with the upsides of what you specifically choose to specialize in
steganography-free certificates: review no-go theorems from information theory in the general case, inspect how onerous the assumptions you need to bound steg capacity are. Is real life a special case where the no-go theorems don't apply?
taelin, lafont, higher order computing, and AI safety. some logic programming module (which is performant and parallel for complicated linear logic reasons) enables a hybrid architecture to supercharge AI capabilities, does this architecture have greater or fewer natural safety properties than transformer+RL?

Replying to Beliefs about formal methods and AI safety

Quinn 1mo

A horizon fellow told me last night that "predeployment and postdeployment" are the actual words i should be using instead of compiletime and runtime.

Quinn 1mo Quick Take

(Finally read If Anyone Builds It): the fable about defensive acceleration in biotech spooked me pretty good, insofar as I think synthesizing an SL5 grade cloud stack is a good idea. This idea of "we think we're doing monotonic defensive acceleration, and in a very real sense we actually are, but nevertheless the gameboard inexorably marches toward Everyone Dies routing through that very defensive acceleration" could soooooo easily be applied to cybersecurity.

Quinn 1mo Quick Take

step-function research and monotonic research.

Some agendas are step-function. Maybe inter-universal Teichmüller theory is an example: if it reaches the threshold where it solves ABC then it pays out, and if it doesn't, its payout is approximately zero ^[1] . Something something MIRI's logic team ^[2] .

Other agendas are monotonic. The more you do, the more you pay out. I think what I do, program synthesis security (which Mike Dodds calls "scalable formal oversight") for defensive acceleration, is like this. I might be able to reduce some monotonic function of 40% of the attack surface by doing 40% of my job; it's better to do more but not a waste of time to do less. This is related to, but distinct from, what John and David sometimes talk about re multi-theory-of-change robustness (i.e. that robustness can be a route toward an agenda being monotonic).

Not sure how to reason about "linear vs superlinear vs sublinear" here. How would you even tell?
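One way to pin the distinction down, as a rough sketch only: every symbol here is an assumption introduced for illustration ($x \in [0,1]$ is the fraction of the agenda completed, $V$ the full payout, $\tau$ the threshold, $k$ a shape parameter), not something measured.

```latex
% Step-function agenda: nothing pays out until the threshold is crossed.
P_{\text{step}}(x) = V \cdot \mathbb{1}[x \ge \tau]

% Monotonic agenda: any nondecreasing f with f(0) = 0, for example
P_{\text{mono}}(x) = V x^{k}, \qquad
\begin{cases}
k < 1 & \text{sublinear (early work pays most)} \\
k = 1 & \text{linear} \\
k > 1 & \text{superlinear (late work pays most)}
\end{cases}
```

Under this toy parameterization, telling the regimes apart would mean estimating $k$ from observed payout at several levels of completed work, which is exactly the part that is unclear in practice.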

caveat, mathematicians don't make a habit of making bets on which tool stacks will prove useful for what purposes. ITT may or may not end up helping with something other than the ABC conjecture (which it failed at), MIRI's logic team outputs may end up (and, IMO at least in terms of my own epistemic development, have) helping with something other than solving AGI alignment. ↩︎
tonal clarification, I have a deep respect for the MIRI logic team, I think they displayed both courage and intellectual prowess that are aeons beyond me. I wouldn't call them out here if I didn't think many of them agree that what they were trying to do was mostly step-function. ↩︎

Replying to Toss a bitcoin to your Lightcone – LW + Lighthaven's 2026 fundraiser

Quinn 1mo

Thanks for your reply

$1k today.


Replying to Toss a bitcoin to your Lightcone – LW + Lighthaven's 2026 fundraiser

Quinn 1mo

I'm confused about how I relate to this personally.

I've been, it feels, almost wildly successful lately, or at least more successful than I thought I'd be, and it's all downstream of a paper. And I'm not sure how that paper exists without Lighthaven (met coauthor as a guest/office visitor during a MATS semester, and it was at Lighthaven that Buck argued that we should pivot from novel architecture proposal to eval). To say nothing of what I owe to LessWrong! I really owe a lot to you guys, intellectually and emotionally but also you could operationalize it as literal cash. It would be a grievous free rider problem for me to donate... (read more)

Can We Secure AI With Formal Methods? November-December 2025

Quinn

3mo

We did the rebrand! The previous thumbnail was a baseball metaphor, but it was very clearly someone getting out, not safe. I was testing all of you and each of you FAILED.

Here’s the prompt for the new thumbnail:


i’m keeping AI in a box, doing AI Confinement (like in yampolskiy 2012), using formal verification / formal methods. That’s my whole thing. I need art for my newsletter on these topics. I like the percival story from troyes/wagner and i like tolkien, but if you take from those elements

... (read 3191 more words →)

is there a quick link I can point someone to if they don't speak Berkelese and I want to say "bayes points"?

Formal confinement prototype

Quinn

3mo

This whitepaper was a preliminary approach to using proof certificates in a secure program synthesis protocol last summer.

Abstract

We would like to put the AI in a box. We show how to create an interface between the box and the world out of specifications in Lean. It is the AI's responsibility to provide a proof that its (restricted) output abides by the spec. The runnable prototype is at https://github.com/for-all-dev/formal-confinement.
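To make the shape of that interface concrete, here is a minimal Lean 4 sketch of the idea in the abstract. It is not taken from the formal-confinement repository: `Spec`, `Certified`, `boundedBelow100`, `example42`, and `release` are all hypothetical names chosen for illustration, and the spec itself is a toy.

```lean
-- Hypothetical sketch; none of these names come from the prototype repo.

/-- A spec is a predicate on the outputs the box may emit. -/
abbrev Spec (α : Type) := α → Prop

/-- An output packaged with a proof that it satisfies the spec. -/
structure Certified {α : Type} (spec : Spec α) where
  value : α
  proof : spec value

/-- Toy spec: the emitted number must stay below 100. -/
def boundedBelow100 : Spec Nat := fun n => n < 100

/-- The boxed side has to supply the proof alongside the value. -/
def example42 : Certified boundedBelow100 where
  value := 42
  proof := by
    show (42 : Nat) < 100
    decide

/-- The world-facing side only ever accepts certified values. -/
def release {α : Type} {spec : Spec α} (c : Certified spec) : α :=
  c.value
```

The design point this is meant to show is that `release` cannot be called on a bare value: the only way across the boundary is to construct a `Certified`, which requires the proof term.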

Related Work (excerpt)

In the seminal "A note on the confinement problem", Lampson 1973 states confinement rules to reason about when a program is or isn't confined.

Total isolation or transitivity: either it does not call any other program or if it calls another program that program

... (read 155 more words →)

Please Measure Verification Burden

Quinn

3mo

I appreciate Theodore Ehrenborg's and Max von Hippel's comments.

Introduction

In beliefs about formal methods and AI safety, we established that formal methods is a source of swiss cheese and is useful in boxing/interfaces. A core premise of the AI control literature is that the blue team is computationally poorer than the red team^[1]. There wouldn’t be much of an interesting problem if you could spend as much on audits (whether that’s $/mtoken at inference time, raw token count at inference time, or any other notion of compute) as the untrusted model spends.

As a proof engineer and FMxAI/GSAI movement strategist, it’s important to me that formal methodsicians are presenting their work in a way... (read 957 more words →)

sorry about the spam from my profile. the automated RSS ingest freaked out when I changed my substack domain and sent all those duplicates.

Social graph density leads to millions of acquaintances and few close friends, because you don’t need to treasure each other

Beliefs about formal methods and AI safety

Quinn

4mo

I appreciate Theodore Ehrenborg's comments.

As a wee lad, I heard about mathematical certainty of computer programs. Let’s go over what I currently believe and don’t believe.

First: what is formal verification

Sometimes you get pwned because of the spec-implementation gap. The computer did not do what it should’ve done. Other times, you get pwned by the world-spec gap. The computer wasn’t wrong, your “shoulds” were.

Expanding the domain of compiletime knowledge

A compiler tells you the problem with your code when it is, in some sense, “wrong”. When you can define the sense in which your code can be “wrong”, you have circumscribed some domain of compiletime knowledge. In other words, you’ve characterized the kinds of things you... (read 1411 more words →)

July-October 2025 Progress in Guaranteed Safe AI

Quinn

4mo

Yall, I really do apologize for radio silence. It has mostly to do with breaking my ankle in three places, but I’m walking again.

This edition of the newsletter looks a bit more like movement happenings and announcements, which isn’t to say that there weren’t more papers or technical results I could’ve included, just that my mind wasn’t on them over the summer. I feel like I should be working on strategic clarity right now! Watch this space, etc.

Verilib launch/demo on the 23rd (SOON)

The flagship product of the Beneficial AI Foundation is publicly launching! More on this in next edition of the newsletter.

Theorem blog post

I remain turbo impressed with Theorem’s tech. Formal methods... (read 1824 more words →)


epistemic status: jotting down casually to make sure I have a specific "I Told You So" hyperlink in a year or so

I think 9-10 figures have gone into math automation in 2025, across VC, philanthropy, and a percentage of frontier company expenditure (though if we want to look at the latter, a proper fermstimate would I think get much wider than if you were just counting up all the VC bucks). In the startup case, it looks an awful lot like creating press releases to attract funding and talent, with not a lot of product clarity.

I have been guilty in the past of saying "something something curryhoward" is a good reason to... (read more)

Grizzly Man screening, tacos, carlsmith discussion

Quinn

7mo

Please read any subset of Carlsmith's sequence "otherness and control in the age of AGI": https://www.lesswrong.com/s/BbAvHtorCZqp97X9W We will watch Herzog's Grizzly Man and I'll make vegan tacos.

Movie rolls promptly at 7p. Discussion roughly 9-10p but I won't be jonesin to kick everyone out right at 10.

Consider bringing your favorite seltzer, financial contribution to the tacos, or a dessert if you like, or not. I don't really care.

May-June 2025 Progress in Guaranteed Safe AI

Quinn

8mo

There will be an AIxFM conference in the Bay Area Q4, according to a little birdie.

Morph ships big autoformalization result in 3599 lines of Lean

They have human decomposition in the latex/lean blueprint, into 67 lemmas with human spotchecking. Still, I’m impressed with their system (called Trinity).

I’d like to know how expensive (in tokens, or some other compute metric) it was to do this!

On Verified Superintelligence

I of course have opinions on their blog post Verified Superintelligence.

Today's most advanced AI systems—reasoning LLMs trained with supervised RL—have hit a fundamental wall. They can only improve on problems where we can verify the (known) answers. Every math problem needs a known solution. Every coding challenge requires

... (read 1196 more words →)

i'm hearing the new movie "Mountainhead" has thinly veiled musk, altman characters. can anyone confirm or offer takes? I might watch it.

Trouble at Miningtown: Prologue

Quinn

10mo

In late 2019 I wrote a TTRPG.

The theme was alien sentience (or perhaps sapience is the technical term), both "organic" (extraterrestrial) and "artificial" (AI/robots).

This is the prologue that kicks off play.

I found it really fascinating to look back on late 2019 and how I was thinking about some of these topics.

Earthdate March 2022. Two months ago Corporation installed Miningtown in an asteroid cluster n kilometers away. Staffed by trillions of agents, most of whom were produced in an early stage of the installation, Miningtown was to be a productive zone free of sentience, free of distraction from the one goal, which was rocks. Rocks, and more rocks, sent back to earth on

... (read 1055 more words →)

March-April 2025 Progress in Guaranteed Safe AI

Quinn

10mo

Say hi at ICSE in Ottawa, I’ll be at the reception Thursday, this colocated event on Friday, and the LLM4Code workshop on Saturday.

As usual there are no benefits to the paid subscription.

Sorry for consolidating two months into one post again after I said I wouldn’t.


Fermstimate of the cost of patching all security relevant open source software

Niplav writes

So, a proposal: Whenever someone claims that LLMs will d/acc us out of AI takeover by fixing our infrastructure, they will also have to specify who will pay the costs of setting up this project and running it.

I’m almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing... (read 1043 more words →)

Lies, Damned Lies, and Proofs: Formal Methods are not Slopless

Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof

Takeaways from the Intelligence Rising RPG

Announcing the Technical AI Safety Podcast

Quinn

Lies, Damned Lies, and Proofs: Formal Methods are not Slopless

Can We Secure AI With Formal Methods? November-December 2025

Formal confinement prototype

Please Measure Verification Burden

Beliefs about formal methods and AI safety

July-October 2025 Progress in Guaranteed Safe AI

May-June 2025 Progress in Guaranteed Safe AI

Cruxes for AI Control via Proof Carrying Code at the End of 2025

