Zac Hatfield-Dodds

Technical staff at Anthropic, previously #3ainstitute; interdisciplinary, interested in everything; ongoing PhD in CS (learning / testing / verification), open sourcerer, more at zhd.dev

Posts


Zac Hatfield Dodds's Shortform (2 karma, 3 comments, 3y)

Simple probes can catch sleeper agents (119 karma, 17 comments, 25d)

Third-party testing as a key ingredient of AI policy (11 karma, 1 comment, 2mo)

Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy (85 karma, 1 comment, 7mo)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (286 karma, 21 comments, 7mo)

Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust (90 karma, 23 comments, 8mo)

Anthropic's Core Views on AI Safety (181 karma, 39 comments, 1y)

Concrete Reasons for Hope about AI (101 karma, 13 comments, 1y)

In Defence of Spock (35 karma, 5 comments, 3y)

Comments

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

Zac Hatfield-Dodds, 2d ago

While some companies, such as OpenAI and Anthropic, have publicly advocated for AI regulation, Time reports that in closed-door meetings, these same companies "tend to advocate for very permissive or voluntary regulations."

I think that dropping the intermediate text, which describes 'more established big tech companies' such as Microsoft, substantially changes the meaning of this quote: "these same companies" is not "OpenAI and Anthropic". Full context:

Executives from the newer companies that have developed the most advanced AI models, such as OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei, have called for regulation when testifying at hearings and attending Insight Forums. Executives from the more established big technology companies have made similar statements. For example, Microsoft vice chair and president Brad Smith has called for a federal licensing regime and a new agency to regulate powerful AI platforms. Both the newer AI firms and the more established tech giants signed White House-organized voluntary commitments aimed at mitigating the risks posed by AI systems. But in closed door meetings with Congressional offices, the same companies are often less supportive of certain regulatory approaches

AI Lab Watch makes it easy to get some background information by comparing commitments made by OpenAI, Anthropic, Microsoft, and some other established big tech companies.

"Open Source AI" is a lie, but it doesn't have to be

Zac Hatfield-Dodds, 6d ago

Meta’s Llama3 model is also *not* open source, despite the Chief AI Scientist at the company, Yann LeCun, frequently proclaiming that it is.

This is particularly annoying because he knows better: the latter two of those three tweets are from January 2024, and here's a video of his testimony under oath in September 2023: "the Llama system was not made open-source".

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

Zac Hatfield-Dodds, 23d ago

It's a sparse autoencoder because part of the loss function is an L1 penalty encouraging sparsity in the hidden layer. Otherwise, it would indeed learn a simple identity map!
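
A minimal NumPy sketch of why the L1 term matters, assuming a simplified ReLU autoencoder (not the paper's exact architecture or training setup; all names here are illustrative). The hand-built overcomplete "copy" solution below reconstructs its input perfectly, so without a penalty it would be a zero-loss optimum; the L1 term makes every unit of hidden activation cost something.

```python
import numpy as np


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff):
    """Return (total loss, reconstruction MSE, L1 sparsity penalty)."""
    f = np.maximum(0.0, x @ W_enc + b_enc)      # ReLU hidden activations
    x_hat = f @ W_dec + b_dec                   # linear decoder
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(f), axis=-1))    # penalises total activation
    return mse + l1_coeff * l1, mse, l1


# Hypothetical "identity-like" overcomplete autoencoder: hidden units copy
# +x and -x, so relu(x) - relu(-x) reconstructs x exactly.
d = 3
eye = np.eye(d)
W_enc = np.hstack([eye, -eye])   # shape (d, 2d)
W_dec = np.vstack([eye, -eye])   # shape (2d, d)
b_enc = np.zeros(2 * d)
b_dec = np.zeros(d)

x = np.array([[1.0, -2.0, 0.5]])
loss, mse, l1 = sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.1)
print(mse, l1, loss)
```

With `l1_coeff=0` this copy has zero loss; with a positive coefficient it pays for every active feature, so training is pushed towards codes where only a few features fire.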

Scenario planning for AI x-risk

Zac Hatfield-Dodds, 25d ago

Tom Davidson's work on a compute-centric framework for takeoff speed is excellent, IMO.

What is the best way to talk about probabilities you expect to change with evidence/experiments?

Zac Hatfield-Dodds, 1mo ago

you CAN predict that there will be evidence with equal probability of each direction.

More precisely, the probability-weighted sizes of upward and downward updates must be equal; it's nonetheless possible to be very confident that you'll update in a particular direction, offset by a much larger and proportionately less likely update in the other.

For example, I have some chance of winning the lottery this year, not much lower than if I actually bought a ticket. I'm very confident that each day I'll give somewhat lower odds (as there's less time remaining), but being credibly informed that I've won would radically change the odds, such that the expectation balances out.
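
The lottery example can be checked with exact arithmetic. This is a sketch under two assumptions not in the comment: a hypothetical prior of one in ten million, and that a win, if it happens, is equally likely to be announced on any remaining day. On a typical day the odds drift down slightly, but the rare jump to certainty keeps the expected posterior equal to the prior (conservation of expected evidence).

```python
from fractions import Fraction


def one_day_update(prior: Fraction, days_left: int) -> tuple[Fraction, Fraction]:
    """Return (P(win announced today), P(win | no announcement today)).

    Assumes that if I win at all, the announcement is equally likely
    to arrive on any of the remaining days.
    """
    p_news = prior / days_left
    posterior_if_silent = (prior - p_news) / (1 - p_news)
    return p_news, posterior_if_silent


prior = Fraction(1, 10_000_000)  # hypothetical odds of winning this year
p_news, posterior_if_silent = one_day_update(prior, days_left=365)

# Almost always: a small downward update...
print(posterior_if_silent < prior)  # True
# ...balanced by a rare jump to certainty, so the expectation is unchanged.
expected_posterior = p_news * 1 + (1 - p_news) * posterior_if_silent
print(expected_posterior == prior)  # True
```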

Tamsin Leake's Shortform

Zac Hatfield-Dodds, 2mo ago

I agree that there's no substitute for thinking about this for yourself, but I think that morally or socially counting "spending thousands of dollars on yourself, an AI researcher" as a donation would be an appalling norm. There are already far too many unmanaged conflicts of interest and trust-me-it's-good funding arrangements in this space for my liking, and I think they lead to poor epistemic norms as well as social and organizational dysfunction. I think it's very easy for donating to people or organizations in your social circle to have substantial negative expected value.

I'm glad that funding for AI safety projects exists, but the >10% of my income I donate will continue going to GiveWell.

'Empiricism!' as Anti-Epistemology

Zac Hatfield-Dodds, 2mo ago

Trivially true to the extent that you are about equally likely to observe a thing throughout that timespan; and the Lindy Effect is at least regularly talked of.

But there are classes of observations for which this is systematically wrong: for example, most people who see a ship part-way through a voyage will do so while it's either departing or arriving in port. Investment schemes are just such a class, because markets are usually up to the task of consuming alpha, and tend to do so faster the more widely an idea is known; even Buffett's returns have oscillated around the index over the last few years!

Is anyone working on formally verified AI toolchains?

Answer by Zac Hatfield-Dodds, Mar 13, 2024

Safety properties aren't the kind of properties you can prove; they're statements about the world, not about mathematical objects. I very strongly encourage anyone reading this comment to go read Leveson's Engineering a Safer World (free PDF from the author) through to the end of chapter three - it's the best introduction to systems safety that I know of and a standard reference for anyone working with life-critical systems. how.complexsystems.fail is the short-and-quotable catechism.

I'm not really sure what you mean by "AI toolchain", nor what threat model would have a race condition present an existential risk. More generally, formal verification is a research topic: there are some neat demonstration systems, and they're used in certain niches with relatively small amounts of code and compute, simple hardware, and where long development times are acceptable. None of those are true of AI systems, or even libraries such as PyTorch.

For flavor, some of the most exciting developments in formal methods: I expect the Lean FRO to improve usability, and 'autoformalization' tricks like Proofster (pdf) might also help - but it's still niche, and "proven correct" software can still have bugs from under-specified components, incorrect axioms, or outright hardware issues (e.g. Spectre, Rowhammer, cosmic rays, etc.). The seL4 microkernel is great, but you still have to supply an operating system and application layer, and then ensure the composition is still safe. To test an entire application stack, I'd instead turn to Antithesis, which is amazing so long as you can run everything in an x86 hypervisor (with no GPUs).
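
A toy illustration (ours, not from the comment) of one way "proven correct" software can still be wrong: the guarantee is only as strong as the specification. Checking a contract is weaker than proving it, but the gap is the same. A hypothetical `bad_sort` that discards its input satisfies an under-specified "output is sorted" contract, and is only caught once the spec also requires the output to be a permutation of the input.

```python
from collections import Counter


def is_sorted(xs: list[int]) -> bool:
    """True if each element is <= the next."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def weak_spec(inp: list[int], out: list[int]) -> bool:
    """Under-specified contract: the output only has to be sorted."""
    return is_sorted(out)


def full_spec(inp: list[int], out: list[int]) -> bool:
    """Stronger contract: sorted AND a permutation of the input."""
    return is_sorted(out) and Counter(out) == Counter(inp)


def bad_sort(xs: list[int]) -> list[int]:
    """Hypothetical buggy 'sort' that throws its input away."""
    return []


cases = [[3, 1, 2], [], [5, 5, 1]]
weak_ok = all(weak_spec(c, bad_sort(c)) for c in cases)
full_ok = all(full_spec(c, bad_sort(c)) for c in cases)
print(f"bad_sort meets the weak spec: {weak_ok}")  # True
print(f"bad_sort meets the full spec: {full_ok}")  # False
```

A proof that `bad_sort` meets `weak_spec` would be perfectly valid, and perfectly useless; the hard part is writing down what "correct" actually means.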

(as always, opinions my own)

philh's Shortform

Zac Hatfield-Dodds, 2mo ago

I think he's actually quite confused here - I imagine saying

Hang on - you say that (a) we can think, and (b) we are the instantiations of any number of computer programs. Wouldn't instantiating one of those programs be a sufficient condition of understanding? Surely if two things are isomorphic even in their implementation, either both can think, or neither.

(the Turing test suggests 'indistinguishable in input/output behaviour', which I think is much too weak)

Decaeneus's Shortform

Zac Hatfield-Dodds, 3mo ago

See e.g. https://mschloegel.me/paper/schloegel2024sokfuzzevals.pdf

Fuzzing is a generally pretty healthy subfield, but even there most peer-reviewed papers in top venues are still completely useless! Importantly, "a 'working' github repo" is really not enough to ensure that your results are reproducible, let alone ensure external validity.
