LESSWRONG
Zac Hatfield-Dodds

Technical staff at Anthropic (views my own), previously #3ainstitute; interdisciplinary, interested in everything, ongoing PhD in CS, bets tax bullshit, open sourcerer, more at zhd.dev

Comments
Zac Hatfield Dodds's Shortform (Ω) · 5y · 2 karma · 13 comments
Anthropic's leading researchers acted as moderate accelerationists
Zac Hatfield-Dodds · 9d

The Time article is materially wrong about a bunch of stuff - for example, there is a large difference between incentives and duties; all board members have the same duties but LTBT appointees are likely to have a very different equity stake to whoever is in the CEO board seat.

I really don't want to get into pedantic details, but there's no "supposed to" time for LTBT board appointments; I think you're counting from the first day they were legally able to appoint someone. Also https://www.anthropic.com/company lists five board members out of five seats, and four Trustees out of a maximum five. IMO it's fine to take a few months to make sure you've found the right person!


More broadly, the corporate governance discussions (not just about Anthropic) I see on LessWrong and in the EA community are very deeply frustrating, because almost nobody seems to understand how these structures normally function, why they're designed that way, or the failure modes that occur in practice. Personally, I spent about a decade serving on nonprofit boards and on oversight committees which appointed nonprofit boards, and I set up the governance for a for-profit company I founded.

I know we love first-principles thinking around here, but this is a domain with an enormous depth of practice, crystallized from the long experience of (often) very smart people in sometimes-adversarial situations.

In any case, I think I'm done with this thread.

Anthropic's leading researchers acted as moderate accelerationists
Zac Hatfield-Dodds · 9d

I think it is simply false that Anthropic leadership (excluding the LTB Trustees) have control over board appointments. You may argue they have influence, to the extent that the Trustees defer to their impressions or trust their advice, but formal control of the board is a very different thing. The class T shares held by the LTBT are entitled to appoint a majority of the board, and that cannot change without the approval of the LTBT.[1]

Delaware law gives the board of a PBC substantial discretion in how they should balance shareholder profits, impacts on the public, and the mission of the organization. Again, I trust current leadership, but think it is extremely important that there is a legally and practically binding mechanism to avoid that balance being set increasingly towards shareholders rather than the long-term benefit of humanity - even as the years go by, financial stakes rise, and new people take leadership roles.

In addition to appointing a majority of the board, the LTBT is consulted on RSP policy changes (ultimately approved by the LTBT-controlled board), and they receive Capability Reports and Safeguards Reports before the company moves forward with a model release. IMO it's pretty reasonable to call this meaningful oversight - the LTBT is a backstop to ensure that the company continues to prioritize the mission rather than a day-to-day management group, and I haven't seen any problems with that.


  1. or making some extremely difficult amendments to the Trust arrangements; you can read Anthropic's certificate of incorporation for details. I'm not linking to it here though, because the commentary I've seen here previously has misunderstood basic parts like "who has what kind of shares" pretty badly. ↩︎

Anthropic's leading researchers acted as moderate accelerationists
Zac Hatfield-Dodds · 10d

These are personal commitments which I wrote down before I joined, or when the topic (e.g. the RSP and LTBT) arose later. Some are 'hard' lines (if $event happens); others are 'soft' (if in my best judgement ...) and may say something about the basis for that judgement - most obviously that I won't count my pay or pledged donations as a reason to avoid leaving or speaking out.

I'm not comfortable giving a full or exact list (cf), but a sample of things that would lead me to quit:

  • If I thought that Anthropic was on net bad for the world.
  • If the LTBT was abolished without a good replacement.
  • Severe or willful violation of our RSP, or misleading the public about it.
  • Losing trust in the integrity of leadership.
Anthropic's leading researchers acted as moderate accelerationists
Zac Hatfield-Dodds · 11d

I joined Anthropic in 2021 because I thought it was an extraordinarily good way to help make AI go well for humanity, and I have continued to think so. If that changed, or if any of my written lines were crossed, I'd quit.

I think many of the factual claims in this essay are wrong (for example, neither Karen Hao nor Max Tegmark is, in my experience, a reliable source on Anthropic); we also seem to disagree on more basic questions like "has Anthropic published any important safety and interpretability research", and whether commercial success could be part of a good AI Safety strategy. Overall this essay feels sufficiently one-sided and uncharitable that I don't really have much to say beyond "I strongly disagree, and would have quit and spoken out years ago otherwise".

I regret that I don't have the time or energy for a more detailed response, but thought it was worth noting the bare fact that I have detailed views on these issues (including a lot of non-public information) and still strongly disagree.

Against "Model Welfare" in 2025
Zac Hatfield-Dodds · 16d

I recommend carefully reading Taking AI Welfare Seriously; it seems to me that you're arguing against a position which I haven't seen anyone arguing for.

Open weights != Open source
Zac Hatfield-Dodds · 1mo

No, the preferred form for modifying a model is a copy of the weights, plus open source code for training and inference. "Training a similar model from scratch" is wildly more expensive and less convenient, and not even modification!

If the model weights are available under an OSI-approved open source license, and so is code suitable for fine-tuning, I consider the model to be open source. Llama models definitely aren't; most Chinese models are.

leogao's Shortform
Zac Hatfield-Dodds · 2mo

like imagine if "pter" were a single character. Words like helicopter and pterodactyl both contain "pter", but you'd probably think of "helicopter" as an atomic unit with its own unique identity

I often do chunk them, but if you've picked up a bit of taxonomic Greek pter means 'wing', so we have helico-pter 'spiral/rotating wing' and ptero-dactyl 'wing fingers' - both cases where breaking down the name tells you something about what the things are!

AI #124: Grokless Interlude
Zac Hatfield-Dodds · 2mo

it would be very good if the main chat services like ChatGPT, Claude and Gemini offered branching (or cloning) and undoing within chats, so you can experiment with different continuations.

claude.ai does! If you edit one of your messages, it creates a branch and you can go back and forth between them, or even continue in parallel using multiple browser tabs.

The next wave of model improvements will be due to data quality
Zac Hatfield-Dodds · 3mo

Of course Google and Anthropic have their own version of these features that will provide them with data as well.

I think this is substantially wrong about Anthropic; see e.g. here.

Histograms are to CDFs as calibration plots are to...
Zac Hatfield-Dodds · 3mo
  • I like the idea, but with n>100 points a histogram seems better, and for few points it's hard to draw conclusions. For example, I can't work out an interpretation of the stdev lines that I find helpful.
  • I'd make the starting point p=0.5, and use logits for the x-axis; that's a more natural representation of probability to me. Optionally reflect p<0.5 about the y-axis to represent the symmetry of predicting likely things will happen vs unlikely things won't. (A rough sketch of this transform is below.)
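A minimal matplotlib sketch of that transform, assuming made-up forecast data; the probabilities, outcomes, and plotting choices below are illustrative only, not taken from the original post.

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical forecasts: predicted probability that each event happens,
# and whether it actually did (all values made up for illustration).
p = np.array([0.20, 0.30, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99])
happened = np.array([0, 1, 0, 1, 1, 0, 1, 1, 1, 1])

# Reflect p < 0.5 about the y-axis: a 20% forecast that X happens is
# treated as an 80% forecast that X doesn't happen.
confidence = np.where(p < 0.5, 1 - p, p)
resolved_as_predicted = np.where(p < 0.5, 1 - happened, happened)

# Logit x-axis, so p = 0.5 sits at the origin and equal odds ratios are
# equally spaced.
x = np.log(confidence / (1 - confidence))

plt.scatter(x, resolved_as_predicted, label="individual predictions")
# Perfect calibration: at logit-confidence x, the chance of resolving as
# predicted is sigmoid(x).
grid = np.linspace(0, x.max(), 200)
plt.plot(grid, 1 / (1 + np.exp(-grid)), "--", label="perfect calibration")
plt.xlabel("confidence (logits; 0 = 50%)")
plt.ylabel("resolved as predicted")
plt.legend()
plt.show()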
Posts
Anthropic's updated Responsible Scaling Policy (Ω) · 11mo · 38 karma · 3 comments
Anthropic: Reflections on our Responsible Scaling Policy · 1y · 30 karma · 21 comments
Simple probes can catch sleeper agents (Ω) · 1y · 133 karma · 21 comments
Third-party testing as a key ingredient of AI policy (Ω) · 1y · 11 karma · 1 comment
Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy (Ω) · 2y · 83 karma · 1 comment
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Ω) · 2y · 289 karma · 22 comments
Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust (Ω) · 2y · 85 karma · 26 comments
Anthropic's Core Views on AI Safety (Ω) · 3y · 173 karma · 39 comments
Concrete Reasons for Hope about AI (Ω) · 3y · 94 karma · 13 comments
In Defence of Spock · 4y · 37 karma · 5 comments