Zac Hatfield-Dodds

Technical staff at Anthropic, previously #3ainstitute; interdisciplinary, interested in everything; ongoing PhD in CS (learning / testing / verification), open sourcerer, more at zhd.dev

Wiki Contributions

Comments

Re:

Meta’s commitment to open source

Note that only one of Meta's recent releases have been open-source.

  • For Llama, the code is available under GPL-3 (ie open source), but the model itself and weights are under a bespoke non-commercial license.
  • Segment Anything is truly open source under the Apache 2.0 license.
  • Massively Multilingual Speech is CC-BY-NC, a standard but noncommercial and thus non-open-source license.

I'm an open-source maintainer myself, though not an absolutist or convinced that eg Llama should have been open-sourced. I do however find it pretty frustrating when these models are incorrectly described as open source (including by Yann LeCun, who ought to know better). As is, we collectively get many of the research benefits, all the misuse, but very little of the commercial innovation or neat product improvements that open-sourcing would bring.

It seems strange to create something so far beyond ourselves, and have its values be ultimately that of a child or a servant.

Would you say the same of a steam engine, or Stockfish, or Mathematica? All of those vastly exceed human performance in various ways!

I don't see much reason to think that very very capable AI systems are necessarily personlike or conscious, or have something-it-is-like-to-be-them - even if we imagine that they are designed and/or trained to behave in ways compatible with and promoting of human values and flourishing. Of course if an AI system does have these things I would also consider it a moral patient, but I'd prefer that our AI systems just aren't moral patients until humanity has sorted out a lot more of our confusions.

I think you're looking for Solomonoff Induction, which is the first half of AIXI.

On 'violet teaming': I think this phrase is a careless analogy which drags attention in an unhelpful direction, and dislike the way the phrase analogizes this to red-teaming - they're actually very different kinds of activities and occur at different levels of abstraction / system-scope and use different skills.

Working to make institutions etc. more resilient seems great to me, but I wouldn't want to assume that using the same technology is necessarily the best way to do that. Set up a great 'resilience incubator' and let them use whatever tools and approaches they want!

It's also quite plausible to me that carefully prompted language models, with a few dozen carefully explained examples and detailed instructions on the decision criteria, would do a good job at this specific moderation task. Less clear what the payoff period of such an investment would be so I'm not actually recommending it, but it's an option worth considering IMO.

As Jack notes here, the Policy team was omitted for brevity and focus. You can read that comment for more about the Policy team, including how we aim to give impartial, technically informed advice and share insights with policymakers.

(Zac's note: I'm posting this on behalf of Jack Clark, who is unfortunately unwell today.  Everything below is his words.)

Hi there, I’m Jack and I lead our policy team. The primary reason it’s not discussed in the post is that the post was already quite long and we wanted to keep the focus on safety - I did some help editing bits of the post and couldn’t figure out a way to shoehorn in stuff about policy without it feeling inelegant / orthogonal.

You do, however, raise a good point, in that we haven’t spent much time publicly explaining what we’re up to as a team. One of my goals for 2023 is to do a long writeup here. But since you asked, here’s some information:

You can generally think of the Anthropic policy team as doing three primary things:

  1. Trying to proactively educate policymakers about the scaling trends of AI systems and their relation to safety. Myself and my colleague Deep Ganguli (Societal Impacts) basically co-wrote this paper https://arxiv.org/abs/2202.07785 - you can think of us as generally briefing out a lot of the narrative in here.
  2. Pushing a few specific things that we care about. We think evals/measures for safety of AI systems aren’t very good [Zac: i.e. should be improved!], so we’ve spent a lot of time engaging with NIST’s ‘Risk Management Framework’ for AI systems as a way to create more useful policy institutions here - while we expect labs in private sector and academia will do much of this research, NIST is one of the best institutions to take these insights and a) standardize some of them and b) circulate insights across government. We’ve also spent time on the National AI Research Resource as we see it as a path to have a larger set of people do safety-oriented analysis of increasingly large models.
  3. Responding to interest. An increasing amount of our work is reactive (huge uptick in interest in past few months since launch of ChatGPT). By reactive I mean that policymakers reach out to us and ask for our thoughts on things. We generally aim to give impartial, technically informed advice, including pointing out things that aren’t favorable to Anthropic to point out (like emphasizing the very significant safety concerns with large models). We do this because a) we think we’re well positioned to give policymakers good information and b) as the stakes get higher, we expect policymakers will tend to put higher weight on the advice of labs which ‘showed up’ for them before it was strategic to do so. Therefore we tend to spend a lot of time doing a lot of meetings to help out policymakers, no matter how ‘important’ they or their country/organization are - we basically ignore hierarchy and try to satisfy all requests that come in at this stage.

More broadly, we try to be transparent on the micro level, but haven’t invested yet in being transparent on the macro. What I mean by that is many of our RFIs, talks, and ideas are public, but we haven’t yet done a single writeup that gives an overview of our work. I am hoping to do this with the team this year!

Some other desiderata that may be useful:

Our wonderful colleagues on the ‘Societal Impacts’ team led this work on Red Teaming and we (Policy) helped out on the paper and some of the research. We generally think red teaming is a great idea to push to policymakers re AI systems; it’s one of those things that is ‘shovel ready’ for the systems of today but, we think, has some decent chance of helping out in future with increasingly large models.

  1. "AI is more dangerous the more different it is from us" seems wrong to me: it is very different and likely to be very dangerous, but that doesn't imply that making it somewhat more like us would make it less dangerous. I don't think brain emulation can be developed in time, replaying evolution seems unhelpful to me, and both seem likely to cause enormous suffering (aka mindcrime).

  2. See my colleague Ethan Perez's comment here on upcoming research, including studying situational awareness as a risk factor for deceptive misalignment.

Relatedly, have you considered organizing the company as a Public Benefit Corporation, so that the mission and impact is legally protected alongside shareholder interests?

Load More