## LESSWRONG

Zac Hatfield-Dodds

Researcher at Anthropic, previously #3ainstitute; interdisciplinary, interested in everything; ongoing PhD in CS (learning / testing / verification), open sourcerer, more at zhd.dev



MikkW's Shortform

Interesting question! It turns out that the Canadians are checking whether there's enough light to grow tomatoes on Mars.

Apparently mean insolation at the equator of Mars is about equal to that at 75° latitude on Earth, well inside the (ant)arctic circle... and while Earth has winters where the sun is fully below the horizon, Mars has weeks-to-months-long dust storms which block out most light.

So it's probably a wash; Antarctica is at least not much worse than Mars for light, while retaining all the other advantages of Earth like "air" and "water" and "accessibility".
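The insolation comparison can be sanity-checked with a back-of-envelope calculation. This is my own sketch, not the Canadian study's model: it uses a standard daily-mean top-of-atmosphere insolation formula with a sinusoidal declination approximation, and it ignores orbital eccentricity, atmospheric absorption, and dust (all of which make Mars look worse, not better).

```python
import math

def annual_mean_insolation(lat_deg, solar_const, obliquity_deg, n_days=360):
    """Annual-mean top-of-atmosphere insolation (W/m^2) on a horizontal
    surface, ignoring eccentricity, atmosphere, and dust."""
    lat = math.radians(lat_deg)
    total = 0.0
    for d in range(n_days):
        # Sinusoidal approximation of solar declination over one orbit.
        decl = math.radians(obliquity_deg) * math.sin(2 * math.pi * d / n_days)
        # Hour angle at sunset; clamping handles polar day / polar night.
        x = -math.tan(lat) * math.tan(decl)
        h0 = math.acos(max(-1.0, min(1.0, x)))
        total += (solar_const / math.pi) * (
            h0 * math.sin(lat) * math.sin(decl)
            + math.cos(lat) * math.cos(decl) * math.sin(h0)
        )
    return total / n_days

# Solar "constants" scale with 1/distance^2: ~1361 W/m^2 at Earth (1 AU),
# ~586 W/m^2 at Mars's mean distance of 1.524 AU.
mars_equator = annual_mean_insolation(0, 586, 25.2)    # Mars obliquity ~25.2 deg
earth_75 = annual_mean_insolation(75, 1361, 23.4)      # Earth obliquity ~23.4 deg
print(round(mars_equator), round(earth_75))  # the two come out within ~10%
```

Both figures land in the neighborhood of 180 W/m², consistent with the "Mars equator ≈ Earth at 75°" claim before any dust storms are accounted for.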

Latacora might be of interest to some AI Safety organizations

DeepMind specifically has Google's security people on call, which is to say the best that money can buy. For others, well, AI Safety Needs Great Engineers and Anthropic is hiring, including for security.

(opinions my own, you know the drill)

AI Safety Needs Great Engineers

EleutherAI has a whole project board dedicated to open-source ML, both replicating published papers and doing new research on safety and interpretability.

(opinions my own, etc)

Some real examples of gradient hacking

Probably since prehistory and certainly since antiquity we've had some 'mesa'/'runtime' understanding of heritability, in contrast to (presumably) all other animals.

No, not so much. See e.g. https://www.gwern.net/reviews/Bakewell

Like anything else, the idea of “breeding” had to be invented. That traits are genetically-influenced broadly equally by both parents subject to considerable randomness and can be selected for over many generations to create large average population-wide increases had to be discovered the hard way, with many wildly wrong theories discarded along the way. Animal breeding is a case in point, as reviewed by an intellectual history of animal breeding, Like Engend’ring Like, which covers mistaken theories of conception & inheritance from the ancient Greeks to perhaps the first truly successful modern animal breeder, Robert Bakewell (1725–1795).

Why did it take thousands of years to begin developing useful animal breeding techniques, a topic of interest to almost all farmers everywhere, a field which has no prerequisites such as advanced mathematics or special chemicals or mechanical tools, and seemingly requires only close observation and patience? ... What is most interesting is the intellectual history we can extract from it in terms of inventing heritability and as important, one of the inventions of progress in the gradual realization that selective breeding was even possible.

"Acquisition of Chess Knowledge in AlphaZero": probing AZ over time

I enjoyed the whole paper! It's just that "read sections 1 through 8" doesn't reduce the length much, and 5-6 have some nice short results that can be read alone :-)

"Acquisition of Chess Knowledge in AlphaZero": probing AZ over time

The paper is really only 28 pages plus lots of graphs in the appendices! If you want to skim, I'd suggest just reading the abstract and then sections 5 and 6 (pp. 16–21). But to summarize:

• Do neural networks learn the same concepts as humans, or at least human-legible concepts? A "yes" would be good news for interpretability (and alignment). Let's investigate AlphaZero and Chess as a case study!
• Yes, over the course of training AlphaZero learns many concepts (and develops behaviours) which have clear correspondence with human concepts.
• Low-level / ground-up interpretability seems very useful here. Learned summaries are also great for chess but rely on a strong ground-truth (e.g. "Stockfish internals").
• Details about where in the network and when in the training process things are represented and learned.

The analysis of differences in timing and order between human chess scholarship and AlphaZero training is pretty cool if you play chess: e.g. human experts have diversified their openings (not just 1.e4) since 1700, while AlphaZero narrows down from random play to pretty much the modern distribution over GM openings; and AlphaZero tends to learn material values before positional concepts and standard openings.
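The concept-probing methodology behind results like these can be illustrated with a minimal sketch. To be clear, this is a generic linear probe on synthetic data, not the paper's code or real AlphaZero activations: the concept name, dimensions, and training loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend network activations: 1000 chess positions, 256-dim hidden vectors.
acts = rng.normal(size=(1000, 256))
# Pretend human-concept labels (e.g. "has_mate_threat"), here linearly
# encoded in the activations plus noise -- the hypothesis a probe tests.
true_dir = rng.normal(size=256)
labels = (acts @ true_dir + rng.normal(scale=0.5, size=1000)) > 0

# A linear probe: logistic regression fit by plain gradient descent.
w = np.zeros(256)
for _ in range(500):
    z = np.clip(acts @ w, -30, 30)          # clip for numerical stability
    p = 1 / (1 + np.exp(-z))                # predicted probability of concept
    w -= 0.1 * acts.T @ (p - labels) / len(labels)

acc = np.mean((acts @ w > 0) == labels)
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy is evidence that the concept is linearly decodable from that layer; the paper runs this kind of test across layers and training checkpoints to see where and when concepts appear.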

Discussion with Eliezer Yudkowsky on AGI interventions

Sure, just remember that an experimental demonstration isn't enough - "Your proof must not include executing the model, nor equivalent computations".

Discussion with Eliezer Yudkowsky on AGI interventions

On a quick skim it looks like that fails both "not equivalent to executing the model" and the float32 vs ℝ problem.

It's a nice approach, but I'd also be surprised if it scales to maintain tight bounds on much larger networks.

Discussion with Eliezer Yudkowsky on AGI interventions

Ah, crux: I do think the floating-point matters! Issues of precision, underflow, overflow, and NaNs bedevil model training and occasionally deployment-time behavior. By analogy, if we deploy an AGI whose ideal mathematical form is aligned, we may still be doomed, even if it's plausibly our best option in expectation.
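Concretely, float32 arithmetic isn't even associative, so a proof over the reals need not transfer to the deployed computation. A textbook example (my illustration, not from the discussion):

```python
import numpy as np

# Over the reals, (a + b) + c == a + (b + c). Not so in float32:
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(0.25)

left = (a + b) + c   # (1e8 - 1e8) + 0.25 -> exactly 0.25
right = a + (b + c)  # 0.25 is absorbed: -1e8 + 0.25 rounds back to -1e8
print(left, right)   # 0.25 vs 0.0
```

The spacing between adjacent float32 values near 1e8 is 8, so adding 0.25 to -1e8 is a no-op; the same rounding effects are why reordering reductions (e.g. across GPU threads) changes model outputs.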

Checkable meaning that I or someone I trust with this has to be able to check it! Maxwell's proposal is simple enough that I can reason through the whole thing, even over float32 rather than ℝ, but for more complex arguments I'd probably want it machine-checkable for at least the tricky numeric parts.