In response to Habryka's shortform, I can confirm that I signed a concealed non-disparagement as part of my Anthropic separation agreement. I worked there for 6 months and left in mid-2022. I received a cash payment as part of that agreement, with nothing shady going on à la threatening previous compensation (though I had no equity to threaten). In hindsight I undervalued my ability to speak freely, and didn't more seriously consider that I could just decline to sign the separation agreement and walk away; I'm not sure what I would do if I were doing it again.
I asked Anthropic to release me from this after the comment thread started, and they have now released me from both the non-disparagement clause and the non-disclosure part, which was very nice of them. I would encourage anyone in a similar situation to reach out to hr[at]anthropic.com and legal[at]anthropic.com, though obviously I can't guarantee that they'll release everyone. Feel free to DM or email me for advice if you're in a similar situation.
I'll take advantage of my newfound freedoms to say that...
Idk, I don't really have anything too disparaging to say (though I dislike the use of concealed non-disparagements in general and am glad they say they're stopping!). I'm broadly a fan of Anthropic: I think their heart is likely in the right place and they're trying to do what's best for the world (though they could easily be making the wrong calls), and I would seriously consider returning in the right circumstances. I've recommended that several friends of mine accept offers to do safety and interp work there, and feel good about this (though I would feel much more hesitant about recommending that someone join a pure capabilities team there). My biggest critique is that I have concerns about their willingness to push the capabilities frontier and worsen race dynamics, and, while I can imagine reasonable justifications, I think they're undervaluing the importance of at least having clear public positions and rationales for this kind of thing, especially given their clear shift in policies since Claude 1.0.
EDIT: An additional detail that I genuinely appreciate is that Anthropic paid for me to have an independent lawyer to help explain the separation agreement and negotiate some changes on my behalf (I didn't push back on the concealed non-disparagement, but did alter some other parts). They recommended an independent lawyer, who I used, but were also happy to pay for a lawyer of my choice. As far as I'm aware, this was quite a non-standard thing for a company to do, and I appreciate it and think this was good and ethical in a way that wasn't obligatory.
EDIT 2: Someone asked that I share the terms of the agreement.
The non-disparagement clause:
Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff, which I don't think was disparaging or untrue, but to be safe], each party agrees that it will not make or publish or cause to be made or published any disparaging or untrue remark about the other party or, as the case may be, its directors, officers or employees. However, nothing in this clause or agreement will prevent any party to this agreement from (i) making a protected disclosure pursuant to Part IVA of the Employment Rights Act 1996 and/or (ii) reporting a criminal offence to any law enforcement agency and/or a regulatory breach to a regulatory authority and/or participating in any investigation or proceedings in either respect.
The non-disclosure clause:
Without prejudice to clause 6.3 [referring to my farewell letter to Anthropic staff] and 7 [about what kind of references Anthropic could provide for me], both Parties agree to keep the terms and existence of this agreement and the circumstances leading up to the termination of the Consultant's engagement and the completion of this agreement confidential save as [a bunch of legal boilerplate, and two bounded exceptions I asked for but would rather not publicly share. I don't think these change anything, but feel free to DM if you want to know]
How aware were you (as an employee), and are you (now), of their policy work? In a world model where policy is the most important area, it seems to me like it could significantly tarnish Anthropic's net impact.
I don't quite understand the question. I've heard various bits of gossip, both as an employee and now. I wouldn't say I'm confident in my understanding of any of it. I was somewhat sad about Jack and Dario's public comments about thinking it's too early to regulate (if I understood them correctly), which I also found surprising as I thought they had fairly short timelines, but policy is not at all my area of expertise so I am not confident in this take.
I think it's totally plausible Anthropic has net negative impact, but the same is true for almost any significant actor in a complex situation. I agree that policy is one such way that their impact could be negative, though I'd generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.
I'm a bit worried about a dynamic where smart technical folks end up feeling like "well, I'm kind of disappointed in Anthropic's comms/policy stuff from what I hear, and I do wish they'd be more transparent, but policy is complicated and I'm not really a policy expert".
To be clear, this is a quite reasonable position for any given technical researcher to have– the problem is that this provides pretty little accountability. In a world where Anthropic was (hypothetically) dishonest, misleading, actively trying to undermine/weaken regulations, or putting its own interests above the interests of the "commons", it seems to me like many technical researchers (even Anthropic staff) would not be aware of this. Or they might get some negative vibes but then slip back into a "well, I'm not a policy person, and policy is complicated" mentality.
I'm not saying there's even necessarily a strong case that Anthropic is trying to sabotage policy efforts (though I am somewhat concerned about some of the rhetoric Anthropic uses, public comments about thinking it's too early to regulate, rumors that they have taken actions to oppose SB 1047, and a lack of any real "positive" signals from their policy team, e.g. recommending or developing policy proposals that go beyond voluntary commitments or encouraging people to measure risks).
But I think once upon a time there was some story that if Anthropic defected in major ways, a lot of technical researchers would get concerned and quit/whistleblow. I think Anthropic's current comms strategy, combined with the secrecy around a lot of policy things, combined with a general attitude (whether justified or unjustified) of "policy is complicated and I'm a technical person so I'm just going to defer to Dario/Jack" makes me concerned that safety-concerned people won't be able to hold Anthropic accountable even if it actively sabotages policy stuff.
I'm also not really sure if there's an easy solution to this problem, but I do imagine part of the solution involves technical people (especially at Anthropic) raising questions, asking people like Jack and Dario to explain their takes more, and being more willing to raise public & private discussions about Anthropic's role in the broader policy space.
Thanks for answering, that's very useful.
My concern is that, as far as I understand, a decent number of safety researchers think that policy is the most important area, but because, as you mentioned, they aren't policy experts and don't really know what's going on, they just assume that Anthropic's policy work is way better than those actually working in policy judge it to be. I've heard from a surprisingly high number of people at the orgs doing the best AI policy work that Anthropic's policy work is mostly anti-helpful.
Somehow though, internal employees keep deferring to their policy team and don't update on that part/take their beliefs seriously.
I'd generally bet Anthropic will push more for policies I personally support than any other lab, even if they may not push as much as I want them to.
If it's true, it is probably true only to an epsilon degree, and it might be wrong because of the weird preferences of a non-safety industry actor. AFAIK, Anthropic has been pushing against all the AI regulation proposals to date; I have yet to hear a positive example.
Separately, while I think the discussion around "is X net negative" can be useful, I think it ends up implicitly putting the frame on "can X justify that they are not net negative."
I suspect the quality of discourse– and society's chances to have positive futures– would improve if the frame were more commonly something like "what are the best actions for X to be taken" or "what are reasonable/high-value things that X could be doing."
And I think it's valid to think "X is net positive" while also thinking "I feel disappointed in X because I don't think it's using its power/resources in ways that would produce significantly better outcomes."
IDK what the bar should be for considering X a "responsible actor", but I imagine my personal bar is quite a bit higher than "(barely) net positive in expectation."
P.S. Both of these comments are on the opinionated side, so separately, I just wanted to say thank you Neel for speaking up & for offering your current takes on Anthropic. Strong upvoted!
A tip for anyone on the ML job/PhD market: people will plausibly skim your Google Scholar quickly to get a "how impressive is this person/what is their deal" read (I do this fairly often), so I recommend polishing your Google Scholar profile if you have publications! It can make a big difference.
I have a lot of weird citable artefacts that confuse Google Scholar, so here are some tips I've picked up:
To anyone currently going through the fun of NeurIPS rebuttals for the first time, some advice:
Firstly, if you're feeling down about reviews, remember that peer review has been shown to be a ridiculous random number generator in an RCT - half of spotlight papers are rejected by another review committee! Don't tie your self-worth to whether the roulette wheel landed on black or red. If their critiques don't seem to make sense, they often genuinely don't (and were plausibly written by an LLM). And if they do make sense (and remember to control for your defensiveness), then this is great - you have valuable feedback that can improve the paper!
Cross posting one of my tweet threads that people here might enjoy
A recent dilemma of mine: how to eat less sweet food but still have it in moderation? I don't want to spend the willpower required to cut it out entirely, or to agonise every time about whether something is really worth it.
My surprisingly elegant solution: randomise! Have it with probability 2/3 (or a probability of your choice). Abiding by the RNG is far easier than resisting temptation!
This is surprisingly general! Probabilistic dieting? Probabilistic vegetarianism? Half the moral benefits, far easier. Well, at least personally, I would find "toss a coin at each meal for whether to have meat" more than twice as easy as cutting it entirely.
I would also be very curious if this helps people cut back on drugs like alcohol/tobacco/etc.
You can also change the probability over time, e.g. if giving something up feels really hard, you can do it at 90% each day, and reduce that by 1% every day until it reaches 0 in 3 months.
Note: this doesn't work if you can re-roll immediately after, so you need restrictions on when you can pick a new random number. For snacks I can have at any time, I do one random pick per day; one per meal is also fine.
I also recommend carrying dice around in your pocket, or having a random number generator on your watch or phone, which makes this way easier to do whenever. Bonus points if you use a quantum RNG.
This is also very useful for analysis paralysis, e.g. what to eat for dinner or what to wear.
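The rule above is simple enough to sketch in a few lines. Here is a minimal Python sketch of the one-roll-per-decision trick and the decaying schedule described in the thread (90%, dropping 1% per day); the function names are my own, not anything from the original post:

```python
import random


def should_indulge(p: float) -> bool:
    """One roll per decision point: returns True with probability p.

    The key discipline from the thread is that you only roll once
    per allowed decision point (e.g. once per day for snacks) and
    abide by the result - no re-rolling.
    """
    return random.random() < p


def daily_probability(day: int, start: float = 0.9, step: float = 0.01) -> float:
    """Decaying schedule: start at 90% and drop 1% per day.

    Reaches 0 after 90 days, roughly the "3 months" in the thread.
    """
    return max(0.0, start - step * day)
```

For example, on day 30 you would call `should_indulge(daily_probability(30))` and get a yes with probability 0.6. The clamping to 0.0 just ensures the probability never goes negative once the schedule runs out.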
That sounds like an interesting trick. However:
I don't want to spend the willpower required to cut it out entirely, or to agonise every time about whether something is really worth it
If you cut it out entirely, you get used to it, and no longer need a lot of willpower after a while. Though it's probably less realistic to cut out sugar entirely than to quit some drug entirely.
If you cut it out entirely, you get used to it
Your experience may vary but I've done 12-week weight loss cycles where I ate no sweets and I never lost my desire to eat sweets. I'm on week 6 of a 6-week weight loss cycle right now, I had pretty strong cravings on week 2–3 and they significantly subsided by week 4 but they're still there.
I do still eat fruit, which may be enough to maintain my sugar cravings, but if your goal is to improve health then I think it's a bad idea to cut out fruit. And anyway I don't get cravings for fruit, I get cravings for artificially-sweetened foods.
I've heard at least one person report that they entirely lost their sugar cravings when they stopped eating sugar. So it works for some people; it just doesn't work for me.
Oh, that's disappointing. I once got rid of my craving for sweet drinks just by completely quitting drinks with sugar and sweeteners for a while. Unfortunately I since had a relapse. It's easy to get addicted again, especially when another drug is involved, as in energy drinks. The randomization (gamification?) approach may work better in some cases.
If you cut something out entirely, that's hard at first, but basically free later, once you've become unaddicted. Just reducing consumption to a medium level probably doesn't cause you to become unaddicted in this way, so it requires some degree of long-term willpower. I assume this is why alcoholics try to stay completely "dry", not just reduce their consumption.
I do this often, inspired by the novel "The Dice Man". It helps break inner conflicts in what feels like a fair, fully endorsed way. @Richard_Ngo has a theory that this "random dictatorship" model of decision making has uniquely good properties as a fallback when negotiation fails or is too expensive, and that this is why active inference involves probabilistic distributions over goal states rather than atomic goal states.
I was about to try this, but then realized the Internal Double Crux was a better tool for my specific dilemma. I guess here's a reminder to everyone that IDC exists.
My suggestion: use every meal as a reward for something.
Here is an excerpt from an old piece of mine, not very LessWrongish, but you may find some ideas interesting:
The Theoretical Discussion section looks into the causes of the obesity problem and expands its scope to the more general topic of addictions. Its first subsection, Hunger Recognition, entertains the idea that the availability of digestion capacity may get mistaken for real hunger.
Overeating is not the only bad habit that people struggle to overcome. Studying the similarities and differences among various bad habits and addictions helps us better understand their nature and fight them. The Decision Fatigue subsection opens the discussion on habits, Priority Bias digs into the causes of poor decisions, and Commitment with Mindfulness talks about sustainable solutions.