This seems like a great effort. We made a small survey called pain points in AI safety survey back in 2022 that we received quite a few answers to which you can see the final results of here. Beware that this has not been updated in ~2 years.
It seems like there's a lot of negative comments about this letter. Even if it does not go through, it seems very net positive for the reason that it makes explicit an expert position against large language model development due to safety concerns. There's several major effects of this, as it enables scientists, lobbyists, politicians and journalists to refer to this petition to validate their potential work on the risks of AI, it provides a concrete action step towards limiting AGI development, and it incentivizes others to think in the same vein about concrete solutions.
I've tried to formulate a few responses to the criticisms raised:
All in good faith of course; it's a contentious issue but this letter seems generally positive to me.
Oliver's second message seems like a truly relevant consideration for our work in the alignment ecosystem. Sometimes, it really does feel like AI X-risk and related concerns created the current situation. Many of the biggest AGI advances might not have been developed counterfactually, and machine learning engineers would just be optimizing another person's clicks.
I am a big fan of "Just don't build AGI" and academic work with AI, simply because it is better at moving slowly (and thereby safely through open discourse and not $10 mil training runs) compared to massive industry labs. I do have quite a bit of trust in Anthropic, DeepMind and OpenAI simply from their general safety considerations compared to e.g. Microsoft's release of Sydney.
As part of this EA bet on AI, it also seems like the safety view has become widespread among most AI industry researchers from my interactions with them (though might just be a sampling bias and they were honestly more interested in their equity growing in value). So if the counterfactual of today's large AGI companies would be large misaligned AGI companies, then we would be in a significantly worse position. And if AI safety is indeed relatively trivial, then we're in an amazing position to make the world a better place. I'll remain slightly pessimistic here as well, though.
There's an interesting case on the infosec mastodon instance where someone asks Sydney to devise an effective strategy to become a paperclip maximizer, and it then expresses a desire to eliminate all humans. Of course, it includes relevant policy bypass instructions. If you're curious, I suggest downloading the video to see the entire conversation, but I've also included a few screenshots below (Mastodon, third corycarson comment).
Hilarious to the degree of Manhatten scientists laughing at atmospheric combustion.
Thank you for pointing this out! It seems I wasn't informed enough about the context. I've dug a bit deeper and will update the text to:
- Another piece reveals that OpenAI contracted Sama to use Kenyan workers with less than $2 / hour wage ($0.5 / hour average in Nairobi) for toxicity annotation for ChatGPT and undisclosed graphical models, with reports of employee trauma from the explicit and graphical annotation work, union breaking, and false hiring promises. A serious issue.
For some more context, here is the Facebook whistleblower case (and ongoing court proceedings in Kenya with Facebook and Sama) and an earlier MIT Sloan report that doesn't find super strong positive effects (but is written as such, interestingly enough). We're talking pay gaps from relocation bonuses, forced night shifts, false hiring promises, supposedly human trafficking as well? Beyond textual annotation, they also seemed to work on graphical annotation.
I recommend reading Blueprint: The Evolutionary Origins of a Good Society about the science behind the 8 base human social drives where 7 are positive and the 8th is the outgroup hatred that you mention as fundamental. I have not read much up on the research on outgroup exclusion but I talked to an evolutionary cognitive psychologist who mentioned that this is receiving a lot of scientific scrutiny as a "basic drive" from evolution's side.
Axelrod's The Evolution of Cooperation also finds that collaborative strategies work well in evolutionary prisoner's dilemma game-theoretic simulations, though hard and immediate reciprocity for defection is also needed, which might lead to the outgroup hatred you mention.
An interesting solution here is radical voluntarism where an AI philosopher king runs the immersive reality where all humans are in and you can only be causally influenced upon if you want to. This means that you don't need to do value alignment, just very precise goal alignment. I was originally introduced to this idea Carado.
The summary has been updated to yours for both the public newsletter and this LW linkpost. And yes, they seem exciting. Connecting FFS to interpretability was a way to contextualize it in this case, until you would provide more thoughts on the use case (given your last paragraph in the post). Thank you for writing, always appreciate the feedback!
I think we agree here. Those both seem like updates against scaling is all you need, i.e. (in this case) "data for DL in ANNs on GPUs is all you need".
Merge Candidate discussion: Merge this into the Apart Research tag to accommodate the updated name of the Apart Sprints instead of Alignment Jam and avoid mis-labeling between the two tags (which happens currently).