I’ve elected to opt out of Petrov Day, not because I don’t want to participate, but because I think it’s the optimal strategy: the more people who opt out, the less likely the button will be pushed.
Great analysis! I’m curious about the disagreement with needing a pivotal act. Is this disagreement more epistemic or normative? That is to say, do you think they assign a very low probability to needing a pivotal act to prevent misaligned AGI? Or do they have concerns about the potential consequences of this mentality (people competing with each other to create powerful AGI, accidentally creating a misaligned AGI as a result, public opinion, etc.)?
Yes, this surprised me too. Perhaps it was the phrasing that they disagreed with? If you asked them whether, out of all possible intelligences in mindspace, AGI would fall very close to most human minds, maybe their answer would be different.
There’s a part of your argument I’m confused about. The sharp left turn is a sudden change in capabilities. Even if you can see whether things are trending one way or the other, how can you see sharp left turns coming? At the end, you clarify that we can’t predict when a left turn will occur, so how do these findings pertain to them? This seems to be more of an attempt to track trends of alignment/misalignment, but I don’t see what new insights it gives us about sharp left turns specifically.
I agree that we need clear wins, but I also think most people in the AI Safety community agree that we need clear wins. Would you be interested in taking ownership of this: speaking with various people in the community and writing up a blog post on what you think would characterize a clear action plan, with transparent benchmarks for progress? I think this would be very beneficial, on both the Alignment side and the Governance side.
This has caused me to reconsider what intelligence is and what an AGI could be. It’s difficult to determine whether this makes me more or less optimistic about the future. A question: are humans essentially like GPT? We seem to be running simulations in an attempt to reduce predictive loss. Yes, we have agency; but is that human “agent” actually the intelligence, or just generated by it?
Could you explain the rationale behind the "Open" in OpenAI? I can understand the rationale of trying to beat more reckless companies to achieving AGI first (albeit this mentality is potentially extremely dangerous too), but what is the rationale behind releasing your research? This will enable companies that do not prioritize safety to speed ahead with you, perhaps just a few years behind. And if OpenAI hesitates to progress due to concerns over safety, the more risk-taking orgs will likely speed ahead of OpenAI in capabilities. The bottom line is I'm concerned your efforts to achieve AGI might not do much to ensure an aligned AGI is actually created, but instead only speed up the timeline toward achieving AGI by years or even decades.
Very insightful piece! One small quibble: you state the disclaimer that you’re not assuming only Naive Safety measures are realistic many, many times. While doing this might be needed when writing for a more general audience, I think for the audience of this writing, stating it once or twice would suffice.
One possible idea I had: what if, when training Alex based on human feedback, the first team of human evaluators was intentionally picked to be less knowledgeable, more prone to manipulation, and less likely to question the answers Alex gave them? Then you introduce a second team of the most thoughtful, knowledgeable, and skeptical researchers to evaluate Alex. If Alex were acting deceptively, it might not recognize the change fast enough, and manipulation that worked on the first team might be caught by the second team. Yes, after a while Alex would probably catch on and improve its tactics, but by then the deceptive behavior would have already been exposed.
"By the time AI systems can double the pace of AI research, it seems like they can greatly accelerate the pace of alignment research."
I think this assumption is unlikely to hold. From what we know of human-led research, accelerating AI capabilities is much easier than accelerating progress in alignment. I don't see why it would be different for an AI.
I'd be mindful of information hazards. All it takes is one person doing this too soon, and likely failing, for talking about the dangers of AI to become taboo in the public eye.