I think I’m pretty good at convincing people about AI dangers. This post covers the basics of speaking to people convincingly about them.
In 2022, at a CFAR workshop, I was introduced to circling.
It is multiplayer meditation. People sit in a circle and have a conversation, but the content of the conversation is mostly at the meta level: what someone’s words and expressions evoke in you, how you relate to the other people, and what’s going on in the circle as a group.
It is sometimes a great experience; but more importantly, it lets you (1) pay attention to what’s going on in other people and to your models of them, and (2) make your hypotheses explicit, ask people what’s actually going on, and be surprised by how wrong you are about other people! This is awesome, and it quickly trains you to be very attentive to other people and to see them in a far less wrong way.
(Related: Circling as Cousin to Rationality.)
I think step 1 of talking well about AI dangers to people is to learn to try to see them: to notice where they’re coming from, what their world model is, what their experience is like, and what generates their questions and arguments.
(Circling is a great way to get much better at this, if you go in with the intention of paying attention to what’s going on in other people and how they relate to others and to the present moment, explicitly turning your intuitive ideas and hunches about that into predictions, and learning which ones are wrong.)
This will give you a much more intuitive sense of which arguments will move the person from where they are to seeing what the world is currently like.
Don’t do corporate-speak and don’t hedge. Always be honest about why you believe what you believe. Do you not understand ML, defer to experts, and think it’s insane that the world is racing to superintelligence while Geoffrey Hinton regrets his life’s work and thinks the chance of everyone dying is >50%? Don’t shy away from that.
Use valid arguments. Use real facts. If you’re not sure of something, be honest that you’re unsure.
The goal is to get everyone closer to the actual state of the world. Dishonesty will both backfire in real life and work less well than the truth. The truth has detail and foundation, and truthful arguments ring differently.
When you really, really want to immediately blurt out something that came to your mind as a perfect response, stop yourself to ask: is it actually true and valid?
Learn the skill of noticing when you don’t quite entirely believe what you’re saying and retreating or course-correcting; be happy every time you successfully do this, and make sure to retract anything you realize wasn’t accurately representing reality.
Finally, honesty and truth are our advantages. In a competition over who can present the most persuasive dishonest/misleading/flawed arguments, the other side would win, because it has many more resources to pour into crafting and presenting them. When both sides get to present arguments, our side can win only because the truth is on our side and we can point at it.
So: be honest.
(See also: A case for courage, when speaking of AI danger.)
You want to grok the deeper generators of the arguments, not just the arguments themselves: you’ll then be able to answer questions and counterarguments in ways that both meet people where they are/adjust for them, and remain valid and convincing and tied to reality.
Read the Arbital articles on AI Alignment, the 2022 posts from Yudkowsky and Soares, some of their 2023 posts (e.g., Deep Deceptiveness), If Anyone Builds It, and its supplemental materials.
See if you can understand the deeper lessons. Did you get a sense of the Sharp Left Turn problem being about...
Look at current mainstream explanations and opinions: what do WaitButWhy, Oprah, and Snoop Dogg say about the threat from superintelligent AI?
Very optional, but it can be quite useful if it works out: try to develop a bit of a security mindset. Go through some simple CTFs, or learn the ideas behind SQL injection and XSS in a way that you can generalize to spotting where systems fail under adverse optimization.
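For a concrete sense of the failure mode, here is a minimal, self-contained sketch (using Python’s built-in sqlite3; the table and function names are purely illustrative, not from any particular CTF) of why SQL injection works: if you build a query by pasting in attacker-controlled text, the attacker’s input becomes part of the code, and anything optimizing against you will find that.

```python
# Illustrative sketch: how naive string concatenation lets an attacker's input
# become code -- the kind of failure under adverse optimization that a security
# mindset trains you to spot. Names here are made up for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

def lookup_vulnerable(name: str):
    # The caller controls `name`, so they control part of the SQL itself.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def lookup_safe(name: str):
    # A parameterized query keeps data and code separate.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(lookup_vulnerable("x' OR '1'='1"))  # returns every row: the input rewrote the query
print(lookup_safe("x' OR '1'='1"))        # returns nothing: the input stayed data
```

The thing to generalize isn’t the SQL specifics; it’s the habit of asking, for any system, where an adversary’s input or optimization pressure gets to rewrite the rules.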
Can you say what happens if the superintelligence is successfully optimizing for maximizing the subjective feeling of happiness in humans?
Can you say why we can’t do even that?
Look for proposals for aligning superintelligence. Can you see the specifics of why they would fail or don’t address the hard bits of the problem?
(in 1:1 settings)
Assume the person you’re talking to is smart, and that the reason they don’t think AI is likely to kill everyone on the default trajectory is that they are not aware of some facts about the real world (or have not yet made some inferences, because they haven’t yet stumbled across a train of thought that leads them there). A goal is to keep figuring out and updating the diff between what they know and the real world, and what you can say that would bring them quickly from their current state of mind to understanding the facts that matter.
(Promoting an idea to the point of being seriously considered takes more bits than then giving it a lot of weight; a rough illustration is sketched after this aside. What are these bits? What can inspire their curiosity?
It’s valid if the main thing they’re missing is that many scientists and experts say it is a very concerning threat that AI might literally kill everyone; but make sure they’re curious about why, and that you know which parts of why it is a real threat they’re missing.
Sometimes it helps simply to be greatly worried that AI is likely to kill everyone on the planet while being well-dressed (though don’t sell your soul over that; I’ve terrified diplomats with the current situation while dressed very nerdily) or while having legible credibility/trustworthiness. Unexpectedness carries lots of bits: it makes people curious, and sometimes makes it easier for them to trust your words.)
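As a rough, purely illustrative calculation (the specific numbers are my own assumptions, not anything from the post): measuring evidence in bits of log-odds, lifting a hypothesis from “not even on the radar” to “seriously considered” costs more bits than lifting it from “considered” to “heavily weighted”.

```python
# Rough illustration (made-up priors): bits of evidence needed to shift a
# hypothesis (a) from "not on the radar" to "seriously considered", versus
# (b) from "considered" to "given a lot of weight".
from math import log2

def bits_of_evidence(prior: float, posterior: float) -> float:
    """Bits needed to shift log-odds from prior to posterior."""
    odds = lambda p: p / (1 - p)
    return log2(odds(posterior)) - log2(odds(prior))

# (a) locating the hypothesis at all: one-in-a-million -> 10%
print(round(bits_of_evidence(1e-6, 0.10), 1))  # ~16.8 bits
# (b) then taking it very seriously: 10% -> 90%
print(round(bits_of_evidence(0.10, 0.90), 1))  # ~6.3 bits
```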
Keep trying, constantly, to see the other person. Your focus is on them. What matters is their state, their curiosities, their background, their instinctive reactions. To give explanations that they would find most intuitive and understandable, and that would efficiently bring their picture of the world closer to what the real world is, you need to keep paying a lot of System 1 attention to all of these things.
Look out for repeating yourself, saying the same things more than once. Often, finding yourself repeating things is fine: maybe you stopped yourself from completing the point the first time because you noticed they were missing a prerequisite, and now they’re ready to hear it; or maybe they missed it the first time and, after a while, want to go back and hear it again. Other times, though, something has gone very wrong: you should never be stuck in a cycle. If it looks like it could be a cycle, pause; you might be losing at the goal of helping them understand the current situation! Listen to them carefully and figure out why (and whether) it’s helpful to be saying what you’re saying. Orient towards what they need to understand and what they’re curious about. Only communicate what they actively want to learn in the moment. Consider whether you’re wrong about them, or about the words you need to say; try to reset and move to an independent branch, or look for and address deeper reasons or parts of their model.
The goal is to help them learn the information they were missing and want to learn. So: say what is helpful for them to hear, according to their values, not what you want to say.
(Don’t let this stop you from being authentic! If something excites you, or terrifies you, you can go on a tangent and share it. But focus on what’s important to them; if they’re asking you about something, answer what they’re actually asking, not a related question you think would be good to talk about. What matters is their reasons for believing things, given their background; not your reasons for believing things, given your background.)
A curious Uber driver? Your university professor? A friend or a family member?
You interact with so many people, close and distant.
Talk to them. Pay attention. See what works; what, when you say it, inspires curiosity.
Develop your skills. Experiment and explore. Convince.
Good luck!