I believe we need a fire alarm.
People have been scared of nuclear weapons since 1945, but no one restricted the arms race until the Cuban Missile Crisis in 1962.
We know for sure that the crisis really scared both the Soviet and US high commands, and the first document restricting nukes was signed the next year, 1963.
What kind of fire alarm might it be? That is The Question.
I think an important part of convincing people of the importance of AI safety is finding the right "gateway drug": an idea that already bothers the person, so they are likely to accept it and, through it, become interested in AI safety.
For example, if a person is concerned about the rights of minorities, you might tell them about how we don't know how LLMs work, how this causes bias and discrimination, or how it will increase inequality.
If a person cares about privacy and is afraid of government surveillance, you might tell them about how AI could make all these problems much worse.
Eh. It's sad if this problem is really so complex.
Thank you. At this point, I feel like I have to stick to some way of aligning AGI, even if it doesn't have a big chance of succeeding, because it looks like there are not that many options.
Thanks for your elaborate response!
But why do you think that this project will take so much time? Why can't it be implemented faster?
Do you have any plans for inter-lab communication based on your evals?
I think your evals might be a good place for AGI labs to standardize their safety protocols.
I think this Wizard of Oz problem is in large part about being mindful and honest with oneself.
Wishful thinking is more or less the default state for people. It's hard to be critical of one's own ideas and wishes, especially when things like money or career advancement are at stake.
Thank you! The idea of inter-temporal coordination looks interesting.
Can you elaborate on your comment?
It seems so intriguing to me, and I would love to learn more: why is it a bad strategy if our AGI timeline is 5 years or less?
Why do you think that it will not be competitive with other approaches?
For example, it took 10 years to sequence the first human genome. After nearly 7 years of work, a competitor started an alternative human genome project using a completely different technology, and both projects finished at approximately the same time.
I think we are entering black swan territory, and it's hard to predict anything.
I absolutely agree with the conclusion. Everything is moving so fast.
I hope these advances will cause massive interest in the alignment problem from all sorts of actors. Even if OpenAI is talking about safety (and recently they have started talking about it quite often) in large part for PR reasons, it still means they think society is concerned about the progress, which is a good sign.
What are examples of “knowledge of building systems that are broadly beneficial and safe while operating in the human capabilities regime?”
I assume the mentioned systems are institutions like courts, governments, corporations, or universities.
Charlotte thinks that humans and advanced AIs are universal Turing machines, so predicting capabilities is not about whether a capability is present at all, but whether it is feasible in finite time with a low enough error rate.
I have a similar thought. If AI has human-level capabilities, and a part of its job is ...
Thanks for your view on doomerism and your thoughts on the framing of hope.
One thing helping me preserve hope is the fact that there are so many unknown variables about AGI and how humanity will respond to it that I don't think any current-day prediction is worth a lot.
Although I must admit that doomers like Connor Leahy and Eliezer Yudkowsky can be extremely persuasive, they also don't know many important things about the future, and they are full of cognitive biases too. All of this lets me keep telling myself the mantra "There is still hope that we might win."
I am not sure whether this is the best way to think about these risks, but I feel like giving it up would be a straightforward path to existential anxiety and misery, so I try not to question it too much.
I agree. We have problems with emotional attachment to humans all the time, but humans are more or less predictable, not too powerful, and usually not so great at manipulation.
Thank you for your comment and everything you mentioned in it. I am a psychologist entering the field of AI policy-making, and I am starving for content like this.
It does, and it causes a lot of problems, so I would prefer to avoid such problems with AIs.
Also, I believe that an advanced AI will be much more capable of deception and manipulation than the average human.
I 100% agree with you.
I am entering the field right now, and I know several people in a position similar to mine; there are just no positions for people like us, even though I think I am very proactive and have valuable experience.
Good post, but there is a big power imbalance in the human-ant relationship.
Even if people could communicate with ants, nothing would stop humans from making ants suffer whenever it made the deal better for humans, precisely because of that power imbalance.
For example, domesticated chickens live in very crowded and stinky conditions, and their average lifespan is a month, after which they are killed. Not particularly good living conditions.
People who care only about profitability do it simply because they can.
I have similar thoughts. I believe that at some point, fears about TAI will spread like wildfire, and the field will get a giant stream of people, money, and policies; that is hard to imagine from today's perspective.
First, your article is very insightful and well-structured, and I totally like it.
But there is one thing that bugs me.
I am new to the AI alignment field, and recently I realized (maybe mistakenly) that it is very hard to find a long-term, financially stable, full-time job in AI field-building.
For me, it basically means that only a tiny number of people consider AI alignment important enough to pay money to decrease P(doom). And at the same time, here we are, talking about the possibility of doom within the next 10 or 20 years. For me it is all a bi...
ChatGPT was recently launched, and it is so powerful that it made me think about the problem of misuse of a powerful AI. It's a very powerful tool; no one really knows how to use it yet, but I am sure we will soon see it used for unpleasant things.
But I also see AI perceived more and more as a living entity with agency. People are having conversations with ChatGPT as they would with a human.
I agree that fearmongering is thin ice and can easily backfire, and that it must be done carefully and ethically. But is it worse than the alternative, in which people are unaware of AGI-related risks? I don't think anybody can say with certainty.
The reactor meltdown on a Soviet submarine did not pose an existential threat; in the worst case, it would have been a small version of Chernobyl. We might compare it to an AI that causes some serious problems, like a stock market crash, but not existential ones. And the movie is not a threat at all.
"The question is how plausible it is to generate situations that are scary enough to be useful, but under enough control to be safe."
That is a great summary of what I wanted to say!
In my opinion, this methodology would be a great way for a model to learn how to persuade humans and exploit their biases, because the model could learn these biases not just from the data it was trained on but also fine-tune its understanding by testing its own hypotheses.
I totally agree that it might be good to have such a fire alarm as soon as possible, and seeing how quickly people are making GPT-4 more and more powerful makes me think it is only a matter of time.