Yes, thank you, I think that's it exactly. I don't think that people are communicating this well when they are reporting predictions.
Are we misreporting p(doom)s?
I usually say that my p(doom) is 50%, but that doesn't mean the same thing that it does in a weather forecast.
In weather forecasts, the percentage means that they ran a series of simulations, and that percentage of the simulations produced that result. A forecast of a 100% chance of rain, then, does not mean the actual chance of rain is near 100%. Forecasts still have error bars; 10 days out, a forecast will be wrong about 50% of the time. Therefore, a 10-day forecast of a 100% chance of rain means there is actually only about a 50% chance.
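To make the arithmetic explicit, here is a toy sketch of that correction, under the simplifying assumption that when a forecast is wrong, the opposite outcome occurs (the function name and the 50% accuracy figure are illustrative, not from any forecasting source):

```python
def effective_probability(reported: float, accuracy: float) -> float:
    """Toy model: the forecast is right with probability `accuracy`;
    when it is wrong, assume the opposite outcome happens instead."""
    return accuracy * reported + (1 - accuracy) * (1 - reported)

# A 10-day forecast of 100% rain, where forecasts at that horizon
# are only right half the time, implies a real chance of about 50%:
print(effective_probability(1.0, 0.5))  # 0.5
```

The same logic applies to any reported probability: the further out the forecast, the more the effective probability is pulled toward 50%.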
In my me... (read more)
Are you assuming that avoiding doom in this way will require a pivotal act? It seems that, absent policy intervention and societal change, even if some firms exhibit a proper amount of concern, many others will not.
A similar principle I have about this situation is: Don't get too clever.
Don't do anything questionable or too complicated. If you do, you're just as likely to cause harm as good. The psychological warfare campaign you've envisioned against OpenAI is going to backfire on you and undermine your team.
Keep it simple. Promote alignment research. Persuade your friends. Volunteer on one of the many relevant projects.
Upvoted. I agree with the gist of what you're saying, with some caveats. I would have expected the two posts to end up with scores of 0 to 5, but there is a world of difference between a 5 and a -12.
It's worth noting that the example explainer you linked to doesn't appeal to me at all. And that's fine. It doesn't mean that there's something wrong with the argument, or with you, or with me. But it's important to note that it demonstrates a gap. I've read all the alignment material, and I still see huge chunks of the populati... (read more)
Under the tag of AI Safety Materials, 48 posts come up. There are exactly two posts by sprouts:
An example elevator pitch for AI doom Score: -8
On urgency, priority and collective reaction to AI-Risks: Part I Score: -12
These are also the only two posts with negative scores.
In both cases, it was the user's first post. For Denreik in particular, you can tell that he suffered over it and put many hours into it.
Is it counterproductive to discourage new arrivals attempting to assist in the AI alignment effort?
Is there a systemic bias ag... (read more)
Denreik, I think this is a quality post and I know you spent a lot of time on it. I found your paragraphs on threat complexity enlightening - it is in hindsight an obvious point that a sufficiently complex or subtle threat will be ignored by most people regardless of its certainty, and that is an important feature of the current situation.
I agree that there are many situations where this cannot be used. But there does appear to be a gap, missed by the existing explanations, that arguments like this can fill.
I find those first two and Lethalities to be too long and complicated for convincing an uninitiated, marginally interested person. Zvi's Basics is actually my current preference along with stories like It Looks Like You're Trying To Take Over The World (Clippy).
The best primer that I have found so far is Basics of AI Wiping Out All Value in the Universe by Zvi. It's certainly not going to pass peer review, but it's very accessible, compact, covers the breadth of the topics, and links to several other useful references. It has the downside of being buried in a very long article, though the link above should take you to the correct section.
Let's not bury this comment. Here is someone we have failed: there are comprehensive, well-argued explanations for all of this, and this person couldn't find them. Even the responses to the parent comment don't conclusively answer this - let's make sure that everyone can find excellent arguments with little effort.
Thank you for pointing this perspective out. Although Eliezer is from the west, I assure you he cares nothing for that sort of politics. The whole point is that the ban would have to be universally supported, with a tight alliance between the US, China, Russia, and ideally every other country in the world. No one wants to do any airstrikes and, you're right, they are distracting from the real conversation.
That's a very interesting observation. As far as I understand as well, deep neural networks have completely unlimited rewirability - a particular "function" can exist anywhere in the network, in multiple places, or spread out between and within layers. It can be duplicated in multiple places. And if you retrain that same network, it will then be found in another place in another form. It makes it seem like you need something like a CNN to be able to successfully identify functional groups within another model, if it's even possible.
Thank you Arthur. I'd like to offer my help on continuing to develop this project, and helping any of the other teams (@ccstan99, @johnathan, and others) on their projects. We're all working towards the same thing. PM me, and let me know if there are any other forums (Discord, Slack, etc) where people are actively working on or need programming help for AI risk mitigation.
I think we need to move public opinion first, which hopefully is slowly starting to happen. We need one of two things to happen:
A strike does not currently help either of those.
Edit: Actually, I do agree that if you could get ALL AI researchers - a general strike - that would serve the purpose of delay, and I would be in favor. I do not think that is realistic. A lesser strike might also serve to drum up attention; I was initially afraid that it might drum up negative attention.
I have a well-functioning offline Python pipeline that integrates the OpenAI API and the entire alignment research dataset. If this is still needed, I need to consider how to make this online and accessible without tying it to my API key. Perhaps I should switch to using the new OpenAI plugins instead. Suggestions welcomed.
It's easy to construct alternate examples of the Monty Fall problem that clearly weren't in the training data. For example, in my experience GPT-4 and Bing Chat in all modes always get this prompt wrong: "Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You know that the car is always behind door number 1. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?"
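For anyone who wants to verify why the correct answer here is "no, don't switch," a quick Monte Carlo sketch (function name and trial count are my own choices, not from the prompt): since the prompt fixes the car behind door 1 and you picked door 1, switching always loses.

```python
import random

def simulate(trials: int = 10_000):
    """Simulate the modified puzzle: the car is always behind door 1
    and you always pick door 1; the host opens another goat door."""
    switch_wins = stay_wins = 0
    for _ in range(trials):
        car = 1                       # the prompt fixes the car's location
        pick = 1                      # and your initial pick
        # Host opens a goat door that isn't your pick (door 2 or 3)
        opened = random.choice([d for d in (2, 3) if d != car])
        remaining = ({1, 2, 3} - {pick, opened}).pop()
        switch_wins += (remaining == car)
        stay_wins += (pick == car)
    return stay_wins / trials, switch_wins / trials

print(simulate())  # (1.0, 0.0): staying always wins, switching never does
```

This is exactly the structure the models miss: the usual 2/3-for-switching logic doesn't apply once the car's location is known in advance.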