All of laserfiche's Comments + Replies

Yes, thank you, I think that's it exactly. I don't think that people are communicating this well when they are reporting predictions.

Are we misreporting p(doom)s?

I usually say that my p(doom) is 50%, but that doesn't mean the same thing that it does in a weather forecast.

In weather forecasts, the percentage states that they ran a series of simulations, and that percentage of simulations produced that result. A forecast of a 100% chance of rain, then, does not mean that there is anywhere near a 100% chance of rain. Forecasts still have error bars; 10 days out, a forecast will be wrong 50% of the time. Therefore, a 10-day forecast of a 100% chance of rain actually means there is roughly a 50% chance of rain.
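The arithmetic here can be made explicit. A minimal sketch (the function name and the simple "right or opposite" error model are my own assumptions, not from the comment): if a forecast is correct with some historical accuracy at a given lead time, and when it is wrong the opposite outcome occurs, the effective probability is a mixture of the two cases.

```python
def effective_rain_probability(forecast_prob: float, accuracy: float) -> float:
    """Effective chance of rain, assuming the forecast is right with
    probability `accuracy` and, when wrong, the opposite outcome occurs."""
    return accuracy * forecast_prob + (1 - accuracy) * (1 - forecast_prob)

# A "100% chance of rain" 10 days out, where 10-day forecasts
# are only right about half the time:
print(effective_rain_probability(1.0, 0.5))  # 0.5
```

Under this toy model, a confident long-range forecast carries almost no information, which is the point the comment is making about how a stated p(doom) and the speaker's model uncertainty get conflated.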

In my me... (read more)

mako yass · 6mo
Possibly ?

Are you assuming that avoiding doom in this way will require a pivotal act? It seems that, absent policy intervention and societal change, even if some firms exhibit a proper amount of concern, many others will not.

It's unclear whether some people being cautious and some people being incautious leads to an AI takeover. In this hypothetical, I'm including AI developers selling AI systems to law enforcement and militaries, which are used to enforce the law and win wars against competitors using AI. But I'm assuming we wouldn't pass a bunch of new anti-AI laws (and that AI developers don't become paramilitaries).

A similar principle I have about this situation is: Don't get too clever.

Don't do anything questionable or too complicated. If you do, you're just as likely to cause harm as to cause good. The psychological warfare campaign you've envisioned against OpenAI is going to backfire on you and undermine your team.

Keep it simple. Promote alignment research. Persuade your friends. Volunteer on one of the many relevant projects.

Upvoted. I agree with the gist of what you're saying, with some caveats. I would have expected the two posts to end up with scores of 0 to 5, but there is a world of difference between a 5 and a -12.

It's worth noting that the example explainer you linked to doesn't appeal to me at all.  And that's fine.  It doesn't mean that there's something wrong with the argument, or with you, or with me.  But it's important to note that it demonstrates a gap.  I've read all the alignment material[1], and I still see huge chunks of the populati... (read more)

Huh, I see. I agree about the 0-to-5 vs. -12 (in this case -8) difference. I don't see myself in the business of making good explainer material for the general public, so I'll defer to you on that (since you have read more of the introductions than I have). Also, I guess posting that Google Doc here would probably be upvoted?

Under the tag of AI Safety Materials, 48 posts come up.  There are exactly two posts by sprouts:

An example elevator pitch for AI doom Score: -8[1]

On urgency, priority and collective reaction to AI-Risks: Part I Score: -12

These are also the only two posts with negative scores.  

In both cases, it was the user's first post.  For Denreik in particular you can tell that he suffered over it and put many hours into it. 

Is it counterproductive to discourage new arrivals attempting to assist in the AI alignment effort?

Is there a systemic bias ag... (read more)

I mostly don't want new people to contribute to public materials efforts. I want people to have thought concretely about the problem and fleshed out their understanding of it before focusing on communicating it to others. I do want people who are entering the space to have a good experience. I'm mulling over some posts that give newcomers a clearer set of handholds on what to do to get started.
Man, I have conflicting opinions about this. "People want to help" is a good thing. But the upvote/downvote mechanism is not about the poster but about the post, and its function is to rank things that others find helpful. And both posts you linked just… aren't that great? Yours doesn't deserve getting downvoted, but it also doesn't really deserve getting upvoted all that much, imho: there's so much AI alignment intro material out there, from popular articles to YouTube videos to book-length explainers from so many people, and e.g. this one fits pretty well into your desiderata.

As for Denreik's post: it doesn't smell like a thing I'd want to read (no paragraph breaks, no clear statement of the conclusion at the top, slightly confusing writing…), and while I haven't read it (and therefore didn't vote either way), such things are unfortunately a reliable indicator.

Then again, I'd love it if there were some way of telling someone, "Hey, I like that you're trying to help! Maybe lurk moar (a lot moar, maybe a ratio of 100:1 or 1000:1 for reading vs. contributing), and start by commenting or shortforming." But there also needs to be some mechanism for ranking content.

Denreik, I think this is a quality post and I know you spent a lot of time on it. I found your paragraphs on threat complexity enlightening - it is in hindsight an obvious point that a sufficiently complex or subtle threat will be ignored by most people regardless of its certainty, and that is an important feature of the current situation.

Thank you. I set out to write something clear and easy to read that could serve as a good cornerstone for decisive action later on, and I still think I accomplished that fairly well.

I agree that there are many situations where this cannot be used. But there appears at least to be a gap that arguments like this can fill that is missed by the existing explanations.

I find those first two and Lethalities to be too long and complicated for convincing an uninitiated, marginally interested person. Zvi's Basics is actually my current preference along with stories like It Looks Like You're Trying To Take Over The World (Clippy).

The best primer that I have found so far is Basics of AI Wiping Out All Value in the Universe by Zvi.  It's certainly not going to pass peer review, but it's very accessible, compact, covers the breadth of the topics, and links to several other useful references.  It has the downside of being buried in a very long article, though the link above should take you to the correct section.

What does that mean? I notice that it doesn't actually prove that AI will definitely kill us all. I've never seen anything else that does, either. You can't distill what never existed.

Let's not bury this comment. Here is someone we have failed: there are comprehensive, well-argued explanations for all of this, and this person couldn't find them. Even the responses to the parent comment don't conclusively answer this - let's make sure that everyone can find excellent arguments with little effort.

M. Y. Zuo · 10mo
Is this written for a different comment and accidentally posted here?

Thank you for pointing this perspective out. Although Eliezer is from the West, I assure you he cares nothing for that sort of politics. The whole point is that the ban would have to be universally supported, with a tight alliance between the US, China, Russia, and ideally every other country in the world. No one wants to do any airstrikes, and, you're right, they are distracting from the real conversation.

That's a very interesting observation. As far as I understand as well, deep neural networks have completely unlimited rewirability - a particular "function" can exist anywhere in the network, in multiple places, or spread out between and within layers. It can be duplicated in multiple places. And if you retrain that same network, it will then be found in another place in another form. It makes it seem like you need something like a CNN to be able to successfully identify functional groups within another model, if it's even possible.

Thank you Arthur.  I'd like to offer my help on continuing to develop this project, and helping any of the other teams (@ccstan99, @johnathan, and others) on their projects.  We're all working towards the same thing.  PM me, and let me know if there are any other forums (Discord, Slack, etc) where people are actively working on or need programming help for AI risk mitigation.

I think we need to move public opinion first, which hopefully is slowly starting to happen.  We need one of two things to happen:

  1. A breakthrough in AI alignment research
  2. Major shifts in policy

A strike does not currently help either of those.  

Edit:  Actually, I do agree that if you could get ALL AI researchers - a general strike - that would serve the purpose of delay, and I would be in favor.  I do not think that is realistic.  A lesser strike might also serve to drum up attention; I was initially afraid that it might drum up negative attention.

[This comment is no longer endorsed by its author]
Stephen Fowler · 1y
It increases the amount of time we have to make those breakthroughs
I think if it happens, it'll help shift policy because it'll be a strong argument in policy discussions. "Look, many researchers aren't just making worried noises about safety but taking this major action."

I have a well functioning offline Python pipeline that integrates the OpenAI API and the entire alignment research dataset.  If this is still needed, I need to consider how to make this online and accessible without tying it to my API key.  Perhaps I should switch to using the new OpenAI plugins instead.  Suggestions welcomed.

It's easy to construct alternate examples of the Monty Fall problem that clearly weren't in the training data. For example, in my experience GPT-4 and Bing Chat in all modes always get this prompt wrong:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You know that the car is always behind door number 1. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
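To check the intuition behind this variant, here is a quick simulation (the function and variable names are my own, not from the comment): with the car fixed behind door 1 and the player knowingly picking door 1, switching always loses, unlike in the standard Monty Hall problem.

```python
import random

def monty_fixed_car_trial(switch: bool) -> bool:
    """One trial of the variant where the car is ALWAYS behind door 1.
    The player knows this and picks door 1; the host then opens one of
    the remaining doors (both hide goats)."""
    car = 1
    pick = 1
    opened = random.choice([2, 3])       # host reveals a goat
    if switch:
        pick = ({1, 2, 3} - {pick, opened}).pop()  # the other unopened door
    return pick == car

random.seed(0)
n = 100_000
stay_wins = sum(monty_fixed_car_trial(switch=False) for _ in range(n)) / n
switch_wins = sum(monty_fixed_car_trial(switch=True) for _ in range(n)) / n
print(stay_wins, switch_wins)  # 1.0 0.0 — staying always wins here
```

The simulation makes the "wrong" pattern-matched answer easy to spot: a model that answers "switch" is reciting the standard Monty Hall solution rather than reading the altered premise.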

I'm not sure if I'm missing something. This is my first try after reading your comment:

Notably, if you tell it to think step by step, it gets the question right and answers with:

"This problem is known as the Monty Hall problem. In the original problem, the car is placed randomly behind one of the three doors, and the host always opens a door with a goat after you make your initial choice. However, in your variation, the car is always behind door number 1. Let's analyze your version step by step: You initially choose door No. 1, which always has the car behind it. The host, knowing what's behind the doors, will always open one of the other two doors, revealing a goat. In this case, he opens door No. 3. Now you are given the option to switch your choice to door No. 2. In this version of the problem, it is NOT to your advantage to switch your choice, because the car is always behind door No. 1, and you have already chosen that door. If you switch to door No. 2, you will end up with a goat instead of the car."

This is honestly pretty similar to how humans behave most of the time. They pattern-match and don't really pay close attention. However, if you give cues that something is actually important, for example by putting them in an exam room and telling them a test will be graded, they can do somewhat better. Telling GPT-4 to think step by step does something similar.