Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Thanks for writing this. I find the security mindset useful all over the place and appreciate its applicability in this situation.

I have a small thing unrelated to the main post:

"To my knowledge, no one tried writing a security test suite that was designed to force developers to conform their applications to the tests. If this was easy, there would have been a market for it."

I think weak versions exist (i.e., tools that don't guarantee or force conformance but nudge or help). I first learnt to code in a bootcamp that emphasised test-driven development (TDD). One of the first packages I made was a TDD linter. It would simply highlight in red any function you wrote that did not have a corresponding unit test, and any file you made without a corresponding test file.

Also, if you have written up the scalable solutions to 80% of web app vulnerabilities anywhere, I'd love to see them.

Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

Even if you could find some notion of a, b, and c that we think are features in this DNN, how would you know you were right? How would you know you're at the correct level of abstraction or cognitive separation, carving at the joints rather than right through the spleen and then declaring you've found a, b, and c? This seems much harder than in a model where you literally assume the structure and features up front.

Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc

I'm not in these fields, so take everything I say very lightly, but intuitively this feels wrong to me. I understood your point to be something like: the labels are doing all the work. But for me, the labels are not what makes those approaches seem more interpretable than a DNN. It's that in a DNN, the features are not automatically locatable (even pseudonymously) in a way that lets you figure out the structure/shape that separates them. Each training run learns a new way to separate them, and it isn't clear how to know what shapes they tend to take, or why. The logic graphs, by contrast, already agree with you on an initial structure/shape.

Of course there are challenges in scaling up the other methods, but claiming they're no more interpretable than DNNs feels incorrect to me. [Reminder: complete outsider to these fields.]

Benign Boundary Violations

Siblings do this a lot growing up.

What an actually pessimistic containment strategy looks like

I didn't downvote this just because I disagree with it (that's not how I downvote), but if I could hazard a guess at why people might downvote, it'd be that some might think it's a 'thermonuclear idea'.

Intergenerational trauma impeding cooperative existential safety efforts

Try Googling a few AI-related topics that no one talked about 5-10 years ago to see if today more people are talking about one or more of those topics.

You can use Google Trends to see how the popularity of search terms changes over time.

What an actually pessimistic containment strategy looks like

These are really interesting, thanks for sharing!

The Regulatory Option: A response to near 0% survival odds

So regulatory capture is a thing that can happen. I don't think I got a complete picture of why you find oversight of dominant companies scary. You mentioned two possible mechanisms: rubber-stamping things, and enforcing sharing of data. It's not clear to me that either of these obviously runs contrary to the goal of slowing things down. Maybe sharing of data could (I imagine you mean with smaller competitors, as in competition regulation), but data isn't very useful alone; you need compute and technical capability to use it. More likely would be forced sharing of the models themselves, but this isn't giving away an ongoing capability, although it could still be misused. Mandated data sharing is also less likely under regulatory capture. As for the rubber-stamping: maybe sometimes something would be stamped that shouldn't have been, but surely some stamping process is better than none? It at least slows things down. I don't think receiving a stamp wrongly makes an AI system more likely to go haywire; if it was going to, it would have anyway. AI labs don't just think, "Hm, this model doesn't have any stamp, let me check its safety." Maybe you think companies will do less self-regulation if external regulation happens? I don't think this is true.

The Regulatory Option: A response to near 0% survival odds

Thank you for elaborating. I appreciate it a lot and upvoted for the effort. Here are your clearest points, paraphrased as I understand them (sometimes just using your words), and my replies:

  1. The FDA is net negative for health, so creating an FDA-for-AI would likely be net negative for the AI challenges.

I don't think you can draw this conclusion, even if I agree with the premise. The counterfactuals are very different. With drugs, the counterfactual of no FDA might be that some people get more treatments; some die but many don't, they were sick anyway so needed to do something, and maybe fewer die than do with the FDA around. So maybe the existence of the FDA, compared to that counterfactual, is net bad. I won't dispute this; I don't know enough about it. However, the counterfactual in AI is different. If unregulated, AI progress steams ahead, competition for the high rewards is fierce, and if we don't have a good safety plan (which we don't), then maybe we all die at some point, who knows when. However, if an FDA-for-AI creates bad regulation (as long as it's not bad enough to cause an AI regulation winter), it still starts slowing down that progress. Maybe that's bad for, idk, the diseases that could have been cured during a 10-year slowdown (AI solving cancer later than it otherwise would have, and that kind of thing), but it's nowhere near as bad as the counterfactual! These scenarios are different and not comparable, because the counterfactual of no FDA is not as bad as the counterfactual of no AI regulator.

  2. Enough errors would almost certainly occur in AI regulation to make it net negative.

You gave a bunch of examples of bad regulation outside AI (I won't bother thinking about whether I agree they are bad regulation, as it's not cruxy), but you didn't explain how exactly errors would make AI regulation net negative. As with the previous claim, I think the differing counterfactuals likely mean this doesn't hold.

  3. ...a field where there is bound to be vastly more misunderstanding should be at least as prone to regulation backfiring

That is an interesting claim. I'm not sure what makes you think it's obviously true, as it depends on what your goal is. My understanding of the OP is that the goal of the type of regulation they advocate is simply to slow down AI development, nothing more, nothing less. If the goal is to do good regulation of AI, that's totally different. Is there a specific way in which you imagine it backfiring for the goal of simply slowing down AI progress?

  4. [oppressive] regime gaining controllable AI would produce an astronomical suffering risk.

I am unsure what point you were making in the paragraph about evil. Was it about another regime getting there first that might not prioritise safety? If so, see the OP's Objection 4, which I share, and to which I added a further reason why that isn't a real worry in this world.

  5. ...unwise to think that people who take blatant actions to kill innocents for political convenience would be safe custodians of AI..

I don't think it's fair to say regulators would be custodians. They have a special kind of lever called "slow things down", and that lever does not mean they can, for example, seize the AI and start operating it. It is not in their legal power to do that, nor do they have the capability to do anything with it. We are talking here about slowing things down before AGI, not after it.

  6. the electorate does not understand AI

My answer is the same as my answer to 3., and also similar to the OP's Objection 1.

And finally, to reply to this: "hopefully this should clarify to a degree why I anticipate both severe X risks and S risks from most attempts at AI regulation"

Basically, no, it doesn't really clarify it. You started from a premise I agree with, or at least don't know enough to refute (that the FDA may be net negative), then drew a conclusion I disagree with (see 1. above), and all your other points assumed that conclusion, so I couldn't really follow. I tried to pick out the bits that seemed like possible key points and reply to them, but yeah, I think you're pretty confused.

What do you think of my reply to 1., about the counterfactuals being different? I think that's the best way to progress the conversation.