Do any of the AI Risk evaluations focus on humans as the risk?

Nov 30, 2022

Around here, humans using AI to do bad things is referred to as "misuse risks", whereas "misaligned AI" is used exclusively to refer to the AI being the primary agent. There are many thought experiments where the AI convinces humans to do things which result in bad outcomes. "Execute this plan for me, human, but don't look at the details too hard please." This is still considered a case of misaligned AI.

If you break it down analytically, there need to be two elements for bad things to happen: the will to do so and the power to do so. As Daniel notes, some humans have already had the power to do so for many decades, but fortunately none have had the will. AI is expected to be extremely powerful too, and AI will have its own will (including a will to power), so both misaligned AI and misuse risks are things to take seriously.

jmh (3y)

Thanks for noting the terminology, useful to have in mind.

I have a follow-on comment and question in my response to Daniel, and I would be interested in your reaction to it.

Daniel Kokotajlo

Nov 30, 2022

Is it possible that the AI risk from the emergence of a very powerful AI is not as likely, since before that occurs some human with a less powerful AI ends the world first, or at least destroys modern human civilization and we're back to the Stone Age hunter-gatherer world before the AI gets powerful enough to do that for/to us?

It's definitely a possibility I and other people have thought about. My view is that takeoff will be fast enough that this outcome is unlikely; most humans don't want to destroy civilization and so before one of the exceptions gets their hands on AI powerful enough to destroy civilization when used deliberately for that purpose by humans, someone else will have their hands on AI that is even more powerful, powerful enough to destroy civilization 'on its own.'

Consider: Nukes and bioweapons can destroy the world already, but for decades the world has persisted, because none of the hundred or so actors capable of destroying the world have wanted to do so. Really I'm not relying on a fast takeoff assumption here, more like a not-multi-decade-long takeoff assumption.

jmh (3y)

Thanks. I was somewhat expecting the observation that humans have had the ability to pretty much end things for some time, but have not yet done so. I do agree. I also agree that in general we have put preventative measures in place to make sure those who might be willing to end the world don't have the access or absolute ability to do so.

I think that intent might not be the only source of risk: error and unintended consequences from using AI tools seem like part of the human risk profile. However, that seems so obvious I would think you hav...

Daniel Kokotajlo (3y)

It would help if you gave examples of scenarios in which the world is destroyed by accidental use of AI tools (as opposed to AI agents). I haven't been able to think of plausible ones so far, but I haven't thought THAT much about it & so wouldn't be surprised if I've missed some. In case you are interested, here's some thinking I did two years ago on a related topic.

jmh (3y)

I've done a bit of back and forth in my mind on examples, and find the biggest challenge is coming up with a plausible one rather than a merely imaginable/possible one. I think the best way to frame the concern (not quite an example but close) would be gain of function type research. Currently I think the vast majority of that work is conducted in expensive labs (probably BL3 or 4 but might be wrong on that) by people with a lot of time, effort and money invested in their educations. It's a complicated enough area that, lacking the education, even knowing how to start is a challenge.

However, I don't think the basic work requires all that much in the way of specialized lab equipment. Most of the equipment is probably more about result/finding productivity than about the actual attempted modification. On top of that we also have some limitations, legal/regulatory, on access to some materials. But I think that is more about specific items, e.g. anthrax, and not really a barrier to conducting gain of function type research. Everyone has access to lots of bacteria and viruses, but most lack knowledge of isolation and identification techniques.

Smart tools which embody the knowledge and experience, and which include good ML functions, really would open the door to home hobbyists who got interested in just playing around with some "harmless" gain of function or other genetic engineering. But if they don't understand how to properly contain their experiments, or don't understand that robust testing is not just testing for a successful (however that might be defined) result but also testing for harmful outcomes, then risks have definitely increased if we do actually see an increase in such activity by lay people.

I'm coming to the conclusion, though, that perhaps the way to address these types of risk is really outside the AI alignment focus, as a fair amount of the mitigation is probably how we apply existing controls to the evolution in smart tool use. Just as now, some things some th

Daniel Kokotajlo (3y)

Yeah, I agree that one is fairly plausible. But still I'd put it as less likely than "classic" AGI takeover risk, because classic AGI takeover risk is so large and so soon. I think if I had 20-year timelines then I'd be much more concerned about gain-of-function type stuff than I currently am.

jmh (3y)

The linked tool looks interesting; thanks for sharing! I have not done more than skim through the list of configuration options, so I don't have any good feedback for you (though I don't guarantee I could offer good feedback even after a complete review and testing ;-) ). A couple of the options do seem to touch on my question here, I think: the ones related to medical and biotech. I think your approach is that successful efforts in those areas change the future state of a realized AGI. I think my question would best be viewed as an intersection of developing AI/ML work and work in those areas. I was also trying to provide an example, but decided I should not just give an off-the-cuff example, so I want to write something and then reread and probably rewrite it. That's probably setting expectations way too high, but I did want to make sure I can clearly describe a scenario rather than just dump some stream-of-consciousness blob on you. Still, I did want to thank you for the link and response.
