Open-source LLMs may prove Bostrom's vulnerable world hypothesis

Roope Ahvenharju

15th Apr 2023

In short, Nick Bostrom's vulnerable world hypothesis states that humanity may one day invent a technology that is extremely destructive, very cheap to make, and easy to deploy. As a thought experiment, he describes "easy nukes": nuclear weapons that would not require plutonium, as they do in real life, but could instead be built from something like a battery and two pieces of glass. Bostrom argues that if everyone could make nuclear weapons at home, civilization would be destroyed by default, because terrorists, malcontents, and "folk who just want to see what would happen" would blow up most cities.

I can see similarities between Bostrom's hypothesis and the way powerful LLMs have recently been open-sourced. It did not take long after the release of ChatGPT and GPT-4 for several "open-source alternatives" to appear on the internet. Some of them you can even download to your own computer for offline use. And then came ChaosGPT, whose creator ordered an AI agent to "destroy humanity". In my mind, that person was one of the "folk who just want to see what would happen".

Recently I have been thinking that, in the long term, the biggest challenge in AI safety is the potential wide availability of future AGI systems. If we can create a safely aligned AGI, what would prevent some people from creating open-source alternative AGIs that are free from safety constraints and more powerful because of that? Advanced technology generally can't stay secret forever. We would then have many future ChaosGPT-style users "who just want to see what would happen" and who could tell such a system to destroy humanity.

The old analogy used in AI safety communities, about making a wish to a genie and "getting what you asked for but not what you wanted", would no longer be relevant in this scenario. Instead, these potential future ChaosGPT-style users would get exactly what they asked for, and perhaps even what they truly wanted.

Does the community here also think this is a reasonable concern? I would like to hear your views in the comments and maybe start a discussion about future priorities in AI safety. If, in the far future, practically anyone could use an open-source AGI system and order it to destroy humanity, it would probably not be possible to completely prevent such malicious use. In that case, the focus should be on increasing societal resilience against deliberate AI attacks and takeover attempts. Perhaps that could be done through stronger cybersecurity and by finding ways to counter the destructive technologies an AGI might employ, such as nanotechnology and genetically engineered pandemics. An aligned AGI, which hopefully would be created before the unaligned ones, would certainly help.