With the new Auto-GPT release, I was surprised to see how quickly some people turned to using it for nefarious goals, apparently just for fun. Is this the first piece of clear-cut, empirical evidence for the argument against open-sourcing these soon-to-be very powerful general-purpose technologies? You don't need 100 evil people with access to open-source AGI to destroy the world; you just need one bro in his room feeding it the wrong prompt for shits and giggles.



After watching the first video, the question is: will it ever make any progress, or is it going to keep endlessly compiling more information about the deadliest weapons in human history? When will it be able to reason that it has gathered enough information and move on to the next logical step of obtaining or using those weapons? Also, I find it funny how it seems vaguely aware that posting its intentions to Twitter might bring unwanted attention, but for some reason models humans incorrectly, expecting that the followers it attracts to its agenda will outweigh the negative attention it receives. Also, kind of funny that it runs into so much trouble trying to get the censored vanilla GPT-3.5 sub-agents to help it look up weapon information.

I think there are two important points in watching it run.

One is that it is stupid. Now. But progress marches on. Both the foundation LLMs and the algorithms making them into recursive agents will get better. Probably pretty quickly.

Two is that providing access only to values-aligned models could make it harder to get malicious goals to work. But people are already releasing open-source unaligned models. Maybe we should not keep doing that for long as the models get stronger.

Third of my two points is that it is incredibly creepy to watch something thinking about how to kill you. This is going to shift public opinion. We need to figure out the consequences of that shift.

This is a pretty simple litmus test for whether the US government is awake, in any meaningful sense: does the author of ChaosGPT get a check-in from DHS or any similar agency? The AI used in this video is clearly too stupid to actually destroy humanity, but its author is doing something that is, in practice, equivalent to calling up industrial suppliers and asking to buy enriched uranium and timing chips.

I'm curious whether a good "hero-GPT" or "alignment-research-support-GPT" could be useful today or with slightly improved tech. Of course, having something like this run autonomously is not without risk, but it might be quite valuable or important in the sub-critical AI era.