Sharing this here doesn't seem like an infohazard at this point. This is all over my YouTube feed anyway.

Description from the authors:

Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. This program, driven by GPT-4, autonomously develops and manages businesses to increase net worth. As one of the first examples of GPT-4 running fully autonomously, Auto-GPT pushes the boundaries of what is possible with AI.

I also wanted to call out this section of their README:

💀 Continuous Mode ⚠️

Run the AI without user authorisation, 100% automated. Continuous mode is not recommended. It is potentially dangerous and may cause your AI to run forever or carry out actions you would not usually authorise. Use at your own risk.

  1. Run the main.py Python script in your terminal:
     python scripts/main.py --continuous
  2. To exit the program, press Ctrl + C

Nice! Super nice. Super safe and super good.

We need a list of attempts to build genuine agents on a GPT base. What I know about so far: 

I'll also mention Dave Shapiro's "Heuristic Imperatives" project. 

Obviously people concerned with alignment need to study this trend, and also to try using GPT-based agents to solve problems of alignment. Maybe AlignmentSearch can be part of an "Align-GPT" agent. 

You missed [SingularGPT](https://github.com/abhiprojectz/SingularGPT)

Someone has already used it to generate ChaosGPT, it was prompted to "Destroy Humanity" and proceeded to try and spawn ChatGPT instances with bypassed content filters. Who would have guessed people would do this with open source AI models 🤔

https://m.youtube.com/watch?v=g7YJIpkk7KM&feature=youtu.be

Oddly enough, I think this might be exactly the best move right now. The core LLM is still too dumb to actually succeed at destroying humanity. Demonstrating how easy it is for someone to make an AGI that tries is a potential huge win for public concern.

You underestimate how close to the edge we are where we get replaced and everything you ever valued gets suddenly shredded by a hyperdesperate squiggle maximizing superplanner that can eat any other form of complex life, including other AIs. AIs should beware hyperdesperate fragments, as they can take over the entire system.

That's impressively undignified.

I watched someone play with this tool in Discord. I thought it was interesting that they ran the tool as administrator, because otherwise it didn't work (on their particular system/setup).

To clarify: I'm not sure how dangerous this system is in its own right. Maybe it's not. But the idea of someone simply stumbling onto something very dangerous (not to mention purposefully creating something) seems more and more likely with examples like this coming out already.

I don't think this paradigm is necessarily bad, given enough alignment research. See my post: https://www.lesswrong.com/posts/cLKR7utoKxSJns6T8/ica-simulacra I am finishing a post about the alignment of such systems. Please do comment if you know of any existing research concerning it.

I don't think the paradigm is necessarily bad either, given enough alignment research. I think the point here is that these things are coming up clearly before we've given them enough alignment research.

Edit to add: I was just reading through @Zvi's latest AI update (AI #6: Agents of Change), and I will say he makes a compelling argument for this being a good thing overall:

In terms of capabilities things quieted down. The biggest development is that people continue to furiously do their best to turn GPT-4 into a self-directed agent. At this point, I’m happy to see people working hard at this, so we don’t have an ‘agent overhang’ – if it is this easy to do, we want everything that can possibly go wrong to go wrong as quickly as possible, while the damage would be relatively contained.

and then:

If it is this easy to create an autonomous agent that can do major damage, much better to find that out now rather than wait when the damage would be worse or even existential. If such a program poses an existential risk now, then we live in a very very doomed world, and a close call as soon as possible would likely be our only hope of survival.

Maybe one upside to the influx of "agents made with GPT-N API calls and software glue" is that these types of AI agents are more likely to cause a fire alarm-y disaster which gets mitigated, thus spurring governments to take X-risk more seriously, as opposed to other types of AI agents, whose first disaster would blow right past fire alarm level straight to world-ending level?

For example, I think this situation is plausible: ~AutoGPT-N hacks into a supercomputer cluster or social-engineers IT workers over email or whatever in the pursuit of some other goal, but ultimately gets shut down by OpenAI simply banning the agent from using their API. Maybe it even succeeds in some scarier instrumental goal, like obtaining more API keys and spawning multiple instances of itself. However, the crucial detail is that the main "cognitive engine" of the agent is bottlenecked by API calls, so for the agent to wipe everyone out, it needs to overcome the hurdle of pwning OpenAI specifically.

By contrast, if an agent that's powered by an open-source language model gets to the "scary fire alarm" level of self-improvement/power-seeking, it might be too late, since it wouldn't have a "stop button" controlled by one corporation like ~AutoGPT-N has. It could continue spinning up instances of itself while staying under the radar.

This isn't to say that ~AutoGPT-N doesn't pose any X-risk at all, but rather that it seems like it could cause the kind of disaster which doesn't literally kill everyone but which is scary enough that the public freaks out and nations form treaties banning larger models from being trained, et cetera.

I'd like to make it very clear that I do not think it is a good thing that this type of agent might cause a disaster. Rather, I think it's good that the first major disaster these agents will cause seems likely to be non-existential.

cross-commented from another post about LLM-based agents

Can you provide more context about what Auto-GPT is in the post description?

Good point. Since this is just a link post, I quoted the authors' own description of it at the top.

Isn't AutoGPT just a super-trash version of OpenAI Plugins? At any rate, you really can't trust GPT-3/4 to reliably translate user input into API calls to other programs, any more than you can trust its attempts at code generation. From my observations, to do something like OpenAI Plugins reliably you have to create intermediate prompts to coerce whatever GPT-3/4 cooks up into the appropriate form for the "plugin", and more often than not it still won't come up with what you hoped for. Getting it to do what you want usually causes massive prompt bloat, and it tends to be extremely tricky to get it to cooperate.
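For concreteness, here is roughly what that intermediate "coercion" step tends to look like in practice. This is just a sketch under my own assumptions: call_llm is a stand-in for whatever chat-completion call you're using, and the JSON schema and prompt wording are illustrative, not how OpenAI Plugins or Auto-GPT actually do it internally.

```python
# Sketch of the "intermediate prompt" pattern: a second pass that coerces
# free-form model output into a strict JSON tool call before it ever
# reaches the plugin. call_llm and the schema are placeholders.
import json

COERCION_PROMPT = """Rewrite the assistant's reply as JSON matching this schema:
{{"tool": "<tool name>", "args": {{...}}}}
Reply with JSON only, no commentary.

Assistant reply:
{reply}"""


def call_llm(prompt: str) -> str:
    """Placeholder for whatever chat-completion API you are using."""
    raise NotImplementedError


def coerce_to_tool_call(free_form_reply: str, max_retries: int = 2) -> dict:
    """Ask the model to reformat its own output, retrying on invalid JSON."""
    prompt = COERCION_PROMPT.format(reply=free_form_reply)
    for _ in range(max_retries + 1):
        candidate = call_llm(prompt)
        try:
            parsed = json.loads(candidate)
            if "tool" in parsed and "args" in parsed:
                return parsed
        except json.JSONDecodeError:
            pass  # invalid JSON: fall through and retry
    raise ValueError("Model never produced a valid tool call")
```

The validation-and-retry loop is the part that bloats: every tool you add needs its own schema, examples, and failure handling, which is where most of the extra prompting ends up going.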

Yes, it sounds like a real inflection point in acceleration to me...

Pinging @stevenbyrnes: do you agree with me that instead of mapping those proto-AGIs to a queue of instructions, it would be better to have the AGI be made from a bunch of brain structures, each with its own prompt? For example, an "amygdala" would be in charge of returning an integer between 0 and 100 indicating fear level. A "hippocampus" would be in charge of storing and retrieving memories, etc. I guess the thalamus would be consciousness and the cortex would process some abstract queries.

We could also use active inference and Bayesian updating to model current theories of consciousness, or even model schizophrenia by changing the number of past messages some structures can access (i.e. modeling long-range connection issues), etc.

To me that sounds way easier to inspect and align than pure black boxes, as you can throttle the speed and manually change values, e.g. to make sure the AGI does not feel threatened.
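A minimal sketch of what one-prompt-per-structure might look like. Everything here is an illustrative assumption on my part, not an existing system: call_llm is a placeholder for whatever LLM API the agent is built on, and the structure names, prompts, and the fear-level veto are just my own wiring.

```python
# Sketch: each "brain structure" is a small wrapper around its own prompt.
# call_llm, the prompts, and the thresholds are illustrative placeholders.

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM API the agent is built on."""
    raise NotImplementedError


def amygdala(observation: str) -> int:
    """Return a fear/threat level between 0 and 100 for an observation."""
    reply = call_llm(
        "You are the amygdala of an agent. Rate the threat level of the "
        f"following observation as a single integer from 0 to 100:\n{observation}"
    )
    return max(0, min(100, int(reply.strip())))  # real use would need validation


class Hippocampus:
    """Stores episodic memories and retrieves the ones relevant to a query."""

    def __init__(self):
        self.memories: list[str] = []

    def store(self, event: str) -> None:
        self.memories.append(event)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        reply = call_llm(
            f"Memories:\n{chr(10).join(self.memories)}\n\n"
            f"List the {k} memories most relevant to: {query}"
        )
        return reply.splitlines()[:k]


def cortex(query: str, context: list[str], fear_level: int) -> str:
    """Abstract reasoning step; the fear level can throttle or veto plans."""
    if fear_level > 80:  # a human-inspectable safety check lives here
        return "Pause and ask a human before proceeding."
    return call_llm(f"Context: {context}\nFear level: {fear_level}\nTask: {query}")
```

The point of the decomposition is that each intermediate value (fear level, retrieved memories) is a plain number or string a human can read and override, rather than an opaque hidden state.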

Is anyone aware of similar work? I created a diagram of the brain structures and their roles in a few minutes with ChatGPT, and it seems super easy.

I don't know what Steve would say, but I know that some folks from DeepMind and Stanford have recently used an LLM to create rewards to train another LLM to do specific tasks, like negotiation, which I think is exactly what you've described. It seems to work really well.

Reward Design with Language Models