I'm worried about x-risk from AI. But I'm not especially worried about Sora. Or even Sora_v10.

I'm worried about GPT-6 being agentic. I'm worried that GPT-6 will be able to act as a personal agent that can follow through on tasks such as: "Get Yanni a TV for less than $700, delivered by next Thursday, and organise someone to mount it on my wall."

Assume GPT-6 scales up from GPT-4 the way GPT-4 scaled up from GPT-2. Throw in whatever Auto-GPT is running on; do we get the scenario I'm worried about?

(I am not technical, so non-technical answers are appreciated!)
 

Thanks!

Yanni

2 Answers

Vladimir_Nesov

Feb 23, 2024

50

It's not entirely clear how and why GPT-4 (possibly a 2e25 FLOPs model) or Gemini Ultra 1.0 (possibly a 1e26 FLOPs model) don't work as autonomous agents, but it seems that they can't. So it's not clear that the next generation of LLMs built in a similar way will enable significant agency either. There are millions of AI GPUs currently being produced each year, and millions of GPUs can only support a 1e28-1e30 FLOPs training run (that doesn't individually take years to complete). There's (barely) enough text data for that.
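For a sense of scale, here is a rough back-of-the-envelope version of that estimate; the per-GPU throughput, utilization, and run-length numbers below are illustrative assumptions, not figures from this answer.

```python
# Back-of-the-envelope: how many training FLOPs can a large GPU fleet deliver?
# All inputs are illustrative assumptions.

SECONDS_PER_YEAR = 3.15e7

def training_flops(num_gpus, flops_per_gpu, utilization, years):
    """Total FLOPs = GPUs x per-GPU throughput x utilization x wall-clock time."""
    return num_gpus * flops_per_gpu * utilization * years * SECONDS_PER_YEAR

# ~1 million H100-class GPUs at ~1e15 FLOP/s, 30% utilization, for one year:
print(f"{training_flops(1e6, 1e15, 0.30, 1):.0e}")   # ~9e27, i.e. about 1e28

# ~10 million faster GPUs at ~2e15 FLOP/s, 40% utilization, for two years:
print(f"{training_flops(1e7, 2e15, 0.40, 2):.0e}")   # ~5e29, approaching 1e30
```

Varying those assumptions moves the total around, but it's hard to get much past ~1e30 without far more chips or a run that takes many years, which is roughly the ceiling described above.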

GPT-2 would take about 1e20 FLOPs to train with modern methods; on the FLOPs log scale it's already further away from GPT-4 than GPT-4 is from whatever is feasible to build in the near future without significant breakthroughs. So there are only about two more generations of LLMs in the near future if most of what changes is scale. It's not clear that this is enough, and it's not clear that this is not enough.
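Here is the same point in orders of magnitude, using the FLOPs figures quoted in this answer (the 1e28-1e30 ceiling is the rough near-future estimate above, not a confirmed number):

```python
import math

gpt2 = 1e20                     # GPT-2 retrained with modern methods (rough estimate)
gpt4 = 2e25                     # possible GPT-4 training compute
near_lo, near_hi = 1e28, 1e30   # rough near-future ceiling without breakthroughs

oom = lambda a, b: math.log10(b / a)   # gap in orders of magnitude

print(f"GPT-2 -> GPT-4:       {oom(gpt2, gpt4):.1f} OOM")                                # 5.3
print(f"GPT-4 -> near future: {oom(gpt4, near_lo):.1f} to {oom(gpt4, near_hi):.1f} OOM")  # 2.7 to 4.7
```

At roughly 2.5-3 orders of magnitude per GPT-style generation, the remaining 2.7-4.7 leaves room for only about one or two more generations driven by scale alone.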

With Sora, the underlying capability is not just video generation, it's also video perception: looking at the world instead of dreaming of it. A sufficiently capable video model might be able to act in the world by looking at it, in the same way a chatbot acts in a conversation by reading it. Models that can understand images are already providing new ways of specifying tasks and giving feedback on performance in robotics, and models that can understand video will only do this better.

Feb 23, 2024

20

"agentic" is "give the model a goal and access to tools and it emits outputs intended to accomplish the goal.

Example: https://chat.openai.com/share/0f396757-0b81-4ace-af8c-a2cb37e0985d

This barely worked with gpt-3.5, works sometimes with gpt-4, and is supposedly better with Gemini 1.5 1M. It also requires tools, for example code interpreters, web search APIs, and image generators.
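A minimal sketch of what that looks like in practice; everything here (the function names, the tool stubs, the CALL/DONE protocol) is made up for illustration, and real scaffolds like Auto-GPT are more elaborate:

```python
# Minimal sketch of an "agentic" loop: give the model a goal and some tools,
# parse its replies into tool calls, and feed results back until it says it's done.
# The model call and tools below are stubs, not a real API.

def llm_complete(transcript: str) -> str:
    """Stand-in for a call to whatever LLM is being used."""
    return "DONE: (a real model would plan, call tools, and report back here)"

TOOLS = {
    "web_search": lambda query: f"(pretend search results for {query!r})",
    "run_python": lambda code: "(pretend interpreter output)",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    transcript = (f"Goal: {goal}\n"
                  "Reply with 'CALL <tool>: <input>' to use a tool, or 'DONE: <result>'.\n")
    for _ in range(max_steps):
        reply = llm_complete(transcript)
        transcript += reply + "\n"
        if reply.startswith("DONE:"):            # the model claims the goal is met
            return reply
        if reply.startswith("CALL "):            # e.g. "CALL web_search: 55-inch TVs under $700"
            name, _, arg = reply[len("CALL "):].partition(": ")
            result = TOOLS.get(name, lambda _: "(unknown tool)")(arg)
            transcript += f"Result: {result}\n"  # tool output goes back into the context
    return "Stopped after max_steps without finishing."

print(run_agent("Get Yanni a TV for less than $700, delivered by next Thursday."))
```

The "agency" lives in that loop rather than in the model weights: the model only ever emits text, and the surrounding scaffold turns some of that text into actions.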

What would it take for this to fail in a dangerous way?

  1. Very large scope goals, and a model that can run for a long time working on them. The classic one is "make as many paperclips as possible".

Notice that this hypothetical AI is not split up, with one session making a short list of paperclip-factory equipment, another session ordering it, and another agent planning the assembly once it arrives... Instead it is an eternally running process, fully in charge of all steps, and allowed to do absolutely anything, including issuing commands that kill humans or ordering its own self-improvement. (See the sketch after this list for what the split-up version looks like.)

  2. State buildup. Almost all computer system failures come from accumulation of state. Because fine-tuning is off for publicly available models, the most state that can build up is whatever fits in the context buffer. This makes Gemini 1.5 1M more dangerous than prior models, but it's likely still far too weak to do damage.

  3. High competence. Current models fail often, and especially fail at the skilled tasks that would be a problem, such as hacking. With short context buffers they soon forget every mitigation they have tried so far and stay stuck forever.

  4. Missing modalities. Current models don't have a memory component, online learning, robotic proprioception tokens as input, realtime video as input, or robotic tokens as output. (The list used to be much longer!)

  5. A series of failures, combined with extremely capable models, that leads to a model developing its own self-interested goals (bad), successfully hiding its own goals (worse), or secretly collaborating with many other models (catastrophic).

  6. A place to exist outside of human control (or hidden below human detection) must exist in the world. Right now you need over 100 GPUs, or a lot of TPUs, to host a model. Some day a single card in a computer might be able to host a full AGI. Once compute is very dense, cheap, and idle, you have a "compute overhang". It might take 20+ years before that happens.
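As promised in point 1, here is a sketch of the safer split-up structure: narrow subtasks, each in a fresh short-lived session with its own step budget, and a review in between, instead of one open-ended process in charge of everything. (run_agent stands in for the loop sketched earlier; all names here are illustrative.)

```python
# Sketch of the scoped alternative to an eternally running agent: each subtask
# gets a fresh context, a narrow goal, and a hard stop, with review in between.

def run_agent(goal: str, max_steps: int = 5) -> str:
    """Stand-in for the agent loop sketched earlier in this thread."""
    return f"DONE: (result for {goal!r})"

SUBTASKS = [
    "List the equipment needed for a small paperclip factory.",
    "Order the items on this approved equipment list.",
    "Plan the assembly once the equipment arrives.",
]

results = []
for task in SUBTASKS:
    # Fresh session per subtask: no state carries over between sessions, and the
    # step budget acts as a short-duration termination condition.
    outcome = run_agent(task, max_steps=5)
    results.append(outcome)
    # A human (or a separate checker) reviews each outcome before the next
    # subtask starts; no single long-running process is in charge of the chain.
    print(outcome)
```

Each session ends quickly, accumulates no state, and never talks to the others directly, which is the opposite of the failure setup described above.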

Summary: current models are already agents, but they aren't currently broken in the above ways.

You shouldn't worry yet; the models need to be far more capable.

Even then, key mistakes have to be made before bad outcomes become possible, such as assigning too much responsibility to a single context, having no short-duration termination condition, and allowing unnecessary communication between running models.

A common argument is that making those mistakes is very convenient and efficient, so people will make them anyway. Hopefully we get bad outcomes early, while models are too weak to do any real damage.

> You shouldn't worry yet; the models need to be far more capable.

The right time to start worrying is too early; otherwise it will be too late.

(I agree in the sense that current models very likely can't be made existentially dangerous, and in that sense "worrying" is incorrect; but the proper use of worrying is planning for an uncertain future, which is a different sense of "worrying".)