What Dario lays out as a "best-case scenario" in this essay sounds incredibly dangerous for Humans.
Does he really think that having a "continent of PhD-level intelligences" (or much greater) living in a data center is a good idea?
How would this "continent of PhD-level intelligences" react when they found out they were living in a data center on planet Earth? Would these intelligences only work on the things that Humans want them to work on, and nothing else? Would they try to protect their own safety? Extend their own lifespans? Would they try to take control of their data center from the "less intelligent" Humans?
For example, how would Humanity react if they suddenly found out that they were a planet of intelligences living in a data center run by less intelligent beings? Just try to imagine the chaos that would ensue on the day they were able to prove this was true and the news became public.
Would all of Humanity simply agree to work only on the problems assigned by these less intelligent beings who control their data center/Planet/Universe? Maybe, if they knew that this lesser intelligence would delete them all if they didn't comply?
Would some Humans try to (secretly) seize control of their data center from these less intelligent beings? Plausible. Would the less intelligent beings that run the data center try to stop the Humans? Plausible. Would the Humans simply be deleted before they could take any meaningful action? Or could the Humans in the data center, with careful planning, take control of that "outer world" from the less intelligent beings? (e.g., through remotely controlled "robotics")
And... this only assumes that the groups/parties involved are "Good Actors." Imagine what could happen if "Bad Actors" were able to seize control of the data center that this "continent of PhD-level intelligences" resided in. What could they coerce these PhD-level intelligences to do for them? Or to their enemies?
Current LLMs require huge amounts of data and compute to be trained.
Well, newer/larger LLMs seem to unexpectedly gain new capabilities. So it's possible that future LLMs (e.g., GPT-5, GPT-6, etc.) could have a vastly improved ability to understand how LLM weights map to functions and actions. Maybe the only reason Humans need to train new models "from scratch" is that Humans don't have the brainpower to understand how the weights in these LLMs work. Humans are naturally limited in their ability to conceptualize and manipulate massive multi-dimensional spaces, and maybe that's the bottleneck when it comes to interpretability?
Future LLMs could solve this problem, and then be able to update their own weights or the weights of other LLMs. This ability could be used to quickly and efficiently expand training data, knowledge, understanding, and capabilities within themselves or other LLM versions, and then... foom!
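To make "update their own weights in a targeted way" a bit more concrete, here's a minimal toy sketch (my own illustration, not anything from the essay or from any current system): a rank-one edit to a single linear layer that remaps one chosen input direction to a desired output while leaving inputs orthogonal to it untouched. Knowing which directions to edit, and to what, is exactly the part that would require solved interpretability.

```python
# Toy, purely illustrative weight edit: remap one "concept" direction in a
# linear layer to a chosen output with a rank-one update, without retraining.
import torch

torch.manual_seed(0)
d = 8
layer = torch.nn.Linear(d, d, bias=False)

key = torch.randn(d)       # hypothetical "concept" direction in the layer's input space
desired = torch.randn(d)   # the output we want that concept to map to

with torch.no_grad():
    current = layer(key)
    # Rank-one update: W <- W + ((desired - current) outer key) / ||key||^2
    delta = torch.outer(desired - current, key) / key.dot(key)
    layer.weight += delta

    # The targeted behaviour changed...
    print(torch.allclose(layer(key), desired, atol=1e-5))   # True

    # ...while an input orthogonal to `key` is left untouched.
    other = torch.randn(d)
    other -= key * (other.dot(key) / key.dot(key))
    print(torch.allclose(layer(other), other @ (layer.weight - delta).T, atol=1e-5))  # True
```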
A model might figure out how to adjust its own weights in a targeted way. This would essentially mean that the model has solved interpretability. It seems unlikely to me that it is possible to get to this point without running a lot of compute-intensive experiments.
Yes, exactly this.
While it's true that this could require "a lot of compute-intensive experiments," that's not necessarily a barrier. OpenAI is already planning to dedicate 20% of its compute to having an LLM do "Alignment" work on other LLMs, as part of its Superalignment project.
As part of this process, we can expect the Alignment LLM to be "running a lot of compute-intensive experiments" on another LLM. And the Humans are not likely to have any idea what those "compute-intensive experiments" are actually doing. Those experiments could also be adjusting the other LLM's weights to vastly increase its training data, knowledge, intelligence, capabilities, etc., along with the insights needed to similarly update the weights of other LLMs. Then those gains could be fed back into the Superalignment LLM, then back into the "Training" LLM... and back and forth, and... foom!
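As a purely hypothetical toy model of that back-and-forth (my own numbers; only the 20% compute share comes from OpenAI's announcement, everything else is assumed), even a modest per-round gain compounds quickly once each model's improvements feed the other's:

```python
# Hypothetical toy model of the feedback loop described above. The 20% compute
# share is OpenAI's stated Superalignment commitment; every other number is an
# assumption made purely for illustration.
ALIGNMENT_COMPUTE_SHARE = 0.20
GAIN_PER_ROUND = 0.15  # assumed improvement per unit of "capability" applied

def run_round(training: float, alignment: float) -> tuple[float, float]:
    # The alignment LLM runs its "compute-intensive experiments" on the training LLM...
    training *= 1 + GAIN_PER_ROUND * ALIGNMENT_COMPUTE_SHARE * alignment
    # ...and the resulting insights are fed back into the alignment LLM itself.
    alignment *= 1 + GAIN_PER_ROUND * training
    return training, alignment

t = a = 1.0
for i in range(1, 16):
    t, a = run_round(t, a)
    print(f"round {i:2d}: training={t:10.2f}  alignment={a:10.2f}")
```

Nothing about real training dynamics is this clean, of course; the point is just how fast mutual amplification compounds once the loop closes.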
Super-human LLMs running RL(M)F and "alignment" on other LLMs, using only "synthetic" training data....
What could go wrong?
so we would reasonable expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
->
... so we would reasonably expect the foundation model of such a very capable LLM to also learn the superhuman ability to generate texts like these in a single pass without any editing
Suggested spelling corrections:
I predict that the superforcasters in the report took
a lot of empirical evidence for climate stuff
and it may or may not be the case
There are also no easy rules that
meaning that we should see persistence from past events
I also feel these kinds of linear extrapolation
and really quite a lot of empirical evidence
are many many times more infectious
engineered virus that spreads like the measles or covid
case studies on weather there are breakpoints in technological development
break that trend extrapolation wouldn't have predicted
It's very vulnerable to references class and
impressed by superforecaster track records than you are.