“Reframing Superintelligence” + LLMs + 4 years

Roman Leventov (2y)

I know of no persuasive argument for the superior value (or safety!) of powerful, unitary AI agents. Intellectual inertia, institutional inertia, convenient anthropomorphism (see below), and bragging rights are not good justifications for increasing existential risk.

You didn't mention the biggest reason why the discussion of unitary agents is still very much relevant: due to their economic (and, later, military, political, and even romantic) attractiveness and power, unitary agents will be created anyway (as you yourself admit), and to counteract them, some people (including myself) think there should be aligned "guardian" unitary agents around who can spin their OODA loop as quickly as potentially misaligned/rogue agents. The OODA iteration of an open agency will take more time.

We can debate this, or whether the latency of an OODA cycle is that important to the offense-defense balance in an AI conflict, but I don't think these discussions are due to "intellectual inertia, institutional inertia, convenient anthropomorphism, or bragging rights".

[anonymous] (2y)

I think you should examine this claim in more detail because this is the crux of everything.

What you are trying to say, rephrased:

I am preparing for a war in which I expect to have to attack rogue AIs.
I expect either to find the rogues operating on the land of weak countries, to be assisting allies with infections on their own territory, or to have to deal with rogues gaining control of hostile superpowers.

So my choices are:

1. I use conventional weapons.
2. I use AGI in very limited, controlled ways, such as exponentially scaling up the manufacture of semi-automated weapons with limited and safe onboard intelligence. (For example, a missile driven by low-level controllers is safe.)
3. I use AGI in advisory data-analysis roles, plus smarter weapons driven by limited onboard AI. (For example, a missile or drone that can recognize targets, but only once this control authority is unlocked after it has traveled to the target area.)
4. I use AGI, but in limited, clearly separated roles, all throughout the war machine.
5. I say yolo and assign the entire task of fighting the war to AGI, in the form of monolithic, self-modifying systems, even though I know this is how rogue AI was created. Even in testing, the self-modification makes their behavior inconsistent, and they sometimes turn on their operators even in simulation.

The delta between 4 and 5 is vast. Option 5 is going to fail acceptance testing and is not consistent with conventional engineering practice, because a self-modifying system isn't static and you can't be certain the delivered product is the same as the tested one.
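To make the acceptance-testing point concrete, here is a minimal sketch, assuming the system can be serialized to bytes and that "same as the tested one" means byte-identical. The names `fingerprint` and `accept_for_deployment`, and the byte strings standing in for serialized systems, are hypothetical illustrations, not anything from the post or from a real deployment pipeline.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """SHA-256 digest of a serialized artifact (e.g. model weights)."""
    return hashlib.sha256(artifact).hexdigest()


def accept_for_deployment(delivered: bytes, tested_digest: str) -> bool:
    """Accept only if the delivered artifact is byte-identical to the tested one."""
    return fingerprint(delivered) == tested_digest


# Hypothetical stand-ins for serialized systems (assumption for illustration).
tested = b"weights-v1"
tested_digest = fingerprint(tested)

static_system = tested                      # unchanged since testing
self_modified = tested + b"+online-update"  # modified itself after testing

print(accept_for_deployment(static_system, tested_digest))  # True
print(accept_for_deployment(self_modified, tested_digest))  # False
```

A static system passes this kind of check trivially; a system that rewrites itself in the field fails it by construction, which is one way to read the claim that option 5 cannot pass conventional acceptance testing.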

You would have to already be losing a war with rogue AI before someone would resort to 5.

I think part of the gap here is that there's a large difference between what you might do to make an agent that helps someone with their homework or with social media and what you would do when live bombs are involved. Engineering practices are very different and won't be thrown away simply because working AGI is new.

This is also true for construction, industry, and so on. What you need to do is very different.

Trinley Goldenberg (2y)

I need to build the option for #5 as a deterrent. All it takes for someone else to gain a strategic advantage is for them to automate just a BIT more of their military than me via AGI, and suddenly they can disrupt my OODA loop.

Because of this, I need the capability to always automate as much as or more than they do, which in the limit is full automation of all systems.

Roman Leventov (2y)

On #5: I don't think self-modification is important here. What is important is keeping the full operational picture in a unified context [of a DNN, let's say even an LLM] and making decisions from this position.

Recursive self-improvement beyond something like an IQ 200 level of military, strategic, and cyber security intelligence might not be useful in AI conflict because there is limited data to learn from, and even a modestly superhuman AI (such as IQ 200) may be able to build an optimal model from this data. The two remaining factors are latency of the OODA loop and coherence across space (coherent response on different fronts and in different spaces: physical and cyber) and time (coherent strategy). Both of these factors are advantages of unitary agents, and both could be "practically saturated" by an AI that is not that far beyond human level.

Caveat: the above is not true for psychological warfare, where the minds of people and AIs are the battlefield. Being skillful at this kind of warfare may benefit from much deeper and stronger intelligence than IQ 200, and so self-improvement during the conflict becomes relevant. But psychological warfare can only unfold on rather slow timescales so the higher latency of AI service agencies shouldn't be a handicap.

Footnote: some may think that cyber security (the computer virus--antivirus arms race, for instance) also benefits from "unlimited" intelligence, e.g., an IQ 1000 AI might be able to develop viruses, and cyber offense strategy more generally, that an IQ 200 AI might not be able to protect against (or even to recognise as an attack). I agree that this might be true (although I'm not sure, of course: I'm not a cyber security expert, and as far as I have heard even cybersec experts are not sure or disagree about this), but we can also charitably assume that the IT infrastructure will be hardened to make such attacks provably impossible (provably strong cryptography, provably strong sandboxing, etc.), and that already an IQ 200 AI (or even an "agency") could build up such defences.

Ilio (2y)

In my view he never said that all discussion of unitary agents is useless. He said that it's almost always misleading.

As a concrete example, military officers don’t care for the smartest robotic war dog we can construct. They would rather have a low-cost drone swarm for which it’s easy to scale up production.

What would be your preferred way to name the “unitary agent” failure mode? (without injecting the idea that it’s not a failure mode)

Justin Bullock (2y)

This discussion considers a relatively “flat”, dynamic organization of systems. The open-agency model^[13] considers flexible yet relatively stable patterns of delegation that more closely correspond to current developments.

I have a question here that I'm curious about:

I wonder if you have any additional thoughts about the "structure" of the open agencies that you imagine here. Flexible and relatively stable patterns of delegation seem to be important dimensions. You mention here that the discussion focuses on "flat" organization of systems, but I'm wondering if we might expect more "hierarchical" relationships if we incorporate things like proposer/critic models as part of the role architecture.

Christopher King (2y)

A system of AI services is not equivalent to a utility maximizing agent

I think this section of the report would be stronger if you showed that CAIS or Open Agencies in particular are not equivalent to a utility-maximizing agent. You're right that there are multi-agent systems (like CDTs in a prisoner's dilemma) with this property, but not every system of multiple agents is inequivalent to utility maximization.
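As a minimal sketch of the prisoner's dilemma case mentioned above: the payoff numbers below are the conventional textbook values and are an assumption, not taken from the report, and `dominant_action` is a hypothetical stand-in for CDT-style reasoning (each agent treats the other's action as fixed). The pair ends up at an outcome that is Pareto-dominated, so the joint behavior does not maximize any joint utility function that strictly increases in both agents' payoffs. It says nothing about whether CAIS or an open agency has this property.

```python
from itertools import product

# Assumed textbook payoffs (player 0, player 1); not from the report.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def payoff(player: int, mine: str, theirs: str) -> int:
    """Payoff to `player` when it plays `mine` and the other plays `theirs`."""
    profile = (mine, theirs) if player == 0 else (theirs, mine)
    return PAYOFFS[profile][player]


def dominant_action(player: int) -> str:
    """Action that is a best response to every action of the other player."""
    for mine in ACTIONS:
        if all(
            payoff(player, mine, theirs) >= payoff(player, other, theirs)
            for theirs in ACTIONS
            for other in ACTIONS
        ):
            return mine
    raise ValueError("no dominant action")


def pareto_dominated(profile) -> bool:
    """True if another profile is no worse for both players and better for one."""
    base = PAYOFFS[profile]
    return any(
        all(a >= b for a, b in zip(PAYOFFS[alt], base))
        and any(a > b for a, b in zip(PAYOFFS[alt], base))
        for alt in product(ACTIONS, repeat=2)
    )


joint = (dominant_action(0), dominant_action(1))
print(joint, PAYOFFS[joint], pareto_dominated(joint))
# ('D', 'D') (1, 1) True
```

The open question raised in the comment is whether a given system of services behaves like this pair, or instead like a system whose joint behavior is indistinguishable from that of a single maximizer.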

Roman Leventov (2y)

The AGI-agent model offers no compelling value compared to the CAIS model of general intelligence. The AGI-agent and CAIS models organize similar functions differently, but the CAIS model offers additional safety-relevant affordances.

In the current economy, which incentivises rent-seeking, there is enormous value in an AGI agent that can be tasked with earning money in a completely open-ended way, with no or very minimal supervision, on behalf of its "master". CAIS provides all the same functions for the actual productive economy, but not for an individual who wants more dollars in their bank account.

^[1] Drexler, KE: “Reframing Superintelligence: Comprehensive AI Services as General Intelligence” Technical Report #2019-1, Future of Humanity Institute (2019).

^[2] First-draft summaries were graciously contributed by ChatGPT-4. (The GPT-4 base model offered a few refinements while it was in a lucid and cooperative mood.)

^[3] Please keep in mind that “Reframing Superintelligence” was written at FHI in an office next door to Nick Bostrom’s (Superintelligence: Paths, Dangers, Strategies, Oxford University Press. (2014)).

^[4] “CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data” (2019) https://arxiv.org/abs/1911.00359
“Data Selection for Language Models via Importance Resampling” (2023) https://arxiv.org/abs/2302.03169

^[5] “Training language models to follow instructions with human feedback” (2022) https://arxiv.org/abs/2203.02155

^[6] “Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision” (2023) https://arxiv.org/abs/2305.03047

^[7] “Dialog Inpainting: Turning Documents into Dialogs” (2022) https://arxiv.org/abs/2205.09073

^[8] “Unnatural instructions: Tuning language models with (almost) no human labor” (2022) https://arxiv.org/abs/2212.09689

^[9] “Dense Paraphrasing for Textual Enrichment” (2022) https://arxiv.org/abs/2210.11563

^[10] “Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes” (2023) https://arxiv.org/abs/2305.02301

^[11] “Orca: Progressive Learning from Complex Explanation Traces of GPT-4” (2023) https://arxiv.org/abs/2306.02707
“Textbooks Are All You Need” (2023) https://arxiv.org/abs/2306.11644

^[12] “Distilling step-by-step outperforms LLMs by using much smaller task-specific models”
“Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes” (2023) https://arxiv.org/abs/2305.02301

^[13] Drexler, KE: “The Open Agency Model”, AI Alignment Forum (February 2023)

^[14] The GPT-4 base model is artificial and demonstrates intelligence, but it is not “an AI” in the sense of being an intelligent entity. In my experience, it is more likely to model the content of an internet message board than the behavior of a person. Unlike ChatGPT-4, the base model has no preferred or stable persona.

^[15] “Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision” (2023) https://arxiv.org/abs/2305.03047

^[16] “Sparks of Artificial General Intelligence: Early experiments with GPT-4” (2023) https://arxiv.org/abs/2303.12712

^[17] “Modular Deep Learning” (2023) https://arxiv.org/abs/2302.11529

^[18] “FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance” (2023) https://arxiv.org/abs/2305.05176

^[19] Janus, “Simulators”, AI Alignment Forum (September 2022)

^[20] Eliezer Yudkowsky rejects this.

^[21] Nay, JJ: “AGI misalignment x-risk may be lower due to an overlooked goal specification technology”, (October 2022)

^[22] Drexler, KE: “Role Architectures: Applying LLMs to consequential tasks”, AI Alignment Forum (March 2023)

^[23] The proposed methodology bundles fuzzy comparisons into a single parameter and invites alternative estimates. The conclusion nonetheless seems robust.
