
Limits to Control

Edited by Remmelt last updated 2nd Jul 2025

You are viewing revision 1.44.0, last edited by Remmelt

The research field AGI Limits of Engineerable Control & Safety Impossibility Theorems (AGILECSIT) aims to verify, in terms of both the empirical soundness of premises and the formal validity of reasoning:

1. Theoretical limits of engineerable control of AGI.

2. Threat models of non-controllable convergent dynamics of AGI.

3. Impossibility theorems, by contradiction of the 'long-term AGI safety' proposition with results 1 and 2.

~ ~ ~

Definitions and Distinctions

'Non-controllable convergent dynamics of AGI':

Iterated interactions of AGI internals (with connected surroundings of environment) that converge on (unsafe) conditions, where the space of interactions falls outside even one theoretical limit of control.

'Control':

In theory, modelling control of system A over system B means that A can influence system B to achieve A’s desired subset of state space (Source: https://arxiv.org/pdf/2109.00484.pdf).

In practice, engineering control of AGI requires simulating or detecting any unsafe effects internally, and then preventing or correcting those effects externally.
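The theoretical definition can be illustrated with a toy sketch. This is not taken from the cited paper: it assumes system B has a small finite state space, and that "A can influence B" means some finite sequence of A's actions drives B into A's desired subset. The names `can_steer`, `step_full`, and `step_limited` are hypothetical and chosen only for illustration.

```python
from collections import deque


def can_steer(start, desired, actions, step):
    """Return True if some finite action sequence drives B from `start` into `desired`.

    Breadth-first search over the states of B that A's actions can reach.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in desired:
            return True
        for action in actions:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical system B: a counter over states 0..9.
def step_full(state, action):
    # A's action always takes effect (clamped to the state range).
    return max(0, min(9, state + action))


def step_limited(state, action):
    # Assumption for illustration: at state 5 or above, B drifts upward
    # regardless of what A does, so A's influence has a hard limit.
    if state >= 5:
        return min(9, state + 1)
    return max(0, state + action)


ACTIONS = (-1, 1)
DESIRED = {0, 1, 2}

print(can_steer(7, DESIRED, ACTIONS, step_full))     # A's influence is unrestricted
print(can_steer(7, DESIRED, ACTIONS, step_limited))  # B started beyond A's limit
print(can_steer(3, DESIRED, ACTIONS, step_limited))  # B started within A's limit
```

In this toy, whether A "controls" B depends on where B starts and on how far A's influence extends. It is only a finite-state analogy for the definition above, not a model of AGI.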

'Long-term':

In theory: into perpetuity.

In practice: over a thousand years.

'AGI safety':

Ambient conditions/contexts around planet Earth that are caused by the operation of AGI fall within the environmental range that humans need to survive (a minimum-threshold definition).

'AGI':

That the notion of 'artificial intelligence' (AI) can be either "narrow" or "general":

That the notion of 'narrow AI' specifically implies:

a single domain of sense and action.

no possibility for self base-code modification.

a single well-defined meta-algorithm.

that all aspects of its own self agency/intention are fully defined by its builders/developers/creators.

That the notion of 'general AI' specifically implies:

multiple domains of sense/action.

an intrinsic, non-reducible possibility for self-modification;

and therefore, that the meta-algorithm is effectively arbitrary; hence,

that it is inherently undecidable whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.

(Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3)

Posts tagged Limits to Control

Deconfusing ‘AI’ and ‘evolution’, by Remmelt