Fundamental Controllability Limits

• Applied to What if Alignment is Not Enough? by WillPetillo 2mo ago

Remmelt v1.56.0 Dec 15th 2023 (+27/-71) 1

The research field ~~AGI~~ Fundamental Controllability Limits ~~of Engineerable Control & Safety Impossibility Theorems (AGILECSIT)~~ has the purpose of verifying (both the empirical soundness of premises and validity of formal reasoning of):

• Applied to Help me solve this problem: The basilisk isn't real, but people are by canary_itm 5mo ago

• Applied to Projects I would like to see (possibly at AI Safety Camp) by Linda Linsefors 7mo ago

• Applied to Inference-Time Intervention: Eliciting Truthful Answers from a Language Model by likenneth 11mo ago

• Applied to The Control Problem: Unsolved or Unsolvable? by Remmelt 11mo ago

• Applied to On the possibility of impossibility of AGI Long-Term Safety by Remmelt 1y ago

• Applied to Challenge to the notion that anything is (maybe) possible with AGI by Remmelt 1y ago

• Applied to Presumptive Listening: sticking to familiar concepts and missing the outer reasoning paths by Remmelt 1y ago

• Applied to How 'Human-Human' dynamics give way to 'Human-AI' and then 'AI-AI' dynamics by Remmelt 1y ago

• Applied to List #3: Why not to assume on prior that AGI-alignment workarounds are available by Remmelt 1y ago

• Applied to The limited upside of interpretability by Remmelt 1y ago

• Applied to Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend) by Remmelt 1y ago

• Applied to Limits to the Controllability of AGI by Roman_Yampolskiy 1y ago

Remmelt v1.55.0 Nov 11th 2022 (+7/-8) 1

multiple domains of sense/~~action.~~action;
intrinsic non-reducible possibility for self-modification;
and that/therefore; that the meta-algorithm is effectively arbitrary; hence;
that it is inherently undecidable as to whether all aspects of its own self agency/intention are fully defined by only its builders/developers/creators.

Remmelt v1.54.0 Nov 11th 2022 (+14/-20) 1

In theory, ~~modelling~~ the control of system A over system B means that A can influence system B to achieve A’s desired subset of state space [Source: https://arxiv.org/pdf/2109.00484.pdf].
In practice, ~~engineering~~ to engineer control of AGI requires simulating or detecting any unsafe effects internally, and then preventing or correcting those effects externally.
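
The "in theory" definition above can be illustrated with a toy model. The sketch below is not taken from the cited paper: it assumes system B is a small deterministic finite-state machine, that A picks one action per step, and that the state names (`safe`, `risky`, `failed`), the action names, and the function `controllable_states` are all hypothetical. It computes the set of B's states from which A can steer B into A's desired subset of state space.

```python
from collections import deque

# Hypothetical toy dynamics for system B: (state, action chosen by A) -> next state.
# These names are illustrative assumptions, not part of the source definition.
TRANSITIONS = {
    ("safe", "hold"): "safe",
    ("risky", "hold"): "risky",
    ("risky", "correct"): "safe",
    ("failed", "hold"): "failed",
    ("failed", "correct"): "failed",  # no action recovers from "failed"
}


def controllable_states(transitions, target):
    """Return the states of B from which A can steer B into `target`.

    Works backward from the target set: a state is controllable if some
    action leads from it to a state already known to be controllable.
    """
    # Reverse edges: next_state -> set of states with an action leading there.
    predecessors = {}
    for (state, _action), next_state in transitions.items():
        predecessors.setdefault(next_state, set()).add(state)

    reachable = set(target)
    queue = deque(target)
    while queue:
        current = queue.popleft()
        for prev in predecessors.get(current, ()):
            if prev not in reachable:
                reachable.add(prev)
                queue.append(prev)
    return reachable


if __name__ == "__main__":
    print(controllable_states(TRANSITIONS, {"safe"}))
```

In this toy model, A controls B only from `safe` and `risky`; once B is in `failed`, no available action returns it to the desired subset. The "in practice" requirement corresponds to A detecting that B is in `risky` and applying `correct` before `failed` is reached. Whether any such detection and correction can be made sufficient for AGI is the question the field examines, and this sketch does not address it.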