For a defense of people pursuing a mathematical approach of a type you think isn't valuable, see my recent post.
(That does not address the correct issue you raised about requisite variety, but some work on HRAD does do so explicitly - such as embedded agency.)

avturchin (3y)

Could this end in a situation where we have two different friendly AIs with completely different understandings of the nature of human values? Each would perceive the other as an enemy, and there would be war.

Roman Leventov (3y)

Also, my suggestion in the post that AGI labs should diversify their alignment approaches assumed that labs exchange their matured alignment frameworks (or in fact make them public), so that each lab can apply multiple alignment theories/frameworks simultaneously while designing and training its AI. This way, each AI could be aligned with people to a higher degree than if only a single theory were applied.

Roman Leventov (3y)

What do you mean by "understanding of the nature of human values"?

If both aligned AIs are properly reflective and understand science properly, they understand that their respective toolboxes for modelling human values (or even the values of arbitrary black-box intelligent systems) are what they are: just toolboxes and models without special metaphysical status.

They may discuss their respective models of values, but there is no reason for them to be "at war": both models are presumably well-aligned with humans, so their predictions coincide in a large proportion of cases and diverge only in very obscure cases (like the trolley problem or other infamous thought experiments in ethics, specifically designed to test the edges of axiological and ethical models) or when the models are "rolled out" very far into the future. For the latter case, as I also gestured at in the post, I think the "alignment" frame is actually not useful, and we should rather think in terms of control theory, game theory, the theory of evolution, etc. Friendly AIs should understand this and not even try to simulate the very far future using their value models of people. (And yes, this is why I think the concept of coherent extrapolated volition doesn't make sense.)

Maybe an interesting thing to note here is that if both AIs were aligned to humans independently, let's say each covering 98% of human value complexity but with different methods, their default mutual alignment on first encounter (if you don't permit any online re-alignment, such as is possible, to a limited extent, even with LLMs during prompting) is expected to be lower, let's say only 97%. But I don't see why this should be a problem.
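A rough sketch of the arithmetic behind these illustrative numbers, under assumptions that are not in the comment itself: "coverage" is treated as the share of cases in which an AI's value model matches the human verdict, and the function names are hypothetical. Whenever both AIs match the human verdict they also agree with each other, so the share of cases where both match gives a floor on their mutual agreement.

```python
import math


def both_match_humans_bounds(cov_a: float, cov_b: float) -> tuple[float, float]:
    """Bounds on the share of cases where BOTH AIs match the human verdict.

    Assumption (not from the thread): coverage is a plain fraction of cases.
    The lower bound is reached when the two AIs' mistakes never overlap,
    the upper bound when one AI's mistakes are a subset of the other's.
    """
    lower = max(0.0, cov_a + cov_b - 1.0)
    upper = min(cov_a, cov_b)
    return lower, upper


def both_match_humans_if_independent(cov_a: float, cov_b: float) -> float:
    """Same share under the extra assumption that mistakes are independent."""
    return cov_a * cov_b


lower, upper = both_match_humans_bounds(0.98, 0.98)
independent = both_match_humans_if_independent(0.98, 0.98)

# Rounded for display only; these are illustrative figures, not measurements.
print(f"floor on mutual agreement: {lower:.4f}")
print(f"both match humans if mistakes are independent: {independent:.4f}")
print(f"ceiling on 'both match humans': {upper:.4f}")
```

Under these assumptions, mutual agreement cannot fall below roughly 96%, so a figure like 97% is consistent with two independently obtained 98% alignments. Actual mutual agreement can be higher than the "both match humans" share, since the two AIs may also agree in cases where both diverge from the human verdict.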

avturchin (3y)

I meant that a situation is possible in which two AIs use completely different alignment methods and also arrive at different results.

For alignment, we should simultaneously use multiple theories of cognition and value

Computationally tractable mathematical models of alignment are bound to be biased and blind to certain aspects of human values

From “solving the alignment problem” to engineering the alignment process

Diversity of approaches in the industry

Beyond “alignment” theories

Creating as many new conceptual approaches to alignment as possible? No

Appendix. Does AGI need to be a complex system as well?