The Substantive Values Approach to Alignment

substantivevalues


DEFINITIONS

Substantive values are those values that agents see as ends in themselves: worthwhile to achieve and contributing towards a good life. They are radically and irreducibly plural and are often seen as incommensurable with each other. Where agents hold strong values about means, those values are also ends in themselves. Both substantive values and their relative importance can change over the course of a person's life and with context.
Democracy is defined:
a.) Negatively - as an outcome of a process which does not attempt to totalise any set of values against the consent of the agents affected by it.
b.) Positively - as an organising principle in which there is the will to treat everyone in accordance with their own values, unless their values cannot be simultaneously realised in the world.
Consent and values are stated via any method chosen by the agent, except where an agent lacks the capacity to state values or consent. In these cases, more detailed rules must be used.
Callousness is indifference to or willful compromise of the values of others.
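The exception in b.), values that "cannot be simultaneously realised in the world", can be illustrated with a small sketch. It is only an illustration under strong assumptions that are not part of the proposal: each agent's stated value is modelled as a yes/no test over a finite list of candidate world states, and only pairwise conflicts are checked. The names `find_conflicts`, `values` and `candidate_states`, and the dam example, are hypothetical.

```python
from itertools import combinations


def find_conflicts(values, candidate_states):
    """Return pairs of agents whose values no candidate state satisfies together.

    values: dict mapping agent name -> predicate(state) -> bool (assumed model)
    candidate_states: iterable of possible world states (assumed finite)
    """
    states = list(candidate_states)
    conflicts = []
    for (a, holds_a), (b, holds_b) in combinations(values.items(), 2):
        # A pair conflicts when there is no state in which both values hold.
        if not any(holds_a(s) and holds_b(s) for s in states):
            conflicts.append((a, b))
    return conflicts


# Hypothetical example: two agents disagree about a dam, a third is indifferent.
states = [{"dam_built": True}, {"dam_built": False}]
values = {
    "ana": lambda s: s["dam_built"],
    "ben": lambda s: not s["dam_built"],
    "chi": lambda s: True,
}
conflicts = find_conflicts(values, states)
print(conflicts)
```

A pairwise check can miss conflicts that only appear among three or more values, so a fuller treatment would search for a single state satisfying everyone. Widening `candidate_states` is one way to picture what finding a solution to a conflict would mean.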
AXIOMS
The normative basis for and the aim of society and social cooperation is the achievement of substantive ends.
Decisions on which substantive ends societies want to achieve should not violate the positive or negative definition of democracy.
Decisions on the means by which substantive ends are achieved should not violate the positive or negative definition of democracy.
Resources should be put towards achieving high priority substantive ends, via high priority substantive means.
Where a value conflict arises because values cannot be simultaneously realised (the exception in b.) above), resources should be put towards finding one or more solutions that would resolve the conflict.
The final outcomes should reflect the distribution of stated values at the macro level. Care should be taken to understand nuances and non-standard substantive values.
Any agent displaying callous behaviour with regard to the values of others, as judged against the negative or positive definition of democracy, should be demoted from any position of power.
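One possible reading of the axiom that final outcomes should reflect the distribution of stated values is a proportional split of resources. The sketch below assumes, purely for illustration, that each agent states a list of values and that each stated value counts once per agent. The axioms do not specify any weighting or priority scheme, so `allocate`, `stated_values` and `budget` are hypothetical names, not part of the proposal.

```python
from collections import Counter


def allocate(stated_values, budget):
    """Split a budget across values in proportion to how many agents state each.

    stated_values: dict mapping agent name -> list of stated values (assumed input)
    budget: total resources to divide (assumed to be a single number)
    """
    # Count each value at most once per agent.
    counts = Counter(v for vals in stated_values.values() for v in set(vals))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: budget * n / total for value, n in counts.items()}


# Hypothetical example with three agents.
stated = {
    "ana": ["privacy", "community"],
    "ben": ["community"],
    "chi": ["wilderness"],
}
shares = allocate(stated, 100)
print(shares)
```

In this sketch a value held by a single agent still receives a share rather than being outvoted, which is one way to picture the non-majoritarian aim. It ignores relative priority and non-standard values, both of which the axioms ask to be taken into account.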
----------------------------------------------------------------------------------------------
The idea is to find a minimal set of rules that would achieve both inner and outer alignment. Outer alignment would be addressed by providing a meta-objective rather than an objective, so that values and methods are not overspecified by designers.
Inner alignment would become a self-limiting problem if an AI understood itself to have substantive values around means and ends, and applied this meta-objective to itself and to other AI agents.
For humans, there is no majoritarian or oppositional system; instead the idea is to realise the full range of values held, protecting pluralism.
Is there a more minimal set of rules, or does anything need to be added?
I will edit this post to reflect any good answers here.
