New location of the previous file on GitHub: https://github.com/Alexhb61/Alignment/blob/main/committees/Control_by_committee.md
The main criticism I anticipated of this piece was the disassembly risk.
By adding a later step to this outer-alignment approach (transforming a committee into a modified learning problem), I believe I now have a much stronger angle on outer alignment.
I will continue to write about it here:
https://github.com/Alexhb61/Alignment/blob/main/committees/outer_alignment.md
I think I have an interesting new research direction: Aligning Committees.
Build ensembles of agents that act in the world as a single agent, with some protocol for combining their preferences.
Main Motivating Construction:
Given a target consequence T, we construct a committee of three expected-utility-maximizing agents with different utility functions: Planner, Wanter, and Unwanter. Planner proposes a plan, and Wanter and Unwanter each either sign off on it or veto it. A plan P that both sign off on is executed; otherwise the committee takes the safe but useless null action.
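To make the protocol concrete, here is a minimal Python sketch of the sign-off/veto step. Everything in it is an illustrative assumption on my part: the string `Plan` type, the `signs_off` rule (approve iff the plan looks at least as good as the null action), and the function names are not from the post, and the actual utility functions are the ones defined in the full post.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical plan representation; the post does not fix one.
Plan = str
NULL_ACTION: Optional[Plan] = None  # the safe-but-useless default action

@dataclass
class Agent:
    """An expected-utility maximizer; each member has its own utility function."""
    name: str
    utility: Callable[[Plan], float]

    def signs_off(self, plan: Plan) -> bool:
        # Assumed decision rule: sign off iff the plan is at least as good
        # as the null action (normalized here to utility 0). The post leaves
        # the exact sign-off rule implicit.
        return self.utility(plan) >= 0.0

def committee_act(proposal: Plan, wanter: Agent, unwanter: Agent) -> Optional[Plan]:
    """Execute the Planner's proposal only if both Wanter and Unwanter
    sign off; otherwise fall back to the null action."""
    if wanter.signs_off(proposal) and unwanter.signs_off(proposal):
        return proposal
    return NULL_ACTION
```

The structural point is that a veto can only push the committee toward the null action, never toward a different plan; whether that is enough to make the whole committee a mild optimizer is exactly question 3 below.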
The Utility Functions
As the committee becomes smarter and more powerful, I believe it remains a mild optimizer.
Questions:
1. Is this already part of someone else's research agenda? If so, whose?
2. Is there anything I should definitely read before heading down this path?
3. Is the main motivating construction a mild optimizer? Why or why not?
If this is interesting to you, please read the full post at the link.