Plan type 3: Does the brain have any self-aware course-correcting mechanism? The Anterior Cingulate Cortex?
Perhaps we find a way of proving the inner goal of small NNs, then train these NNs to have the goal of evaluating and aligning the emergent goals and subgoals of other AI systems. We could then deploy them as modular 'safety workers' that are integrated into AGI systems to monitor, evaluate, and align the goals of the entire system.
But then I guess really you'd have to incorporate these safety workers as a part of the larger system from the very beginning, and in a way that gives these parts of the system some kind of architectural advantage.
Worth posting, I think! I don't yet have the skills to carry out any of these effectively, but I can see a couple of them being useful projects for me to try my hand at eventually.