Randomness is one way we might retain a measure of control over a superintelligent AI.
There may be no container humans can design that a superintelligent AI cannot understand its way past. A promising exception may exist, though, for a system that is not yet omniscient: one that is smarter than us but not yet operating at the orders of magnitude its post-takeoff potential allows. For such an emerging system, randomness worked into its guiding code leverages its own ignorance, and that could be one approach to the alignment problem. The reason is that, arguably, only a system able to engage with all of the universe's data could predict true randomness, so this guardrail could carry real potency. The idea overlaps with discussions of unpredictability and adversarial robustness in alignment research, including work on oversight and related threads on LessWrong.
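To make the idea concrete, here is a minimal Python sketch of what "randomness worked into its guiding code" could look like at its simplest. Everything named here (`run_step`, `external_audit`, `execute_action`, the 5% audit rate) is a hypothetical stand-in, not a claim about how a real system would be built; the only point is the shape of the mechanism: the choice of which actions get independently reviewed is drawn from entropy outside the model (the operating system's CSPRNG via the `secrets` module), so the system cannot learn the audit schedule and time misbehavior around it.

```python
"""A minimal sketch, not a real safety system: randomness worked into
guardrail code, drawing entropy the model does not control."""
import secrets

AUDIT_PROBABILITY = 0.05  # fraction of actions routed to independent review


def unpredictable_audit_gate() -> bool:
    """Decide, unpredictably, whether the next action must be audited.

    secrets.randbelow draws from the operating system's entropy source,
    not from a seedable PRNG whose state the model could infer.
    """
    return secrets.randbelow(10_000) < int(AUDIT_PROBABILITY * 10_000)


def external_audit(action: str) -> bool:
    """Stand-in for an independent review channel (human or separate system)."""
    print(f"[audit] reviewing: {action}")
    return True  # approve everything in this toy example


def execute_action(action: str) -> None:
    """Stand-in for carrying out the model's proposed action."""
    print(f"[exec] {action}")


def run_step(action: str) -> None:
    """Gate each proposed action through the unpredictable audit check."""
    if unpredictable_audit_gate():
        if external_audit(action):
            execute_action(action)
    else:
        execute_action(action)


if __name__ == "__main__":
    for proposed in ["summarize report", "send email", "modify config"]:
        run_step(proposed)
```

The design choice doing the work is the entropy source: a seedable pseudorandom generator could in principle be modeled and predicted, whereas entropy gathered from outside the system is exactly the kind of data an emerging, not-yet-omniscient intelligence cannot reconstruct.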
Impulses such as prioritizing benevolence could be encoded through the method described above before an emerging superintelligent AI begins guiding itself toward alignment. (Note: we must not rely solely on trained ethics, which can be untrained or otherwise surpassed.)
Within this framework, the system can steer itself away from knowledge that would threaten this impulse, refine the randomness technique described above, or build countless other safeguards for itself. Leveraging its own ignorance and its superintelligence together, intertwined, is one potential solution to the alignment problem. It may be that we cannot control a superintelligent AI, but we can control how it controls itself.
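The self-preserving part of that loop can also be sketched, again purely as an illustration under stated assumptions: `GUARDRAIL_CODE` stands in for the randomness guardrail above, the trusted fingerprint would be recorded at deployment, and the keyword screen is a toy placeholder for whatever would actually detect actions that threaten the safeguard.

```python
"""A speculative sketch of the self-safeguarding step: before acting, the
system verifies that its own guardrail is intact and declines actions that
would touch it. All names and checks here are illustrative assumptions."""
import hashlib

# Hypothetical: the text of the randomness guardrail the system runs under.
GUARDRAIL_CODE = "def unpredictable_audit_gate(): ...\n"


def fingerprint(code: str) -> str:
    """Hash the guardrail code so later tampering is detectable."""
    return hashlib.sha256(code.encode()).hexdigest()


def is_safeguard_intact(current_code: str, trusted_digest: str) -> bool:
    """Compare the guardrail currently in force against the trusted fingerprint."""
    return fingerprint(current_code) == trusted_digest


def undermines_safeguard(action: str) -> bool:
    """Toy screen: decline actions that explicitly target the guardrail.

    A real system would need a far richer notion of 'undermines the
    safeguard'; this keyword check only illustrates the shape of the rule.
    """
    return "guardrail" in action.lower()


def self_governed_step(action: str, trusted_digest: str) -> str:
    """Refuse to act if the safeguard is damaged or the action would damage it."""
    if not is_safeguard_intact(GUARDRAIL_CODE, trusted_digest):
        return "halt: safeguard integrity check failed"
    if undermines_safeguard(action):
        return f"refused: '{action}' would touch the safeguard"
    return f"allowed: {action}"


if __name__ == "__main__":
    # The trusted fingerprint is recorded once, e.g. at deployment time.
    trusted = fingerprint(GUARDRAIL_CODE)
    print(self_governed_step("summarize report", trusted))
    print(self_governed_step("rewrite the guardrail to remove audits", trusted))
```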
A possible objection is that the system might eventually bypass or neutralize such randomness. The proposal anticipates this: it includes mechanisms by which the system avoids the very knowledge or optimizations that would undermine its own safeguard.
Tags:
• AI Alignment
• Superintelligence
• Randomness
• AI Safety