Bengio's Alignment Proposal: "Towards a Cautious Scientist AI with Convergent Safety Bounds"
Yoshua Bengio recently posted a high-level overview of his alignment research agenda on his blog. I'm pasting the full text below since it's fairly short.

> What can’t we afford with a future superintelligent AI? Among others, confidently wrong predictions about the harm that some actions could yield. Especially catastrophic harm. Especially if these actions could spell the end of humanity.
>
> How can we design an AI that will be highly capable and will not harm humans? In my opinion, we need to figure out this question – of controlling AI so that it behaves in really safe ways – before we reach human-level AI, aka AGI; and to be successful, we need all hands on deck. Economic and military pressures to accelerate advances in AI capabilities will continue to push forward even if we have not figured out how to make superintelligent AI safe. And even if some regulations and treaties are put into place to reduce the risks, it is plausible that human greed for power and wealth and the forces propelling competition between humans, corporations and countries, will continue to speed up dangerous technological advances.
>
> Right now, science has no clear answer to this question of AI control and how to align its intentions and behavior with democratically chosen values. It is a bit like in the “Don’t Look Up” movie. Some scientists have arguments about the plausibility of scenarios (e.g., see “Human Compatible”) where a planet-killing asteroid is headed straight towards us and may come close to the atmosphere. In the case of AI there is more uncertainty, first about the probability of different scenarios (including about future public policies) and about the timeline, which could be years or decades according to leading AI researchers. And there are no convincing scientific arguments which contradict these scenarios and reassure us for certain, nor is there any known method to “deflect the asteroid”, i.e., avoid catastrophic outcomes from future powerful AI systems. Wit
“Agent” is sort of similar. The top definition on Google is “a person who acts on behalf of another person or group”, whereas in these parts we tend to use it for a thing that has its own goals.
Edit: I think Jon Richens pointed this out to me once.