Assume that given a hypercomputer and the magic utility function m, we could build an FAI F(m). Every TB of data encodes some program A(u) that takes a utility function u as input. For all A and u, ask F(u) if A(u) is aligned with F(u). (We must construct F not to vote strategically here.) Save that A' which gets approved by the largest fraction of F(u). Sanity check that this maximum fraction is very close to 1. Run A'(m).

AI Alignment Open Thread August 2019

by habryka 1mo4th Aug 201987 comments

37

Ω 12


Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is an experiment in having an Open Thread dedicated to AI Alignment discussion, hopefully enabling researchers and upcoming researchers to ask small questions they are confused about, share very early stage ideas and have lower-key discussions.