
posted on 2022-08-13 — also cross-posted on lesswrong, see there for comments

(this post has been written for the first Refine blog post day, at the end of the week of readings, discussions, and exercises about epistemology for doing good conceptual research)

goal-program bricks

this is the follow-up to the Insulated Goal-Program idea, in which i suggest doing alignment by giving an AI a program to run as its ultimate goal, the running of which would hopefully realize our values. in this post, i talk about what pieces of software could be used to put together an appropriate goal-program, as well as some examples of plans built out of them.
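to make the compositional picture concrete, here is a purely illustrative sketch (the brick names and toy logic are placeholders of mine, not pieces proposed anywhere in this post) of ordinary software chained into a single deterministic goal-program:

```python
# purely illustrative sketch, not from the post: hypothetical "bricks" (ordinary,
# individually testable pieces of software) composed into one deterministic
# goal-program. the brick names and toy logic are placeholders only.

def simulation_brick(state: int, steps: int) -> int:
    """toy stand-in for a brick that runs some deterministic process forward."""
    for _ in range(steps):
        state = state + 1  # placeholder dynamics
    return state

def evaluation_brick(state: int) -> bool:
    """toy stand-in for a brick that checks whether the final state is the wanted one."""
    return state >= 10  # placeholder criterion

def goal_program(initial_state: int) -> bool:
    """the assembled goal-program: the thing whose execution is the AI's ultimate goal."""
    final_state = simulation_brick(initial_state, steps=10)
    return evaluation_brick(final_state)

if __name__ == "__main__":
    print(goal_program(0))  # prints True
```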

here are some naive examples of outlines for goal-programs which seem like they could be okay:

these feel like we could be getting somewhere in terms of figuring out actual goal-programs that could lead to valuable outcomes; at the very least, it seems like a valuable avenue of investigation. in addition, unlike AGI, many pieces of the goal-program can be individually tested, iterated on, etc. in the usual engineering fashion.
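as a toy illustration of that last point, reusing the placeholder bricks from the sketch above: each brick, being ordinary software, can get its own unit tests before being composed into the full goal-program.

```python
# toy unit tests for the placeholder bricks above (assumes they are defined in the
# same module); the point is only that each brick is ordinary, testable software.
import unittest

class TestBricks(unittest.TestCase):
    def test_simulation_brick_advances_state(self):
        self.assertEqual(simulation_brick(0, steps=10), 10)

    def test_evaluation_brick_accepts_good_outcome(self):
        self.assertTrue(evaluation_brick(10))

    def test_goal_program_end_to_end(self):
        self.assertTrue(goal_program(0))

if __name__ == "__main__":
    unittest.main()
```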

