
posted on 2022-08-13 — also cross-posted on lesswrong, see there for comments

(this post has been written for the first Refine blog post day, at the end of the week of readings, discussions, and exercises about epistemology for doing good conceptual research)

goal-program bricks

this is the follow-up to the Insulated Goal-Program idea, in which i suggest doing alignment by giving an AI a program to run as its ultimate goal, the running of which would hopefully realize our values. in this post, i talk about what pieces of software could be used to put together an appropriate goal-program, as well as some examples of plans built out of them.
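to make the compositional picture concrete, here is a purely illustrative sketch (the brick names and toy logic are placeholders of mine, not pieces proposed anywhere in this post) of ordinary software chained into a single deterministic goal-program:

```python
# purely illustrative sketch, not from the post: hypothetical "bricks" (ordinary,
# individually testable pieces of software) composed into one deterministic
# goal-program. the brick names and toy logic are placeholders only.

def simulation_brick(state: int, steps: int) -> int:
    """toy stand-in for a brick that runs some deterministic process forward."""
    for _ in range(steps):
        state = state + 1  # placeholder dynamics
    return state

def evaluation_brick(state: int) -> bool:
    """toy stand-in for a brick that checks whether the final state is the wanted one."""
    return state >= 10  # placeholder criterion

def goal_program(initial_state: int) -> bool:
    """the assembled goal-program: the thing whose execution is the AI's ultimate goal."""
    final_state = simulation_brick(initial_state, steps=10)
    return evaluation_brick(final_state)

if __name__ == "__main__":
    print(goal_program(0))  # prints True
```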

here are some naive examples of outlines for goal-programs which seem like they could be okay:

these feel like we could be getting somewhere in terms of figuring out actual goal-programs that could lead to valuable outcomes; at the very least, it seems like a valuable avenue of investigation. in addition, unlike AGI, many pieces of the goal-program can be individually tested, iterated on, etc. in the usual engineering fashion.
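as a toy illustration of that last point, reusing the placeholder bricks from the sketch above: each brick, being ordinary software, can get its own unit tests before being composed into the full goal-program.

```python
# toy unit tests for the placeholder bricks above (assumes they are defined in the
# same module); the point is only that each brick is ordinary, testable software.
import unittest

class TestBricks(unittest.TestCase):
    def test_simulation_brick_advances_state(self):
        self.assertEqual(simulation_brick(0, steps=10), 10)

    def test_evaluation_brick_accepts_good_outcome(self):
        self.assertTrue(evaluation_brick(10))

    def test_goal_program_end_to_end(self):
        self.assertTrue(goal_program(0))

if __name__ == "__main__":
    unittest.main()
```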

