No, this is not something I can undertake -- however, the effort itself need not be very complicated. You've already got a list of misalignment types in the database: create a Google Doc with definitions/descriptions of each of them, and put a link to that doc in this question.
It might be worth (someone) writing out what is meant by each misalignment category as used in the db. Objective misalignment, specific gaming, and value misalignment all seem overlapping, and I'm not at all sure what physical misalignment is supposed to be pointing to.
Just wanted to poll the world about who we think is most able to reduce X-risks (even if they currently aren't doing so). Folks who are most able to but aren't seem like the most crucial ones to reach out to.
Please add only one person (or collection of people) per answer.
Please check the list to see whether your submission is already mentioned. If it is, please upvote it rather than adding it a second time.
I'm thinking of designing a reinforcement learning environment based on Conway's Game of Life (GoL). In it, at every timestep, an agent can change the state of some cells.
As is the case with most interesting RL problems, agent behaviour would be determined by the reward function.
In this scenario, I see some issues with simple reward functions:
1) Total life:
Something like this glider gun would represent a technically correct unbounded score, with a stream of lonely travellers in an interstellar abyss. Say we want to stick to finite memory.
2) Highest density:
This is generation 50 of the Max spacefiller. The agent might find some way of maintaining the striped pattern from the middle...
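To make the setup concrete, here is a minimal sketch of the environment's core: a Game of Life step plus the two candidate reward functions above. The function names (`gol_step`, `total_life_reward`, `density_reward`) and the toroidal (wrap-around) boundary are my own illustrative choices, not part of any proposed design:

```python
import numpy as np

def gol_step(grid):
    """One Game of Life step on a toroidal grid (0 = dead, 1 = alive).

    Boundary wrap-around is an assumption here; a real environment
    might instead use a fixed dead border.
    """
    # Sum the 8 neighbours by rolling the grid in each direction.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)

def total_life_reward(grid):
    """Reward 1: total number of live cells (unbounded via glider guns)."""
    return int(grid.sum())

def density_reward(grid):
    """Reward 2: fraction of live cells (gameable via spacefillers)."""
    return float(grid.mean())

# A blinker oscillates with period 2 and keeps a constant reward of 3,
# illustrating how a static score can hide degenerate behaviour.
g = np.zeros((5, 5), dtype=int)
g[2, 1:4] = 1
g = gol_step(g)
print(total_life_reward(g))  # 3
```

Both rewards are computed from the grid alone, so an agent that seeds a glider gun (for reward 1) or a spacefiller (for reward 2) maximizes them without doing anything we would intuitively call "interesting" -- which is exactly the specification-gaming worry raised above.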