Wiki Contributions


Many more are engaged in AI Safety in other ways, eg. as PhD or independent researcher. These are just the positions we know about. We currently have not done a comprehensive survey.

Worth mentioning that most of the Cyborgism community founders came out of or did related projects in AISC beforehand.

I interpret the post you linked as trying to solve the problem of pointing to things in the real world. Being able to point to things in the real world in a way which is ontologically robust is probably necessary for alignment. However "gliders", "strawberries" and "diamonds" seem like incredibly complicated objects to point to in a way which is ontologically robust, and it is not clear that being able to point to these objects actually lead to any kind of solution. 

What we are interested in is research into how to create a statistically unique enough piece of data and being able to reliably point to that. Pointing to pure information seems like it would be more physics independent and run into less issues with ontological breakdowns.

The QACI scheme allows us to construct more complicated formal objects, using counterfactuals on these pieces of data, out of which we are able to construct a long reflection process.

Recently we modified QACI to give a scoring over actions, instead of over worlds. This should allow weaker systems inner aligned to QACI to output weaker non-DSA actions, such as the textbook from the future, or just human readable advice on how to end the acute risk period. Stronger systems might output instructions for how to go about solving corrigible AI, or something to this effect.

As for diamonds, we believe this is actually a harder problem than alignment, and it's a mistake to aim at it. Solving diamond-maximization requires us to point at what we mean by "maximizing diamonds" in physics in a way which is ontologically robust. QACI instead gives us an easier target; informational data blobs which causally relate to a human. The cost is that we now give up power to that human user to implement their values, but this is no issue since that what we wanted to do anyways. If the humans in the QACI interval were actually pursuing diamond-maximization, instead of some form of human values, QACI would solve diamond maximization.