Also available on the EA Forum.
Followed by: Encultured AI, Part 2 (forthcoming)
Hi! In case you’re new to Encultured AI, we’re a for-profit start-up with a public benefit mission: developing technologies promoting the long-term survival and flourishing of humanity and other sentient life. However, we also realize that AI poses an existential risk to humanity if not developed with adequate safety precautions. Given this, our goal is to develop products and services that help humanity steer toward the benefits and away from the risks of advanced AI systems. Per the “Principles” section of our homepage:
Our current main strategy involves building a platform usable for AI safety and alignment experiments, comprising a suite of environments, tasks, and tools for building more environments and tasks. The platform itself will be an interface to a number of consumer-facing products, so our researchers and collaborators will have back-end access to services with real-world users. Over the next decade or so, we expect an increasing number of researchers — both inside and outside our company — will transition to developing safety and alignment solutions for AI technology, and through our platform and products, we’re aiming to provide them with a rich and interesting testbed for increasingly challenging experiments and benchmarks.
In the following, we’ll describe the AI existential safety context that motivated us to found Encultured, and go into more detail about what we’re planning to do.
The technical areas below have begun to receive what we call “existential attention” from AI researchers, i.e., attention from professional AI researchers thinking explicitly about the impact of their work on existential safety:
In other words, the topics above lie in the intersection of the following Venn diagram:
See Appendix 1 for examples of research in these areas. More research in these areas is definitely warranted. A world where 20%+ of AI and ML researchers worldwide pivoted to focusing on the topics above would be a better world, in our opinion.
If our product is successful, we plan to grant access to researchers inside and outside our company for performing experiments in the areas above, interacting directly with users on our platform. And, our users will be aware of this ;) We’re planning on this not only because it will benefit the world, but because it will benefit our products directly: the most valuable tools and services are trustworthy, truthful, preference-sensitive, interpretable, and robust.
The following topics have received attention both from some researchers focused on existential safety and from other AI researchers, but to us the two groups don't (yet) seem to overlap as much as they do for the 'trending' topics above.
Also see Appendix 2 for a breakdown of why we think these areas are “emerging” in AI x-safety.
While continuing to advocate for the above, we’ve asked ourselves: what seems to be completely missing from research and discourse on AI existential safety? The following topics have been examined from various perspectives in AI research, but little or not at all from the perspective of x-safety:
To make sure these aspects of safety can be addressed on our platform, we decided to start by working on a physics engine for high-bandwidth interactions between artificial agents and humans in a virtual environment.
We think we can create opportunities for humanity to safety-test future AI systems by building a platform designed for exactly that kind of testing. We're looking to enable testing of both popular and neglected safety issues, and we think we can make a platform that brings them all together.
In our next post, we'll talk about how and why we decided to provide a consumer-facing product as part of our platform.
Followed by:
- Encultured AI, Part 1 Appendix: Relevant Research Examples
- Encultured AI Pre-planning, Part 2: Providing a Service
First, great news on founding an alignment organization on your own. While I give this work a low chance of making progress, if you succeed, the benefits would be vast.
I'll pre-register a prediction. You will fail with 90% probability, but potentially usefully fail. My reasons are as follows:
Inner alignment issues have a good chance of wrecking your plans. Specifically, there are issues like instrumental convergence causing deception and power-seeking by default. I notice an implicit assumption that inner alignment is either not a problem or so easy to solve by default that it's not worth worrying about. This may hold, but I suspect it's more likely than not that it won't.
I suspect that cultural AI is only relevant in the below-human and human-level regimes, and once the above-human regime arrives, there are fairly massive incentives to simply not care about human culture, the same way that humans don't really care about less powerful animals. Actually bettering less powerful beings' lives is very hard.
> First, great news on founding an alignment organization on your own.
Actually, I founded it with my cofounder, Nick Hay! https://www.encultured.ai/#team