Firefox is currently giving me a broken redirect on https://atlascomputing.org/safeguarded-ai-summary.pdf
Browsers usually cache redirects, unfortunately, which means that if you ever misconfigure a redirect, your browser will keep following the broken one even after you fix it. A force-refresh doesn't help, because it's not the final loaded page that's cached but the redirect before it. I've learned this the hard way. You need to flush your cache or download the file with a separate tool like wget, which won't have cached the broken redirect.
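For example (a minimal sketch in Python in case wget isn't handy; nothing Atlas-specific here), any fresh client will follow the current redirect rather than a stale cached one:

```python
# Fetch the PDF with a client that has no cached redirects.
# urllib follows whatever redirect the server sends on each request,
# so a fixed redirect is picked up immediately.
import urllib.request

url = "https://atlascomputing.org/safeguarded-ai-summary.pdf"
with urllib.request.urlopen(url) as response:
    with open("safeguarded-ai-summary.pdf", "wb") as f:
        f.write(response.read())
```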
Sorry about that, Quinn. If the link is still not working for you, please let us know, but in the meantime here's a copy: https://drive.google.com/file/d/182M0BC2j7p-BQUI3lvXx1QNNa_VvrvPJ/view?usp=sharing
Ah, you might have caught it right when we were updating links to davidad's program and re-uploading. It works for me now on Firefox, but please let me know if it doesn't work for you and we'll debug further.
Atlas Computing is a new nonprofit working to collaboratively advance AI capabilities that are asymmetrically risk-reducing. Our work consists of building scoped prototypes and creating an ecosystem around @davidad’s Safeguarded AI programme at ARIA (formerly referred to as the Open Agency Architecture).
We formed in Oct 2023 and have raised nearly $1M, primarily from the Survival and Flourishing Fund and Protocol Labs. We have no physical office, and our team is currently just Evan Miyazono (CEO) and Daniel Windham (software lead), but over the coming months and years, we hope to create compelling evidence that:
Our overall strategy
We think that, in addition to encoding human values into AI systems, a complementary way to dramatically reduce AI risk is to create external safeguards that limit AI outputs. Users (individuals, groups, or institutions) should have tools to create specifications that list baseline safety requirements (if not full desiderata for AI system outputs) and to interrogate those specifications with non-learned tools. A separate system should then use the specification to generate candidate solutions, along with evidence that each proposed solution satisfies the spec. That evidence can be reviewed automatically for adherence to the specified safety properties. Contrast this with current interactions with today's generalist ML systems, where candidate solutions are at best reviewed manually. We hope to facilitate a paradigm where even the least safety-conscious user's interactions with AI follow this loop: the user writes a specification, an AI system proposes candidate solutions with evidence, and an automated checker verifies that evidence against the specification before anything is accepted.
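A minimal sketch of that loop (our illustration, with hypothetical function names rather than real Atlas or ARIA interfaces) might look like:

```python
# Illustrative sketch of a specification-based interaction loop.
# All names here are hypothetical placeholders, not real Atlas/ARIA APIs.

def safeguarded_query(spec, untrusted_generator, checker, max_attempts=5):
    """Only accept AI output that comes with machine-checkable evidence
    that it satisfies the user's specification."""
    for _ in range(max_attempts):
        # The (possibly very capable, possibly untrusted) AI proposes a
        # candidate solution plus evidence, e.g. a proof or certificate.
        candidate, evidence = untrusted_generator(spec)

        # A separate, non-learned checker verifies the evidence against
        # the spec; the human never has to trust the generator directly.
        if checker.verify(spec, candidate, evidence):
            return candidate

    raise RuntimeError("No candidate satisfied the specification")
```

The key property of this pattern is that trust rests in the specification and the checker, not in the generator.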
Specification-based AI vs other AI risk mitigation strategies
We consider near-term risk reductions that are possible with this architecture to be highly compatible with existing alignment techniques.
How this differs from davidad’s work at ARIA
You may be aware that davidad is funding similar work as a Programme Director at ARIA (watch his 30-minute solicitation presentation here). It’s worth clarifying that, while davidad and Evan worked closely at Protocol Labs, davidad is not an employee of Atlas Computing, and Atlas has received no funding from ARIA. That said, we’re pursuing highly complementary paths in our hopes to reduce AI risk.
His Safeguarded AI research agenda, described here, is focused on using cyber-physical systems, like the UK electrical grid, as a venue for answering foundational questions about how specifications, models, and verification can be used to reduce AGI and ASI risk.
Atlas Computing is identifying opportunities for asymmetrically risk-reducing tools based on davidad's architecture, and doing (as well as supporting external) tech transfer, prototyping, and product development. These tools should be buildable without incorporating progress from davidad's program, but they should surface solutions to UX, tech-transfer, and user-acquisition problems that davidad's outputs may later benefit from.
As an example, imagine we've prototyped useful tools and partnered with a frontier lab to provide them as a service to government departments at scale; those departments would then be using a Safeguarded AI architecture built on narrow AI tools to make specific review processes more efficient and robust. In that scenario, convincing service providers, funders, and users to adopt the outputs of davidad's completed program should be much faster and easier.
Our plan
We are initially pursuing projects in two domains, each targeting a major risk area for this architecture.
Using AI to scale formal methods: If we are to specify safety properties about states of the world, we must first radically simplify the use of existing specification languages. To demonstrate this, we intend to match-make talent and funding to build AI-accelerated tools that elicit and formalize specifications (from either legacy code or natural language) and then perform program and proof synthesis from those specifications. See our nontechnical 2-pager on AI and formal verification for more motivation. We’re working with the Topos Institute to define the first set of engineering and research problems in this domain - stay tuned!
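To give a flavor of what "specification plus proof" means at toy scale (this example is ours, not drawn from the 2-pager), here is a tiny formal spec and its machine-checked proof in Lean:

```lean
-- Toy example: a formal specification ("double n is always even")
-- and a machine-checked proof. Real specs would be far richer, but the
-- workflow is the same: state the property, then synthesize and check a proof.
def double (n : Nat) : Nat := 2 * n

theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, rfl⟩
```

The tools described above would aim to produce both the statement and the proof with AI assistance, starting from natural language or legacy code.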
Designing a new spec language for biochemistry: We hope to organize a competition analogous to CASP, but focused on toxicity forecasting. We believe it’s important to show that a useful, objective specification language can be built to describe physical systems, and that a list of bioactivity parameters could serve that purpose. See our 2-page proposal for the toxicity forecasting competition for more.
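Purely as a hypothetical illustration of what a machine-readable entry in such a spec language could look like (the parameter names and thresholds below are placeholders we chose, not a proposed standard):

```python
# Hypothetical sketch of a machine-readable toxicity specification.
# Parameter names and thresholds are illustrative placeholders only.
from dataclasses import dataclass

@dataclass
class ToxicitySpec:
    compound_id: str                # identifier for the candidate molecule
    max_herg_inhibition: float      # cardiotoxicity proxy, fraction inhibited
    min_ld50_mg_per_kg: float       # acute toxicity lower bound
    max_mutagenicity_score: float   # Ames-test-style mutagenicity score

def satisfies(spec: ToxicitySpec, predicted: dict) -> bool:
    """Check a model's predicted bioactivity profile against the spec."""
    return (
        predicted["herg_inhibition"] <= spec.max_herg_inhibition
        and predicted["ld50_mg_per_kg"] >= spec.min_ld50_mg_per_kg
        and predicted["mutagenicity_score"] <= spec.max_mutagenicity_score
    )
```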
Once we've made significant progress on these two directions (likely creating one or more teams to pursue each as either an FRO or an independent start-up), we'll start identifying the next best ways to apply our architecture to asymmetrically reduce risks.
Why a nonprofit?
If it turns out that this architecture is the best way to address AI risk, then the IP and support should be readily available to anyone who wants to use it, as long as they follow best practices for safety. This doesn’t preclude us from helping form or incubate for-profits to access more capital, as we intend to build valuable products, but it does require that we support any potential partner interested in using this architecture.
We hope to empower human review at all scales. Current AI architectures seem to favor generating potential changes to systems, whereas having a model and a specification favors first understanding systems and the impact of potential changes, which we believe is strictly better. We also believe this path to reducing AI risk is broadly appealing, even to accelerationists, because efficient and transparent automation of regulation can reduce the burden imposed by regulators.
How to engage with us
Read more
You can apply to join Atlas! We’re hiring:
Chat with us