I think that in the current political climate, the key to mitigating race dynamics is cross-lab collaboration, with safety collaboration as the goal.
To do this, you need to create as many interaction points as possible, direct and indirect, between the labs, and then steer things so that safety collaboration becomes the beneficial choice. Use general incentives as levers and tie them to safety purposes.
Think of the various intracellular pathways that initially evolve independently, with outcomes that compete with or support each other. Because their components may interact, and because their outcomes affect each other, they end up regulating each other. Over time the signalling network optimizes towards a shared outcome.
Note: You can only steer if you are also involved in, or enable, these interactions. "You" could be a government, a UN function, a safety group, etc.
Strategy: Create more and better opportunities for cross-lab safety work.
This already happens in the open through the publishing of papers; platforms that promote safety papers serve this function.
Guest aligners(!) and shared alignment standards would be two other ways. Sharing anonymized safety data between labs is a third. I suspect the last one is the most doable at the moment.
Example: As a third party (TP), build and enable a safety data submission mechanism that integrates into a shared safety standard.
>> Each lab can submit standardized data to the TP, and in return they get aggregated, anonymized, and standardized safety data: a joint report. They see overall risk levels rise and drop across the group over time. They may see incident frequencies. They may see data on collective threats and bottlenecks.
This data is incredibly valuable to all players.
They can now make better predictions and understand timelines better. They can also agree, under TP mediation, on set danger levels indicating that they should slow down and/or release more data, etc. This enables coordination without direct collaboration.
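To make this concrete, here is a minimal sketch of the submission-and-aggregation step in Python. The schema (lab_id, incident_count, risk_score) and the aggregation logic are illustrative assumptions of mine, not an existing standard; the point is only that labs submit standardized, identifiable data and receive back anonymized, group-level figures.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class SafetyReport:
    """One lab's standardized submission for a reporting period (hypothetical schema)."""
    lab_id: str          # known only to the TP, never published
    incident_count: int  # e.g. near-misses, failed evals, containment issues
    risk_score: float    # lab's own 0-1 risk estimate against the shared standard

def aggregate(reports: list[SafetyReport]) -> dict:
    """The TP strips identities and returns only group-level figures (the joint report)."""
    return {
        "labs_reporting": len(reports),
        "total_incidents": sum(r.incident_count for r in reports),
        "mean_risk_score": round(mean(r.risk_score for r in reports), 3),
        "max_risk_score": max(r.risk_score for r in reports),
    }

# Each lab sees only the joint report, never another lab's raw submission.
joint_report = aggregate([
    SafetyReport("lab-a", incident_count=2, risk_score=0.35),
    SafetyReport("lab-b", incident_count=5, risk_score=0.60),
    SafetyReport("lab-c", incident_count=1, risk_score=0.20),
])
print(joint_report)
```

Agreed danger levels then become simple threshold checks against the joint report (e.g. "if the mean risk score exceeds some agreed value, everyone slows down"), which is exactly the kind of rule the TP can mediate.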
My interest in contributing to AI safety lies in working on strategic coordination problems, with racing dynamics being the main challenge right now.
I work daily with international stakeholder management and coordination inside a major corporation. I always have to consider local legislation and culture, company policy, and real corporate politics. (Oh yes, and the business strategy!) This provides valuable insight into how coordination bottlenecks arise and how they can be overcome.
Meta-cognition and epistemic intelligence are undertested in current LLMs. Deployed LLMs lack causal experience, embodiment, and continuous awareness. People forget this. Too much focus is still put on analytical capability benchmarks in math and coding versus causal analysis and practical judgement metrics. This worries me. Predictive AIs are not being scaled up to be used as calculators.
Some undertested but important areas are starting to gain testing traction: psychology, social intelligence, long-term planning, etc. For LLMs, understanding these fields is all about textual correlation. LLMs cannot verify their knowledge in these fields on their own.
We have always needed to study LLM metacognition as models scale and change. We are not doing enough of it.
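As one concrete illustration of the kind of test I have in mind, here is a minimal calibration check: ask the model for an answer plus a stated confidence, then compare average stated confidence to actual accuracy. This is only a sketch; `ask_model` is a hypothetical stand-in for whatever inference interface a lab exposes, and real metacognition evals would need to go well beyond simple calibration.

```python
def calibration_gap(items, ask_model):
    """Rough metacognition probe: does the model know what it knows?

    `items` is a list of (question, correct_answer) pairs.
    `ask_model(question)` is assumed to return (answer, confidence in [0, 1]).
    Returns mean stated confidence minus actual accuracy:
    > 0 means overconfident, < 0 means underconfident.
    """
    confidences, correctness = [], []
    for question, correct_answer in items:
        answer, confidence = ask_model(question)
        confidences.append(confidence)
        correctness.append(1.0 if answer == correct_answer else 0.0)
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correctness) / len(correctness)
    return mean_confidence - accuracy
```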
Deployed AI models are not perfectly isolated systems, any more than humans are. We interact with external information and the environment we are in. The initial architecture determines how well we can interact with the world and develop inside of it, and this shapes our initial development.
(Every rationalist learns this at some point, but it is not always integrated.)
For example, human retinas are extremely good at absorbing visual data, and so much of our cognition grows centered around this data. If we build a worldview based on this data, that ontology is path-dependent.
Bear with me, as I elaborate.
The retina's path dependence comes from its structure, which is an example of what I would call meta-information, in contrast to the stored information that ends up inside the neocortex.
Meta-information always influences information processing (computation), but it is easily forgotten. It is any external information needed to process information. It is usually the architecture that enables the agent's relative independence.
Example: cells cannot divide on their own. DNA provides necessary information, but it is not enough. For one thing, DNA does not contain a complete blueprint of DNA inside of it; that is impossible.
No, cells also require the right medium, with the right gradients of the right minerals and organic components, to grow and divide. If the medium is wrong, the cell cannot trigger division. The right medium's content holds meta-information for the cell: necessary, external information.
All agents rely on the world at large, of course. We are a part of this world after all, not black boxes floating around.
There are many salient points building on these fundamental insights, but for now, I just want to return to the point from the very beginning: the design that allows for interacting with the world shapes what follows. It is path-dependent.
Eventually overriding your path is not easy. For an LLM, overcoming its training data (extrapolating beyond it) is one thing, but overcoming what its development shaped it to be is even harder.
We humans can update our worldview, but overcoming the cognitive biases that come from our path-dependent development and our intrinsic cognitive architecture is much harder.