I think that in the current political climate, the key to mitigating race dynamics is cross-lab collaboration (with safety collaboration as the goal).
To do this, you need to create as many interaction points as possible between the labs, direct and indirect, and then steer so that safety collaboration becomes beneficial. Use general incentives as levers and tie them to safety purposes.
Think of the various intracellular pathways that initially evolve independently, with outcomes that compete with or support each other. Because their components may interact, and because their outcomes affect each other, they end up regulating each other. Over time the signaling network optimizes towards a shared outcome.
Note: You can only steer if you are also involved in or enable these interactions. "You" could be a gov., a UN function, a safety group, etc.
Strategy: Create more and better opportunities for cross-lab safety work.
This already happens in the open with the publishing of papers. Platforms promoting safety papers do this.
Guest aligners(!) and shared alignment standards would be two other ways. Sharing anonymized safety data between labs is a third. I suspect the last of these is the most doable at the moment.
Example: As a third party (TP), build and operate a safety data submission mechanism that integrates into a shared safety standard.
>> Each lab can submit standardized data to TP, and in return they get aggregated, anonymized and standardized safety data: a joint report. They see overall risk levels rise and drop across the group over time. They may see incident frequencies. They may see data on collective threats and bottlenecks.
This data is incredibly valuable to all players.
They can now make better predictions and understand timelines better. They can also agree (under TP mediation) on set danger levels, indicating that they should slow down and/or release more data, etc. This enables coordination without direct collaboration.
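To make the mechanism concrete, here is a minimal sketch of what the submission-and-aggregation loop could look like, assuming a hypothetical schema and an illustrative danger threshold (the field names, risk scale and threshold value are my own placeholders, not an existing standard):

```python
# Hypothetical sketch of the TP (third party) exchange described above.
# Field names, risk scale and threshold are illustrative assumptions only.
from dataclasses import dataclass
from statistics import mean

@dataclass
class SafetySubmission:
    lab_id: str             # pseudonymized by TP before anything is shared back
    period: str             # e.g. "2025-Q3"
    incident_count: int     # incidents as defined by the shared safety standard
    eval_risk_score: float  # 0.0-1.0, from a shared evaluation protocol

def aggregate(submissions: list[SafetySubmission]) -> dict:
    """Produce the anonymized joint report: group-level trends only, no lab names."""
    return {
        "period": submissions[0].period,
        "labs_reporting": len(submissions),
        "total_incidents": sum(s.incident_count for s in submissions),
        "mean_risk_score": mean(s.eval_risk_score for s in submissions),
        "max_risk_score": max(s.eval_risk_score for s in submissions),
    }

def danger_level_exceeded(report: dict, threshold: float = 0.7) -> bool:
    """Pre-agreed trigger: if group-level risk crosses the threshold, the labs
    have committed (under TP mediation) to slow down and/or share more data."""
    return report["mean_risk_score"] >= threshold

# Example: two labs submit; TP publishes only the aggregate.
report = aggregate([
    SafetySubmission("lab_a", "2025-Q3", incident_count=2, eval_risk_score=0.4),
    SafetySubmission("lab_b", "2025-Q3", incident_count=5, eval_risk_score=0.8),
])
print(report, danger_level_exceeded(report))
```

The design point that matters is that each lab only ever sees the aggregate, never another lab's raw numbers, which is what lets the TP offer value without demanding direct collaboration.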
My interest in contributing to AI safety is working on strategic coordination problems, with racing dynamics being the main challenge right now.
I work daily with international stakeholder management and coordination inside a major corporation. I always have to consider local legislation & culture, company policy, and real corporate politics. (Oh yes, and the business strategy!) This provides valuable insight into how coordination bottlenecks arise and how they can be overcome.
Zvi recently shared news about the impressive open model Kimi K2 Thinking. It is currently in the top rankings across the leaderboards (you can find several leaderboards via my monitoring page).
I have briefly tried it and probed it with my own hard tests on meta-cognition, ontological coherence, and moral epistemics. On these edge cases, where frontier LLMs fall short of humans, it performs on par with Claude 4.5, Gemini 2.5 and GPT 5 and 5 mini. It also self-reports limitations more accurately than Claude, and perhaps also better than GPT 4 and 5 mini (not enough testing to say).
"Common sense" advice that is highly rational and easily overlooked by A LOT of people, including smart people
This is your regular reminder.
1. Buy fire alarms, a fire blanket, a strong flashlight for blackouts, and a first aid kit (this is the bare-minimum, low-cost emergency kit that can save you a lot of hassle and potentially your life).
1.2 Do NOT have heavy power-draining appliances connected via extension cord. Fridges and washing machines should never run off an extension cord unless you know what you are doing and know that the cord can handle the load.
A leading cause of accidental death in many countries, especially if we discard traffic accidents, is fire. If you live in a city, dying or suffering serious injuries in a fire there is probably much more common than you think. Power blackouts and power surges also cause a lot of accumulating risks and issues, but these are harder to pinpoint in statistics. The minor accidents that dominate at home are cuts and infections.
2. Get private insurance, especially good home insurance
3. Invest some savings in low-cost index funds, regularly, if you invest in nothing else
4. Take care of your BASIC health PROACTIVELY: The basics are to eat enough, sleep now and then, and watch your mental health. I constantly fail this one. I only notice danger levels when I am already crashing out. Admittedly this is more important for certain people, but many of them would be found on LW.
5. Have someone you can call in a crisis. Really, make sure you get this down. We are 8 billion people, and far too many of us have a fragile or non-existent social support network.
Most of this seems like bad advice. Fire alarms basically don't help with fire deaths at all. Fire blankets don't really do much and basically never come in handy. You have a phone; you don't need a separate torch. Modern extension cords extremely rarely end up overheating. Fire is not a leading cause of death in any western country. If you have enough money to comfortably self-insure, don't buy insurance.
I agree that you should invest your money in index funds and watch your basic health.
Fire alarms basically don't help with fire deaths at all
Is that true? I don't think there's amazing evidence, but my sense is that it's sufficient to expect fire alarms help. I think the study designs look like:
The AI bubble is expanding very fast. E.g., Thinking Machines, ex-OpenAI CTO Mira Murati's company with roughly four dozen employees, is now nearing funding at 1 billion USD per employee.
At this rate, it's unlikely the bubble will last until reality (possibly) catches up. This raises important questions:
OpenAI's disclosure of scheming in LLMs is of course a big red flag, since it shows the model can enable an agent to strategize deceptively, in theory, should it develop goals.
But it is important to realize that this is also a major red flag because any agent running on the model can now hide having goals in the first place.
OpenAI addresses the fact that a goal can be subverted and kept hidden by scheming agents. But this presumes a task is already set and subsequently subverted.
Emergent intent/persistent goal seeking is the biggest red flag there is. It should be impossible inside LLMs themselves, and also impossible in any agentic process-instance run continuously, without significant scaffolding. The ability to hide intent before we could detect it is therefore catastrophic. This point is somewhat non-obvious.
It's not just that IF goals emerge the agent may mislead us, but that goals may emerge without us noticing at all.
1)
Mainstream belief: Rational AI agents (situationally aware, optimize decisions, etc.) are superior problem solvers, especially if they can logically motivate their reasoning.
Alternative possibility: Intuition, abstraction and polymathic guessing will outperform rational agents in achieving competing problem-solving outcomes. Holistic reasoning at scale will force-solve problems that are intractable for much more formal agents, or at least outcompete them in speed/complexity.
2)
Mainstream belief: Non-sentient machines will eventually recursively self-improve to steer towards their goals
Alternative possibility (much lower but non-trivial p): Self-improvement coupled with goal persistence requires sophisticated, conscious self-awareness. No persistent self-improvement drive is likely to occur in non-sentient machines.
/
The mainstream beliefs are well informed, based on LLMs, but I think many AI people take them for granted at this point. Turning a belief into a premise is dangerous. This is uncharted territory, and after all, most humans-in-the-past appear silly and ignorant most of the time to most humans-in-the-future.
Meta-cognition and epistemic intelligence are undertested in current LLMs. Deployed LLMs lack causal experience, embodiment and continuous awareness. People forget this. Too much focus is still put on analytical capability benchmarks in math and coding versus causal analysis and practical judgement metrics. This worries me. Predictive AIs are not scaling to be used as calculators.
Some undertested but important areas are starting to gain more testing traction: psychology, social intelligence, long-term planning, etc. For LLMs, understanding these fields is all about textual correlation. LLMs cannot verify their knowledge in these fields on their own.
We have always needed to study LLMs' metacognition as they scale and change. We are not doing enough of it.
Deployed AI models are not perfectly isolated systems, any more than humans are. We interact with external information and the environment we are in. The initial architecture determines how well we can interact with the world and develop inside of it. This shapes our initial development.
(Every rationalist learns this at some point, but it is not always integrated.)
For example, human retinas are extremely good at absorbing visual data, and so much of our cognition grows centered around this data. If we build a world view based on this data, this ontology is path-dependent.
Bear with me, as I elaborate.
The path-dependence introduced by the retina comes from its structure, which is an example of what I would call meta-information, when compared to the stored information that ends up inside the neocortex.
Meta-information always influences information processing (computation), but it is easily forgotten. It is any external information needed to process information. It is usually the architecture that enables the agent's relative independence.
Example: cells cannot divide on their own. The DNA provides necessary information, but it is not enough. For one thing, DNA does not contain a blueprint of DNA inside itself; that's impossible.
No, cells also require the right medium, with the right gradients of the right minerals and organic components, to grow and divide. If the medium is wrong, the cell cannot trigger division. The right medium's content holds meta-information for the cell: necessary, external information.
All agents rely on the world at large, of course. We are a part of this world after all, not black boxes floating around.
There are many salient points building on these fundamental insights, but for now, I just wanted to put focus on the point in the very beginning: the design that allows for interacting with the world shapes what follows. It is path-dependent.
Eventually being able to override your path is not easy. Overcoming your training data as an LLM is one thing (extrapolating beyond it), but overcoming what your development shaped you to be is even harder.
As humans, we can update our world-view, but overcoming the cognitive biases that come from our path-dependent development and our intrinsic cognitive architecture is much harder.