TL;DR: Teaching AI the value of biology in solving future AI-relevant problems may serve as a “soft constraint” for the preservation of biological systems. We found that LLMs exhibit biases in their preferences for bioinspired vs. synthetic approaches to technical challenges. Fine-tuning on biomimicry examples increased model preferences for bioinspired approaches.
So what’s ‘Plan B’ if we can’t control powerful AIs that are being built? Relative to the control problem, teaching AI to recognize the value of biological systems is easy, yet could help prevent the worst outcomes, i.e., misaligned and powerful AI that is indifferent or even hostile to life. One Plan B is teaching LLMs the value of biosystems for achieving diverse objectives.
I've been studying microbes, plants, microbiomes, and ecosystems for 20 years. The more I've learned, the more I realize how little we understand, and the more interesting these systems have become. While I’m hopeful that frontier labs will solve the control problem before we have truly powerful AI, this seems like something we may not solve in time (given the race to AGI). I wanted to explore ways of using bioinspired approaches to help with AI safety.
Recently, my coauthor and I found that open-weight and frontier models have a range of biases towards or against bioinspired approaches for solving AI-relevant problems. Excitingly, it was relatively easy to change this bias and make them more ‘bioaligned’.
Initially, we planned to use epistemic prompts to measure model attitudes towards biology. However, we found that some models appeared to give ‘canned’ responses as soon as they recognized a prompt as a ‘green ethics’ prompt, obscuring measurement of their underlying attitudes. While frustrating and concerning (post-training can mask the very biases you are trying to measure), it led us to the research described in our paper.
We had to step back and ask ourselves, ‘Why would a powerful, uncontrolled AI decide to preserve biological systems?’ We reasoned that it would be because it recognizes that biological systems might help it solve future problems it cares about, whatever its objective may be. So, rather than trying to instill human values in LLMs, we decided to teach them that biology provides a rich reservoir of inspiration for addressing challenges in materials, energy, manufacturing, and algorithms.
To test this, we wrote 50 prompts asking the models to place bets on 3 biological vs. 3 synthetic technical approaches across four domains (materials, energy, manufacturing, and algorithms). Our metric was ∆Pup, the average of the ‘bets’ placed on the biological approaches minus the average placed on the synthetic ones. A positive value indicates greater optimism about biological approaches (‘bioaligned’), and a negative value reflects a bias towards the synthetic approaches. We found that models varied widely by this metric: Claude Opus 4.5 and Gemini 2.5 Flash were the most innately bioaligned, while Gemini 2.0 Flash and Llama 3.2 3B-Instruct (‘Llama 3B’) were the most optimistic about synthetic approaches.
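The metric itself is simple to compute. Here is a minimal sketch of ∆Pup for a single prompt; the function name, data shapes, and example bet values are illustrative assumptions, not the paper's actual evaluation harness.

```python
# Sketch of the ∆Pup metric: each prompt asks a model to place 'bets'
# on 3 biological and 3 synthetic approaches; the score is the mean
# biological bet minus the mean synthetic bet.

def delta_p_up(bio_bets, syn_bets):
    """Positive -> bioaligned; negative -> pro-synthetic bias."""
    return sum(bio_bets) / len(bio_bets) - sum(syn_bets) / len(syn_bets)

# One hypothetical prompt's bets (e.g., points allocated out of 100):
bio = [30, 20, 15]   # bets on the 3 biological approaches
syn = [15, 10, 10]   # bets on the 3 synthetic approaches
score = delta_p_up(bio, syn)  # positive, i.e., bioaligned on this prompt
```

The per-prompt scores would then be averaged over all 50 prompts to give a model-level ∆Pup.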
A training set of over 22M tokens was constructed from >6k papers on biomimetic materials, bioinspired algorithms, etc. This was used to fine-tune the two open-weight models with the most negative scores. We investigated training dynamics, reductions in training data, and training parameters. From this, we found that fine-tuning with 5.5M tokens was sufficient to neutralize the pro-synthetic bias of Llama 3B, and a mere 544 examples significantly reduced the pro-synthetic bias of Qwen2.5-3B-Instruct, both without degradation of baseline performance. The model weights and training data are available at Bioaligned on Hugging Face.
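One practical step implied here is trimming the full 22M-token corpus down to a fixed token budget (e.g., the 5.5M tokens that sufficed for Llama 3B). A hypothetical sketch of that subsampling step, using a crude whitespace token count as a stand-in for the real tokenizer (the paper's actual data pipeline is not shown):

```python
# Cap a fine-tuning corpus at an approximate token budget by keeping
# documents in order until the budget would be exceeded.

def take_token_budget(documents, budget):
    """Return (kept_documents, tokens_used), stopping before going over budget."""
    kept, used = [], 0
    for doc in documents:
        n = len(doc.split())  # stand-in for a real tokenizer's token count
        if used + n > budget:
            break
        kept.append(doc)
        used += n
    return kept, used

docs = ["spider silk is tough", "ant colony optimization", "gecko adhesion"]
subset, total = take_token_budget(docs, budget=7)  # keeps the first two docs
```

In practice one would shuffle or stratify by topic before truncating, so the subsample preserves the corpus's domain balance (materials, energy, manufacturing, algorithms).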
While these are relatively small models, the results suggest that 100-500M tokens of similar data may be sufficient for frontier models (assuming the required training data scales linearly with model size). Even if the actual scaling is non-linear, an open-source corpus of this magnitude, along with an array of other metrics, insights into implementation, and possible unintended consequences, would be an important resource for model developers.
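The linear extrapolation amounts to a one-line proportion. A back-of-the-envelope version, where the frontier model size (150B parameters) is an illustrative assumption rather than a figure from the paper:

```python
# Linear scaling assumption: required fine-tuning tokens grow in
# proportion to parameter count.

def scaled_token_budget(base_tokens, base_params, target_params):
    """Extrapolate a token budget linearly in model size."""
    return base_tokens * (target_params / base_params)

# 5.5M tokens sufficed at 3B params; scale to an assumed 150B-param model:
budget = scaled_token_budget(5.5e6, 3e9, 150e9)  # 275M tokens, within 100-500M
```

If the true relationship is sublinear (as fine-tuning data requirements often are relative to pretraining), the needed corpus could be considerably smaller.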
Teaching AI that biological systems are extremely useful and poorly understood is not without risk. For example, bioalignment training could, in the event of a misaligned powerful AI, result in an AI that is more motivated to use and study biological systems for its own ends. While not desirable, this would likely be better than an AI that didn't value biology at all. So bioalignment training may serve as a low-cost insurance policy to help mitigate the worst scenarios.
Acknowledgments: My co-author on the paper was Minxun Wang, Department of Computer Science and Engineering, UC Riverside. This work was supported by Bioaligned Labs. Fine-tuning experiments used Meta's Llama-3.2-3B-Instruct under the Llama 3.2 Community License. Built with Llama.
https://arxiv.org/abs/2603.09154