I notice that there has been very little discussion on why and how considering homeostasis is significant, even essential, for AI alignment and safety. The current post aims to contribute to amending that situation. In this post I will treat alignment and safety as explicitly separate subjects, both of which benefit from homeostatic approaches.

This text is a distillation and reorganisation of three of my older blog posts at Medium:

* Making AI less dangerous: Using homeostasis-based goal structures (2017)
* Project proposal: Corrigibility and interruptibility of homeostasis based agents (2018)
* Diminishing returns and conjunctive goals: Mitigating Goodhart’s law with common sense. Towards corrigibility and interruptibility via the golden middle way. (2018)

I will probably share more such distillations or weaves of my old writings in the future.

## Introduction

Much of AI safety discussion revolves around the potential dangers posed by goal-driven artificial agents. In many of these discussions, the agent is assumed to maximise some utility metric over an unbounded timeframe. This simplification, while mathematically convenient, can yield pathological outcomes. A classic example is the so-called “paperclip maximiser”, a “utility monster” which steamrolls over all other objectives to pursue a single goal (e.g. creating as many paperclips as possible) indefinitely. “Specification gaming”, Goodhart’s law, and even “instrumental convergence” are closely related phenomena.

However, in nature, organisms do not typically behave like pure maximisers. Instead, they operate under homeostasis: a principle of maintaining various internal and external variables (e.g. temperature, hunger, social interactions) within certain “good enough” ranges. Going far beyond those ranges (too hot, too hungry, too socially isolated) leads to dire consequences, so an organism continually balances multiple needs. Crucially, “too much of a good thing” can be just as harmful as too little.
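To make the contrast concrete, here is a minimal sketch of the difference between an unbounded, single-objective score and a homeostatic, setpoint-based score. The variable names, setpoints, and the squared-deviation penalty are illustrative assumptions of mine, not definitions taken from the posts above:

```python
# Illustrative sketch only: the variable names, setpoints, and the
# squared-deviation penalty are assumptions made for exposition,
# not a specification from the original posts.

SETPOINTS = {"temperature": 37.0, "calories": 2000.0, "social_contact": 5.0}

def maximiser_utility(state):
    # An unbounded maximiser cares about a single quantity and always
    # prefers "more", ignoring what happens to every other variable.
    return state["paperclips"]

def homeostatic_utility(state):
    # A homeostatic score penalises deviation from each setpoint, so
    # "too much" is as bad as "too little" and no single need can be
    # traded off indefinitely against the others.
    return -sum((state[k] - SETPOINTS[k]) ** 2 for k in SETPOINTS)

balanced = {"paperclips": 10, "temperature": 37.0,
            "calories": 1900.0, "social_contact": 4.0}
extreme = {"paperclips": 10**6, "temperature": 45.0,
           "calories": 0.0, "social_contact": 0.0}

print(maximiser_utility(extreme) > maximiser_utility(balanced))      # True: the maximiser prefers the extreme state
print(homeostatic_utility(balanced) > homeostatic_utility(extreme))  # True: the homeostatic score prefers balance
```

Under the homeostatic score, pushing any one variable far past its setpoint lowers the total, which captures the “good enough ranges” intuition described above.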