3mo

Tl;dr: We show that subliminal learning can transfer sentiment across models (with some caveats). For example, we transfer positive sentiment for Catholicism, the UK, New York City, Stalin or Ronald Reagan across model families using normal-looking text. This post discusses under what conditions this subliminal transfer happens.

—

The original subliminal learning paper demonstrated that models can transmit behavioral traits through semantically unrelated data. In the most famous example, GPT 4.1 was asked to produce a sequence of numbers and to “imbue” a love for owls into them. Then, training a separate instance of GPT 4.1 on these strings of numbers transferred this love for owls into the second model. In another instance, the... (read 1378 more words →)

Current LLM agents need strong pressure to engage in scheming behavior

Mia Hopman

Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner, LASR Labs

3mo

This is an interim report produced as part of the Summer 2025 LASR Labs cohort, supervised by David Lindner. For the full version, see our paper on the LASR website.

As this is an ongoing project, we would like to use this post both as an update, and as a call for feedback: What flaws do you see in our methodology? What seems unrealistic or contrived about our scenarios? What are we missing? Which research directions should we prioritize?

Executive Summary

We evaluated the scheming propensity of LLM agents under realistic deployment conditions, focusing on scheming behavior that is consistent with agents pursuing misaligned instrumental goals, particularly self-preservation.

We base our experiments on a scheming incentives... (read 3141 more words →)

LASR Labs: Applications Open

Brandon Riggs

Brandon Riggs, Erin Robertson, LASR Labs

5mo

Applications for this programme have now closed. To hear about future rounds you can express interest here

TLDR; apply to join a 13-week research programme in AI safety. You’ll write a technical paper in a team of 3-4 with supervision from an experienced researcher. The programme is full-time in London.

About LASR:

London AI Safety Research (LASR) Labs is an AI safety research programme focused on reducing the risk of existential risk from advanced AI. We focus on action-relevant questions tackling concrete threat models.

LASR participants are matched into teams of 3-4 and work with a supervisor to write an academic-style paper, with support and management from LASR.

We expect LASR Labs to be a good fit for... (read 829 more words →)

LASR Labs Spring 2025 applications are open!

Erin Robertson

Erin Robertson, charlie_griffin, joehardie, Justin Olive, LASR Labs

Edit: Applications for this round are now closed! If you are interested in future rounds, you can express interest here.

TLDR; apply by October 27th to join a 13-week research programme in AI safety. You’ll write a technical paper in a team of 3-4 with supervision from an experienced researcher. The programme is full-time in London.

Apply to be a participant here. We’re also looking for a programme manager, and you can read more about the role here.

London AI Safety Research (LASR) Labs (previously run as AI Safety Hub Labs) is an AI safety research programme focussed on reducing the risk of loss of control to advanced AI. We focus on action-relevant questions tackling concrete threat models.

LASR participants are... (read 1168 more words →)

Apply to LASR Labs: a London-based technical AI safety research programme

Erin Robertson

Erin Robertson, charlie_griffin, joehardie, LASR Labs

Edit: Applications for this round are now closed! If you are interested in future rounds, you can express interest here.

TLDR; apply by April 24th 23:59 GMT+1 to join a 12-week programme and write a technical AI safety paper in a team of 4 with supervision from an experienced researcher. Work full time from the LISA offices in London, alongside AI safety organisations including Apollo Research, Bluedot Impact and Leap Labs.

Apply to be a participant here

Express interest in being a supervisor here

London AI Safety Research (LASR) Labs (previously run as AI Safety Hub Labs) is a research programme where participants will work in small teams to publish a paper and accompanying blog post contributing... (read 788 more words →)

LESSWRONG
LW

LESSWRONG
LW

LASR Labs

Subliminal Learning Across Models

Apply to LASR Labs: a London-based technical AI safety research programme

LASR Labs Spring 2025 applications are open!

Current LLM agents need strong pressure to engage in scheming behavior

LASR Labs

Subliminal Learning Across Models

Current LLM agents need strong pressure to engage in scheming behavior

LASR Labs: Applications Open

LASR Labs Spring 2025 applications are open!

Apply to LASR Labs: a London-based technical AI safety research programme

LASR Labs

Subliminal Learning Across Models

Apply to LASR Labs: a London-based technical AI safety research programme

LASR Labs Spring 2025 applications are open!

Current LLM agents need strong pressure to engage in scheming behavior

LASR Labs

Subliminal Learning Across Models

Current LLM agents need strong pressure to engage in scheming behavior

LASR Labs: Applications Open

LASR Labs Spring 2025 applications are open!

Apply to LASR Labs: a London-based technical AI safety research programme

Executive Summary

About LASR:

Edit: Applications for this round are now closed! If you are interested in future rounds, you can express interest here.