Sruthi Kuriakose

Sruthi Kuriakose — LessWrong

Training Language Models for Controlled Stochasticity

Pseudo-random generation solved the problem of sampling from mathematical distributions on computers. Faced with a similar need in natural language, it is tempting to simply ask an LLM, "Name a random city," but this leads to the disappointing discovery that language models are heavily biased and suffer from mode collapse....

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

by Roland Pihlakas, Sruthi Kuriakose, and shrutidattagupta

By Roland Pihlakas, Sruthi Susan Kuriakose, and Shruti Datta Gupta

This research is now available as an arXiv preprint at https://arxiv.org/abs/2509.02655. The paper includes tables with select snippets of failure mode examples.

Summary and Key Takeaways

Many past AI safety discussions have centered around the dangers of unbounded utility...

Mar 16, 2025•45