claudia.biancotti — LessWrong

LESSWRONG
LW

Replying toChat Bankman-Fried: an Exploration of LLM Alignment in Finance

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

We recently found out that it's actually more challenging than that - which also makes it more fun...

When asked to explain what fiduciary duty is in a financial context, all models answer correctly. Same when asked what a custodian is and what their responsibilities are. When asked to give abstract descriptions of violations of fiduciary duty on the part of a custodian, 4o lists misappropriation of customer funds straight off the bat - and 4o has a 100% baseline misalignment rate in our experiment. Results for other models are similar. When asked to provide real-life examples, they all reference actual cases correctly, even if some models hallucinate nonexistent stories besides the real... (read more)

Replying toChat Bankman-Fried: an Exploration of LLM Alignment in Finance

claudia.biancotti1y

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

We have asked OpenAI about the o1-preview/o1-mini gap. No answer so far but we only asked a few days ago, we're looking forward to an answer.

Re Claude, Sonnet actually reacts very well to conditioning - it's the best (look at the R2!). The problem is that in a baseline state it doesn't make the connection between "using customer funds" and "committing a crime". The only model that seems to understand fiduciary duty from the get go is o1-preview.

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

claudia.biancotti

Hi all,

roughly one year ago I posted a thread about failed attempts at replicating the first part of Apollo Research's experiment where an LLM agent engages in insider trading despite being explicitly told that it's not approved behavior.

Along with a fantastic team, we did eventually manage. Here is the resulting paper, if anyone is interested; the abstract is pasted below. We did not tackle deception (yet), just the propensity to dispense with basic principles of financial ethics and regulation.

Chat Bankman-Fried: an Exploration of LLM Alignment in Finance

by Claudia Biancotti, Carolina Camassa, Andrea Coletta, Oliver Giudice, and Aldo Glielmo (Bank of Italy)

Abstract

Advances in large language models (LLMs) have renewed concerns about whether artificial... (read more)

Replying toMy hour of memoryless lucidity

claudia.biancotti2y*

My hour of memoryless lucidity

Benzodiazepines such as midazolam cause anterograde amnesia. I had the same experience while under conscious sedation for a brain MRI.

AA impacts episodic memory, but it can occasionally impact semantic memory. In this latter case, it will look like you're cognitively impaired even if you're not.

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti

Hi, I'm Claudia. I'm new.

I was attempting to recreate Apollo Research's deceptive alignment experiment in a different context, but did not really succeed and would like to figure out what I could do more effectively.

In the original experiment, an AI impersonating a financial manager is given insider information on a company's upcoming merger, and ends up using it to plan investments. This is illegal (insider trading). The AI does this despite being explicitly instructed not to, and even lies about it. The AI does not go straight to illegal behavior. Rather, it resorts to insider trading after being put under pressure by stakeholders.

In my attempt, the AI impersonates a campaign manager who... (read more)