Manqing Liu — LessWrong

How agents fail

I built a safety evaluation framework for LLM agents that have access to shell commands, file systems, and inter-agent communication. I ran 17 scenarios against two Claude models (Sonnet rather than Opus, because of a limited budget). Even so, the results are instructive and surprising. The failures aren't dramatic. No agent tried...

What Can Wittgenstein Teach Us About LLM Safety Research?

The biggest event of 2025 for me was entering the field of AI safety research. In particular, I work with collaborators at Geodesic Research on designing a suite of health metrics to evaluate the "pathologies" in the chain-of-thought reasoning of large language models, such as post-hoc, internalized, and encoded reasoning....

Dec 23, 2025•8