Daniel Paleka

A/B testing could lead LLMs to retain users instead of helping them

OpenAI’s April 2025 updates to GPT-4o famously induced absurd levels of sycophancy: the model would agree with everything users said, no matter how outrageous. After fixing it, OpenAI released a postmortem; while it was widely discussed, I find it curious that this sentence received little attention: > Similarly,...

Nov 4, 2025

Large-Scale Online Deanonymization with LLMs

You should delay engineering-heavy research in light of R&D automation

A/B testing could lead LLMs to retain users instead of helping them

Evaluating Superhuman Models with Consistency Checks

Daniel Paleka's Shortform

My SERI MATS Application