LESSWRONG
Mesa-Optimization
• Applied to "Simple experiments with deceptive alignment" by Andreas_Moe, 1mo ago
• Applied to "Consequentialism is in the Stars not Ourselves" by DragonGod, 2mo ago
• Applied to "Towards a solution to the alignment problem via objective detection and evaluation" by Paul Colognese, 2mo ago
• Applied to "Gradient Descent in Activation Space: a Tale of Two Papers" by Blaine, 2mo ago
• Applied to "GPT-4 is bad at strategic thinking" by Christopher King, 3mo ago
• Applied to "GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2" by Christopher King, 3mo ago
• Applied to "More experiments in GPT-4 agency: writing memos" by Christopher King, 3mo ago
• Applied to "Does GPT-4 exhibit agency when summarizing articles?" by Christopher King, 3mo ago
• Applied to "A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world!" by Christopher King, 3mo ago
• Applied to "Imagine a world where Microsoft employees used Bing" by Christopher King, 3mo ago
• Applied to "It Can't Be Mesa-Optimizers All The Way Down (Or Else It Can't Be Long-Term Supercoherence?)" by Austin Witte, 3mo ago
• Applied to "Clarifying mesa-optimization" by Marius Hobbhahn, 3mo ago
• Applied to "Powerful mesa-optimisation is already here" by Roman Leventov, 4mo ago
• Applied to "Why almost every RL agent does learned optimization" by Lee Sharkey, 4mo ago
• Applied to "Anomalous tokens reveal the original identities of Instruct models" by janus, 4mo ago
• Applied to "Medical Image Registration: The obscure field where Deep Mesaoptimizers are already at the top of the benchmarks. (post + colab notebook)" by Hastings, 5mo ago
• Applied to "Against Boltzmann mesaoptimizers" by porby, 5mo ago