Lukas Fluri — LessWrong
Posts
Is the evidence in "Language Models Learn to Mislead Humans via RLHF" valid? (21 karma, 4 days ago, 0 comments)
Zurich AI Safety is looking for (Co-)Directors - EOI (12 karma, 3 months ago, 0 comments)
The Perils of Optimizing Learned Reward Functions (19 karma, 5 months ago, 1 comment)
Evaluating Superhuman Models with Consistency Checks (21 karma, 2 years ago, 2 comments)
Open Problems in Negative Side Effect Minimization (Alignment Forum crosspost; 12 karma, 4 years ago, 6 comments)