by RobertM · 1 min read · 25th Jan 2023 · 1 comment
This is a special post for short-form writing by RobertM. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

We have models that demonstrate superhuman performance in some domains without then taking over the world to optimize anything further. "When and why does this stop being safe?" might be an interesting frame if you find yourself stuck.
