Scott Garrabrant


Scott Garrabrant's Comments

Honoring Petrov Day on LessWrong, in 2019
Conspiracy theory: There are no launch codes. People who claim to have launch codes are lying. The real test is whether people will press the button at all. I have failed that test. I came up with this conspiracy theory ~250 milliseconds after pressing the button.

Oh no! Someone is wrong on the internet, and I have the ability to prove them wrong...

Honoring Petrov Day on LessWrong, in 2019

Did you consider the unilateralist's curse before making this comment?

Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?

Honoring Petrov Day on LessWrong, in 2019
If any users do submit a set of launch codes, tomorrow I’ll publish their identifying details.

If we make it through this, here are some ideas to make it more realistic next year:

1) Anonymous codes.

2) Karma bounty for the first person to press the button.

1+2) Randomly and publicly give some people the same code as each other, and give a karma bounty to everyone who had the code that took down the site.

3) Anyone with button rights can share button rights with anyone, and a karma bounty for sharing with the most other people that only pays out if nobody presses the button.

Why Subagents?

Not sure if you've seen it, but this paper by Critch and Russell might be relevant when you start thinking about uncertainty.

AI Alignment Writing Day Roundup #1

This is my favorite comment. Thank you.

Does Agent-like Behavior Imply Agent-like Architecture?

I think I do want to make my agent-like architecture general enough to include evolution. However, there might be a spectrum of agent-like-ness such that you can't get much more than Sphex behavior with just evolution (without having a mesa-optimizer in there).

I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model."

Yeah, but do you think you can make it feel more like a formal proof?

Intentional Bucket Errors

I think there is a possible culture where people say a bunch of inside-view things, and run with speculations all the time, and another possible culture where people mostly only say literally true things that can be put into the listener's head directly. (I associate these cultures with the books R:A-Z and Superintelligence respectively.) In the first culture, I don't feel the need to defend myself. However, I feel like I am often also interacting with people from the second culture, and that makes me feel like I need a disclaimer before I think in public with speculation that conflates a bunch of concepts.

Computational Model: Causal Diagrams with Symmetry

Were you summoned by this post accidentally using your true name?

Steelmanning Divination

Nitpick: conservation of expected evidence does not seem to me like why you can’t do divination with a random number generator.

Is there a difference between uncertainty over your utility function and uncertainty over outcomes?

You are just normalizing on the dollar. You could ask "how many chickens would I kill to save a human life" instead, and you would normalize on a chicken.
