Scott Garrabrant


Scott Garrabrant's Comments

What's the upper bound of how long COVID is contagious?

You should (more strongly?) disambiguate between how long after being sick you are safe and how long after being 100% isolated you are safe.

Voting Phase of 2018 LW Review

Is it pro-social or anti-social to vote on posts I have skimmed but not read?

Humans Are Embedded Agents Too

We actually avoided talking about AI in most of the cartoon, and tried to just imply it by having a picture of a robot.

The first time (I think) I presented the factoring in the embedded agency sequence was at a MIRI/CFAR collaboration workshop, so parallels with humans were live in my thinking.

The first time we presented the cartoon in roughly its current form was at MSFP 2018, where we purposely did it on the first night before a CFAR workshop, so people could draw analogies that might help them transfer their curiosity in both directions.

Honoring Petrov Day on LessWrong, in 2019
Conspiracy theory: There are no launch codes. People who claim to have launch codes are lying. The real test is whether people will press the button at all. I have failed that test. I came up with this conspiracy theory ~250 milliseconds after pressing the button.

Oh no! Someone is wrong on the internet, and I have the ability to prove them wrong...

Honoring Petrov Day on LessWrong, in 2019

Did you consider the unilateralist curse before making this comment?

Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?

Honoring Petrov Day on LessWrong, in 2019
If any users do submit a set of launch codes, tomorrow I’ll publish their identifying details.

If we make it through this, here are some ideas to make it more realistic next year:

1) Anonymous codes.

2) Karma bounty for the first person to press the button.

1+2) Randomly and publicly give some people the same code as each other, and give a karma bounty to everyone who had the code that took down the site.

3) Anyone with button rights can share button rights with anyone, and a karma bounty for sharing with the most other people that only pays out if nobody presses the button.

Why Subagents?

Not sure if you've seen it, but this paper by Critch and Russell might be relevant when you start thinking about uncertainty.

AI Alignment Writing Day Roundup #1

This is my favorite comment. Thank you.

Does Agent-like Behavior Imply Agent-like Architecture?

I think I do want to make my agent-like architecture general enough to include evolution. However, there might be a spectrum of agent-like-ness such that you can't get much more than Sphex behavior with just evolution (without having a mesa-optimizer in there).

I think you can guarantee that, probabilistically, getting a specific outcome requires information about that outcome (no free lunch), which implies "search" on a "world model."

Yeah, but do you think you can make it feel more like a formal proof?

Intentional Bucket Errors

I think there is a possible culture where people say a bunch of inside-view things and run with speculations all the time, and another possible culture where people mostly only say literally true things that can be put into the listener's head directly. (I associate these cultures with the books R:A-Z and Superintelligence, respectively.) In the first culture, I don't feel the need to defend myself. However, I feel like I am often also interacting with people from the second culture, and that makes me feel like I need a disclaimer before I think in public with speculation that conflates a bunch of concepts.
