Noah Topper — LessWrong

A Critique of Functional Decision Theory

Second, what decision theory does best, if run by an agent, depends crucially on what the world is like. To see this, let’s go back to question that Y&S ask of what decision theory I’d want my child to have. This depends on a whole bunch of empirical facts: if she might have a gene that causes cancer, I’d hope that she adopts EDT; though if, for some reason, I knew whether or not she did have that gene and she didn’t, I’d hope that she adopts CDT. Similarly, if there were long-dead predictors who can no longer influence the way the world is today, then, if I didn’t know what was in the opaque boxes, I’d hope that she adopts EDT (or FDT); if I did know what was in the opaque boxes (and she didn’t) I’d hope that she adopts CDT. Or, if I’m in a world where FDT-ers are burned at the stake, I’d hope that she adopts anything other than FDT.

I think there's a lot wrong here, but I'm particularly surprised by Will's claim that he'd want his daughter to follow EDT in the world where a gene might cause cancer. Once she's born and has the gene (or doesn't), the decision theory she follows after that point makes absolutely no difference. I assume he's thinking about the smoking lesions problem here. In such a world, I might hope my daughter doesn't have the desire to smoke, but I wouldn't hope she follows EDT! What difference would that possibly make?

I likewise would want my child to follow FDT in both the opaque and transparent Newcomb's problem, so I wouldn't want her to follow CDT in the case where I know the box contents and she doesn't. And the burning-at-the-stake world is just silly and unfair.

The Best Reference Works for Every Subject

Noah Topper7mo80

Domain: Mathematics

Link: Teach Yourself Logic

Author(s): Peter Smith

Type: Study Guide

Why: Extremely thorough guide to logic textbooks from start to finish. Compares pros and cons of various books, tells you what parts you can skip, and identifies books with good exercises.

The Best Tacit Knowledge Videos on Every Subject

Noah Topper1y32

Domain: Miscellaneous (Running D&D/RPGs)

Link: Your First Adventure, Your First Town, Prepping an Adventure, Sandboxing

Person: Matt Colville

Why: Matt's videos are really informative and his primary goal is to show that DMing is a straightforward thing that anyone can do, not something reserved for certain members of the tabletop RPG hobby. I've tried to pick out a few of his videos that are especially hands-on, but he has lots more which are very useful. I recommend exploring them at your leisure if you're into TTRPGs at all.

ACX/LW Meetup

Noah Topper3y20

Thanks for coming. :)

Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

Answer by Noah TopperFeb 09, 20237-2

I am confused by your confusion. Your basic question is "what is the source of the adversarial selection". The answer is "the system itself" (or in some cases, the training/search procedure that produces the system satisfying your specification). In your linked comment, you say "There's no malicious ghost trying to exploit weaknesses in our alignment techniques." I think you've basically hit on the crux, there. The "adversarially robust" frame is essentially saying you should think about the problem in exactly this way.

I think Eliezer has conceded that Stuart Russel puts the point best. It goes something like: "If you have an optimization process in which you forget to specify every variable that you care about, then unspecified variables are likely to be set to extreme values." I would tack on that due to the fragility of human value, it's much easier to set such a variable to an extremely bad value than an extremely good one.

Basically, however the goal of the system is specified or represented, you should ask yourself if there's some way to satisfy that goal in a way that doesn't actually do what you want. Because if there is, and it's simpler than what you actually wanted, then that's what will happen instead. (Side note: the system won't literally do something just because you hate it. But the same is true for other Goodhart examples. Companies in the Soviet Union didn't game the targets because they hated the government, but because it was the simplest way to satisfy the goal as given.)

"If the system is trying/wants to break its safety properties, then it's not safe/you've already made a massive mistake somewhere else." I mean, yes, definitely. Eliezer makes this point a lot in some Arbital articles, saying stuff like "If the system is spending computation searching for things to harm you or thwart your safety protocols, then you are doing the fundamentally wrong thing with your computation and you should do something else instead." The question is how to do so.

Also from your linked comment: "Cybersecurity requires adversarial robustness, intent alignment does not." Okay, but if you come up with some scheme to achieve intent alignment, you should naturally ask "Is there a way to game this scheme and not actually do what I intended?" Take this Arbital article on the problem of fully-updated deference. Moral uncertainty has been proposed as a solution to intent alignment. If the system is uncertain as to your true goals, then it will hopefully be deferential. But the article lays out a way the system might game the proposal. If the agent can maximize its meta-utility function over what it thinks we might value, and still not do what we want, then clearly this proposal is insufficient.

If you propose an intent alignment scheme such that when we ask "Is there any way the system could satisfy this scheme and still be trying to harm us?", the answer is "No", then congrats, you've solved the adversarial robustness problem! That seems to me to be the goal and the point of this way of thinking.

Why you should learn sign language

Noah Topper3y*10

I mean, fair enough, but I can't weigh it up against every other opportunity available to you on your behalf. I did try to compare it to learning other languages. I'll toss into the post that I also think it's comparatively easy to learn.

Why you should learn sign language

Noah Topper3y-1-2

FWIW I genuinely think ASL is easy to learn with the videos I linked above. Overall I think sign is more worthwhile to learn than most other languages, but yes, not some overwhelming necessity. Just very personally enriching and neat. :)

Why you should learn sign language

Noah Topper3y10

Thanks for the feedback!

Why you should learn sign language

Noah Topper3y40

It's entirely just a neat thing. I think most people should consider learning to sign, and the idea of it becoming a rationalist "thing" just sounded fun to me. I did try to make that clear, but apologies if it wasn't. And as I said, sorry this is kind of off topic, it's just been a thing bouncing around in my head.

Why you should learn sign language

Noah Topper3y40

Honestly I found ASL easier to learn than, say, the limited Spanish I tried to learn in high school. Maybe because it doesn't conflict with the current way you communicate. Just from watching the ASL 1 - 4 lectures I linked to, I was surprisingly able to manage once dropped in a one-on-one conversation with a deaf person.

It would definitely be good to learn with a buddy. My wife hasn't explicitly learned it yet, but she's picked up some from me. Israel is a tough choice, I'm not sure what the learning resources are like for it.

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments