User Profile

Karma: 2697 · Posts: 63 · Comments: 914

Recent Posts

Curated Posts
Curated - Recent, high quality posts selected by the LessWrong moderation team.
Frontpage Posts
Posts meeting our frontpage guidelines:
• interesting, insightful, useful
• aim to explain, not to persuade
• avoid meta discussion
• relevant to people whether or not they are involved with the LessWrong community
(includes curated content and frontpage posts)
All Posts
Includes personal and meta blogposts (as well as curated and frontpage).

Open question: are minimal circuits daemon-free?

15d
2 min read
41

Weird question: could we see distant aliens?

1mo
3 min read
74

[Link] Implicit extortion

1mo
6 min read
15

Prize for probable problems

2mo
3 min read
62

Argument, intuition, and recursion

3mo
8 min read
10

[Link] Funding for AI alignment research

3mo
1 min read
12

The abruptness of nuclear weapons

3mo
1 min read
34

[Link] Arguments about fast takeoff

3mo
1 min read
54

Crowdsourcing moderation without sacrificing quality

1y
26

Optimizing the news feed

1y
12

Recent Comments

> My understanding is that Paul thinks breaking the evolution analogy is important, but a lot less difficult than Eliezer thinks it is

My basic take on the evolution analogy:

* Evolution wasn't trying to solve the robustness problem at all. It's analogous to using existing ML while making _zero ...(read more)

> googling around, I wasn't able to quickly find any papers or blog posts supporting it

I think it's a little bit tricky because decision trees don't work that well for the tasks where people usually study adversarial examples. And this isn't my research area so I don't know much about it.

That sa...(read more)

> I guess my point is that there are open questions about how to protect against value drift caused by AI, what the AI should do when the user doesn't have much idea of how they want their values to be pushed around, and how to get the AI to competently help the user with moral questions, which seem...(read more)

> Has anyone spent a timed 5 minutes trying to figure out, say, how vulnerable gcForest is likely to be to adversarial examples?

Yes. (Answer: deep learning is not unusually susceptible to adversarial examples.)

> 5 minutes of research is enough to determine that creating models which "correctly c...(read more)

> Then you could come up with a list of desiderata we seek in a paradigm: resistance to adversarial examples, robustness to distributional shift, interpretability, conservative concepts, calibration, etc.

For most of these examples, the current research in safety is more like "Try to find _any_ app...(read more)

> don't understand how imitation+RL brings Amplification closer to Debate

The default setup for amplification with RL is:

* Your AI samples two answers to a question.
* The human evaluates which one of them is better. The AI's objective is to sample answers that are most likely to be marked as...(read more)
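
A minimal sketch of that two-answer comparison loop, assuming hypothetical `sample_answer` and `human_prefers` stand-ins for the policy and the human judge (neither name comes from the comment; the 1/0 rewards are just an obvious encoding of "marked as better"):

```python
# Sketch of the two-answer comparison setup described above.
# `sample_answer` and `human_prefers` are hypothetical placeholders for
# the learned policy and the human (or amplified-human) judge.
import random


def sample_answer(question):
    """Placeholder for the AI sampling an answer to a question."""
    return random.choice([f"answer A to {question}", f"answer B to {question}"])


def human_prefers(question, answer_1, answer_2):
    """Placeholder for the human judging which answer is better."""
    return len(answer_1) <= len(answer_2)  # arbitrary stand-in preference


def comparison_step(question):
    # The AI samples two candidate answers to the same question.
    a1 = sample_answer(question)
    a2 = sample_answer(question)
    # The human marks one as better; the preferred answer gets reward 1,
    # the other reward 0, so the RL objective pushes the policy toward
    # answers that are likely to be marked as better.
    if human_prefers(question, a1, a2):
        return [(a1, 1.0), (a2, 0.0)]
    return [(a1, 0.0), (a2, 1.0)]


if __name__ == "__main__":
    print(comparison_step("What is the capital of France?"))
```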

> In the reverse direction amplification mostly seems less adversarial since it's pure supervised learning

Note that you could do amplification with supervised learning, imitation, or RL as the distillation step; in the long run I imagine using imitation+RL, which brings it closer to debate...(read more)

I don't see why to separate 1/2; the goal is to find training data that describes some "universal" core for behavior.

3. I don't think you need to know the training distribution. You just need something that points you back in the direction of the universal core where the human model is competent,...(read more)

I don't know what the statement of the theorem would be. I don't really think we'd have a clean definition of "contains daemons" and then have a proof that a particular circuit doesn't contain daemons.

Also I expect we're going to have to make some assumption that the problem is "generic" (or else ...(read more)

Suppose "predict well" means "guess the output with sufficiently high probability," and the noise is just to replace the output with something random 5% of the time.