User Profile

Karma: 1330 · Posts: 99 · Comments: 2945

Recent Posts

Curated Posts
Curated: Recent, high-quality posts selected by the LessWrong moderation team.
Frontpage Posts
Posts meeting our frontpage guidelines: • interesting, insightful, useful • aim to explain, not to persuade • avoid meta discussion • relevant to people whether or not they are involved with the LessWrong community.
(includes curated content and frontpage posts)
Personal Blogposts
Personal blogposts by LessWrong users (as well as curated and frontpage).

Can corrigibility be learned safely?

18d
3 min read
82

Multiplicity of "enlightenment" states and contemplative practices

1mo
2 min read
4

Combining Prediction Technologies to Help Moderate Discussions

1y
15

[link] Baidu cheats in an AI contest in order to gain a 0.24% advantage

3y
32

Is the potential astronomical waste in our universe too small to care about?

3y
14

What is the difference between rationality and intelligence?

4y
52

Six Plausible Meta-Ethical Alternatives

4y
36

Look for the Next Tech Gold Rush?

4y
115

Outside View(s) and MIRI's FAI Endgame

5y
60

Three Approaches to "Friendliness"

5y
86

Recent Comments

> [Note: I probably like explicit reasoning a lot more than most people, so keep that in mind.]

Great! We need more people like you to help drive this forward. For example, I think we desperately need explicit, worked-out examples of meta-execution (see my request to Paul [here](https://www.lesswr...(read more)

> But in meta-execution the type signature is a giant tree of messages (which can be compressed by an approval-directed encoder); I don’t see how to use that type of “value” with any value-learning approach not based on amplification (and I don’t see what other type of “value” is plausible). In t...(read more)

I think you're right that it's not a binary HBO = human-like, LBO = not human-like, but rather that as the overseer's bandwidth goes down from an unrestricted human to HBO to LBO, our naive intuitions about what a large group of humans can do become less and less applicable, so we need to use explicit r...(read more)

Is there an alternative thing they could maximize for that would be considered corrigible?

Paul answered this question in this thread.

There's another issue with voting, which is that I sometimes find a comment or post on the LW1 part of the site that I want to vote up or down, but I can't because my 5 points of karma power would totally mess up the score of that comment/post in relation to its neighbors. I hadn't mentioned this be...(read more)
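
To make the mechanic in that comment concrete: a vote adds the voter's karma-derived weight to a score rather than a flat ±1, so a single strong vote can reorder old, low-scoring comments. A minimal sketch with hypothetical numbers and function names (not LessWrong's actual implementation):

```python
# Illustrative sketch of weighted voting (assumed mechanic, not LessWrong's code).
# Each vote adds the voter's karma power to the target's score instead of a flat +/-1.

def apply_vote(score: int, karma_power: int, direction: int) -> int:
    """Return the new score after one vote in the given direction (+1 or -1)."""
    return score + direction * karma_power

# Old LW1 comments often sit close together, e.g. scores of 3, 2, and 1.
neighbors = {"comment_a": 3, "comment_b": 2, "comment_c": 1}

# A single upvote with 5 points of karma power jumps comment_c past its neighbors.
neighbors["comment_c"] = apply_vote(neighbors["comment_c"], karma_power=5, direction=+1)

print(sorted(neighbors.items(), key=lambda kv: kv[1], reverse=True))
# [('comment_c', 6), ('comment_a', 3), ('comment_b', 2)]
```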

(This comment is being reposted to be under the right parent.)

> Error recovery could be supported by having a parent agent running multiple versions of a query in parallel with different approaches (or different random seeds).

This doesn't seem to help in the case of H misunderstanding the meanin...(read more)
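
One way to read the quoted proposal is as a simple dispatch pattern: a parent agent runs the same query several times with different random seeds (or approaches) and reconciles the answers. A minimal sketch under that reading, with all names hypothetical; note that it does not address the objection above about H misunderstanding the question:

```python
import random
from collections import Counter

# Sketch of the quoted error-recovery idea: run the same query several times with
# different random seeds and reconcile the answers (here, by majority vote).
# Hypothetical names; not a specification of meta-execution itself.

def run_query(query: str, seed: int) -> str:
    """Stand-in for a child agent answering the query; the seed varies its approach."""
    rng = random.Random(seed)
    return rng.choice(["answer_a", "answer_a", "answer_b"])  # toy answer distribution

def parent_agent(query: str, seeds=(0, 1, 2, 3, 4)) -> str:
    answers = [run_query(query, s) for s in seeds]
    # Majority vote is one possible reconciliation strategy.
    return Counter(answers).most_common(1)[0][0]

print(parent_agent("What does the user want?"))
```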

I got the sense from Dario that he has no plans to publish the document in the foreseeable future.

> It's not enough to represent uncertainty about their values, you also need to represent the fact that V is supposed to be *their* values, in order to include what counts as VOI.

Ah, ok.

> To answer "What should I do if the user's values are {V...(read more)

> Thanks for writing this, Will! I think it’s a good + clear explanation, and “high/low-bandwidth oversight” seems like a useful pair of labels.

Seconded! (I said this to William privately while helping to review his draft, and just realized that I should also say it publicly so more people will...(read more)

This is a really good example of how hard communication can be. When I read

> For example, suppose meta-execution asks the subquestion “What does the user want?”, gets a representation of their values, and then asks the subquestion “What behavior is best according to those values?”

I assumed that "rep...(read more)
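
The subquestion chain in that quoted example can be pictured as a two-step decomposition: ask for a representation of the user's values, then feed that representation into a second subquestion about behavior. A minimal sketch of just that chain, with hypothetical types and function names; the actual proposal passes trees of messages rather than strings, and each subquestion would itself be decomposed further:

```python
# Sketch of the two subquestions from the quoted meta-execution example.
# Hypothetical names; real meta-execution would pass message trees, not strings.

def ask(subquestion: str, context: dict) -> str:
    """Stand-in for delegating a subquestion to a sub-agent."""
    if subquestion == "What does the user want?":
        return context["observations_of_user"]  # some representation of their values
    if subquestion.startswith("What behavior is best"):
        return f"behavior optimized for: {context['values_representation']}"
    raise ValueError(f"unhandled subquestion: {subquestion}")

context = {"observations_of_user": "values_rep_v1"}
values = ask("What does the user want?", context)
context["values_representation"] = values
best = ask("What behavior is best according to those values?", context)
print(best)
```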