User Profile


Recent Posts


Can corrigibility be learned safely?

3 min read

Multiplicity of "enlightenment" states and contemplative practices

2 min read

Combining Prediction Technologies to Help Moderate Discussions


[link] Baidu cheats in an AI contest in order to gain a 0.24% advantage


Is the potential astronomical waste in our universe too small to care about?


What is the difference between rationality and intelligence?


Six Plausible Meta-Ethical Alternatives


Look for the Next Tech Gold Rush?


Outside View(s) and MIRI's FAI Endgame


Three Approaches to "Friendliness"


Recent Comments

> [Note: I probably like explicit reasoning a lot more than most people, so keep that in mind.]

Great! We need more people like you to help drive this forward. For example I think we desperately need explicit, worked out examples of meta-execution (see my request to Paul [here](https://www.lesswr...(read more)

> But in meta-execution the type signature is a giant tree of messages (which can be compressed by an approval-directed encoder); I don’t see how to use that type of “value” with any value-learning approach not based on amplification (and I don’t see what other type of “value” is plausible). In t...(read more)

I think you're right that it's not a binary HBO=human like, LBO=not human like, but rather that as the overseer's bandwidth goes down from an unrestricted human to HBO to LBO, our naive intuitions about what a large group of humans can do become less and less applicable, so we need to use explicit r...(read more)

Is there an alternative thing they could maximize for that would be considered corrigible?

Paul answered this question in this thread.

There's another issue with voting, which is that I sometimes find a comment or post on the LW1 part of the site that I want to vote up or down, but I can't because my 5 points of karma power would totally mess up the score of that comment/post in relation to its neighbors. I hadn't mentioned this be...(read more)

(This comment is being reposted to be under the right parent.)

> Error recovery could be supported by having a parent agent running multiple versions of a query in parallel with different approaches (or different random seeds).

This doesn't seem to help in the case of H misunderstanding the meanin...(read more)

I got the sense from Dario that he has no plans to publish the document in the foreseeable future.

> It's not enough to represent uncertainty about their values, you also need to represent the fact that V is supposed to be *their* values, in order to include what counts as VOI.

Ah, ok.

> To answer "What should I do if the user's values are {V...(read more)

> Thanks for writing this, Will! I think it’s a good + clear explanation, and “high/low-bandwidth oversight” seems like a useful pair of labels.

Seconded! (I said this to William privately while helping to review his draft, and just realized that I should also say it publicly so more people will...(read more)

This is a really good example of how hard communication can be. When I read

> For example, suppose meta-execution asks the subquestion “What does the user want?”, gets a representation of their values, and then asks the subquestion “What behavior is best according to those values?”

I assumed that "rep...(read more)