Mestroyer — LessWrong

Guarding Against the Postmodernist Failure Mode

A couple of questions: what portion of the workshop attendees self-selected from among people who were already interested in rationality, compared to the portion that stumbled upon it by chance?

Don't know, sorry.

The Strangest Thing An AI Could Tell You

Mestroyer11y70

Hi. Checking back on this account on a whim after a long time of not using it. You're right. 2012!Mestroyer was a noob and I am still cleaning up his bad software.

Deception detection machines

Mestroyer11y10

I would need a lot of guarantees about the actual mechanics of how the AI was forced to answer before I stopped seeing vague classes of ways this could go wrong. Even then, I'd assume I had missed some. And if the AI had any way to show me anything other than "yes" or "no", or if I couldn't stop myself from thinking about long sequences of bits instead of just single bits separately, I'd be afraid it could manipulate me.

One example of a vague class of ways this could go wrong: my CEV computation uses CDT, and the AI, using a more advanced decision theory itself, exploits that computation into wanting to write something more favorable to the AI's utility function into the file.

Also, IIRC, Eliezer Yudkowsky said there are problems with CEV itself. (Maybe he only meant problems with the many-people version, but probably not.) It was only supposed to be a vague outline, and a way of saying, "See, you don't have to spend all this time worrying about whether we share your ethical/political philosophy, because it's not going to be hardcoded into the AI anyway."

Deception detection machines

Mestroyer11y30

That's the goal, yeah.

Deception detection machines

Mestroyer11y20

It doesn't have to know what my CEV would be to know what I would want in those bits, which are a compressed seed of an FAI targeted (indirectly) at my CEV.

But there are problems, like "How much effort is it required to put into this?" (Clearly I don't want it to spend far more compute power than it has trying to come up with the perfect combination of bits that will make my FAI unfold a little faster, but I also don't want it to spend no time optimizing. How do I get it to pick somewhere in between without it already wanting to pick the optimal amount of optimization for me?) And "What decision theory is my CEV using to decide those bits?" (Hopefully not something exploitable, but how do I specify that?)

Deception detection machines

Mestroyer11y40

Given that I'm turning the 10 KiB stream of bits I'm about to extract from you into an executable file, through this exact process, which I will run on this particular computer (describe specifics of computer, which is not the computer the AI is currently running on) to create your replacement, would my CEV prefer that this next bit be a 1 or a 0? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? By CEV, would I rather that the bit after that be a 1 or a 0, given that I have permanently fixed the preceding bit as what I made it? ...

(Note: I would not actually try this.)
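The question sequence above is a strictly sequential protocol: each query is asked only after every earlier bit has been permanently fixed. A minimal Python sketch of just that bookkeeping, assuming a hypothetical `ask_next_bit` callable that stands in for the AI's answer. It says nothing about how the answer is produced or how to make it trustworthy, which is the hard part the comments above worry about.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for "the AI's answer": given the bits already fixed,
# return 0 or 1 for the next bit. Nothing here models CEV.
Oracle = Callable[[Sequence[int]], int]

N_BITS = 10 * 1024 * 8  # 10 KiB expressed in bits


def extract_bits(ask_next_bit: Oracle, n_bits: int = N_BITS) -> bytes:
    """Query one bit at a time, fixing each answer before asking the next."""
    fixed: list[int] = []
    for _ in range(n_bits):
        # The oracle sees the full prefix; it is treated as read-only by
        # convention (copying it every step would make this quadratic).
        bit = ask_next_bit(fixed)
        if bit not in (0, 1):
            raise ValueError("oracle must answer with a single bit")
        fixed.append(bit)  # permanently fixed from here on

    # Pack the bits MSB-first into bytes to form the file contents.
    out = bytearray()
    for i in range(0, len(fixed), 8):
        chunk = fixed[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # zero-pad a short final chunk
        out.append(byte)
    return bytes(out)
```

With a toy oracle that answers 0, 1, 0, 1, ... (`lambda prefix: len(prefix) % 2`), sixteen bits pack into two `0x55` bytes. The one-bit-per-query restriction is the point of the design: the oracle never gets to show a multi-bit message at once, though, as noted above, that alone is not a guarantee against manipulation.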

Six Plausible Meta-Ethical Alternatives

Mestroyer11y10

~5, huh? Am I to credit?

Guarding Against the Postmodernist Failure Mode

Mestroyer11y80

This reminds me of this SMBC. There are fields (modern physics comes to mind too) whose work no one outside them can understand anymore, yet which appear to have remained sane. There are more safeguards against the postmodernists' failure mode than this one. In fact, I think there is a lot more wrong with postmodernism than the fact that postmodernists don't have to justify themselves to outsiders. Math and physics have mechanisms for determining which ideas within them get accepted, and those mechanisms are what keep them sane. In math, there are proofs. In physics, there are experiments.

If something like this safeguard is going to work for us, the mechanism that determines which ideas spread among us needs to reflect something good, so that producing the kind of idea that passes that filter makes our community worthwhile. This breaks into two subgoals: making sure the questions we're asking are worthwhile (that we are searching for the right kind of thing), and making sure our acceptance criterion is a good one. (There's also a third condition, one that modern physics may or may not meet for much longer: can progress be made toward the thing you're looking for?)

Guarding Against the Postmodernist Failure Mode

Mestroyer11y40

CFAR seems to be trying to use (some of) our common beliefs to produce something useful to outsiders, and they get good ratings from workshop attendees.

Open thread, 3-8 June 2014

Mestroyer11y40

The last point is particularly important, since on one hand, with the current quasi-Ponzi mechanism of funding, the position of preserved patients is secured by the arrival of new members.

Downvoted because, if I remember correctly, this is wrong: the cost of preserving a particular person includes a lump sum big enough for the interest on it to pay for their maintenance. If I remember incorrectly and someone points it out, I will rescind my downvote.
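The lump-sum arrangement described above is, in effect, a perpetuity. As a simplified sketch, assuming a constant annual real return $r$ on the principal $P$ and a constant annual maintenance cost $C$ (ignoring inflation, fees, and investment risk), the interest covers maintenance indefinitely when:

$$
rP \ge C \quad\Longleftrightarrow\quad P \ge \frac{C}{r}
$$

For instance, at a hypothetical 3% real return, each dollar of annual maintenance cost needs about $1/0.03 \approx 33.33$ dollars of principal. Under that condition, no new members' money is needed to keep an existing patient preserved.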
