jan betley - LessWrong

A shame Sam didn't read this:

But if you are running on corrupted hardware, then the reflective observation that it seems like a righteous and altruistic act to seize power for yourself—this seeming may not be much evidence for the proposition that seizing power is in fact the action that will most benefit the tribe.

Localizing goal misgeneralization in a maze-solving policy network

jan betley7mo10

Thanks! Indeed, shard theory fits here pretty well. I didn't think about that while writing the post.

Against Almost Every Theory of Impact of Interpretability

jan betley9mo70

Very good post! I agree with most of what you have written, but I'm not sure about the conclusions. Two main reasons:

I'm not sure mech interp should be compared to astronomy; I'd say it is more like mechanical engineering. We have JWST because a long, long time ago there were watchmakers, gunsmiths, opticians, etc. who didn't care at all about astronomy, yet their advances in unrelated fields made astronomy possible. I think something similar might happen with mech interp: we'll keep creating better and better tools to achieve some goals, and these goals will in the end turn out to be useless from the alignment point of view, but the tools will not.
Many people think mech interp is cool and fun. I'm personally not a big fan, but I think it is much more interesting than e.g. governance. If our only perspective is AI safety, this shouldn't matter, but people have many perspectives. There might not really be a choice between "this bunch of junior researchers doing mech interp vs this bunch of junior researchers doing something more useful"; they would just go do something unrelated to alignment instead. My guess is that the attractiveness of mech interp is the strongest factor in its popularity.

Reverse engineering of the simulation

jan betley2y10

I don't think this answer is in any way related to my question.

This is my fault, because I didn't explain exactly what I mean by the "simulation", and the meaning is different from the most popular one. Details are in the EDIT in the main post.

Covid 3/18: An Expected Quantity of Blood Clots

jan betley3y50

I think EU countries might be calculating something like this:
A) go on with AZ --> people keep talking about killer vaccines and how you should never trust the government and that no sane person should vaccinate and "blood clots today, what tomorrow?"
B) halt AZ, then say "we checked carefully, everything's fine, we care, we don't want to kill anyone with our vaccine" and start again --> people will trust the vaccines just-a-little-more

And in the long term, general trust in the vaccines is much more important than a few weeks' delay.

I think you assume that scenario A is also better for vaccine trust. Maybe, I don't know, but I wouldn't be surprised if the European governments saw this the other way.

Also, obviously the best solution is "hey people, let's just stop talking about the goddamned blood clots", but The Virtue of Silence (https://www.lesswrong.com/posts/2brqzQWfmNx5Agdrx/the-virtue-of-silence) is not popular enough : )

What do we *really* expect from a well-aligned AI?

jan betley3y10

A simple way of rating the scenarios above is to describe them as you have and ask humans what they think.

Do you think this is worth doing?

I thought that

either this was done a billion times and I just missed it
or this is neither important nor interesting to anyone but me

What do we *really* expect from a well-aligned AI?

jan betley3y10

What's wrong with the AI making life into a RPG (or multiple thereof)? People like stories and they like levelling up, collecting stuff, crafting, competing, etc. A story doesn't have to be pure fun (and those sort of stories are boring anyway).

E.g. Eliezer seems to think it's not the perfect future: "The presence or absence of an external puppet master can affect my valuation of an otherwise fixed outcome. Even if people wouldn't know they were being manipulated, it would matter to my judgment of how well humanity had done with its future. This is an important ethical issue, if you're dealing with agents powerful enough to helpfully tweak people's futures without their knowledge".

Also, you write:

If we want to have a shot at creating a truly enduring culture (of the kind that is needed to get us off this planet and out into the galaxy)

If we really want this, we have to refrain from spending our whole lives playing the best RPG possible.

Never mind AI, they're contradictory when executed by us. We aren't robots following a prioritised script and an AI wouldn't be either.

Consider the human rules "you are allowed to lie to someone for the sake of their own utility" and "everyone should be able to take control of their own life". We know that lies about serious things never turn out well, so we lie only about things of little importance, and little lies like "yes grandma, that was very tasty" don't contradict the second rule. This looks different when you are an ultimate deceiver.

The Darwin Game

jan betley4y80

Your TitForTatBot
* never sets self.previous
* even if it were set, it would stop cooperating when the opponent played 0
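
To make the two points concrete, here is a minimal sketch of a tit-for-tat bot that avoids both problems. It is not the post's actual code: the `move(previous)` signature, the `opening_move` parameter and its default of 2 are assumptions for illustration, and the second bug is assumed to come from a truthiness check such as `if self.previous:`, which treats an opponent move of 0 the same as "no history yet".

```python
class TitForTatBot:
    """Hypothetical sketch of a tit-for-tat bot, not the original implementation."""

    def __init__(self, opening_move=2):
        # opening_move is an assumed parameter: what to play before any history exists.
        self.opening_move = opening_move
        self.previous = None  # opponent's last move; None until one is seen

    def move(self, previous=None):
        # Point 1: actually store the opponent's last move on every call.
        self.previous = previous
        # Point 2: compare against None explicitly, so that an opponent
        # move of 0 is still treated as real history rather than "no move".
        if self.previous is None:
            return self.opening_move
        # Mirror the opponent's last move.
        return self.previous


bot = TitForTatBot()
first = bot.move()        # no history yet
after_three = bot.move(3)
after_zero = bot.move(0)  # 0 is mirrored, not mistaken for "no history"
```

The explicit `is None` check is the key design choice here: in Python, `0` is falsy, so any `if self.previous:` test silently conflates "opponent played 0" with "first round".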

Also, I agree with Zvi's comment: why 2.5 for free? This way one should really concentrate on maxing out in the early stage. Is that intended?
