elriggs' Comments

Corrigibility as outside view

Okay, the outside view analogy makes sense. If I were to explain it to myself, I would say:

Locally, an action may seem good, but looking at the outside view, drawing information from similar instances of my past or other people like me, that same action may seem bad.

In the same way, an agent can access the outside view to see if its action is good by drawing on similar instances. But how does it get this outside view information? Assuming the agent has a model of human interactions and a list of “possible values for humans”, it can simulate different people with different values to see how well it has learned their values by the time it’s considering a specific action.

Consider the action “disable the off-switch”. The agent simulates itself interacting with Bob, who values long walks on the beach. By the time it considers the disable action, it can check its simulated self’s prediction of Bob’s value. If the prediction is “Bob likes long walks on the beach”, then that’s an update towards doing the disable action. If it’s a different prediction, that’s an update against the disable action.

Repeat 100 times for different people with different values and you’ll have a better understanding of which actions are safe or not. (I think a picture of a double-thought bubble like the one in this post would help explain this specific example.)
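The loop described above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the list `POSSIBLE_VALUES`, the `accuracy` parameter, and the function names are all hypothetical stand-ins, and the "simulated self" is reduced to a coin flip that guesses the right value with some probability. The point is just the shape of the check: sample a person, let the simulated self form a prediction, compare it to the true value, and count the matches.

```python
import random

# Hypothetical list of "possible values for humans" (an assumption for this sketch).
POSSIBLE_VALUES = ["long walks on the beach", "chess", "gardening", "poetry"]


def simulated_prediction(true_value, accuracy, rng):
    """Toy stand-in for the simulated self's value learning.

    With probability `accuracy` it predicts the person's true value;
    otherwise it predicts some other value from the list.
    """
    if rng.random() < accuracy:
        return true_value
    return rng.choice([v for v in POSSIBLE_VALUES if v != true_value])


def outside_view_score(accuracy, trials=100, seed=0):
    """Fraction of simulated people whose value was predicted correctly
    by the time the action (e.g. "disable the off-switch") is considered."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        true_value = rng.choice(POSSIBLE_VALUES)  # e.g. Bob's value
        if simulated_prediction(true_value, accuracy, rng) == true_value:
            correct += 1  # an update towards the action
        # a mismatch is an update against the action
    return correct / trials
```

A score near 1 means the simulated selves usually had the person's values right when they reached this action, which is an update towards it; a low score is an update against it.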

Meditation: the screen-and-watcher model of the human mind, and how to use it

I pattern match this to the Buddhist idea of interdependence, where what you are is reliant on the environment and the environment is reliant on you (or embedded agency).

My experience with the "rationalist uncanny valley"

If I understand you right, you value some things (finding them meaningful) because you robustly value them regardless of circumstances (like I value human life regardless of whether I had coffee this morning). Is this correct?

But you also mentioned that this only accounts for some values, and other things you value and find meaningful aren’t robust?

Today a Tragedy

Happy Birthday Will,

I remember in 9th grade you started dating my ex right after consoling me. I was so mad! Haha. I never told you this, but me and the others on our forensics team saw y’all just sitting, holding hands, and having a good time, and Jennifer suggested that me and her hold hands and sit next to y’all giggling.

I said no, though it would’ve made a better story if I went through with it, haha. I think we started getting along again after she moved, although I can’t remember saying anything mean to you because of it.

I’m not sure she knows what happened to ya. I know y’all kept in touch when she moved, and maybe she checks Facebook more than I do.

Anyways, a lot of us are back home cause of the Coronavirus, and I would love to be able to give you a call and see how your life’s progressed these past few years.

Love you Will,


Link Retrospective for 2020 Q1

Thanks for the links and I hope you post another next quarter!

"No evidence" as a Valley of Bad Rationality

Correct. Favoring hypothesis H and favoring NOT H, simply because you labeled one of them the "null hypothesis", are both bad, and they are equally bad when you don't have evidence either way.

In this case, intuition favors "more chemo should kill more cancer cells", and intuition counts as some evidence. The doctor ignores intuition (which is the only evidence we have here) and favors the opposite hypothesis because it's labeled "null hypothesis".

Attainable Utility Preservation: Scaling to Superhuman

Thanks for the link (and the excellent write-up of the problem)!

Regarding the setting, how would the agent gain the ability to create a sub-agent, roll a rock, or limit its own abilities initially? Throughout AUP, you normally start with a high penalty for acquiring power, and then you scale it down to reach reasonable, non-catastrophic plans, but your post begins with the agent already having higher power.

I don't think AUP prevents abuse of power you currently have (?); rather, it prevents gaining that power in the first place.

Attainable Utility Preservation: Scaling to Superhuman

I expect AUP to fail in embedded agency problems (which I interpret the subagent problem to be an instance of). Do you expect it to fail in other areas?

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

I realized afterwards that only “not sharing others’ secrets” is an example of “it’s ethical to lie if someone asks a direct question”. The other two were more “don’t go out of your way to tell the whole truth in this situation (but wait for a better situation)”.

I do believe my ethics is composed of wanting what’s “best” for others and truthful communication is just an instrumental goal.

If I had to blatantly lie every day, so that all my loved ones could be perfectly healthy and feel great, I would lie every day.

I don’t think anyone would terminally value honesty (in any of its forms).

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Thanks for the clarification.

For me the answer is no, I don’t believe it’s ethically mandatory to share all the information I know with everyone who happens to ask the right question. I can’t give a complete formalization of why, but three specific situations are 1) keeping someone else’s information secret, 2) when I predict the other person will assume harmful implications that aren’t true, and 3) when the other person isn’t in the right frame of mind to hear the true information.

An example for #3: you would like your husband to change more diapers and help clean up a little more before he leaves for work every day, but you thought of it just as he came home from a long work day. It would be better to wait and give the criticism when you’re sure he’s in a good mood.

An example for #2: I had a friend who had positive thoughts towards a girl who wasn’t his girlfriend. He was confused about this and TOLD HIS GIRLFRIEND WHEN THEY WERE DATING LONG DISTANCE. The two girls have had an estranged relationship for years since.

If I were my friend, I would understand that positive thoughts towards a pretty girl my age don’t imply that I am required to pursue her romantically. Telling my girlfriend about these thoughts might be truthful and honest, but it would likely cause her to feel insecure and jealous, even though she has nothing to worry about.
