When I say an AI A is aligned with an operator H, I mean:
A is trying to do what H wants it to do.
The “alignment problem” is the problem of building powerful AI systems that are aligned with their operators.
This is significantly narrower than some other definitions of the alignment problem, so it seems important to clarify what I mean.
In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.
Consider a human...
Epistemic Status: Simple point, supported by anecdotes and a straightforward model, not yet validated in any rigorous sense I know of, but IMO worth a quick reflection to see if it might be helpful to you.
A curious thing I've noticed: among the friends whose inner monologues I get to hear, the most self-sacrificing ones are frequently worried they are being too selfish, the loudest ones are constantly afraid they are not being heard, the most introverted ones are regularly terrified that they're claiming more than their share of the conversation, the most assertive ones are always su...
Second version, updated for the 2018 Review. See change notes.
There's a concept which many LessWrong essays have pointed at (honestly, which the entire Sequences are getting at). But I don't think there's a single post really spelling it out explicitly:
You might want to become a more robust, coherent agent.
By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies. Some people find this naturally motivating – there’s something aesthetically appea...