LESSWRONG

Flow

Comments
What Success Might Look Like
Flow · 11d · 10

I'm out of my depth with mathematical and logical proofs, but wouldn't this just be rhetorical engagement with a hypothetical? In probability theory we can use conditionals; this feels like doing that.

What Success Might Look Like
Flow · 11d · 65

I'm reminded of this part from HP:MoR, when Harry's following Voldemort to what seems to be his doom:

Suppose, said that last remaining part, suppose we try to condition on the fact that we win this, or at least get out of this alive. If someone told you as a fact that you had survived, or even won, somehow made everything turn out okay, what would you think had happened—

Not legitimate procedure, whispered Ravenclaw, the universe doesn’t work like that, we’re just going to die

I never understood why this was considered illegitimate. If we have a particular desired outcome, it makes sense to me to envisage it and work backwards from there, while remaining open to deviations, of course.
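
As a sketch of why the move seems legitimate to me (my own framing, not the book's): conditioning on the desired outcome is ordinary Bayes,

$$P(h \mid \text{survive}) = \frac{P(\text{survive} \mid h)\,P(h)}{\sum_{h'} P(\text{survive} \mid h')\,P(h')},$$

and "imagine you won, then ask what must have happened" amounts to looking for $\arg\max_h P(h \mid \text{survive})$: hypotheses incompatible with survival drop to zero weight, which is exactly the work-backwards step.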

We Change Our Minds Less Often Than We Think
Flow · 2y · 10

The principle of the bottom line

I think "The Bottom Line" here is meant to link to the essay.

Evolution is a bad analogy for AGI: inner alignment
Flow · 3y · 10

E.g., the snake wouldn't have human-like reward circuitry, so it would probably learn to value very different things than a human which went through the same experiences.

So in this case I think we agree. But it seems a bit at odds with the 4% weighting of genetic roots: if we agree the snake would exhibit very different values despite experiencing the 'human learning' part, shouldn't that adjust the 60% weight you grant it? It seems the evolutionary roots made all the difference for the snake, which is the whole point about initial AGI alignment having to be exactly right.

Otherwise I understand your post to be asking 'for humans, how much of human value is derived from evolution vs. learning?' But that's using humans, who are human to begin with, as evidence.

Evolution is a bad analogy for AGI: inner alignment
Flow · 3y · 30

I would argue that you cannot weight these things along a single metric. Say evolution -> human values really is only 4% of your value alignment: if that 4% is the fundamental core, then it isn't one term in the sum of all values but a coefficient, or a base where the other stuff is the exponent. It's the hardware the software has to be loaded onto, though not totally tabula rasa either.
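
To make that concrete (a toy sketch; the functional forms are my own illustration, not the post's model): on an additive reading, $V \approx 0.04\,V_{\text{evo}} + 0.96\,V_{\text{learned}}$, so swapping snake-evolution in for human-evolution moves the total by a few percent at most. On a structural reading, something like $V = V_{\text{evo}} \cdot V_{\text{learned}}$ or $V = V_{\text{evo}}^{\,V_{\text{learned}}}$, the small evolutionary term shapes everything the learning contributes, and the same swap can change the character of the outcome entirely.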

Correct me if I'm wrong, but this would assume that if you could somehow make a snake with human-level intelligence and raise it in human society (let's pretend nobody considers it weird that there's a snake taking Chemistry class with them), then that snake would be 96% aligned with humanity?

My intuition would be along the lines of the parable of the scorpion and the frog. 
