For me, ability = capability = means. This is one of the two arguments that I said were load-bearing. Where will that capability come from? Well, we are specifically trying to build the most capable systems possible.

Motivation (i.e. goals) is not actually strictly required. However, there are reasons to think that an AGI could have goals that are not aligned with those of most humans. The most fundamental reason is instrumental convergence.


Note that my original comment was not making this case. It was just a meta-discussion about what it would take to refute Eliezer's argument.

I disagree that rapid self-improvement and goal stability are load-bearing arguments here. Even goals are not strictly, 100% required. If we build something with the means to kill everyone, then we should be worried about it. If it has goals that cannot be directed or predicted, then we should be VERY worried about it.

I am still not sure why the Doomsday reasoning is incorrect. To get P(A | human) = P(B | human), I first need to draw some distinction between being a human observer and an AGI observer. It's not clear to me why or how you could separate them into these categories.

When you say "half of them are wrong", you are talking about half of humans. However, if you are unable to distinguish observers, then only 1 in 10^39 observers is wrong.

My thinking on this is not entirely clear, so please let me know if I am missing something.

I suppose that is my real concern then. Given that we know intelligences can be aligned to human values, by virtue of our own existence, I can't imagine such a proof exists unless it is very architecture-specific, in which case it only tells us not to build atom bombs, while future hydrogen bombs are still on the table.

I love this idea. However, I'm a little hesitant about one aspect of it. I imagine that any proof of the infeasibility of alignment will look less like the ignition calculations and more like a climate-change model. It might go a long way toward convincing people on the fence, but unless it is ironclad and has no opposition, it will likely be dismissed as fearmongering by the same people who are already skeptical about misalignment.
More important than the proof itself is the ability to convince key players to take the concerns seriously. How far is that goal advanced by your ignition proof? Maybe a ton; I don't know.

My point is that I expect an ignition proof to be an important tool in the struggle that is already ongoing, rather than something that brings about a state change.

Ha, no kidding. Honestly, it can't even play chess. I just tried to play it, and asked it to draw the board state after each move. It started breaking on move 3, and deleted its own king. I guess I win? Here was its last output.

For my move, I'll play Kxf8:

8  r n b q . b . .
7  p p p p . p p p
6  . . . . . n . .
5  . . . . p . . .
4  . . . . . . . .
3  . P . . . . . .
2  P . P P P P P P
1  R N . Q K B N R    
     a b c d e f g h
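
For what it's worth, here is a minimal sketch of the kind of scaffold that would catch this, assuming the python-chess library and a hypothetical move list: keep the real board state yourself and reject any move that is illegal in the current position, rather than trusting the board GPT draws.

import chess

board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nc6", "Kxf8"]:  # hypothetical move list for illustration
    try:
        board.push_san(san)  # raises ValueError if the move is illegal in this position
    except ValueError:
        print(f"Illegal move in this position: {san}")
        break
    print(board)  # ASCII diagram of the true current position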

Small nitpick with the vocabulary here. There is a difference between 'strategic' and 'tactical', which is particularly pronounced in chess. Tactics is basically your ability to calculate and figure out puzzles; finding a mate in 5 would be tactical. Strategy relates to things too big to calculate: for instance, creating certain pawn structures that you suspect will give you an advantage in a wide variety of likely scenarios, or placing a bishop in such a way that an opponent must play more defensively.

I wasn't really sure which you were referring to here; it seems that you simply mean that GPT isn't very good at playing strategy games in general, i.e. it's bad at strategy AND tactics. My guess is that GPT is actually far better at strategy than at tactics: it might have an okay understanding of which board states look good or bad, but no consistent ability to run any sort of minimax to find a good move, even one turn ahead.
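
To illustrate what even a one-turn lookahead would involve (this is my own sketch, assuming the python-chess library and a crude material count, not anything GPT actually does): try every legal move, score the resulting position by material, and keep the best.

import chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board, color):
    # Net material from `color`'s point of view.
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, color))
        score -= value * len(board.pieces(piece_type, not color))
    return score

def best_one_ply_move(board):
    # Greedy one-ply lookahead: play each legal move, keep the best material score.
    side = board.turn
    best_move, best_score = None, float("-inf")
    for move in board.legal_moves:
        board.push(move)
        score = material(board, side)
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move

board = chess.Board()
print(board.san(best_one_ply_move(board)))  # some quiet opening move

Even something this crude always produces a legal move, which is the part GPT seems unable to do consistently.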

I have a general principle of not contributing to harm. For instance, I do not eat meat, and I tend to disregard arguments about marginal impact. For animal-rights issues, it is important to have people who refuse to participate, regardless of whether my decades of abstinence have impacted the supply chain.

For this issue, however, I am less worried about the principle of it, because after all, a moral stance means nothing in a world where we lose. Reducing the probability of X-risk is a cold calculation, while vegetarianism is an Aristotelian one.

With that in mind, a boycott is one reason not to pay. The other is a simple calculation: is my extra $60 a quarter going to make any minuscule increase in X-risk? Could my $60 push the quarterly numbers just high enough that they round up to the next 10s place, and then some member of the team works slightly harder on capabilities because they are motivated by that number? If that risk is 0.00000001%, well, when you multiply by all the people who might ever exist... ya know?
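
Spelling out that back-of-the-envelope multiplication (the probability is the hypothetical 0.00000001% above; the future-population figure is purely an assumed placeholder, not a claim):

risk_increase = 1e-10       # 0.00000001%, the hypothetical risk increase above
future_people = 1e16        # assumed stand-in for "all the people who might ever exist"
expected_lives_lost = risk_increase * future_people
print(expected_lives_lost)  # 1e6 expected lives under these made-up numbers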

I agree that we are unlikely to pose any serious threat to an ASI. My disagreement with you comes when we ask why we don't pose any serious threat. We pose no threat, not because we are easy to control, but because we are easy to eliminate. Imagine you are sitting next to a small campfire, sparking profusely in a very dry forest. You have a firehose in your lap. Is the fire a threat? Not really. You can douse it at any time. Does that mean it couldn't in theory burn down the forest? No. After all, it is still fire. But you're not worried, because you control all the variables. An AI in this situation might very well decide to douse the fire instead of tending it.

To bring it back to your original metaphor: For a sloth to pose a threat to the US military at all, it would have to understand that the military exists, and what it would mean to 'defeat' the US military. The sloth does not have that baseline understanding. The sloth is not a campfire. It is a pile of wood. Humans have that understanding. Humans are a campfire.

Now maybe the ASI ascends to some ethereal realm in which humans couldn't harm it, even if given completely free rein for a million years. This would be like a campfire in a steel forest, where even if the flames leave the stone ring, they can spread no further. Maybe the ASI will construct a steel forest, or maybe not. We have no way of knowing.

An ASI could use 1% of its resources to manage the nuisance humans and 'tend the fire', or it could use 0.1% of its resources to manage the nuisance humans by 'dousing' them. Or it could incidentally replace all the trees with steel, and somehow value s'mores enough that it doesn't replace the campfire with a steel furnace. This is... not impossible? But I'm not counting on it.

Sorry for the ten thousand edits. I wanted the metaphor to be as strong as I could make it.

I understand that perspective, but I think it's a small cost to Sam to change the way he's framing his goals. It's a small nudge now, to build good habits for when specifying goals becomes not just important, but the most important thing in all of human history.
