I wonder: "How does it happen that people trust someone or something at all?" (in the context of: how can we ever get to trust an AI smarter than us?).

Epistemic status: this is just a dump of my thoughts (inspired by Babble & Prune Thoughts). Feel free to down-vote if you don't agree or don't like it. But, if that's not too much to ask, please leave a comment explaining what you are objecting to, as my goal in sharing these thoughts is to refine them.

I notice that:
(this list is numbered only so it is easier to disagree with me by referring to a specific number - it's not meant to be some finite set of axioms or some ordering):

  1. people shake hands when meeting to show friendly intentions. They allow themselves to be touched. They show that their hands do not hold a weapon. They allow themselves to be held by the hand. They allow the other party to measure tension, strength, sweat/stress. They find it strange if somebody refuses to extend a hand in response. They find it strange if somebody holds on too long or too strongly. They look into the eyes, too (or rather at the full face, to read as well as they can whether the other person is smiling, grinning, etc.)
  2. dogs even expose their vulnerable underbelly or the arteries of their neck
  3. people bow to show obedience, voluntarily disabling their vision and ability to attack for a moment. OTOH dueling martial artists might look into each other's eyes when bowing
  4. would-be knights have to kneel in front of the king, who puts a sword right next to their neck
  5. a boss might delegate tasks of increasing complexity and importance to a subordinate, pausing at some level if the performance is below expectations
  6. people may build friendships or even fall in love by exchanging increasingly personal details about themselves (which seems to mean: exposing vulnerabilities?) - see 36 questions to fall in love
  7. what does it take to convince me to share my PIN with my spouse?
  8. what does it take to convince me to share my car keys with my kid?
  9. what does it take to convince me to give power to my government?
  10. the whole movie Hero seems to be (SPOILER ALERT: SKIP TO THE NEXT BULLET IF YOU PLAN TO WATCH THE MOVIE) built around the idea that to gain the trust of the emperor the warrior has to do so in increments, performing acts of loyalty and sacrifice. The warrior tries to game this system by pretending obedience in training while secretly plotting a treacherous turn once he gains access to the emperor. But eventually it turns out the rules are structured so that by trying to follow them he gets aligned, and eventually no longer sees the world the way he did initially. A dynamic similar to "fake it till you make it" lets the mask possess the actor. Yes, this is all fictional evidence from a movie, but it seems to describe an interesting idea: build the rules in such a way that even gaming them is actually still playing the game. Or maybe: the rules teach you something about the metagame which created the game, so that you are now more like its creator?
  11. an investor might first invest a small amount to see what they get in return before making a larger investment
  12. "the right to be sued" (I think I first heard about the concept in a Robin Hanson post, but I can't find it now) - it is actually very important that people can sue you, because that is what makes you able to make any deals with them. If they fear they will not be able to sue you when you violate the deal, then the deal is not really a deal and they will not want to make it with you
  13. similarly, having a recognizable face, an ID, a non-anonymous user handle, etc. might be an important prerequisite for being trustworthy
  14. mutually assured destruction seems to be at least sometimes a helpful idea. Isn't it perhaps the only idea really? Isn't the "I can trust you because I can hurt you" at the bottom of every deal ever?
  15. also it looks like the pattern is often some form of alternately raising the stakes, so that at any point you risk losing all prior investments plus the new epsilon. The new epsilon in this turn is not a big deal, but losing the prior investment is a huge deterrent. Would it seem strange if I requested to raise the stakes two-fold, instead of just a little? I think in many social contexts: yes, that would feel awkward, desperate or suspicious. Would it seem strange if I refused to match the bet? Yeah, it feels cold
  16. people do exchange small acts of devotion (time, attention, flowers, greetings), and if that stops they read much into it (as it kinda looks like somebody is optimizing out the part of their value function which included me?)
  17. some deals are made over vodka - as drinking alcohol disables some restraints/self-censorship and long-term planning, revealing more about currently sought goals and valuations? (Attaching a debugger? Running under monitoring?) 
  18. people become friends, or at least recognize who the real friend is, in times of need and in shared adventures which required sacrifices for each other
  19. players would be wise to cooperate in an iterated prisoner's dilemma, but might find it in their interest to defect in a one-shot game against a different (i.e. no shared source code) entity
  20. when planning an ideal society I like to adopt a framework of thinking that I'd have to live in it as a randomly chosen citizen and interact with others for a long time according to the rules
  21. would it be possible to somehow make the AI uncertain about which player in the game it will end up being? At a hand-waving level: "prompt it" with such a story setup that it might be genuinely afraid that the actions it takes on others may actually affect "it" (or "its clones"). Something like "you are currently inside a paused simulation and got access to modify it, after which it will be unpaused", but better?
  22. would it be possible to let the AI somehow expose its vulnerabilities to us? How can something which can back up and restore itself be vulnerable? Is there any equivalent of a dog holding its teeth around another dog's neck?
  23. and what do we even mean by "it"? How can AI "have an ID/face"? How can it be a "reputation holder"?
  24. is there anything it may need from us? And if in general the answer is no, can we devise an incremental process of building it such that along the way it gets so intertwined with us that it will have to see/value the world the way we do, including valuing us? How can we break the symmetry here, so that it is the AI - not we - who gets aligned in the process?
  25. do I even trust the society, government or corporation (which I think of as more powerful than me)? I think that "as a revealed preference" the answer is "yes", but it feels more like "actually, do I even have an option to not 'trust' it? I just hope it will turn out OK the next day, and so far it has", which seems like a different meaning of the same word "trust". But yeah, I fantasize that if things go really wrong in my democratic country I could go to the streets, and then perhaps at least some of the policemen/soldiers might disobey bad orders. This kinda sounds like: I'd like the AI to be composed of small parts which can be persuaded/corrigible/allied-with independently?
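
To make point 19 a bit more concrete, here is a minimal toy sketch of a one-shot versus an iterated prisoner's dilemma. Everything in it is my assumption for illustration: the payoff numbers (5/3/1/0 are just the conventional textbook values), the two strategies `tit_for_tat` and `always_defect`, and the helper `play`. It is not a model of any real agent.

```python
from typing import Callable, List, Tuple

# Conventional illustrative payoffs (an assumption, not taken from the post):
# (my_move, their_move) -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# A strategy only sees the opponent's past moves and returns "C" or "D".
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy whatever the opponent did last round."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history: List[str]) -> str:
    """Defect no matter what happened before."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play `rounds` rounds and return the total payoffs (score_a, score_b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_b)  # each player reacts to the other's history
        move_b = b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

Under these assumed payoffs, `play(always_defect, tit_for_tat, 1)` gives the defector 5 against 0, so defecting wins the one-shot game. Over 10 rounds the defector collects only 14 in total, while two `tit_for_tat` players collect 30 each, which is the sense in which the repeated interaction makes cooperation the wiser choice.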

Hm, so far it looks like I, for one, would be unable to trust an AI more powerful than me unless I saw that it has a stake to lose if I am not happy with our interactions. And if the two parties are almost equal, the path from here to there would usually go through alternating rounds in which each of us raises the stakes. It doesn't have to be alternating and symmetrical if there is an asymmetry of power: it may then require sacrifices just from the weaker side, but in the scenario considered I'd play the role of the weaker party. I see no way I could trust something more powerful than me - except if it is a sum of smaller parts which can somehow be corrigible, because they are weaker and have their own interests, and which coordinate with each other for reasons similar to those for which they might coordinate with me if they found this to better advance their goals.
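
The "alternating rounds of raising the stakes" idea can be written down as a very crude toy model. Assume (and these are purely my assumptions for illustration) that each round a party adds a fixed `epsilon` to what it has staked, that betraying yields a one-time fixed `temptation`, and that the betrayer forfeits everything it has staked so far. The names `betrayal_net_gain` and `first_safe_round` are hypothetical.

```python
def betrayal_net_gain(round_index: int, epsilon: int, temptation: int) -> int:
    """Net gain from betraying at a given 0-based round in the toy model.

    The betrayer pockets `temptation` once, but forfeits everything staked
    so far, including this round's `epsilon`.
    """
    staked_so_far = epsilon * (round_index + 1)
    return temptation - staked_so_far


def first_safe_round(epsilon: int, temptation: int) -> int:
    """First 0-based round at which betrayal no longer pays (net gain <= 0)."""
    if epsilon <= 0:
        # With no stake added per round the deterrent never grows.
        raise ValueError("epsilon must be positive")
    round_index = 0
    while betrayal_net_gain(round_index, epsilon, temptation) > 0:
        round_index += 1
    return round_index
```

With `epsilon=1` and `temptation=5`, betrayal still nets 4 in round 0, but from round 4 onward it no longer pays. The single new epsilon is small in every round; it is the accumulated prior investment that does the deterring. A real relationship obviously has neither a fixed temptation nor a perfectly forfeitable stake, so this only illustrates the shape of the argument.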


1 comment:

Does the literature on the economics of reputation have ideas that are helpful?
