Hgbanana123 — LessWrong

I think that

"Don't say things that you believe to be literally false in a context where people will (with reasonably high probability) persistently believe that you believe them to be true"

is actually in line with the "bayesian honesty" component/formulation of the proposal. If one is known to universally lie, one's words have no information content, and therefore don't increase other people's bayesian probabilities of falsy statements. However, it seems this is not a behaviour that Eliezer finds morally satisfactory. (I agree with Rob Bensinger that this formulation is more practical in daily life)

How it feels to have your mind hacked by an AI

Hgbanana1233y100

Scott Alexander has an interesting little short on human manipulation: https://slatestarcodex.com/2018/10/30/sort-by-controversial/
So far everything I'm seeing, both fiction and anecdotes, is consistent with the notion that humans are relatively easy to model and emotionally exploit. I also agree with CBiddulph's analysis, insofar as while the paperclip/stamp failure mode requires the AI to have planning, generation of manipulative text doesn't need to have a goal--if you generate text that is maximally controversial (or maximises some related metric) and disseminate the text, that by itself may already do damage.

Meta-Honesty: Firming Up Honesty Around Its Edge-Cases

Hgbanana1233y20

LESSWRONG
is fundraising!
LW

LESSWRONG
is fundraising!
LW

Posts

Wikitag Contributions

Comments

Posts

Wikitag Contributions

Comments