I suspect describing AI as having "values" feels more alien than "goals," but I don't have an easy way to figure this out.
whynotboth.jpeg
Here's my current four-point argument for AI risk/danger from misaligned AIs.
Request for feedback: I'm curious whether there are points that people think I'm critically missing, and/or ways that these arguments would not be convincing to "normal people." I'm trying to write the argument so that it lays out the simplest possible case.
Yeah I believe this too. Possibly one of the relatively few examples of the midwit meme being true in real life.
What are people's favorite arguments/articles/essays trying to lay out the simplest possible case for AI risk/danger?
Every single argument for AI danger/risk/safety I’ve seen seems to overcomplicate things. Either it has too many extraneous details, appeals to overly complex analogies, or spends much of its time responding to insider debates.
I might want to try my hand at writing the simplest possible argument that is still rigorous and clear, without being trapped by common pitfalls. To do that, I want to quickly survey the field so I can learn from the best existing work and avoid its mistakes.
Ashli Babbitt, an unarmed <protester or rioter, depending on your party affiliation>
I appreciate your attempt to be charitable, but I don't think the left-wing/liberal concerns with Jan 6 are appropriately summarized as "riot."
Alas, no; my model is rather limited.
I would not consider Dean Ball a trustworthy actor on the object level, and I certainly would not take any of his statements at face value! I think it's much better to model him as a combination of a political actor saying whatever words cause his political aims to be achieved, plus someone willing to pursue random vendettas.
I'd recommend trying to talk to people 1:1, especially about topics that are more in their wheelhouses than in yours. At least I've found my average conversation with Uber drivers to be more interesting and insightful than reading my phone.
My guess is that I do this more than you do, but one thing I find unpleasant about interacting with large groups of people I don't know well is that I wind up doing a bunch of semi-conscious theory-of-mind modeling, emotional regulation-type management of different levels of a conversation, etc. [1], so it's harder for me to focus on the object level. [2] I think this is much less of a problem in 1:1 conversations, where maintaining the multilevel tracking feels quite natural.
It's unclear to me if I do this more or less than "normies." The case for "less" is that I don't think I've spent a lot of my skillpoints on people modeling compared to other things. The case for "more" is that often people I interact with have almost laughably simplistic or non-existent models of other people.
I would not be surprised if I specifically happen to be in a midwit part of the curve, alas.
I think this sort of assumes that terminal-ish goals are developed earlier, and are thus more stable, while instrumental-ish goals are developed later and are more subject to change.
I think this may or may not be true on the individual level, but it's probably false on the ecological level.
Competitive pressures shape many instrumental-ish goals to be convergent, whereas terminal-ish goals have more free parameters.