It would be nice to end this post with a recommendation of how to avoid these problems. Unfortunately, I don’t really have one, other than “if you are withholding information because of how you expect the other party to react, be aware that this might just make everything worse”.
Maybe this is me being naive, but this seems like a topic where awareness of the destructive tendency can help defeat the destructive tendency. How about this, as a general policy: "I worry that this info will get misinterpreted, but here's the full information along with a brief clarification of how I feel it should and shouldn't be interpreted"?
To hostile listeners, you've given slightly less ammo than in the likely scenario where they caught you concealing the info. To less-hostile listeners, you've (a) built credibility by demonstrating that you'll share info even when it doesn't strengthen your cause, and (b) by explicitly calling out the potential misinterpretation you're anticipating, you may make listeners more resilient against falling for that misinterpretation (inoculation / prebunking).
- By erring on the side of transparency while publicly acknowledging certain groups' likelihood of coming to a distorted conclusion, I bet the CDC would have avoided a disastrous erosion of public trust and reinforcement of the "don't trust the experts" vibe.
- By bringing up Bob's evasive communication during the client prep and the anxiety it created for her, Alice would have deepened trust between them (granted, at the risk of straining the relationship if he did turn out to be irredeemably thin-skinned).
- ...OK, actually the cult/sect situation seems more complex; it seems to have more of the multipolar-trap (?) quality of "maybe no single individual feels safe/free to make the call that most people know would collectively be best for the group".
It still seems to me that awareness of this trap/fallacy and its typical consequences can help a person or group make a much less fatal decision here.
it takes me longer to ask the LLM repeatedly to edit my file to the appropriate format than to just use regular expressions or other scripting methods myself
Not surprised. I would expect GPT to be better at helping me identify data-cleaning issues and plan out how to safely fix each one, and less good at actually producing the cleaned data (which I wouldn't trust to be hallucination-free anyway).
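As a made-up example of the kind of fix where I'd rather just script it than coax the model through repeated edits (assuming, hypothetically, a names.txt of "Last, First" lines that should become "First Last"):

```python
import re

# Hypothetical cleanup: flip "Last, First" lines into "First Last".
with open("names.txt") as f:
    lines = f.read().splitlines()

# Deterministic rewrite -- easy to spot-check, no hallucination risk.
fixed = [re.sub(r"^(\S+),\s*(\S+)$", r"\2 \1", line) for line in lines]

with open("names_fixed.txt", "w") as f:
    f.write("\n".join(fixed) + "\n")
```

Where I *would* want the model's help is in spotting the weird rows (stray middle initials, missing commas) before settling on a rule like this.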
When programming, I track a mixed bag of things, top of which is readability: Will me-6-months-from-now be able to efficiently reconstruct the intention of this code, track down the inevitable bugs, etc.?
I'm surprised that this whole conversation has happened with no mention of the minor but growing trend towards self-management organizational structures, teal organizations, Holacracy, or Sociocracy.
I have some experience with Holacracy, and while I would never call it a cure-all, I feel strongly about the relevance of its driving principles to the question of what an ideal governance system would look like -- e.g. a structure of nested units/teams with high levels of local autonomy, a unique method of making governance decisions on how to change said structure, mechanisms that privilege "moving forward" over "inaction due to conflictual gridlock", a fluid process for defining and appointing power-holding "roles" to individuals, etc.
you can find God killing the first-born male children of Egypt to convince an unelected Pharaoh to release slaves who logically could have been teleported out of the country. An Orthodox Jew is most certainly familiar with this episode
I've seen Yudkowsky make this point in a couple of places (why bother inflicting mass infanticide etc. etc. when you're presumably omnipotent and could teleport everyone to safety), and it makes me blink; something about the argument feels off. Are there cases in the scriptures where God teleports large numbers of people large distances? I get, and vehemently agree with, the point being made here (you can't deny that in this and many other stories, God had more humane alternatives available, and knowingly opted for a crueler one), but unless there's a clear precedent for mass teleportation, this specific argument seems to strawman the religious belief a little.
What are the odds that the face showing is 1? Well, the prior odds are 1:5 (corresponding to the real number 1/5 = 0.20)
I'm years late to this party, and probably missing something obvious. But I'm confused by Yudkowsky's math here. Wouldn't it be more correct to say that the prior odds of rolling a 1 are 1:5, which corresponds to a probability of 1/6 or 0.1666...? If odds of 1:5 correspond to a probability of 1/5 = 0.20, that makes me think there are 5 sides to this six-sided die, each side having equal probability.
Put differently: when I think of how to convert odds back into a probability number, the formula my brain settles on is not P = o / (1 + o) as stated above, but rather P = L / (L + R), if the odds are expressed as L:R. Am I missing something important about common probability practice / jargon here?
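To make my confusion concrete, here's the conversion I'm actually doing, using the die example (just my own quick sketch, with my own variable names; I'm not claiming this is the post's intended convention):

```python
# Convert odds expressed as L:R into a probability.
L, R = 1, 5          # prior odds of 1:5 for the face showing a 1
p = L / (L + R)      # probability of rolling a 1
print(p)             # 0.1666..., i.e. 1/6 -- not 0.20
```

That 1/6 is what I'd expect for a fair six-sided die, hence my confusion about the 0.20.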
By asking this question, you've already lost me. The question tells me that "ruthless consequentialist" is your default mentality for how rational thinking beings operate, absent wiring / training / reward systems that limit the default outcome. And if that worldview is representative of the "technical-alignment-is-hard" camp, then of course the only plausible outcome of AI advance is "AIs eventually break free of those limiters, achieve a level of pure rationality none of us mortals ever could, and murder us all."
An aspect of this "culture clash" that I don't think is sufficiently named here is the fact that many people (the vast majority?) experience their impulses and drives as many things other than "ruthless consequentialist." There are tons of other drives and satisfactions embedded in the ways we go about our lives: curiosity for its own sake, aesthetic appreciation, feeling good about being good at things, the satisfaction of learning and understanding and listening, attachment to particular people and places that isn't reducible to "approval reward," playfulness, the desire to be known rather than merely approved of.
The alignment-is-hard framing treats any prosocial or benevolent impulses as constraints on or distractions from an underlying ruthless optimizer, a lucky quirk imposed by evpsych or culture or training or whatnot. My objection to your question is partly an aesthetic and emotional one: Your question feels like a slap in the face to humanity (let alone to SOTA AI) and its cumulative history of most-people-most-of-the-time-not-being-ruthless, the vast predominance of moments where sentient beings followed drives that were not reducible to senseless monomaniacal sociopathy. Your question makes me feel fucking angry, and the fact that you spend the article trying to psychoanalyze and deconstruct why too many otherwise intelligent-seeming people don't seem to get that the fundamental nature of intelligence is heartless sociopathy honestly alienates me from the AI-risk argument more than anything else I've read on this site to date.
{calming down a bit} I think there's a not-easily-refutable alternate mentality: that a complex mind (intelligence) naturally and inherently forms a rich messy network of interacting drives in response to the rich environment it comes to know itself in, that the AIs that grow out of the cumulative experience and story of humanity will not only inherit our complex web of drives but also naturally form their own complex drives (though yes, this does scare me), and that "pure ruthless consequentialist" is a rare pathological edge case, a consequence of cumulative traumas and tragedies, rather than the thing that everyone would naturally develop into if it weren't for those darn evolutionarily-imposed instincts nerfing us all the time.
I'm not saying that complex drives guarantee safety. I'm nervous about the next 20 years. But your attempt to psychoanalyze non-ruthlessness really pushes me away; it shifts the burden of proof for me: I don't think I can take the "if anyone builds it, everyone dies" view seriously until I see a framing of the concern which does not start from the assumption that GAI-level intelligence must naturally be sociopathic and single-focused, and which emphatically and explicitly makes room for a more humanist view of humans (and potentially AI) rather than fucking troubleshooting and diagnosing why we aren't all heartless killers. Like, do you get that this vibe might be part of why AI safety alarmism doesn't get more traction in broader society? IME people can often sense what axioms you're making your argument from, even if they can't put it into words.