I was going to write something saying "no actually we have the word genocide to describe the destruction of a people," but walked away because I didn't think that'd be a productive argument for either of us. But after sleeping on it, I want to respond to your other point:
I don't think the orthogonality thesis is true in humans (i.e. I think smarter humans tend to be more value aligned with me); and sometimes making non-value-aligned agents smarter is good for you (I'd rather play iterated prisoner's dilemma with someone smart enough to play tit-for-tat than someone who can only choose between being CooperateBot or DefectBot).
My actual experience over the last decade is that some form of the above statement isn't true. As a large human model trained on decades of interaction, my immediate response to querying my own next experience predictor in situations around interacting with smarter humans is: no strong correlation with my values and will defect unless there's a very strong enforcement mechanism (especially in finance, business and management). (Presumably because in our society, most games aren't iterated--or if they are iterated are closer to the dictator game instead of the prisoner's dilemma--but I'm very uncertain about causes and am much more worried about previous observed outputs.)
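To make the iterated versus one-shot distinction concrete, here's a minimal sketch of the three strategies named above. The payoff numbers (5 > 3 > 1 > 0) and the `play` helper are my own illustrative assumptions, not anything from the quoted comment; only the strategy names come from it.

```python
from typing import Callable, Dict, List, Tuple

# Payoff to "me" for (my_move, their_move); "C" = cooperate, "D" = defect.
# Illustrative values only, chosen to satisfy the usual ordering 5 > 3 > 1 > 0.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

# A strategy maps the opponent's past moves to this round's move.
Strategy = Callable[[List[str]], str]


def cooperate_bot(opponent_history: List[str]) -> str:
    return "C"


def defect_bot(opponent_history: List[str]) -> str:
    return "D"


def tit_for_tat(opponent_history: List[str]) -> str:
    # Cooperate first, then copy whatever the opponent did last round.
    return opponent_history[-1] if opponent_history else "C"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Return the total scores (a_score, b_score) after `rounds` rounds."""
    a_seen: List[str] = []  # moves that a has seen b make
    b_seen: List[str] = []  # moves that b has seen a make
    a_score = b_score = 0
    for _ in range(rounds):
        move_a, move_b = a(a_seen), b(b_seen)
        a_score += PAYOFF[(move_a, move_b)]
        b_score += PAYOFF[(move_b, move_a)]
        a_seen.append(move_b)
        b_seen.append(move_a)
    return a_score, b_score


if __name__ == "__main__":
    for rounds in (1, 10):
        print(rounds, "round(s):")
        print("  defect_bot  vs tit_for_tat:", play(defect_bot, tit_for_tat, rounds))
        print("  tit_for_tat vs tit_for_tat:", play(tit_for_tat, tit_for_tat, rounds))
```

Under these assumed payoffs, defecting against tit-for-tat beats cooperating in a single round, but loses badly over ten. That only says retaliation makes defection unprofitable when the game is actually repeated; it says nothing about which kind of game people are really in.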
I suspect that this isn't going to be convincing to you because I'm giving you the output of a fuzzy statistical model instead of giving you a logical, verbalized, step-by-step argument. But the deeper crux is that I believe "The Rationalists" heavily over-weigh the second (verbalized arguments) and under-weigh the first (the outputs of fuzzy statistical models), when the first is a much more reliable source of information: it was generated by entanglement with reality in a way that mere arguments aren't.
And I suspect that's a large part of the reason why we--and I include myself with the Rationalists at that point in time--were blindsided by deep learning and connectionism winning: we expected intelligence to require some sort of symbolic reasoning and focusing on explicit utility functions and formal decision theory and maximizing things...and none of that seems even relevant to the actual intelligences we've made, which are doing fuzzy statistical learning on their training sets, arguably, just the way we are.
This is kind of the point where I despair about LessWrong and the rationalist community.
While I agree that he did not call for nuclear first strikes on AI centers, he said:
If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
and
Make it explicit in international diplomacy that preventing AI extinction scenarios is considered a priority above preventing a full nuclear exchange, and that allied nuclear countries are willing to run some risk of nuclear exchange if that’s what it takes to reduce the risk of large AI training runs.
Asking us to be OK with provoking a nuclear second strike by attacking a nation that is building a GPU cluster, and that is not even a signatory to an international agreement banning GPU clusters, is still bad, and whether the nukes fly as part of the first strike or the retaliatory second strike seems like a weird thing to get hung up on. Picking this nit feels like a deflection because what Eliezer said in the TIME article is still entirely deranged and outside international norms.
And emotionally, I feel really, really uncomfortable. Like, sort of dread in stomach uncomfortable.
Yeah, see, my equivalent of making ominous noises about the Second Amendment is to hint vaguely that there are all these geneticists around, and gene sequencing is pretty cheap now, and there's this thing called CRISPR, and they can probably figure out how to make a flu virus that cures Borderer culture by excising whatever genes are correlated with that and adding genes correlated with greater intelligence. Not that I'm saying anyone should try something like that if a certain person became US President. Just saying, you know, somebody might think of it.
Reading it again almost 7 years later, it's just so fractally bad. There are people out there with guns, while the proposed technology to CRISPR a flu that changes people's genes is science fiction, so the top frame is nonsense. The actual viral payload, if such a thing could exist, would be genocide of a people (no, you do not need to kill people for it to be genocide; this is still a central example). The idea wouldn't work for so many reasons: a) peoples are a genetic distribution cluster instead of a set of Gene A, Gene B, Gene C; b) we don't know all of these genes; c) in other contexts, Yudkowsky's big idea is the orthogonality thesis, so focusing on making his outgroup smarter is sort of weird; d) actually, the minimum message length of this virus would be unwieldy even if we knew all of the genes to target, to the point where I don't know whether this would be feasible even if we had viruses that could do small gene edits; and of course, e) this is all a cheap shot where he's calling for genocide over partisan politics, about which we can now clearly say: the Trump presidency was not a thing to call for a genocide of his voters over.
(In retrospect (and with the knowledge that these sorts of statements are always narrativizing a more complex past), this post was roughly the inflection point where I gradually started moving from "Yudkowsky is a genius who is one of the few people thinking about the world's biggest problems" to "lol, what's Big Yud catastrophizing about today?" First seeing that he was wrong about some things meant that it was easier to think critically about other things he said, and here we are today, but that's dragging the conversation in a very different direction than your OP.)
Over the years roughly between 2015 and 2020 (though I might be off by a year or two), it seemed to me like numerous AI safety advocates were incredibly rude to LeCun, both online and in private communications.
I think this generalizes to more than LeCun. Screencaps of Yudkowsky's Genocide the Borderers Facebook post still circulated around right-wing social media in response to mentions of him for years, which makes forming any large coalition rather difficult. Would you trust someone who posted that with power over your future if you were a Borderer or had values similar to theirs?
(Or at least it was the go-to post until Yudkowsky posted that infanticide up to 18 months wasn't bad in response to a Caplan poll. Now that's the post used to dismiss anything Yudkowsky says.)
Redwood Research used to have a project about trying to prevent a model from outputting text where a human got hurt, which, IIRC, they did primarily through fine-tuning and adversarial training. (Followup). It would be interesting to see if one could achieve better results than they did at the time by subtracting some sort of hurt/violence vector.
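As a rough illustration of what "subtracting a vector" could mean, here is a toy sketch. Everything in it is an assumption on my part: the mean-difference construction of the direction, the function names, and the two-dimensional lists standing in for a real model's hidden activations. It is not Redwood's method, and a real experiment would need to hook into an actual model's internals.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def hurt_direction(hurt_acts: List[Vector], neutral_acts: List[Vector]) -> Vector:
    """Assumed construction: mean activation on 'hurt' text minus mean on neutral text."""
    hurt_mean = mean_vector(hurt_acts)
    neutral_mean = mean_vector(neutral_acts)
    return [h - n for h, n in zip(hurt_mean, neutral_mean)]


def subtract_direction(activation: Vector, direction: Vector, scale: float = 1.0) -> Vector:
    """Shift an activation away from the direction by `scale` times its length."""
    return [a - scale * d for a, d in zip(activation, direction)]


if __name__ == "__main__":
    # Hypothetical toy activations; real ones would come from a model's hidden states.
    hurt = [[2.0, 0.0], [4.0, 0.0]]
    neutral = [[1.0, 0.0], [1.0, 0.0]]
    direction = hurt_direction(hurt, neutral)
    print("direction:", direction)
    print("steered:  ", subtract_direction([3.0, 1.0], direction, scale=0.5))
```

Whether anything like this would beat fine-tuning plus adversarial training is exactly the open question; the sketch only pins down the arithmetic.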
Firstly, it suggests that open-source models are improving rapidly because people are able to iterate on top of each other's improvements and try out a much larger number of experiments than a small team at a single company possibly could.
Widely, does this come as a surprise? I recall the GPT-2 days, when the 4chan and Twitter users of AI Dungeon discovered various prompting techniques we use today. More access means more people trying more things, and this should already be our base case because of how open participation in open source has advanced and improved OSS projects.
I'm worried that up until now, this community has been too focused on the threat of big companies pushing capabilities ahead and not focused enough on the threat posed by open-source AI. I would love to see more discussions of regulations in order to mitigate this risk. I suspect it would be possible to significantly hamper these projects by making the developers of these projects potentially liable for any resulting misuse.
I have no idea how you think this would work.
First, any attempt at weakening liability waivers will cause immediate opposition by the entire software industry. (I don't even know under what legal theory of liability this would operate.) Remember, under American law, code is free speech. So...second, in the case you're somehow (somehow!) able to pass something (while there's a politicized and deadlocked legislature) where a coalition that includes the entire tech industry is lobbying against it and there isn't an immediate prior restraint to speech challenge...what do you think you're going to do? Go after the mostly anonymous model trainers? A lot of these people are random Joe Schmoes with no assets. Some of the SD model trainers who aren't anonymous already have shell corporations set up, both to shield their real identities and to preemptively tank liability in case of artist nuisance lawsuits.
I have a very strong bias about the actors involved, so instead I'll say:
Perhaps LessWrong 2.0 was a mistake and the site should have been left to go read-only.
My recollection was that the hope was to get a diverse diaspora to post in one spot again. Instead of people posting on their own blogs and tumblrs, the intention was to shove everyone back into one room. But with a diverse diaspora, you can have local norms to a cluster of people. But now when everyone is trying to be crammed into one site, there is an incentive to fight over global norms and attempt to enforce them on others.
This response is enraging.
Here is someone who has attempted to grapple with the intellectual content of your ideas and your response is "This is kinda long."? I shouldn't be that surprised because, IIRC, you said something similar in response to Zack Davis' essays on the Map and Territory distinction, but that's ancillary and AI is core to your memeplex.
I have heard repeated claims that people don't engage with the alignment community's ideas (recent example from yesterday). But here is someone who did the work. Please explain why your response here does not cause people to believe there's no reason to engage with your ideas because you will brush them off. Yes, nutpicking e/accs on Twitter is much easier and probably more hedonic, but they're not convincible and Quinton here is.
Meta-note related to the question: asking this question here, now, means your answers will be filtered for people who stuck around with capital-R Rationality and the current LessWrong denizens, not the historical ones who have left the community. But I think that most of the interesting answers you'd get are from people who aren't here at all or rarely engage with the site due to the cultural changes over the last decade.
Is this actually wrong? It seems to be a more math-flavored restatement of Girardian mimesis, and of how mimesis minimizes distinction, which causes rivalry and conflict.