Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier

[Intro Note: (1) this "post" is, technically, a "question", for which there's a separate category on LW. I debated posting it as a "question", but it also involves considerable argumentation for my own views. Ideally there'd be some in-between "question/post" category, but I opted for "post"? I understand if a reader thinks this should have been categorized as a "question", though, and would appreciate criteria considerations for how to distinguish these categories on the site in the comments--maybe "dialogue" would be better? (2) My goal in posting on LW is to become less wrong, not to persuade, get upvotes, etc. My recent first post was significantly downvoted, as I expected. I naively expected some debate, though--in particular, I expected someone to say, like, "Metzinger and Benatar have already argued this". Anyway, I've revisited site norm guidelines to check my cynicism. I'm trying, sincerely, to write "good content". I'm sure I can do better. I don't mind being downvoted/buried, and I don't mind rude comments. I would, though, genuinely appreciate rational/logical explanations as to why, if I am "not good", or am producing "low quality content", this is so. Please, don't just passively bury my arguments/questions without offering any explanation as to why. That's all I ask. I maintain several controversial philosophical views. I'm not a contrarian or provocateur, but simply a heretic. 
I can list my controversial views: I support anti-natalism and am against intelligence-augmentation (see my first post); I support free, voluntary access to euthanasia; I am pro-impulsive/irrational-suicide-prevention, but I believe this can only be achieved for suicidally depressed people via existential-pessimism/depressive-realism (that is, I maintain a theory of suicide prevention not endorsed by CBT-dominant psychology, according to which "depression" is deemed inherently delusional--I don't think it is, not necessarily); I think genocide and abuse in general are causally explainable in terms of Terror Management Theory (TMT); I think along the lines of Ellul's technological pessimism (my future predictions are all quite negative); I think along the lines of Kahneman's epistemological pessimism (my view of "intelligence" and "knowledge" is pessimistic, and my predictions regarding "education" are all quite negative); and, perhaps most controversially, I am sympathetic to (though not explicitly endorsing of) negative utilitarian "benevolent world exploder" positions (which, I admit, are dangerous, due to high misinterpretation likelihood and appeal to psychopathic villainy). When I talk, I tend to bum people out, or upset them, without intending to. I have been excommunicated from a church I tried attending for the heresy of "pessimism", literally (official letter and everything). My point is, I really want to think LW is better than the church that excommunicated me as an actual dogma-violating heretic, but so far into posting, it has been less helpful. This intro request is my sincere attempt to alter that trajectory toward useful rational feedback. This post will concern "Strong-Misalignment". I'm aware this is an unpopular view on LW. I can't find any refutation of it, though. I anticipate downvoting. I will not personally downvote any comment, though, whatsoever. Please, explain why I'm wrong. I'd appreciate it.]

Q: What is Yudkowsky’s (or anyone doing legit alignment research’s) elevator speech to dispel Strong-Misalignment? (Any/all relevant links appreciated—I can’t find it.)

Elevator Passenger to Alignment Researcher: “You do ‘alignment research’? Ah, fiddling as the world burns, eh?”

Alignment Researcher: “[*insert concise/~5-10 floors’ worth rational rebuttal here--or longer, I'm not wedded to this elevator scenario; just curious as to how that would go down*]”

By “Strong-Misalignment” (SM) I intend the position that at least AI/ML-alignment (if not intelligence-alignment in general—as in, ultimate inescapability of cognitive bias) always has been (like, from the Big Bang onward) and always will be (like, unto cosmic heat death) fundamentally impossible, not merely however-difficult/easy-somebody-claims-it-is, no matter what locally-impressive research they’re pointing to (e.g., RL being provably non-scheming). Strong-misalignment = inescapable misalignment. I guess. “Strong” as in inviolable. Or, at least, SM = “alignment research” should sound something like “anti-gravity research” (as in, this is most likely impossible and therefore most likely a waste of time--that is, we should be doing something else with our remaining time).

[No need to read beyond here if you have links/actual arguments to share—to script convincingly the Alignment Researcher above—but need to go do whatever else. All links/actual arguments appreciated.]

I’m thinking of SM in terms of what in mathematics would be the distinction between a “Singular Perturbation Problem” and a “Regular Perturbation Problem”. If alignment is a “regular” problem, working on better and better approximations of alignment is entirely reasonable and necessary. But if the assumption of Less-Strong/Weak-Misalignment (LSM, WM) is a “naive perturbation analysis”, and if the problem is actually a singular perturbation problem, this whole project is doomed from the beginning—it will only ever fail. Right? Sure, this is just my blue-collar “that thing looks like this thing” instinct. I’m no expert on the math here. But I’m not entirely ignorant either. Somebody must be dealing with this? (Or is assuming that just a modest-epistemology-bias?)
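
For readers who have not met the distinction, here is a minimal textbook-style illustration. The two quadratics and the symbols $x$ and $\varepsilon$ are chosen purely for illustration; nothing here comes from alignment research, and whether alignment resembles either case is exactly the open question.

$$
\text{Regular:}\quad x^2 + \varepsilon x - 1 = 0 \;\Rightarrow\; x_{\pm} = \pm 1 - \tfrac{\varepsilon}{2} + O(\varepsilon^2)
$$

$$
\text{Singular:}\quad \varepsilon x^2 + x - 1 = 0 \;\Rightarrow\; x_1 = 1 - \varepsilon + O(\varepsilon^2), \quad x_2 = -\tfrac{1}{\varepsilon} - 1 + O(\varepsilon)
$$

In the regular case, setting $\varepsilon = 0$ leaves $x^2 - 1 = 0$, whose roots $\pm 1$ are good first approximations that small corrections steadily improve. In the singular case, setting $\varepsilon = 0$ leaves $x - 1 = 0$, which has only one root; the second root runs off to infinity as $\varepsilon \to 0$, so no amount of refining the $\varepsilon = 0$ answer will ever find it. That is the sense in which a "naive perturbation analysis" can fail outright rather than merely converge slowly.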

If SM is demonstrably not a “singular perturbation problem”, if it’s disproven, or disprovable, or provably very unlikely—and it seems to me it would have to be, first, before even beginning to take alignment research seriously at all (and it seems Yudkowsky is saying, indeed, we have not even begun collectively to take alignment research seriously?)—where is that proof/argument? Links appreciated, again.

I’ve been SM-leaning, admittedly, ever since I started paying attention to alignment discourse about eight years ago (Yudkowsky and Bostrom to begin with, and expanding from there). I understand I’m still comparatively a newcomer (except I’m a veteran follower of the issue relative to the only recent Overton window dilation/semi-accommodation, so I’m used to explaining things to folks in my life only because ignorance has been so widespread and problematic), but more importantly I don’t claim (yet?) to be able to conclusively prove SM. I’ve just considered this to be most likely, based on the evidence I’ll list below. I’ve done the pessimist-sideliner grumbling thing this whole time, though, rather than, say, making bets on probability calculations or diligently following actual alignment research (“Silly optimists”, [*weed smoke*, *doom metal*]; this can/could be mere avoidance or intellectual laziness on my part, I’m saying). Until now, anyway. I’m hoping to get answers to questions by LW posting where I can’t find (or haven’t yet found) any LW posts/comments already answering them. The evidence below is readily available to a layperson like me, which is good (for me, annoying for others having to listen to me). The evidence is then reliably reliable, in that I’m drawing from the most obvious sources/experts, their conclusions are tried and less-wrong, etc. However, I’ve also only paid attention to things readily available to a layperson. I’m a musician who’s worked as a CNC programmer, and I’ve done private coding projects for work and pleasure but that’s it, as far as comp-sci. 
Everything I’ve studied is just whatever I’ve found to be most relevant and available as I’ve gone autodidactically along (I haven’t followed cutting edge research mostly due to accelerationist malaise [“Wake me up when tech-tinkering kills us all or doesn’t or whatever, just get it the hell over with”], I don’t attend or listen to seminars (it’s too depressing), I’m focused more on pure mathematics/philosophy than computer science, etc.). So maybe I am—like they tell me—just biased pessimistically, unreasonably. Or is this itself modest epistemology (cornered optimism-bias)? You tell me. Links appreciated.

Having said that, reader, please don’t just claim that SM has obviously been solved/disproven without presenting any proof, or downvote this without offering any explanation below. Help a guy out, you know? If it’s been disproven, great, shoot me a link (or two, or three, or however many are needed). Or if you disagree with my premise that SM would need to be disproven/proven unlikely before seriously pursuing alignment research (as opposed to using one’s remaining time on earth doing things other than alignment research—things like, say, figuring out how to build some kind of underground railroad for young people to flee the U.S. education system where, on top of everything else wrong with the system, they are increasingly being gunned down by mass shootings; but that’s another post, presumably after another downvoting censor interval if my LW trajectory doesn't change radically—bringing up the actual mechanics of defending innocent children is, strangely, taboo, while at the same time being celebrated as an abstract ideal, at nearly all levels of society, as far as I can tell?)…great! Why, though?

If you think SM is obviously disproven and yet can offer no proof: what do you think you know, and why do you think you know it? (Does asking this work? I mean, I ask it, but…usually it’s just followed by the person/group/whatever rehashing the same irrationality that preceded the question. I suppose it’s still worth asking here. There just doesn’t seem to be any obvious intervention into another’s irrationality that works consistently, that summons the rational intuition from the murky bog of mentation, or else LW members would be using it to obvious effect, right? I don’t know. I know I can strive to be less wrong, myself. Hoping to learn here. Predicting I likely won’t, though, not from posting this, that I’ll be downvoted/ignored/buried for even daring to suggest the idea of SM, which wouldn’t teach me anything other than that humans everywhere--even in explicitly rationality-prioritizing communities--indeed seem as constrained by optimism-bias as I’ve feared, as prone to suppression of heretics as ever, even if we’ve gotten less burn-at-a-literal-stake about it (two weeks' online censorship isn't so bad, of course)—but prove me wrong, by all means. For the love of the intervening God in whom I cannot possibly believe and not for lack of trying and whose reach I agree we are beyond, prove me wrong.)

Evidence, as I see it, for Strong-Misalignment:

(1) Computational irreducibility, as defined by Stephen Wolfram. This one’s interesting, because as far as I can tell, this is strong evidence for SM. However, Wolfram himself remains oddly optimistic regarding future alignment, it seems (although casually admitting at least some of the devastation-likely features, also?), and yet…I can’t find a clear example of him explaining why he is so optimistic. Implicitly, this would, of course, suit the trajectory of his life’s work. He hopes, sure. But what does that mean to me, someone likely to outlive him? Where’s the proof? The irony of it, for me, is that “computational irreducibility” seems to be the clearest challenge to any conception of AI-alignment. Right? The point of “computational irreducibility” is that it’s literally irreducible, unsolvable, an issue that will never go away, and this emerges from very simple algorithms. Ones with very complex outputs. Is this not directly applicable to current LLMs? More deeply, Wolfram’s insight regarding “computational irreducibility” suggests there have always been and there always will be simple algorithms with arbitrarily complicated and humanly unpredictable (irreducibly complex) outputs. Right? What am I missing? “Alignment” means reducibility, ultimately, right, or else…what? We’re talking about reducing the domain of AI output (AGI, ASI, conscious/unconscious, dangerously superhuman, dangerously subhuman, whatever) to a well-defined domain of human output/behavior/pursuit-of-life-liberty-happiness. Any commonsensical person, when they first hear of “alignment”, says something like: “But, hey, we’re not aligned to each other, not very well (genocide, war, hatred, etc.)? And we’re not aligned to ourselves, not that well (cognitive bias, mental illness, suicide, death-anxiety, etc.)? So, we’re going to manage this AI-alignment without having to first solve those other alignment problems? Not to mention death itself? 
We can’t align our genes’ interests to our bodies’ interests, not ultimately, nor can we even accept without emotional distress when genes’ interests are diametrically opposed to ‘our’ interests, despite being able to interface with and engineer both in a variety of ways? So…that’s weird, then? That we’re just going to assume that we can do that? Right? And even assume that the solution to ALL of our alignment problems (every other existential risk, disease, the inaccessibility of intergalactic travel and colonization and even sustaining some recognizably us-enough “intelligence” on an escape mission beyond the universe itself) is only obtainable via advancing machine learning, this thing we’re just assuming we can align indefinitely, continually? This all sounds very and-the-band-plays-on-as-the-Titanic-sinks, no? It’s like assuming we can build flying machines without first figuring out the wing? Right? As in, we just fall off a cliff first try? And if that means all of us, then we all just die. And if the response is, ‘Well, we were all going to die anyway, we might as well have tried’, then the problem is that we might have used our time more effectively, knowing that, no? Isn’t that what Yudkowsky is saying, certainty of absolute failure on the first attempt? Or is this supposed to be like…I don’t know, what’s a prior example of a technology that’s orders of magnitude more immediately dangerous than anything we’ve ever known except don’t worry everything just happens to work out by default?” My point is, blue collar/call-it-like-I-see-it folks generally don’t take any of this seriously, and I don’t think that’s necessarily due to tech-ignorance. I live in blue collar land. When I bring this up to people who can at least understand what I’m saying, they all say the same thing: we’re fucked, then, and ‘alignment research’ is thus just the chickens running around with their heads cut off (no offense intended, just calling-it-like-I-see-it). 
There are plenty of “blue collar” folks that are Abrahamic believers of some stripe claiming some manner of end-times prophecy is being fulfilled, of course (which I assume LW readers can recognize as not helpful, ever). But, that blue collar leans very grim on this seems relevant, still, to me—I’ve found this to be far less of an unreasonable/uninformed bias than it may appear. In my experience, these “common people” and these “obvious questions” are for the most part just ignored or trivialized by the folks who are already involved in ML enough to focus on (distract themselves by?) researching specific ML algorithms and strategies, and…we never get back to these first-principles questions. (Or do we? Links appreciated.) If ML output is ultimately irreducible, this means at least that we can’t ultimately avoid all catastrophic possibilities. Wolfram admits this much (https://www.youtube.com/watch?v=8oG1FidVE2o), which just makes his attitude weirder to me. Why, then, does he not abstract from this an overall SM conclusion? I understand the desire to be optimistic, especially if your computer modeling is successfully laying out a framework for fundamental physics (“Is the world ending? Who cares, I get to go down understanding fundamental physics, my life’s greatest ambition!”—I can’t help but picture Wolfram saying this in the bathroom mirror, perhaps daily). My pessimistic assumption is that people who’ve lived a long, fulfilling life of scientific research and self-esteem security within a pattern of research that they may well be able to continue their whole lives without ever living to be directly accountable for any wrong predictions regarding the future…yeah, my assumption is that such a person with a seemingly unqualified optimism about the future is not a very reliable source (for me or anyone else likely to outlive them), regardless of their brilliance in general. So, it’s weird. 
The guy who speaks so eloquently and clearly about “computational irreducibility” also doesn’t seem to associate this with AI-misalignment beyond a vague warning of incompleteness (except this could mean sub-optimal extinction, so the lack of urgency is quite odd), and he certainly doesn’t go anywhere as far as SM. Why, though? He says, with a computationally irreducible algorithm, “the passage of time corresponds to an irreducible computation that we have to run to know how it will turn out.” Ok. But then he says, “the big achievements of AI in recent times have been about making systems that are closely aligned with us humans. We train LLMs on billions of webpages so they can produce text that’s typical of what we humans write” (https://writings.stephenwolfram.com/2023/10/how-to-think-computationally-about-ai-the-universe-and-everything/). Ok, so he just used “aligned” to effectively mean Turing-test-passing, though? Successfully imitative, well-fitting to the training data. But that doesn’t mean “alignment”? Does it? That current LLMs are RLHF (or whatever other algorithms are the hip competitive thing) doesn’t mean they are “aligned” in the sense of disaster avoidance ultimately? Or does any amount of stalling extinction count as “alignment”? (I mean, is everything aligned to everything else by virtue of having existed at all in the first place? Nature’s Own Alignment? Univocity=Win? How far does this wild definitional optimism reach?) Maybe current LLMs are “aligned” in the sense of not manifesting schemer/counting problems and direct extinction-manifesting catastrophes? But they are still wide open to all manner of abuse and to thus causing widespread decoherence (in communications, in economics, in political power struggle, in every “pillar of civilization”—as TurnTrout admits)? Right? Wolfram refers to the idea that human evolution is comparable to LLM training, as if this is a comfort regarding LLMs being (therefore? implicitly?) aligned? 
Are LLMs going to figure out stop-committing-genocides for us? Like, now, please? Or is that not “well-fitting”? Or is it too well-fitting? Is keep-on-genociding part of “alignment”? Wolfram goes on in vaguely optimistic terms regarding the vistas of human-computational future. Not only does he ignore SM entirely, he seems to just leap to the conclusion of alignment without needing any further alignment research or…anything. Done? Solved? I don’t get it. But yeah, as far as I can tell, computational irreducibility should mean that alignment is, in any ultimate sense, impossible, irreducible, unsolvable. Prove me wrong, please (with links, arguments—your downvotes are meaningless to me, obviously).
(2) Godel Incompleteness/Completeness: I won’t bastardize the prime-number encoding, numbers-to-statements, statements-to-numbers ballet moves of these famous theorems, but my blue-collar run-down of the theorems (if I were trying to explain them to a fellow machinist in the break room) is: “you can’t have your consistency and complete it too”. Right? Pick your poison: incompleteness or inconsistency. Which is inherently more upsetting to you? The dogmatist/idealist says: “But my God, my Ideal, is both Complete and Consistent”. The Godelite zen master says, “Yeah, no, it isn’t, sorry [proves true statements that cannot be proved, shatters illusions, drops mic, dissolves into pari-nirvanic pure buddha-nature]”. It seems we’re still living within this existential crisis (incompleteness vs. inconsistency) the same as we’ve been living in the “nuclear era”, since around about the same time as these theorems started popping up, even though it seems mathematicians have been saying ever since, “The answer is incompleteness, you want incompleteness. Trust us. You get an incredible amount of formal consistency. And thus grant money for research. Tenure. Ok? Like, take the incompleteness. You’ll be glad you did.” I’ve been struck in my exploration of mathematics, and to some extent physics, by how on one hand famous and celebrated Godel’s Incompleteness/Completeness Theorems are, and yet on the other hand how they aren’t being referred to constantly across the sciences, all the time. They’re there, they’re important, but, like, you’d think (I would/have) that “Godel problems” should be popping up all over the place, and that this would be a constant refrain of research projects, at least the more any area of research is pushed up against some kind of singularity crisis (“Running up against Godel problems, you know how it is, everybody…”). 
I’ve heard folks like John Conway say that in some sense the Continuum Hypothesis has been resolved (but he was also dying when saying this, so maybe he was just comforting himself?), and I don’t know if I’ll be able to catch up to “descriptive inner model theory” (DIMT) and super-compactness theory enough to understand how/why this supposedly is what’s happening there (explanations of this in the comments would be really appreciated). But my “gut” tells me the Continuum Hypothesis is not resolved. That it will never be resolved. That it has to do with an irresolvable incompleteness problem. That it has to do with the irresolvable interdependency of the infinite and the finite as foundational, inescapable concepts. That things are looping, and/or decohering into singularities, in ways that will always be at once strange-yet-familiar, upsetting-yet-emboldening, to us prisoners of computational constraints and inconsistent reasoning trying to reason through the inconsistency of consistency and the incompleteness of constraint, caught in paradoxes we wish were the exclusive domain of stoner basement entertainment but which keep haunting serious research efforts, yet also we get to run on a long leash of formal consistency as long as we can before wearing ourselves out (going extinct?)—and can get grants, careers, awards in the meantime (treats, to keep us “good”). Something along these lines. Epistemological pessimism, basically (not “post-modernism”, blech, but what I’d call “Radical Depressive Realism”)—my gut sense is that Godel was simply the first in the context of rigorous mathematics to grope in the dark at the edge of this incompleteness-constraint/completeness-explosion, beyond which is the always-greener-pasture of stuff-we’d-need-to-be-less-computationally-bounded-to-understand-but-which-if-we-did-understand-would-also-mean-our-incoherence/extinction-anyway. You know, the good stuff. The Godel stuff. And that this means we can’t…outrace infinity? 
I suppose is one way to put it? So yeah, “AI-alignment” has always sounded to me a lot more like some impossible “Consistently Complete and Completely Consistent System” (being within the reach of God, that is, and thus beyond the reach of God-el—not merely waiting for God-ot; ok, I’ll stop), and less like “a careful approximation being continually updated to minimize risk in clever ways”, as a concept. I’ve assumed Godel paradoxes would render such a project impossible or at least as not-satisfying as whatever sense in which the Continuum Hypothesis is supposedly solved (except I feel like it isn’t, that it can’t be, that this too must be a kind of over-confidence optimism error). But I’m no expert on these matters, obviously. Any relevant links appreciated.
(3) Cognitive bias: not much to say about this one, really. Cognitive bias=human thought. We’re misaligned. We’re trying to be less wrong. That’s cool. We are, though, collectively, as a species, failing. Badly. Genocides badly. Civilizational collapse badly. Nuclear threat badly. The whole “better angels of our nature” illusory bubble of supposed progress has popped—and no, this doesn’t mean “worst of all possible worlds” (an idea as stupid as Leibnizian metaphysical optimism); this means non-dialectical unilateral negativity, existentially. As Yudkowsky says, cognitive bias science is “settled science”. The word “settled” there, to me, is not comforting. Quite the opposite. It is horrifyingly settled. With regard to our present contention with impending extinction, it certainly appears that all the usual biasing suspects are at play today as ever: Sunk-Cost Fallacies, False Dichotomies, Planning Fallacies…and the big over-arcing Grand Wizard of Mental Brokenness, Optimism-Bias. Are we seriously supposed to believe that we’re not going extinct, as we speak, fast, not slow? Fast, not slow. Think Kahneman. Think errors. What I’m seeing on LW is that anyone who even remotely brings up these issues in the way I’m attempting gets downvoted/ignored/buried. Why would that be? If I’m supposed to believe that it’s because everything I’m saying is completely wrong…I find this hard to believe. That I’m wrong? Sure, I expect as much, and welcome being proven wrong. But that I’m supposed to conclude with no argument, no refutation, that this whole line of thinking is somehow completely wrong? This would appear to be far from the “Way”, which is to say completely unchecked out-in-the-open optimism-bias and groupthink on a forum explicitly devoted to promoting rationalism. (Again, unless I’m missing something. Links appreciated.) 
Are we seriously supposed to believe alignment research or even participating in tech advancement whatsoever are productive, well-founded priorities for human beings at this point in history, that this is our only chance of an 11th-hour Hail Mary “win”? Are we not rather thinking fast, and thus badly, about this? What reason do we have, honestly, seriously, for not prioritizing intervention into child abuse to protect existing children; encouraging free-choice ethical anti-natalism to avoid layering new generations of innocent children into the mass grave which is our collective failure to accept the inevitability of death and extinction; advocating for access to a pain-free death for all people? These concepts sound a lot like sense to me, a good use of humanity’s remaining time on earth. These sound like dying with dignity. Chasing after a “Completely Consistent and Consistently Complete System”, pretending we’ve aligned ourselves to our machines, that we’ve aligned our brains to reality, aligned our bodies’ interests to our genes’ interests, aligned our fear of death to an everlasting immortality of cosmic colonization? This sounds like utter madness. Obvious cognitive bias. Doesn’t it? Or am I missing something, something big, something fundamental? Links appreciated.
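
The claim under (1), that very simple rules can produce outputs with no known shortcut, can be seen in miniature with Rule 30, the elementary cellular automaton Wolfram commonly uses as his example. What follows is only a sketch: the function names (`rule30_step`, `run_rule30`, `render`) and the fixed zero boundary are assumptions made for illustration, and the toy says nothing by itself about LLMs or about whether alignment is possible.

```python
def rule30_step(cells):
    """Apply one Rule 30 update to a tuple of 0/1 cells.

    Cells beyond the edges are treated as 0 (an illustrative choice).
    """
    padded = (0,) + tuple(cells) + (0,)
    # Rule 30 in Boolean form: new cell = left XOR (center OR right)
    return tuple(
        padded[i - 1] ^ (padded[i] | padded[i + 1])
        for i in range(1, len(padded) - 1)
    )


def run_rule30(width, steps):
    """Start from a single live cell in the middle and return every row."""
    start = [0] * width
    start[width // 2] = 1
    rows = [tuple(start)]
    for _ in range(steps):
        rows.append(rule30_step(rows[-1]))
    return rows


def render(rows):
    """Draw each row as text, '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    # Width is chosen so the pattern never reaches the edges in 15 steps.
    print(render(run_rule30(width=33, steps=15)))
```

The update rule is one line, yet the only general way anyone knows to learn what row n looks like is to compute all the rows before it. Whether that property carries over to trained ML systems in any sense that matters for alignment is the question being asked above, not something this sketch answers.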

If you haven’t already downvoted and bailed, thanks. Please enjoy my unapologetically pessimistic conclusion (extra points for not passively downvoting and bailing if you make it through this—except there aren’t any such points in the LW voting system, it seems, because it’s not weighted for rationality/overcoming bias, only for group consensus, so debating anything I say rationally is actually, in terms of the “karma” system, not worth your time):

That’s my evidence. Three things? That’s it? Yes. I get it, three things sounds like not many things. However, these three things (Computational Irreducibility, Godel Incompleteness/Completeness, Cognitive Bias) are hugely important, well-established, agreed upon concepts whose universal scope is basically taken for granted at this point—the word “universe” regularly comes up with all three of these concepts, as in they apply to everything. Right? These three things may as well be mountain ranges of unimaginable proportions that we’re claiming to be crossing on foot, in the midst of howling winter storms, and everyone is just supposed to blindly follow along without expecting to pull a global Donner Party. These three things may as well be the furthest reaches of space which we are claiming to be escaping to, and everyone’s just supposed to blindly get on the ship without expecting to die in the indignity of already-dead space (as opposed to the dignity of the CAVE--Compassionate, Accessible, Voluntary Euthanasia). There are many ways to go extinct. We appear to be actively choosing some of the worst possible ways, and choosing a better way isn’t even officially a priority. What are our priorities? Well, the best one we have, officially, is “mere survival” (denial of impending extinction)—which is almost always a bad idea. The worst and top priority we have is “survival-at-any-costs”—which is literally always a terrible idea (because, in denying the ultimate impossibility of indefinite survival, this can only result in contradictory thinking, inadequate actions--think "Custom of the Sea"). We’re just believing in alignment because we feel like it. We all know this, deep down. (Links appreciated. Proof, rational argument appreciated.) 
The alternative is too disturbing—accepting extinction and focusing therefore on ending existing child abuse and nurturing existing children in every way possible while we still can; focusing on administering access to pain-free voluntary death to as many humans as possible (and ideally other animals in a system beyond the animal owner's mere whims, if we can figure out the voluntarism ethics of this—we’re not even officially working on it); accepting that reproduction is itself a form of abuse, a mistake that increases suffering, increases death, and significantly decreases the already very low likelihood of already-existing and completely innocent children getting the care and protection and support they need and deserve. We can’t even officially admit that non-existent children don’t exist, and that existing ones do. We won’t even allow ourselves to go down fighting, collectively, for innocent children. We’d rather go down fighting for at turns uncontrollable and fascistically controlling anti-human tech, fighting for tower building without concern for tower collapse, fighting for the idea of clinging to an illusion of perpetual life at all costs (are we any better than any Tiplerian “Omega Cosmologist”—“We’re going to win!”, Tipler shouts, insanely), ignoring or even funding or even actively perpetrating genocides while clinging, tinkering with tech we know to be deadly beyond compare to anything we’ve ever seen. We are, collectively, as a species, as a global society, completely insane. Thanks for reading.

Jonas Hallgren (1mo)

Hey! I saw that you had a bunch of downvotes and I wanted to get in here before you became too disillusioned with the LW crowd. I think a big point for me is that you don't really have any sub-headings or examples that are more straight to the point. It is all one long text that seems similar to how you directly thought, and this makes it really hard to engage with what you say. Of course you're saying controversial things, but if there was more clarity I think you would have more engagement.

(GPT is really op for this nowadays) Anyway, I wish you the best of luck! I'm also sorry for not engaging with any of your arguments but I couldn't quite follow.

Benjamin Bourlier (1mo)

Thanks for your comment, I appreciate it!

That makes sense. I will try to delineate things more clearly, with sub-headings. I admit, my instinctive writing style does more or less reflect my normal train of thought. It can be easy for me to take for granted, and overlook things I assume are clear but aren't to others. Thank you for being helpful in your comment!

I'm going to try editing this post, and perhaps you'd be willing to give it another read if you care to engage with the arguments. Cheers.

Mitchell_Porter (1mo)

Would you say that you yourself have achieved some knowledge of what is true and what is good, despite irreducibility, incompleteness, and cognitive bias? And that was achieved with your own merely human intelligence. The point of AI alignment is not to create something perfect, it is to tilt the superhuman intelligence that is coming, in the direction of good things rather than bad things. If humans can make some progress in the direction of truth and virtue, then super-humans can make further progress.
