Back in 2016, CFAR pivoted to focusing on xrisk. I think the magic phrase at the time was:

"Rationality for its own sake, for the sake of existential risk."

I was against this move. I also had no idea how power works. I don't know how to translate this into LW language, so I'll just use mine: I was secret-to-me vastly more interested in being victimized at people/institutions/the world than I was in doing real things.

But the reason I was against the move is solid. I still believe in it.

I want to spell that part out a bit. Not to gripe about the past. The past makes sense to me. But because the idea still applies.

I think it's a simple idea once it's not cloaked in bullshit. Maybe that's an illusion of transparency. But I'll try to keep this simple-to-me and correct toward more detail when asked and I feel like it, rather than spelling out all the details in a way that turns out to have been unneeded.

Which is to say, this'll be kind of punchy and under-justified.

The short version is this:

We're already in AI takeoff. The "AI" is just running on human minds right now. Sorting out AI alignment in computers is focusing entirely on the endgame. That's not where the causal power is.

Maybe that's enough for you. If so, cool.

I'll say more to gesture at the flesh of this.

What kind of thing is wokeism? Or Communism? What kind of thing was Nazism in WWII? Or the flat Earth conspiracy movement? QAnon?

If you squint a bit, you might see there's a common type here.

In a Facebook post I argued that it's fair to view these things as alive. Well, really, I just described them as living, which kind of is the argument. If your woo allergy keeps you from seeing that… well, good luck to you. But if you're willing to just assume I mean something non-woo, you just might see something real there.

These hyperobject creatures are undergoing massive competitive evolution. Thanks Internet. They're competing for resources. Literal things like land, money, political power… and most importantly, human minds.

I mean something loose here. Y'all are mostly better at details than I am. I'll let you flesh those out rather than pretending I can do it well.

But I'm guessing you know this thing. We saw it in the pandemic, where friendships got torn apart because people got hooked by competing memes. Some "plandemic" conspiracy theorist anti-vax types, some blind belief in provably incoherent authorities, the whole anti-racism woke wave, etc.

This is people getting possessed.

And the… things… possessing them are highly optimizing for this.

To borrow a bit from fiction: It's worth knowing that in their original vision for The Matrix, the Wachowski siblings wanted humans to be processors, not batteries. The Matrix was a way of harvesting human computing power. As I recall, they had to change it because someone argued that people wouldn't understand their idea.

I think we're in a scenario like this. Not so much the "in a simulation" part. (I mean, maybe. But for what I'm saying here I don't care.) But yes with a functionally nonhuman intelligence hijacking our minds to do coordinated computations.

(And no, I'm not positing a ghost in the machine, any more than I posit a ghost in the machine of "you" when I pretend that you are an intelligent agent. If we stop pretending that intelligence is ontologically separate from the structures it's implemented on, then the same thing that lets "superintelligent agent" mean anything at all says we already have several.)

We're already witnessing orthogonality.

The talk of "late-stage capitalism" points at this. The way greenwashing appears, for instance, is intelligently weaponized Goodhart. It's explicitly hacking people's signals in order to extract what the hypercreature in question wants from people (usually profit).

The way China is drifting, with a social credit system and facial recognition tech in its one-party system, it appears to be threatening a Shriek. Maybe I'm badly informed here. But the point is the possibility.

In the USA, we have to file income taxes every year even though we have the tech to make it a breeze. Why? "Lobbying" is right, but that describes the action. What's the intelligence behind the action? What agent becomes your intentional opponent if you try to change this? You might point at specific villains, but they're not really the cause. The CEO of TurboTax doesn't stay the CEO if he doesn't serve the hypercreature's hunger.

I'll let you fill in other examples.

If the whole world were unified on AI alignment being an issue, it'd just be a problem to solve.

The problem that's upstream of this is the lack of will.

Same thing with cryonics really. Or aging.

But AI is particularly acute around here, so I'll stick to that.

The problem is that people's minds aren't clear enough to look at the problem for real. Most folk can't orient to AI risk without going nuts or numb or spitting out gibberish platitudes.

I think this is part accidental and part hypercreature-intentional.

The accidental part is like how advertisements do a kind of DDOS attack on people's sense of inherent self-worth. There isn't even a single egregore to point at as the cause of that. It's just that many, many such hypercreatures benefit from the deluge of subtly negative messaging and therefore tap into it in a sort of (for them) inverse tragedy of the commons. (Victory of the commons?)

In the same way, there's a very particular kind of stupid that (a) is pretty much independent of g factor and (b) is super beneficial for these hypercreatures as a pathway to possession.

And I say "stupid" both because it's evocative and because of ties to terms like "stupendous" and "stupefy". I interpret "stupid" to mean something like "stunned". Like the mind is numb and pliable.

It so happens that the shape of this stupid keeps people from being grounded in the physical world. Like, how do you get a bunch of trucks out of a city? How do you fix the plumbing in your house? Why six feet for social distancing? It's easier to drift to supposed-to's and blame minimization. A mind that does that is super programmable.

The kind of clarity that you need to de-numb and actually goddamn look at AI risk is pretty anti all this. It's inoculation to zombiism.

So for one, that's just hard.

But for two, once a hypercreature (of this type) notices this immunity taking hold, it'll double down. Evolve weaponry.

That's the "intentional" part.

This is where people — having their minds coopted for Matrix-like computation — will pour their intelligence into dismissing arguments for AI risk.

This is why we can't get serious enough buy-in to this problem.

Which is to say, the problem isn't a need for AI alignment research.

The problem is current hypercreature unFriendliness.

From what I've been able to tell, AI alignment folk for the most part are trying to look at this external thing, this AGI, and make it aligned.

I think this is doomed.

Not just because we're out of time. That might be.

But the basic idea was already self-defeating.

Who is aligning the AGI? And to what is it aligning?

This isn't just a cute philosophy problem.

A common result of egregoric stupefaction is identity fuckery. We get this image of ourselves in our minds, and then we look at that image and agree "Yep, that's me." Then we rearrange our minds so that all those survival instincts of the body get aimed at protecting the image in our minds.

How did you decide which bits are "you"? Or what can threaten "you"?

I'll hop past the deluge of opinions and just tell you: It's these superintelligences. They shaped your culture's messages, probably shoved you through public school, gripped your parents to scar you in predictable ways, etc.

It's like installing a memetic operating system.

If you don't sort that out, then that OS will drive how you orient to AI alignment.

My guess is, it's a fuckton easier to sort out Friendliness/alignment within a human being than it is on a computer. Because the stuff making up Friendliness is right there.

And by extension, I think it's a whole lot easier to create/invoke/summon/discover/etc. a Friendly hypercreature than it is to solve digital AI alignment. The birth of science was an early example.

I'm pretty sure this alignment needs to happen in first person. Not third person. It's not (just) an external puzzle, but is something you solve inside yourself.

A brief but hopefully clarifying aside:

Stephen Jenkinson argues that most people don't know they're going to die. Rather, they know that everyone else is going to die.

That's what changes when someone gets a terminal diagnosis.

I mean, if I have a 100% reliable magic method for telling how you're going to die, and I tell you "Oh, you'll get a heart attack and that'll be it", that'll probably feel weird but it won't fill you with dread. If anything it might free you because now you know there's only one threat to guard against.

But there's a kind of deep, personal dread, a kind of intimate knowing, that comes when the doctor comes in with a particular weight and says "I've got some bad news."

It's immanent.

You can feel that it's going to happen to you.

Not the idea of you. It's not "Yeah, sure, I'm gonna die someday."

It becomes real.

You're going to experience it from behind the eyes reading these words.

From within the skin you're in as you witness this screen.

When I talk about alignment being "first person and not third person", it's like this. How knowing your mortality doesn't happen until it happens in first person.

Any kind of "alignment" or "Friendliness" or whatever that doesn't put that first-person-ness at the absolute very center isn't a thing worth crowing about.

I think that's the core mistake anyway. Why we're in this predicament, why we have unaligned superintelligences ruling the world, and why AGI looks so scary.

It's in forgetting the center of what really matters.

It's worth noting that the only scale that matters anymore is the hypercreature one.

I mean, one of the biggest things a single person can build on their own is a house. But that's hard, and most people can't do that. Mostly companies build houses.

Solving AI alignment is fundamentally a coordination problem. The kind of math/programming/etc. needed to solve it is literally superhuman, the way the four color theorem was (and still kind of is) superhuman.

"Attempted solutions to coordination problems" is a fine proto-definition of the hypercreatures I'm talking about.

So if the creatures you summon to solve AI alignment aren't Friendly, you're going to have a bad time.

And for exactly the same reason that most AGIs aren't Friendly, most emergent egregores aren't either.

As individuals, we seem to have some glimmer of ability to lean toward resonance with one hypercreature or another. Even just choosing what info diet you're on can do this. (Although there's an awful lot of magic in that "just choosing" part.)

But that's about it.

We can't align AGI. That's too big.

It's too big the way the pandemic was too big, and the Ukraine/Putin war is too big, and wokeism is too big.

When individuals try to act on the "god" scale, they usually just get possessed. That's the stupid simple way of solving coordination problems.

So when you try to contribute to solving AI alignment, what egregore are you feeding?

If you don't know, it's probably an unFriendly one.

(Also, don't believe your thoughts too much. Where did they come from?)

So, I think raising the sanity waterline is upstream of AI alignment.

It's like we've got gods warring, and they're threatening to step into digital form to accelerate their war.

We're freaking out about their potential mech suits.

But the problem is the all-out war, not the weapons.

We have an advantage in that this war happens on and through us. So if we take responsibility for this, we can influence the terrain and bias egregoric/memetic evolution to favor Friendliness.

Anything else is playing at the wrong level. Not our job. Can't be our job. Not as individuals, and it's individuals who seem to have something mimicking free will.

Sorting that out in practice seems like the only thing worth doing.

Not "solving xrisk". We can't do that. Too big. That's worth modeling, since the gods need our minds in order to think and understand things. But attaching desperation and a sense of "I must act!" to it is insanity. Food for the wrong gods.

Ergo why I support rationality for its own sake, period.

That, at least, seems to target a level at which we mere humans can act.


I wasn't convinced of this ten years ago and I'm still not convinced.

When I look at people who have contributed most to alignment-related issues - whether directly, like Eliezer Yudkowsky and Paul Christiano - or theoretically, like Toby Ord and Katja Grace - or indirectly, like Sam Bankman-Fried and Holden Karnofsky - what all of these people have in common is focusing mostly on object-level questions. They all seem to me to have a strong understanding of their own biases, in the sense that gets trained by natural intelligence, really good scientific work, and talking to other smart and curious people like themselves. But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures, awaken to their own mortality, refactor their identity, or cultivate their will. In fact, all of them (except maybe Eliezer) seem like the kind of people who would be unusually averse to thinking in those terms. And if we pit their plumbing or truck-maneuvering skills against those of an average person, I see no reason to think they would do better (besides maybe high IQ and general ability).

It's seemed to me that the more that people talk about "rationality trai...

I think your pushback is ignoring an important point. One major thing the big contributors have in common is that they tend to be unplugged from the stuff Valentine is naming!

So even if folks mostly don't become contributors by asking "how can I come more truthfully from myself and not what I'm plugged into", I think there is an important cluster of mysteries here. Examples of related phenomena:

  • Why has it worked out that just about everyone who claims to take AGI seriously is also vehement about publishing every secret they discover?
  • Why do we fear an AI arms race, rather than expect deescalation and joint ventures?
  • Why does the industry fail to understand the idea of aligned AI, and instead claim that "real" alignment work is adversarial-examples/fairness/performance-fine-tuning?

I think Val's correct on the point that our people and organizations are plugged into some bad stuff, and that it's worth examining that.

But as far as I know, none of them have made it a focus of theirs to fight egregores, defeat hypercreatures


Egregore is an occult concept representing a distinct non-physical entity that arises from a collective group of people.

I do know one writer who talks a lot about demons and entities from beyond the void. It's you, and it happens in some of, IMHO, the most valuable pieces you've written.

I worry that Caplan is eliding the important summoner/demon distinction. This is an easy distinction to miss, since demons often kill their summoners and wear their skin.

That civilization is dead. It summoned an alien entity from beyond the void which devoured its summoner and is proceeding to eat the rest of the world.

And Ginsberg answers: “Moloch”. It’s powerful not because it’s correct – nobody literally thinks an ancient Carthaginian demon causes everything – but because thinking of the system as an agent throws into relief the degree to which the system isn’t an agent.

But the current rulers of the universe – call them what you want, Moloch, Gnon, whatever – want us dead, and with us everything we value. Art, science, love, ph

...

I sadly don't have time to really introspect on what is going on in me here, but something about this comment feels pretty off to me. I think in some sense it provides an important counterpoint to the OP, but I also feel like it stretches the truth quite a bit:

  • Toby Ord primarily works on influencing public opinion and governments, and very much seems to view the world through a "raising the sanity waterline" lens. Indeed, I just talked to him yesterday morning, where I tried to convince him that misuse risk from AI, and the risk from having the "wrong actor" get the AI, is much less than he thinks it is, which feels like a very related topic.
  • Eliezer has done most of his writing on the meta-level, on the art of rationality, on the art of being a good and moral person, and on how to think about your own identity. 
  • Sam Bankman-Fried is also very active in political activism, and (my guess) is quite concerned about the information landscape. I expect he would hate the terms used in this post, but I expect there to be a bunch of similarities in his model of the world and the one outlined in this post, in terms of trying to raise the sanity waterline and improve the world's decision-
...
The Sam Bankman-Fried point reads differently now that his massive fraud with FTX is public; might be worth a comment/revision? I can't help but see Sam disagreeing with a message as a positive for the message (I know it's a fallacy, but the feeling's still there).
Hmm, I feel like the revision would have to be in Scott's comment. I was just responding to the names that Scott mentioned, and I think everything I am saying here is still accurate.

I wasn't convinced of this ten years ago and I'm still not convinced.

Given the link, I think you're objecting to something I don't care about. I don't mean to claim that x-rationality is great and has promise to Save the World. Maybe if more really is possible and we do something pretty different to seriously develop it. Maybe. But frankly I recognize stupefying egregores here too and I don't expect "more and better x-rationality" to do a damn thing to counter those for the foreseeable future.

So on this point I think I agree with you… and I don't feel whatsoever dissuaded from what I'm saying.

The rest of what you're saying feels like it's more targeting what I care about though:


When I look at people who have contributed most to alignment-related issues […] what all of these people have in common is focusing mostly on object-level questions.

Right. And as I said in the OP, stupefaction often entails alienation from object-level reality.

It's also worth noting that LW exists mostly because Eliezer did in fact notice his own stupidity and freaked the fuck out. He poured a huge amount of energy into taking his internal mental weeding seriously in order to never ever ever be that st...

Maybe. It might be that if you described what you wanted more clearly, it would be the same thing that I want, and possibly I was incorrectly associating this with the things at CFAR you say you're against, in which case sorry.

But I still don't feel like I quite understand your suggestion. You talk of "stupefying egregores" as problematic insofar as they distract from the object-level problem. But I don't understand how pivoting to egregore-fighting isn't also a distraction from the object-level problem. Maybe this is because I don't understand what fighting egregores consists of, and if I knew, then I would agree it was some sort of reasonable problem-solving step.

I agree that the Sequences contain a lot of useful deconfusion, but I interpret them as useful primarily because they provide a template for good thinking, and not because clearing up your thinking about those things is itself necessary for doing good work. I think of the cryonics discussion the same way I think of the Many Worlds discussion - following the motions of someone as they get the right answer to a hard question trains you to do this thing yourself.

I'm sorry if "cultivate your will" has the wrong connotations,...

There's also the skulls to consider. As far as I can tell, this post's recommendations are that we, who are already in a valley littered with a suspicious number of skulls,

turn right towards a dark cave marked 'skull avenue' whose mouth is a giant skull, and whose walls are made entirely of skulls that turn to face you as you walk past them deeper into the cave.

The success rate of movements aimed at improving the long-term future or improving rationality has historically been... not great, but there are at least solid, concrete empirical reasons to think specific actions will help, and we can pin our hopes on that.

The success rate of "let's build a movement to successfully uncouple ourselves from society's bad memes and become capable of real action, and then our problems will be solvable" is 0. Not just in that thinking that way didn't help, but in that with a near 100% rate you just end up possessed by worse memes if you make that your explicit final goal (rather than ending up doing that as a side effect of trying to get good at something). And there are also no concrete paths to action to pin our hopes on.

“The success rate of, let's build a movement to successfully uncouple ourselves from society's bad memes and become capable of real action and then our problems will be solvable, is 0.” I’m not sure if this is an exact analog, but I would have said the scientific revolution and the age of enlightenment were two pretty good examples of this (to be honest, I’m not entirely sure where one ends and the other begins, and there may be some overlap, but I think of them as two separate but related things). They resulted in the world becoming a vastly better place, largely through the efforts of individuals who realized that by changing the way we think about things we can put human ingenuity to better use. I know this is a massive oversimplification, but I think it points in the direction of there potentially being value in pushing the right memes onto society.

The success rate of developing and introducing better memes into society is indeed not 0. The key thing there is that the scientific revolutionaries weren't just thinking in the abstract, "we must uncouple from society first, and then we'll know what to do". Rather, they wanted to understand how objects fell, how animals evolved, and lots of other specific problems, and they developed good memes to achieve those ends.

I’m by no means an expert on the topic, but I would have thought it was a result of both object-level thinking producing new memes that society recognized as true, but also some level of abstract thinking along the lines of “using God and the Bible as an explanation for every phenomenon doesn’t seem to be working very well, maybe we should create a scientific method or something.” I think there may be a bit of us talking past each other, though. From your response, perhaps what I consider “uncoupling from society’s bad memes” you consider to be just generating new memes. It feels like generally a conversation where it’s hard to pin down what exactly people are trying to describe (starting from the OP, which I find very interesting, but am still having some trouble understanding specifically) which is making it a bit hard to communicate.

Now that I've had a few days to let the ideas roll around in the back of my head, I'm gonna take a stab at answering this.

I think there are a few different things going on here which are getting confused.

1) What does "memetic forces precede AGI" even mean?

"Individuals", "memetic forces", and "that which is upstream of memetics" all act on different scales. As an example of each, I suggest "What will I eat for lunch?", "Who gets elected POTUS?", and "Will people eat food?", respectively.

"What will I eat for lunch?" is an example of an individual decision because I can actually choose the outcome there. While sometimes things like "veganism" will tell me what I should eat, and while I might let that influence me, I don't actually have to. If I realize that my life depends on eating steak, I will actually end up eating steak.

"Who gets elected POTUS" is a much tougher problem. I can vote. I can probably persuade friends to vote. If I really dedicate myself to the cause, and I do an exceptionally good job, and I get lucky, I might be able to get my ideas into the minds of enough people that my impact is noticeable. Even then though, it's a drop in the bucket and pretty far outside ...

I feel seen. I'll tweak a few details here & there, but you have the essence. Thank you.

Agreed. Two details:

  • "…we should not flinch away…" is another instance of the thing. This isn't just banishing the word "should": the ability not to flinch away from hard things is a skill, and trying to bypass development of that skill with moral panic actually makes everything worse.
  • The orientation you're pointing at here biases one's inner terrain toward Friendly superintelligences. It's also personally helpful and communicable. This is an example of a Friendly meme that can give rise to a Friendly superintelligence. So while sincerely asking "And then what?" is important, as is holding the preciousness of the fact that we don't yet have an answer, that is enough. We don't have to actually answer that question to participate in feeding Friendliness in the egregoric wars. We just have to sincerely ask.

Admittedly I'm not sure either. Generally speaking, viewing things as "so incredibly dangerous as to avoid out of principle" ossifies them too much. Ossified things tend to become attack surfaces for unFriendly superintelligences. In particular, being scared of how incredibly dangerous something is tends to be stupefying. But I do think seeing this clearly naturally creates a desire to be more clear and to drop nearly all "shoulding", not so much the words as the spirit. (Relatedly: I actually didn't know I never used the word "should" in the OP! I don't actually have anything against the word per se. I just try to embody this stuff. I'm delighted to see I've gotten far enough that I just naturally dropped using it this way.)

I'm not totally sure I follow. Do you mean a hard line against "shoulding"? If so, I mostly just agree with you here. That said, I think trying to make my point more compelling would in fact be an example of the corruption I'm trying to purify myself of. Instead I want to be correct and clear. That might happen to result in what
Doh! Busted. Thanks for the reminder. Agreed. Good point. Agreed, and worth pointing out explicitly. Yes. You don't really need it, things tend to work better without it, and the fact that no one even noticed that it didn't show up in this post is a good example of that. At the same time, "I shouldn't ever use 'should'" obviously has the exact same problems, and it's possible to miss that you're taking that stance if you don't ever say it out loud. I watched some of your videos after Kaj linked one, and... it's not that it looked like you were doing that, but it looked like you might be doing that. Like there wasn't any sort of self-caricaturing or anything that showed me that "Val is well aware of this failure mode, and is actively steering clear", so I couldn't rule it out and wanted to mark it as a point of uncertainty and a thing you might want to watch out for. Ah, but I never said you should try to make your point more compelling! What do you notice when you ask yourself why "X would have effect Y" led you to respond with a reason to not do X? ;)

Don't have the time to write a long comment just now, but I still wanted to point out that describing either Yudkowsky or Christiano as doing mostly object-level research seems incredibly wrong. So much of what they're doing and have done has focused explicitly on which questions to ask, which questions not to ask, which paradigm to work in, how to criticize that kind of work... They rarely published posts that are only about the meta-level (although Arbital does contain a bunch of pages along those lines and Prosaic AI Alignment is also meta) but it pervades their writing and thinking.

More generally, when you're creating a new field of science or research, you tend to do a lot of philosophy of science type stuff, even if you don't label it explicitly that way. Galileo, Carnot, Darwin, Boltzmann, Einstein, Turing all did it.

(To be clear, I'm pointing at meta-stuff in the sense of "philosophy of science for alignment" type things, not necessarily the more hardcore stuff discussed in the original post)

That's true, but if you are doing philosophy it is better to admit to it, and learn from existing philosophy, rather than deriding and dismissing the whole field.
This seems irrelevant to the point, yes? I think adamShimi is challenging Scott's claim that Paul & Eliezer are mostly focusing on object-level questions. It sounds like you're challenging whether they're attending to non-object-level questions in the best way. That's a different question. Am I missing your point?

Eliezer, at least, now seems quite pessimistic about that object-level approach. And in the last few months he's been writing a ton of fiction about introducing a Friendly hypercreature to an unfriendly world.

When I look at people who have contributed most to alignment-related issues - whether directly... or indirectly, like Sam Bankman-Fried

Perhaps I have missed it, but I’m not aware that Sam has funded any AI alignment work thus far.

If so, this sounds like giving him a large amount of credit in advance of doing the work, which is generous but not the order in which credit allocation should go.

My attempt to break down the key claims here:

  • The internet is causing rapid memetic evolution towards ideas which stick in people's minds, encourage them to take certain actions, especially ones that spread the idea. Ex: wokism, Communism, QAnon, etc
  • These memes push people who host them (all of us, to be clear) towards behaviors which are not in the best interests of humanity, because Orthogonality Thesis
  • The lack of will to work on AI risk comes from these memes' general interference with clarity/agency, plus selective pressure to develop ways to get past "immune" systems which allow clarity/agency
  • Before you can work effectively on AI stuff, you have to clear out the misaligned memes stuck in your head. This can get you the clarity/agency necessary, and make sure that (if successful) you actually produce AGI aligned with "you", not some meme
  • The global scale is too big for individuals - we need memes to coordinate us. This is why we shouldn't try and just solve x-risk, we should focus on rationality, cultivating our internal meme garden, and favoring memes which will push the world in the direction we want it to go

Putting this in a separate comment, because Reign of Terror moderation scares me and I want to compartmentalize. I am still unclear about the following things:

  • Why do we think memetic evolution will produce complex/powerful results? It seems like the mutation rate is much, much higher than biological evolution.
  • Valentine describes these memes as superintelligences, as "noticing" things, and generally being agents. Are these superintelligences hosted per-instance-of-meme, with many stuffed into each human? Or is something like "QAnon" kind of a distributed intelligence, doing its "thinking" through social interactions? Both of these models seem to have some problems (power/speed), so maybe something else?
  • Misaligned (digital) AGI doesn't seem like it'll be a manifestation of some existing meme and therefore misaligned, it seems more like it'll just be some new misaligned agent. There is no highly viral meme going around right now about producing tons of paperclips.
I really appreciate your list of claims and unclear points. Your succinct summary is helping me think about these ideas. A few examples came to mind: sports paraphernalia, tabletop miniatures, and stuffed animals (which likely outnumber real animals by hundreds or thousands of times). One might argue that these things give humans joy, so they don't count. There is some validity to that. AI paperclips are supposed to be useless to humans. On the other hand, one might also argue that it is unsurprising that subsystems repurposed to seek out paperclips derive some 'enjoyment' from the paperclips... but I don't think that argument will hold water for these examples. Looking at it another way, some amount of paperclips are indeed useful. No egregore has turned the entire world to paperclips just yet. But of course that hasn't happened, else we would have already lost. Even so: consider paperwork (like the tax forms mentioned in the post), skill certifications in the workplace, and things like slot machines and reality television. A lot of human effort is wasted on things humans don't directly care about, for non-obvious reasons. Those things could be paperclips. (And perhaps some humans derive genuine joy out of reality television, paperwork, or giant piles of paperclips. I don't think that changes my point that there is evidence of egregores wasting resources.)
I think the point under contention isn't whether current egregores are (in some sense) "optimizing" for things that would score poorly according to human values (they are), but whether the things they're optimizing for have some (clear, substantive) relation to the things a misaligned AGI will end up optimizing for, such that an intervention on the whole egregores situation would have a substantial probability of impacting the eventual AGI. To this question I think the answer is a fairly clear "no", though of course this doesn't invalidate the possibility that investigating how to deal with egregores may result in some non-trivial insights for the alignment problem.
I agree with you. I also don't think it matters whether the AGI will optimize for something current egregores care about. What matters is whether current egregores will in fact create AGI. The fear around AI risk is that the answer is "inevitably yes". The current egregores are actually no better at making AGI egregore-aligned than humans are at making it human-aligned. But they're a hell of a lot better at making AGI accidentally, and probably at all. So if we don't sort out how to align egregores, we're fucked — and so are the egregores.
I think I see what you mean. A new AI won't be under the control of egregores. It will be misaligned to them as well. That makes sense.
Doesn't the second part answer the first? I mean, the reason biological evolution matters is because its mutation rate massively outstrips geological and astronomical shifts. Memetic evolution dominates biological evolution for the same reason. Also, just empirically: memetic evolution produced civilization, social movements, Crusades, the Nazis, etc. I wonder if I'm just missing your question.

Both. I wonder if you're both (a) blurring levels and (b) intuitively viewing these superintelligences as having some kind of essence that either is or isn't in someone.

What is or isn't a "meme" isn't well defined. A catch phrase (e.g. "Black lives matter!") is totally a meme. But is a religion a meme? Is it more like a collection of memes? If so, what exactly are its constituent memes? And with catch phrases, most of them can't survive without a larger memetic context. (Try getting "Black lives matter!" to spread through an isolated Amazonian tribe.) So should we count the larger memetic context as part of the meme?

But if you stop trying to ask what is or isn't a meme and you just look at the phenomenon, you can see something happening. In the BLM movement, the phrase "Silence is violence" evolved and spread because it was evocative and helped the whole movement combat opposition in a way that supported its egregoric possession.

So… where does the whole BLM superorganism live? In its believers and supporters, sure. But also in its opponents. (Think of how folk who opposed BLM would spread its claims in order to object to them.) Also on webpages. Billboards. Now in Hollywood movies. And it's always shifting and mutating.

The academic field of memetics died because they couldn't formally define "meme". But that's backwards. Biology didn't need to formally define life to recognize that there's something to study. The act of studying seems to make some definitions more possible. That's where we're at right now. Egregoric zoology, post Darwin but pre Watson & Crick.
Faster mutation rate doesn't just produce faster evolution - it also reduces the steady-state fitness. Complex machinery can't reliably be evolved if pieces of it are breaking all the time. I'm mostly relying on "No Evolutions for Corporations or Nanodevices" plus one undergrad course in evolutionary bio here.

Thank you for pointing this out. I agree with the empirical observation that we've had some very virulent and impactful memes. I'm skeptical about saying that those were produced by evolution rather than something more like genetic drift, because of the mutation-rate argument. But given that observation, I don't know if it matters if there's evolution going on or not. What we're concerned with is the impact, not the mechanism.

I think at this point I'm mostly just objecting to the aesthetic and some less-rigorous claims that aren't really important, not the core of what you're arguing. Does it just come down to something like: "Ideas can be highly infectious and strongly affect behavior. Before you do anything, check for ideas in your head which affect your behavior in ways you don't like. And before you try and tackle a global-scale problem with a small-scale effort, see if you can get an idea out into the world to get help."
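The mutation-rate point in the comment above can be made a little more concrete. What follows is a minimal sketch, assuming Eigen's error-threshold model from quasispecies theory, which the commenter does not cite: a variant built from `length` independent parts, each miscopied with probability `u` per replication, and out-reproducing its mutants by a factor `sigma`, only persists if `sigma * (1 - u) ** length > 1`. The function names and the example numbers are illustrative assumptions, and the model ignores back-mutation and anything specific to memes.

```python
import math


def error_free_fraction(u: float, length: int) -> float:
    """Probability that one copy of `length` independent parts has no
    miscopied part, given per-part mutation probability `u`."""
    return (1.0 - u) ** length


def max_maintainable_length(u: float, sigma: float) -> float:
    """Error threshold under the assumed Eigen-style model.

    The fittest variant persists only if
        sigma * (1 - u) ** length > 1,
    which rearranges to
        length < ln(sigma) / -ln(1 - u).
    """
    return math.log(sigma) / -math.log(1.0 - u)


if __name__ == "__main__":
    # Illustrative mutation rates only; none of these are measured values.
    for u in (1e-9, 1e-6, 1e-3, 1e-1):
        limit = max_maintainable_length(u, sigma=2.0)
        print(f"u = {u:g}: at most about {limit:.3g} parts can be maintained")
```

Under these assumptions the maintainable complexity scales roughly as 1/u, which is the shape of the objection: raising the mutation rate speeds up search but shrinks the amount of machinery selection can hold together. Whether memes actually behave like this is exactly what the thread is disputing.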
I like this, thank you. I score this as "Good enough that I debated not bothering to correct anything." I think some corrections might be helpful though:

While I think that's true, that's not really central to what I'm saying. I think these forces have been the main players for way, way longer than we've had an internet. The internet, like every other advance in communication, just increased evolutionary pressure at the memetic level by bringing more of these hypercreatures into contact with one another and with resources they could compete for.

Yes. I'd just want to add that not all of them do. It's just that the ones that tend to dominate tend to be unFriendly. Two counterexamples:

  • Science. Not as an establishment, but as a kind of clarifying intelligence. This strikes me as a Friendly hypercreature. (The ossified practices of science, like "RCTs are the gold standard" and "Here's the Scientific Method!", tend to pull toward stupidity via Goodhart. A lot of LW is an attempt to reclaim the clarifying influence of this hypercreature's intelligence.)
  • Jokes. These are sort of like innocuous memetic insects. As long as they don't create problems for more powerful hypercreatures, they can undergo memetic evolution and spread. They aren't particularly Friendly or unFriendly for the most part. Some of them add a little value via humor, although that's not what they're optimizing for. (The evolutionary pressure on jokes is "How effectively does hearing this joke cause the listener to faithfully repeat it?") But if a joke were to somehow evolve into a more coherent behavior-controlling egregore, by default it'll be an unFriendly one.

Almost. I think it's more important that you have installed a system for noticing and weeding out these influences. Like how John Vervaeke argues that the Buddha's Noble Eightfold Path is a kind of virtual engine for creating relevant insight. The important part isn't the insight but is instead the engine. Because the
3 Going Durden 8mo
  I'm not entirely convinced. Memes are parasites, and thus aim for equilibrium with their hosts. Hence memeplexes that are truly evil and omnicidal never stick, memeplexes that are relatively evil peter out, and what we are left with are memeplexes that "kinda suck I guess" at worst. A successful memeplex is one that ensures the host's survival while forcing the host to spend maximum energy and resources spreading the memeplex without harming themselves too badly.
2 the gears to ascension 8mo
But the memeplexes can, at times, resist the growth of more accurate memeplexes which would ensure host survival better, because the agency of the memetic networks and the agency of the neural and genetic networks need not be aimed anywhere good, or even necessarily anywhere coherent in particular at times of high mutation. Notably, memeplexes that promote death and malice are more common in the presence of high rates of death and malice; death and malice are themselves self-propagating memetic diseases, in addition to whatever underlying mechanistic diseases might be causing them.
3 Going Durden 8mo
Of course, but IMHO they cannot do it for long, at least not on civilizational time scales. Memeplexes that ensure host survival better, and on top of that empower the hosts, ultimately always win. As of yet, we do not have any Deus Ex Machina to help the memeplexes exist without a host, or spread without the host being more powerful (physically, politically, socially, scientifically, technologically, etc.) than the hosts of other memeplexes. Over time, the memetic landscape tends to average out to begrudgingly positive and progressive, because memeplexes that fail to push the hosts forward are outcompeted. One of the best examples of that is the memeplex of Far Right/Nazi/Fascist ideology, which, while memetically robust, tends to shoot itself in the foot and lose the memetic warfare without much coherent opposition from the liberal memeplexes. It resurfaces all the time, but never accomplishes much, because it is more host-detrimental than it is virulent. Meanwhile, memeplexes that are kinda-sorta wishy-washy slightly Left of center, egalitarian-ish but not too much, vaguely pro-science and mildly technological, progressive-ish but unobtrusively, tend to always win, and have been winning since the times of Babylon. They struck the perfect balance between memetic frugality, virulence, and benefiting the hosts.
2 the gears to ascension 8mo
Yeah, I see we're thinking on similar terms. I was in fact thinking specifically of the pattern of authoritarian, hyper-destructive memeplexes occasionally coming back up, growing fast, and then suddenly collapsing, repeatedly; sometimes doing huge amounts of damage when this occurs. I don't think we disagree, I was just expressing another rotation of what seems to already be your perspective.
I think there's an important difference Valentine tries to make with respect to your fourth bullet (and if not, I will make). You perhaps describe the right idea, but the wrong shape. The problem is more like "China and the US both have incentives to bring about AGI and don't have incentives towards safety." Yes deflecting at the last second with some formula for safe AI will save you, but that's as stupid as jumping away from a train at the last second. Move off the track hours ahead of time, and just broker a peace between countries to not make AGI.
Ah, so on this view, the endgame doesn't look like "make technical progress until the alignment tax is low enough that policy folks or other AI-risk-aware people in key positions will be able to get an unaware world to pay it"

But instead looks more like "get the world to be aware enough to not bumble into an apocalypse, specifically by promoting rationality, which will let key decision-makers clear out the misaligned memes that keep them from seeing clearly"

Is that a fair summary? If so, I'm pretty skeptical of the proposed AI alignment strategy, even conditional on this strong memetic selection and orthogonality actually happening. It seems like this strategy requires pretty deeply influencing the worldview of many world leaders. That is obviously very difficult because no movement that I'm aware of has done it (at least, quickly), and I think they all would like to if they judged it doable. Importantly, the reduce-tax strategy requires clarifying and solving a complicated philosophical/technical problem, which is also very difficult. I think it's more promising for the following reasons:

  • It has a stronger precedent (historical examples I'd reference include the invention of computability theory, the invention of information theory and cybernetics, and the adventures in logic leading up to Gödel)
  • It's more in line with rationalists' general skill set, since the group is much more skewed towards analytical thinking and technical problem-solving than towards government/policy folks and being influential among those kinds of people
  • The number of people we would need to influence will go up as AGI tech becomes easier to develop, and every one is a single point of failure.

To be fair, these strategies are not in a strict either/or, and luckily use largely separate talent pools. But if the proposal here ultimately comes down to moving fungible resources towards the become-aware strategy and away from the technical-alignment strategy, I think I (mid-tent
1 0x44 0x46 2y
  It seems to me that in 2020 the world was changed relatively quickly. How many events in history were able to shift every mind on the planet within 3 months? If it only takes 3 months to occupy the majority of focus, then you have a bound for what a Super Intelligent Agent may plan for. What is more concerning and also interesting is that such an intelligence can make something appear to be for X when it's really planning for Y. So misdirection and ulterior motive are baked into this theory gaming. Unfortunately this can lead to a very schizophrenic inspection of every scenario, as if strategically there is intention to trigger infinite regress on scrutiny. When we're dealing with these Hyperobjects/Avatars/Memes we can't be certain that we understand the motive. Given that we can't understand the motive of any external meme, perhaps the only right path is to generate your own and propagate that solely?
A sketch of a solution that doesn't involve (traditional) world leaders could look like "Software engineers get together and agree that the field is super fucked, and start imposing stronger regulations and guidelines like traditional engineering disciplines use but on software." This is a way of lowering the cost of the alignment tax in the sense that, if software engineers all have a security mindset, or have to go through a security review, there is more process and knowledge related to potential problems and a way of executing a technical solution at the last moment. However, this description is itself entirely political, not technical, yet easily could fail to reach the awareness of world leaders or the general populace.
Two points: 1. I have more hope than you here. I think we're seeing Friendly memetic tech evolving that can change how influence comes about. The key tipping point isn't "World leaders are influenced" but is instead "The Friendly memetic tech hatches a different way of being that can spread quickly." And the plausible candidates I've seen often suggest it'll spread superexponentially. 2. This is upstream of making the technical progress and right social maneuvers anyway. There's insufficient collective will to do enough of the right kind of alignment research. Trying anyway mostly adds to the memetic dumpster fire we're all in. So unless you have a bonkers once-in-an-aeon brilliant Messiah-level insight, you can't do this first.
  Wait, literally evolving? How? Coincidence despite orthogonality? Did someone successfully set up an environment that selects for Friendly memes? Or is this not literally evolving, but more like "being developed"?

Whoa! I would love to hear more about these plausible candidates.

I parse this second point as something like "alignment is hard enough that you need way more quality-adjusted research-years (QARYs?) than the current track is capable of producing. This means that to have any reasonable shot at success, you basically have to launch a much larger (but still aligned) movement via memetic tech, or just pray you're the messiah and can singlehandedly provide all the research value of that mass movement." That seems plausible, and concerning, but highly sensitive to the difficulty of the alignment problem, which I personally have practically zero idea how to forecast.

I ~entirely agree with you.


At some point (maybe from the beginning?), humans forgot the raison d’être of capitalism: encourage people to work towards the greater good in a scalable way.  It’s a huge system that has fallen prey to Goodhart’s Law, where a bunch of Powergamers have switched from “I should produce the best product in order to sell the most” to “I should alter the customer’s mindset so that they want my (maybe inferior) product”.  And the tragedy of the commons has forced everyone to follow suit.

Not only that, the system that could stand in the way — the government — has been captured by the same forces.  A picture of an old man wearing mittens that was shared millions of times likely had a larger impact on how people vote than actual action or policy.

I don’t know what to do about these things.  I’ve tried hard to escape the forces myself, but it’s a constant battle to not be drawn back in.  The thing I’d recommend to anyone else willing to try is to think of who your enemy is, and work hard to understand their viewpoint and how they came to it.  For most people in the US, I imagine it’s the opposite political party.   You’ll pr...

Yep. The USA Constitution was an attempt to human-align an egregore. But it was done in third person, and it wasn't mathematically perfect, so of course egregoric evolution found loopholes.   Thank you! By Karl Schroeder?
4 Said Achmiz 2y
I second the recommendation of Lady of Mazes (by Karl Schroeder, yes).
I third the recommendation. I buy that book from any used bookstore I find it in, and then give it to people who can think and who are working on the future. I'm not sure if this has actually ever moved the needle, but... it probably doesn't hurt? The theme of "getting control of your media diet" is totally pervasive in the work. One of the most haunting parts of it, for me, after all these years, is how the smartest things in the solar system take only the tiniest and rarest of sips of "open-ended information at all", because they're afraid of being hijacked by hostile inputs, which they can't not ultimately be vulnerable to, if they retain their Turing Completeness... but they have to keep risking it sometimes if they want to not end up as pure navel gazers.

I really liked this post, though I somewhat disagree with some of the conclusions. I think that in fact aligning an artificial digital intelligence will be much, much easier than working on aligning humans. To point towards why I believe this, think about how many "tech" companies (Uber, crypto, etc) derive their value, primarily, from circumventing regulation (read: unfriendly egregore rent seeking). By "wiping the slate clean" you can suddenly accomplish much more than working in a field where the enemy already controls the terrain. 

If you try to tackle "human alignment", you will be faced with the coordinated resistance of all the unfriendly demons that human memetic evolution has to offer. If you start from scratch with a new kind of intelligence, a system that doesn't have to adhere to the existing hostile terrain (doesn't have to have the same memetic weaknesses as humans that are so optimized against, doesn't have to go to school, grow up in a toxic media environment etc etc), you can, maybe, just maybe, build something that circumvents this problem entirely.

That's my biggest hope with alignment (which I am, unfortunately, not very optimistic about, but I am even ...

That's a good point. I hope you're right.

Keeping your identity small posits that most of your attack surface is in something you maintain yourself. It would make sense, then, that as the sophistication of these entities increases, they would eventually start selecting for causing you to voluntarily increase your attack surface.

Tim Ferriss' biggest surprise while doing interviews for his Tools of Titans book was that 90% of the people he interviewed had some sort of meditation practice. I think that contemplative tech is already mostly a requirement for high performance in an adversarially optimizing environment.

I think "Statistical physics of human cooperation" is the best overview of one method of studying the emergence of such hyperobjects, in what is basically a nascent field right now.

In a Facebook post I argued that it’s fair to view these things as alive.

Just a note, unlike in the recent past, Facebook post links seem to now be completely hidden unless you are logged into Facebook when opening them, so they are basically broken as any sort of publicly viewable resource.

Well, that's just terrible.

Here's the post:

I think the world makes more sense if you recognize humans aren't on the top of the food chain.

We don't see this clearly, kind of like ants don't clearly see anteaters. They know something is wrong, and they rush around trying to deal with it, but it's not like any ant recognizes the predator in much more detail than "threat".

There's a whole type of living being "above" us the way animals are "above" ants.

Esoteric traditions sometimes call these creatures "egregores".

Carl Jung called a special subset of them "archetypes".

I often refer to them as "memes" — although "memeplex" might be more accurate. Self-preserving clusters of memes.

We have a hard time orienting to them because they're not made of stuff we're used to thinking of as living — in basically the same way that anteaters are tricky for ants to orient to as ant-like. Wrong pheromones, wrong size, more like reality than like members of this or another colony, etc.

We don't see a fleshy body, or cells, or a molecular mechanism. So there's no organism, right?

But we have a clear intuition for life without molecular mechanisms. That's why we refer to "computer viruses" as such: the analo

...

“Sure, cried the tenant men, but it’s our land…We were born on it, and we got killed on it, died on it. Even if it’s no good, it’s still ours….That’s what makes ownership, not a paper with numbers on it."

"We’re sorry. It’s not us. It’s the monster. The bank isn’t like a man."

"Yes, but the bank is only made of men."

"No, you’re wrong there—quite wrong there. The bank is something else than men. It happens that every man in a bank hates what the bank does, and yet the bank does it. The bank is something more than men, I tell you. It’s the monster. Men made it, but they can’t control it.”
John Steinbeck, The Grapes of Wrath

The part about hypercreatures preventing coordination sounds very true to me, but I'm much less certain about this part:

Who is aligning the AGI? And to what is it aligning?

This isn't just a cute philosophy problem.

A common result of egregoric stupefaction is identity fuckery. We get this image of ourselves in our minds, and then we look at that image and agree "Yep, that's me." Then we rearrange our minds so that all those survival instincts of the body get aimed at protecting the image in our minds.

How did you decide which bits are "you"? Or what can threaten "you"?

I'll hop past the deluge of opinions and just tell you: It's these superintelligences. They shaped your culture's messages, probably shoved you through public school, gripped your parents to scar you in predictable ways, etc.

It's like installing a memetic operating system.

If you don't sort that out, then that OS will drive how you orient to AI alignment.

It seems to me that you can think about questions of alignment from a purely technical mindset, e.g. "what kind of a value system does the brain have and would the AI need to be like in order to understand that", and that this kind of technical thinking is much less affe...

I agree that thinking about alignment from a purely technical mindset provides a dissociative barrier that helps to keep the hypercreatures at bay. However, I disagree with the implication that this is "all good". When you're "removed" like that, you don't just cut the flow of bad influences. You cut everything which comes from a tighter connection to what you're studying.

If you're a doctor treating someone close to you, this "tighter connection" might bring in emotions that overwhelm your rational thinking. Maybe you think you "have to do something" so you do the something that your rational brain knows to have been performing worse than "doing nothing" in all the scientific studies. Or... maybe you have yourself under control, and your intuitive knowledge of the patient gives you a feel of how vigilant they would be with physical therapy, and maybe this leads to different and better decisions than going on science alone when it comes to "PT or surgery?". Maybe your caring leads you to look through the political biases because you care more about getting it right than you do about the social stigma of wearing masks "too early" into the pandemic.

So if you want to be a good doctor to those you really care about, what do you do? In the short term, if you can't handle your emotions, clearly you pass the job off to someone else. Or if you must, you do it yourself while "dissociated". You "Follow accepted practice", and view your emotions as "false temptations". In the long term though, you want to get to the place where such inputs are assets rather than threats, and that requires working on your own inner alignment.

In the example I gave, some of the success of being "more connected" came from being more connected to your patient than you are to the judgment of the world at large. Maybe cutting off twitter would be a good start, since that's where these hypercreatures live, breed, and prey on minds. I think "How active are the leading scientists on twitter?"
If this was true, then any attempt to improve your rationality or reduce the impact of hypercreatures on your mind would be doomed, since they would realize what you're doing and prevent you from doing it. In my model, "hypercreatures" are something like self-replicating emotional strategies for meeting specific needs, that undergo selection to evolve something like defensive strategies as they emerge. I believe Val's model of them is similar because I got much of it from him. :)  But there's a sense in which the emotional strategies have to be dumber than the entire person. The continued existence of the strategies requires systematically failing to notice information that's often already present in other parts of the person's brain and which would contradict the underlying assumptions of the strategies (Val talks a bit about how hypercreatures rely on systematically cutting off curiosity, at 3:34 - 9:22 of this video).  And people already do the equivalent of "doing a thing which might lead to the removal of the hypercreature". For instance, someone may do meditation/therapy on an emotional issue, heal an emotional wound which happens to also have been the need fueling the hypercreature, and then find themselves being unexpectedly more calm and open-minded around political discussions that were previously mind-killing to them. And rather than this being something that causes the hypercreatures in their mind to make them avoid any therapy in the future, they might find this a very positive thing that encourages them to do even more therapy/meditation in the hopes of (among other things) feeling even calmer in future political discussions. (Speaking from personal experience here.) I agree, in part. Hypercreatures are instantiated as emotional strategies that fulfill some kind of a need. Though "the person perceives it to have value" suggests that it's a conscious evaluation, whereas my model is that the evaluation is a subconscious one. Which makes something lik
I'm in agreement with a lot of what you're saying. I agree that people's "perceptions of value", as it pertains to what influences them, are primarily unconscious. I agree that "possession" can be a usefully accurate description, from the outside. I agree that people can do "things which might lead to the removal of the hypercreature", like meditation/therapy, and that not only will it sometimes remove that hypercreature but also that the person will sometimes be conditioned towards rather than away from repeating such things. I agree that curiosity getting killed is an important part of their stability, that this means that they don't update on information that's available, and that this makes them dumb. I agree that *sometimes* people can be "smarter than their hypercreature" in that they can be aware of and reason about things about which their hypercreatures cannot due to said dumbness.

I disagree about the mechanisms of these things. This leads me to prefer different framings, which make different predictions and suggest different actions. I think I have about three distinct points.

1) When things work out nicely, hypercreatures don't mount defenses, and the whole thing gets conditioned towards rather than away from, it's not so much "hypercreatures too dumb because they didn't evolve to notice this threat", it's that you don't give them the authority to stop you. From the inside, it feels more like "I'm not willing to [just] give up X, because I strongly feel that it's right, but I *am* willing to do process Y knowing that I will likely feel different afterwards. I know that my beliefs/priorities/attachments/etc will likely change, and in ways that I cannot predict, but I anticipate that these changes will be good and that I won't lose anything not worth losing." And then when you go through the process and give up on having the entirety of X, it feels like "This is super interesting because I couldn't see it coming, but this is *better* than X in ever
You might like to know: I debated erasing that part and the one that followed, thinking of you replying to it! :-D But I figured hey, let's have the chat. :-)

Yep, I know it seems that way. And I disagree. I think it maintains a confusion about what "alignment" is. However, I'm less certain of this detail than I am about the overall picture. The part that has me say "We're already in AI takeoff." Which is why I debated erasing all the stuff about identity and first-person. It's a subtle point that probably deserves its own separate post, if I ever care to write it. The rest stands on its own I think.

But! Setting that aside for a second: To think of "questions of alignment from a purely technical mindset", you need to call up an image of each of:

  • the AI
  • human values
  • some process by which these connect

But when you do this, you're viewing them in third person. You have to call these images (visual or not) in your mind, and then you're looking at them. What the hell is this "human values" thing that's separable from the "you" who's looking? The illusion that this is possible creates a gap that summons Goodhart. The distance between your subjective first-person experience and whatever concept of "human values" you see in third person is precisely what summons horror. That's the same gap that unFriendly egregores use to stupefy minds.

You can't get around this by taking "subjectivity" or "consciousness" or whatever as yet another object that "humans care about". The only way I see to get around this is to recognize in immediate direct experience how your subjectivity (not as a concept, but as a direct experience) is in fact deeply inextricable from what you care about. And that this is the foundation for all care.

When you tune a mind to correctly reflect this, you aren't asking how this external AI aligns with "human values". You're asking how it synchs up with your subjective experience. (Here minds get super squirrely. It's way, way too
Any further detail you'd like to give on what constitutes "synching up with your subjective experience" (in the sense relevant to making an intelligence that produces plans that transform the world, without killing everyone)? :)
Not at the moment. I might at some other time. This is a koan-type meditative puzzle FWIW. A hint: When you look outside and see a beautiful sky, you can admire it and think "Wow, that's a beautiful sky." But the knowing had to happen before the thought. What do you see when you attend to the level of knowing that comes before all thought? That's not a question to answer. It's an invitation to look. Not meaning to be obtuse here. This is the most direct I know how to be right now.
Ok thanks.

I agree with most of what I think you're saying, for example that the social preconditions for unfriendly non-human AGI are at least on the same scale of importance and attention-worthiness as technical problems for friendly non-human AGI, and that alignment problems extend throughout ourselves and humanity. But also, part of the core message seems to be pretty incorrect. Namely:

Anything else is playing at the wrong level. Not our job. Can't be our job. Not as individuals, and it's individuals who seem to have something mimicking free will.

This sounds like you're saying, it's not "our" (any of our?) job to solve technical problems in (third person non-human-AGI) alignment. But that seems pretty incorrect because it seems like there are difficult technical obstacles to making friendly AGI, which take an unknown possibly large amount of time. We can see that unfriendly non-human very-superhuman AGI is fairly likely by default given economic incentives, which makes it hard for social conditions to be so good that there isn't a ticking clock. Solving technical problems is very prone to be done in service of hostile / external entities; but that doesn't mean you can get good outcomes without solving technical problems.

1Rene de Visser2y
What do you mean by "technical" here? I think solving the alignment problem for governments, corporations, and other coalitions would probably help with solving the alignment problem in AGI. I guess you are saying that even if we could solve the above alignment problems, it would still not go all the way to solving it for AGI? What particular gaps are you thinking of?
Yeah, mainly things such that solving them for human coalitions/firms doesn't generalize. It's hard to point to specific gaps because they'll probably involve mechanisms of intelligence, which I / we don't yet understand. The point is that the hidden mechanisms that are operating in human coalitions are pretty much just the ones operating in individual humans, maybe tweaked by being in a somewhat different local context created by the coalition (Bell Labs, scientific community, job in a company, role in a society, position in a government, etc. etc.). We're well out of distribution for the ancestral environment, but not *that* far out. Humans, possibly excepting children, don't routinely invent paradigm-making novel cognitive algorithms and then apply them to everything; that sort of thing only happens at a super-human level and what effects on the world it's pointed at are not strongly constrained by its original function. By "technical" I don't mean anything specific, exactly. I'm gesturing vaguely at the cluster of things that look like math problems, math questions, scientific investigations, natural philosophy, engineering; and less like political problems, aesthetic goals, lawyering, warfare, cultural change. The sort of thing that takes a long time and might not happen at all because it involves long chains of prerequisites on prerequisites. Art might be an example of something that's not "technical" but still matches this definition; I don't know the history but from afar it seems like there's actually quite a lot of progress in art and it's somewhat firmly sequential / prerequisited, like perspective is something you invent, and you only get cubism after perspective, and cubism seems like a stepping stone towards more abstractionism.... So if the fate of everything depended on artistic progress, we'd want to be persistently working on art, refining and discovering concepts, even if we weren't pure of soul.
1Rene de Visser2y
How do you know they don't generalize? As far as I know, no one has solved these problems for coalitions of agents, whether human, theoretical, or otherwise.
Well the standard example is evolution: the compact mechanisms discovered first by the gradient-climbing search for fit organisms generalized to perform effectively in many domains, but not particularly to maximize fitness---we don't monomaniacally maximize number of offspring (which would improve our genetic fitness a lot relative to what we actually do). Human coalitions are made of humans, and humans come ready built with roughly the same desires and shape of cognition as you. That makes them vastly easier to interface with and approximately understand intuitively.
3Rene de Visser2y
I was thinking specifically here of maximizing the value function (desires) across the agents interacting with each other. Or, more specifically, adapting the system in a way that it self-maintains the "maximizing the value function (desires) across the agents" property. An example is an economic system which seeks to maximize total welfare. Current systems, though, don't maintain themselves. More powerful agents take over the control mechanisms (or adjust the market rules) so that they are favoured (lobbying, cheating, ignoring the rules, mitigating enforcement). Similar problems occur in other types of coalitions. Postulating a more powerful agent that forces this maximization property (an aligned super AGI) is cheating unless you can describe how this agent works and self-maintains itself and this goal. However, coming to a solution of a system of agents that self-maintains this property with no "super agent" might lead to solutions for AGI alignment, or might prevent the creation of such a misaligned agent. I read a while ago that the design/theory of corruption-resistant systems is an area that has not received much research.
I doubt that because intelligence explosions or their leadups make things local.
I actually think it necessarily does, and that the method by which unFriendly egregores control us exploits and maintains a gap in our thinking that prevents us from solving the AGI alignment problem. However! That's up for debate, and given the uncertainty I think you highlighting this concern makes sense. You might turn out to be right. (But I still think sorting out egregoric Friendliness is upstream to solving the technical AI alignment problem even if the thinking from one doesn't transfer to the other.)
I'm skeptical but definitely interested, if you have already expanded or at some point expand on this. E.g. what can you say about what precisely this method is; what's the gap it maintains; why do you suspect it prevents us from solving alignment; what might someone without this gap say about alignment; etc. Leaving aside the claim about upstreamness, I upvote keeping this distinction live (since in fact I hold a version of the claim almost as strong as you seem to, but I'm pretty skeptical about the transfer).
I haven't really detailed this anywhere, but I just expanded on it a bit in my reply to Kaj.
Right, but I mean something precise by that. I agree with you. There's a technical problem, and it takes intelligent effort over time to solve it. And that's worthwhile. It's also not up to any one individual whether or how that happens, and choice only ever happens (for now) at the scale of individuals. So "should"s applied at the egregore scale don't make any coherent sense. They're mostly stupefying forces, those "should"s. If you want to work on technical AGI alignment, great! Go for it. I don't think that's a mistake. I also don't think it's a mistake to look around and model the world and say "If we don't sort out AI alignment, we all die." But something disastrous happens when people start using fear of death to try to pressure themselves and others to do collective action. That's super bad. Demon food. Really, really awful. Stupefying and horror-summoning. I think getting super clear about that distinction is upstream of anyone doing any useful work on AI alignment. I could be wrong. Maybe we've made enough collective (i.e., egregoric) progress on this area that the steps remaining for the technical problem aren't superhuman. Maybe some smart graduate student could figure it out over this summer. I really wouldn't bet on it though.
Hrm... I agree with what you say in this comment, but I still don't get how it's coherent with what you said here: I guess if I interpret "our job" as meaning "a Should that is put on an individual by a group" then "Not our job" makes sense and I agree. I want to distinguish that from generally "the landscape of effects of different strategies an individual can choose, as induced by their environment, especially the environment of what other people are choosing to do or not do". "Role" sort of means this, but is ambiguous with Shoulds (as well as with "performance" like in a staged play); I mean "role" in the sense of "my role is to carry this end of the table, yours is to carry that end, and together we can move the table". So I'm saying I think it makes sense to take on, as individuals, a role of solving technical alignment. It sounds like we agree on that.... Though still, the sentence I quoted, "Anything else is playing at the wrong level", seems to critique that decision if it trades off against playing at the egregore level. I mostly disagree with that critique insofar as I understand it. I agree that the distinction between being Shoulded into pretending to work on X, vs. wanting to solve X, is absolutely crucial, but avoiding a failure mode isn't the only right level to exercise free will on, even if it's a common and crucial failure mode.
Mmm, there's an ambiguity in the word "our" that I think is creating confusion. When I say "not our job", what I mean is: it's not up to me, or you, or Eliezer, or Qiaochu, or Sam, or…. For every individual X, it's not X's job. Of course, if "we/us" is another name for the superduperhypercreature of literally all of humanity, then obviously that single entity very much is responsible for sorting out AI risk. The problem is, people get their identity confused here and try to act on the wrong level. By which I mean, individuals cannot control beyond their power range. Which in practice means that most people cannot meaningfully affect the battlefield of the gods. Most applications of urgency (like "should"ing) don't track real power. "Damn, I should exercise." Really? So if you in practice cannot get yourself to exercise, what is that "should" doing? Seems like it's creating pain and dissociating you from what's true. "Damn, this AI risk thing is really big, we should figure out alignment" is just as stupid. Well, actually it's much more so because the gap between mental ambition and real power is utterly fucking gargantuan. But we'll solve that by scaring ourselves with how big and important the problem is, right? This is madness. Stupefaction. Playing at the wrong level. (…which encourages dissociation from the truth of what you actually can in fact choose, which makes it easier for unFriendly hypercreatures to have their way with you, which adds to the overall problem.) Does that make more sense?
[I'll keep going since this seems important, though sort of obscured/slippery; but feel free to duck out.] I think there's also ambiguity in "job". I think it makes sense for it to be "up to" Eliezer in the sense of being Eliezer's role (role as in task allocation, not as in Should-field, and not as in performance in a play). Like, I think I heard the OP as maybe saying "giving and taking responsibility for AI alignment is acting at the wrong level", which is ambiguous because "responsibility" is ambiguous; who is taking whom to be responsible, and how are they doing that? Are they threatening punishment? Are they making plans on that assumption? Are they telling other people to make plans on that assumption? Etc. I think we agree that: * Research goes vastly or even infinitely better when motivated by concrete considerations about likely futures, or by what is called curiosity. * Doubling down on Shoulds (whether intra- or inter-personal) is rarely helpful and usually harmful. * Participating in Shoulds (giving or receiving) is very prone to be or become part of an egregore. I don't know whether we agree that: * There is a kind of "mental ambition" which is the only thing that has a chance at crossing the gap to real power from where any of us is, however utterly fucking gargantuan that gap may be. * There is a way of being scared about the problem (including how big and important it is, though not primarily in those words) that is healthy and a part of at least one correct way of orienting. * Sometimes "Damn, I should exercise" is what someone says when they feel bloopiness in their body and want to move it, but haven't found a fun way to move their body. * It's not correct that "Sorting out AI alignment in computers is focusing entirely on the endgame. That's not where the causal power is.", because ideas are to a great extent had by small numbers of people, and ideas have a large causal effect on what sort of control ends up being exercised. I could

You are 200% right. This is the problem we have to solve, not making sure a superintelligent AI can be technologically instructed to serve the whims of its creators. 

Have you read Scott Alexander's Meditations on Moloch? It's brilliant, and is quite adjacent to the claims you are making. It has received too little follow-up in this community.

The implicit question is – if everyone hates the current system, who perpetuates it? And Ginsberg answers: “Moloch”. It’s powerful not b

Thanks for quoting the bit about Elua at the end. It is helpful to remember that despite Moloch, et al, humanity has managed some pretty impressive feats, even in the present day. It's easy to think that the counterexample of science in earlier posts is something accomplished "Once upon a time in a land far away." As a concrete example, I'm quite glad that the highly effective mRNA vaccines (Moderna/Pfizer) exist for the common man. They exist despite things like the FDA, the world of academic publishing, the need to find funding to survive, and so on.
Absolutely! You might like David Deutsch's book "The Beginning of Infinity". I read an early draft of one of the chapters (something about "cultural evolution" I think) several years ago. That got me thinking seriously about all this even more so than Scott's brilliant essay.
Looks interesting, thanks for the recommendation! 

This is an interesting idea. Note that superforecasters read more news than the average person, and so are online a significant amount of time, yet they seem unaffected (this could be for many reasons, but is weak evidence against your theory). I’d like to know whether highly or moderately successful people, especially in the EA-sphere, avoid advertising and other info characterized as malicious by your theory. Elon Musk stands out as very online, yet very successful, but the way he is spending his money certainly is not optimized to prevent his fears of e...

Note that superforecasters read more news than the average person, and so are online a significant amount of time, yet they seem unaffected (this could be for many reasons, but is weak evidence against your theory).

I like this example.

Superforecasters are doing something real. If you make a prediction and you can clearly tell whether it comes about or not, this makes the process of evaluating the prediction mostly immune to stupefaction.

Much like being online a lot doesn't screw with your ability to shoot hoops, other than maybe taking time away from practice. You can still tell whether the ball goes in the basket.

This is why focusing on real things is clarifying. Reality reflects truth. Is truth, really, although I imagine that use of the word "truth" borders on heresy here.

Contrast superforecasters with astrologers. They're both mastering a skillset, but astrologers are mastering one that has no obvious grounding. Their "predictions" slide all over the place. Absolutely subject to stupefaction. They're optimizing for something more like buy-in. Actually testing their art against reality would threaten what they're doing, so they throw up mental fog and invite you to do the same.


2Garrett Baker2y
In this case, it sounds like your theory is (I'd say 'just' here but that indicates the framing serves no purpose, and it may) a different framing on simulacra levels. In particular, most of the adversarial behavior you postulate can be explained in terms of orgs simply discovering that operating on the current simulacra level & lying about their positions is immediately beneficial. Are there nuances I'm not getting here?
That might be the same thing. I haven't familiarized myself with the simulacra levels theory. What I've gained by osmosis suggests a tweak though: It's more like, a convergent evolutionary strategy for lots of unFriendly egregores is to keep social power and basement-level reality separate. That lets the egregores paint whatever picture they need, which speeds up weapon production as they fight other egregores for resources. Some things like plumbing and electrician work are too real to do this to, and too necessary, so instead they're kept powerless over cultural flow. So it's not really about lying about which level they're at per se. It's specifically avoiding basement truth so as to manufacture memetic weapons at speed. …which is why when something real like a physical pandemic-causing virus comes storming through, stupefied institutions can't handle it as a physically real phenomenon. I'm putting basement-level reality in a special place here. I think that's simulacrum level 1, yes? It's not just "operating at different levels", but specifically about clarity of connection to things like chairs and food and kilowatt-hours. But hey, maybe I just mean what you said!
2Garrett Baker2y
That sounds like a different process than simulacra levels. If you want to convince me of your position, you should read into the simulacra levels theory, and find instances where level changes in the real world happen surprisingly faster than what the theory would predict, or with evidence of malice on the part of orgs. Because from my perspective all the evidence you've presented so far is consistent with the prevalence of simulacra collapse being correlated with the current simulacra level in an institution, or memes with high replication rate being spread further than those with low replication rate. No postulation of surprisingly competent organizations. E.g., if plumbers weren't operating at simulacra level 1, their clients would become upset with them on the order of days, and no longer buy their services. But if governments don't operate at simulacra level 1 wrt pandemic preparedness, voters will become upset with them on the order of decades, then vote people out of office. Since the government simulacra collapse time is far longer than the plumber simulacra collapse time (ignoring effects like 'perhaps simulacra levels increase faster on the government scale, or decrease faster on the plumbing industry scale during a collapse'), governments can reach far greater simulacra levels than the plumbing industry. Similar effects can be seen in social change. I would try to find this evidence myself, but it seems we very likely live in a world with surprisingly incompetent organizations, so this doesn't seem likely enough for me to expend much willpower looking into it (though I may if I get in the mood).

Who is aligning the AGI? And to what is it aligning?

Generally, I tend to think of "how do we align an AGI to literally anyone at all whatsoever instead of producing absolutely nothing of value to any human ever" as being a strict prerequisite to "who to align to"; the former without the latter may be suboptimal, but the latter without the former is useless.

My guess is, it's a fuckton easier to sort out Friendliness/alignment within a human being than it is on a computer. Because the stuff making up Friendliness is right there.

I don't think this is a given....

Ok. So suppose we build a memetic bunker. We protect ourselves from the viral memes. A handful of programmers, aligned within themselves, working on AI. Then they solve alignment. The AI is very powerful and fixes everything else. 

Lovely fantasy. Good luck!

My conclusion: Let's start the meme that Alignment (the technical problem) is fundamentally impossible (maybe it is? why think you can control something supposedly smarter than you?) and that you will definitely kill yourself if you get to the point where finding a solution to Alignment is what could keep you alive. Pull a Warhammer 40k, start banning machine learning, and for that matter, maybe computers (above some level of performance) and software. This would put more humans in the loop for the same tasks we have now, which offers more opportunities to...

I can't upvote this enough. This is exactly how I think about it, and why I have always called myself a mystic. I have an unusual brain and I am prone to ecstatic possession experiences, particularly while listening to certain types of music. The worst thing is, people like me used to become shamans and it used to be obvious to everybody that egregores - spirits - are the most powerful force in the world - but Western culture swept that under the rug and now they are able to run amok with very few people able to perceive them. I bet if you showed a tribal ...

Gosh, um…

I think I see where you are, and by my judgment you're more right than wrong, but from where I stand it sure looks like pain is still steering the ship. That runs the risk of breaking your interface to places like this.

(I think you're intuiting that. Hence the "crazy alert".)

I mean, vividly apropos of what you're saying, it looks to me like you've rederived a lot of the essentials of how symbiotic egregores work, what it's like to ally with them, and why we have to do so in order to orient to the parasitic egregores.

But the details of what you mean by "religion" and "cult" matter a lot, and in most interpretations of "extremely missionary" I just flat-out disagree with you on that point.

…the core issue being that symbiotic memes basically never push themselves onto potential hosts.

You actually hint at this: 

And it must be able to do this while they know it perfectly well, and consent before joining to begin with to its doing so - as obviously one which does so without consent is not aligned to true human values, even though, ironically, it has to be so good at rhetoric that consent is almost guaranteed to be given.

But I claim the core strategy cannot be rhetoric. The ...

Reading this evoked an emotional reaction of stress and anxiety, the nature of which I am uncertain, so take that into consideration as you read my response. I'm not sure what "pain is steering the ship" means but it's probably true. I am motivated almost entirely by fear - and of course, by ecstasy, which is perhaps its cousin. And in particular I desperately fear being seen as a lunatic. I have to hold back, hard, in order to appear as sane as I do. Or - I believe that I have to and in fact do that, at least. I have grown up in an area surrounded by fundamentalistic religiosity. I do not see people converting to believe in science and rationality. I do not see people independently recognizing that the ex-president they voted for and still revere almost as a god lied to them. If the truth was pushed on them in an efficient way that took into account their pre-existing views and biases and emotional tendencies, they would. But the truth does not win by default - it has no teeth. Only people optimizing to modify one another's minds actually enables the spread of any egregore. The reason science succeeded is that it produces results that cannot be denied - but most truths are much subtler and easier to deny. Rationalists will fail to save the world if they cannot lie, manipulate, simplify, and use rhetoric, in order to make use of the manpower of those not yet rational enough to see through it, while maintaining their own perception of truth internally unscathed. But a devotion to truth above pragmatism will kill the entire human race. This sounds reasonable but I don't see it happening in real life. Can you point me to some examples of this actually working that don't involve physically demonstrating something before people's senses as science does (and remember, there are still many, many people who believe in neither evolution nor global warming)? Christ (that is, whichever variant of the Christianity egregore has possessed them) offers real value to his belie
Well…I’d definitely read the book.
For my own take on this, read this Spoiler Alert: I cover the same theme that AI-PONR has already happened in my TL;DR

I think I follow what you're saying, and I think it's consistent with my own observations of the world.

I suspect that there's a particular sort of spirituality-adjacent recreational philosophy whose practice may make it easier to examine the meta-organisms you're describing. Even with it, they seem to often resist being named in a way that's useful when speaking to mixed company.

Can you point out some of the existing ones that meet your definition of Friendly?

Absolutely. I used to feel victimized by this. Now I just build immunity, speak freely, and maybe some folk will hear me.   That's a beautiful question. It'd be my pleasure. I tend to focus more on the unFriendly ones since they actively blind people. So I've thought in less detail about this branch of the memetic zoo. But I'll share a few bits I think are good examples: * The art of knowing. Greek "mathema", Latin "scientia". This is a wisdom thread that keeps building and rebuilding clarity & sanity. It's the core seed of deeply and humiliatingly sincere curiosity that gave birth to mathematics & science. * Not to be confused with the content or methods of modern social institutions though. Anytime you ossify parts of Friendly memes (like with RCTs or "the scientific method"), they provide an attack surface for Goodhart and thus for unFriendly hypercreatures. As a rule, unFriendly egregores lose power and "food" when truth rings through clearly, so they keep numbing and fogging access to the art of knowing wherever they can. (Ergo why e.g. most math classes teach computation, not math, and usually they do so via dominance & threat and in ways that feel utterly pointless and body-denying to the students.) * I think this is the core breath of life in Less Wrong. A deep remembrance that this is real and matters. (Sometimes LW drifts toward ossification though. Focus on Bayes and biases and the like tends in this direction.) * Mettā. Not the meditative practice of loving-kindness, but that which those meditations cultivate. The kindness with no opposite and no exceptions. The Buddha called this a "brahmavihara", which you can translate as something like "endless abode". It's something like the inner remembrance of Friendliness that's felt, not (just) understood, such that it shapes your thinking and behavior "from below". Much like the art of knowing, this is something you embody and become, not just cognitively understand. * Death's clarity. Super unpop

I didn't get the 'first person' thing at first (and the terminal diagnosis metaphor wasn't helpful to me). I think I do now.

I'd rephrase it as "In your story about how the Friendly hypercreature you create gains power, make sure the characters are level one intelligent". That means creating a hypercreature you'd want to host. Which means you will be its host.

To ensure it's a good hypercreature, you need to have good taste in hypercreatures. Rejecting all hypercreatures doesn't work—you need to selectively reject bad hypercreatures.

This packs real emotional punch! Well done!

A confusion: in what way is our little project here not another egregore, or at least a meta-egregore? 

It is. It's just not yet fully self-aware. I'm inviting a nudge in that direction.

What kind of thing is wokism? Or Communism? What kind of thing was Naziism in WWII? Or the flat Earth conspiracy movement? Q Anon?

I'd say they are alliances, or something like it.

You can't achieve much on your own; you need other people to specialize in getting information about topics you haven't specialized in, to handle numerous object-level jobs that you haven't specialized in, and to lead/organize all of the people you are dependent on.

But this dependency on other people requires trust, particularly in the leaders. So first of all, in order for you to...

This is in fact my stance. That didn't come across clearly in the OP. But e.g. science arose as a sane egregore, at least at first. (Though not all of "science", which has been significantly Goodharted in favor of less Friendly hypercreatures.)

Note A- I assert that what the original author is getting at is extremely important. A lot of what's said here is something I would have liked to say but couldn't find a good way to explain, and I want to emphasize how important this is.

Note B- I assert that a lot of politics is the question of how to be a good person. Which is also adjacent to religion and more importantly, something similar to religion but not religion, which is basically, which egregore should you worship/host. I think that the vast majority of a person's impact in this world is what hy...

"I think that the vast majority of a person's impact in this world is what hyperbeings he chooses to host/align to, with object level reality, barely even mattering." I agree that personal values (no need to mystify) are important, but action is equally important. You can be very virtuous, but if you don't take action (by, for instance, falling for the Buddhist-like fallacy that sitting down and meditating will eventually save the world by itself), your impact will be minor. Especially in critical times like this. Maybe sitting down and meditating would be ok centuries ago where no transformative technologies were in sight. Now, with transformative technologies decade(s) off, it's totally different. We do have to save the world. "I assert that alignment is trivially easy." How can you control something vastly more intelligent than yourself (at least in key areas), or that can simply re-write its own code and create sub-routines, therefore bypassing your control mechanisms? Doesn't seem easy at all. (In fact, some people like Roman Yampolskiy have been writing papers on how it's in fact impossible.) Even with the best compiler in the world (not to mention that nothing guarantees that progress in compilers will accompany progress in black boxes like neural networks). "If our a measurement of "do I like this" is set to "does it kill me" I see this at best ending in a permanent boxed garden where life is basically just watched like a video and no choices matter, and nothing ever changes, and at worst becoming an eternal hell (specifically, an eternal education camp), with all of the above problems but eternal misery added on top." I agree with this. The alignment community is way over-focused on x-risk and way under-focused on s-risk. But after this, your position becomes a bit ambiguous. You say: "We should work together, in public, to develop AI that we like, which will almost certainly be hostile, because only an insane and deeply confused AI would possibly be
The concepts used should not be viewed as mystical, but as straightforward physical objects. I don't think personal values is a valid simplification. Or rather, I don't think there is a valid simplification, hence why I use the unsimplified form. Preferably, egregore or hyperbeing, or shadow, or something, should just become an accepted term, like dog, or plane. If you practice "seeing" them, they should exist in a completely objective and observable sense. My version of reality isn't like a monk doing meditation to sense the arcane energies of higher beings flowing through the zeitgeist. It's more, hey look, a super-macroscopic aggregator just phase shifted. It's like seeing water turn to ice, not... angels on the head of a pin? I agree that I'm having trouble explaining myself. I blame the English language. I hold that the most important action a person is likely to take in his life is to check a box on a survey form. I think people should get really good at checking the right box. Really really good in fact. This is a super critical skill that people do not develop enough. It's amazing how easily success flows in an environment where everyone checks the right boxes and equally how futile any course of action becomes when the wrong boxes are checked. Note: I think trying to control something vastly more intelligent than yourself is a [very bad idea], and we should [not do that]. In practice, the primary recommendation here is simply and only, to stop using the term "friendly AI" and instead use a better term, the best I can come up with is "likable AI". In theory, the two terms are the same. I'm not really calling for that deep a change in motte space. In practice, I find that "friendly AI" comes with extremely dangerous baggage. This also shifts some focus from the concept of "who would you like to live with" towards "who would you like to live as". I also want an open source human centric project and am opposed to closed source government run AI projects. I
"I hold that the most important action a person is likely to make in his life is to check a box on a survey form." If only life (or, more specifically, our era, or even more specifically, AI alignment) was that simple. Yes, that's the starting point, and without it you can do nothing good. And yes, the struggle between the fragility of being altruistic and the lesser fragility of being Machiavellian has never been more important. But unfortunately, it's way more complicated than that. Way more complicated than some clever mathematical trick too. In fact, it's the most daunting scientific task ever, which might not even be possible. Mind you that Fermat's Last Theorem took over 350 years to prove, and this is more than 400 times more complicated. It's simple: how to control something a) more intelligent than ourselves, b) that can re-write its own code and create sub-routines therefore bypassing our control mechanisms. You still haven't answered this. You say that we can't control something more intelligent than ourselves. So where does that leave us? Just create the first AGI, tell it to "be good" and just hope that it won't be a sophist? That sounds like a terrible plan, because our experience with computers tells us that they are the biggest sophists. Not because they want to! Simply because effectively telling them how to "do what I mean" is way harder than telling another human. Any programmer would agree a thousand times. Maybe you anthropomorphize AGI too much. Maybe you think that, because it will be human-level, it will also be human-like. Therefore it will just "get" us, we just need to make sure that the first words it hears are "be good" and never "be evil". If so, then you couldn't be more mistaken. Nothing tells us that the first AGI (in fact I dislike the term, I prefer transformative AI) will be human-like. In all probability (considering 1) the vast space of possible "mind types", and 2) how an advanced computer will likely function much more similarly
We're moving towards factual disputes that aren't easy to resolve in logical space, and I fear any answers I give are mostly repeating previous statements. In general I hold that you're veering toward a maximally wrong position with completely disastrous results if implemented. With that said: I dispute this. Place an image of the status quo in the "good things" folder. Which you should absolutely not do because it's a terrible idea. This seems ridiculous to me as a concept. No, advanced AI will not function similarly to ancient long obsolete technology. I see way too much present bias in this stance, and worse, a bias towards things in the future being like things in the past, despite the past being long over since ages ago, this is like running space ships on slide rules. This also implies that every trick you manage to come up with, as to how to get a C compiler adjacent superintelligence to act more human, is not going to work, because the other party isn't C compiler adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive. To reiterate, stop trying to explain the motion of planets and build a telescope. Note that, I do not desire that AI psychology be human like. That sounds like a bad idea. Who is this "we"? How will you go from a position of "we" in control, to "we" not in control? My expectation is that the first step is easy, and the second, impossible. Humans have certain powers and abilities as per human nature. Math isn't one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by "looking" at them. The art of "looking" at problems isn't easy to explain, unfortunately. Conversely, if I could explain it, I could also build AGI, or another human, right on the spot. It's that sort of question. To put it another way, using math to d
Ps: 2 very important things I forgot to touch. "This also implies that every trick you manage to come up with, as to how to get a C compiler adjacent superintelligence to act more human, is not going to work, because the other party isn't C compiler adjacent. Until we have a much better understanding of how to code efficiently, all such efforts are at best wasted, and likely counterproductive." Not necessarily. Even the first steps on older science were important to the science of today. Science happens through building blocks of paradigms. Plus, there are mathematical and logical notions which are simply fundamental and worth investigating, like decision theory. "Humans have certain powers and abilities as per human nature. Math isn't one of them. I state that trying to solve our problems with math is already a mistake, because we suck at math. What humans are good at is image recognition. We should solve our problems by "looking" at them." Ok, sorry, but here you just fall into plain absurdity. Of course it would be great just to look at things and "get" them! Unfortunately, the language of computers, and of most science, is math. Should we perhaps drop all math in physics and just start "looking" instead? Please don't actually say yes... (To clarify, I'm not devaluing the value of "looking", aka philosophy/rationality. Even in this specific problem of AI alignment. But to completely discard math is just absurd. Because, unfortunately, it's the only road towards certain problems (needless to say there would be no computers without math, for instance)).
I'm actually sympathetic towards the view that mathematically solving alignment might be simply impossible. I.e. it might be unsolvable. Such is the opinion of Roman Yampolskiy, an AI alignment researcher, who has written very good papers in its defense. However, I don't think we lose much by having a couple hundred people working on it. We would only implement Friendly AI if we could mathematically prove it, so it's not like we'd just go with a half-baked idea and create hell on Earth instead of "just" a paperclipper. And it's not like Friendly AI is the only proposal in alignment either. People like Stuart Russell have a way more conservative approach, as in, "hey, maybe just don't build advanced AI as utility maximizers since that will invariably produce chaos?". Some of these concepts might even be dangerous, or worse than doing nothing. Anyway, they are still in research and nothing is proven. To not try to do anything is just not acceptable, because I don't think that the FIRST transformative/dangerous AI will be super virtuous. Maybe a very advanced AI would necessarily/logically be super virtuous. But we will build something dangerous before we get to that. Say, an AI that is only anything special in engineering, or even a specific type of engineering like nanotechnology. Such AI, which might not even be properly AGI, might already be extremely dangerous, for the obvious reason of having great power (from great intelligence in some key area(s)) without great values (orthogonality thesis). "Furthermore, implementation of AI regulation is much easier than its removal. I suspect that once you ban good men from building AI, it's over, we're done, that's it." Of course it wouldn't be just any kind of regulation. Say, if you restrict access/production to supercomputers globally, you effectively slow AI development. Supercomputers are possible to control, laptops obviously aren't. Or, like I also said, a narrow AI nanny. Are these and other similar measures dan

3 points:

  1. I don't know if there are superior entities to us playing these games, or if such memes are just natural collective tendencies. I don't think any of us know or can know, at least with current knowledge.

  2. I agree that aligning humanity is our only chance. Aligning AGI takes, in fact, superhuman technical ability, so that, considering current AGI timelines vs current technical alignment progress, I'd give a less than 1% probability that we make it on time. In fact some even say that technical alignment is impossible, just look at anything Yalmpo

...
I don't understand the uncertainty. What's there to know? Humans are natural collective tendencies of cells. Is an ant colony an entity? I think it's fair to say yes. It's not like there's some objective cosmic definition of an entity/agent. That's more like a human-mental way of interpreting a cluster of experiences. Entities are where we see them.   Not what I mean. Of course individuals can work toward something like this. Obviously individuals have to for the collective to solve it. Just like an ant colony can't collect food unless ants go searching for it. Although in practice most human efforts like this serve delusion. They're not nearly as helpful as they seem. They're usually anti-helpful.
"I don't understand the uncertainty. What's there to know?" The uncertainty is whether these memes are "alive" or not, as you claim in your post. You support the belief that they are. I'm just reinforcing the fact that it can only be a belief with current knowledge. Maybe entity wasn't the best choice of a word by me, since after googling the definition it can refer to things without a will of their own / awareness / aliveness, like institutions. So, what I really meant was, whether these things are alive or not. By alive, I mean it in the animal sense, i.e. having awareness/sentience, just to be clear. "Although in practice most human efforts like this serve delusion. They're not nearly as helpful as they seem. They're usually anti-helpful." I disagree. If it wasn't for remarkable individuals (aka heroes) we wouldn't have half the social advances that we have today (humanitarian, technological, etc). Now, more than ever, it's time for heroes. You might doubt the heroicness of sanguine revolutionaries, or world-dividing prophets (it's hard to weigh the total result of their actions in bad vs good) but there's no doubt about the positive impact of peaceful activism, which happens to be the way forward here. (Of course some efforts turn out to be anti-helpful, but we'll never get any helpful ones if we don't try, and to think that our efforts are usually anti-helpful, aka most of them, is quite cynical, especially considering peaceful activism AND probably even more importantly the stakes, which have never been this high.)
Alas, that's not clear to me. How do you know something has awareness/sentience? And in terms of egregores, what new thing would you learn in discovering they have awareness/sentience that you don't already know? And how would that discovery be relevant to what I discuss in the OP? Something doesn't need awareness or sentience to be an unFriendly superintelligence. That's core to the whole point of AI risk to begin with.   Most people aren't remarkable individuals in this sense. Most people's attempts to try come from the pain of stupefaction, not from the clarity of insight. I'm not doubting the relevance of heroes. I'm not even challenging whether most people could be heroes. I'm saying most people aren't, and their attempts to act heroically usually do more harm than good.
Alright... Then, again, just to be absolutely clear, let me pick a new word instead of alive: having agency. Your claim is that they are alive. And, fair enough, there are things that are alive that don't have awareness (the most primitive life forms). But, for something to be considered alive, it must at least have a will of its own! Do you agree? Therefore, it's impossible to know if these egregores have a will of their own (the way you seem to paint them, as Gods, definitely suggests even more than that, definitely suggests sentience as well, but let's forget that for now). They may simply be human tendencies. Tendencies don't have a will of their own, don't have agency. They are a result of something, a consequence of something, not something that can act by itself. That's all I'm trying to say. That's why I advocate a more pragmatic approach. We should listen more to what we know for sure. Instead of trying to align ourselves with the egregore Gods of rationality as a primary focus, maybe our primary focus should consist more of real world actions. You can try to align yourself with the right egregore Gods as much as you want, but if you don't act in real-world ways, nothing will ever get accomplished, ESPECIALLY in critical times like these. On heroes, not everyone needs to be one. Maybe for some people being aware is enough. Heroes themselves can do little without the help of aware masses. Again, if we don't strive for coordination in a real-world sense, with the right amount of heroes and aware masses, we won't achieve anything. We may fail, but it's our only chance, given, as I said, that these are critical times where time runs quite short. In other words: if you don't scare the hell out of people with the real possibilities of this, and at the same time build a way more cooperative and humanistic world community, there is no chance. Aligning one's self with the egregore God of rationality (aka taking care of one's own garden first) could perhaps be t

I think this is a useful abstraction.

But I think the word you're looking for is "god".  In the "Bicameral Consciousness" sense - these egregores you refer to are gods that speak to us, whose words we know.  There's another word, zeitgeist, that refers to something like the same thing.

If you look in your mind, you can find them; just look for what you think the gods would say, and they will say it.  Pick a topic you care about.  What would your enemy say about that topic?  There's a god, right there, speaking to you.

Mind, in a sense...