
This question is inspired by 1a3orn's comment about troubling signs of epistemic issues in LW's alignment field.

I'll quote the comment here to tell you what I mean:

I think that the above is also a good explanation for why many ML engineers working on AI or AGI don't see any particular reason to engage with or address arguments about high p(doom).

When from a distance one views a field that:

Has longstanding disagreements about basic matters

Has theories -- but many of the theories have not resulted in really any concrete predictions that differentiate from standard expectations, despite efforts to do so.

Will continue to exist regardless of how well you criticize any one part of it.

There's basically little reason to engage with it. These are all also evidence that there's something epistemically off with what is going on in the field.

Maybe this evidence is wrong! But I do think that it is evidence, and not-weak evidence, and it's very reasonable for a ML engineer to not deeply engage with arguments because of it.

So I want to ask a question: How seriously should we take the hypothesis that LW is totally wrong on AI?

Specifically, this splits into several subquestions:

  1. What's the chance that AI doesn't have that much of an impact on the world by 2100?

  2. What's the chance that we do have massive impacts, but alignment is so easy that standard ML techniques work?

  3. How well does the epistemic process on LW work? Are there any changes you would make to LW's epistemic processes?

I welcome all answers, and I especially welcome any critics of LW/negative answers to at least answer one of the questions I have.

Edit: For people that don't have a specific scenario in mind, I'll ask a specific question. It doesn't have to be answered, but any answers on this question are appreciated, especially from critics of the "AI is significant" idea.

1a. What is the probability that the Explosion or Collapse scenario from Cold Takes happens by 2100?

Link to the scenarios below:

https://www.cold-takes.com/this-cant-go-on/


3 Answers

jimrandomh


What's the chance that AI doesn't have that much of an impact on the world by 2100?

Honestly, that one belongs in the settled-questions bin next to theism. Making intellectual progress requires having spaces where the basics can be taken for granted, for a definition of "the basics" that's for people trying to contribute at the intellectual frontier, rather than for the human population at large.

How well does the epistemic process on LW work? Are there any changes you would make to LW's epistemic processes?

This is never going to be perfect, anywhere, and people should always be on the lookout for epistemic problems. But there's a pretty strong outside-view reason to think LW's epistemics will outperform those of the rest of the world: it's full of people investing heavily in improving their epistemics, and having abstract discussions about them. 

What's the chance that we do have massive impacts, but alignment is so easy that standard ML techniques work?

I think this is the core question, though it's a slightly incorrect framing. I also think this is the core point of disagreement between the AGI Ruin perspective and the AI Accelerationist perspective.

How hard alignment is, is a continuous variable, not a boolean. The edges of the range are "it's borderline impossible to solve before time runs out" and "it's trivial and will solve itself". The same applies to framing specific research as capabilities research or as alignment research: a lot of work lives in the border region, where it makes more sense to think in terms of a ratio between the two.

I don't think the people leading and working in AGI research programs think alignment is easy. I do think that they think that it's easier, by a large enough amount to change their view of the cost-benefit of accelerating the timelines. And because this is a continuous variable with a lot of inputs, expanding it out doesn't yield a single large crux that distinguishes the two camps, but rather a large number of smaller, unshared differences in belief and intuition.

(I lean more towards the "it's hard" side, but am decidedly not on the edge of the scale; I think it's likely to be close enough that individual insights into alignment, and modest changes to research timelines, could potentially be decisive. I also think that my difficulty-estimation could move substantially in either direction without changing my beliefs about the correct course of action, due to a "playing to outs" argument.)
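To make the "playing to outs" point concrete, here's a minimal sketch; every probability and payoff below is invented purely for illustration, not anyone's actual estimate.

```python
# Toy "playing to outs" calculation. All probabilities and payoffs are
# invented for illustration; none are real estimates.

worlds = {
    # name: (probability, outcome if we push hard on alignment, outcome if we don't)
    "alignment is trivial":       (0.2, 1.0, 1.0),  # effort changes nothing; things go well anyway
    "alignment is impossible":    (0.3, 0.0, 0.0),  # effort changes nothing; things go badly anyway
    "alignment is on the margin": (0.5, 1.0, 0.0),  # effort flips the outcome
}

def expected_value(push_on_alignment: bool) -> float:
    """Expected outcome over all worlds for a given choice of action."""
    return sum(p * (win if push_on_alignment else lose)
               for p, win, lose in worlds.values())

print(expected_value(True), expected_value(False))  # 0.7 vs 0.2
# Shift probability mass between "trivial" and "impossible" and both numbers
# move, but "push on alignment" stays the better action as long as any mass
# sits on the marginal worlds -- you play to the outs.
```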

Honestly, that one belongs in the settled-questions bin next to theism. Making intellectual progress requires having spaces where the basics can be taken for granted, for a definition of "the basics" that's for people trying to contribute at the intellectual frontier, rather than for the human population at large.

Strong downvoted for tone on this: the reason it belongs in the settled bin is because it's really easy to answer the question. Simply, AI has already had an enormous impact, and more of the same would be pretty damn world-changing.

Agree voted.

[anonymous]
  1. This probability is the probability of a non-AI apocalypse (large asteroid impact, nuclear war, alien invasion, vacuum collapse, etc.). Basically, assuming nothing stops humans from continuing to improve AI, the chance of "not much impact" is precisely 0. It's 0 because either it already had an impact or it will in the very near future, with just slight and obvious improvements to the AI systems that already exist. What sort of future history would have "no significant impact", and HOW? This is like asking, after the first Trinity fission weapon test, what the probability is that by 2022 there would be "no significant impact" from nuclear weapons. It's 0 - already the atmosphere of the Earth was contaminated; we just didn't know it.

  2. This is very possible. Complex deception and unstoppable plans to conquer the planet and so on require specific setups for the agent, like "long term reward". Actual models are inherently myopic, due to how they are trained and the limits on their computational resources (see the sketch after this list). This means a "paperclip production agent" is probably more likely to spend all its compute optimizing small variables like air temperature differences and other parameters to accelerate the robots producing paperclips than to invest in a multi-year plan to take over the planet that will let it tile the solar system in paperclip plants after it wins a world war.

  3. I think it isn't productive to say "let's not talk about how we would improve capabilities." Modeling how future systems are likely to actually work helps to model how you might restrict their behavior effectively.
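Here is the sketch referenced in point 2: a toy illustration of why heavy per-step discounting makes long-horizon plans nearly worthless to the agent. The discount factor and step rate are made-up numbers for the sketch, not properties of any actual system.

```python
# Toy illustration of myopia via discounting. The discount factor and the
# "one decision step per second" rate are assumptions made up for this sketch.
gamma = 0.99               # assumed per-step discount factor
steps_per_day = 86_400     # pretend the agent takes one decision step per second

def weight_on_future_reward(days_ahead: float) -> float:
    """How much the agent values reward arriving this many days from now."""
    return gamma ** (days_ahead * steps_per_day)

for days in (0.001, 0.01, 0.1, 1.0):
    print(f"{days:>6} days ahead -> weight {weight_on_future_reward(days):.3e}")
# Under these assumptions the weight is already ~2e-38 a tenth of a day out and
# underflows to zero at a full day, so a multi-year takeover plan contributes
# essentially nothing to the objective next to immediate paperclip throughput.
```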

lc

What sort of future history would have "no significant impact" and HOW? This is like asking after the first Trinity fission weapon test what the probability by 2022 there would be "no significant impact" from nuclear weapons. It's 0 - already the atmosphere of the earth was contaminated we just didn't know it.

Zero is not a probability. What if Japan had surrendered before the weapons could be deployed, and the Manhattan project had never been completed? I could totally believe in a one in one hundred thousand probability that nuclear weapons just never saw proliferation, maybe more.

[anonymous]
What if Japan had surrendered before the weapons could be deployed, and the Manhattan project had never been completed? I could totally believe in a one in one hundred thousand probability that nuclear weapons just never saw proliferation, maybe more.

Specifically, I am referring to after the Trinity test. They had assembled a device and released about 25 kilotons. I am claiming that for AI, the "Trinity test" already happened - LLMs, game-playing RL agents that beat humans, and all the other 2022 AI results show that larger yields are possible. Trinity had "no direct significance" on the world in that it didn't blow up a city, and the weapon wasn't deployable on a missile, but it showed both were readily achievable ("make a reactor and produce plutonium or separate out U-235") and that fission yield was real.

In our world, we don't have AI better than humans at everything, and they aren't yet affecting real-world products much, but I would argue that the RL agent results show that the equivalent of "fission yield", superintelligence, is possible and also readily achievable (big neural network, big compute, big training data set = superintelligence).

After the Trinity test, even if Japan had surrendered, the information had already leaked, and motivated agents - all the world powers - would have started power-seeking to make nukes. What sort of plausible history has them not doing it? A worldwide agreement that they won't? What happens when, in a nuke-free world, one country attacks another? Won't the target start rush-developing fission weapons? Won't the other powers learn of this and join in? It's unstable. A world where all the powers agree not to build nukes is not a stable one; any perturbation will push it to history states closer to our real timeline. I would argue that a world with agreements not to build AGI is similarly not stable.
Celarix
I'd imagine Gerald's "probability 0" is something like Metaculus's "resolved as yes" - that is, the event in question has already happened.
[anonymous]
Right. Because either you believe AI has already made an impact (it has, see recsys for production use of AI that matters) or it will imminently. The true probability when metaculus resolves as yes isn't actually zero but the chance you get forecaster credit if you are on the wrong side of the bet IS.
Lone Pine
What if you just s/0/epsilon/g ?
Noosphere89
I definitely agree that his confidence in the idea that AI is significant is unjustifiable, but 0 is a probability; it's just the extreme point where improbability shades into impossibility. And that's coming from me, someone who does believe that AI being significant has a pretty high probability.
[anonymous]
Right. And I am saying it is impossible, except for the classes of scenarios I mentioned, due to the fact that transformative AI is an attractor state. There are many possible histories, and many possible algorithms humans could try, or current AI recursively self-improving could try. But the optimization arrow is always in the direction of more powerful AI, and this is recursive. Given sufficient compute it's always the outcome.

It's kinda like saying "the explosives on a fission bomb have detonated and the nuclear core is to design spec. What is the probability it doesn't detonate?" Essentially 0. It's impossible. I will acknowledge there is actually a possibility that the physics work out where it fails to have any fission gain and stops, but it is probably so small it won't happen in the lifespan of the observable universe.
[anonymous]
Can you explain why it's "unjustifiable"? What is a plausible future history, even a merely possible one, free of apocalypse, where humans plus existing AI systems fail to develop transformative systems by 2100?
Noosphere89
I think that I don't have a plausible story, and I think very high 90%+ confidence in significant impact is reasonable. But the issue I have is roughly that probabilities of literally 100%, or only a little lower, are unjustifiable, because we must always reserve some probability for "our model is totally wrong." I do think very high confidence is justifiable, though.
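As a minimal sketch of that reservation (the numbers are placeholders, not my actual estimates): mixing in even a small probability that the model itself is wrong caps your overall confidence below 1.

```python
# Cromwell's-rule style cap on confidence. All numbers are placeholders.
p_inside_model = 1.0    # "AI has a significant impact by 2100" is certain *if* the model is right
p_model_wrong  = 0.02   # placeholder probability that the whole model is off
p_if_wrong     = 0.5    # what you'd guess knowing only that your model failed

p_overall = (1 - p_model_wrong) * p_inside_model + p_model_wrong * p_if_wrong
print(p_overall)  # 0.99 -- very high confidence, but never literally 100%
```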
[anonymous]
I accept that having some remaining probability mass for "unknown unknowns" is reasonable. And you can certainly talk about ideas that didn't work out even though they had advantages and existed 60 years ago. Jetpacks, that sort of thing. But if you do more than a cursory analysis you will see the gain from a jetpack is that you save a bit of time at the risk of your life, absurd fuel consumption, high cost, and deafening noise to your neighbors. The gain isn't worth it.

The potential gain of better AI unlocks most of the resources of the solar system (via automated machinery that can manufacture more automated machinery) and makes world conquest feasible. It's literally a "get the technology or lose" situation. All it takes is a belief that another power is close to having AI able to operate self-replicating machinery and you either invest in the same tech or lose your entire country. Sort of how right now Google believes they either release a counter to BingGPT or lose their company.

So yeah, I don't see a justification for even 10 percent doubt.
[anonymous]
0 is a perfectly valid probability estimate. Obviously the chance that an event observed to have happened did not in fact happen is... ok, fair. Maybe not zero. You could be mistaken about the ground truth of what happened. So for instance, if AIs take over, wipe everyone's memories, and put them in a simulation, the observed probability in 2100 is that AI didn't do anything.

Even if we have short-term myopic AI, misaligned AI is still misaligned. It looks like social media algorithms promoting clickbait, like self-driving cars turning themselves off half a second before an inevitable crash, like chatbot recommendation systems telling you what you want to hear, never mind if it's true.

This is a world where AI is widely used, and is full of non-world-destroying bugs. Self-driving cars have been patched and twiddled until they usually work. But on Tuesdays when the moon is waning, they will tend to change lanes into the rightmost lane, and no one knows why.

[anonymous]
This is a suboptimal world that is also one where humans could survive.  You're describing dangers that are usually not worse than what we already live (and die) with.  (the SDCs can easily be measurably a lot better than human drivers and human pilots, even if they do have odd tendencies) Note the "SDC turning itself off half a second before a crash" is actually Working as Intended.  If a crash is detected as inevitable - because the high level policy failed to work - you want a microcontroller running plain C code to order maximum braking force.  Current systems more or less do work this way.
Donald Hobson
I wasn't claiming this was a particularly bad world. I was just disagreeing with the idea that myopic AI = aligned AI. As for the turning-itself-off thing, I was thinking of the Tetris bot that learned to pause the game.

I think that it is wrong. If, instead of dropping nukes on mostly wooden cities, they had used them against enemy troops (or ships, or even cities which aren't built out of bamboo), the conclusion would have been that a nuke is a not-that-powerful and cost-inefficient weapon.

As for "significant impact" - what impact counts as "significant"? Here are some technologies which on my opinion had no significant impact so far:

  • human spaceflight
  • superconductors
  • genetic engineering

It is totally possible that AI goes into the same bag.

[anonymous]
Human spaceflight - you're correct.

Superconductors - there is a large amount of scientific and research equipment that relies on superconductors. The simplest example is NMR magnets. Chemists would not be as productive, but you can argue they would have used alternative technologies not fully exploited - in a universe without superconductivity but with all the other exploitable natural phenomena we have. So semi-correct.

Genetic engineering - were you aware that most of the USA corn crop is GMO? And other deliberate changes? The mRNA vaccines were all based on it.

I suspect you meant to make narrower claims: (1) superconductivity for power transmission and maglev transport, (2) human genetic engineering. I would agree completely with your narrower claims.

And I would then ask you to examine the "business case" in the era these things were first discovered. Explain how:

  1. human spaceflight

  2. superconducting power transmission/maglev transport

  3. human genetic engineering

would ever, even at the heyday after discovery, provide ROI. Think of ROI in terms of real resources and labor instead of money if it helps. Assume the government is willing to loan limitless money at low interest if it helps; it just does have to produce ROI.

Finally, look at the business case for AI. These are not the same classes of technology. Human spaceflight has zero ROI. Superconductors for the two purposes above increase efficiency, but often only on the order of 10-20%, so unless the cost of energy saved > cost of equipment it has zero ROI. And human genetic engineering takes too long to give any ROI; even with low interest rates, you have to wait basically for the edited humans to become adults and be productive, and you pay a human cost for all the failures that carries enormous reputational costs to any institution.

AI has explosive, self-amplifying ROI that is a real business case for effectively "unfathomably large money/material resources". This is with very conservative
Lalartu
My claim is different - that there is no defined threshold for significance, but on the spectrum from useless to world-changing some technologies which looked very promising decades ago still lie closer to lower end. So it is possible that in 2053 AI products would be about as important as MRI scanners and GMO crops in 2023.
[anonymous]
Ok. But how? GMO crops at their theoretical limit cannot fix carbon any faster than thermodynamics will allow. Given that all the parts the edited genes spec for come from nature's codon space, this is what, 100 percent gain at the limit? So you might get double the growth rate, probably with tradeoffs that make the crop more fragile and more expensive to grow.

MRI - well, it lets you crudely see inside the human body in a different way than X-rays. It lets you watch helplessly as tumors kill someone; it provides no tooling to do anything about it. Presumably with the right dyes and alternate techniques like CT scanning you can learn about the same information.

Please try to explain how AI, with its demonstrated abilities, lumps into the above. Does it not let you build self-replicating robots? Why?
Archimedes
"human genetic engineering" If you mean human genetic enhancement like designer babies, then sure. Not much impact because ethical concerns prevent it. However, the advent of tech like CRISPR allows for significant impact like gene therapy, though this is still an emerging field. (Just ask someone with sickle cell disease if they think a cure would be significant.)
[anonymous]
Lalartu's claim was that the technology offered no major benefit so far. Note that treating a few people with severe genetic disease provides no ROI. This is because those people are rare (most will have died), and there is simply not the market to support the expensive effort to develop a treatment.  This is why gene therapy efforts are limited.
Archimedes
Treating diseases isn't much of a positive feedback loop but claiming "no ROI" strikes me as extremely callous towards those afflicted. Maybe it doesn't affect enough people to be sufficiently "significant" in this context but it's certainly not zero return on investment unless reducing human suffering has no value.
[anonymous]
I don't dispute this, and there are publicly funded efforts that, at a small scale, do help people where there isn't ROI. A few people with blindness or paralysis have received brain implants. A few people have received gene therapies.

But the overall thread is: is it significant? Is the technology mainstream, with massive amounts of sales and R&D effort going into improving it? Is it benefiting most living humans? And the answer is no and no. The brain implants and gene therapies are not very good - they are frankly crap - for the reason that there are not enough resources to make them better.

And from a utilitarian perspective this is correct: in a world where you have very finite resources, most of those resources should be spent on activities that give ROI, as in more resources than you started with. This may sound "callous", but having more resources allows more people to benefit overall in a general sense.

This is why AI and AGI are so different: they trivially give ROI. Just the current LLMs produce more value per dollar, on the subset of tasks they are able to do, than any educated human, even from the cheapest countries.
Noosphere89
Unfortunately, for our purposes it kinda is. There are 2 issues:

  1. Most people don't have diseases that can be cured or prevented this way.

  2. CRISPR is actually quite limited, and in particular the requirement that it only affects your children basically makes it a dealbreaker for human genetic engineering, especially if you're trying to make superpowered people.

Genetic engineering for humans needs to be both seriously better and able to edit somatic cells as well as gamete cells, or it doesn't matter.

thefirechair


Imagine LessWrong started with an obsessive focus on the dangers of time-travel. 

Because the writers are persuasive there are all kinds of posts filled with references that are indeed very persuasive regarding the idea that time-travel is ridiculously dangerous, will wipe out all human life and we must make all attempts to stop time-travel.

So we see some new quantum entanglement experiment treated with a kind of horror. People would breathlessly "update their horizon" like this matters at all. Physicists completing certain problems or working in certain areas would be mentioned by name, and some people would try to reach out to them to convince them how dangerous time-travel, and their own work on it, is.

Meanwhile, to someone not taken in by very persuasive writing, vast holes are blindingly obvious. When those vast holes are discussed... well, they're not discussed. They get nil traction, are ignored, aren't treated with any seriousness.

Examples of magical thinking (they're going to find unobtainium and that'll be it, they'll have a working time-machine within five years) are rife but rarely challenged.

I view a lot of LessWrong like this. 

I'll provide two examples.

  1. AI will improve itself very quickly, becoming the most intelligent being that can exist and then will have the power to wipe humans out.
  2. AI will be able to make massive technological jumps, here come nanites, bye humans

For 1 - we don't have any examples of this in nature. We have evolution over enormous timelines which has eventually produced intelligence in humans and varying degrees of it in other species. We don't have any strong examples of computers improving code which in turn improves code which in turn improves code. ChatGPT, for all the amazing things it can do -- okay, so here's the source code for WinZip, make compression better. I do agree "this slow thing but done faster" is possible, but the claim that self-improvement can exist at all is extraordinarily weakly supported. Just because learning exists does not mean fundamental architecture upgrades can be made self-recursively.

For 2 - AI seems to always be given near godlike magical powers. It will be able to "hack" any computer system. Oh, so it worked out how to break all cryptography? It will be able to take over manufacturing to make things to kill people? How exactly? It'll be able to work up a virus to kill all humans and then hire some lab to make it... are we really sure about this? 

I wrote about the "reality of the real world" recently. So many technologies and processes aren't written down. They're stored in meat minds, not in patents, and embodied in plant equipment and vast, intricate supply chains. Just trying to take over Taiwan chip manufacturing would be near impossible because they're so far out on the cutting edge that they jealously guard their processes.

I love sci-fi, but there are more than a few posts here that are closer to sci-fi fan fiction than to actual real problems.

The risk of humans using ChatGPT and so on to distort narratives, destroy opponents, screw with political processes and so on seems vastly more deadly and serious than the risk that an AI will self-improve and kill us all.

Going back to the idea of LessWrong obsessed with time-travel - what would you think of such a place? It would have all the predictions, and persuasive posts, and people very dedicated to it... and they could all just be wrong. 

For what it's worth, I strongly support the premise that anything possible in nature is possible for humans to replicate with technology. X-rays exist, we learn how to make and use them. Fusion exists, we will learn how to make fusion. Intelligence/sentience/sapience exists - we will learn how to do this. But I rarely see anyone touch on the idea of "what if we only make something as smart as us?"

For 1 - we don’t have any examples of this in nature.

We don't have any examples of steam engines, supersonic aircraft or transistors in nature either. Saying that something can't happen because it hasn't evolved in nature is an extraordinarily poor argument.

thefirechair
We do have examples of these things in nature, in degrees. Like flowers turning to the sun because they contain light-sensing cells. Thus, it exists in nature and we eventually replicate it. Steam engines are just energy transfer and use, and that exists. So does flying fast. Something not in nature (as far as we can tell) is teleportation. Living inside a star.

I don't mean specific narrow examples in nature. I mean the broader idea. So I can see intelligence evolving over enormous time-frames, and learning exists, so I do concur we can speed up learning and replicate it... but the underlying idea of a being modifying itself? Nowhere in nature. No examples anywhere on any level.
quanticle
Any form of learning is a being modifying itself. How else would learning occur?
thefirechair
You have no control down on the cellular level over your body. No deliberate conscious control. No being does. This is what I mean by does not exist in nature. Like teleportation.
Richard_Kennaway
If I do weight training, my muscles get bigger and stronger. If I take a painkiller, a toothache is reduced in severity. A vaccination gives me better resistance to some disease. All of these are myself modifying myself. Everything you have written on this subject seems to be based on superficial appearances and analogies, with no contact with the deep structure of things.
thefirechair
You have no atomic level control over that. You can't grow a cell at will or kill one or release a hormone. This is what I'm referring to. No being that exists has this level of control. We all operate far above the physical reality of our bodies. But we suggest an AI will have atomic control. Or that code control is the same as control. Total control would be you sitting there directing cells to grow or die or change at will. No AI will be there modifying the circuitry it runs on down at the atomic level.
Raemon
Quick very off the cuff mod note: I haven't actually looked into the details of this thread and don't have time today, but skimming it it looks like it's maybe spiralling into a Demon Thread and it might be good for people to slow down and think more about what their goals are. (If everyone involved is actually just having fun hashing an idea out, sorry for my barging in)
green_leaf
Your argument is fundamentally broken, because nature only contains things that happen to biologically evolve, so it first has to be the result of the specific algorithm (evolution) and also the result of a random roll of the dice (the random part of it). Even if there were no self-modifying beings in nature (humans do self-modify) or self-modifying AI, it would still be prima facie possible for such a thing to exist, because all it means is for the being to turn its optimization power on itself (this is prima facie possible, since the being is a part of the environment). So instead of trying to think of an argument about why something that already exists is impossible, you should've simply considered the general principle.
thefirechair
No being has cellular-level control. It can't direct brain cells to grow or hormones to release, etc. This is what I mean by "it does not exist in nature". The kind of self-modification people claim AI will have does not exist anywhere. Teleportation doesn't exist, so we shouldn't make arguments where teleportation is part of it.
green_leaf
Humans can already do that, albeit indirectly. Once again, you're "explaining" why something that already exists is impossible. It's sufficient for a self-modifying superhuman AI that it can do that indirectly (for it to be self-modifying), but self-modification of the source code is even easier than manipulation on the level of individual molecules.

1) True, we don't have any examples of this in nature. Would we expect them?  

Let's say that to improve something, it is necessary and sufficient to understand it and have some means to modify it. There are plenty of examples; most of the complicated ones involve humans understanding some technology and designing a better version.

At the moment, the only minds able to understand complicated things are humans, and we haven't got much human self improvement because neuroscience is hard. 

I think it is fairly clear that there is a large in-practice gap b...

But I rarely see anyone touch on the idea of "what if we only make something as smart as us?"

 

But why would intelligence reach human level and then halt there? There's no reason to think there's some kind of barrier or upper limit at that exact point.

Even in the weird case where that were true, aren't computers going to carry on getting faster? Just running a human-level AI on a very powerful computer would be a way of creating a human scientist that can think at 1000x speed, create duplicates of itself, and modify its own brain. That's already a superintelligence, isn't it?

thefirechair
The assumption there is that the faster the hardware underneath, the faster the sentience running on it will be. But this isn't supported by evidence. We haven't produced a sentient AI to know whether this is true or not. For all we know, there may be an upper limit to "thinking" based on neural propagation of information. To understand and integrate a concept requires change, and that change may move slowly across the mind and underlying hardware. Humans have sleep, for example, to help us learn and retain information.

As for self-modification - we don't have atomic-level control over the meat we run on. A program or model doesn't have atomic-level control over its hardware. It can't move an atom at will in its underlying circuitry to speed up processing, for example. This level of control does not exist in nature in any way.

We don't know so many things. For example, what if consciousness requires meat? What if it is physically impossible on anything other than meat? We just assume it's possible using metal and silica.

A helpful way of thinking about 2 is imagining something less intelligent than humans trying to predict how humans will overpower it.

You could imagine a gorilla thinking "there's no way a human could overpower us. I would just punch it if it came into my territory." 

The actual way a human would overpower it is literally impossible for the gorilla to understand (invent writing, build a global economy, invent chemistry, build a tranquilizer dart gun...)

The AI in the AI takeover scenario is that jump of intelligence and creativity above us. There's literally no way a puny human brain could predict what tactics it would use. I'd imagine it almost definitely involves inventing new branches of science.

thefirechair
I'd suggest there may be an upper bound to intelligence, because intelligence is bound by time and any AI lives in time like us. They can't gather information from the environment any faster. They cannot automatically gather all the right information. They cannot know what they do not know. The system of information, brain propagation, and cellular change runs at a certain speed for us. We cannot know if it is even possible to run faster.

One of the magical-thinking criticisms I have of AI is that it suddenly is virtually omniscient. Is that AI observing mold cultures and about to discover penicillin? Is it doing some extremely narrow gut bacteria experiment to reveal the source of some disease? No, it's not. Because there are infinite experiments to run. It cannot know what it does not know. Some things are Petri dishes and long periods of time in the physical world, and require a level of observation the AI may not possess.
Archimedes
Yes, physical constraints do impose an upper bound. However, I would be shocked if human-level intelligence were anywhere close to that upper bound. The James Webb Space Telescope has an upper bound on the level of detail it can see based on things like available photons and diffraction but it's way beyond what we can detect with the naked human eye.
[comment deleted]
[anonymous]

This is approximately my experience of this place.  

That, and the apparent runaway cult generation machine that seems to have started.

Seriously, it is apparent that over the last few years the mental health of people involved with this space has collapsed and started producing multiple outright cults. People should stay out of this fundamentally broken epistemic environment. I come closer to expecting a Heaven's Gate event every week when I learn about more utter insanity. 

thefirechair
I agree. When you look up criticism of LessWrong you find plenty of very clear, pointed, and largely correct criticisms. I used time-travel as my example because I didn't want to upset people, but really any in-group/out-group forum holding some wild ideas would have sufficed. This isn't at Flat Earther levels yet, but it's easy to see the similarities.

There's the unspoken things you must not say or you'll be pummeled, ignored or fought. Blatantly obvious vast holes are routinely ignored. A downvote mechanism works to push comments down. Talking about these problems just invites the people in the problems to attempt to draw you in with the flawed arguments. Saying, hey, take three big steps back from the picture and look again doesn't get anywhere.

Some of the posts I've seen on here are some sort of weird doom cosplay. A person being too scared to criticize Bing Chatgpt? Seriously? That can't be real. It reminds me of the play-along posts I've seen in antivaxxer communities in a way.

The idea of "hey, maybe you're just totally wrong" isn't super useful to move anything, but it seems obvious that fan-fiction of nanites and other super techs that exist only in stories could probably be banned, and this would improve things a lot.

But beyond that, I'm not certain this place can be saved or eventually be useful. Setting up a place proclaiming it's about rationality is interesting and can be good, but it also implicitly states that those who don't share your view are irrational, and wrong. As the group-think develops, any voice not in line is pushed out in all the ways it can be pushed out, and there's never a make-or-break moment where people stand up and state outright that certain topics/claims are no longer permitted (like nanites killing us all).

The OP may be a canary, making a comment, but none of the responses here produced a solution or even a path. I'd suggest one: you can't write "nanite" until we make nanites. Let's start with that.
Daniel Kokotajlo
If you link me to 1-3 criticisms which you think are clear, pointed, and largely correct, I'll go give them a skim at least. I'm curious. You are under no obligation to do this but if you do I'll appreciate it.
17 comments

the chance that [...] alignment is so easy that standard ML techniques work

I think this is probably true for LLM AGIs at least in the no-extinction sense, but has essentially no bearing on transitive AI risk (danger of AI tech that comes after first AGIs, developed by them or their successors). Consequently P(extinction) by 2100 only improves through alignment of first AGIs if they manage to set up reliable extinction risk governance, otherwise they are just going to build some more AGIs that don't have the unusual property of being aligned by default.

And there is no indication that LLM AGIs would be in a much better position to delay AGI capability research until alignment theory makes it safe than we are, though the world order disruption from change in serial speed of thought probably gives them a chance to set this up.

Signer

Presumably we will build ML AGIs because they are safe, and they won't build unsafe non-ML AGI for the same reason we didn't - because it wouldn't be safe. So the idea is that alignment is so easy it's actually transitive.

Presumably we will build ML AGIs because they are safe

I don't see anything in the structure of humanity's AGI-development process that would ensure this property. LLM human imitations are only plausibly aligned because they are imitations of humans. There are other active lines of research vying with them for the first AGI, with no hope for their safety.

For the moment, LLM characters have the capability advantage of wielding human faculties, not needing to reinvent alternatives for them from scratch. This is an advantage for crossing the AGI threshold, which humans already crossed, but not for improving further than that. There is nothing in this story that predicates the outcome on safety.

I'm not sure, but Nate's recent post updated me towards this opinion significantly in many ways. I still think there's significant risk, but I trust the cultural ensemble a lot more after reading Nate's post.

There are a lot of highly respected researchers who have similar opinions, though.

and it's not like machine learning has consensus on much in the domain of speculative predictions; even predictions by highly skilled researchers with track records are doubted by significant portions of the field.

science is hard yo.

I will say, people who think the rationality sphere has bad epistemics, very fair, but people who think the rationality sphere on less wrong has bad epistemics, come fight me on less wrong! let's argue about it! people here might not change their minds as well as they think they do, but the software is much better for intense discussions than most other places I've found.

I think the lesswrong community is wrong about x-risk and many of the problems about ai, and I've got a draft longform with concrete claims that I'm working on...

But I'm sure it'll be downvoted because the bet has goalpost-moving baked in, and lots of goddamn swearing, so that makes me hesitant to post it.

if you think it's low quality, post it, and warn that you think it might be low quality, but like, maybe in less self-dismissive phrasing than "I'm sure it'll be downvoted". I sometimes post "I understand if this gets downvoted - I'm not sure how high quality it is" types of comments. I don't think those are weird or bad, just try to be honest in both directions, don't diss yourself unnecessarily.

And anyway, this community is a lot more diverse than you think. it's the rationalist ai doomers who are rationalist ai doomers - not the entire lesswrong alignment community. Those who are paying attention to the research and making headway on the problem, eg wentworth, seem considerably more optimistic. The alarmists have done a good job being alarmists, but there's only so much being an alarmist to do before you need to come back down to being uncertain and try to figure out what's actually true, and I'm not impressed with MIRI lately at all.

[-]gjm43

"the bet" -- what bet?

A word of advice: don't post any version of it that says "I'm sure this will be downvoted". Saying that sort of thing is a reliable enough signal of low quality that if your post is actually good then it will get a worse reception than it deserves because of it.

don't post any version of it that says "I'm sure this will be downvoted"

For sure. The actual post I make will not demonstrate my personal insecurities.

what bet?

I will propose a broad test/bet that will shed light on my claims or give some places to examine.

Good news is that it mostly doesn't matter for the question of what should be done - even if doom scenarios are unlikely, researchers definitely don't have enough certainty to justly ignore them and continue developing ML.

Why is this being downvoted?

From what I am seeing people here are focusing way too much on having a precisely calibrated P(doom) value.

It seems that even if P(doom) is 1% the doom scenario should be taken very seriously and alignment research pursued to the furthest extent possible.

It seems very unlikely to me that, after much careful calibration and research, you would come up with a P(doom) value of less than 1%. So why invest time in refining your estimate?

because it fails to engage with the key point: that the low predictiveness of the dynamics of ai risk makes it hard for people to believe there's a significant risk at all. I happen to think there is; that's why I clicked agree vote. but I clicked karma downvote because it fails to engage with the key epistemic issue at hand.

I find Eliezer and Nate's arguments compelling but I do downgrade my p(doom) somewhat (-30% maybe?) because there are intelligent people (inside and outside of LW/EA) who disagree with them.

I had some issues with the quote

Will continue to exist regardless of how well you criticize any one part of it.

I'd say LW folk are unusually open to criticism. I think if there were strong arguments they really would change people's minds here. And especially arguments that focus on one small part at a time.

But have there been strong arguments? I'd love to read them.

 

There's basically little reason to engage with it. These are all also evidence that there's something epistemically off with what is going on in the field.

For me the most convincing evidence that LW is doing something right epistemically is how they did better than basically everyone on Covid. Granted that's not the alignment forum but it was some of the same people and the same weird epistemic culture at work.

There are intelligent people who disagree, but I was under the impression there was a shortage of intelligent disagreement. Most of the smart disagreement sounds like smart people who haven't thought in great depth about AI risk in particular, and are often shooting down garbled misunderstandings of the case for AI risk.

I think that's true of people like: Steven Pinker and Neil deGrasse Tyson. They're intelligent but clearly haven't engaged with the core arguments because they're saying stuff like "just unplug it" and "why would it be evil?"

But there's also people like...

Robin Hanson. I don't really agree with him but he is engaging with the AI risk arguments, has thought about it a lot and is a clever guy.

Will MacAskill. One of the most thoughtful thinkers I know of, who I'm pretty confident will have engaged seriously with the AI Risk arguments. His p(doom) is far lower than Eliezer's. I think he says 3% in What We Owe The Future.

Other AI Alignment experts who are optimistic about our chances of solving alignment and put p(doom) lower (I don't know enough about the field to name people.)

And I guess I am reserving some small amount of probability for "most of the world's most intelligent computer scientists, physicists, and mathematicians aren't worried about AI Risk - could I be missing something?" My intuition from playing around on prediction markets is that you have to adjust your bets slightly for those kinds of considerations.
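As a crude sketch of that kind of adjustment (the weights and most of the estimates below are invented for illustration; the 3% figure is the one attributed to MacAskill above):

```python
# Crude outside-view adjustment: blend your own estimate with estimates from
# thoughtful people who disagree. Weights and most numbers are invented for
# illustration; the 3% figure is the one attributed to MacAskill above.

my_estimate = 0.60                 # hypothetical inside-view p(doom)
others = {
    "skeptical ML researcher": 0.05,           # invented
    "Will MacAskill (per WWOTF)": 0.03,        # figure cited above
    "optimistic alignment researcher": 0.10,   # invented
}

weight_on_self = 0.7               # how much you trust your own inside view
outside_view = sum(others.values()) / len(others)

adjusted = weight_on_self * my_estimate + (1 - weight_on_self) * outside_view
print(round(adjusted, 3))  # 0.438 -- a real but not drastic downgrade
```

Linear pooling like this is obviously crude; the point is just that taking the disagreement seriously moves the number somewhat, not exactly how far.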

Robin Hanson is weird. He paints a picture of a grim future where all nice human values are eroded away, replaced with endless frontier replicators optimized and optimizing only for more replication. And then he just accepts it as if that was fine. 

Will Macaskill seems to think AI risk is real. He just thinks alignment is easy. He has a specific proposal involving making anthropomorphic AI and raising it like a human child that he seems keen on.

I just posted a detailed explanation of why I am very skeptical of the traditional deceptive alignment story. I'd love to hear what you think of it! 

Deceptive Alignment Skepticism - LessWrong