I would be interested to know how you think things are going to go in the 95-99% of non-doom worlds. Do you expect AI to look like "ChatGPT but bigger, broader, and better" in the sense of being mostly abstracted and boxed away into individual use cases/situations? Do you expect AIs to be ~100% in command but just basically aligned and helpful?
AI infrastructure seems really expensive. I need to actually do the math here (and I haven’t! hence this is uncertain) but do we really expect growth on trend given the cost of this buildout in both chips and energy? Can someone really careful please look at this?
This is not a really careful look, but: The world has managed extremely fast (well, trains and highways fast, not FOOM-fast) large-scale transformations of the planet before. Mostly this requires that 1) the cost is worth the benefit to those spending and 2) we get out of our own way and let it happen. I don't think money or fundamental feasibility will be the limiters here.
Also, consider that training is now, or is becoming, a minority of compute. More and more is going towards inference - aka that which generates revenue. If building inference compute is profitable and becoming more profitable, then it doesn't really matter how little of the value is captured by the likes of OpenAI. It's worth building, so it'll get built. And some of it will go towards training and research, in ever-increasing absolute amounts.
Even if many of the companies building data centers die out because of a slump of some kind, the data centers themselves, and the energy to power them, will still exist. Plausibly the second buyers then get the infrastructural benefits at a much lower price - kinda like the fiber optic buildout of the 1990s and early 2000s. AKA "AI slump wipes out the leaders" might mean "all of a sudden there's huge amounts of compute available at much lower cost."
do we really expect growth on trend given the cost of this buildout in both chips and energy?
What I expect is another series of algorithmic breakthroughs (e.g. neuralese) which rapidly increase the AIs' capabilities, if not outright FOOM them into ASI. These breakthroughs would likely make mankind obsolete.
I don't know. As I discussed with Kokotajlo, he recently claimed that "we should have some credence on new breakthroughs e.g. neuralese, online learning, whatever. Maybe like 8%/yr?", but I doubt that it will be 8%/year. Denote the probability that the breakthrough hasn't been discovered as of time $t$ by $P(t)$. Then one of the models is $P(t) = \exp\!\left(-c \int_0^t N(\tau)\,d\tau\right)$, where $N$ is the effective progress rate. This rate is likely proportional to the number of researchers hired and to their progress multipliers, since new architectures and training methods can be tested cheaply (e.g. on GPT-2 or GPT-3) but still require the ideas and the coding.
The number of researchers and coders was estimated in the AI-2027 security forecast to increase exponentially until the intelligence explosion (which the scenario's authors assumed to start in March 2027 with superhuman coders). What I don't understand how to estimate is the constant c, which represents the difficulty[1] of discovering the breakthrough. If, say, c were 200 per million human-years, then 5K human-years would likely be enough and the explosion would likely start within 3 years. Hell, if c were equivalent to 8%/yr for a company with 1K humans, then roughly 12.5K human-years of effort would be needed, shifting the timelines to at most 5-6 years from Dec 2024...
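To make this concrete, here is a minimal numerical sketch of the model above. The workforce assumptions (1K effective researcher-equivalents today, doubling roughly every 2 years) are purely illustrative and not taken from the AI-2027 forecast; the two values of c are the ones discussed above.

```python
import math

def p_breakthrough(c, n0, growth, years):
    """Probability the breakthrough is found within `years`, under
    P(t) = exp(-c * integral_0^t N(tau) dtau) with N(t) = n0 * exp(growth * t).

    c      -- difficulty constant, hazard per effective human-year
    n0     -- initial effective researchers (headcount x progress multiplier)
    growth -- exponential growth rate of the effective workforce, per year
    """
    # Closed form of the cumulative-effort integral.
    effort = n0 * (math.exp(growth * years) - 1) / growth
    return 1 - math.exp(-c * effort)

# Illustrative assumptions: n0 = 1000, doubling time ~2 years (growth ~ 0.35/yr).
for c, label in [(200e-6, "c = 200 per million human-years"),
                 (0.08 / 1000, "c = 8%/yr at 1K humans")]:
    for years in (1, 3, 5, 6):
        p = p_breakthrough(c, n0=1000, growth=0.35, years=years)
        print(f"{label}: P(breakthrough within {years} yr) ≈ {p:.2f}")
```

With these assumed inputs, the first value of c gives roughly a 2-in-3 chance of the breakthrough within ~3 years, and the second gives similar odds within ~5-6 years, consistent with the back-of-the-envelope human-year figures above.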
EDIT: Kokotajlo promised to write a blog post with a detailed explanation of the models.
The worst-case scenario is that diffusion models are already such a breakthrough.
I think, vibes-wise, I am a bit less worried about AI than I was a couple of years ago. Perhaps P(doom) has gone from 5% to something like 1%.[1]
Happy to discuss in the comments. I may be very wrong. I wrote this up in about 30 minutes.
Note that I still think AI is probably a very serious issue, but one to focus on and understand rather than one where we necessarily push for slowing things down in the next 2 years. I find this very hard to predict, so I am not making strong claims.
My current model has two kinds of AI risk:
Perhaps civilisations almost always end up on paths they strongly don't endorse due to AI. Perhaps AI risk is vastly overrated. Both of those would be considerations in the first bucket. Yudkowskian arguments feel more at home over here.
Perhaps we are making the situation much worse (or better) through our actions in the last 5 and next 3 years. That would be the second bucket. It seems much less important than the first, unless the first is like 50/50.
Civilisational AI risk considerations and their direction (in some rough order of importance):
More local considerations and their direction (in some rough order of importance):
What do you think I am wrong about here? What considerations am I missing? What should I focus more attention on?
I guess I am building up to some kind of more robust calculation, but this is kind of the information/provocation phase.
You might argue that China seems not to want to race or to put AI in charge of key processes, and I'd agree. But given we would have had the West regardless, this seems to make things less bad than they could have been, rather than actively better.
Did FTX try? Like, what would the Bahamas have looked like after 10 years in the FTX-success world?
I may be double-counting here, but there feels like something distinct about general geopolitical instability, and specifically about how the US and China might react.