On the epistemic point: yes, and this is something current-gen LLMs seem actually useful for, with little risk. This is where their idea generation is useful, and their poor taste and sycophancy don't matter.
I've had success asking LLMs for counterarguments. Most of them are dumb and you can dismiss them, but the LLMs are smart enough to come up with some good ones once you've steelmanned them and judged their worth for yourself.
This seems less helpful than getting pushback from informed people. But that's hard to find; I've had experiences like yours with Zac HD, in which a conversation fails to surface pretty obvious-in-hindsight counterarguments, just because the conversation focused elsewhere. And I have gotten good pushback by asking LLMs repeatedly in different ways, as far back as o1.
On the object level, on your example: I assume a lot of us aren't very engaged with pause efforts or hopes because it seems more productive and realistic to work on reducing misalignment risk from ~70% toward ~35%. It seems very likely that we're gonna barrel forward through any plausible pause movement, but not clear (even after trying to steelman every major alignment-difficulty argument) that alignment is insoluble - if we can just collectively pull our shit halfway together while racing toward that cliff.
I assume a lot of us aren't very engaged with pause efforts or hopes because it seems more productive and realistic to work on reducing misalignment risk from ~70% toward ~35%.
Nod. I just, like, don't think that's actually that great a strategy – it presupposes it is actually easier to get from 70% to 35% than from 35% to 5%. I don't see Anthropic et al. actually getting ready to ask the sort of questions that would IMO be necessary to actually do-the-reducing.
I'm not getting your 35% to 5% reference? I just have no hope of getting as low as 5%, but a lot of hope for improving on just letting the labs take a swing.
I fully agree that Anthropic and the other labs don't seem engaged with the relevant hard parts of the problem. That's why I want to convince more people who actually understand the problem to identify and work like mad on the hard parts like the world is on fire, instead of hoping it somehow isn't or can be put out.
It may not be that great a strategy, but to me it seems way better than hoping for a pause. I think we can get a public freakout before gametime, but even that won't produce a pause once the government and military are fully AGI-pilled.
This is a deep issue I've been wanting to write about, but haven't figured out how to address without risking further polarization within the alignment community. I'm sure there's a way to do it productively.
That's why I want to convince more people who actually understand the problem to identify and work like mad on the hard parts like the world is on fire, instead of hoping it somehow isn't or can be put out.
FYI, something similar to this was basically my "last year's plan", and it's on hold because I think it is plausible right now to meaningfully move the Overton window around pauses or at least dramatic slowdowns. (This is based on seeing the amount of traffic AI 2027 got, the number of NatSec endorsements that If Anyone Builds It got, and having recently gotten to read it and thinking it is pretty good.)
I think if Yoshua Bengio, Geoffrey Hinton, or Dario actually really tried to move Overton windows instead of sort of trying to maneuver within the current one, it'd make a huge difference. (I don't think this means it's necessarily tractable for most people to help. It's a high-skill operation.)
(Another reason for me putting "increase the rate of people able to think seriously about the problem" on hold is that my plans there weren't getting that much traction. I have some models of what I'd try next when/if I return to it, but it wasn't a slam dunk to keep going.)
There's a mistake I made a couple of times and didn't really internalize the lesson as fast as I'd like. Moreover, it wasn't even a failure to generalize; it was basically a failure to have even a single update stick about a single situation.
The particular example was me saying, roughly:
Look, I'm 60%+ on "Alignment is quite hard, in a way that's unlikely to be solved without a 6+ year pause." I can imagine believing it was lower, but it feels crazy to me to think it's lower than like 15%. And at 15%, it's still horrendously irresponsible to try to solve AI takeoff by rushing forward and winging it rather than "everybody stop, and actually give yourselves time to think."
The error mode here is something like "I was imagining what I'd think if you slid this one belief slider from ~60%+ to 15%, without imagining all the other beliefs that would probably be different if I earnestly believed the 15%."
That error feels like a "reasonable honest mistake."
But, the part where I was like "C'mon guys, even if you only, like, sorta-kinda agreed with me on this point, you'd still obviously be part of my political coalition for a global halt that is able to last 10+ years, right?"
...that feels like a more pernicious, political error. A desire to live in the world where my political coalition has more power, and a bit of an attempt to incept others into thinking it's true.
(This is an epistemic error, not necessarily a strategic error. Political coalitions are often won by people believing in them harder than it made sense to. But, given that I've also staked my macrostrategy on "LessWrong is a place for shared mapmaking, and putting a lot of effort into holding onto that even as the incentives push towards political maneuvering," I'd have to count it as a strategic error for me in this context.)
The specific counterarguments I heard were:
Now, I'm not arguing that those rejoinders are slam dunks. But I hadn't thought of them when I was making the argument, and I don't have a strong counter-counterargument at the moment. Upon reflection, I can see a little slippery-graspy move I was doing, where I was hoping to skip over the hard work of fully simulating another perspective and addressing all their points.
(To spell out: the above arguments are specifically against "if AI alignment is only 15% likely to be difficult enough to require a substantial pause, you should [be angling a bit to either pause or at least preserve option-value to pause]". They're not arguments against alignment likely requiring a pause.)
...
I do still overall think we need a long pause to have a decent chance of non-horrible things happening. And I still feel like something epistemically slippery is going on in the worldviews of most people who are hopeful about survival in a world where companies continue mostly rushing towards superintelligence.
But, seems good for me to acknowledge when I did something epistemically slippery myself. In particular, given that I think epistemic-slipperiness is a fairly central problem in the public conversation about AI, it'd probably help to get better at public convos about it.
Notice thoughts like "anyone who even believes a weak version of My Thing should end up agreeing with my ultimate conclusion", and hold them with at least a bit of skepticism. (The exact TAP probably depends a bit on the situation.)
More generally, remember that variation in belief often doesn't just turn on a single knob: if someone disagrees with one piece, they probably disagree about a bunch of other pieces. Disagreements are more frustratingly fractal than you might hope.
(See also: "You can't possibly succeed without [My Pet Issue]")
I first made this-sort-of-claim in a conversation with Zac Hatfield-Dodds that I'd later recount in a post on Anthropic and taking "technical philosophy" more seriously. (I don't think I actually made the error there, exactly.) But in the comments, Ryan Greenblatt replied with some counterarguments and I said "oh, yeah, that makes sense," and later, in The Problem, I ended up running through the same loop with Buck.