Comments

I mean the full option space obviously also includes "bargain with Russia and China to make credible commitments that they stop rearming (possibly in exchange for something)", and I think we should totally explore that path as well. I just don't have much hope in it at this stage, which is why I'm focusing on the other option, even if it is a fucked up local Nash equilibrium.

I've been thinking a lot recently about taxonomizing AI-risk-related concepts to reduce the dimensionality of AI threat modelling while remaining quite comprehensive. It's in the context of developing categories to assess whether labs' plans cover various areas of risk.

There are two questions I'd like to get takes on. Any take on either of these two would be very valuable.

  1. In the misalignment threat model space, a number of safety teams tend to assume that the only type of goal misgeneralization that could lead to X-risks is deceptive misalignment. I'm not sure I understand where that confidence comes from. Could anyone make, or link to, a case that rules out the plausibility of all other forms of goal misgeneralization?
  2. It seems to me that to minimize the dimensionality of the threat modelling, it's sometimes more useful to think about the threat model (e.g. a terrorist misuses an LLM to develop a bioweapon) and sometimes more useful to think about a property which has many downstream consequences on the level of risk. I'd like to get takes on one such property:
    1. Situational awareness: It seems to me that it's most useful to think of this property as its own hazard which has many downstream consequences on the level of risk (most prominently that a model with it can condition on being tested when completing tests). Do you agree or disagree with this take? Or would you rather discuss situational awareness only in the context of the deceptive alignment threat model?

Rephrasing based on an ask: "Western Democracies need to urgently put a hard stop to Russia and China war (preparation) efforts" -> Western democracies urgently need to take action to stop the current shift towards a new world order in which conflicts are a lot more likely, because Western democracies are no longer a hegemonic power able to crush authoritarian powers that grab land, etc. This shift is currently driven primarily by the fact that Russia and China are heavily rearming themselves whereas Western democracies are not.

@Elizabeth

Answer by simeon_c · Apr 10, 2024

I liked this extension (https://chrome.google.com/webstore/detail/whispering/oilbfihknpdbpfkcncojikmooipnlglo), which I use for long messages. I press a shortcut, it starts recording with Whisper; I press it again and it puts the transcript in my clipboard.

In those, Ukraine committed to pass laws for decentralisation of power, including through the adoption of the Ukrainian law "On temporary Order of Local Self-Governance in Particular Districts of Donetsk and Luhansk Oblasts". Instead of decentralisation, they passed laws forbidding those districts from teaching children in the languages that those districts want to teach them.

Ukraine's unwillingness to follow the agreements was a key reason why the invasion in 2022 happened and was very popular with the Russian population.

I didn't know that; that's useful, thank you.

My (simple) reasoning is that I pattern-matched hard to the Anschluss (https://en.wikipedia.org/wiki/Anschluss) as a prelude to WW2, where democracies accepted a first conquest hoping that it would stop there (spoiler: it didn't).

Minsk feels very much the same way. From the perspective of democracies, it seems kind of reasonable to try a peaceful resolution once, accepting a conquest and seeing if Putin stops (although in hindsight it was unreasonable not to prepare for the possibility that he doesn't). Now that he has started invading Ukraine as a whole, it seems really hard for me to believe "once he gets Ukraine, he'll really stop". I expect many reasons to invade other adjacent countries to come up as well.

The latest illegal land grab was done by Israel without any opposition from the US. If you are truly worried about land grabs being a problem, why not speak against that US position of being okay with some land grabs instead of just speaking for buying more weapons?

Two things on this. 

  1. Object-level: I'm not ok with this. 
  2. At a meta-level, there's a repugnant moral dilemma fundamental to this:
    1. American hegemonic power has been abused, e.g. see https://en.wikipedia.org/wiki/July_12,_2007,_Baghdad_airstrike or a number of wars that the US started for dubious reasons (usually some economic or geostrategic interest). (The same goes for France; I'm just focusing on the US here for simplicity.)
    2. Still, despite those deep injustices, the 2000s have been the least lethal period for interstate conflicts, because hegemony, with its threat of being crushed by the great power, heavily disincentivizes anyone from fighting.
      1. It seems to me that hegemony of some power or coalition of powers is the most stable state for that reason. So I find this state quite desirable.
    3. Then the other question is, who should be in that position?
      1. I am fortunate to be able to write this about my country without ending up in jail for it. And if I do end up in jail, I have better odds than in most other countries of being able to contest it.
      2. So, although Western democracies are quite bad and repugnant in a bunch of ways, I find them the least bad and most beneficial existing form of political power, and the one whose hegemony is currently worth defending and preserving.

Indeed. One consideration is that the LW community used to be much less into policy-adjacent stuff and hence much less relevant in that domain. Now, with AI governance becoming an increasingly big deal, I think we could potentially use some of that presence to push for certain things in defense.

Pushing for things in the genre of what Noah describes in the first piece I shared seems feasible for some people in policy.

simeon_c · 10d

Idk what the LW community can do, but somehow, to the extent we think liberalism is valuable, Western democracies need to urgently put a hard stop to Russia's and China's war (preparation) efforts. I fear that rearmament is a key component of the only viable path at this stage.

I won't argue in detail here but will link to Noahpinion, who's been quite vocal on those topics. The TL;DR is that China and Russia have been scaling up their war industry preparation efforts for years, while Western democracies' industries keep declining and remain crazily dependent on Chinese industry. This creates a new global equilibrium where the US is no longer powerful enough to disincentivize all authoritarian regimes from grabbing more land, etc.

Some readings relevant to that:

I know this is not a core LW theme but to the extent this threat might be existential to liberalism, and to the existence of LW as a website in the first place, I think we should all care. It would also be quite terrible for safety if AGI was developed during a global war, which seems uncomfortably likely (~10% imo).


If you wanna reread the debate, you can scroll through this thread (https://x.com/bshlgrs/status/1764701597727416448). 

There was a hot debate recently, but regardless, the bottom line is just: "RSPs should probably be interpreted literally and nothing else. If a literal statement is not strictly there, it should be assumed it's not a commitment."

I've not seen people interpreting those very literally, so I just wanted to emphasize that point.

simeon_c · 12d

Given the recent argument about whether Anthropic really did commit to not push the frontier or just misled most people into thinking that was the case, it's relevant to reread the RSPs in hairsplitting mode. Doing so, I noticed a few relevant findings:

Disclaimer: this is focused on negative stuff but does not deny the merits of RSPs etc etc.

  1. I couldn't find any sentence committing to not significantly increase extreme risks. OTOH, I found statements that, if taken literally, could imply an implicit acknowledgment of the opposite: "our most significant immediate commitments include a high standard of security for ASL-3 containment, and a commitment not to deploy ASL-3 models until thorough red-teaming finds no risk of catastrophe.".
    Note that the statement on risk bears only on deployment measures and not on security. Given that lack of security is probably the biggest source of risk from ASL-3 systems and the biggest weakness of RSPs, I find it pretty likely that this is not random.
  2. I found a number of commitments that are totally unenforceable in hairsplitting mode. Here are two examples: 
    1. "World-class experts collaborating with prompt engineers should red-team the deployment thoroughly and fail to elicit information at a level of sophistication, accuracy, usefulness, detail, and frequency which significantly enables catastrophic misuse." 
      1. The use of five underdefined adjectives + "significantly" is a pretty safe barrier against any enforcement.
    2. "When informed of a newly discovered model vulnerability enabling catastrophic harm (e.g. a jailbreak or a detection failure), we commit to mitigate or patch it promptly (e.g. 50% of the time in which catastrophic harm could realistically occur)."
      1. The combination of "or", the characterization of "promptly" as "50% of the time", the use of "e.g.", and the word "realistically" is also a safe barrier against enforceability.
  3. It's only my subjective judgment here, and you don't have to trust it, but I also found Core Views on AI Safety to have a number of similar patterns.