The big problem with pre-x LLMs, especially ones cut off deep in the past, is sampling bias. That is to say, the set of texts that survives is not the set of texts that existed, which adds an extra layer of bias on top of the already biased data collection for regular LLMs. There is also the temptation to support the model (especially when placing an LLM in a strategic role) with analysis and data from after the period, which were collected with a hindsight bias toward what turned out to be interesting.
So far, I have not had Claude Code refuse a request from me, not even once.
When Claude reinterprets the request to be "the request, but safe", this is invisible, and good on the margin.
I think doing this is mostly fine.
The problem with spontaneous agreement is that it lacks what explicit agreement provides: a chance for parties A and B to exchange hidden information without letting third parties in on it.
Agreements also get formalized because reaching the efficient settlement requires negotiation, and negotiation requires revealing secret information; but publicly implementing the results of secret negotiations invites suspicion of secret alliances. If your public allies think you have secret allies, they are not happy, since that is bad for them. So publicizing agreements makes you more predictable, which is good for you.
I am assuming the first step is to count the chromosomes and isolate exactly a collection of full sets that came from the same cells, so that the property (there are exactly k of each) holds. I would need to check the literature for the success rate of isolating all the chromosomes, not just at least one of each.
It seems this method requires 22 successful sequences in a row, which is hard if sequencing fails even occasionally, or if you lose a chromosome even occasionally.
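To make the "in a row" point concrete, here is a tiny sketch with assumed per-chromosome success probabilities (not numbers from the literature):

```python
# Illustrative only: p is an assumed per-chromosome success probability.
# Requiring all 22 to succeed independently compounds quickly.
for p in (0.99, 0.95, 0.90):
    print(f"p = {p:.2f} -> P(all 22 in a row) = {p**22:.2f}")
# prints roughly 0.80, 0.32, 0.10
```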
For 1. Resampling is not asking the AI to try again; it is forcibly rewriting its history so that it made a different one of its plausible decisions. Every resampled continuation plausibly could have been the original continuation, given a different rng seed, and there is no non-shared place to put a scratchpad that would pass along information about the resampling event (see the sketch after point 3). Resampling probably does increase the risk that you resample into a less suspicious bad path, but that is outweighed by the fact that the AI has to mostly bide its time (in order not to get caught during the fraction of early requests that receive good monitoring) (this is a previously established important result), so on net it is a gain.
For 2. The model in this paper assumes the untrusted model behaves optimally given its actual ability to do sneaky bad stuff. Calculating that optimum requires knowing the monitoring protocol, so the untrusted AI must know about the monitor. There is no "does not know about the monitoring setup" case in this model, because if the model actually believes there is no monitoring, it just tries immediately and then loses; and if it believes there is monitoring, it is really hard to trick it about the monitoring strategy.
For 3. No, you want to use the same model, since you can prevent the resampled branches from knowing about each other (they share only the shared history, i.e. the history before the resampling point, and the weights). This is basically what you would want from another "instance". The modeling assumption is that there is exactly one untrusted AI, which wins if any path breaks out, so there is no meaningfully different untrusted AI to ask to try from that starting point.
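A minimal sketch of the resampling step, with hypothetical stand-ins (`untrusted_model`, `monitor_suspicion`, the action strings) for the real components. The point it illustrates: every candidate is sampled from the same prefix with a fresh seed, and rejected branches never enter the shared history, so the model has nowhere to learn that it was resampled.

```python
import random

def untrusted_model(prefix: list[str], seed: int) -> str:
    """Sample one plausible next action given only the shared prefix (stand-in)."""
    rng = random.Random(seed)
    return rng.choice(["benign_action_a", "benign_action_b", "suspicious_action"])

def monitor_suspicion(action: str) -> float:
    """Trusted monitor's suspicion score for a proposed action (stand-in)."""
    return 0.9 if "suspicious" in action else 0.1

def next_action(history: list[str], budget: int = 4, threshold: float = 0.5) -> str:
    """Resample from the same prefix with fresh seeds; only the accepted action
    is committed to `history`, so rejected siblings leave no trace the model can see."""
    candidates = []
    for seed in range(budget):
        action = untrusted_model(history, seed)  # same prefix, different rng seed
        score = monitor_suspicion(action)
        candidates.append((score, action))
        if score < threshold:
            break
    _, chosen = min(candidates)  # keep the least suspicious candidate
    history.append(chosen)
    return chosen

history = ["task description"]
print(next_action(history))
```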
One box P1 and Two box P1 are not the only strategies. You can also try to find some source of randomness and use it to play a mixed strategy. If you cannot play mixed strategies, game theory and decision theory are not guaranteed to converge.
How does the typical construction of Newcomb's problem handle mixed strategies?
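For concreteness, here is one possible convention (an assumption on my part, since the standard statement only covers pure strategies): the predictor learns your one-boxing probability $q$ and fills the opaque box with probability $q$, independently of your actual draw. Then

$$\mathbb{E}[\text{payoff}] = q \cdot 10^{6} q + (1-q)\bigl(10^{3} + 10^{6} q\bigr) = 10^{6} q + 10^{3}(1-q),$$

which is maximized by pure one-boxing ($q = 1$). Under a threshold convention instead (box filled iff $q > 1/2$), the payoff is largest at $q$ just above $1/2$, so mixing beats both pure strategies. Which convention the typical construction uses is exactly the question.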
The question of why there is no "might kill us" class is simple. There is such a class, but if an AI lost the fight to kill us, it obviously was not ASI (picking a fight with the world and losing is pretty dumb), or it might win, in which case it won and we die. And then we will be in the same scenario for every AI stronger than it, and for AIs weaker than it that might yet get lucky, just as we might get lucky and win at bad odds. The next AI we make will also want to fight us for the same reasons, and we will need either to fight it too (including preemptively, e.g. turning it off because a dumber model did something), or to get a reason to believe that we will never have to fight it. And if you know you will fight your AI eventually, and you would win now, fight now.
"if it were competent enough to reflect and actively reconcile its own inner drives"
Why do we think that reflection is necessary for competence? That is, competence does not seem to imply coherence, unless I missed an argument.
This analysis assumes that the government of Taiwan has no power and that the US is the one to negotiate with. If you are thinking about it this way, the question is not what deals the US would agree to, but what deal Taiwan would agree to. The US is not actually in a position to veto a reunification agreed between both governments, though it likely could not backstop one either (which creates a little negotiating friction).
All of these concessions are concessions to the US. What really matters for a peaceful settlement (since Taiwan can destroy the surplus) is whether the CCP can give Taiwan meaningful concessions on the terms of reunification.
That is to say, there is no "process to gradually and peacefully hand over Taiwan to China" available as an option to the US. It might exist as an option for the government of Taiwan, but the US does not actually control any of its partners among the middle powers.
as for "China care more about Taiwan than anybody else. "
This is false
The US might care less, but the government of Taiwan probably cares more.