How does the typical construction of Newcomb's problem handle mixed strategies?
The question of why there is no "might kill us" class is simple. There is such a class, but if an AI loses the fight to kill us, it obviously was not ASI (picking a fight with the world and losing is pretty dumb), and if it wins, it wins and we die. We will then be in the same scenario with every AI stronger than it, and with every AI weaker than it that might yet get lucky, just as we might get lucky and win at bad odds. The next AI we make will also want to fight us for the same reasons, so we will need to either fight it too (including preemptively, e.g. turning it off because a dumber model did something), or get a reason to believe that we will never fight it. And if you know you will fight your AI eventually, and you would win now, fight now.
if it were competent enough to reflect and actively reconcile its own inner drives
Why do we think that reflection is necessary for competence? That is, competence does not seem to imply coherence, unless I missed an argument.
This analysis is slightly incorrect, since if rewards programs are treated as assets, they contribute to enterprise value, not to market cap. Market cap roughly subtracts loans outstanding from the total expected returns of the enterprise. American, for instance, has 37 billion dollars of debt outstanding, and the market cap is the expected value of the enterprise above the expected payout to the creditors.
So in this analysis, American is actually worth 47 billion-ish dollars. 37 billion of that belongs to the creditors, and 10 billion belongs to the shareholders. 22 billion of that value is the rewards program, and the rest is the rest of the business.
If American's rewards program disappeared, the creditors would take a haircut, but they would not get nothing.
You can do the same for all of these.
As an aside, every time I have seen someone say this, they neglect to model debt. I think it might be systematically confusing.
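A minimal sketch of the arithmetic, using the round figures from the comment above (illustrative, not exact filings):

```python
# Round figures from the discussion above (approximate, in dollars).
market_cap = 10e9        # value of the equity (shareholders' slice)
debt = 37e9              # loans outstanding (creditors' slice)
rewards_program = 22e9   # estimated value of the loyalty program

# Enterprise value: what the whole business is worth to everyone
# with a claim on it, shareholders plus creditors.
enterprise_value = market_cap + debt                   # ~47e9
rest_of_business = enterprise_value - rewards_program  # ~25e9

# If the rewards program disappeared, the remaining ~$25B would not
# cover the $37B of debt, so creditors take a haircut but do not get
# nothing. Comparing the $22B program to the $10B market cap alone is
# exactly the "neglecting to model debt" mistake.
print(f"Enterprise value: ${enterprise_value/1e9:.0f}B")
print(f"Business ex-rewards: ${rest_of_business/1e9:.0f}B")
```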
For these hidden reasoning steps, especially the epiphenomenal model, there can also be redundancy between computation in the weights and the chain of thought. That is, a dumb model that sees the chain of thought of a smart model might get the right answer when it otherwise would not, even if the chain of thought does not help the smart model itself.
Under the assumption of separate initialization, this probably does not happen in cases where the extra information is steganographic, or in some cases of irrelevant reasoning, because the relevant information is not being passed through the token bottleneck.
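A rough sketch of how one might probe for this redundancy, assuming hypothetical `generate` and `grade` helpers that stand in for whatever inference and grading code is actually in use (neither is a real library call):

```python
# Hypothetical helpers: generate(model, prompt) -> answer text,
# grade(answer, reference) -> bool. Swap in real inference/grading code.

def redundancy_check(questions, small_model, large_model, generate, grade):
    """Count questions where the large model's chain of thought rescues a
    small model that could not answer on its own."""
    rescued = 0
    for q in questions:
        # Small model alone, answer only.
        baseline = generate(small_model, f"{q['prompt']}\nAnswer:")
        if grade(baseline, q["answer"]):
            continue  # small model already gets it; not informative

        # Large model's chain of thought, handed to the small model.
        cot = generate(large_model, f"{q['prompt']}\nThink step by step:")
        assisted = generate(
            small_model,
            f"{q['prompt']}\nHere is some reasoning:\n{cot}\nAnswer:",
        )
        if grade(assisted, q["answer"]):
            rescued += 1  # the chain of thought carried real load
    return rescued
```

If the chain of thought rescues the small model even on questions where ablating it does not change the large model's own accuracy, that is the redundancy described above.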
Steve Strickland: GPT-5 is the first model I’ve used that will deliberately give a wrong answer to ‘check you’re paying attention’.
This fundamentally unreliable technology is not going to put us all out of work.
Is that intentional, or is the name a hallucination and the cheeky comment an intervention aimed at exploiting the fact that models can tell when they are hallucinating?
In lots of these cases, it seems like every university ends up thinking "this student would be making a mistake to commit to our university." It seems like a really hard sell to say universities should accept people who they think would be making a mistake to actually attend. This is also unfair: the university does not know your other options, and the "mistake" framing requires that committing be worse than the decision you would actually have made.
One-boxing with probability 1 and two-boxing with probability 1 are not the only strategies. You can also try to find some source of randomness and play a mixed strategy. If you are not able to play mixed strategies, game theory and decision theory are not guaranteed to converge.
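To see why the construction matters, here is a minimal sketch under two assumed predictor rules (neither is canonical; different write-ups of the problem use different ones): one where the opaque box is filled with probability equal to your probability of one-boxing, and one where it is filled iff that probability exceeds 1/2.

```python
# Expected payoff of one-boxing with probability p, under two assumed
# ways a predictor might handle mixed strategies. Standard payouts:
# opaque box holds $1,000,000 if filled, transparent box holds $1,000.

BIG, SMALL = 1_000_000, 1_000

def ev_proportional(p):
    # Predictor fills the opaque box with probability p.
    fill = p
    one_box = fill * BIG
    two_box = fill * BIG + SMALL
    return p * one_box + (1 - p) * two_box

def ev_threshold(p):
    # Predictor fills the opaque box iff p > 0.5.
    fill = 1.0 if p > 0.5 else 0.0
    one_box = fill * BIG
    two_box = fill * BIG + SMALL
    return p * one_box + (1 - p) * two_box

for p in (0.0, 0.51, 0.99, 1.0):
    print(p, ev_proportional(p), ev_threshold(p))
# Under the proportional rule the EV rises with p, so pure one-boxing wins;
# under the threshold rule, p just above 0.5 beats pure one-boxing.
```

The point is just that whether a mixed strategy can beat the pure strategies depends entirely on which rule the problem statement assumes, which is exactly the question asked above.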