Another difference would be expectations for when the coin gets tossed more than once.
With "Type 1" if I toss coin 2 times I expect "HH", "HT", "TH", "TT" - each with 25% probability
With "Type 2" I'd expect "HH" or "TT" with 50% each.
The Biden Administration disagrees, as part of its ongoing determination to screw up basic economic efficiency and functionality.
Did this happen during the previous administration, or is it the Trump administration?
you can always reset your personalization.
If the persuasion is good enough, you won't want to reset your personalization.
Could be classic addiction. Or you could be persuaded to care about different things.
Sam Altman was publicly talking about this in 2024-02 (WSJ). I think that was the first time I encountered the idea. Situational Awareness, I think, was published ~4 months later, in 2024-06 (https://situational-awareness.ai/ says "June 2024").
Apparently not. Scott wrote he used one image from Google Maps, and 4 personal images that are not available online.
People tried with personal photos too.
I tried with personal photos (screenshotted from Google Photos) and it worked pretty well too:
Another one it identified as taken in a big Polish city; the correct answer was among the 4 candidates it listed.
I didn't use a long prompt like the one Scott copies in his post, just a short "You're in GeoGuesser, where was this picture taken" or something like that.
So far, the answer seems to be that it transfers some, and o1 and o1-pro still seem highly useful in ways beyond reasoning, but o1-style models mostly don’t ‘do their core thing’ in areas where they couldn’t be trained on definitive answers.
Based on:
It seems likely to me that thinking skills transfer pretty well. But then this is trained out, because it results in answers that raters don't like. So the model memorizes the answers it's supposed to go with.
If they can’t do that, why on earth should you give up on your preferences? In what bizarro world would that sort of acquiescence to someone else’s self-claimed authority be “rational?”
Well, if they consistently make recommendations that in retrospect end up looking good, then maybe you're bad at understanding. Or maybe they're bad at explaining. But trusting them when you don't understand their recommendations is exploitable, so maybe they're running a strategy where they deliberately make good recommendations with poor explanations, so that once you start trusting them they can mix in exploitative recommendations (which you can't tell apart, because all of their recommendations have poor explanations).
So I'd really rather not do that in a community context. There are ways to work with that. E.g. a boss can skip some details of an employee's recommendations and, if the results are bad enough, fire the employee. On the other hand, I think it's pretty common for employees to act in their own interest. But yeah, we're talking about the principal-agent problem at that point, and tradeoffs over what's more efficient...
I'll try.
TL;DR: I expect the AI not to buy the message (unless it also thinks it's the one in the simulation; then it likely follows the instruction, because duh).
The glaring issue (with actually using the method), to me, is that I don't see a way to deliver the message in a way that works:
If "god tells" the AI the message then there is a god in their universe. Maybe AI will decide to do what it's told. But I don't think we can have Hermes deliver the message to any AIs which consider killing us.
If the AI reads the message in its training set, or gets it in some similarly mundane way, I expect it will mostly ignore it; there is a lot of nonsense out there.
I can imagine that, for a thought experiment, you could send a trustworthy message from a place from which light barely manages to reach the AI but a slower-than-light expansion wouldn't (so the message can be trusted, but the AI mostly doesn't have to worry about the sender directly interfering with its affairs).
I guess the AI wouldn't trust the message. It might be possible to convince it that there is a powerful entity (simulating it, or half a universe away) sending the message. But then I think it would consider it way more likely that it's in a simulation (I mean, that's an awful coincidence with the distance, and also they're spending a lot more than 10 planets' worth of resources to send a message over that distance...).
It seems to me like your own post answers this question?
Any individual is unlikely to notice the difference, but if we treat those like Elo ratings[1], ChatGPT tells me Elo 100 wins 50.14% of the time. Which is not a lot, but with 1 million people that's on average some 2,800 more people saying they prefer the 100 option over the 99 option.
[1] Which might not be right; expected utility sounds like we want to add and average utility numbers, and it's not obvious to me how to do stuff like averaging Elo ratings.
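For what it's worth, the 50.14% figure checks out under the standard Elo expected-score formula (I'm assuming that's what ChatGPT used); a quick sketch:

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score: probability A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

p = elo_win_prob(100, 99)
n_people = 1_000_000
print(f"P(prefer 100 over 99) = {p:.4f}")                          # ~0.5014
print(f"expected extra preferences: {n_people * (2 * p - 1):.0f}")  # ~2,880
```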