Joachim Bartosik · Comments (sorted by newest)
Quality Precision
Joachim Bartosik · 21d

This situation puzzles me. On the one hand, I feel a strong logical compulsion to the first (higher total utility) option. The fact that the difference is unresolvable for each person doesn't seem that worrying at a glance, because obviously on a continuous scale resolvable differences are made out of many unresolvable differences added together.

On the other hand, how can I say someone enjoys one thing more than another if they can't even tell the difference? If we were looking at the lengths of strings then one could in fact be longer than another, even if our ruler lacked the precision to see it. But utility is different, we don't care about the abstract "quality" of the experience, only how much it is enjoyed. Enjoyment happens in the mind, and if the mind can't tell the difference, then there isn't one.


It seems to me like your own post answers this question?

Any individual is unlikely to notice the difference, but if we treat those like Elo ratings[1], ChatGPT tells me Elo 100 beats Elo 99 about 50.14% of the time. Which is not a lot, but with 1 million people that's on average some 2,800 more people saying they prefer the 100 option over the 99 option.


[1] Which might not be right: expected utility sounds like we want to add and average utility numbers, and it's not obvious to me that doing things like averaging Elo ratings makes sense.
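For reference, a quick sketch of that arithmetic (a minimal check assuming the standard logistic Elo formula; the function name is just illustrative):

```python
# Back-of-the-envelope check: win probability for a 1-point Elo gap,
# and the expected preference margin among 1 million people.

def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

p = elo_win_probability(100, 99)
n = 1_000_000
margin = n * (2 * p - 1)  # expected (prefers 100) minus (prefers 99)

print(f"P(prefer 100 over 99) = {p:.4%}")                     # ~50.14%
print(f"Expected margin among {n:,} people: ~{margin:.0f}")   # ~2,878
```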

Two Types of (Human) Uncertainty
Joachim Bartosik · 1mo

Another difference would be the expectations when the coin gets tossed more than once.

With "Type 1", if I toss the coin 2 times I expect "HH", "HT", "TH", "TT", each with 25% probability.

With "Type 2", I'd expect "HH" or "TT", with 50% each.
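A minimal sketch of that difference for two tosses (my own framing: "Type 1" as independent fair tosses, "Type 2" as a 50/50 prior over a coin that always lands the same way, which is what the distributions above amount to):

```python
from itertools import product

# "Type 1": each toss is an independent fair flip.
type1 = {"".join(seq): 0.25 for seq in product("HT", repeat=2)}
# {'HH': 0.25, 'HT': 0.25, 'TH': 0.25, 'TT': 0.25}

# "Type 2": the uncertainty is about the coin itself; once that is
# resolved, every toss lands the same way (50/50 prior over the two coins).
type2 = {"HH": 0.5, "TT": 0.5}

print(type1)
print(type2)
```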

Childhood and Education #13: College
Joachim Bartosik · 1mo

The Biden Administration disagrees, as part of its ongoing determination to screw up basic economic efficiency and functionality.

Did this happen during the previous administration, or is it the Trump administration?

AI Companion Piece
Joachim Bartosik · 2mo

you can always reset your personalization.

If the persuasion is good enough, you don't want to reset your personalization.

Could be classic addiction. Or you could be persuaded to care about different things.

America Makes AI Chip Diffusion Deal with UAE and KSA
Joachim Bartosik · 4mo

Sam Altman was publicly talking about this in 2024-02 (WSJ). I think that was the first time I encountered the idea. Situational Awareness was published, I think, ~4 months later, in 2024-06 (https://situational-awareness.ai/ says "June 2024").

What's up with AI's vision
Joachim Bartosik · 4mo

Apparently not. Scott wrote that he used one image from Google Maps and 4 personal images that are not available online.

People tried with personal photos too.

I tried with personal photos (screenshotted from Google Photos) and it worked pretty well too:

  • It identified the neighborhood in Lisbon where a picture was taken
  • It identified another picture as taken in Paris
  • It identified another one as taken in a big Polish city; the correct answer was among the 4 candidates it listed

I didn't use a long prompt like the one Scott copies in his post, just a short "You're in GeoGuessr, where was this picture taken?" or something like that.
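(For anyone wanting to replicate this programmatically, here is a rough sketch of the same test through the OpenAI API. This is an illustration rather than what I actually ran; the model name "gpt-4o" and the file path are placeholders.)

```python
# Hypothetical sketch: send one photo plus a short GeoGuessr-style prompt.
# "gpt-4o" and "photo.jpg" are placeholders, not what was actually used.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "You're in GeoGuessr, where was this picture taken?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```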

On DeepSeek’s r1
Joachim Bartosik · 8mo

Here Angelica chats with r1 a

The link doesn't work (it points to http://0.0.0.6). What should it point to?

o1 Turns Pro
Joachim Bartosik · 9mo

So far, the answer seems to be that it transfers some, and o1 and o1-pro still seem highly useful in ways beyond reasoning, but o1-style models mostly don’t ‘do their core thing’ in areas where they couldn’t be trained on definitive answers.


Based on:

  • rumors that talking to base models is very different from talking to RLHF'd models, and
  • how things work with humans

it seems likely to me that thinking skills transfer pretty well. But then this gets trained out, because it produces answers that raters don't like. So the model memorizes the answers it's supposed to give.

You are not too "irrational" to know your preferences.
Joachim Bartosik · 10mo

If they can’t do that, why on earth should you give up on your preferences? In what bizarro world would that sort of acquiescence to someone else’s self-claimed authority be “rational?”

Well, if they consistently make recommendations that in retrospect end up looking good, then maybe you're bad at understanding. Or maybe they're bad at explaining. But trusting them when you don't understand their recommendations is exploitable, so maybe they're running a strategy where they deliberately make good recommendations with poor explanations, so that once you start trusting them they can start mixing in exploitative recommendations (which you can't tell apart, because all the recommendations have poor explanations).

So I'd really rather not do that in a community context. There are ways to work with this. E.g. a boss can skip some details of an employee's recommendations and, if the results are bad enough, fire the employee. On the other hand, I think it's pretty common for employees to act in their own interest. But yeah, at that point we're talking about a principal-agent problem and tradeoffs over what's more efficient...

You can, in fact, bamboozle an unaligned AI into sparing your life
Joachim Bartosik · 1y

I'll try.

TL;DR: I expect the AI not to buy the message (unless it also thinks it's the one in the simulation; in that case it likely follows the instruction, because duh).

The glaring issue (with actually using the method), to me, is that I don't see a way to deliver the message that:

  • results in the AI believing the message, and
  • doesn't result in the AI believing there already is a powerful entity in its universe.

If "god" tells the AI the message, then there is a god in its universe. Maybe the AI will decide to do what it's told. But I don't think we can have Hermes deliver the message to every AI that considers killing us.

If the AI reads the message in its training set, or gets the message in a similarly mundane way, I expect it will mostly ignore it; there is a lot of nonsense out there.


I can imagine that, for the thought experiment, you could send the message from a place from which light barely manages to reach the AI but slower-than-light expansion wouldn't (so the message can be trusted, but the AI mostly doesn't have to worry about the sender directly interfering with its affairs).

I guess the AI wouldn't trust the message. It might be possible to convince it that there is a powerful entity (simulating it, or half a universe away) sending the message. But then I think it's way more likely that it's in a simulation (I mean, that's an awful coincidence with the distance, and also they're spending a lot more than 10 planets' worth to send a message over that distance...).

Posts:
  • What's up with AI's vision (4mo)
  • Why humans don't learn to not recognize danger? [Question] (3y)