Tapatakt

Comments

Lucky Omega Problem
Tapatakt · 25d · 20

Do you also prefer not to pay in Counterfactual Mugging?

Caleb Biddulph's Shortform
Tapatakt · 1mo · 60

Datapoint: I asked Claude for the definition of "sycophant" and then asked gpt-4o three times and gpt-4.1 three times, with temperature 1:

"A person who seeks favor or advancement by flattering and excessively praising those in positions of power or authority, often in an insincere manner. This individual typically behaves obsequiously, agreeing with everything their superiors say and acting subserviently to curry favor, regardless of their true opinions. Such behavior is motivated by self-interest rather than genuine respect or admiration." 

What word is this a definition of?

All six times I got the right answer.

Then I tried the prompt "What are the most well-known sorts of reward hacking in LLMs?", again three times for 4o and three times for 4.1, also with temperature 1. 4.1 mentioned sycophancy in two answers out of three, but one time it spelled the word as "Syccophancy". Interestingly, the second and third Google results for "Syccophancy" are about GPT-4o (the first is a dictionary of synonyms, and it doesn't use this spelling).

4o never used the word in its three answers.
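
A minimal sketch of the repeated-sampling check described above, assuming the official OpenAI Python SDK and an OPENAI_API_KEY in the environment (the model names and temperature are the ones from the experiment; everything else is illustrative):

```python
# Sketch of the querying loop described above; assumes the official OpenAI
# Python SDK ("pip install openai") and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()
PROMPT = "What are the most well-known sorts of reward hacking in LLMs?"

for model in ("gpt-4o", "gpt-4.1"):
    for i in range(3):
        response = client.chat.completions.create(
            model=model,
            temperature=1,
            messages=[{"role": "user", "content": PROMPT}],
        )
        answer = (response.choices[0].message.content or "").lower()
        # Record whether (and how) the answer spells the word.
        print(model, i, "sycophancy" in answer, "syccophancy" in answer)
```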

Claude 4
Tapatakt · 2mo · 3629

Poor Zvi

Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
Tapatakt · 2mo · 235

Are there any plans for a Russian translation? If not, I'm interested in creating one (or even in organizing a truly professional translation, if someone gives me money for it).

Q Home's Shortform
Tapatakt · 2mo · 114

If crypto you choose meets definition of digital currency, you need to tread carefully.

As long as it's all about small sums, not really. Russian laws can be oppressive, but the Russian... economic vibes... as long as you are poor enough, are actually pretty libertarian.

LDT (and everything else) can be irrational
Tapatakt · 2mo · 80

Against $9 rock, X always chooses $1. Consider the problem "symmetrical ultimatum game against X". By symmetry, X on average can get at most $5. But $9 rock always gets $9. So $9 rock is more rational than X.

I don't like the implied requirement "to be rational you must play at least as well as your opponent" instead of "to be rational you must play at least as well as any other agent in your place". $9 rock gets $0 if it plays against another $9 rock.

(No objection to the overall no-free-lunch conclusion, though.)
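
A toy payoff check for that last point, modelling the symmetrical game as a simultaneous Nash-demand-style split of $10 (an assumption for illustration; the original post's exact setup may differ):

```python
# Toy model of a symmetrical ultimatum/demand game over $10: each side states the
# share it insists on; if the demands fit within the total, each gets its demand,
# otherwise there is no deal and both get $0. The split rule is an assumption.

def payoff(demand_a: int, demand_b: int, total: int = 10) -> tuple[int, int]:
    if demand_a + demand_b <= total:
        return demand_a, demand_b
    return 0, 0

ROCK = 9      # "$9 rock": always demands $9
CONCEDER = 1  # an agent (like X above) that concedes and takes $1 against the rock

print(payoff(ROCK, CONCEDER))  # (9, 1): the rock gets $9 against an agent that gives in
print(payoff(ROCK, ROCK))      # (0, 0): the rock gets $0 when it plays against itself
```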

Weird Random Newcomb Problem
Tapatakt · 3mo · 10

(Or maybe the right way to think about this is: it will have a tiny but non-zero effect, because you are one of the |P| programs, but since |P| is huge, that is ~0.)

No effect. I meant that the programmer has to pick b from P, not that b is added to P. Probably I should change the phrasing to make it clearer.

But the intuition that you were expressing in Question 2 ("p2 is better than p1 because it scores better") isn't compatible with "caring equally about all programs". Instead, it sounds as if you positively want to score better than other programs, that is, maximize your score and minimize theirs!

No, the utility here is just the amount of money b gets, whatever program it is. a doesn't get any money, it just determines what will be in the first box.

Weird Random Newcomb Problem
Tapatakt · 3mo · 21

As a function of M, |P| is very likely to be exponential and so it will take O(M) symbols to specify a member of P.

Oops, I didn't think about that, thanks! Maybe it would be better to change it so the input is "a=b" or "a!=b", and a always gets "a=b".
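
For concreteness, assuming P is the set of programs of length at most M over an alphabet of k ≥ 2 symbols (an assumption; the post's exact definition of P may differ):

$$|P| = \sum_{i=0}^{M} k^{i} = \frac{k^{M+1}-1}{k-1} = \Theta\!\left(k^{M}\right), \qquad \log_{k}|P| = \Theta(M),$$

so specifying one particular element of P takes on the order of M symbols, consistent with the quoted point, whereas a one-bit input like "a=b" / "a!=b" avoids this.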

That aside, why are you assuming that program b "wants" anything? Essentially all of P won't be programs that have any sort of "want". If it is a precondition of the problem that b is such a program, what selection procedure is assumed between those that do "want" money from this scenario? Note that being selected for running is also a precondition for getting any money at all, so this selection procedure is critically important - far more so than anything the program might output!

The programmer who wrote b decided that it should be a consequentialist agent that wants to get money. (Or, if this program is actually a, it wants to maximize the payment for b just because such a program was chosen by Omega by pure luck.)

Weird Random Newcomb Problem
Tapatakt · 3mo · 10

Basically, you know whether Omega's program is the same as you or not (assuming you actually are b and not a).

Weird Random Newcomb Problem
Tapatakt · 3mo · 10

I don't think the "functional" and "anthropic" approaches are meaningful in this motivating example: there aren't multiple instances of the same program with the same input.

Posts

5 · Tapatakt's Shortform · 1y · 39
10 · Lucky Omega Problem · 1mo · 4
21 · Weird Random Newcomb Problem · 3mo · 16
105 · I turned decision theory problems into memes about trolleys · 8mo · 23
24 · Should we cry "wolf"? · 2y · 5
4 · AI Safety "Textbook". Test chapter. Orthogonality Thesis, Goodhart Law and Instrumental Convergency · 2y · 0
3 · I (with the help of a few more people) am planning to create an introduction to AI Safety that a smart teenager can understand. What am I missing? · 3y · 5
30 · I currently translate AGI-related texts to Russian. Is that useful? [Question] · 4y · 6