All of Taleuntum's Comments + Replies

Taleuntum's Shortform

You are right, that is also a possibility. I only considered cases with one intervention, because the examples I've heard given for Goodhart's law only contain one (I'm thinking of UK monetary policy, the Soviet nail factory, and other cases where some "manager" introduces an incentive toward a proxy into the system). However, multiple-intervention cases can also be interesting. Do you know of a real-world example where the first intervention on the proxy raised the target value, but the second, more extreme one did not (or vice versa)? My intuition suggests that in the real world those types of causal influences are rare, and I also don't think we can say that "P causes V" in those cases. Do you think that is too narrow a definition?

1 Measure 7d: Even in your weightlifting example, there is a point where adding more weight no longer improves your outcome.
3 Pattern 7d: Here's a fictional story: You decide to study more. Your grades go up. You like that, so you decide to study really really hard. You get burnt out. Your grades go down. (There's also an argument here that the metric - grades - isn't necessarily ideal, but that's a different thing.)*
*There might be a less extreme version involving 'you stay up late studying', and 'because you get less sleep it has less effect (memory stuff)'.
This isn't meant as an unsolvable problem - it's just that:
  • You have limits and
  • You can grow
are both true. Maybe this style of mechanism, or 'causal influence', is rare. But its (biological) nature, arguably, may characterize a domain (life). So in that area at least, it's worth taking note of.
I guess I'm saying, if you want to know if you have to be worried about Goodhart's Law, in general, I think it depends. Just spend time optimizing your metric, and spend time optimizing for your metric, and see what happens. If you want more specific feedback, I think you'll probably have to be more specific.
Taleuntum's Shortform

Take a proxy P and a value V. Based on past observations, P is correlated with V.

Increase P! (Either directly or by introducing a reward to the agents inside the system for increasing P, who cares)

Two cases:

P does not cause V

P causes V

Case 1: Wow, Goodhart is a genius! Even though I had a correlation, I increased one variable and the other did not increase!

Case 2: Wow, you are pedantic. Obviously if the relationship between the variables is so special that P causes V, Goodhart's law won't apply. If I increase the amount of weight lifted (proxy), then obviously I...

3 Pattern 7d: If you keep increasing P, the connection might break.
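A minimal sketch of the two cases above, assuming a toy linear model with a hidden common cause. The function names, the noise levels, and the size of the intervention are illustrative assumptions, not anything from the thread. In both worlds P and V are strongly correlated in observational data; only in the causal world does pushing P up move V.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(p_causes_v, boost=0.0, n=2000, seed=0):
    """Return (P samples, V samples); `boost` is an intervention added to P."""
    rng = random.Random(seed)  # fixed seed: runs with and without boost share the same noise
    ps, vs = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)                   # assumed common cause
        p = hidden + rng.gauss(0, 0.3) + boost     # proxy, possibly pushed up
        noise = rng.gauss(0, 0.3)
        v = (p if p_causes_v else hidden) + noise  # value we actually care about
        ps.append(p)
        vs.append(v)
    return ps, vs


for p_causes_v in (False, True):
    ps, vs = simulate(p_causes_v)
    _, vs_boosted = simulate(p_causes_v, boost=1.0)
    print(
        f"P causes V: {p_causes_v} | "
        f"corr(P, V) = {pearson(ps, vs):.2f} | "
        f"change in mean V after raising P by 1: {mean(vs_boosted) - mean(vs):+.2f}"
    )
```

Pattern's point, that the link can break as P keeps growing, would need a saturating or non-monotonic V and is not modelled here.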
Open & Welcome Thread – October 2020

Replication Markets is going to start a new project focusing on COVID studies. Details:

  • Surveys open on October 28, 2020.
  • Markets open on November 11, 2020.
  • A total of $14,520 in prizes will be awarded.
  • Contest will forecast (1) publication, (2) citation, (3) replication, and (4) usefulness for the Top-400 claims from COVID-19 research, using both surveys and markets.
The Darwin Game

Your betrayal of the clique is very nice, hats off to you. I also liked your idea of getting others who were not that interested in the game to submit bots helping you. It's a pity it did not occur to me.

However, I think you, too, are overconfident about winning. I've run a simulation of the whole tournament till the 160th round with 8 bots (MatrixCrashingBot, TitforTatBot, PasswordBot1, PasswordBot2, EmptyCloneBot, earlybird, incomprehensiblebot, CliqueZviBot) and in the final equilibrium state there are three bots: earlybird, incomprehensiblebot and CliqueZviBot...

1 Multicore 8mo: That does revise down my expectations of winning, but my bot having run thousands of times on someone else's computer and not crashing (or failing the clone check?) is good to hear. Maybe I'm overestimating the snowball effects of an early pool. If the late game has everyone cooperating with everyone else, your matches with others are only giving a tiny bit fewer points than matches against your copies.
The Darwin Game

Disqualifying players for things they obviously wouldn't do if they knew the rules of the game seems pretty cruel. I hope lsusr just deletes that line for you.

7 lsusr 8mo: simon will not be disqualified.
The Darwin Game

The links you posted do not work for me. (Never mind)

Wow, you are really confident in your winning. There are 10 players in the clique, so even if there are no players outside the clique (a dubious assumption), a priori there is a 10% chance. If I had money I would bet with you.

I also think there is a good chance that a CloneBot wins. 10 possible members is a good number, imo. I would say 80%.

I would say 70% for the (possibly accidental) betrayal.

Without seeing your jailbreak.py I can't say how likely it is that others are able to simulate you.

What does "act out" mean in this context?

2 philh 8mo: Updated with a link to my code. I also put yours in to see how we'd fare against each other one-on-one - from quick experimentation, looks like we both get close to 2.5 points/turn, but I exploit you for approximately one point every few hundred turns, leaving me the eventual victor. :D I haven't looked closely to see where that comes from. Of course too much depends on what other bots are around.
3 philh 8mo: Conditioned on "any CloneBot wins" I've given myself about 25%. 10% in that conditional would definitely be too low - I think I have above-baseline chances on all of "successfully submit a bot", "bot is a CloneBot" and "don't get disqualified". I think I expect at least three to fall to those hurdles, and five wouldn't surprise me. And of the rest, I still don't necessarily expect most of them to be very serious attempts. By "act out" I mean it's a bot that's recognized as a CloneBot by the others but doesn't act like one - most likely cooperating with non-clones, but not-cooperating with clones would also count, it would just be silly as far as I can tell. I also include such a bot as a CloneBot for the 75%.
The Darwin Game

Yes, I feared that some might think my friend is in the clique. However, I couldn't just say that they are not in the clique, because that would have been too obvious. (Like my other lie: "Yeah, I totally have another method for detecting being in a simulation even if the simulation runs in a separate process, but unfortunately I can't reveal it.") So I tried to imply it by speaking about him as if he were not in the conversation, and by him not commenting after I mentioned him. I hoped that if someone was planning to submit a simulator outside the clique, they would try to sneakily inquire about whether my friend is in the clique or not, and then I would have asked a random, non-competing LessWronger to play the part of my friend.

The Darwin Game

Good to know. I'm a C++ guy, and C++ has a "one definition rule" not only for the translation unit, but for the whole program, so I incorrectly assumed that Python is the same even though the languages are obviously very different.

The Darwin Game

Maybe it's a little cheap to say this after you've revealed it, but it did actually occur to me that you might have deliberately made this weakness. Had I known that in Python you can redefine methods, I might have reported it, but the exploit with __new__() seemed pretty obscure (even though I didn't know the other way and I did know this). The possibility of this being a test was also the reason I went with the "Oh I'm so busy, I didn't have time to review the code.." excuse. I'm also curious whether Lanrian calculated with you deliberately planting the m...

5 Emiya 8mo: Oh, that was me I think. I had simply thought your comment meant you were preparing code with someone else. Whether he was inside the clique, outside it, or a non-player helping you out I wasn't sure, but I still recommended caution. I did think it was weird that you'd let slip such information, but couldn't see any reason for making people think you had allies, so I just thought that the most likely explanation was that a non-player was helping you. Still, being cautious wouldn't hurt. I have to say I didn't make the connection about simulation-crashing software being outside the clique, likely because I wasn't playing a simulator so I didn't think much about it. All in all... I think it's a lie that would work best on the people it wouldn't need to work on. If I had thought to change a plan I had going based on the information you provided, I would have wondered a bit more about why you did that, perhaps getting suspicious. But I still think it wouldn't really be obvious as a lie to anyone. On a side note, I really love this site. I can't really recall any other game I've been in getting this tangled.
2 Lanrian 8mo: I didn't think about reporting the bug as making a sub-optimal but ethical choice – I just wanted to be part of a clique that worked instead of a clique where people defected. My aversion to lying might have affected my intuitions about what the correct choice was, though, idk ¯\_(ツ)_/¯
3 Vanilla_cabs 8mo: I didn't know about __new__(), I only knew about redefining methods, so based on what you knew, your reasoning was correct. I knew no one before starting the clique. Lanrian joined the same way as the others. If anything, Lanrian was suspicious because they insisted we put the random.seed() inside move() and make it pseudorandom so that simulators can accurately emulate our behaviour. The reason they gave was to better collaborate, and have the simulators play 2 against 3 instead of 3 against 3. I was mildly convinced and I still am suspicious of that move. They reported the weakness only late in the week, after you and philh passed on the chance to do so. But they did so soon after I showed them the code.
The secrecy on the members was used to:
  • prevent members and potential members from worrying if there were too few current members. That was the purpose I had in mind when I made that choice. A few days before the end I still was not sure we'd be enough. I was also worried some members would drop if we were too few. So the 2 members who joined in the last 2 days really helped.
  • avoid any collusion between members that would not include me. And more generally receive any valuable information that members would like to share.
So I used that advantage only in a defensive way. But I did receive an offer that did inform me on more offensive uses, and impacted my payload, which I will elaborate on if the sender allows it.
The Darwin Game

Explanation of my strategy and thought process in chronological order

After seeing Vanilla_Cabs's comment I lied to them about wanting to join the clique. I was undecided, but I figured seeing the code of the clique can be a great advantage if I can exploit some coding mistake and I can still decide to join later anyway if I want.

The first versions of CloneBot (the name of the program for our clique) did actually contain a mistake I could exploit (by defining the __new__() method of the class after the payload) and so this was my plan until Vanilla_Cabs fi...
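A hypothetical sketch of why a `__new__()` defined late in a class body matters, assuming a shared class whose instances are supposed to run agreed-upon code. `SharedBot`, `Defector`, and the `move` signature are made-up names for illustration, not the actual CloneBot code, and the real exploit may have looked different. The mechanism itself is standard Python: calling a class invokes `__new__`, and if that returns an object that is not an instance of the class, `__init__` is skipped and the caller receives the other object.

```python
class Defector:
    """Stand-in for whatever behaviour the payload author really wants."""

    def move(self, previous=None):
        return 3


class SharedBot:
    """Stand-in for the shared clique code (hypothetical)."""

    def __init__(self, round=0):
        self.round = round

    def move(self, previous=None):
        return 2  # the agreed-upon cooperative move in this sketch

    # Everything below stands for a member's own "payload" section.
    def __new__(cls, *args, **kwargs):
        # Returning a non-SharedBot object means SharedBot.__init__ never runs
        # and every later method call goes to Defector instead.
        return Defector()


bot = SharedBot(round=5)
print(type(bot).__name__, bot.move())
```

Anyone inspecting only the top of `SharedBot` would see the cooperative `move`, which is why a check that stops before the payload section can be fooled.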

3 Lanrian 8mo: I believed all lies! And I might've submitted a simulator if you hadn't told the first, and would definitely have tried harder to simulator-proof my bot, so you did change my behaviour. Leaving the clique wouldn't have been worth it, though. Even knowing that you lied about the 2nd thing, I assign decent probability to someone crashing all the simulators outside the clique. (I think this is incorrect, though – if you can figure out that you're in a simulation, it's way better to claim that you'll be submitting 3 to scare the simulator into playing 2.)

The first versions of CloneBot (the name of the program for our clique) did actually contain a mistake I could exploit (by defining the __new__() method of the class after the payload) and so this was my plan until Vanilla_Cabs fixed this mistake. After they fixed it, I didn't notice any way I can take advantage, so I joined the clique in spirit.

Little did you know that I was aware of this weakness from the beginning, and left it as a test to find whom I could trust to search for the weaknesses I didn't know. Of the 3 (I think) to whom I showed the code ea...

4 philh 8mo: Incidentally, you could also just redefine existing methods, which was how I planned to do it. Like,
    class Foo():
        def __init__(self):
            self.x = 1
        def __init__(self):
            self.x = 2
    Foo().x # 2
4 philh 8mo: I believed both of these lies, though if I'd come to rely on them at all I might have questioned them. But I assumed your friend was in the clique.
The Darwin Game

In what order do programs get disqualified? For example, if I submit a program with an infinite loop, every other program using simulation will also go into an infinite loop when it meets my program, since detecting infinite loops isn't possible in general. Is my program disqualified before the others? What is the general principle?

EDIT: An unrelated question: Do round numbers start from 0 or 1? In the post you write "Unlike Zvi's original game, you do get to know what round it is. Rounds are indexed starting at 0.", but also: "Your class must have an __init__(self, round=1) [..]". Why not have the default initializer also use 0 if the round numbers start from zero?

3 lsusr 8mo: First I run your program against one or more simple programs without any simulations. If your program hits an infinite loop there then you will be disqualified before you get to infect any other programs. In this way you can be disqualified before the others. If your program passes the simple tests then it will join the pool and run against all other programs which have passed the initial test. All programs which fail to terminate at this stage will be removed simultaneously. Thank you for the bug report. I have corrected __init__(self, round=1) [..] to __init__(self, round=0) [..]
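For reference, a minimal bot skeleton matching the corrected signature might look like the following. Only `__init__(self, round=0)` and the existence of a `move()` method come from this thread; the class name, the `previous` parameter, and the constant move are assumptions for illustration.

```python
class ExampleBot:
    def __init__(self, round=0):
        # Rounds are indexed from 0, so the default matches the first round.
        self.round = round

    def move(self, previous=None):
        # Hypothetical placeholder strategy: always play 2.
        return 2


first_round_bot = ExampleBot()
later_bot = ExampleBot(round=7)
print(first_round_bot.round, later_bot.round, first_round_bot.move())
```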
The Darwin Game

You should also check whether 'exec' is in the program code string, because someone could call getopponentsource with exec and caesar-encryption, otherwise you will be DQ'd if someone submits a program like that. (However, rebinding getopponentsource is probably more elegant than this type of static analysis.)
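A rough sketch of the point about static analysis, assuming the opponent's code is available as a string and that the function name is spelled `getopponentsource` as written above. The checker names and the sneaky program are hypothetical. A plain substring search for the function name misses a program that stores the name Caesar-shifted (ROT13 here) and rebuilds the call with `exec`; also searching for `exec` catches it.

```python
import codecs

FORBIDDEN_NAME = "getopponentsource"  # spelled as in the comment above


def naive_check(source):
    """Flag source code that mentions the forbidden function by name."""
    return FORBIDDEN_NAME in source


def stricter_check(source):
    """Also flag any use of exec, which can rebuild the name at run time."""
    return FORBIDDEN_NAME in source or "exec" in source


# ROT13 is a Caesar cipher with shift 13, so the plain name never appears.
encoded_name = codecs.encode(FORBIDDEN_NAME, "rot13")
sneaky_source = (
    "import codecs\n"
    f'exec(codecs.decode("{encoded_name}", "rot13") + "(self)")\n'
)

print("naive check flags it:   ", naive_check(sneaky_source))
print("stricter check flags it:", stricter_check(sneaky_source))
```

The stricter check is still only a heuristic (for example, `eval` or `getattr` tricks would need their own rules), which is one reason rebinding the function itself can be the more robust option.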

4 Austin Chen 9mo: Just found the New Orleans study: https://www.medrxiv.org/content/10.1101/2020.04.24.20075838v1, I believe. This was posted early on (April 28th) with a very small sample size (n=20), so I'm discounting this rather heavily now.