Ege Erdil

If you have any questions for me or just want to talk, feel free to reach out by sending a private message on this site or by sending an e-mail to

You can also find me on Metaculus at, or on Discord using the tag Starfall#7651.

I wanted to thank you for writing this comment. While I have also been reasonably active on social media about this topic, and playing level 3+ games is sometimes necessary in the real world, I don't think this post offers any substantive content beyond "fraud is bad and FTX was involved in fraudulent activities".

I agree that it's not a good fit for LW, though I think the post does fit in the EA Forum given recent events.

In my opinion, the fact that the original tweet, along with Bankman-Fried's other tweets in the same thread, has been deleted in an attempt to shift the narrative is noteworthy in and of itself.

I think the tweet is more likely to be deleted in this way in worlds where the claim in it was made in bad faith rather than not, though perhaps unknown details about FTX's strategy for getting out of this quagmire necessitated such a move even in worlds where the claim had originally been made in good faith. Still, the optics are not good, and I have appropriately updated towards FTX having engaged in dishonest conduct.

I agree with evhub that we don't yet know if the claim was made in bad faith, but we can still make some guesses in the absence of such definitive knowledge.

Most of the people on freenode migrated over to, from what I know. Is this true of the LessWrong channel?

It's likely not an uncommon attitude among Eastern Europeans, given the relevant history. I've heard similar sentiments expressed by Czech and Polish people before.

I agree that in this context it seems nonsensical, though. Ukrainian national socialists wanting to take down communist monuments is quite overdetermined, as the monuments are not only symbols of an ideology that they strongly oppose, but also of a foreign state that was an occupying force in Ukraine in living memory (and is an occupying force right now, depending on your interpretation of the continuity between the USSR and the Russian Federation).

I think we're not communicating clearly with each other. If you have the time, I would be enthusiastic about discussing these potential disagreements in a higher frequency format (such as IRC or Discord or whatever), but if you don't have the time I don't think the format of LW comments is suitable for hashing out the kind of disagreement we're having here. I'll make an effort to respond regardless of your preferences about this, but it's rather exhausting for me and I don't expect it to be successful if done in this format.

That's in part why we had this conversation in the format that we did instead of going back-and-forth on Twitter or LW, as I expected those modes of communication would be poorly suited to discussing this particular subject. I think the comments on Katja's original post have proven me right about that.

Yeah, that matches my experience with chess engines. Thanks for the comment.

It's probably worthwhile to mention that people have trained models that are more "human-like" than AlphaGo by modifying various parts of the training process. One improvement they made on this front is that they changed the reward function so that while almost all of the variance in reward comes from whether you win or lose, the number of points you win by still has a small influence on the reward you get.
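A hedged sketch of what such a mixed reward could look like; the weight and scale below are illustrative assumptions, not the values used in any particular paper:

```python
import math

# Illustrative reward dominated by win/loss, with a small score-margin term.
# margin_weight and margin_scale are assumptions chosen for illustration.
def game_reward(won: bool, score_margin: float,
                margin_weight: float = 0.05, margin_scale: float = 20.0) -> float:
    win_term = 1.0 if won else -1.0
    # tanh keeps the margin term bounded, so it can never outweigh the
    # win/loss term: almost all reward variance still comes from winning.
    margin_term = math.tanh(score_margin / margin_scale)
    return (1.0 - margin_weight) * win_term + margin_weight * margin_term
```

Under this shape of reward, any win, however narrow, still outscores any loss, but among winning lines the model slightly prefers larger margins.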


This is obviously tricky, because it could push the model to take unreasonable risks to win by more and thereby accept a greater probability of losing the game, but the authors of the paper found that this particular utility function works well in practice. On top of this, they also took measures to widen the training distribution, by forcing the model to play handicap games where one side gets a few extra moves, to play against weaker or stronger versions of itself, et cetera.

All of these serve to mitigate the problems that would be caused by distributional shift, and in this case I think they were moderately successful. I can confirm from having used their model myself that it indeed makes much more "human-like" recommendations, and is very useful for humans wanting to analyze their games, unlike pure replications of AlphaGo Zero such as Leela Zero.

I think you're misunderstanding the reason I'm making this argument.

The reason you don't, in fact, fix the discriminator and find the "maximally human-like face" by optimizing the input is that this gives you unrealistic images. The technique is not used precisely because people know it would fail. That such a technique would fail is my whole point! It's meant to be an example that illustrates the danger of hooking up powerful optimizers to models that are not robust to distributional shift.
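As a toy illustration of what "hooking up an optimizer to the discriminator" means, here is gradient ascent on the input against a frozen scorer. The linear scorer is a stand-in assumption for a trained discriminator network, not anyone's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "discriminator": a frozen linear scorer over flattened 8x8 images.
# In a real GAN this would be the trained discriminator network.
W = rng.normal(size=64)

def realness_score(image: np.ndarray) -> float:
    return float(W @ image)

# Gradient ascent on the *input*, with the discriminator held fixed.
# For a linear scorer, d(score)/d(image) = W, so every step just pushes
# pixels in whatever direction raises the score; the result is a pattern
# of score-maximizing pixels, not a realistic face.
image = np.zeros(64)
for _ in range(100):
    image += 0.1 * W

final_score = realness_score(image)
```

A real trained discriminator behaves analogously: adversarial pixel patterns far outside the data distribution can score as "realer" than any actual face.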

Yes, GANs work well if you avoid doing this and stick to using them in distribution, but then all you have is a model that learns the distribution of inputs you've given it. The same is true of diffusion models: they just give you a Langevin diffusion-like way of generating samples from a probability distribution. In the end, you can think of all of these models as taking many pairs of (image, description) and then learning distributions such as P(image), P(image | description), etc.
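A minimal, purely illustrative version of that framing, with counting in place of a neural density model (the string "images" are obviously stand-ins):

```python
from collections import Counter, defaultdict

# Toy (image, description) pairs; strings stand in for actual images.
pairs = [
    ("cat_photo_1", "cat"),
    ("cat_photo_2", "cat"),
    ("cat_photo_1", "cat"),
    ("dog_photo_1", "dog"),
]

# Empirical conditional distribution P(image | description) by counting.
# A GAN or diffusion model fits a parametric, generalizing version of
# this same object over real images.
counts = defaultdict(Counter)
for image, description in pairs:
    counts[description][image] += 1

def p_image_given_description(image: str, description: str) -> float:
    total = sum(counts[description].values())
    return counts[description][image] / total
```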

This method, however, has several limitations if you try to generalize it to training a superhuman intelligence:

  1. The stakes for training a superhuman intelligence are much higher, so your tolerance for mistakes is much lower. How much do you trust Stable Diffusion to give you a good image if the description you give it is sufficiently out of distribution? It would make a "good attempt" by some definition, but it's quite unlikely to give you what you actually wanted. That's why there's an entire field of prompt engineering devoted to getting these models to do what you want!

    If we end up in such a situation with respect to a distribution P(action | world state), then I think we would be in a bad position. This is exacerbated by the second point in the list.

  2. If all you're doing is learning the distribution P(action humans would take | world state), then you're fundamentally limited in the actions you can take by actions humans would actually think of. You might still be superhuman in the sense that you run much faster and use much less energy than humans do, but you won't come up with plans that humans wouldn't have come up with on their own, even if they would be very good plans; and you won't be able to avoid mistakes that humans would actually make in practice. Therefore any model trained strictly in this way has capability limitations that it won't be able to get past.

    One way of trying to ameliorate this problem is to have a standard reinforcement learning agent which is trained with an objective that is not intended to be robust out of distribution, but then combine it with the distribution P such that P regularizes the actions taken by the model. Quantilizers are based on this idea, and unfortunately they also have problems: for instance, "build an expected utility maximizer" is an action that probably would get a relatively high score from P, because it's indeed an action that a human could try.

  3. The distributional shift that a general intelligence has to be robust to is likely much greater than the one that image generators need to be robust to. So on top of the high stakes we have in this scenario per point (1), we also have a situation where we need to get the model to behave well even under relatively large distributional shifts. I don't think any model that exists right now is as robust to distributional shift as we would want an AGI to be, or even as robust as we would want a self-driving car to be.

  4. Unlike any other problem we've tried to tackle in ML, learning the human action distribution is so complicated that it's plausible an AI trained to do this ends up with weird strategies that involve deceiving humans. In fact, any model that learns the human action distribution should also be capable of learning the distribution of actions that would look human to human evaluators instead of actually being human. If you already have the human action distribution P, you can actually take your model itself as a free parameter Q and maximize the expected score of the model Q under the distribution P: in other words, you explicitly optimize to deceive humans by learning the distribution humans would think is best rather than the distribution that is actually best.

    No such problems arise in a setting such as learning human faces, because the deceptive strategy is much more complicated than actually solving the problem you're being asked to solve, so there's no reason for gradient descent to find the deception strategy instead of just doing what it's asked to do.
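The regularization idea from point 2 can be sketched as a simple quantilizer. The uniform action space and identity utility below are illustrative assumptions, not part of any real system:

```python
import random

random.seed(0)

# Sketch of a q-quantilizer: draw candidate actions from a base distribution
# of "human-plausible" actions, then choose uniformly among the top q
# fraction by the (possibly misspecified) utility, instead of taking the
# argmax over all representable actions.
def quantilize(sample_action, utility, q=0.1, n=1000):
    actions = sorted((sample_action() for _ in range(n)),
                     key=utility, reverse=True)
    top = actions[: max(1, int(q * n))]
    return random.choice(top)

# Toy usage: actions are numbers in [0, 1], utility is the number itself.
chosen = quantilize(lambda: random.random(), lambda x: x, q=0.1, n=1000)
```

With `sample_action` drawing from something like the human policy P, this bounds how far the chosen action can stray from what a human might plausibly do, though as noted above it doesn't rule out rare catastrophic actions that P itself assigns nontrivial probability.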

Overall, I don't see how the fact that GANs or diffusion models can generate realistic human faces is an argument that we should be less worried about alignment failures. The only way in which this concern would show up is if your argument was "ML can't learn complicated functions", but I don't think this was ever the reason people have given for being worried about alignment failures: indeed, if ML couldn't learn complicated functions, there would be no reason for worrying about ML at all.

I agree with this, but for the reason you specified I think that precision would be of greater utility elsewhere.

Not for the purposes of Tegmark's calculation. Did you check how he uses this number?

If you think it's higher than 50% but lower than 80%, it seems like there isn't much room there to me?
