

I don't understand the new unacceptability penalty footnote. In both of the terms, there is no conditional sign. I presume the comma is wrong?

They're unconditional, not conditional probabilities. The comma is just for the exists quantifier.

Also, for me \mathbb{B} for {True, False} was not standard, I think it should be defined.



From my perspective, training stories are focused pretty heavily on the idea that justification is going to come from a style more like heavily precedented black boxes than like cognitive interpretability

I definitely don't think this—in fact, I tend to think that cognitive interpretability is probably the only way we can plausibly get high levels of confidence in the safety of a training process. From “How do we become confident in the safety of a machine learning system?”:

Nevertheless, I think that transparency-and-interpretability-based training rationales are some of the most exciting, as unlike inductive bias analysis, they actually provide feedback during training, potentially letting us see problems as they arise rather than having to get everything right in advance.

See also: “A transparency and interpretability tech tree”

In the video, the human wins precisely because they exploit this fact about the AI.

Biggest thing that stood out to me watching this was that while the AI's tactics seemed quite good, its game theory seemed quite poor—e.g. it wasn't sufficiently vindictive if you betrayed it, which made it vulnerable to exploitation by a human aware of that fact.

Sort of a side note, but one takeaway I've had from the whole FTX fiasco—particularly given SBF's comments here—is that being really careful about teaching and understanding Kelly betting is more important than I would have thought.
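For reference, here is a minimal sketch of the standard Kelly formula for a simple binary bet, along with the expected log-growth it maximizes. The function names and the example numbers (a 60% chance of winning an even-money bet) are illustrative assumptions, not anything taken from the comments being discussed.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction for a bet won with probability p at net odds b-to-1.

    A negative result means the bet has negative edge and should be skipped.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0.0:
        raise ValueError("b must be positive net odds")
    return p - (1.0 - p) / b


def expected_log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth per bet when staking a fraction f of the bankroll."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return p * math.log(1.0 + f * b) + (1.0 - p) * math.log(1.0 - f)


# Hypothetical example: 60% win probability on an even-money bet.
p, b = 0.6, 1.0
f_star = kelly_fraction(p, b)
for f in (0.5 * f_star, f_star, 2.0 * f_star):
    print(f"stake {f:.2f} -> expected log growth {expected_log_growth(f, p, b):+.4f}")
```

The point of the comparison is that staking well above the Kelly fraction lowers long-run growth even when each bet has positive expected value: in this example, betting twice the Kelly fraction gives slightly negative expected log growth.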

The idea is that we're thinking of pseudo-inputs as “predicates that constrain X” here, so, for , we have .

  • The path-independent case is probably overall easier to analyze—we understand things like speed and simplicity pretty well, and in the path-independent case we don't have to keep track of complex sequencing effects.
  • I think my path-independent analysis is more robust, for the same reasons as in the first point.
  • Presumably that assumption is path-dependence? I'm not sure what you mean.

I am not being asked to do something because it is moral. I am being asked to do something because it is signaling. Evan is primarily telling me I'm obligated to do PR-control for EA, but that is something I do not actually care that much about and do not believe I am obligated to do, and that's why I strong-downvoted the post.

Seems like a pretty blatant misrepresentation of what I wrote. In justifying why I think you have an obligation to condemn fraud in the service of effective altruism, I say:

Assuming FTX's business was in fact fraudulent, I think that we—as people who unknowingly benefitted from it and whose work for the world was potentially used to whitewash it—have an obligation to condemn it in no uncertain terms.

That's pretty clearly a moral argument and not about PR at all.

The Twitter post is literally just title + link. I don't like Twitter, and don't want to engage on it, but I figured posting this more publicly would be helpful, so I did the minimum thing to try to direct people to this post.

From my perspective, I find it pretty difficult to be criticized for a “feeling” that you get from my post that seems to me to be totally disconnected from anything that I actually said.

I agree that the title does directly assert a claim without attribution, and that it could be misinterpreted as a claim about what all EAs think should be done rather than just what I think should be done. It's a bit tricky because I want the title to be very clear, but am quite limited in the words I have available there.

I think the latter quote is pretty disingenuous—if you quote the rest of that sentence, the beginning is “I think the best course of action is”, which makes it very clear that this is a claim about what I personally believe people should do:

Right now, I think the best course of action is for us—and I mean all of us, anyone who has any sort of a public platform—to make clear that we don't support fraud in the service of effective altruism.

To be clear, “in the service of effective altruism” there is meant to refer to fraud done for the purpose of advancing effective altruism; it is not meant to say that our obligation to not support fraud is itself in the service of effective altruism.

Edit: To make that last point clearer, I changed “to make clear that we don't support fraud in the service of effective altruism” to “to make clear that we don't support fraud done in the service of effective altruism”.
