All of IlyaShpitser's Comments + Replies

It's important to internalize that the intellectual world lives in the attention economy, like everything else.

Just like "content creators" on social platforms think hard about capturing and keeping attention, so do intellectuals and academics.  Clarity and rigor are a part of that.

No one has time, energy, (or crayons, as the saying goes) for half-baked ramblings on a blog or forum somewhere.

If you think you can beat the American __ Association over a long run average, that's great news for you!  That means free money!

Being right is super valuable, and you should monetize it immediately.


Anything else is just hot air.

The entire argument of Inadequate Equilibria is that there are ways in which systems can be inadequate without being exploitable by relevant actors. Adequacy is a stronger condition than inexploitability, because all the latter requires is that all of the available free energy in the system has been consumed for any reason at all, whereas the former requires also that the energy consumed specifically translates to results in the real world.

In other words: no, it's not always the case that being able to see things a large organization is doing wrong means "free money". That line of thought is naive to the difference between adequacy and inexploitability.

It is a useful exercise to try to map any complaint that something sucks, or that "these people who are in charge are wrong", into a business plan for making a profit by doing something better, and see what obstacles you hit.  However, sometimes the obstacle is "The law forbids you from doing something better". For example, my contention is that the requirements for becoming a doctor in America are too high; that one could train slightly less intelligent people to do doctor-work, and certainly drop the requirement for a bachelor's degree before going to medical school, and still get doctors who are more than good enough.  (I'm not 100% sure that this requirement ultimately comes from the American Medical Association, but let's say 75% sure.)  How would I monetize it?

Lots of Bayes fans, but can't seem to define what Bayes is.

Since Bayes theorem is a reformulation of the chain rule, anything that is probabilistic "uses Bayes theorem" somewhere, including all frequentist methods.
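
To spell out the chain-rule point, here is the standard two-line derivation. The events A and B and the assumption P(B) > 0 are generic notation added for illustration, not something from the thread:

```latex
\begin{align*}
P(A, B) &= P(A \mid B)\, P(B) = P(B \mid A)\, P(A)
  && \text{(chain rule, written in both orders)} \\
P(A \mid B) &= \frac{P(B \mid A)\, P(A)}{P(B)}
  && \text{(divide through by } P(B) > 0\text{)}
\end{align*}
```

Nothing in these two lines commits you to a prior over parameters or to a posterior as the target of inference, which is why using the identity does not by itself make a method "Bayesian."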

Frequentists quantify uncertainty also, via confidence sets, and other ways.

Continuous updating has to do with "online learning algorithms," not Bayes.


Bayes is when the target of inference is a posterior distribution.  Bonus Bayes points: you don't care about frequentist properties like consistency of the estimator.

Also, people on Less Wrong are normally interested in a different kind of Bayesianism: Bayesian epistemology. The type philosophers talk about. But Bayesian ML is based on the other type of Bayesianism: Bayesian statistics. The two have little to do with each other.

Does your argument fail for

If so, can you explain why?  If not, it seems your argument is no good, as a good proof of this (weaker) claim exists.

Not that you asked my advice, but I would stay away from number theory unless you get a lot of training.

For the benefit of other readers: this post is confused.

Specifically on this (although possibly also on other stuff): (a) causal and statistical DAGs are fundamentally not the same kind of object, and (b) no practical decision theory used by anyone includes the agent inside the DAG in the way this post describes.


"So if the EDT agent can find a causal structure that reflects their (statistical) beliefs about the world, then they will end up making the same decision as a CDT agent who believes in the same causal structure."

A -> B -> C and A <- B <- C reflect the same statistical beliefs about the world.
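
A minimal sketch of that claim, assuming three binary variables and made-up conditional probability tables (all numbers and names below are hypothetical). It builds the joint implied by A -> B -> C, then re-factorizes the same joint along A <- B <- C and checks that nothing changed:

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]

# Joint implied by the forward chain: P(a) P(b|a) P(c|b).
forward = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

def marginal(joint, keep):
    """Sum the joint down to the variable positions listed in `keep`."""
    out = {}
    for assignment, prob in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out

p_c = marginal(forward, (2,))
p_bc = marginal(forward, (1, 2))
p_ab = marginal(forward, (0, 1))
p_b = marginal(forward, (1,))

# Same joint, factorized along the reversed chain: P(c) P(b|c) P(a|b).
backward = {
    (a, b, c): p_c[(c,)] * (p_bc[(b, c)] / p_c[(c,)]) * (p_ab[(a, b)] / p_b[(b,)])
    for a, b, c in product((0, 1), repeat=3)
}

# Largest pointwise difference between the two factorizations.
max_gap = max(abs(forward[k] - backward[k]) for k in forward)
```

The two factorizations agree up to floating-point error because both graphs encode the single conditional independence "A is independent of C given B." Observational data alone cannot tell them apart; only their interventional readings differ.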

I can't tell if this is a terminological or substantive disagreement (it sounds terminological, but I don't think I yet understand it). Could you say something about the difference and how it is relevant to this post? Like, which claim made in the post is this contradicting? Is this an objection to "If a system is well-described by a causal diagram, then it satisfies a complex set of statistical relationships"? Or maybe "To an evidential decision theorist, these kinds of statistical relationships are the whole story about causality, or at least about its relevance to decisions."? What is EDT if you don't include the agent inside the model of the world? Doesn't almost all philosophical discussion of EDT vs CDT involve inferences about the process generating the decision, and hence presume that we have beliefs about this process? Are you saying that "practical" communities use this language in a different way from the philosophical community? Or that "beliefs about the process generating decisions" aren't captured in the DAG? That's true but I don't understand its relevance. I think this is probably related to the prior point about the agent including itself in the causal diagram. (Since e.g.  decision --> A --> B --> C and decision --> A <-- B <-- C correspond to very different beliefs about the world.)
I disagree, it doesn't look confused to me. The post explicitly discusses the different views of causality. That seems in line with what the post describes: "and so it will be (much) too large for me to reason about explicitly".

If you think it's a hard bet to win, you are saying you agree that nothing bad will happen.  So why worry?

I meant it's a hard bet to win because how exactly would I collect. That said, I'm genuinely not sure if it's a good field for betting. Roughly speaking, there's two sorts of bets: "put your money where your mouth is" bets and "hedging" bets. The former are "for fun" and signaling/commitment purposes; the latter are where the actual benefit comes in. But with both bets, it's difficult to figure out a bet structure that works if the market gets destroyed in the near future! We could bet on confidence, but I'm genuinely not sure if there'll be one or two "big papers" before the end shifting probabilities. So the model of the world might be one where we see nothing for years and then all die. Hard for markets to model. Doing a "money now/money later" bet structure works, I guess, like the other commenter said, but I don't know of any prediction markets that are set up for that.

Wanna bet some money that nothing bad will come of any of this on the timescales you are worried about?

7Evan R. Murphy1y
There's a new post today specifying such bets [].
That seems like a hard bet to win. I suggest instead offering to bet on "you will end up less worried" vs "I will end up more worried", though that may not work.
5Not Relevant1y
To clarify, the above argument isn't a slam dunk in favor of "something bad will happen because of AGI in the next 7 years". I could make that argument, but I haven't, and so I wouldn't necessarily hope that you would believe it. But what about the 10 years after that, if we still have no global coordination? What's the mechanism by which nobody ever does anything dumb, without most people being aware of safety considerations and no restriction whatsoever on who gets to do what.

Some reading on this:


From my experience it pays to learn how to think about causal inference like Pearl (graphs, structural equations), and also how to think about causal inference like Rubin (random variables, missing data).  Some insights only arise from a synthesis of those two views.

Pearl is a giant in the field, but it is worth remembering that he's unusual in another way (compared to a typical ...

Classical RL isn't causal, because there's no confounding (although I think it is very useful to think about classical RL causally, for doing inference more efficiently).

Various extensions of classical RL are causal, of course.

A lot of interesting algorithmic fairness isn't really causal.  Classical prediction problems aren't causal.

However, I think domain adaptation, covariate shift, semi-supervised learning are all causal problems.


I think predicting things you have no data on ("what if the AI does something we didn't foresee") is sort of an impossible problem via tools in "data science."  You have no data!

I think value learning might be causal because human preferences cannot be observed, and therefore can act as a confounder, similar to the work in Zhang, J., Kumor, D., Bareinboim, E. Causal Imitation Learning with Unobserved Confounders. In Advances in Neural Information Processing Systems 2020. At least that was one of my motivations. Sure, I agree. I think I was quite inaccurate. I am referring to transportability analysis, to be more specific. This approach should help in new situations where we have not directly trained our system, and in which our preferences could change.

A few comments:

(a) I think "causal representation learning" is too vague; this overview talks about a lot of different problems I would consider fairly unrelated under this same heading.

(b) I would try to read "classical causal inference" stuff.  There is a lot of reinventing of the wheel (often, badly) happening in the causal ML space.

(c) What makes a thing "causal" is a distinction between a "larger" distribution we are interested in, and a "smaller" distribution we have data on.  Lots of problems might look "causal" but really aren't (in an interesting way) if formalized properly.

Please tell Victor I said hi, if you get a chance :).

Hi Ilya! Thanks a lot for commenting :) Yes, you're right. I had found this, and other reviews by similar authors. In this one, I was mostly thinking of section VI (Learning causal variables) and its applications to RL (section VII-E). Perhaps section V on causal discovery is also relevant. Probably there is, I have to get to speed on quite a few things if I get the grant. But thanks for the nudge! I think this makes sense. But part of the issue here is that AI will probably change things that we have not foreseen, so it could be good to take this point of view, in my opinion. Do you have `interesting' examples of not-causal problems? Thanks again!

I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb type problems.  It wasn't even an original idea: Spohn had something similar in 2012.

I don't think any of this stuff is interesting, or relevant for AI safety.  There's a pretty big literature on model robustness and algorithmic fairness that uses causal ideas.

If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.

Why did you give a talk on causal graphs if you didn't think this kind of work was interesting or relevant? Maybe I'm misunderstanding what you're saying isn't interesting or relevant.

Counterfactuals (in the potential outcome sense used in statistics) and Pearl's structural equation causality semantics are equivalent.

What are your thoughts on Newcomb's, etc.?

Could you do readers an enormous favor and put references in when you say stuff like this:

"Vitamin D and Zinc, and if possible Fluvoxamine, are worth it if you get infected, also Vitamin D is worth taking now anyway (I take 5k IUs/day)."

In case it helps, here [] is a brief discussion of this topic.

Ilya, I respect your expertise in causal modeling, and I appreciate when you make contributions to the site sharing things you've learned and helping others see the parts of the world you understand, like this and this and this. In contrast your last 5 comments on the site have net negative karma scores, getting into politics and substance-less snark on the community. A number of your comments are just short and snarky and political (example, example) or gossiping about the site (many of your comments are just complaining about Eliezer and rationalists).

I'...

This comment brings to mind an interesting question, which is: to what lengths does a commenter have to go, to what extent do they have to make it clear that they are not in the least interested in contributing to productive discussion (and moreover very interested in detracting from it), before the moderation team of LW decides to take coordinated action? I ask, not as a thinly veiled attempt to suggest that Ilya be banned (though I will opine that, were he to be banned, he would not much be missed), but because his commenting pattern is the most obvious example I can think of in recent memory of something that is clearly against, not just the stated norms of LW, but the norms of any forum interested in anything like collaborative truthseeking. It is an invitation to turn the comments section into something like a factionalized battleground, something more closely resembling the current iteration of Reddit than any vision anyone might have of something better. The fact that these invitations have so far been ignored does not obviate the fact that that is clearly and obviously what they are. So I think this is an excellent opportunity to inquire into LW moderation policy. If such things as Ilya's "contributions" to this thread are not considered worthy of moderator action, what factors might actually be sufficient to prompt such action? (This is not a rhetorical question.)

I'm happy to disengage, then. But for the record, I don't actually know what you mean by formalism in this context; you seem to think that it's a dreadful thing to accuse someone of, but the OP doesn't use the word at all, you never define it, and the Wikipedia page is irrelevant enough to the post that I'm pretty sure you must mean something else.

(I wonder why you say "Moldbug" rather than "Yarvin" but "Siskind" rather than "Alexander" or "Scott".)

If your reading of anything Scott's written is that he favours anything like neoreaction, then it's a very different reading from mine. My reading is that he thinks neoreaction is mostly garbage but with occasional valuable insights. His actual words in what I suspect is the same leaked email as you're talking about: "Neoreactionaries provide a vast stream of garbage with occasional nuggets of absolute gold in them." My mental model of Scott is not excite...

Downvoted for ad hominem. Having drawn inspiration from an author you don't like is not an argument against anything. Saying you don't like some authors without making reference to any specific positions those authors have is an invitation to contentless flamewar.
I would appreciate if you could expand a bit, or sober up, or whatever.

My response is we have fancy computers and lots of storage -- there's no need to do psychometric models of the brain with one parameter anymore, we can leave that to the poor folks in the early 1900s.

How many parameters does a good model of the game of Go have, again?  The human brain is a lot more complicated, still.

There are lots of ways to show single parameter models are silly, for example discussions of whether Trump is "stupid" or not that keep going around in circles.

This seems to be an argument for including more variables than just g (which most psychometric models IME already do btw), but it doesn't seem to support your original claim that g doesn't exist at all. (Also, g isn't a model of the brain.)

"Well, suppose that factor analysis was a perfect model. Would that mean that we're all born with some single number g that determines how good we are at thinking?"

"Determines" is a causal word.  Factor analysis will not determine causality for you.

I agree with your conclusion, though, g is not a real thing that exists.

What would your response be to my defense of g here []? (As far as I can tell, there are only three problems with the study I linked: 1. due to population structure, the true causal effects of the genes in question will be misestimated (this can be fixed with within-family studies, as was done with a similar study on Externalizing tendencies [], 2. the study might lack the power to detect subtle differences between the genes in their specific degrees of influences on abilities, which if detected might 'break apart' g into multiple distinct factors, 3. the population variance in g may be overestimated when fit based on phenotypic rather than causally identified models. Of these, I think issue 2 is unlikely to be of practical importance even if it is real, while issue 1 is probably real but will gradually get fixed, and issue 3 is concerning and lacks a clear solution. But your "g is not a real thing that exists" sounds like you are more pessimistic about this than I am.)
(You seem to have put your comments in the quote-block as well as the thing actually being quoted.) Since immediately after the bit you quote OP said: it doesn't seem to me necessary to inform them that "determines" implies causation or that factor analysis doesn't identify what causes what. (Entirely unfairly, I'm amused by the fact that you write '"Determines" is a causal word' and then in the very next sentence use the word "determine" in a non-causal way. Unfairly because all that's happening is that "determine" means multiple things, and OP's usage does indeed seem to have been causal. But it may be worth noting that if the model were perfect, then indeed g would "determine how good we are at thinking" in the same sense as that in which factor analysis doesn't "determine causality for you" but one might have imagined it doing so.)
👍 Could you please elaborate on how it relates to the Bayesian interpretation of test-set performance?

Should be doing stuff like this, if you want to understand effects of masks: (this really is preliminary, e.g. they have not yet uploaded a newer version that incorporates peer review suggestions).


Can't do stuff in the second paper without worrying about stuff in the first (unless your model is very simple).

Pretty interesting.

Since you are interested in policies that operate along some paths only, you might find these of interest:

We have some recent stuff on generalizing MDPs to have a causal model inside every state ('path dependent structural equation models', to appear in UAI this year).

Thanks Ilya for those links, in particular the second one looks quite relevant to something we’ve been working on in a rather different context (that's the benefit of speaking the same language!) We would also be curious to see a draft of the MDP-generalization once you have something ready to share!

3: No, that will never work with DL by itself (e.g. as fancy regressions).

4: No, that will never work with DL by itself (e.g. as fancy regressions).

5: I don't understand this question, but people already use DL for RL, so the "support" part is already true.  If the question is asking whether DL can substitute for doing interventions, then the answer is a very qualified "yes," but the secret sauce isn't DL, it's other things (e.g. causal inference) that use DL as a subroutine.


The problem is, most folks who aren't doing data science for a living them...

If there is, I don’t know it.

There's a ton of work on general sensitivity analysis in the semi-parametric stats literature.

If there is really both reverse causation and regular causation between Xr and Y, you have a cycle, and you have to explain what the semantics of that cycle are (not a deal breaker, but not so simple to do.  For example if you think the cycle really represents mutual causation over time, what you really should do is unroll your causal diagram so it's a DAG over time, and redo the problem there).

You might be interested in this paper that splits the outcome rather than the treatment (although I don't really endorse...

I agree, but I think this is much more dependent on the actual problem that one is trying to solve. There's tons of assumptions and technical details that different approaches use, but I'm trying to sketch out some overview that abstracts over these and gets at the heart of the matter. (There might also be cases where there is believed to be a unidirectional causal relationship, but the direction isn't known.) Indeed that is the big difficulty. Considering how often people use these methods in social science, it seems like there is some general belief that one can have Xc be unconfounded with Y, but this is rarely proven and seems often barely even justified. It seems to me that the general approach is to appeal to parsimony and assume that if you can't think of any major confounders, then they probably don't exist. This obviously doesn't work well. I think people find it hard to get an intuition for how poorly it works, and I personally found that it made much more sense to me when I framed it in terms of the "Know your Xc!" point; the goal shouldn't be to think of possible confounders, but instead to think of possible nonconfounded variance. I also have an additional blog post in the works arguing that parsimony is empirically testable and usually wrong, but it will be some time before I post this.
I don't think that there's evidence that it's for a good reason. The Kefauver-Harris Drug Amendments of 1962 coincide with a drop in the rate of life-span increase.  That's a decade before most of the Great Stagnation metrics got problems.
I just read Christian's post and I don't see what 'comes close to health fraud.' Please explain?

Please name those reasons instead of vaguely alluding to them. The above post and many other writings on LessWrong address some of the most commonly made arguments for why there is indeed not a particularly good reason for the incredibly slow process of the FDA trial system. I would be glad to hear more arguments, but am not very excited about vague allusions without any substance.

"That’s the test. Would you put it in your arm rather than do nothing? And if the answer here is no, then, please, show your work."

Seems to be an odd position to take to shift the burden of proof onto the vaccine taker rather than the scientist.


I think a lot of people, you included, are way overconfident on how transmissible B.1.1.7 is.

By overconfident, do you mean that it's likely more transmissible or likely less transmissible?

90% of the work ought to go into figuring out what fairness measure you want and why.  Not so easy.  Also not really a "math problem."  Most ML papers on fairness just solve math problems.

A whole paper, huh.


I am contesting the whole Extremely Online Lesswrong Way<tm> of engaging with the world whereby people post a lot and pontificate, rather than spending all day reading actual literature, or doing actual work.

which literature do you recommend?

"Unless you’d put someone vulnerable at risk, why are you letting another day of your life go by not living it to its fullest? "

As soon as you start advocating behavior changes based on associational evidence you leave the path of wisdom.


You sure seem to have a lot of opinions about statisticians being conservative about making claims without bothering to read up on the relevant history and why this conservatism might have developed in the field.

Zvi has repeatedly cited a paper arguing that the FDA kills more people by preventing effective treatments than it saves by preventing bad treatments. Not having followed that link personally, the results suggest that pub health statisticians are miscalibrated in expectation. One explanation for why pub health statisticians are miscalibrated in a causing-death-by-inaction direction is that they are punished for deaths caused by action but not deaths caused by inaction []. In this model, conservatism (aka miscalibration toward inaction) is a result of the lived and publicized experiences of people in the field. This seems a great explanation for both the experts' norms and the paper results. Are you contesting that statisticians are miscalibrated in expectation in the utility they cause? I think the “Bailey to your Motte” is that people are bad at predicting who they might infect, so this advice could lead to greater deaths. I think Zvi could have phrased it more carefully. But the broader point needed emphasis, that we are losing so much for something we could fix. And that fix might not be so hard. That point is more important than quibbling over one darn sentence.

You can read Halpern's stuff if you want an axiomatization of something like the responses to the do-operator.

Or you can try to understand the relationship of do() and counterfactual random variables, and try to formulate causality as a missing data problem (whereby a full data distribution on counterfactuals and an observed data distribution on factuals are related via a coarsening process).
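
A small sketch of the missing-data view, under assumptions that are mine rather than the thread's: binary potential outcomes Y(0) and Y(1) with made-up success rates, and a fair coin deciding treatment. The "full data" holds both potential outcomes for every unit; the coarsening step reveals only the one that matches the assigned treatment.

```python
import random

rng = random.Random(0)
n = 20000

# Full data: both potential outcomes (Y(0), Y(1)) for every unit.
# In practice these are never jointly observed; the rates 0.3 and 0.5 are made up.
full = [(int(rng.random() < 0.3), int(rng.random() < 0.5)) for _ in range(n)]

# Coarsening: a randomized treatment A reveals Y = Y(A) and hides the other outcome.
observed = []
for y0, y1 in full:
    a = int(rng.random() < 0.5)
    observed.append((a, y1 if a == 1 else y0))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Average treatment effect computed from the (normally unavailable) full data.
true_ate = mean(y1 for _, y1 in full) - mean(y0 for y0, _ in full)

# Estimate from observed data only: difference in means between treatment arms.
est_ate = mean(y for a, y in observed if a == 1) - mean(y for a, y in observed if a == 0)
```

Because A is assigned independently of the potential outcomes here, the data are "missing completely at random" and the two numbers should agree up to sampling noise. Confounding corresponds to a coarsening process that depends on the hidden outcomes, which is where the real work starts.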

Ah, thanks; that looks pretty relevant. I'll try to read it in the next day or so.

How is this different from just a regular imperative programming language with imperative assignment?

Causal models are just programs (with random inputs, and certain other restrictions if you want to be able to represent them as DAGs). The do() operator is just imperative assignment.

It's mostly the same as a regular imperative programming language - indeed, that's largely the point of the post. The do() operator isn't quite just imperative assignment, though; it has the wrong type-signature for that. It's more like an operator which creates a subclass on-the-fly, by overriding the getters for a few fields.
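
To make the "subclass that overrides getters" picture concrete, here is a toy sketch. The rain/sprinkler/wet model, the class `Model`, and the helper `do` are all hypothetical illustrations, not code from the post:

```python
import random

class Model:
    """Toy structural model: rain -> sprinkler, and both -> wet grass."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self._cache = {}

    def _memo(self, name, compute):
        # Each variable is sampled at most once per model instance.
        if name not in self._cache:
            self._cache[name] = compute()
        return self._cache[name]

    def rain(self):
        return self._memo("rain", lambda: self.rng.random() < 0.3)

    def sprinkler(self):
        # The sprinkler only ever runs on dry days.
        return self._memo(
            "sprinkler", lambda: (not self.rain()) and self.rng.random() < 0.5
        )

    def wet(self):
        return self._memo("wet", lambda: self.rain() or self.sprinkler())

def do(model_cls, **fixed):
    """Build a subclass whose getters for the named variables return constants."""
    overrides = {name: (lambda self, v=value: v) for name, value in fixed.items()}
    return type("Do" + model_cls.__name__, (model_cls,), overrides)

# Intervene on the sprinkler; the mechanism for rain is left untouched.
Intervened = do(Model, sprinkler=True)
samples = [Intervened(seed) for seed in range(1000)]
all_wet = all(m.wet() for m in samples)
rain_rate = sum(m.rain() for m in samples) / len(samples)
```

Note that `do` takes a model class and returns a new class rather than mutating a value in place, which is one way to read the "wrong type-signature for imperative assignment" remark: downstream variables (`wet`) see the override, while upstream ones (`rain`) keep their original mechanism.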

Here are directions:

I think the sorts of people I want to see this blog website will know what to do with the information on it. <- Spread this to your biomedical engineering friends, or any hobbyist who can build things. We need to ramp up ventilator capacity, now. Even if they are 80% as good as a high tech one, but cheap to make, they will save lives.

There's a long history of designing and making devices like these for the Third World places that need them. We will need these soon, here and everywhere.

We want to include more things exactly like this in the DB, thanks for posting. Is there a particular on ramp page we can link to? I worry if people go to the front page they will bounce off.

Some references to lesswrong, and value alignment there.

David Abel has provided a fairly nice summary set of notes: []

anyone going to the AAAI ethics/safety conf?

One of my favorite examples of a smart person being confused about something is ET Jaynes being confused about Bell inequalities.

Smart people are confused all the time, even (perhaps especially) in their area.

Critical Rationalists think that E. T. Jaynes is confused about a lot of things. There has been discussion about this on the Fallible Ideas list.

You are really confused about statistics and learning, and possibly also about formal languages in theoretical CS. I neither want nor have time to get into this with you, just wanted to point this out for your potential benefit.

I am summarizing a view shared by other Critical Rationalists, including Deutsch. Do you think they are confused too?
Thank you!

Dear Christian, please don't pull rank on my behalf. I don't think this is productive to do, and I don't want to bring anyone else into this.

well, using philosophy i did that hard part and figured out which ones are good.

Who are you talking to? To the audience? To the fourth wall?

Surely not to me, I have no sway here.

Well, this comes back to the problem of LW Paths Forward []. curi has made himself publicly available for discussion, by anyone. Yudkowsky not so much. So what to do?

Your sockpuppet: "There is a shortage of good philosophers."

Me: "Here is a good philosophy book."

You: "That's not philosophy."

Also you: "How is Ayn Rand so right about everything."

Also you: "I don't like mainstream stuff."

Also you: "Have you heard that I exchanged some correspondence with DAVID DEUTSCH!?"

Also you: "What if you are, hypothetically, wrong? What if you are, hypothetically, wrong? What if you are, hypothetically, wrong?" x1000

Part of rationality is properly dealing with ...

curi has given an excellent response to this. I would like to add that I think Yudkowsky should reach out to curi. He shares curi's view about the state of the world and the urgency to fix things, but curi has a deeper understanding. With curi, Yudkowsky would not be the smartest person in the room and that will be valuable for his intellectual development.
I don't have a sock puppet here. I don't even know who Fallibilist is. (Clearly it's one of my fans who is familiar with some stuff I've written elsewhere. I guess you'll blame me for having this fan because you think his posts suck. But I mostly like them, and you don't want to seriously debate their merits, and neither of us thinks such a debate is the best way to proceed anyway, so whatever, let's not fight over it.) People can't be patched like computer code. They have to do ~90% of the work themselves. If they don't want to change, I can't change them. If they don't want to learn, I can't learn for them and stuff it into their head. You can't force a mind, nor do someone else's thinking for them. So I can and do try to make better educational resources to be more helpful, but unless I find someone who honestly wants to learn, it doesn't really matter. (This is implied by CR and also, independently, by Objectivism. I don't know if you'll deny it or not.) I believe you are incorrect about my lack of scale and context, and you're unfamiliar with (and ridiculing) my intellectual history. I believe you wanted to say that claim, but don't want to argue it or try to actually persuade me of it. As you can imagine, I find merely asserting it just as persuasive and helpful as the last ten times someone told me this (not persuasive, not helpful). Let me know if I'm mistaken about this. I was generally the smartest person in the room during school, but also lacked perspective and context back then. But I knew that. I used to assume there were tons of people smarter than me (and smarter than my teachers), in the larger intellectual community, somewhere. I was very disappointed to spend many years trying to find them and discovering how few there are (an experience largely shared by every thinker I admire, most of whom are unfortunately dead). My current attitude, which you find arrogant, is a change which took many years and which I heavily resisted. When I was more igno