I thought about Agency Q4 (counterargument to Pearl) recently, but couldn't come up with anything convincing. Does anyone have a strong view/argument here?

Just a quick logistical thing: do you have a better source for Pearl making
that argument? The current Quanta Magazine link isn't totally satisfactory, but
I'm having trouble replacing it.

I don't see any claim that it's impossible for neural nets to handle causality.
Pearl's complaining about AI researchers being uninterested in that goal.
I suspect that neural nets are better than any other approach at handling the
hard parts of causal modeling: distinguishing plausible causal pathways from
ridiculous ones.
Neural nets currently look poor at causal modeling for roughly the same reason
that High Modernist approaches weren't willing to touch causal claims: without a
world model that's comprehensive enough to approximate common sense, causal
modeling won't come close to human-level performance.
A participant in Moderna's vaccine trial was struck by lightning
[https://www.ladbible.com/news/news-moderna-volunteer-struck-by-lightning-month-after-receiving-vaccine-20201219].
How much evidence is that for our concern that the vaccine is risky?
If I try to follow the High Modernist approach, I think it says something like
we should either be uncertain enough to avoid any conclusion, or we should treat
the lightning strike as evidence of vaccine risks.
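To make the question concrete, here is a minimal Python sketch of the Bayesian
bookkeeping. The prior odds and strike probabilities are made-up illustrative
numbers, not data from the trial. The evidence an observation carries is the
likelihood ratio P(strike | risky) / P(strike | safe); a common-sense world model
says vaccines don't change lightning-strike rates, so that ratio is about 1.

```python
def posterior_odds(prior_odds: float, p_obs_if_risky: float, p_obs_if_safe: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    likelihood_ratio = p_obs_if_risky / p_obs_if_safe
    return prior_odds * likelihood_ratio


def odds_to_prob(odds: float) -> float:
    """Convert odds in favor into a probability."""
    return odds / (1.0 + odds)


# Hypothetical numbers, chosen only for illustration.
PRIOR_ODDS_RISKY = 0.05 / 0.95   # assumed 5% prior that the vaccine is risky
P_STRIKE_BASELINE = 1e-6         # assumed probability of a lightning strike

# Common-sense model: the vaccine has no effect on lightning strikes,
# so the likelihood ratio is exactly 1.
common_sense = posterior_odds(PRIOR_ODDS_RISKY, P_STRIKE_BASELINE, P_STRIKE_BASELINE)

# A model without that world knowledge might count any bad event after
# vaccination as weak evidence, e.g. a strike being twice as likely if risky.
naive = posterior_odds(PRIOR_ODDS_RISKY, 2 * P_STRIKE_BASELINE, P_STRIKE_BASELINE)

print(f"common sense: {odds_to_prob(common_sense):.3f}")
print(f"naive:        {odds_to_prob(naive):.3f}")
```

Under these assumed numbers the common-sense model leaves the probability at 5%,
while the naive model nearly doubles it to about 9.5%, purely because it lacks
the background knowledge that makes the likelihood ratio 1.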
As far as I can tell, AI approaches other than neural nets perform like
scientists who blindly follow a High Modernist approach (assuming the
programmers didn't think to encode common sense about whether vaccines affect
behavior in a lightning-strike-seeking way).
GPT-3, by contrast, has picked up some hints about human beliefs that make it
likely to guess a little better than the High Modernist.
GPT-3 wasn't designed to be good at causality. It's somewhat close to being a
passive observer. If I were designing a neural net to handle causality, I'd give
it an ability to influence an environment that resembles what an infant has.
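The gap between a passive observer and an agent that can act on its environment
can be illustrated with a toy structural causal model. In this hypothetical
Python sketch (the variables and probabilities are invented for illustration), a
hidden confounder Z drives both X and Y, and X has no causal effect on Y.
Observational data show a strong association between X and Y, but setting X by
intervention (Pearl's do(X)) shows that Y doesn't depend on X.

```python
import random
from typing import List, Optional, Tuple


def sample(rng: random.Random, do_x: Optional[int] = None) -> Tuple[int, int]:
    """Draw (x, y) from a toy SCM with edges Z -> X and Z -> Y, and no X -> Y.

    If do_x is given, X is set by intervention instead of by its usual
    mechanism, which cuts the Z -> X edge.
    """
    z = rng.random() < 0.5
    if do_x is None:
        x = int(rng.random() < (0.9 if z else 0.1))  # X mostly copies Z
    else:
        x = do_x
    y = int(rng.random() < (0.8 if z else 0.2))      # Y depends only on Z
    return x, y


def p_y_given_x(samples: List[Tuple[int, int]], x_value: int) -> float:
    """Empirical P(Y=1 | X=x_value)."""
    ys = [y for x, y in samples if x == x_value]
    return sum(ys) / len(ys)


rng = random.Random(0)
N = 100_000

# Passive observation: condition on whatever X happened to be.
observed = [sample(rng) for _ in range(N)]
obs_gap = p_y_given_x(observed, 1) - p_y_given_x(observed, 0)

# Intervention: set X ourselves, alternating between 0 and 1.
intervened = [sample(rng, do_x=i % 2) for i in range(N)]
do_gap = p_y_given_x(intervened, 1) - p_y_given_x(intervened, 0)

print(f"P(Y|X=1) - P(Y|X=0)         = {obs_gap:.2f}")
print(f"P(Y|do(X=1)) - P(Y|do(X=0)) = {do_gap:.2f}")
```

With these assumed parameters the observational gap is about 0.48 (0.74 versus
0.26), while the interventional gap is about 0. A learner that only watches sees
the first number; one that can act on its environment can measure the second.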
If there are any systems today that are good at handling causality, I'd guess
they're robocar systems. What I've read about those suggests they're limited by
the difficulty of common sense, not causality.
I expect that when causal modeling becomes an important aspect of what AI needs
for fu

I like the idea a lot.

However, I really need simple systems in my work routine. Things like "hitting a stopwatch, dividing by three, and carrying over previous rest time" already feel like a lot. Even though it's just a few seconds, I prefer these systems to take as little energy as possible to maintain.

What I had in mind was a simple shell script: just start it at the beginning of work, and hit any key whenever I switch from work to rest or vice versa. It automatically keeps track of my break times.
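Here is a minimal sketch of that toggle idea, written in Python rather than shell so it also runs outside Linux. It assumes the "divide by three" rule mentioned above: each minute of work earns a third of a minute of rest, and unused rest carries over (overdrawn rest becomes a negative balance). The class and function names are hypothetical, and because portable single-keypress input needs extra libraries, this version switches on Enter.

```python
import time
from typing import Callable


class BreakTracker:
    """Tracks earned rest: each second of work earns 1/3 s of rest (assumed rule).

    The balance carries over between breaks; resting longer than earned
    makes it negative, which later work pays back.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.working = True
        self.balance = 0.0          # earned rest, in seconds
        self.last_switch = clock()

    def _settle(self) -> None:
        """Credit (work) or debit (rest) the time since the last switch."""
        now = self.clock()
        elapsed = now - self.last_switch
        self.last_switch = now
        if self.working:
            self.balance += elapsed / 3
        else:
            self.balance -= elapsed

    def toggle(self) -> float:
        """Switch between work and rest; return the current rest balance."""
        self._settle()
        self.working = not self.working
        return self.balance


def main() -> None:
    tracker = BreakTracker()
    print("Working. Press Enter to switch; Ctrl-C to quit.")
    try:
        while True:
            input()
            balance = tracker.toggle()
            state = "Working" if tracker.working else "Resting"
            print(f"{state}. Rest available: {balance / 60:.1f} min")
    except (KeyboardInterrupt, EOFError):
        print("\nDone.")


if __name__ == "__main__":
    main()
```

For example, 90 minutes of work followed by a 20-minute break leaves 10 minutes of rest in the bank for the next break. Passing a fake `clock` makes the logic easy to test without waiting.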

I don't have Linux at home, but what I tried...

Great, thanks for this. Indeed, I was thinking the whole thing could be handled
neatly by an app or an Alexa skill.

Thanks, I finally got it. What I just now fully understood is that the final inequality holds with high probability (i.e., as you say, is the data), while the learning bound or loss reduction is given for .

Thanks, I was wondering what people referred to when mentioning PAC-Bayes bounds. I am still a bit confused. Could you explain how and depend on (if they do) and how to interpret the final inequality in this light? Particularly I am wondering because the bound seems to be best when . Minor comment: I think ?

The term π is meant to be a posterior distribution after seeing data. If you
have a good prior, you could take π=π0. However, note that L(π) could then be
high. You want a trade-off between the cost of updating the prior and the loss
reduction. For example, say we have a neural network. Then our prior would be
the initialization distribution and the posterior would be the distribution of
outputs from SGD.
(Btw, thanks for the correction.)
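To see the trade-off numerically, here is a small Python sketch using one common
form of the PAC-Bayes bound (the Maurer/Seeger form for losses in [0, 1], which
may differ from the exact bound discussed here): with probability at least
1 − δ over n training samples, for all posteriors π simultaneously,
L(π) ≤ L̂(π) + sqrt((KL(π‖π0) + ln(2√n/δ)) / (2n)), where L̂ is the empirical
loss and KL(π‖π0) is the cost of moving away from the prior. Following the
neural-network example, π0 and π are taken to be diagonal Gaussians over the
weights; the weight count, means, variances, and empirical losses are all
hypothetical.

```python
import math
from typing import Sequence


def kl_diag_gaussians(mu_q: Sequence[float], var_q: Sequence[float],
                      mu_p: Sequence[float], var_p: Sequence[float]) -> float:
    """KL(Q || P) for diagonal Gaussians Q = N(mu_q, var_q), P = N(mu_p, var_p)."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


def pac_bayes_bound(emp_loss: float, kl: float, n: int, delta: float) -> float:
    """Upper bound on the true loss L(pi), holding with prob. >= 1 - delta."""
    return emp_loss + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))


d, n, delta = 1000, 10_000, 0.05           # hypothetical sizes
prior_mu, prior_var = [0.0] * d, [1.0] * d  # pi0: the initialization distribution

# (name, posterior mean, posterior variance, empirical loss): all hypothetical
candidates = [
    ("pi = pi0 (untrained)", [0.0] * d, [1.0] * d, 0.50),
    ("small update",         [0.1] * d, [1.0] * d, 0.10),
    ("large, sharp update",  [1.0] * d, [0.01] * d, 0.00),
]

bounds = {}
for name, mu, var, emp in candidates:
    kl = kl_diag_gaussians(mu, var, prior_mu, prior_var)
    bounds[name] = pac_bayes_bound(emp, kl, n, delta)
    print(f"{name:22s} KL={kl:8.1f}  bound={bounds[name]:.3f}")
```

With these made-up numbers the small update gives the tightest bound (about
0.126), beating both π=π0 (about 0.520: zero KL but a high empirical loss) and
the sharp update (about 0.340: zero empirical loss but a KL of roughly 2300).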

The main thing that caught my attention was that random variables are often assumed to be independent. I am not sure if it is already included, but if one wants to allow for adding, multiplying, taking mixtures, etc., of random variables that are not independent, one way to do it is via copulas. For sampling-based methods, working with copulas is a way of incorporating a moderate variety of possible dependence structures with little additional computational cost.

The basic idea is to take a given dependence structure of some tractable multivariate random...
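A minimal sketch of one such construction, a Gaussian copula, in Python using only the standard library. This is an illustration, not how any particular software does it: draw correlated standard normals, map them to uniforms with the normal CDF (the copula step, which keeps the dependence and discards the marginals), then push each uniform through the inverse CDF of whatever marginal you want. The correlation ρ = 0.8 and the exponential marginals are arbitrary choices for illustration.

```python
import math
import random
from typing import List, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exponential_ppf(u: float, rate: float) -> float:
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log1p(-u) / rate


def gaussian_copula_pair(rng: random.Random, rho: float) -> Tuple[float, float]:
    """Two uniforms on (0, 1) with the dependence of a bivariate normal (corr rho)."""
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z1 = g1
    z2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * g2   # 2x2 Cholesky factor
    return normal_cdf(z1), normal_cdf(z2)


def correlation(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(42)
N = 50_000
a, b = [], []
for _ in range(N):
    u1, u2 = gaussian_copula_pair(rng, rho=0.8)
    a.append(exponential_ppf(u1, rate=1.0))   # marginal: Exp(1), mean 1, var 1
    b.append(exponential_ppf(u2, rate=0.5))   # marginal: Exp(0.5), mean 2, var 4

total = [x + y for x, y in zip(a, b)]         # sum of two dependent variables
mean_total = sum(total) / N
var_total = sum((t - mean_total) ** 2 for t in total) / (N - 1)

print(f"mean a = {sum(a) / N:.2f}, mean b = {sum(b) / N:.2f}")
print(f"corr(a, b) = {correlation(a, b):.2f}")
print(f"mean(a + b) = {mean_total:.2f}, var(a + b) = {var_total:.2f}")
```

Linearity keeps the mean of a + b near 3 regardless of dependence, but its variance comes out noticeably above the value 1 + 4 = 5 that an independence assumption would give. Swapping in a different copula (or a different ρ) changes the dependence structure without touching the marginals.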

Thanks for the suggestion.
My background is more in engineering than probability, so I have been educating
myself on probability and probability-related software for this. I've looked
into copulas a little but wasn't sure how tractable they would be. I'll
investigate further.

Another intuition I often found useful: KL-divergence behaves more like the square of a metric than a metric.

The clearest indicator of this is that KL-divergence satisfies a kind of Pythagorean theorem established in a paper by Csiszár (1975), see https://www.jstor.org/stable/2959270#metadata_info_tab_contents . The intuition is exactly the same as for the Euclidean case: If we project a point A onto a convex set S (say the projection is B), and if C is another point in the set S, then the standard Pythagorean theorem would tell us that the angle of the tr...
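A small numerical check of the KL version in Python, under assumptions chosen to make it exact: S is a linear family {Q : E_Q[f] = c} on a six-point space. For such a family the I-projection B of A onto S is an exponential tilt of A, and the Pythagorean relation KL(C‖A) ≥ KL(C‖B) + KL(B‖A) holds with equality for every C in S. The distributions, f, and c below are arbitrary illustrative choices; for a general convex S one only gets the inequality, the analogue of the Euclidean fact that the angle at B is at least 90 degrees.

```python
import math
from typing import List


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tilt(a: List[float], f: List[float], theta: float) -> List[float]:
    """Exponential tilt of a: b_i proportional to a_i * exp(theta * f_i)."""
    w = [ai * math.exp(theta * fi) for ai, fi in zip(a, f)]
    z = sum(w)
    return [wi / z for wi in w]


def mean(p: List[float], f: List[float]) -> float:
    return sum(pi * fi for pi, fi in zip(p, f))


def i_projection(a: List[float], f: List[float], c: float) -> List[float]:
    """Project a onto S = {Q : E_Q[f] = c} by bisecting on the tilt parameter.

    E_{tilt(a, theta)}[f] is increasing in theta, so bisection works
    as long as c lies strictly between min(f) and max(f).
    """
    lo, hi = -50.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(tilt(a, f, mid), f) < c:
            lo = mid
        else:
            hi = mid
    return tilt(a, f, (lo + hi) / 2)


f = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]          # faces of a die
A = [0.3, 0.25, 0.2, 0.1, 0.1, 0.05]        # the point being projected (mean 2.6)
c = 4.0                                      # S = {Q : E_Q[f] = 4}
B = i_projection(A, f, c)                    # projection of A onto S
C = [0.0, 0.0, 0.5, 0.0, 0.5, 0.0]          # another point in S (mean 4)

lhs = kl(C, A)
rhs = kl(C, B) + kl(B, A)
print(f"KL(C||A)            = {lhs:.6f}")
print(f"KL(C||B) + KL(B||A) = {rhs:.6f}")
```

The two printed values agree up to floating-point error: KL adds up along the two legs of the "right triangle" the way squared Euclidean distances do, not the way distances themselves do.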