To me, the fact that I have been told to assume that I believe the Predictor seems extremely relevant. If we assume that I am able to believe that, then it would likely be the single most important fact that I had ever observed, and to say that it would cause a *significant* update on my beliefs regarding causality would be an understatement. On the basis that I would have strong reason to believe that causality could flow backwards, I would likely choose the one box.

If you tell me that somehow, I still also believe that causality always flows forward with r...

010y

The standard formulation to sidestep that is that the Predictor treats choosing
a mixed strategy as two-boxing.

Just to clarify, the soylent we're talking about here is not the original recipe. It is a more frugal version made from soybeans, rice and oil.

The whole point of dimensional analysis as a method of error checking is that fudging the units *doesn't* work. If you have to use an arbitrary constant with no justification besides "making the units check out", then that is a very bad sign.

If I say "you can measure speed by dividing force by area", and you point out that that gives you a unit of pressure rather than speed, then I can't just accuse you of nitpicking and say "well obviously you have to multiply by a constant of 1 m²s/kg". You wouldn't have to tell me why that operation isn't allowed. I would have to explain why it's justified.
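To make the force-over-area example concrete, here is a minimal sketch of unit bookkeeping. The `Dim` class and the constant names are my own illustrative assumptions, not any particular library's API; each quantity is represented only by its exponents of kg, m and s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Exponents of the SI base units kg, m, s (hypothetical helper)."""
    kg: int = 0
    m: int = 0
    s: int = 0

    def __mul__(self, other):
        # Multiplying quantities adds the unit exponents.
        return Dim(self.kg + other.kg, self.m + other.m, self.s + other.s)

    def __truediv__(self, other):
        # Dividing quantities subtracts the unit exponents.
        return Dim(self.kg - other.kg, self.m - other.m, self.s - other.s)


MASS, LENGTH, TIME = Dim(kg=1), Dim(m=1), Dim(s=1)
SPEED = LENGTH / TIME                  # m/s
FORCE = MASS * LENGTH / (TIME * TIME)  # kg*m/s^2
AREA = LENGTH * LENGTH                 # m^2
PRESSURE = FORCE / AREA                # kg/(m*s^2)

# The constant you would have to invent to turn force/area into a speed.
FUDGE = SPEED / PRESSURE               # m^2*s/kg
```

Here `FORCE / AREA == SPEED` is false, and `FUDGE` comes out as `Dim(kg=-1, m=2, s=1)`, i.e. the "1 m²s/kg" constant from the example. The check doesn't say the constant is forbidden; it just shows exactly what would need a physical justification.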

Yes, sort of, but a) a linear classifier is not a Turing-complete model of computation, and b) there is a clear resemblance that can be seen by merely glancing at the equations.

It's interesting to me that the proper linear model example is essentially a stripped down version of a very simple neural network with a linear activation function.
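A minimal sketch of the resemblance, assuming the usual textbook form of both objects. The function names and the example numbers are made up for illustration and are not taken from the post.

```python
def linear_model(features, weights, bias=0.0):
    """Weighted sum of the inputs plus an intercept."""
    return bias + sum(w * x for w, x in zip(weights, features))


def neuron(inputs, weights, bias=0.0, activation=lambda z: z):
    """A single artificial neuron; the default activation is the identity."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Illustrative values only.
weights = [0.5, -1.0, 2.0]
features = [1.0, 2.0, 3.0]
```

With the identity activation the two functions compute the same number for any inputs; they only start to differ once `activation` is replaced by something nonlinear.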

012y

I would argue that neurons, neural nets, SPRs, and everyone else doing linear
regression use those techniques because it's the simplest way to aggregate data.

112y

Is that really true? Couldn't one say that of just about any Turing-complete (or
less) model of computation?
'Oh, it's interesting that they are really just a simple unary fixed-length
lambda-calculus function with constant-value parameters.'
'Oh, it's interesting that they are really just restricted petri-nets with
bounded branching factors.'
'Oh, it's interesting that these are modelable by finite automata.'
etc. (Plausible-sounding gobbledygook included to make the point.)

I salute your ability to troll all of these groups in a post about what kind of groups are easy to troll. I *almost* started to argue on some of these points before I saw your game.

312y

I don't troll. I never have. I will attempt to reasonably argue a point of view,
but (as an example) I will not go to www.ar15.com and start a post containing my
(slightly negative) opinion of it, or discussing the devices I prefer to it
(Sig 55x, AK etc.).
However I have been on the intarwebs since 1993 and have engaged in some fairly
vigorous debates. I have also noticed who responds and why. This is not building
interplanetary transport devices.
I will admit I should have put an etc. on the end of points 2 and 3.
In retrospect I should have put Linux Users in there as well.

012y

He almost got me, too.

Surely you aren't implying that a desire to prolong one's lifespan can only be motivated by fear.

I think it was on This American Life that I heard the guy's story. They even contacted a physicist to look at his "theory", who tried to explain to him that the units didn't work out. The guy's response was "OK, but besides that …"

He really seemed to think that this was just a minor nitpick that scientists were using as an excuse to dismiss him.

111y

Why isn't it a minor nitpick? I mean, we use dimensioned constants in other
areas; why, in principle, couldn't the equation be E=mc (1 m/s)? If that was the
only objection, and the theory made better predictions (which, obviously, it
didn't, but bear with me), then I don't see any reason not to adopt it. Given
that, I'm not sure why it should be a *significant* objection.
Edited to add: Although I suppose that would privilege the meter and second
(actually, the ratio between them) in a universal law, which would be very
surprising. Just saying that there are trivial ways you can make the units check
out, without tossing out the theory. Likewise, of course, the fact that the
units do check out shouldn't be taken too strongly [http://xkcd.com/687] in a
theory's favor. Not that anyone here hadn't seen the XKCD, but I still need to
link it, lest I lose my nerd license.

This raises a good point, but there are circumstances where the "someone would have noticed" argument is useful. Specifically, if the hypothesis is readily testable, if the consequences, if true, would be difficult to ignore, and if the hypothesis is, in fact, regularly tested by many of the same people who have told you that the hypothesis is false, then "somebody would have noticed" is reasonable evidence.

For example, "there is no God who reliably answers prayers" is a testable hypothesis, but it is easy for the religious to...

713y

That guy needed to be taught basic dimensional analysis, apparently. E=mc has
units of kg-m/s, which is the unit of momentum, not energy.
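Spelled out in SI base units (only the standard definitions of momentum and energy are assumed here):

```latex
\[
[mc] = \mathrm{kg}\cdot\frac{\mathrm{m}}{\mathrm{s}} = \mathrm{kg\,m\,s^{-1}}
\quad\text{(momentum)},
\qquad
[E] = \mathrm{kg\,m^{2}\,s^{-2}},
\qquad
\frac{[E]}{[mc]} = \mathrm{m\,s^{-1}}.
\]
```

So the two sides differ by exactly one factor with units of speed, which is the factor the "(1 m/s)" patch would have to supply.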

LW is pretty much the only site I visit where I feel significantly intimidated about commenting. I've left a couple of comments, but I seem to be more self-conscious about exposing my ignorance here than I am elsewhere – probably because I know that the chances of such ignorance being noticed are higher. It occurs to me that this is completely backwards and ridiculous, but there you have it.

Consider not the abstract situation of B = dA/dt, but the concrete example of the signal generator. It would be a perverse reading of the word "cause" to say that the voltage does not cause the current. You can make the current be anything you like by suitably manipulating the voltage.

But you can make a similar statement for just about *any* situation where B = dA/dt, so I think it's useful to talk about the abstract case.

For example, you can make a car's velocity anything you like by suitably manipulating its position. Would you then say that t...

314y

I think intervention is the key idea missing from the above discussion of which
of the derivative function and the integrated function is the cause and
which is the effect. In the signal generator example, voltage is a cause of
current because we can intervene directly on the voltage. In the car example,
acceleration is a cause of velocity because we can intervene directly on
acceleration. This is not too helpful on its own, but maybe it will point the
discussion in a useful direction.

That is what A and B are: a randomly wandering variable A and its rate of change B.

Maybe I'm not quite understanding, but it seems to me that your argument relies on a rather broad definition of "causality". B may be dependent on A, but to say that A "causes" B seems to ignore some important connotations of the concept.

I think what bugs me about it is that "causality" implies a directness of the dependency between the two events. At first glance, this example *seems* like a direct relationship. But I would argue that B is not...

514y

Very true. Once again, I'm going to have to recommend, in the context of a
Richard Kennaway post, the use of more precise concepts. Instead of
"correlation", we should be talking about "mutual information", and it would be
helpful if we used Judea Pearl's definition of causality.
Mutual information between two variables means (among many equivalent
definitions) how much you learn about one variable by learning the other.
Statistical correlation is one way that there can be mutual information between
two variables, but not the only way.
So, like what JGWeissman said, there can be mutual information between the two
series even in the absence of a statistical correlation that directly compares
time t in one to time t in the other. For example, there is mutual information
between sin(t) and cos(t), even though d(sin(t))/dt = cos(t), and even though
they're simultaneously uncorrelated (i.e. uncorrelated when comparing time t to
time t). The reason there is mutual information is that if you know sin(t), a
simple time-shift tells you cos(t).
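A quick numerical sketch of the sin/cos point, assuming uniform samples over exactly one period. The helper `pearson` and the sample count are my own choices for illustration.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


N = 1000  # assumed number of samples over one full period
ts = [2 * math.pi * i / N for i in range(N)]

sin_t = [math.sin(t) for t in ts]
cos_t = [math.cos(t) for t in ts]
sin_shifted = [math.sin(t + math.pi / 2) for t in ts]  # quarter-period shift

same_time = pearson(sin_t, cos_t)           # compares time t with time t
after_shift = pearson(sin_shifted, cos_t)   # compares t + pi/2 with t
```

`same_time` is zero up to rounding error, while `after_shift` is one up to rounding error: no same-time correlation, yet one series determines the other completely once you allow the shift.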
As for causation, the Pearl definition
[http://lesswrong.com/lw/qr/timeless_causality/] is (and my apologies; I may not
get this right) that:
"A causes B iff, after learning A, nothing else at the time of A or B gives you
information about B. (and A is the minimal such set for which this is true)"
In other words, A causes B iff A is the minimal set given which B is
conditionally independent of everything else at the time of A or B.
So, anyone want to rephrase Kennaway's post with those definitions?

014y

Consider not the abstract situation of B = dA/dt, but the concrete example of
the signal generator. It would be a perverse reading of the word "cause" to say
that the voltage does not cause the current. You can make the current be
anything you like by suitably manipulating the voltage.
But let this not degenerate into an argument about the "real" meaning of
"cause". Consider instead what is being said about the systems studied by the
authors referenced in the post.
Lacerda, Spirtes, et al. [http://www.optimizelife.com/cyclic-discovery.pdf] do
not use your usage. They talk about time series equations in which the current
state of each variable depends on the previous states of some variables, but
still they draw causal graphs which do not have a node for every time instant of
every variable, but a node for every variable. When x(i+1) = b y(i) + c z(i),
they talk about y and z causing x.
The reason that none of their theorems apply to the system B = dA/dt is that
when I discretise time and put this in the form of a difference equation, it
violates the precondition they state in section 1.2.2. This will be true of the
discretisation of any system of ordinary differential equations. It appears to
me that that is a rather significant limitation of their approach to causal
analysis.

414y

This is the right idea. For small epsilon, B(t) should have a weak negative
correlation with A(t - epsilon), a weak positive correlation with A(t +
epsilon), and a strong positive correlation with the difference A(t + epsilon) -
A(t - epsilon).
The function A causes the function B, but the value of A at time t does not
cause the value of B at time t. Therefore the lack of correlation between A(t)
and B(t) does not contradict causation implying correlation.
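The lagged-correlation claim can be poked at with a small simulation. This is only a sketch under stated assumptions: as a stand-in for "a randomly wandering variable" I use a noise-driven damped oscillator, so that A stays bounded and B is exactly its discretised rate of change. The parameters, the lag of 30 steps, the seed and all function names are arbitrary choices of mine.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(steps=200_000, dt=0.01, seed=0):
    """Return samples of A and of B, where each step does A += B * dt."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    A, B = [], []
    for _ in range(steps):
        # B is pulled back towards zero and kicked by white noise (assumed model).
        b += (-a - 0.5 * b) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        a += b * dt  # B is the rate of change of A
        A.append(a)
        B.append(b)
    return A, B


def lagged_correlations(A, B, lag=30):
    """Correlate B(t) with A(t - eps), A(t), A(t + eps) and their difference."""
    b_mid = B[lag:-lag]
    past, same, future = A[:-2 * lag], A[lag:-lag], A[2 * lag:]
    diff = [f - p for f, p in zip(future, past)]
    return {
        "past": pearson(b_mid, past),
        "same": pearson(b_mid, same),
        "future": pearson(b_mid, future),
        "difference": pearson(b_mid, diff),
    }
```

Calling `lagged_correlations(*simulate())` should show the pattern described above: a negative value for "past", a positive one for "future", something close to zero for "same", and a value near one for "difference". The exact numbers depend on the assumed model and on epsilon, so treat them as illustrative only.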

114y

I don't think you are arguing in a circle. B is caused by current and previous
As. Obviously we're not going to see a correlation unless we control for the
previous state of A. Properly controlled, the relationship between the two
variables will be one-to-one, won't it?

My initial reaction is to find that aggravating and to try to come up with another experiment that would allow me to poke at the universe by exploiting the Predictor, but it seems likely that this too would be sidestepped using the same tactic. So we could generalize to say that

any experiment you come up with that involves the Predictor and gives evidence regarding the temporal direction of causation will be sidestepped so as to give you no new information.

But intuitively, it seems like this condition itself gives new information in the paradox, yet I hav...