All of mistercow's Comments + Replies

My initial reaction is to find that aggravating and to try to come up with another experiment that would allow me to poke at the universe by exploiting the Predictor, but it seems likely that this too would be sidestepped using the same tactic. So we could generalize to say that any experiment you come up with that involves the Predictor and gives evidence regarding the temporal direction of causation will be sidestepped so as to give you no new information.

But intuitively, it seems like this condition itself gives new information in the paradox, yet I hav...

To me, the fact that I have been told to assume that I believe the Predictor seems extremely relevant. If we assume that I am able to believe that, then it would likely be the single most important fact that I had ever observed, and to say that it would cause a significant update on my beliefs regarding causality would be an understatement. On the basis that I would have strong reason to believe that causality could flow backwards, I would likely choose the one box.

If you tell me that somehow, I still also believe that causality always flows forward with r...

The standard formulation to sidestep that is that the Predictor treats choosing a mixed strategy as two-boxing.

Just to clarify, the soylent we're talking about here is not the original recipe. It is a more frugal version made from soybeans, rice and oil.

The whole point of dimensional analysis as a method of error checking is that fudging the units doesn't work. If you have to use an arbitrary constant with no justification besides "making the units check out", then that is a very bad sign.

If I say "you can measure speed by dividing force by area", and you point out that that gives you a unit of pressure rather than speed, then I can't just accuse you of nitpicking and say "well obviously you have to multiply by a constant of 1 m²s/kg". You wouldn't have to tell me why that operation isn't allowed. I would have to explain why it's justified.

Yes, sort of, but a) a linear classifier is not a Turing-complete model of computation, and b) there is a clear resemblance that can be seen by merely glancing at the equations.

It's interesting to me that the proper linear model example is essentially a stripped down version of a very simple neural network with a linear activation function.

I would argue that neurons, neural nets, SPRs, and everyone else doing linear regression use those techniques because it's the simplest way to aggregate data.
Is that really true? Couldn't one say that of just about any Turing-complete (or less) model of computation? 'Oh, it's interesting that they are really just a simple unary fixed-length lambda-calculus function with constant-value parameters.' 'Oh, it's interesting that they are really just restricted petri-nets with bounded branching factors.' 'Oh, it's interesting that these are modelable by finite automata.' etc. (Plausible-sounding gobbledygook included to make the point.)

I salute your ability to troll all of these groups in a post about what kind of groups are easy to troll. I almost started to argue on some of these points before I saw your game.

I don't troll. I never have. I will attempt to reasonably argue a point of view, but (as an example) I will not go to and start a post containing my (slightly negative) opinion of it, nor discussing my preferences to that device (Sig 55x, AK etc.). However I have been on the intarwebs since 1993 and have engaged in some fairly vigorous debates. I have also noticed who responds and why. This is not building interplanetary transport devices. I will admit I should have put an etc. on the end of points 2 and 3. In retrospect I should have put Linux Users in there as well.
He almost got me, too.

Surely you aren't implying that a desire to prolong one's lifespan can only be motivated by fear.

I think it was on This American Life that I heard the guy's story. They even contacted a physicist to look at his "theory", who tried to explain to him that the units didn't work out. The guy's response was "OK, but besides that …"

He really seemed to think that this was just a minor nitpick that scientists were using as an excuse to dismiss him.

Why isn't it a minor nitpick? I mean, we use dimensioned constants in other areas; why, in principle, couldn't the equation be E=mc (1 m/s)? If that was the only objection, and the theory made better predictions (which, obviously, it didn't, but bear with me), then I don't see any reason not to adopt it. Given that, I'm not sure why it should be a significant objection. Edited to add: Although I suppose that would privilege the meter and second (actually, the ratio between them) in a universal law, which would be very surprising. Just saying that there are trivial ways you can make the units check out, without tossing out the theory. Likewise, of course, the fact that the units do check out shouldn't be taken too strongly in a theory's favor. Not that anyone here hadn't seen the XKCD, but I still need to link it, lest I lose my nerd license.

This raises a good point, but there are circumstances where the "someone would have noticed" argument is useful. Specifically, if the hypothesis is readily testable, if the consequences, if true, would be difficult to ignore, and if the hypothesis is, in fact, regularly tested by many of the same people who have told you that the hypothesis is false, then "somebody would have noticed" is reasonable evidence.

For example, "there is no God who reliably answers prayers" is a testable hypothesis, but it is easy for the religious to...

That guy needed to be taught basic dimensional analysis, apparently. E=mc has units of kg-m/s, which is the unit of momentum, not energy.
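A minimal sketch of that check, assuming dimensions are tracked as exponent tuples over (kg, m, s). The names `MASS`, `SPEED`, `MOMENTUM`, `ENERGY` and `mul` are made up for illustration and do not come from any library:

```python
# Dimensions as exponent tuples over (kg, m, s). Illustrative names only.
MASS = (1, 0, 0)       # kg
SPEED = (0, 1, -1)     # m/s
MOMENTUM = (1, 1, -1)  # kg*m/s
ENERGY = (1, 2, -2)    # kg*m^2/s^2 (joule)


def mul(a, b):
    """Multiplying two quantities adds their unit exponents."""
    return tuple(x + y for x, y in zip(a, b))


mc = mul(MASS, SPEED)  # units of E = mc
mc2 = mul(mc, SPEED)   # units of E = mc^2

print(mc == MOMENTUM)  # True
print(mc == ENERGY)    # False
print(mc2 == ENERGY)   # True
```

The only way to get from `mc` to `ENERGY` is to multiply by something with the units of a speed, which is exactly the unexplained constant the theory would need to justify.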

LW is pretty much the only site I visit where I feel significantly intimidated about commenting. I've left a couple of comments, but I seem to be more self-conscious about exposing my ignorance here than I am elsewhere – probably because I know that the chances of such ignorance being noticed are higher. It occurs to me that this is completely backwards and ridiculous, but there you have it.

Consider not the abstract situation of B = dA/dt, but the concrete example of the signal generator. It would be a perverse reading of the word "cause" to say that the voltage does not cause the current. You can make the current be anything you like by suitably manipulating the voltage.

But you can make a similar statement for just about any situation where B = dA/dt, so I think it's useful to talk about the abstract case.

For example, you can make a car's velocity anything you like by suitably manipulating its position. Would you then say that t...

I think intervention is the key idea missing from the above discussion of which of the derivative function and the integrated function is the cause and which is the effect. In the signal generator example, voltage is a cause of current because we can intervene directly on the voltage. In the car example, acceleration is a cause of velocity because we can intervene directly on acceleration. This is not too helpful on its own, but maybe it will point the discussion in a useful direction.

That is what A and B are: a randomly wandering variable A and its rate of change B.

Maybe I'm not quite understanding, but it seems to me that your argument relies on a rather broad definition of "causality". B may be dependent on A, but to say that A "causes" B seems to ignore some important connotations of the concept.

I think what bugs me about it is that "causality" implies a directness of the dependency between the two events. At first glance, this example seems like a direct relationship. But I would argue that B is not...

Very true. Once again, I'm going to have to recommend, in the context of a Richard Kennaway post, the use of more precise concepts. Instead of "correlation", we should be talking about "mutual information", and it would be helpful if we used Judea Pearl's definition of causality. Mutual information between two variables means (among many equivalent definitions) how much you learn about one variable by learning the other. Statistical correlation is one way that there can be mutual information between two variables, but not the only way. So, like what JGWeissman said, there can be mutual information between the two series even in the absence of a statistical correlation that directly compares time t in one to time t in the other. For example, there is mutual information between sin(t) and cos(t), even though d(sin(t))/dt = cos(t), and even though they're simultaneously uncorrelated (i.e. uncorrelated when comparing time t to time t). The reason there is mutual information is that if you know sin(t), a simple time-shift tells you cos(t). As for causation, the Pearl definition is (and my apologies, I may not get this right) that: "A causes B iff, after learning A, nothing else at the time of A or B gives you information about B. (and A is the minimal such set for which this is true)" In other words, A causes B iff A is the minimal set for which B is conditionally independent given A. So, anyone want to rephrase Kennaway's post with those definitions?
Consider not the abstract situation of B = dA/dt, but the concrete example of the signal generator. It would be a perverse reading of the word "cause" to say that the voltage does not cause the current. You can make the current be anything you like by suitably manipulating the voltage. But let this not degenerate into an argument about the "real" meaning of "cause". Consider instead what is being said about the systems studied by the authors referenced in the post. Lacerda, Spirtes, et al. do not use your usage. They talk about time series equations in which the current state of each variable depends on the previous states of some variables, but still they draw causal graphs which do not have a node for every time instant of every variable, but a node for every variable. When x(i+1) = b y(i) + c z(i), they talk about y and z causing x. The reason that none of their theorems apply to the system B = dA/dt is that when I discretise time and put this in the form of a difference equation, it violates the precondition they state in section 1.2.2. This will be true of the discretisation of any system of ordinary differential equations. It appears to me that that is a rather significant limitation of their approach to causal analysis.
This is the right idea. For small epsilon, B(t) should have a weak negative correlation with A(t - epsilon), a weak positive correlation with A(t + epsilon), and a strong positive correlation with the difference A(t + epsilon) - A(t - epsilon). The function A causes the function B, but the value of A at time t does not cause the value of B at time t. Therefore the lack of correlation between A(t) and B(t) does not contradict causation implying correlation.
I don't think you are arguing in a circle. B is caused by current and previous As. Obviously we're not going to see a correlation unless we control for the previous state of A. Properly controlled the relationship between the two variables will be one-to-one, won't it?
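
The epsilon-shift claim in the thread above can be checked numerically. This is only a sketch under stated assumptions: it uses the sin(t)/cos(t) pair mentioned earlier as a stand-in for A and B = dA/dt (not the randomly wandering variable from the original post), samples one full period, and picks epsilon = 0.1 arbitrarily. The helper `pearson` and the names `N`, `EPS`, `A`, `B` are illustrative.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


N = 1000   # samples over one full period (assumed)
EPS = 0.1  # the small time shift epsilon (assumed)
ts = [2 * math.pi * i / N for i in range(N)]


def A(t):
    return math.sin(t)  # stand-in for A


def B(t):
    return math.cos(t)  # B = dA/dt


b_now = [B(t) for t in ts]
r_same = pearson([A(t) for t in ts], b_now)          # A(t) vs B(t)
r_past = pearson([A(t - EPS) for t in ts], b_now)    # A(t - eps) vs B(t)
r_future = pearson([A(t + EPS) for t in ts], b_now)  # A(t + eps) vs B(t)
r_diff = pearson([A(t + EPS) - A(t - EPS) for t in ts], b_now)

print(r_same, r_past, r_future, r_diff)
```

For this particular choice the shifted correlations work out analytically to -sin(epsilon) and +sin(epsilon), the same-time correlation is zero, and the correlation with the difference is 1, which matches the pattern described: weak negative, weak positive, and strong positive.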