transhumanist_atom_understander

Sequences: Prospects for Solartopia

Comments, sorted by newest
Why you should eat meat - even if you hate factory farming
transhumanist_atom_understander · 14d

Wait, you think people need to eat collagen? Collagen is just a kind of protein; it'll get broken down into its constituent amino acids during digestion. There can be issues with a vegan diet not providing complete protein (that is, being low in one or more essential amino acids), but there's nothing special about collagen specifically.

All Exponentials are Eventually S-Curves
transhumanist_atom_understander · 1mo

I'm surprised at how hard it is for me to think of counterexamples.

I thought surely whale populations would qualify, given their slow generation times, but it looks like humpback whale populations have already recovered from whaling, and blue whales will get there before long.

Thinking again—in my baseball example, gravity is pulling the ball into the domain of applicability of the constant acceleration model.

Maybe what's special about the exponential growth model is it implies escape from its own domain of applicability, in time that grows slowly (logarithmically) with the threshold.
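
To spell out that last point with a worked equation (my notation, not from the post): if x(t) = x₀e^(rt) and the model must break down once x reaches some threshold K, then the escape time is

t* = (1/r)·ln(K/x₀)

so even making the threshold a million times larger only adds ln(10⁶)/r ≈ 13.8/r to the time spent inside the model's domain of applicability.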

Löb's Lemma: an easier approach to Löb's Theorem
transhumanist_atom_understander · 1mo

I remember this by analogy to Curry's paradox.

Where the sentence from Curry's paradox says "If this statement is true, then p", Ψ says "if this statement is provable, then p", that is, □Ψ→p.

In Curry's paradox, if the sentence is true, that would indeed imply that p is true. And with Ψ, the situation is analogous, but with truth replaced by provability: if Ψ is provable, then p is provable. That is, □Ψ→□p.

But, unlike in Curry's paradox, this is not what Ψ itself says! Replacing truth with provability has attenuated the sentence, destroyed its ability to cause paradox.

If only □p→p, then we would have our paradox back... and that's Löb's theorem.

This is all about □Ψ→□p, just one direction of the biimplication, whereas the post proves the other direction as well. It seems that only this forward direction is used in the proof at the end of the post, though.
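
For completeness, here's a sketch of how that forward direction is derived in the standard provability logic (my reconstruction, using necessitation, the distribution axiom, and □Ψ→□□Ψ; it's not spelled out this way in the post):

1. Ψ→(□Ψ→p) (one direction of the fixed point)
2. □(Ψ→(□Ψ→p)) (necessitation)
3. □Ψ→□(□Ψ→p) (distribution)
4. □Ψ→(□□Ψ→□p) (distribution again)
5. □Ψ→□□Ψ (provable statements are provably provable)
6. □Ψ→□p (from 4 and 5)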

All Exponentials are Eventually S-Curves
transhumanist_atom_understander · 1mo

You say "if we are to accurately model the world"...

If I am modelling the path of a baseball, and I write "F = mg", would you "correct" me that it's actually inverse square, that the Earth's gravitation cannot stay at this strength to arbitrary heights? If you did, I would remind you that we are talking about a baseball game, and not shooting it into orbit—or conclude that you had an agenda other than determining where the ball lands.

What if I'm sampling from a population, and you catch me multiplying probabilities together, as if my draws are independent, as if the population is infinite? Yes, there is an end to the population, but as long as it's far away, the dependence induced by sampling without replacement is negligible.

Well, that's the question, whether to include an effect in the model or whether it's negligible. An effect like finite population size, diminishing gravity, or the "crowding" effects that turn an exponential growth model logistic.

And the question cannot be escaped just by noting the effect is important eventually.
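
As a quick numerical check of how negligible these effects can be within the model's domain (all the numbers below are assumptions for illustration, not from the comment):

```python
import math

# 1. Diminishing gravity: relative strength of g at the top of a fly ball's arc.
#    Assumed peak height of 50 m; Earth's radius 6.371e6 m.
R_earth = 6.371e6
h = 50.0
g_ratio = (R_earth / (R_earth + h)) ** 2
print(f"g at {h} m is {g_ratio:.8f} of surface g")  # off by about 2e-5

# 2. Finite population: chance of drawing 10 successes in a row, with
#    replacement (binomial) vs without replacement (hypergeometric).
N, K, n = 1_000_000, 500_000, 10  # population, successes, sample size (assumed)
p_with = (K / N) ** n
p_without = math.prod((K - i) / (N - i) for i in range(n))
print(f"with replacement {p_with:.6e}, without {p_without:.6e}")
```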

Yudkowsky on "Don't use p(doom)"
transhumanist_atom_understander · 1mo

Eliezer in 2008, in When (Not) To Use Probabilities, wrote:

To be specific, I would advise, in most cases, against using non-numerical procedures to create what appear to be numerical probabilities. Numbers should come from numbers.

A quantum equivalent to Bayes' rule
transhumanist_atom_understander · 1mo

Yeah... well, I thought of the Z because it sounds like we're getting the probabilities of Y from some experiment. So Z=z is the result of the experiment, which in this case is a vector of frequencies. When I put it like that, it sounds like it's just a rhetorical device for saying that we have given probabilities of Y.

But I still seem to need Z for my dictionary. I have γ(x)=P[X=x]. What is γ′(x)? It is some kind of updated probability of X=x, right? Like we went from one probability to the other by doing an experiment. If I didn't write γ′(x)=P[X=x|Z=z], I'd need something like γ(x)=P₁[X=x] and γ′(x)=P₂[X=x].

Reading again, it seems like this is exactly Jeffrey conditionalization. So whether you include some extra variable just depends on what you think of Jeffrey conditionalization.

I feel like I'm missing something, though, about what this experiment is and means. For example, I'm not totally clear on whether we have one state X and a collection of replicates of state Y, or a collection of replicates of (X,Y) pairs.

Looking at the paper, I see the connection to Jeffrey conditionalization is made explicitly. And it mentions Pearl's "virtual evidence method"; is this what he calls introducing this Z? But there's no clarity on exactly what this experiment is. It just says:

But how should the above be generalized to the situation where the new information does not come in the form of a definite value y₀ for Y, but as "soft evidence," i.e., a probability distribution τ(y)?

By the way, regarding your coin toss example, I can at least say how this is handled in Bayesian statistics. There are separate random variables for each coin toss: Y₁ is the first, Y₂ is the second, etc. If you have n coin tosses, then your sample is a vector Y⃗ containing Y₁ to Yₙ. Then the posterior probability is P[loaded|Y⃗=y⃗]. This will be covered in any Bayesian statistics textbook as "the Bernoulli model". My class used Hoff's book, which provides a quick start.

I guess this example suggests a single unknown X (whether the coin is loaded or not) and replicates of Y.
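
A minimal sketch of that posterior computation, with everything (the prior, the loaded coin's bias, and the toss data) assumed for illustration:

```python
prior_loaded = 0.5        # P[loaded], assumed prior
p_heads_loaded = 0.8      # P[heads | loaded], assumed bias
p_heads_fair = 0.5        # P[heads | fair]

tosses = [1, 1, 0, 1, 1, 1, 0, 1]  # 1 = heads; hypothetical data

def likelihood(p, ys):
    """P[Y⃗ = y⃗ | bias p]: product of independent Bernoulli terms."""
    out = 1.0
    for y in ys:
        out *= p if y == 1 else (1 - p)
    return out

# Bayes' rule: P[loaded | data] = P[loaded] * P[data | loaded] / P[data].
num = prior_loaded * likelihood(p_heads_loaded, tosses)
den = num + (1 - prior_loaded) * likelihood(p_heads_fair, tosses)
print(f"P[loaded | data] = {num / den:.3f}")
```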

A quantum equivalent to Bayes' rule
transhumanist_atom_understander · 1mo

The "Classical derivation" made more sense to me after translating it to standard probability notation, so I'm commenting to share the "dictionary" I made for it, as well as the unexpected extra assumption I had to make.

The obvious:

γ(x)=P[X=x]

φ(y|x)=P[Y=y|X=x]

φ̂(x|y)=P[X=x|Y=y]

It got tricky with τ. Instead of observing Y=y, we observe something else that gives us a probability distribution over Y. I considered this "something else" to be the value of some other unknown: Z=z. The probability distribution over y is a conditional distribution:

τ(y)=P[Y=y|Z=z]

Hate to have z on only one side like that... maybe I should have called it τ_z... but I'll leave it as is.

Then,

γ′(x)=∑ⱼ P[X=x|Y=yⱼ] P[Y=yⱼ|Z=z]

Not quite the right formula for a simple interpretation of γ′... if only

P[X=x|Y=yⱼ]=P[X=x|Y=yⱼ,Z=z]

This is conditional independence, which could be represented with this Bayes net:

Z→Y→X

Then, we have

γ′(x)=P[X=x|Z=z]

That completes the dictionary.

So to do what feels like ordinary probability theory, I had to introduce this extra unknown Z so that we have something to observe, and then to assume that Z only provides information about Y (and indirectly about X, through Y).
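
Writing out the step that makes this work (my own verification, just the law of total probability plus the conditional independence above):

P[X=x|Z=z] = ∑ⱼ P[X=x|Y=yⱼ,Z=z] P[Y=yⱼ|Z=z]
= ∑ⱼ P[X=x|Y=yⱼ] P[Y=yⱼ|Z=z]
= ∑ⱼ φ̂(x|yⱼ) τ(yⱼ) = γ′(x)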

The way you described τ as some probability distribution resulting from an observation, but not a conditional distribution, is in philosophy called Jeffrey conditionalization. The Stanford Encyclopedia of Philosophy gives this example:

A gambler is very confident that a certain racehorse, called Mudrunner, performs exceptionally well on muddy courses. A look at the extremely cloudy sky has an immediate effect on this gambler’s opinion: an increase in her credence in the proposition (muddy) that the course will be muddy—an increase without reaching certainty. Then this gambler raises her credence in the hypothesis (win) that Mudrunner will win the race, but nothing becomes fully certain. (Jeffrey 1965 [1983: sec. 11.3])

The idea is, we go from one probability distribution over {muddy,¬muddy} to another, without becoming certain of anything. My introduction of Z corresponds to introducing an unknown representing the status of the sky. I would say we are conditioning on Z=cloudy.

I recalled vaguely that Jaynes discussed Jeffrey conditionalization in Probability Theory, and criticized it for holding only in a special case. I took a look, and sure enough, it's in section 5.6, and he's pointing out exactly what I did, right down to the arrows, though he calls it a "logic flow diagram" rather than identifying it as a Pearl-style Bayes net.

Unknown Probabilities
transhumanist_atom_understander · 6mo

The last formula in this post, the conservation of expected evidence, had a mistake which I've only just now fixed. Since I guess it's not obvious even to me, I'll put a reminder for myself here, which may not be useful to others. Really I'm just "translating" from the "law of iterated expectations" I learned in my stats theory class, which was:

E[E[X|Y]]=E[X]

This is using a notation which is pretty standard for defining conditional expectations. To define it, you can first consider the expected value given a particular value of the random variable Y, and think of that as a function of that particular value:

f(y)=E[X|Y=y]

Then we define conditional expectation as a random variable, obtained from plugging in the random value of Y:

E[X|Y]=f(Y)

The problem with this notation is it gets confusing which capital letters are random variables and which are propositions, so I've bolded random variables. But it makes it very easy to state the law of iterated expectations.

The law of iterated expectations also holds when "relativized". That is, E[E[X|Y]|B]=E[X|B] where B is an event. If we wanted to stick to just putting random variables behind the conditional bar we could have used the indicator function of that event.

And this translates to the statement in my post. X is an indicator for the event H, which makes a conditional expectation of it a conditional probability of H. So E[X|Y] is Θ. Our proposition B is the background information B; I used the same symbol there. And the right-hand side is another expectation of an indicator, and therefore also a probability.
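
Spelled out as one chain (my own check of the translation, with X the indicator of H):

E[Θ|B] = E[E[X|Y]|B] = E[X|B] = P[H|B]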

I really didn't want to define this notation in the post itself, but it's how I'm trained to think of this stuff, so for my own confidence in the final formula I had to write it out this way.

How Gay is the Vatican?
transhumanist_atom_understander · 6mo

It would be nice if you had the sexes of the siblings, since it's supposedly only the older brothers that count, though I don't really expect that to change anything.

Really the important thing is just to separate birth order from family size. Usually the way I think of this is, we can look at the number of older brothers, given a fixed number of older siblings. I like this setup because it looks like a randomized trial: if I have two older siblings and so do you, meiosis randomizes their sexes.

But I guess with the data you have you can look at birth order with a given family size, so we don't have to worry about the effect of a larger or smaller family. I... don't think this is what you did? Did I misunderstand something? It seems like if cardinals come from smaller families, that would show up as lower birth orders.

With 9 million people I'd just split it into categories by number of siblings; with your data I'm not sure.
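
To illustrate the randomized-trial framing: under the null hypothesis, each older sibling is a brother with probability ≈ 0.514 (the human sex ratio at birth), independent of family size, so older-brother counts can be pooled across cardinals. A minimal sketch with made-up counts:

```python
from scipy.stats import binomtest

# (older siblings, older brothers) per cardinal; these counts are hypothetical.
older = [(2, 2), (0, 0), (3, 1), (1, 1), (4, 3), (2, 1)]

total_sibs = sum(n for n, _ in older)
total_bros = sum(k for _, k in older)

# Under the null, each older sibling is independently a brother with p ~ 0.514,
# so the pooled count is Binomial(total_sibs, 0.514) regardless of family sizes.
result = binomtest(total_bros, total_sibs, p=0.514, alternative="greater")
print(f"{total_bros}/{total_sibs} older brothers, one-sided p = {result.pvalue:.3f}")
```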

Posts

2024 was the year of the big battery, and what that means for solar power (8mo)
Elon Musk and Solar Futurism (10mo)
Why I don't believe in the placebo effect (1y)
Unknown Probabilities (2y)
"Absence of Evidence is Not Evidence of Absence" As a Limit (2y)