Yes, but such an interpretation falls outside of Yudkowsky's view, as I understand it
Maybe! But I would expect him to adopt something like this view if you managed to persuade him that there is some crucial flaw in Bayesianism. Your goal, meanwhile, seems to be to propagate the toolbox-view as the only valid approach, so you might as well engage with a stronger version of the law-view right now.
On Walker: in that paragraph he is criticizing the specific (and common) practice of comparing separate Bayesian models and picking the best (via ratios, errors, or some such) when there is uncertainty about the truth, instead of appropriately representing this uncertainty about your sampling model in the prior.
Rolling a die is a bit of a nifty example here, since it's the case where you assign a separate probability to each label in the sample space, so that your likelihood is in fact fully general. This is where the idea of a Dirichlet prior comes from: an attempt to generalize this notion of covering all possible models to less trivial problems.
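(For concreteness, a minimal sketch of the construction described here, with hypothetical roll data; the symmetric Dirichlet(1,...,1) prior and its conjugate update are the standard textbook version:)

```python
import numpy as np

# A symmetric Dirichlet(1,...,1) prior over the six face probabilities
# "covers" every possible die at once; the update is by conjugacy.
alpha = np.ones(6)            # prior pseudo-counts, one per face
rolls = [1, 3, 3, 6, 2, 3]    # hypothetical observed faces

counts = np.bincount(rolls, minlength=7)[1:]  # counts for faces 1..6
posterior = alpha + counts                    # conjugate Dirichlet update

# Posterior predictive probability of each face on the next roll:
print(posterior / posterior.sum())
```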
So, suppose that instead of assigning equal probabilities to each label of a die, I consider this as just one of multiple possible models, each assigning different probabilities to the labels. According to one of them:
P(1) = 1/2, P(2) = P(3) = P(4) = P(5) = P(6) = 1/10
According to another:
P(2) = 1/2, P(1) = P(3) = P(4) = P(5) = P(6) = 1/10
And so on and so forth.
And then I assign an equiprobable prior over these models and start collecting experimental data to see how well each of them performs. Do I understand correctly that Walker considers such an approach incoherent?
In which case, I respectfully disagree with him. While it's true that this approach doesn't represent our uncertainty about which label of an unknown die will be shown on a roll, it nevertheless represents our uncertainty about which Bayesian model best approximates the behavior of this particular die. And there is nothing incoherent in modelling the latter kind of uncertainty instead of the former.
And likewise for more complicated settings and models. Whenever we have uncertainty about which model is the best one, we can model this uncertainty and get a probabilistic answer to it via Bayesian methods, and then get a probabilistic answer according to that model, if we want to.
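A minimal sketch of this updating procedure, with hypothetical roll data:

```python
import numpy as np

# Six models: model i says P(face i) = 1/2 and every other face gets 1/10.
models = np.full((6, 6), 0.1)
np.fill_diagonal(models, 0.5)       # models[i, j] = P(face j+1 | model i)

posterior = np.full(6, 1 / 6)       # equiprobable prior over the six models
rolls = [1, 1, 4, 1, 2, 1]          # hypothetical experimental data

for r in rolls:
    posterior *= models[:, r - 1]   # likelihood of this roll under each model
    posterior /= posterior.sum()    # renormalize: Bayes' theorem

print(posterior)  # the model favoring face 1 dominates after all those 1s
```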
On Bayesian finite-sample miscalibration, simply pick a prior which is sufficiently far off from the true value and your predictive intervals will be very bad for a long time
But why would the prior, which captures all your information about a setting, be sufficiently far off from the true value in the first place? This seems to happen mostly when you misuse the Bayesian method by picking some arbitrary prior for no particular reason. Which is a weird complaint. Surely we can also misuse Frequentist methods in a similar fashion: p-hacking immediately comes to mind, or just ignoring a bunch of data points altogether. But what's the point in talking about this? We are interested in situations when the art fails us, not when we fail the art, aren't we?
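To make the distinction concrete, here is a toy sketch of exactly such misuse (the Beta(50, 1) prior is deliberately absurd, concentrated near 1 while the coin's true bias is 0.1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, p_true = 50.0, 1.0, 0.1         # absurd prior vs. the true bias
flips = rng.random(20000) < p_true    # simulated coin flips

for n in [10, 100, 1000, 20000]:
    heads = int(flips[:n].sum())
    # 95% posterior credible interval for the bias after n flips:
    lo, hi = stats.beta(a + heads, b + n - heads).ppf([0.025, 0.975])
    print(f"n={n:5d}  interval: ({lo:.3f}, {hi:.3f})")
# The interval misses 0.1 for a long time, until the data finally
# overwhelm the deliberately bad prior.
```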
On minimax
Interesting! So is there agreement among Frequentists that the probability that an unfair coin, about which we know nothing else, lands Tails is 1/2? Or is it more like: "Well, we have a bunch of tools, and here one of them says 1/2, but we have no principled reason to prefer it to the other tools regarding the question of what the probability is, so the question is still open"?
On your last comment, it seems like a bit of an open question to attribute the existence of practical intuition and reasoning about mathematical constructs like this to a Bayesian prior updating process.
Is it? I thought everyone was in agreement that Bayes' theorem naturally follows from the axioms of probability theory. In which case, the only reason such reasoning wouldn't follow the Bayesian updating procedure is that, somehow, probability theory is not applicable to reasoning about mathematical constructs in particular. But why would that be true?
Certainly I reason, and I change my mind, but to me personally I see no reason to imagine this was Bayesian in some way (or that those thoughts were expressed in credence-probabilities which I shifted by conditioning on a type of sense-data), nor that I would be ideally doing this instead.
Oh wait, you don't think that probability theory is applicable to reasoning in general? Surely I'm misunderstanding you? Could you elaborate on your position? I feel that this is the most important crux of our disagreement.
On the question of 'taking the best from both', that is what Yudkowsky calls a "tool" view
Not necessarily. You can have a law-view interpretation of such a synthesis, where we conceptualize Bayesianism as an imperfect approximation, a special case of the True Law, which should also capture all the good insights of Frequentism.
There is more responsibility on the Bayesian: she gets more out in the form of a posterior distribution on the object of interest. Hence more care needs to be taken in what gets put into the model in the first place. For the posterior to mean anything it must be representing genuine posterior beliefs, solely derived by a combination of the data and prior beliefs via the use of the Bayes theorem. Hence, the prior used must genuinely represent prior beliefs (beliefs without data). If it does not, how can the posterior represent posterior beliefs? So a “prior” that has been selected post data via some check and test from a set of possible “prior” distributions cannot represent genuine prior beliefs. This is obvious, since no one of these “priors” can genuinely represent prior beliefs. The posterior distributions based on such a practice are meaningless.
- Stephen G. Walker (first chapter of Bayesian Nonparametrics)
I'm not sure I see what exactly Walker is arguing here. Could you recreate the substance of the argument using a specific example, the roll of a 6-sided die, for instance?
We don't know anything more about the die and have no data from previous throws. Is Stephen Walker confused about where we are getting the equiprobable prior from?
Not only that, but insofar as you 'want both' (finite-sample) calibration and coherence, you are called to abandon one or the other. Insofar as there are Bayesian methods that can get you the former, they are not derived from prior distributions that represent your knowledge of the world (if they even exist in general, anyway; not something I know of).
I really don't see why! As far as I know, Bayes' theorem and the law of large numbers coexist perfectly. Could you give me some maximally simple example where such a discrepancy happens?
On your query about coins, 1/2 is minimax for the squared error, I believe.
Minimax for the squared error of what? How do you calculate it if you have no access to information about the previous tosses of the coin, nor know exactly how biased it is? Could you present your reasoning here step by step? Also, what is your claim here in the first place? That I misunderstand the Frequentist position on the question, and Frequentists actually agree with Bayesians here?
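(My best guess at what the claim means, for reference: with zero observations, a constant guess $\hat p$ for the bias has worst-case squared error $\max_{p \in [0,1]} (p - \hat p)^2 = \max\{\hat p^2, (1-\hat p)^2\}$, and this worst case is minimized by

$$\hat p = \arg\min_{\hat p \in [0,1]} \max\{\hat p^2, (1-\hat p)^2\} = \tfrac{1}{2}, \qquad \max_{p}\left(p - \tfrac{1}{2}\right)^2 = \tfrac{1}{4}.$$

If so, that is a worst-case-loss justification for announcing 1/2, not an assignment of probability 1/2 to Tails.)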
that there are only good properties which a method can or can't obtain.
Hmm... and how do you judge which methods have good properties, and which properties are good, in the first place? Doesn't reasoning about this itself require some initial intuition and accumulated data from previous experience, therefore essentially satisfying a Bayesian structure?
I always feel that Bayesianism/Frequentism debates are somewhat misguided. Both categories are so vague, consisting of many individual elements, often for weird historical reasons. A lot of vibe-based reasoning is also involved, like what "smells very Frequentist" or vice versa. The global argument here appears more political than mathematical or epistemological.
It seems more useful to address specific individual disagreements and, in the process, develop a framework that deals with whatever problems Bayesianism and Frequentism have, taking all the best from both, instead of scoring points for these two frameworks and comparing which is better. What's the point in arguing whether coherence or calibration is more important? Clearly we want both!
A standard example of such a specific disagreement is assigning a probability to an unfair coin about which you know nothing else. Here, as far as I know, Frequentism is unable to perform, while Bayesianism has a coherent answer: 1/2. Bayesianism does look better in this regard, but that is beside the point. What is important is that now we know what answer the optimal framework should produce. Do you know of similar specific examples where Frequentism appears superior? If we collect enough of them in both directions, we will be able to conceptualize a strictly superior framework.
The way you apply "Chesterton's Fence" borders on a fully general argument against any change. This is not how it's supposed to be used. Chesterton's Fence is an argument against getting rid of things without knowing why they existed in the first place. We do have a good model of why alcohol consumption coincided with civilization. Therefore, no strong argument of this form can be made here.
You also seem to confuse voluntary teetotaling with government-enforced prohibition. All the bad secondary effects are results of the latter, not the former.
The argument is to change your personal behaviour in order to modify the global multiagent equilibrium at least to some degree.
Not much. I initially considered this thread "not worth getting into", as @avturchin's line of reasoning is based on multiple small confusions, addressing each of which would be a huge chore and is only tangentially relevant to the topic of the post in the first place. I still agree with this assessment today. But I will present a general outline of what is wrong with it, for you and future readers.
First of all, Gott's version of the DA is different from the version of the DA I'm talking about in this post. It's a different mathematical model, based on the number of years humanity has existed instead of the number of humans, and it returns a different estimate for extinction: 97.5% confidence of extinction within roughly the next 8 million years, assuming that humanity has existed for 200,000 years, regardless of birth rates. Suffice it to say, these two versions of the DA produce different predictions, and by shifting some free parameters in the models we can get still more different predictions. This is exactly what we'd expect if DA arguments are wrong.
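Spelling out the calculation behind that estimate, as I understand Gott's delta-t argument: if the moment of observation is uniformly distributed over humanity's total lifespan, then with probability 0.975 we are past the first 2.5% of that lifespan, so

$$P\!\left(\frac{t_{\text{past}}}{T_{\text{total}}} > 0.025\right) = 0.975 \;\Rightarrow\; t_{\text{future}} < 39\,t_{\text{past}} = 39 \times 200{,}000 \text{ years} \approx 7.8 \text{ million years.}$$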
Likewise, the Laplace sunrise problem (LS) is yet another mathematical model, and a certain interpretation of it produces a vaguely similar result to Gott's version of the DA (GDA). Even assuming LS is applicable, this isn't really an argument in favor of GDA or of any kind of anthropic reasoning. Imagine that the correct answer to a test question is 1/5002, while your reasoning, which makes an extra assumption, produces the answer 1/5000. Clearly, this doesn't mean that your reasoning is correct, nor does it justify the extra assumption.
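(For reference, the example numbers echo Laplace's rule of succession: under a uniform prior, after observing $n$ successes and no failures, the probability of a failure on the next trial is

$$P(\text{failure} \mid n \text{ successes}) = \frac{1}{n+2},$$

which for $n = 5000$ gives $1/5002$, close to but distinct from $1/5000$.)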
And then there is a whole different question of the applicability of LS to the situation at hand. LS also doesn't fully capture our knowledge state, but at least it is less wrong, in a sense, as it doesn't make the particular mistake I'm talking about in this post.
Don’t go bullshitting me about how a kind and compassionate life of mediocrity is a “different kind of strength” or some such cope.
There is, in fact, no reason why being compassionate should doom you to a life of mediocrity. A lot of very compassionate people manage to simultaneously be extremely self-critical, even beyond the point where it's helpful for their productivity.
What is a "cope", is an idea that you are either nice or brilliant. And you seem to be a victim of it. So in the spirit of tsuyoku naritai, stop coming up with excuses not to learn a valuable skill, deluding yourself into thinking that it somehow going to make you less successful in other domains and go put some effort into acquiring it.
When I try to empathize with that woman
That's because you are not actually empathizing with the woman. You are empathizing with the man who notices the nail in her head. This is understandable, because the point of the video is to make you do exactly that: it frames the situation in a particular manner that makes empathizing with the woman very hard and empathizing with the man as easy as possible. Essentially, you are being manipulated into empathizing with whomever the author of the video wants.
As a practicum in empathy and in withstanding this sort of manipulation, try to reframe the situation in such a way that it's the woman who is in the right. And no, just switching the genders of the characters won't do; that's not the point of the exercise. The point is to come up with a situation in which a complaining character who wants to be listened to is obviously in the right, while a character who proposes a solution is obviously in the wrong, just as in the video it's obvious that the woman with the nail in her head is wrong and stupid.
I believe I've solved the problem. I'm going to include this in my next post on probability theory fundamentals, but here is the gist of it.
The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.
The meta-problem here is that people were looking for the answer in the wrong place, searching for a different decision-making algorithm, when what we actually needed was a satisfying epistemological account. The core crux isn't in decision theory but one step earlier, in probability theory.
UDT works but it doesn't compute or make use of "probability of being at X" so epistemically it doesn't seem very satisfying.
That should be a clue that "probability of being at X" isn't, in fact, a thing: the event "I'm at X and not at Y" is ill-defined. In other words, the problem is with our intuition, which mistakenly assumes that there should be such an event, and with the lack of a strict epistemological framework that would allow us to answer questions such as "Does this mathematical model fit the setting?" and "Is this event well-defined?"
Here I provide this framework. An event is a conditional statement of a belief-updating algorithm that has to return a clear True or False in every iteration of a probability experiment approximating some process to the best of our knowledge (in our case, the Absent-Minded Driver problem). The statement "I'm at X and not at Y" doesn't satisfy this condition for the Absent-Minded Driver, as in some iterations of the experiment the driver will be at both. Therefore it's not an event, and it cannot lead to conditionalization.
The event that is well-defined in every iteration of the experiment is "I'm at X or Y". This event has probability 1, which means trivial conditionalization: on its realization the driver's credences do not change. Therefore everything adds up to normality.
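A minimal simulation sketch of this criterion applied to the Absent-Minded Driver (the continuation probability q is an arbitrary placeholder):

```python
import random

q = 0.5  # arbitrary placeholder for the probability of continuing

def run_iteration():
    # One iteration of the experiment: the driver wakes at X, continues
    # with probability q (exiting otherwise), and if she continued,
    # wakes at Y as well.
    visited = ["X"]
    if random.random() < q:
        visited.append("Y")  # this iteration contains awakenings at both
    return visited

for _ in range(5):
    visited = run_iteration()
    x_and_not_y = [v == "X" for v in visited]    # "I'm at X and not at Y"
    x_or_y = [v in ("X", "Y") for v in visited]  # "I'm at X or Y"
    print(visited, x_and_not_y, x_or_y)

# Whenever the driver continues, "I'm at X and not at Y" is True at one
# awakening and False at the other within the same iteration, so it has no
# single truth value per iteration and is not an event. "I'm at X or Y" is
# True at every awakening in every iteration: probability 1, trivial update.
```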
To show that physicalism isn't necessarily true, I only need to show there is some plausibility to the existence of intrinsic subjectivity.
I'm not saying dualism is necessarily true, I'm saying physicalism isn't necessarily true. The one is not a corollary of the other.
Okay, I think we have a long-running misunderstanding here, so let's try to clear it up once and for all.
We are, in fact, both in agreement that physicalism is not necessarily true. Likewise, we are in agreement that dualism is also not necessarily true.
Now consider these two statements:
1. Strong Zombie Argument: the conceivability of zombies proves that physicalism is false.
2. Weak Zombie Argument: the conceivability of zombies shows that physicalism is not necessarily true.
I think the confusion between the two of us is that when I say "Zombie Argument" I mean the strong one, while when you say "Zombie Argument" you mean the weak one. If you agree that the Strong Zombie Argument is wrong, then there is, in fact, no substantial disagreement between us on this matter!
So, are we in agreement here?
That's just trivially true, isn't it? Among women who were already pre-selected to have similar faces, ages, and BMIs to movie stars, most can be made extremely attractive with the help of the right makeup, clothes, context, and so on.
The difference is that some people happen to be part of this group of women with faces, ages, and BMIs similar to movie stars', and some do not. There is no contradiction here.
I'd say more of an outlier than being 6'2", less of an outlier than Michael Jordan.
Basically, because society as a whole actively conditions women to believe that their looks are the most important thing about them. This includes some of the factors that you've mentioned.