I can definitely see the benefits of focusing on likelihoods, but I think in practice, when we are talking about differences like 99% vs 5%, the difference usually has its roots in something highly relevant to the ideas. So to take the murder example: let's say I talk to someone and they tell me their best friend was murdered, they have had two best friends, and they use an empirical Bayes approach that gives a prior of 50% that they will be murdered. Sure, this is phrased as being about a prior, but functionally speaking it's about a likelihood: how should the observation that their best friend was murdered influence the estimated risk of murder?
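To make the mechanics concrete, here is a minimal sketch with made-up numbers (the base rate and likelihood ratio are assumptions of mine, not anything from the discussion):

```python
# Toy Bayes update for the murder example; all numbers are made up for illustration.
# The point: the leverage of the observation "my best friend was murdered" enters
# through the likelihood ratio, which is what the "empirical Bayes prior" is standing in for.

def posterior_prob(prior, likelihood_ratio):
    """Update a prior probability by a likelihood ratio via Bayes' rule (odds form)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

base_rate = 5e-5   # illustrative per-year murder risk from population statistics
lr = 2.0           # assumed likelihood ratio: how much more probable the observation is
                   # under "this person is at elevated risk" than under the base rate

print(posterior_prob(base_rate, lr))  # still small: the observation shifts the estimate
                                      # via the likelihood ratio rather than replacing the prior with 50%
```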
I think something like this often explains larger differences in posteriors. So as an example, let's say hypothetically that I think the evolution analogy for AI risk is a good one and essentially correct, but for me it increases my estimated risk a little, while for someone else it increases their estimated risk a lot. This will cash out as a large difference in posteriors, and so addressing differences in posteriors can be a reasonable way of triangulating the most relevant differences in likelihood.
More importantly, that BB has approximately no knowledge of the experiences and priors that led to those pessimistic posteriors. In general I think it’s wise to stick to discussing ideas (using probability as a tool for doing so) and avoid focusing on whether someone has the right posterior probabilities.
I don't understand this idea at all. If someone told me they thought the probability that they would be murdered within the next year is 62%, I'd probably point out that the per-capita murder rate makes that seem extremely unlikely. I think that would be a reasonable response even if I didn't fully understand the experiences that led them to this belief. Likewise, I think posterior probabilities are relevant to decisions and so should be "on the table" for discussion, and also can't be so cleanly separated from the "ideas". If someone is worried about murder because they overestimate its likelihood, that suggests their reasoning is based on different ideas than if they have a good estimate of the likelihood but are worried for another reason, like extreme risk aversion.
I think there is a disconnect here related to different usages of the word "confidence". You say in the OP:
he is more extreme in his confidence that things will be ok
Which I would interpret as being 1 - P(not-okay), in other words 1 - 0.026 = 97.4% for BB: very confident.
On the other hand, I think many people probably believe that extinction from misaligned AI is very unlikely a priori, and so might use "confidence" in a sense that is relative to priors. To understand why people might do this, let's imagine that I said "there is a 55% chance that irrelevant AI blogger The Floating Droid will win the 2028 US presidential election". Now imagine someone said "Wow! That's insanely overconfident!". I think people would be a bit suspicious if I responded that it was actually pretty unconfident because it quotes a probability near 50%.
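To put rough numbers on the two senses (both the base rate and the claimed probability here are just assumptions for illustration):

```python
import math

# Toy numbers illustrating the two senses of "confidence": a probability near 50% is
# "unconfident" in absolute terms, but it can represent an enormous update relative
# to a tiny base rate. Both numbers below are assumptions for illustration.

def log_odds(p):
    return math.log(p / (1 - p))

base_rate = 1e-7   # assumed prior that an arbitrary blogger wins a presidential election
claim = 0.55       # the stated probability

print(f"distance from 50%: {abs(claim - 0.5):.2f}")                                       # ~0.05: "unconfident"
print(f"shift from the base rate: {log_odds(claim) - log_odds(base_rate):.1f} log-odds")  # a huge update
```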
I think this difference in usage of "confidence" is also relevant to the OP, since it is reviewing BB's statements. For example, your statement:
though arguably less confident than BB!
Reads to me as a suggestion of hypocrisy or contradiction: BB accuses YS of being overconfident when really he is the one who is being extremely confident! But for this to be the case, we would need to evaluate the statement using the notion of "confidence" that BB intended. It's not clear to me that your post is actually using the same notion of "confidence".
I'm new here and just going through the sequences (though I have a mathematics background), but I have yet to see a good framing of bayesian/frequentist debate as maximum likelihood vs maximum a-posteriori. (I welcome referrals)
I'm definitely not representative of lesswrong in my views above, I don't think. In fact, in some sense I'm shadowboxing with lesswrong in some of my comments above, so sorry for any confusion that introduced.
I don't think I've ever seen maximum likelihood vs maximum a posteriori discussed on lesswrong, and I'm kind of just griping about it! I don't have references off the top of my head, but I recall this framing appearing in debates elsewhere (i.e. not on lesswrong), in more academic/stats settings. I can see if I can find examples. But in general it addresses an estimation perspective rather than hypothesis testing.
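As a rough sketch of what I mean by the estimation perspective, here is a toy coin-flip comparison of the two estimators (the data and the Beta prior are arbitrary choices of mine, purely for illustration):

```python
# A minimal sketch of the estimation-flavoured contrast: maximum likelihood (MLE) vs
# maximum a posteriori (MAP) for a coin-flip rate.

heads, flips = 7, 10

# MLE: maximize the binomial likelihood -> the sample proportion
mle = heads / flips

# MAP with a Beta(a, b) prior: the mode of the Beta(heads + a, flips - heads + b) posterior
a, b = 2.0, 2.0   # assumed prior pseudo-counts
map_est = (heads + a - 1) / (flips + a + b - 2)

print(f"MLE: {mle:.3f}, MAP: {map_est:.3f}")  # 0.700 vs 0.667: the prior shrinks the estimate
```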
Yes, there is a methodological critique to strict p-value calculations, but in the absence of informative priors p values are a really good indicator for experiment design. I feel that in hyping up Bayesian updates people are missing that and not offering a replacement. The focus on methods is a strength when you are talking about methods.
I think I'm in agreement with you here. My "methodological" was directed at what I view as a somewhat more typical lesswrong perspective, similar to what is expressed in the Eliezer quote. Sure, if we take some simple case we can address a more philosophical question about frequentism vs bayesianism, but in practical situations there are going to be so many analytical choices you could make that there will always be issues. In an actual analysis you can always do things like look at multiple versions of an analysis and try to use that to refine your understanding of a phenomenon. If you fix the likelihood but allow the data to vary, then p-values are likely to be highly correlated with possible alternatives like Bayes factors. A lot of the critiques, I feel, are focused on making a clean philosophical approach while ignoring the inherent messiness that gets introduced whenever you want to infer things from reasonably complicated data or observations. I don't think swapping likelihood ratios for p-values would suddenly change things all that much; a lot of the core difficulties of inferring things from data would remain.
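For what it's worth, here's a quick simulation of that claim under one simple setup I picked (normal likelihood with known variance; the sample size, prior scale, and range of true effects are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

# With the likelihood held fixed and only the data varying, p-values and a simple
# Bayes factor track each other closely under this toy setup.

rng = np.random.default_rng(0)
n, tau = 25, 0.5                                   # sample size; prior sd on the mean under H1

pvals, log_bfs = [], []
for true_mu in rng.uniform(-1.0, 1.0, size=500):   # vary the data-generating mean
    xbar = rng.normal(true_mu, 1 / np.sqrt(n))     # observed sample mean, known unit variance
    z = xbar * np.sqrt(n)
    pvals.append(2 * stats.norm.sf(abs(z)))        # two-sided z-test p-value for H0: mu = 0
    # Bayes factor: H1 (mu ~ N(0, tau^2)) vs H0 (mu = 0), both using the same normal likelihood
    m1 = stats.norm.pdf(xbar, 0, np.sqrt(tau**2 + 1 / n))
    m0 = stats.norm.pdf(xbar, 0, np.sqrt(1 / n))
    log_bfs.append(np.log(m1 / m0))

print(np.corrcoef(np.log(pvals), log_bfs)[0, 1])   # strongly negative: small p goes with large BF
```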
It is not a fee-for-service relationship. The price system in medicine has been mangled beyond recognition. Patients are not told prices; doctors avoid, even disdain, any discussion of prices; and the prices make no rational sense even if and when you do discover them. This destroys all ability to make rational economic choices about healthcare.
I think pricing of medical services faces somewhat of a breakdown in the normal price-setting mechanism of markets. For some random good like a sandwich or whatever the buyer can at least have a reasonable sense of how much they want it, the seller understands their costs to produce it, and the price gets established by this balance. But how is someone who seeks medical care really supposed to know how much they value a particular medical service? They would presumably have to rely on their provider, who is on the opposite side of the transaction. Insurers could somewhat serve this role, but I think people often look down upon this, and also it seems likely to be a difficult and imperfect process.
Computing p-values is what Mr. Frequentist is all about.
For once I'd like to see the bayesian/frequentist debate return to maximum likelihood vs maximum a posteriori. P-values absolutely are not the only aspect of frequentist statistics! Yes, they are one of the most prominent, so certainly fair game, but the way people talk about them it's as if they are all that matters. People also have general problems with p-values beyond them being frequentist. To me, the fact that they feature so prominently raises the question of how much certain commitments to "bayesianism" reflect actual usage of bayesian methods vs a kind of pop-science version of bayesianism.
Bayesian likelihood ratios
Is this meant to refer to a specific likelihood ratio method, or to suggest that likelihood ratios themselves are "bayesian"? Yes, "the likelihood principle" is a big source of criticism of p-values, but I don't see why likelihood ratios themselves would be bayesian. I think Andrew Gelman once said something to the effect that both bayesian and frequentist methods need a likelihood, and often that makes more of a difference than the prior. There's nothing strictly bayesian about "updating". I'm curious how often things that are identified as "bayesian" actually use Bayes' rule.
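To spell out the distinction I have in mind, here is a tiny sketch with made-up data, hypotheses, and prior odds: the likelihood ratio is computed without any prior, and Bayes' rule only shows up at the last step.

```python
# The likelihood ratio itself involves no prior; Bayes' rule only enters when you
# combine it with prior odds. Everything below is made up for illustration.

data = [1, 0, 1, 1, 0, 1, 1, 1]   # e.g. eight Bernoulli trials
p_h1, p_h0 = 0.8, 0.5             # two fixed hypotheses about the success rate

def likelihood(p, xs):
    out = 1.0
    for x in xs:
        out *= p if x == 1 else (1 - p)
    return out

lr = likelihood(p_h1, data) / likelihood(p_h0, data)   # purely likelihood-based, no prior anywhere

prior_odds = 1 / 9                  # assumed prior odds for H1 vs H0
posterior_odds = prior_odds * lr    # only this step uses Bayes' rule

print(f"likelihood ratio: {lr:.2f}, posterior odds: {posterior_odds:.2f}")
```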
The frequentist approach, even if flawed in certain respects, still serves as a valuable heuristic. It teaches us to be wary of overfitting to outcomes, to ask about the process behind the numbers, and to maintain a healthy skepticism when interpreting results. Its insistence on method over outcome protects us from the temptation to rationalize or cherry-pick. I'd rather a scientist work with p-values than with their intuition alone.
I think I largely agree with the spirit here. I definitely think p-values have issues, and in particular the way they have arguably contributed to publication bias is a highly reasonable criticism. That said, I think people like to make these "methodological" critiques more for philosophical than statistical reasons. In practice, we should expect the application of any method to have issues and be highly imperfect. So I agree that it makes sense to have a practical, "all methods are flawed, some are useful" view of things.
Encode successfully navigated this, by not offering facts (who did what, and when), since they don't have any first-hand knowledge of the facts. What they offered, according to their brief (which is attached as a sechedule to the "main document" for document 72), was their philosophical and technical perspective, particularly as a public body concerned with AI safety vis a vis the change in structure of OpenAI.
Didn't the Encode brief do things like quote public opinion polls? Sure, they characterize it as offering a philosophical perspective (why it would be called "technical" I'm not sure), but to me it came across as asserting policy rather than legal arguments. There is a consideration of the public interest for the preliminary injunction, but the overall feel to me was very much policy rather than law. I also don't think you're necessarily applying this standard evenly to both briefs. The ex-OpenAI brief can be seen in a similar way; it just brings in additional pieces of evidence to make that case.
If you read through the two proposed briefs, they are night and day. Encode describes the interest that the public might have in OpenAI continuing under its present structure, compared to transitioning to a for-profit enterprise, the risk of AI, and why it should be avoided. The employee brief recounts meetings, memos, and who was making promises.
In my view the high-level arguments of both briefs are the same, in that they argue that having actual control of future AI systems reside with a non-profit is in the public interest. It's just that the ex-OpenAI brief brings in more information to suggest that such a belief was not uncommon among OpenAI employees, and that we might reasonably view OpenAI as having committed to such a thing and understood this to be consistent with its charitable purpose. I could see how that might not be relevant to the case, since it doesn't necessarily go to Musk's reliance, so perhaps it makes sense not to muddy the waters with it. But I don't think it's the case that the ex-OpenAI brief somehow lacked any relevance if we assume the Encode brief was relevant.
In a very abstract way, Encode is basically saying that the transition shouldn't proceed because it would be bad for society and humanity. This is a perspective that isn't captured by either Musk or OpenAi/Microsoft.
This is relevant because it's a factor for preliminary injunction purposes. I haven't gone back and read all the documents, but it would be very surprising to me if Musk didn't argue that the for-profit transition was contrary to the public interest. It also seems to me that the ex-OpenAI brief casts its arguments in these same terms.
The employees tried to say that OpenAI and Altman made promises to them, and those promises should be kept, which is almost entirely factual.
I think the brief is trying to argue that these facts go to the very point you identify the Encode brief as addressing.
The conceptual reasoning is simple and compelling: a sufficiently sophisticated deceptive AI can say whatever we want to hear, perfectly mimicking aligned behavior externally. But faking its internal cognitive processes – its "thoughts" – seems much harder. Therefore, goes the argument, we must rely on interpretability to truly know if an AI is aligned.
I think an important consequence of this argument is that it potentially suggests it is actually just straight-up impossible for black-box evaluations to address certain possible future scenarios (although of course that doesn't mean everyone agrees with this conclusion). If a safe model and an extremely dangerous model can both produce the exact same black-box results, then those black-box results can't distinguish between the two models[1]. It's not just that there are challenges that would require a lot of work to solve. For the interpretability comparison, it's possible that the same is true (although I don't think I really have an opinion on that), or that the challenges for interpretability are so large that we might characterize it as impossible for practical purposes. I think this is especially true for very ambitious versions of interpretability, which goes to your "high reliability" point.
However, I think we could use a different point of comparison, which we might call white-box evaluations. I think your point about interpretability enhancing black-box evaluations gets at something like this: if you are using insights from "inside" the model to help your evaluations, then in a sense the evaluation is no longer black-box; we are allowing the use of strictly more information. The relevance of the perspective implied by the quote above is that such approaches are not simply helpful but required in certain cases. It's possible these cases won't actually occur, but I think it's important to understand what is or is not required in various scenarios. If a certain approach implicitly assumes certain risks won't occur, I think it's a good idea to try to convert those implicit assumptions into explicit ones. Many proposals offered by major labs suffer, I think, from a lack of explicitness about which possible threats they wouldn't address, as opposed to arguments for why they would address certain other possibilities.
For example, one implication of the "white-box required" perspective seems to be that you can't just advance various lines of research independently and incorporate them as they arrive; you might need certain areas to move together, or at certain relative rates, so that insights arrive in the order needed for them to contribute to safety.
As I see it we must either not create superintelligence, rely on pre-superintelligent automated researchers to find better methods, or deploy without fully reliable safeguards and roll the dice, and do as much as we can now to improve our odds.
Greater explicitness about limitations of various approaches would help with the analysis of these issues and with building some amount of consensus about what the ideal approach is.
Strictly speaking, if two models produce the exact same results on every possible input and we would categorize those results as "safe", then there is no problem; we've effectively shown the model is safe. But in practice there will be a distributional generalization issue, where the results we have for evaluation don't cover all possible inputs.
I had a similar thought, and that would make sense to me, but I just don't know enough about the standards to say what the correct interpretation is. To an extent I feel like it's kind of tea-leaf reading and maybe isn't a good idea, but at the same time these dynamics could be relevant to how views on AI safety develop among groups that are exposed to those ideas in these formats. I definitely think this won't be the last court case by far that implicates AI issues, so I feel it's worth thinking about how different courses of action could play out.
Glad the comment was helpful. I will register my prediction that BB most likely meant the "relative to priors" meaning rather than the one that you use in the OP. I also think among people who aren't steeped in the background of AI risk, this would be the significantly more common interpretation upon reading what BB wrote.