



Alexander analyzes the difference between the weaker agent and the more powerful agent in terms of the famous "explaining away" effect. Alexander supposes that the more powerful agent has learned some "causes" Cⱼ:

The reason is that the probability mass provided by examples x₁, ..., xₙ such that ϕ(xᵢ) holds is now distributed among the universal statement ∀x:ϕ(x) and additional causes Cⱼ known to the more powerful agent that also imply ϕ(xᵢ). Consequently, ∀x:ϕ(x) becomes less "necessary" and has less relative explanatory power for the more informed agent.

An implication of this perspective is that if the weaker agent learns about the additional causes Cⱼ, it should also lower its credence in ∀x:ϕ(x).

Postulating these causes adds something to the scenario. One possible view is that Alexander is correct so far as Alexander's argument goes, but incorrect if there are no such Cⱼ to consider.

However, I do not find myself endorsing Alexander's argument even that far.

If C₁ and C₂ have a common form, or are correlated in some way -- so there is an explanation which tells us why the first two sentences, ϕ(x₁) and ϕ(x₂), are true, and which does not apply to the later instances -- then I agree with Alexander's argument.

If C₁ and C₂ are uncorrelated, then it starts to look like a coincidence. If I find a similarly uncorrelated C₃ for ϕ(x₃), C₄ for ϕ(x₄), and a few more, then it will feel positively unexplained. Although each explanation is individually satisfying, nowhere do I have an explanation of why all of them are turning up true.

I think the probability of the universal sentence should go up at this point.

So, what about my "conditional probabilities also change" variant of Alexander's argument? We might intuitively think that ϕ(x₁) and ϕ(x₂) should be evidence for the universal generalization, but the more informed prior does not believe this -- its conditional probabilities indicate otherwise.

I find this ultimately unconvincing because the point of Paul's example, in my view, is that more accurate priors do not imply more accurate posteriors. I still want to understand what conditions can lead to this (including whether it is true for all notions of "accuracy" satisfying some reasonable assumptions EG proper scoring rules).

Another reason I find it unconvincing is that even if we accepted this answer for the paradox of ignorance, I think it is not at all convincing for the problem of old evidence.

What is the 'problem' in the problem of old evidence?

... to be further expanded later ...

The matter seems terribly complex and interesting to me.

Notions of Accuracy?

Suppose we have a prior which has uncertainty about ϕ(x₁) and uncertainty about ϕ(x₂). This is the more ignorant prior. Consider a second prior which has the same belief about the universal statement, assigning the same probability to ∀x:ϕ(x), but which knows ϕ(x₁) and ϕ(x₂).

We observe that the more ignorant prior can increase its credence in the universal statement by observing the first two instances, ϕ(x₁) and ϕ(x₂), while the more informed prior cannot do this -- it needs to wait for further evidence. This is interpreted as a defect.

The moral is apparently that a less ignorant prior can be worse than a more ignorant one; more specifically, it can learn more slowly.

However, I think we need to be careful about the informal notion of "more ignorant" at play here. We can formalize this by imagining a numerical measure of the accuracy of a prior. We might want it to be the case that more accurate priors are always better to start with. Put more precisely: a more accurate prior should also imply a more accurate posterior after updating. Paul's example challenges this notion, but he does not prove that no plausible notion of accuracy will have this property; he only relies on an informal notion of ignorance.

So I think the question is open: when can a notion of accuracy fail to follow the rule "more accurate priors yield more accurate posteriors"? EG, can a proper scoring rule fail to meet this criterion? This question might be pretty easy to investigate.
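As a starting point for that investigation, here is a minimal sketch that scores credences with the Brier score (a proper scoring rule) in one fixed "true world". All numbers are illustrative assumptions taken from a hypothetical three-instance toy model, and the names (`brier`, `ignorant_before`, and so on) are made up for this sketch. It only compares accuracy pointwise in a given world, not in expectation, so it does not settle the question.

```python
from fractions import Fraction as F

def brier(credences, truth):
    """Sum of squared errors over the scored propositions; lower is more accurate."""
    return sum((credences[k] - int(truth[k])) ** 2 for k in credences)

# Hypothetical credences (assumptions, not from the original example).
# "before" = the prior; "after" = after observing phi1 and phi2.
ignorant_before = {"phi1": F(11, 20), "phi2": F(11, 20), "universal": F(17, 80)}
informed_before = {"phi1": F(1), "phi2": F(1), "universal": F(17, 80)}
ignorant_after = {"phi1": F(1), "phi2": F(1), "universal": F(17, 26)}
informed_after = dict(informed_before)  # already knew phi1, phi2: nothing changes

# Two candidate true worlds, both consistent with the observations.
universal_true = {"phi1": True, "phi2": True, "universal": True}
universal_false = {"phi1": True, "phi2": True, "universal": False}

for name, world in [("universal true", universal_true),
                    ("universal false", universal_false)]:
    print(name)
    print("  before:", float(brier(ignorant_before, world)),
          float(brier(informed_before, world)))
    print("  after: ", float(brier(ignorant_after, world)),
          float(brier(informed_after, world)))
```

With these numbers, the informed prior is more accurate before the update in both worlds, but after the update it is less accurate in the world where the universal statement is true, and more accurate in the world where it is false. Whether anything similar survives once we take expectations, and under whose distribution, is exactly the open part.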

Conditional probabilities also change?

I think the example rests on an intuitive notion that we can construct the more informed prior by imagining the more ignorant one but modifying it to know ϕ(x₁) and ϕ(x₂). However, the most obvious way to modify it so is by updating on those sentences. This fails to meet the conditions of the example, however; the updated prior would already have an increased probability for the universal statement.

So, in order to move the probability of ϕ(x₁) and ϕ(x₂) upwards to 1 without also increasing the probability of the universal, we must do some damage to the probabilistic relationship between the instances and the universal. The more informed prior doesn't just know ϕ(x₁) and ϕ(x₂); it also believes the conditional probability of the universal statement given those two sentences to be lower than the more ignorant prior believes it to be.

It doesn't think it should learn from them!
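To make the point concrete, here is a small sketch with a toy model. Everything in it is an assumption for illustration: a three-instance domain standing in for "all x", a made-up weight `U_WEIGHT` on "ϕ holds everywhere", and a made-up chance `Q` for each instance otherwise. It compares the conditional probability of the universal given the first two instances under the ignorant prior and under a prior that already knows those instances while keeping the same probability for the universal.

```python
from fractions import Fraction
from itertools import product

N = 3                       # finite stand-in for the domain of x (assumption)
U_WEIGHT = Fraction(1, 10)  # hypothetical weight on "phi holds everywhere"
Q = Fraction(1, 2)          # hypothetical chance of each instance otherwise

def ignorant_prior():
    """Mixture over worlds: 'phi holds everywhere' plus independent coin flips."""
    dist = {}
    for world in product([True, False], repeat=N):
        p = 1 - U_WEIGHT
        for holds in world:
            p *= Q if holds else 1 - Q
        if all(world):
            p += U_WEIGHT
        dist[world] = p
    return dist

def prob(dist, event):
    return sum(p for world, p in dist.items() if event(world))

def conditional(dist, event, given):
    return prob(dist, lambda w: event(w) and given(w)) / prob(dist, given)

def universal(world):
    return all(world)

def first_two(world):
    return world[0] and world[1]

ignorant = ignorant_prior()
p_universal = prob(ignorant, universal)

# A prior that knows the first two instances but keeps the same
# probability for the universal statement, as the example requires.
informed = {(True, True, True): p_universal,
            (True, True, False): 1 - p_universal}

print(p_universal)                                 # 17/80
print(conditional(ignorant, universal, first_two)) # 17/26
print(conditional(informed, universal, first_two)) # 17/80
```

In this toy model, simply updating the ignorant prior on the two instances would move the universal from 17/80 to 17/26. Holding the universal at 17/80 while knowing the instances forces the conditional probability down to 17/80, which is the "damage" to the relationship between instances and universal.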

This supports Alexander's argument that there is no paradox, I think. However, I am not ultimately convinced. Perhaps I will find more time to write about the matter later.

Noting that images currently look broken to me, in this post.

I don't really interact with Twitter these days, but maybe you could translate my complaints there and let me know if you get any solid gold?


I don't have a good system prompt that I like, although I am trying to work on one. It seems to me like the sort of thing that should be built into a tool like this (perhaps with options, as different system prompts will be useful for different use-cases, like learning vs trying to push the boundaries of knowledge).

I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!


But not intentionally. It was an unintentional consequence of training.


I am not much of a prompt engineer, I think. My "prompts" generally consist of many pages of conversation where I babble about some topic I am interested in, occasionally hitting enter to get Claude's responses, and then skim/ignore Claude's responses because they are bad, and then keep babbling. Sometimes I make an explicit request to Claude such as "Please try and organize these ideas into a coherent outline" or "Please try and turn this into math" but the responses are still mostly boring and bad.

I am trying ;p

But yes, it would be good for me to try and make a more concrete "Claude cannot do X" to get feedback on.

I've tried writing the beginning of a paper that I want to read the rest of, but the LLM did not complete it well enough to be interesting.


I agree with this worry. I am overall advocating for capabilitarian systems with a specific emphasis on helping accelerate safety research.


Sounds pretty cool! What LLM powers it?
