The matter seems terribly complex and interesting to me.
Suppose p1 is a prior which has uncertainty about ϕ(x1) and uncertainty about ϕ(x2). This is the more ignorant prior. Consider some prior p2 which has the same beliefs about the universal statement -- ∀x.ϕ(x) -- but which knows ϕ(x1) and ϕ(x2).
We observe that p1 can increase its credence in the universal statement by observing the first two instances, ϕ(x1) and ϕ(x2), while p2 cannot do this -- p2 needs to wait for further evidence. This is interpreted as a defect.
The moral is apparently that a less ignorant prior can be worse than a more ignorant one; more specifically, it can learn more slowly.
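The setup can be sketched numerically. The following is a toy three-instance model with made-up numbers (my own construction, not from Paul's post): both priors agree on the universal statement, but only the more ignorant one learns from the first two instances.

```python
from itertools import product

# Toy model: three instances; the universal statement H is
# "all three are true". Worlds are triples of truth values.
worlds = list(product([0, 1], repeat=3))

# p1, the more ignorant prior: with probability 0.2 the all-true
# world holds; otherwise the instances are independent fair coins.
# So P1(H) = 0.2 + 0.8 * (1/8) = 0.3.
def p1(w):
    return 0.8 * 0.5 ** 3 + (0.2 if w == (1, 1, 1) else 0.0)

# p2, the less ignorant prior: already certain of the first two
# instances, constructed so that P2(H) = P1(H) = 0.3.
def p2(w):
    if w[0] == 1 and w[1] == 1:
        return 0.3 if w[2] == 1 else 0.7
    return 0.0

def prob(p, event):
    return sum(p(w) for w in worlds if event(w))

def posterior_H(p):
    # credence in H after observing the first two instances
    return prob(p, lambda w: w == (1, 1, 1)) / prob(p, lambda w: w[0] == w[1] == 1)

# Both priors agree on H up front, but only p1 learns:
print(posterior_H(p1))  # 0.75
print(posterior_H(p2))  # 0.3
```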
However, I think we need to be careful about the informal notion of "more ignorant" at play here. We can formalize this by imagining a numerical measure of the accuracy of a prior. We might want it to be the case that more accurate priors are always better to start with. Put more precisely: a more accurate prior should also imply a more accurate posterior after updating. Paul's example challenges this notion, but he does not prove that no plausible notion of accuracy will have this property; he only relies on an informal notion of ignorance.
So I think the question is open: when can a notion of accuracy fail to follow the rule "more accurate priors yield more accurate posteriors"? EG, can a proper scoring rule fail to meet this criterion? This question might be pretty easy to investigate.
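As a first data point: here is a toy calculation (my own construction, illustrative numbers throughout) in which a Brier-style score -- summed squared error over the three instances and the universal statement H -- rates the less ignorant prior as more accurate, yet rates the more ignorant prior's posterior as more accurate after both update on the first two instances. Note that summing a proper scoring rule over logically dependent propositions is itself a modeling choice.

```python
# p1: mixture -- with prob 0.2 all instances are true (H holds),
# otherwise the three instances are independent with prob 0.3 each.
p_mix, p_coin = 0.2, 0.3
p1_inst = p_mix + (1 - p_mix) * p_coin        # P1(x_i) = 0.44
p1_H = p_mix + (1 - p_mix) * p_coin ** 3      # P1(H) ~ 0.2216

p1_beliefs = {"x1": p1_inst, "x2": p1_inst, "x3": p1_inst, "H": p1_H}
# p2: knows x1 and x2, same credence in H, and x3 true iff H.
p2_beliefs = {"x1": 1.0, "x2": 1.0, "x3": p1_H, "H": p1_H}

def brier(beliefs):
    # squared error against the all-true world (assumed actual)
    return sum((1 - p) ** 2 for p in beliefs.values())

# Update both on the first two instances.
p1_joint = p_mix + (1 - p_mix) * p_coin ** 2  # P1(x1 & x2)
p1_H_post = p1_H / p1_joint                   # ~ 0.815
p1_post = {"x1": 1.0, "x2": 1.0, "x3": p1_H_post, "H": p1_H_post}
p2_post = p2_beliefs  # p2 already knew x1, x2: no change

print(brier(p1_beliefs), brier(p2_beliefs))  # ~1.55 vs ~1.21: p2 more accurate
print(brier(p1_post), brier(p2_post))        # ~0.07 vs ~1.21: order reverses
```

So at least for this ad hoc accuracy measure, "more accurate prior" does not imply "more accurate posterior"; whether that survives for principled measures is the open question.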
I think the example rests on an intuitive notion that we can construct p2 by imagining p1 but modifying it to know ϕ(x1) and ϕ(x2). However, the most obvious way to so modify it is by updating on those sentences. This fails to meet the conditions of the example, however; p2 would already have an increased probability for the universal statement.
So, in order to move the probability of ϕ(x1) and ϕ(x2) upwards to 1 without also increasing the probability of the universal, we must do some damage to the probabilistic relationship between the instances and the universal. The prior p2 doesn't just know ϕ(x1) and ϕ(x2); it also believes the conditional probability of the universal statement given those two sentences to be lower than p1 believes it to be.
It doesn't think it should learn from them!
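A quick arithmetic check of this point, reusing the toy numbers from before (my own illustrative model: p1 puts probability 0.2 on "all instances true" and otherwise treats three instances as independent fair coins):

```python
# p1's credences in the toy model (illustrative numbers only)
P1_H = 0.2 + 0.8 * 0.5 ** 3       # P1(universal) = 0.3
P1_x1x2 = 0.2 + 0.8 * 0.5 ** 2    # P1(first two instances) = 0.4
P1_H_given = P1_H / P1_x1x2       # P1(H | x1, x2) = 0.75

# Conditioning p1 on the two instances raises the universal:
assert P1_H_given > P1_H

# So any p2 certain of the first two instances with P2(H) = P1(H) = 0.3
# must have P2(H | x1, x2) = P2(H) = 0.3 < 0.75 -- its conditional
# probabilities disagree with p1's, as claimed above.
print(P1_H, P1_H_given)  # 0.3 0.75
```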
This supports Alexander's argument that there is no paradox, I think. However, I am not ultimately convinced. Perhaps I will find more time to write about the matter later.
Noting that images currently look broken to me in this post.
I don't really interact with Twitter these days, but maybe you could translate my complaints there and let me know if you get any solid gold?
I don't have a good system prompt that I like, although I am trying to work on one. It seems to me like the sort of thing that should be built into a tool like this (perhaps with options, as different system prompts will be useful for different use-cases, like learning vs trying to push the boundaries of knowledge).
I would be pretty excited to try this out with Claude 3 behind it. Very much the sort of thing I was trying to advocate for in the essay!
But not intentionally. It was an unintentional consequence of training.
I am not much of a prompt engineer, I think. My "prompts" generally consist of many pages of conversation where I babble about some topic I am interested in, occasionally hitting enter to get Claude's responses, and then skim/ignore Claude's responses because they are bad, and then keep babbling. Sometimes I make an explicit request to Claude such as "Please try and organize these ideas into a coherent outline" or "Please try and turn this into math" but the responses are still mostly boring and bad.
I am trying ;p
But yes, it would be good for me to try and make a more concrete "Claude cannot do X" to get feedback on.
I've tried writing the beginning of a paper that I want to read the rest of, but the LLM did not complete it well enough to be interesting.
I agree with this worry. I am overall advocating for capabilitarian systems with a specific emphasis in helping accelerate safety research.
Sounds pretty cool! What LLM powers it?
(continued..)
Explanations?
Alexander analyzes the difference between p1 and p2 in terms of the famous "explaining away" effect. Alexander supposes that p2 has learned some "causes":
Postulating these causes adds something to the scenario. One possible view is that Alexander is correct so far as Alexander's argument goes, but incorrect if there are no such Cj to consider.
However, I do not find myself endorsing Alexander's argument even that far.
If C1 and C2 have a common form, or are correlated in some way -- so there is an explanation which tells us why the first two sentences, ϕ(x1) and ϕ(x2), are true, and which does not apply to n>2 -- then I agree with Alexander's argument.
If C1 and C2 are uncorrelated, then it starts to look like a coincidence. If I find a similarly uncorrelated C3 for ϕ(x3), C4 for ϕ(x4), and a few more, then it will feel positively unexplained. Although each explanation is individually satisfying, nowhere do I have an explanation of why all of them are turning up true.
I think the probability of the universal sentence should go up at this point.
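For concreteness, the explaining-away effect itself can be sketched with a toy joint distribution (my own illustrative numbers): observing an instance raises the hypothesis, but additionally learning a specific cause for that instance drops the hypothesis back down.

```python
from itertools import product

# Variables: H (the universal hypothesis), C (a specific cause C1),
# x (the instance phi(x1)). Illustrative numbers only.
P_H, P_C = 0.3, 0.4

def joint(h, c, x):
    p = (P_H if h else 1 - P_H) * (P_C if c else 1 - P_C)
    px = 0.9 if (h or c) else 0.1  # instance likely true given either
    return p * (px if x else 1 - px)

def p_H_given(evidence):
    num = sum(joint(1, c, x) for c, x in product([0, 1], repeat=2)
              if evidence(c, x))
    den = sum(joint(h, c, x) for h, c, x in product([0, 1], repeat=3)
              if evidence(c, x))
    return num / den

# The instance alone raises H; also learning the cause C "explains
# it away", dropping H back to its prior of 0.3.
print(p_H_given(lambda c, x: x == 1))             # ~0.479
print(p_H_given(lambda c, x: x == 1 and c == 1))  # 0.3
```

My worry above is about what happens when this move is repeated: each instance gets its own uncorrelated cause, yet the conjunction of instances remains unexplained.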
So, what about my "conditional probabilities also change" variant of Alexander's argument? We might intuitively think that ϕ(x1) and ϕ(x2) should be evidence for the universal generalization, but p2 does not believe this -- its conditional probabilities indicate otherwise.
I find this ultimately unconvincing because the point of Paul's example, in my view, is that more accurate priors do not imply more accurate posteriors. I still want to understand what conditions can lead to this (including whether it is true for all notions of "accuracy" satisfying some reasonable assumptions, EG proper scoring rules).
Another reason I find it unconvincing is that even if we accepted this answer to the paradox of ignorance, I think it is not at all convincing for the problem of old evidence.
What is the 'problem' in the problem of old evidence?
... to be further expanded later ...