Doesn't the G/D gap suggest we should have more criticism than generation?
A critique in a high-ambiguity context is almost always (in all but the most technical domains where individual claims can be cheaply verified) a request to engage in a prolonged exchange
I don't think so. I think critics usually see themselves as pointing out errors in the original and would be quite happy with an "oops, good pickup". Extended exchanges happen because authors usually don't agree. Often at least one of the parties is confused/motivated/doing a poor job for some reason, which means long exchanges are often frustrating and people are wary about getting into them. But the critique itself isn't a request for an exchange.
I'm also a bit confused by this discussion. Is your position actually that you can only judge a critic by their papers or LessWrong posts? This seems odd: can't you judge them by their other critiques? A critique seems much easier to evaluate than a paper.
This piece combines relatively uncontroversial points that come with some justification ("we're not near the compute or data efficiency limit") with controversial claims justified only by Steven's intuition ("the frontier will be reached suddenly by a small group few people are tracking"). I'd be more interested in a piece that examined the consequences of only the former kind of claim, or that justified the latter kind more strongly.
models will have access to some kind of "neuralese" that allows them to reason in ways we can't observe
Only modest confidence, but while there's an observability gap between neuralese and CoT monitoring, I suspect it's smaller than the gap between reasoning traces that haven't been trained against oversight and reasoning traces that have.
I mean, even if you're mostly pursuing a particular set of final values (which is not what you're advocating here), there are probably strong reasons to make coordination a high priority (which is close to what you're advocating here).
Well, I did say "to the extent permitted by 1" - there's probably conflict here - but I wasn't suggesting CEV as something that makes coordination easy. I'm saying it's a good principle for judging final outcomes between two different paths that have similar levels of coordination. Ofc we'd have to estimate the "happiness in hindsight", but this looks tractable to me.
I've thought about it a bit and I have a line of attack for a proof, but there's too much work involved in following it through to an actual proof, so I'm going to leave it here in case it helps anyone.
I'm assuming everything is discrete so I can work with regular Shannon entropy.
Consider the range of the function and defined similarly. Discretize and (chop them up into little balls). Not sure which metric to use, maybe TV.
Define to be the index of the ball into which falls, similar. So if is sufficiently small, then .
By the data processing inequality, conditions 2 and 3 still hold for . Condition 1 should hold with some extra slack depending on the coarseness of the discretization.
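To make the data processing step concrete, here is a minimal sketch on a toy discrete joint distribution. It only illustrates the general fact being invoked (pushing a variable through a deterministic map, such as "index of the ball it falls in", cannot increase its mutual information with another variable). The joint table and the coarsening map `x // 2` are made up for illustration and are not the variables from the argument above.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """Mutual information in bits of a joint distribution given as {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def coarsen(joint, f):
    """Push the first coordinate through a deterministic map f (the 'ball index')."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f(x), y)] += p
    return dict(out)

# Hypothetical toy joint: X takes four values, Y is a noisy binary function of X.
joint = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.15, (1, 1): 0.10,
    (2, 0): 0.10, (2, 1): 0.15,
    (3, 0): 0.05, (3, 1): 0.20,
}

fine = mutual_information(joint)
# Coarsen X into two "balls": {0, 1} and {2, 3}.
coarse = mutual_information(coarsen(joint, lambda x: x // 2))

print(f"I(X;Y) = {fine:.4f} bits, I(f(X);Y) = {coarse:.4f} bits")
```

The coarsened value is never larger than the fine one, which is the direction needed for conditions stated as upper bounds on an information quantity. A condition that needs a lower bound is where the extra slack from the discretization would come in.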
It takes a few steps, but I think you might be able to argue that, with high probability, for each , the random variable will be highly concentrated (n.b. I've only worked it through fully in the exact case, and I think it can be translated to the approximate case but I haven't checked). We then invoke the discretization to argue that is bounded. The intuition is that the discretization forces nearby probabilities to coincide, so if is concentrated then it actually has to "collapse" most of its mass onto a few discrete values.
We can then make a similar argument switching the indices to get bounded. Finally, maybe applying conditions 2 and 3 we can get bounded as well, which then gives a bound on .
I did try feeding this to Gemini but it wasn't able to produce a proof.
Wait, I thought the first property was just independence, not also identically distributed.
In principle I could have e.g. two biased coins with their biases different but deterministically dependent.
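A minimal sketch of the two-coin example, assuming a two-valued latent and the arbitrary choice that the second coin's bias is the square of the first's (both are my illustrative assumptions). The coins are independent given the latent but have different biases, so they are conditionally independent without being conditionally identically distributed.

```python
from itertools import product

# Hypothetical latent: the first coin's bias p is 0.2 or 0.8 with equal probability.
latent = {0.2: 0.5, 0.8: 0.5}

def bias2(p):
    # The second coin's bias is a deterministic function of the first's
    # (squaring is an arbitrary illustrative choice).
    return p ** 2

def joint_given(p):
    """Joint of (coin1, coin2) given the latent p: independent, different biases."""
    q = bias2(p)
    return {(a, b): (p if a else 1 - p) * (q if b else 1 - q)
            for a, b in product((0, 1), repeat=2)}

# Marginal joint distribution, averaging over the latent.
marginal = {ab: sum(w * joint_given(p)[ab] for p, w in latent.items())
            for ab in product((0, 1), repeat=2)}

p1 = marginal[(1, 0)] + marginal[(1, 1)]  # P(coin1 = 1)
p2 = marginal[(0, 1)] + marginal[(1, 1)]  # P(coin2 = 1)
covariance = marginal[(1, 1)] - p1 * p2   # nonzero: dependent once the latent is marginalized out

print(p1, p2, covariance)
```

Marginally the coins are correlated and have different heads probabilities, while conditioning on the latent makes them independent. So the independence property holds without the "identically distributed" part.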
I think:
I sort of see the former as potentially encouraging diversity (because different groups want different things, and are most likely to agree to "everyone gets what they want"), but the latter may in fact suggest convergence (because, perhaps, there are fairly universal answers to "what makes people happy with the benefit of hindsight?").
You stress the importance of having robust feedback procedures, but having overall goals like this can help to judge which procedures are actually doing what we want.
Your natural latents seem to be quite related to the common construction IID variables conditional on a latent - in fact, all of your examples are IID variables (or "bundles" of IID variables) conditional on that latent. Can you give me an interesting example of a natural latent that is not basically the conditionally IID case?
(I was wondering if the extensive literature on the correspondence between De Finetti type symmetries and conditional IID representations is of any help to your problem. I'm not entirely sure it is, given that it mostly addresses the issue of getting from a symmetry to a conditional independence, whereas you want to get from one conditional independence to another, but it's plausible some of the methods are applicable.)
It's a research prototype, so it's probably not style-tuned.