Research Scientist at DeepMind. Creator of the Alignment Newsletter. http://rohinshah.com/
Biggest disagreement I have with him is that, from my perspective, he understands that the optimization deck is stacked against us, but he does not understand the degree to which it is stacked against us, or the extent to which this pressure ramps up, diversifies and changes its sources as capabilities ramp up, and thus he thinks many things could work that I don’t see as having much chance of working. I also don’t think he’s thinking enough about what type and degree of alignment would ‘be enough’ to survive the resulting Phase 2.
I'm not committing to it, but if you wrote up concrete details here, I expect I'd engage with it.
Sounds plausible! I haven't played much with Conway's Life.
(Btw, you may want to make this comment on the original post if you'd like the original author to see it.)
I agree "information loss" seems kinda sketchy as a description of this phenomenon, it's not what I would have chosen.
I forget if I already mentioned this to you, but another example where you can interpret randomization as worst-case reasoning is MaxEnt RL, see this paper. (I reviewed an earlier version of this paper here (review #3).)
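To make the "randomization as worst-case reasoning" connection concrete, here is a toy one-step example (my own construction, not the paper's formal setup): an entropy-regularized softmax policy spreads probability across actions, and that hedging buys worst-case robustness against an adversary who can degrade the reward of one action. The rewards, temperature, and perturbation budget below are all arbitrary illustrative choices.

```python
import math

def softmax_policy(rewards, temperature):
    """Boltzmann policy: pi(a) proportional to exp(r(a) / temperature).
    This is the form of the optimal policy in entropy-regularized (MaxEnt) RL
    for a one-step problem."""
    weights = [math.exp(r / temperature) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def worst_case_return(policy, rewards, eps):
    """Adversary subtracts eps from the reward of one action of its choice;
    it hurts us most by hitting the action we play most often."""
    expected = sum(p * r for p, r in zip(policy, rewards))
    return expected - eps * max(policy)

rewards = [1.0, 0.9, 0.1]
eps = 0.8

greedy = [1.0, 0.0, 0.0]                            # deterministic argmax policy
maxent = softmax_policy(rewards, temperature=0.5)   # stochastic MaxEnt policy

# The greedy policy is fully exploitable (all its mass sits on one action);
# the MaxEnt policy gives up a little expected reward but loses much less
# in the worst case.
print(worst_case_return(greedy, rewards, eps))
print(worst_case_return(maxent, rewards, eps))
```

This is only the flavor of the result; the paper makes a version of this precise for sequential settings and perturbations of rewards/dynamics.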
Possibly, but in at least one of the two cases I was thinking of when writing this comment (and maybe in both), I made the argument in the parent comment and the person agreed and retracted their point. (I think in both cases I was talking about deceptive alignment via goal misgeneralization.)
Okay, I understand how that addresses my edit.
I'm still not quite sure why the Lightcone theorem is a "foundation" for natural abstraction (it looks to me like a nice concrete example on which you could apply techniques) but I think I should just wait for future posts, since I don't really have any concrete questions at the moment.
Okay, that mostly makes sense.
note that the resampler itself throws away a ton of information about X_0 while going from X_0 to X_T. And that is indeed information which "could have" been relevant, but almost always gets wiped out by noise. That's the information we're looking to throw away, for abstraction purposes.
I agree this is true, but why does the Lightcone theorem matter for it?
It is also a theorem that a Gibbs resampler initialized at equilibrium will produce X_T distributed according to P[X], and as you say it's clear that the resampler throws away a ton of information about X_0 in computing it. Why not use that theorem as the basis for identifying the information to throw away? In other words, why not throw away information from X_0 while maintaining P[X]?
EDIT: Actually, conditioned on X_T, it is not the case that X_0 is distributed according to P[X].
(Simple counterexample: Take a graphical model where node A can be 0 or 1 with equal probability, and A causes B through a chain of > 2T deterministic steps, such that we always have B = A for a true sample from X. In such a setting, for a true sample from X, B should be equally likely to be 0 or 1, but conditioned on X_T it always equals its value in X_T -- since every edge is deterministic, each Gibbs update has only one consistent value to resample to, so X_T = X_0 -- i.e. it is deterministic.)
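This counterexample is easy to check by simulation. A minimal sketch (my code; the chain length and T are arbitrary choices, and I write X_T for the resampler's output after T steps): because every edge is a deterministic copy, each Gibbs update resamples a variable to the only value consistent with its neighbors, so the resampler never changes anything, and X_T pins down B exactly even though B is marginally uniform.

```python
import random

def gibbs_resample(x, steps):
    """Random-scan Gibbs sampling on the chain A = x[0] -> x[1] -> ... -> B = x[-1],
    where each edge deterministically copies its parent. The conditional of x[i]
    given its neighbors puts all mass on the single consistent value, so each
    update is a no-op on a consistent configuration."""
    x = list(x)
    n = len(x)
    for _ in range(steps):
        i = random.randrange(n)
        if i == 0:
            x[0] = x[1]       # P[x0 | x1] is deterministic: x0 = x1
        else:
            x[i] = x[i - 1]   # both neighbors agree, so the conditional is this value
    return x

random.seed(0)
T = 3
chain_length = 2 * T + 2      # A and B are > 2T apart

b_values = set()
for _ in range(1000):
    a = random.choice([0, 1])       # A is uniform on {0, 1}
    x0 = [a] * chain_length         # true sample: B = A deterministically
    xT = gibbs_resample(x0, T)
    assert xT == x0                 # the resampler never changes the sample
    assert xT[-1] == a              # so conditioned on X_T, B is deterministic
    b_values.add(xT[-1])
```

Across runs `b_values` contains both 0 and 1 (B is marginally uniform), while within any run B is fixed by X_T.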
Of course, this is a problem for both my proposal and for the Lightcone theorem -- in either case you can't view X_T as a latent that generates X_0 (which seems to be the main motivation, though I'm still not quite sure why that's the motivation).
The Lightcone Theorem says: conditional on X_T, any sets of variables in X_0 which are a distance of at least 2T apart in the graphical model are independent.
I am confused. This sounds to me like:
If you have sets of variables that start with no mutual information (conditioning on X_T), and they are so far away that nothing other than X_T could have affected both of them (distance of at least 2T), then they continue to have no mutual information (independent).
Some things that I am confused about as a result:
(cross-posted from EAF, thanks Richard for suggesting. There's more back-and-forth later.)
I'm not very compelled by this response.
It seems to me you have two points on the content of this critique. The first point:
I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?
Presumably you would want to say "the team will be good at hits-based research such that we can expect a future hit, for X, Y and Z reasons". I think you should actually say those X, Y and Z reasons so that the authors of the critique can engage with them; I assume that the authors are implicitly endorsing a claim like "there aren't any particularly strong reasons to expect Conjecture to do more impactful work in the future".
The second point:
Hmm, it seems extremely reasonable to me to take as a baseline prior that the VCs are profit-motivated, and the authors explicitly say
The fact that people who work(ed) at Conjecture say otherwise means that (probably) someone is wrong, but I don't see a strong reason to believe that it's the OP who is wrong.
At the meta level you say:
And in your next comment:
But afaict, the only point where you actually disagree with a claim made in the OP (excluding recommendations) is in your assessment of VCs? (And in that case I feel very uncompelled by your argument.)
In what way has the OP failed to say true things? Where should they have had more uncertainty? What things did they present as facts which were actually feelings? What claim have they been confident about that they shouldn't have been confident about?
(Perhaps you mean to say that the recommendations are overconfident. There I think I just disagree with you about the bar for evidence for making recommendations, including ones as strong as "alignment researchers shouldn't work at organization X". I've given recommendations like this to individual people who asked me for a recommendation in the past, on less evidence than collected in this post.)