I think this is related to the word problem for the rewriting system defined by your programming language. When I first read this question I was thinking "Something to do with Church-Rosser?" -- but you can follow the links to see for yourself if that literature is what you're after.
Didn't watch the video, but I would have read it as a post. I might still watch it, but only because previous posts have been appetising enough.
One misgiving I have about the illustrated format is that it's less accessible than text. I hope the authors of work in this format keep the needs of a wide variety of readers in mind.
the objective of agent-designers is to have the agent collect as many agents as possible
Typo: should say "dollars"?
if the daemon is obfuscated, there is no efficient procedure which takes the daemon circuit as input and produces a smaller circuit that still solves the problem.
So we can't find any efficient constructive argument. That rules out most of the obvious strategies.
I don't think the procedure needs to be efficient to solve the problem, since we only care about the existence of a smaller circuit (not about an efficient way to produce it).
I don't think this question has much intrinsic importance, because almost all realistic learning procedures involve a strong simplicity prior (e.g. weight sharing in neural networks).
Does this mean you do not expect daemons to occur in practice because they are too complicated?
Thanks for a great post! I have a small confusion/nit regarding natural selection. Despite its name, I don't think it's a good exemplar of a selection process. Going through the features of a selection process from the start of the post:
I'd love to know why natural selection seemed like an obvious example of a selection process. It didn't seem so to me, given its poor score on the checklist above.
I like this post because it pushes us to be more precise about what we mean by corrigibility. Nice example.
Nice post! Do you have a link to an explanation of what counterfactual mugging is and why it's a good thing?
For subagent alignment problems, is there an interesting distinction to be drawn between the limited agent being able to understand the process by which the more powerful agent becomes powerful, versus not even understanding that? (What would it mean to "understand the process"? I suppose it means being able to validate certain relevant facts about the process though not enough to know exactly what results from it.)
More specifically, it seems that your c must include information about how to interpret the X bits. Right? So it seems slightly wrong to say "R is the largest number that can be specified in X bits of information" as long as c stays fixed. c might grow as the specification scheme changes.
Alternatively, you might just be wrong in thinking that 30 bits are enough to specify 3^^^^3. If c indicates that the number of additional universes is specified by a standard binary-encoded number, 30 bits only gets you about a billion.
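To make the arithmetic concrete, here is a quick sketch. It assumes c says the X bits are read as a plain unsigned binary integer; the function names are my own and purely illustrative.

```python
def distinct_values(bits: int) -> int:
    """Number of distinct values an unsigned binary field of `bits` bits can take."""
    return 2 ** bits


def max_binary_value(bits: int) -> int:
    """Largest unsigned integer representable in `bits` bits (assumed plain binary encoding)."""
    return 2 ** bits - 1


if __name__ == "__main__":
    print(distinct_values(30))   # 1073741824, i.e. about a billion
    print(max_binary_value(30))  # 1073741823
```

Under that encoding, 30 bits top out just above 10^9, nowhere near 3^^^^3. Getting there in 30 bits would need a much richer interpretation scheme, and that scheme would have to live in c.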