xrchz

xrchz's Comments

Reframing Impact

One misgiving I have about the illustrated format is that it's less accessible than text. I hope the authors of work in this format keep the needs of a wide variety of readers in mind.

Creating Environments to Design and Test Embedded Agents
the objective of agent-designers is to have the agent collect as many agents as possible

Typo: should say "dollars"?

Open question: are minimal circuits daemon-free?
if the daemon is obfuscated, there is no efficient procedure which takes the daemon circuit as input and produces a smaller circuit that still solves the problem.
So we can't find any efficient constructive argument. That rules out most of the obvious strategies.

I don't think the procedure needs to be efficient to solve the problem, since we only care about the existence of a smaller circuit (not about an efficient way to produce it).

Open question: are minimal circuits daemon-free?
I don't think this question has much intrinsic importance, because almost all realistic learning procedures involve a strong simplicity prior (e.g. weight sharing in neural networks).

Does this mean you do not expect daemons to occur in practice because they are too complicated?

Selection vs Control

Thanks for a great post! I have a small confusion/nit regarding natural selection. Despite its name, I don't think it's a good exemplar of a selection process. Going through the features of a selection process from the start of the post:

  • can directly instantiate any element of the search space. No: natural selection can only make local modifications to previously instantiated points. But you already dealt with this local search issue in Choices Don't Change Later Choices.
  • gets direct feedback on the quality of each element. Yes.
  • quality of element does not depend on previous choices. No, the evaluation of an element in natural selection depends a great deal on previous choices because they usually make up important parts of its environment. I think this is the thrust of the claim that natural selection is online (which I agree with).
  • only the final output matters. No? From the perspective of natural selection, I think the quality of the current output is what matters.

I'd love to know why natural selection seemed like an obvious example of a selection process, since it did not seem so to me, given its poor score on the checklist above.

Boeing 737 MAX MCAS as an agent corrigibility failure

I like this post because it pushes us to be more precise about what we mean by corrigibility. Nice example.

Logical Updatelessness as a Robust Delegation Problem

Nice post! Do you have a link to an explanation of what counterfactual mugging is and why it's a good thing?

For subagent alignment problems, is there an interesting distinction to be drawn between the limited agent being able to understand the process by which the more powerful agent becomes powerful, versus not even understanding that? (What would it mean to "understand the process"? I suppose it means being able to validate certain relevant facts about the process though not enough to know exactly what results from it.)

New Pascal's Mugging idea for potential solution

More specifically, it seems that your c must include information about how to interpret the X bits. Right? So it seems slightly wrong to say "R is the largest number that can be specified in X bits of information" as long as c stays fixed. c might grow as the specification scheme changes.

Alternatively, you might just be wrong in thinking that 30 bits are enough to specify 3^^^^3. If c indicates that the number of additional universes is specified by a standard binary-encoded number, 30 bits only gets you about a billion.
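To make the arithmetic concrete, here is a rough sketch in Python, assuming c fixes a plain unsigned binary encoding. The helper names `max_binary_value` and `tetration` are my own, purely for illustration. It shows that 30 bits top out at about a billion, while even 3^^3 (a tiny first step on the way to 3^^^^3) already needs more bits than that.

```python
def max_binary_value(bits: int) -> int:
    """Largest unsigned integer representable with `bits` binary digits."""
    return 2**bits - 1


def tetration(base: int, height: int) -> int:
    """Power tower of `base` with `height` levels, i.e. base^^height.

    Only feasible for very small inputs; 3^^4 already has trillions of digits.
    """
    result = 1
    for _ in range(height):
        result = base**result
    return result


limit = max_binary_value(30)   # 1_073_741_823, about a billion
tower = tetration(3, 3)        # 3^(3^3) = 3^27 = 7_625_597_484_987

print(f"30 bits reach at most {limit:,}")
print(f"3^^3 = {tower:,}, which needs {tower.bit_length()} bits")
```

So under a standard binary reading, 30 bits cannot even cover 3^^3; getting to 3^^^^3 in 30 bits requires a far more expressive scheme, and the description of that scheme has to live somewhere (presumably in c).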

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

They're not yet close to being taken over by AI, but there has been research on automating all of the above. Some possibly relevant keywords: automated theorem proving and program synthesis.

