Nandi

Comments

Splitting Debate up into Two Subsystems

I agree that if you score an oracle based on how accurate it is, then it is incentivized to steer the world towards states where easy questions get asked.

I think that in these considerations it matters how powerful we assume the agent to be. You made me realize that my post could have been more interesting if I had specified the scope and application area of the proposed approach more clearly. In many cases, making the world more predictable may be much more difficult for the agent than getting the human to predict the world better. In the short term, I think deploying an agentic oracle could be safe.

Splitting Debate up into Two Subsystems

I think Bostrom might have mentioned this problem (educating someone on a topic) somewhere.

Cool! I'm not familiar with it.

Splitting Debate up into Two Subsystems

If the epistemic helper can explain enough to us that we can come up with solutions ourselves, then the info helper is just as useful on its own.

However, even after being educated about a domain or problem, we may sometimes not be creative enough to propose good solutions ourselves. In such cases we would need an agent to propose options to us. It would be good if the agent that is trained to come up with solutions we approve of were not the same agent that explains to us why we should or should not approve of a solution (if it were, it would have an incentive to convince us).