ben_levinstein

Posts

Still no Lie Detector for LLMs (47 karma, 10mo)

Comments

Meaning & Agency

ben_levinstein · 4mo · 10

Interesting post! As a technical matter, I think the notion you want is not reflection (or endorsement) but some version of Total Trust, where (leaving off some nuance) Agent 1 (Alice) totally trusts Agent 2 (Bob) if $\mathbb{E}_1[X \mid \mathbb{E}_2[X] \geq x] \geq x$ for all random variables $X$ and all $x$. In general, that's going to be equivalent to Alice being willing to outsource all decision-making to Bob if she's certain Bob has the same basic preferences she does. (It's also equivalent to expecting Bob to be better on all absolutely continuous strictly proper scoring rules, and a few other things.)
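
To make the condition concrete, here is a minimal sketch of a finite check, assuming Total Trust is read as $\mathbb{E}_1[X \mid \mathbb{E}_2[X] \geq t] \geq t$ for every random variable $X$ and threshold $t$. The function names, the dictionary representation of credences, and the example numbers are all hypothetical, and the check covers one $X$ at a time, whereas the definition quantifies over all of them.

```python
import random

def expectation(p, x, event):
    """Expectation of x under p, conditional on a non-empty set of states."""
    mass = sum(p[w] for w in event)
    return sum(p[w] * x[w] for w in event) / mass

def expert_expectations(p, x, partition):
    """Hypothetical Agent 2: Agent 1's prior conditioned on the true cell."""
    e2 = {}
    for cell in partition:
        value = expectation(p, x, cell)
        for w in cell:
            e2[w] = value
    return e2

def trust_condition_holds(p, x, e2, tol=1e-9):
    """Check E_1[X | E_2[X] >= t] >= t for one random variable x.

    Only thresholds that E_2[X] actually attains need checking: between
    two attained values the conditioning event does not change.
    """
    for t in set(e2.values()):
        event = {w for w in p if e2[w] >= t}
        if expectation(p, x, event) < t - tol:
            return False
    return True

# Assumed toy setup: four states with positive prior mass, and an
# Agent 2 who learns which half of the state space is actual.
prior = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
partition = [{0, 1}, {2, 3}]
rng = random.Random(0)
x = {w: rng.uniform(-1.0, 1.0) for w in prior}
print(trust_condition_holds(prior, x, expert_expectations(prior, x, partition)))

# A confidently wrong Agent 2 violates the condition.
coin = {0: 0.5, 1: 0.5}
indicator = {0: 0.0, 1: 1.0}
wrong = {0: 0.9, 1: 0.1}
print(trust_condition_holds(coin, indicator, wrong))
```

An agent who simply knows more (the partition case) passes for any $X$, while the confidently wrong agent fails at the threshold 0.9, since Agent 1's expectation of $X$ on that event is 0.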


Open-minded updatelessness

ben_levinstein · 10mo · 61

I think the basic approach to commitment for the open-minded agent is right. Roughly, you don't actually get to commit your future self to things. Instead, you just do what you (in expectation) would have committed yourself to given some reconstructed prior.

Just as a literature pointer: If I recall correctly, Chris Meacham's approach in "Binding and Its Consequences" is ultimately to estimate your initial credence function and perform the action from the plan with the highest EU according to that function. He doesn't talk about awareness growth, but open-mindedness seems to fit in nicely within his framework (or at least the framework I recall him having).
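
The rule described above (estimate an initial credence function, pick the plan with the highest expected utility under it, then perform that plan's action) can be sketched as follows. This is a rough illustration of the rule as recalled here, not Meacham's own formalism: the function names, the representation of a plan as a mapping from observations to actions, and the toy payoffs are all assumptions made for the example.

```python
from itertools import product

def best_plan(prior, observations, actions, utility):
    """Return the plan (observation -> action) with the highest expected
    utility under the estimated initial credence function `prior`."""
    plans = [dict(zip(observations, choice))
             for choice in product(actions, repeat=len(observations))]

    def expected_utility(plan):
        return sum(prob * utility(state, plan) for state, prob in prior.items())

    return max(plans, key=expected_utility)

def act(prior, observations, actions, utility, current_observation):
    """Perform the action that the best plan assigns to what is observed now."""
    return best_plan(prior, observations, actions, utility)[current_observation]

# Hypothetical toy case: paying a small cost in one state is rewarded
# in the other state, but only if the plan is to pay when asked.
def toy_utility(state, plan):
    pays = plan["asked"] == "pay"
    if state == "heads":
        return -100 if pays else 0
    return 10_000 if pays else 0

estimated_prior = {"heads": 0.5, "tails": 0.5}
print(act(estimated_prior, ["asked"], ["pay", "refuse"], toy_utility, "asked"))
```

The utility here takes the whole plan as input, so the payoff in one state can depend on what the agent would have done in another. With the assumed even prior the chosen action is to pay, even though paying looks bad once the agent has conditioned on being asked.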
