[ Question ]

What's been written about the nature of "son-of-CDT"?

by Liam Donovan, 30th Nov 2019

I'm quite curious what kind of decision algorithm a CDT agent might implement in a successor AI, but I've only found a few vague references. Are there any good posts/papers/etc. about this?

3 Answers

I think I saw a bit on Arbital about it:

Logical decision theorists use "Son-of-CDT" to denote the algorithm that CDT self-modifies to; in general we think this algorithm works out to "LDT about correlations formed after 7am, CDT about correlations formed before 7am".

https://arbital.com/p/logical_dt/?l=5gc
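To see why CDT would pick a hybrid like this, here is a rough toy model. It is not from Arbital, and every name in it (`MODIFICATION_TIME`, `Problem`, `payoff`) is hypothetical. It assumes Newcomb-style problems with the usual 1,000 / 1,000,000 payoffs, treats "7am" as the moment CDT builds its successor, and assumes that a predictor who formed its prediction after 7am read the successor's code, while one who formed it before 7am is stuck with a prediction about the original CDT agent. Evaluated causally from 7am, the hybrid rule does better than either pure policy.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative Newcomb payoffs (an assumption, not from the thread).
SMALL, BIG = 1_000, 1_000_000
MODIFICATION_TIME = 7.0  # hypothetical: the "7am" at which CDT builds its successor


@dataclass
class Problem:
    prediction_time: float              # hour at which the predictor formed its prediction
    fixed_prediction: str = "two-box"   # what an early predictor expected of the original CDT agent


Policy = Callable[[Problem], str]


def son_of_cdt(problem: Problem) -> str:
    # LDT-style for correlations formed after modification, CDT-style for earlier ones.
    return "one-box" if problem.prediction_time >= MODIFICATION_TIME else "two-box"


def always(action: str) -> Policy:
    return lambda problem: action


def payoff(policy: Policy, problem: Problem) -> int:
    action = policy(problem)
    if problem.prediction_time >= MODIFICATION_TIME:
        # The predictor inspected the successor, so its prediction tracks the policy.
        prediction = action
    else:
        # The prediction predates the successor; choosing a successor can't causally change it.
        prediction = problem.fixed_prediction
    box_b = BIG if prediction == "one-box" else 0
    return box_b + (SMALL if action == "two-box" else 0)


problems = [Problem(prediction_time=6.0), Problem(prediction_time=8.0)]
for name, policy in [("always one-box", always("one-box")),
                     ("always two-box", always("two-box")),
                     ("son-of-CDT", son_of_cdt)]:
    print(name, sum(payoff(policy, p) for p in problems))
```

Under these assumptions the totals come out to 1,000,000 for always one-boxing, 2,000 for always two-boxing and 1,001,000 for the hybrid: two-boxing gains 1,000 on the early problem, whose prediction is already fixed, while one-boxing wins the late problem.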

The Retro Blackmail Problem in "Toward Idealized Decision Theory" shows that if CDT can self-modify (i.e., build an agent that follows an arbitrary decision rule), it self-modifies to something that still gives in to some forms of blackmail. This is Son-of-CDT, though the paper doesn't use the name.
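A toy model in the spirit of that result, not the paper's exact formulation, shows where the "gives in to some blackmail" behaviour comes from. The numbers and the names `PAYMENT`, `DAMAGE`, `MODIFICATION_TIME` and `best_successor_pays` are hypothetical. The assumed blackmailer threatens if and only if she predicts the agent will pay. If she predicted from the original CDT agent (which pays once threatened), the successor's policy can't causally prevent the threat, so paying is the better option. If she predicted by reading the successor, refusing means she never threatens at all.

```python
# Illustrative numbers (an assumption, not taken from the paper).
PAYMENT = 1_000          # what the blackmailer demands
DAMAGE = 1_000_000       # what the agent loses if the threat is carried out
MODIFICATION_TIME = 7.0  # hypothetical hour at which CDT builds its successor


def utility(pays: bool, prediction_time: float, original_agent_pays: bool = True) -> int:
    """Agent's utility against a blackmailer who threatens iff she predicts payment."""
    if prediction_time >= MODIFICATION_TIME:
        # She predicted by reading the successor, so her prediction tracks its policy.
        predicted_pay = pays
    else:
        # She predicted from the original CDT agent, which pays once threatened;
        # nothing the successor does can causally change that prediction.
        predicted_pay = original_agent_pays
    if not predicted_pay:
        return 0  # no threat is made
    return -PAYMENT if pays else -DAMAGE


def best_successor_pays(prediction_time: float) -> bool:
    """What CDT, at modification time, wants its successor to do in this problem."""
    return max([True, False], key=lambda pays: utility(pays, prediction_time))


print(best_successor_pays(6.0))  # True: the prediction predates the successor, so pay
print(best_successor_pays(8.0))  # False: refusing means no threat is ever made
```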

Mako's answer holds if the CDT agent expects to face only problems in which it is rewarded based on its outputs. It wouldn't hold under other conditions: for example, if the agent expected alphabetical agents to be rewarded heavily, it might self-modify into one.