I'm quite curious what kind of decision algorithm a CDT agent might implement in a successor AI, but I've only found a few vague references. Are there any good posts/papers/etc. about this?
Definition A:

> a system is ascription universal if, relative to our current epistemic state, its explicit beliefs contain just as much information as any other way of ascribing beliefs to it.

Definition B:

> a family of question-answering systems A[·] are ascription universal (w.r.t. 𝔼 and L) if A[C]...
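The quote cuts off mid-definition, so here is a minimal schematic of what Definition B's condition plausibly looks like, assuming it mirrors Definition A's "as much information as any other ascription" clause. The notation $\mathrm{Info}_{\mathbb{E}}$ and the ascription variable $f$ are illustrative shorthand of mine, not from the source:

```latex
% Schematic reading of Definition B (an assumption: the truncated
% clause is taken to mirror Definition A's information condition).
% f ranges over ways of ascribing beliefs to the computation C;
% Info_E(.) is illustrative shorthand for "information content
% relative to the epistemic state E".
\forall C \in L,\ \forall f:\quad
\mathrm{Info}_{\mathbb{E}}\big(\text{explicit beliefs of } A[C]\big)
\;\supseteq\; \mathrm{Info}_{\mathbb{E}}\big(f(C)\big)
```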
I was surprised that I had never heard of an organization working on long-term AI safety research, and I can't find specific information on its work or its connection to the AI safety community.
I've recently started working through AI safety posts written on LessWrong 1-3 years ago; in doing so I occasionally have questions/comments about the material. Is it considered good practice/in line with LW norms to write these as comments on the original, old posts? On one hand, I can see why "necro-ing"...
The idea of consequentialist agents arising in sufficiently strong optimizing systems intuitively makes sense to me. However, I don't have a good mental model of the differences between a world where optimization daemons can arise and a world where they can't (i.e. what facts about the world provide Bayesian evidence...
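For what it's worth, the standard Bayesian framing behind that parenthetical is just the odds form of Bayes' rule (nothing specific to the original question): an observation E counts as evidence for a hypothesis H, e.g. "optimization daemons can arise in this world", exactly when the likelihood ratio P(E | H) / P(E | ¬H) exceeds 1.

```latex
% Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds.
% E is evidence for H precisely when P(E | H) / P(E | not-H) > 1.
\frac{P(H \mid E)}{P(\lnot H \mid E)}
= \frac{P(E \mid H)}{P(E \mid \lnot H)}
  \cdot \frac{P(H)}{P(\lnot H)}
```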