All of Morgan_Rogers's Comments + Replies

Generalised models as a category

Re "I'm not fully sold on category theory as a mathematical tool", if someone (e.g. me) were to take the category you've outlined and run with it, in the sense of establishing its general structure and special features, could you be convinced? Are there questions that you have about this category that you currently are only able to answer by brute force computation from the definitions of the objects and morphisms as you've given them? More generally, are there variants of this category that you've considered that it might be useful to study in parallel?

2 · Stuart_Armstrong · 9d: For the moment, I'm going to be trying to resolve practical questions of model splintering, and then I'll see if this formalism turns out to be useful for them.
Subagents of Cartesian Frames

I am very experienced in category theory but not the Chu construction (or *-autonomous categories in general). There is a widely used notion of subobject of an object X in a category C, namely an "equivalence class of monomorphisms with codomain X". This differs from your definition most conspicuously in the case of  where there is no morphism from this frame to a typical frame.
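
For concreteness, here is the standard definition I mean, in notation chosen just for this comment (X, S, m, f are my labels, not the post's):

```latex
% Standard notion of subobject of an object X in a category C
% (notation mine; requires amsmath to compile as part of a document).
\[
  \mathrm{Sub}(X) \;=\; \{\, m : S \to X \;\mid\; m \text{ a monomorphism} \,\} \big/ \sim,
\]
% where two monomorphisms into X are identified when they differ by an
% isomorphism of their domains:
\[
  (m : S \to X) \;\sim\; (m' : S' \to X)
  \quad\text{iff there is an isomorphism } f : S \to S' \text{ with } m' \circ f = m.
\]
```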

If I'm calculating correctly, the standard notion of subobject is strictly stronger than the one you present here (as long as the world W is in... (read more)

Escaping the Löbian Obstacle

> Note also that your definition implies that if an agent alieves something, it must also believe it.

> I find it interesting that you (seemingly) nodded along with my descriptions, but then proposed a definition which was almost opposite mine!

I don't know how you so misread what I said; I explicitly wrote that aliefs constitute the larger logic, so that beliefs are contained in aliefs (which I'm pretty sure is what you were going for!) and not vice versa. Maybe you got confused because I put beliefs first in this description, or because I described the smaller... (read more)

Escaping the Löbian Obstacle

I like the alief/belief distinction; this seems to carry the distinction I was after. To make it more formal, I'll use "belief" to refer to 'things which an agent can prove in its reasoning engine/language (L)', and "alief" to refer to beliefs plus 'additional assumptions which the agent makes about the bearing of that reasoning on the environment', which together constitute a larger logic (L'). Does that match the distinction you intended between these terms?
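
In symbols, the split I'm proposing looks something like the following (notation mine; Sound(L) abbreviates whatever bridging assumptions the agent adds, written Sound(L,S) elsewhere in this thread):

```latex
% Sketch of the proposed belief/alief split (notation mine; needs amsmath).
% L  = the agent's internal reasoning engine/language.
% L' = L plus the agent's assumptions about how L bears on the environment.
\begin{align*}
  \mathrm{Belief}(\varphi) \;&:\Longleftrightarrow\; L \vdash \varphi, \\
  L' \;&:=\; L \;+\; \{\,\text{additional assumptions, e.g. } \mathrm{Sound}(L)\,\}, \\
  \mathrm{Alief}(\varphi) \;&:\Longleftrightarrow\; L' \vdash \varphi.
\end{align*}
```

On this reading, beliefs are contained in aliefs and not vice versa.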

An immediate pedagogical problem with this terminology is that we have to be careful not to confla... (read more)

2 · abramdemski · 3mo: Unfortunately, this seems almost opposite to the way I was defining the terminology. I had it that the aliefs are precisely what is proven by the formal system, and the beliefs are what the agent would explicitly endorse if asked. Aliefs are what you feel in your bones. So if the "bones" of the agent are the formal system, that's the aliefs.

Note also that your definition implies that if an agent alieves something, it must also believe it. In contrast, part of the point for me is that an agent can alieve things without believing them. I would also allow the opposite, for humans and other probabilistic reasoners, though for pure-logic agents this would have to correspond to unsoundness. But pure-logical agents have to have the freedom to alieve without believing, on pain of inconsistency, even if we can't model belief-without-alief in pure logic.

I find it interesting that you (seemingly) nodded along with my descriptions, but then proposed a definition which was almost opposite mine! I think there's probably a deep reason for that (having to do with how difficult it is to reliably distinguish [https://www.lesswrong.com/posts/vpvLqinp4FoigqvKy/reflective-bayesianism#3_] alief/belief), but I'm not grasping it for now. It is a symptom of my confusion in this regard that I'm not even sure we're pointing to different notions of belief/alief even though your definition sounds almost opposite to me. It is well within the realm of possibility that we mean the same thing, and are just choosing very different ways to talk about it.

Specifically, your definition seems fine if L is not the formal language which the agent is hard-wired with, but rather, some logic which the agent explicitly endorses (like the relationship that we have with Peano Arithmetic). Then, yeah, "belief" is about provability in L, while "alief" implies that the agent has some "additional assumptions about the bearing of that reasoning on the environment". Totally! But then, this suggests that those a...
What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny?

Critch's comments support an opinion I've held since I started thinking seriously about alignment: that the language we use to describe it is too simple, and ignores the fact that "human" interests (the target of alignment) are not the monolith they're usually presented as.

For your specific question about multi-multi, I only have limited access to the memeplex, so I'll just share my thoughts. Multi-multi delegation involves:
1. Compromise / resolution of conflicts of interest between delegators.
2. Mutual trust in delegators regarding communication of intere... (read more)

1 · Quinn · 4mo: Thanks! Trust, compromise, and communication are all items in Dafoe et al. 2020, if you're interested in exploring. I agree that primitive forms of these issues are present in multi-single and single-multi; it's not clear to me whether we should think of solving these primitive forms and then solving some sort of extension to multi-multi, or if we should think of attacking problems that are unique to multi-multi directly. It's just not clear to me which of those better reflects the nature of what's going on.
Escaping the Löbian Obstacle

> Seems like you missed my point that the meta-logical belief could just be "L is sound" rather than "L plus me is sound". Adding the first as an axiom to L is fine (it results in an L' which is sound if L was sound), while adding the second as an axiom is very rarely fine (it proves soundness and consistency of the whole system, so the whole system had better be too weak for Gödel's incompleteness theorems to apply).

Aha! I knew I must be missing something; thanks for the clarification. That makes things easier. I'll continue to use L' to mean "L + Sound(L,S... (read more)
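
Spelled out in symbols (notation mine, roughly following the quoted distinction; Prov_T is a standard provability predicate for a theory T):

```latex
% The two candidate meta-logical beliefs being contrasted
% (notation mine; requires amsmath/amssymb).
%
% (1) Trusting L from outside: a new theory L' asserting L's soundness.
\[
  L' \;:=\; L \;+\; \{\, \mathrm{Prov}_{L}(\ulcorner \varphi \urcorner) \rightarrow \varphi \;:\; \varphi \text{ a sentence of } L \,\}
\]
% L' is sound whenever L is, but L' says nothing about its own soundness.
%
% (2) A theory T asserting its *own* soundness:
\[
  T \;\vdash\; \mathrm{Prov}_{T}(\ulcorner \varphi \urcorner) \rightarrow \varphi \quad \text{for every sentence } \varphi.
\]
% Taking phi to be a contradiction gives T |- Con(T), so by Goedel's second
% incompleteness theorem T is inconsistent unless it is too weak for the
% theorem to apply.
```

The first of these is what I mean by L' above.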

2 · abramdemski · 3mo: I'm not insisting on anything about the agent design; I simply remain puzzled how your proposal differs from the one in the paper, and so, continue to assume it's like the one in the paper except those differences which I understand. Your statements have been reading almost like you think you've refuted formal arguments in the paper (implying you don't think the formal agent designs do what I think they do), rather than providing a new agent design which does better. This contributes to my feeling that you are really just wrestling with Löb's theorem itself. Nor do I recall you saying much to dissuade me of this. In your OP, you use the language "Under the belief that the Löbian Obstacle was a genuine problem to be avoided", as if the original idea was simply mistaken.

So, to be absolutely clear on this: do you accept the mathematical proofs in the original paper (but propose some way around them), or do you think they are actually mistaken? Do you accept the proof of Löb's theorem itself, or do you think it is mistaken?

But using proofs in L is precisely what I understood from "belief"; so, if you have something else in mind, I will need clarification. I think I haven't understood much of this.

I would argue that the only way for A's reasoning in L to have practical value is for A to act on such reasoning. The only way for A's reasoning to have practical value for some other agent B is for B to be working under the assumption that provability-in-L implies truth under S. Indeed, this seems in line with your detective drama example. If the superiors were put in the position of the rogue detective, often they would have come to the same conclusion. Yet, looking at the problem from a removed position, they cannot accept the conclusion. But this is precisely the lobstacle: the condition for reasoning to have value to an agent A is different from the condition for A's reasoning to have value to another agent B. Let me therefore make a (not totally formal) distinctio...
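
For reference, since the question above turns on it, here is the statement of Löb's theorem (standard, not specific to either the paper or this post; notation mine):

```latex
% Loeb's theorem, for a theory T satisfying the usual derivability
% conditions (e.g. Peano Arithmetic), with provability predicate Prov_T
% (requires amsmath/amssymb):
\[
  \text{if } T \vdash \mathrm{Prov}_{T}(\ulcorner \varphi \urcorner) \rightarrow \varphi,
  \text{ then } T \vdash \varphi.
\]
% Internalised modal form, writing \Box for Prov_T:
\[
  T \vdash \Box(\Box \varphi \rightarrow \varphi) \rightarrow \Box \varphi.
\]
```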
Escaping the Löbian Obstacle

I should pre-emptively correct my "formal" argument, since it's not true that S can never be in its own codomain; arguably I can construct U so that C(U) contains the names of some semantic maps as elements (although in this purely set-theoretic set-up, it's hard to see how doing so would capture their content). Nonetheless, a diagonalisation argument that depends only on L and C(U) being non-trivial demonstrates that C(U) cannot contain every semantic map, which I think should be enough to salvage the argument.
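
The shape of diagonalisation I have in mind is the usual Cantor-style one. Stated abstractly for sets (how it instantiates here depends on the precise definitions of C(U) and the semantic maps, so read this as a template rather than as the argument itself; X, e, D are labels I'm introducing):

```latex
% Cantor-style diagonalisation template (standard; notation mine).
% For any set X and any function e from X to the powerset of X,
% i.e. any way of treating elements of X as names for subsets of X,
% the diagonal set
\[
  D \;=\; \{\, x \in X \;:\; x \notin e(x) \,\}
\]
% is not in the image of e: if D = e(x_0), then x_0 is in D iff x_0 is not
% in D, a contradiction. So no set can contain a name for each of its
% subsets; the argument sketched above is meant to run analogously, with
% C(U) in place of X and semantic maps in place of the subsets being named.
```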

Escaping the Löbian Obstacle

> It seems like you just get a new system, L', which believes in the soundness of L, but which doesn't believe in its own soundness. So the agent can trust agents who use L, but cannot trust agents who additionally have the same meta-logical beliefs which allow them to trust L. Meaning, the agent cannot trust itself.

A doesn't need B to believe that the logic is sound. Even if you decide to present "logic L plus metalogical beliefs" as a larger logic L' (and assuming you manage to do this in a way that doesn't lead to inconsistency), the semantic map is defin... (read more)

2 · abramdemski · 4mo: My problem is that I still don't understand how you propose for the agent to reason/behave differently than what I've described; so, your statement that it does in fact do something different doesn't help me much, sorry.

The semantic map is defined on L, not L' -- sure, this makes some sense? But this just seems to reinforce the idea that our agent can only "understand" the internal logic of agents who restrict themselves to only use L (not any meta-logical beliefs).

Seems like you missed my point that the meta-logical belief could just be "L is sound" rather than "L plus me is sound". Adding the first as an axiom to L is fine (it results in an L' which is sound if L was sound), while adding the second as an axiom is very rarely fine (it proves soundness and consistency of the whole system, so the whole system had better be too weak for Gödel's incompleteness theorems to apply). Does that make sense? You were talking about the second sort of situation (where adding the meta-logical belief as an axiom would result in an inconsistent system, because it would claim its own soundness); I wanted to point out that we could also be in the first sort of situation (where adding the meta-logical belief would result in a perfectly consistent system, but would only let you trust L, not L').

To me, this "ought" in the sentence reads as a prediction (basically an abuse of 'ought', which is in common usage basically because people make is/ought errors). I would prefer to re-phrase as "if phi is provable in L, then the interpretation of phi will be true" or less ambitiously "will probably be true". Is your proposal that "X is true" should be taken as a statement of X's desirability, instead? Or perhaps X's normativity? That's what it means to put it on the "ought" side, to me. If it means something different to you, we need to start over with the question of what the is/ought divide refers to.

I agree that (by Tarski's undefinability theorem) a logic can't know its own seman...
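
The "trusts L but not L'" situation described above is the first step of the standard iterated-reflection picture; a sketch, with notation not taken from either post:

```latex
% Iterated reflection (standard construction; notation mine; needs amsmath).
% Each level asserts the soundness of the level below; by Goedel/Loeb,
% no consistent level of this kind asserts its own soundness.
\begin{align*}
  L_0     &:= L, \\
  L_{n+1} &:= L_n \;+\; \{\, \mathrm{Prov}_{L_n}(\ulcorner \varphi \urcorner) \rightarrow \varphi \;:\; \varphi \text{ a sentence} \,\}.
\end{align*}
% An agent reasoning in L_{n+1} can trust agents reasoning in L_n or below,
% but not agents reasoning in L_{n+1} itself: the situation described in
% the reply above, iterated.
```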