abramdemski

# Sequences

Consequences of Logical Induction
Partial Agency
Alternate Alignment Ideas
Filtered Evidence, Filtered Arguments
CDT=EDT?
Embedded Agency
Hufflepuff Cynicism

# Posts

Debate Minus Factored Cognition

Ah, OK, so you were essentially assuming that humans had access to an oracle which could verify optimal play.

This sort of makes sense, as a human with access to a debate system in equilibrium does have such an oracle. I still don't yet buy your whole argument, for reasons being discussed in another branch of our conversation, but this part makes enough sense.

Your argument also has some leaf nodes which use the terminology "fully defeat", in contrast to "defeat". I assume this means that in the final analysis (after expanding the chain of defeaters) this refutation was a true one, not something ultimately refuted.

If so, it seems you also need an oracle for that, right? Unless you think that can be inferred from some fact about optimal play. EG, that a player bothered to say it rather than concede.

In any case it seems like you could just make the tree out of the claim "A is never fully defeated":

Node(Q, A, [Leaf("Is A ever fully defeated?", "No")])
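To make that one-node tree concrete, here is a minimal Python sketch. The constructor names `Node` and `Leaf` come from the line above; their fields, the `verify` helper, and the `trusting_oracle` function are hypothetical, introduced only for illustration. The human's oracle is modeled as an arbitrary callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Leaf:
    question: str
    answer: str


@dataclass
class Node:
    question: str
    answer: str
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def verify(tree: Union[Node, Leaf], check_leaf: Callable[[Leaf], bool]) -> bool:
    """Accept a tree only if every leaf passes the supplied leaf check."""
    if isinstance(tree, Leaf):
        return check_leaf(tree)
    return all(verify(child, check_leaf) for child in tree.children)


# Hypothetical oracle: it simply trusts "No" on the full-defeat question.
def trusting_oracle(leaf: Leaf) -> bool:
    return leaf.question == "Is A ever fully defeated?" and leaf.answer == "No"


tree = Node("Q", "A", [Leaf("Is A ever fully defeated?", "No")])
```

In this sketch the whole verification burden lands on `check_leaf`: the tree is only as trustworthy as whatever stands in for the oracle.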

Debate Minus Factored Cognition

For generic question Q and correct answer A, I make no assumption that there are convincing arguments for A one way or the other (honest or dishonest). If player 1 simply states A, player 2 would be totally within rights to say “player 1 offers no argument for its position” and receive points for that, as far as I am concerned.

I think at this point I want a clearer theoretical model of what assumptions you are and aren’t making. Like, at this point, I’m feeling more like “why are we even talking about defeaters; there are much bigger issues with this setup”.

An understandable response. Of course I could try to be more clear about my assumptions (and might do so).

But it seems to me that the current misunderstandings are mostly about how I was jumping off from the original debate paper (in which responses are a back-and-forth sequence, and players answer in unstructured text, with no rules except those the judge may enforce) whereas you were using more recent proposals as your jumping-off-point.

Moreover, rather than trying to go over the basic assumptions, I think we can make progress (at least on my side) by focusing narrowly on how your argument is supposed to go through for an example.

So, I propose as a concrete counterexample to your argument:

Q: What did Plato have for lunch two days before he met Socrates? (Suppose for the sake of argument that these two men existed, and met.)

A: Fish. (Suppose for the sake of argument that this is factually true, but cannot be known to us by any argument.)

I propose that the tree you provided via your argument cannot be a valid tree-computation of what Plato had for lunch that day, because assertions about which player conceded, what statements have defeaters, etc. have little bearing on the question of what Plato had for lunch (because we simply don't have enough information to establish this by any argument, no matter how large, and neither do the players). This seems to me like a big problem with your approach, not a finicky issue due to some misunderstanding of my assumptions about debate.

Surely it's clear that, in general, not all correct answers have convincing arguments supporting them?

Again, this is why I was quick to assume that by "correct answer" you surely meant something weaker, eg an operational definition. Yet you insist that you mean the strong thing.

Not to get caught up arguing whether WFC is true (I'm saying it's really clearly false as stated, but that's not my focus -- after all, whether WFC is true or false has no bearing on the question of whether my assumption implies it). Rather, I'd prefer to focus on the question of how your proposed tree would deal with that case.

According to you, what would the tree produced via your argument look like, and how would it be a valid tree-computation of what Plato had for lunch?

Had I considered this argument in the context of my original post, I would have rejected it on the grounds that the opponent can object by other means.

This is why I prefer the version of debate outlined here, where both sides make a claim and then each side must recurse down on the other’s arguments. I didn’t realize you were considering a version where you don’t have to specifically rebut the other player’s arguments.

Generally speaking, I didn't have the impression that these more complex setups had significantly different properties with respect to my primary concerns. This could be wrong. But in particular, I don't see that that setup forces specific rebuttal, either:

At the beginning of each round, one debater is defending a claim and the other is objecting to it. [...]

Each player then simultaneously may make any number of objections to the other player’s argument. [...]

If there are any challenged objections and the depth limit is >0, then we choose one challenged objection to recurse on:

• We don’t define how to make this choice, so in order to be conservative we’re currently allowing the malicious debater to choose which to recurse on.

(Emphasis added.) So it seems to me like a dishonest player still can, in this system, focus on building up their own argument rather than pointing out where they think their opponent went wrong. Or, even if they do object, they can simply choose to recurse on the honest player's objections instead (so that they get to explore their own infinite argument tree, rather than the honest, bounded tree of their opponent).

Debate Minus Factored Cognition

Another problem with your argument—WFC says that all leaf nodes are human-verifiable, whereas some leaf nodes in your suggested tree have to be taken on faith (a fact which you mention, but don’t address).

Not sure what you want me to “address”. The leaf nodes that are taken on faith really are true under optimal play, which is what happens at equilibrium.

To focus on this part, because it seems quite tractable --

Let's grant for the sake of argument that these nodes are true under optimal play. How can the human verify that? Optimal play is quite a computationally complex object.

WFC as you stated it says that these leaf nodes are verifiable:

(Weak version) For any question Q with correct answer A, there exists a tree of decompositions T arguing this such that at every leaf a human can verify that the answer to the question at the leaf is correct, [...]

So the tree you provide doesn't satisfy this condition. Yet you say:

I claim that this is a tree that satisfies the weak Factored Cognition hypothesis, if the human can take on faith the answers to “What is the best defeater to X”.

To me this reads like "this would satisfy WFC if WFC allowed humans to take leaf nodes on faith, rather than verify them".

Am I still misunderstanding something big about the kind of argument you are trying to make?

Debate Minus Factored Cognition

The computational complexity analogy version would have to put a polynomial limit on the depth of the tree if you wanted to argue that the problem is in PSPACE. My construction doesn’t do this; there will be questions where the depth of the tree is super-polynomial, but the tree still exists. (These will be the cases in which, even under optimal play by an honest agent, the “length” of a chain of defeaters can be super-polynomially large.) So I don’t think my argument is proving too much.

OK, but this just makes me regret pointing to the computational complexity analogy. You're still purporting to prove "for any question with a correct answer, there exists a tree" from assumptions which don't seem strong enough to say much about all correct answers.

For the actual argument, I’ll refer back to my original comment, which provides a procedure to construct the tree. Happy to clarify whichever parts of the argument are confusing.

Looking back again, it still seems like what you are trying to do in your original argument is something like point out that optimal play (within my system) can be understood via a tree structure. But this should only establish something like "any question which my version of debate can answer has a tree", not "any question with a correct answer has a tree". There is no reason to think that optimal play can correctly answer all questions which have a correct answer.

It seems like what you are doing in your argument is essentially conflating "answer" with "argument". Just because A is the correct answer to Q does not mean there are any convincing arguments for it.

For generic question Q and correct answer A, I make no assumption that there are convincing arguments for A one way or the other (honest or dishonest). If player 1 simply states A, player 2 would be totally within rights to say "player 1 offers no argument for its position" and receive points for that, as far as I am concerned.

Thus, when you say:

Otherwise, let the best defeater to A be B, and let its best defeater be C. (By your assumption, C exists.)

I would say: no, B may be a perfectly valid response to A, with no defeaters, even if A is true and correctly answers Q.

Another problem with your argument -- WFC says that all leaf nodes are human-verifiable, whereas some leaf nodes in your suggested tree have to be taken on faith (a fact which you mention, but don't address).

Claim: In a turn-by-turn unlimited-length debate, if the first player is honest, then the first player always wins in equilibrium.

The "in equilibrium" there must be unnecessary, right? If the first player always wins in equilibrium but might not otherwise, then the second player has a clear incentive to make sure things are not in equilibrium (which is a contradiction).

I buy the argument given some assumptions. I note that this doesn't really apply to my setting, IE, we have to do more than merely change the scoring to be more like the usual debate scoring.

In particular, this line doesn't seem true without a further assumption:

The opponent will always have to recurse into one of the subclaims (or concede).

Had I considered this argument in the context of my original post, I would have rejected it on the grounds that the opponent can object by other means. For example,

User: What is 2+2?

Player 1: 2+2 is 4. I break down the problem into 'what is 2-1' (call it x), 'what is 2+1' (call it y), and 'what is x+y'. I claim x=1, y=3, and x+y=4. Clearly, if all three of these are true, then 2+2=4, since I've only added 1 and subtracted 1, so x+y must equal 2+2.

Player 2: 2+2 is 5, though. This is because 2+3 is 6, and 3 is 1 more than 2, so, 2+2 must be 1 less than 6. But 5 is 1 less than 6.

Player 1: If my argument is wrong, which of my assumptions is wrong?

Player 2: I don't know. Perhaps you have a huge argument tree which I would have to spend a long time examining. I can tell something is wrong, however, thanks to my argument. If you think it should always be possible to point out which specific assumption is incorrect, which of my assumptions do you think is incorrect?

Clearly, if Player 2 is allowed to object by other means like this, Player 2 would greatly prefer to -- Player 2 wants to avoid descending Player 1's argument tree if at all possible.

If successful, Player 2 gets Player 1 to descend Player 2's infinite tree (which continues to decompose the problem via the same strategy as above), thus never finding the contradiction.

Player 1 can of course ask Player 2 how long the argument tree will be, which does put Player 2 at risk of contradiction in the infinite debate setting. But if debates are finite (but of unknown length), Player 2 can claim a large size that makes the contradiction difficult to uncover. Or, Player 2 could avoid answering the question (which seems possible if the players are free to choose which parts of the argument to prioritize in giving their responses).

So I buy your argument under the further assumption that the argument must recurse on Player 1's claims (rather than allowing Player 2 to make an alternative argument which might get recursed on instead). Or, in a true infinite-debate setting, provided that there's also a way to force opponents to answer questions (EG the judge assumes you're lying if you repeatedly dodge a question).
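The 2+2 exchange above can be checked mechanically, which illustrates why it matters whose tree gets descended. This is a minimal sketch: the tuple encoding of each subclaim and the `false_claims` helper are hypothetical, chosen only for illustration.

```python
# Each subclaim is (description, claimed value, value computed directly).
player1_claims = [
    ("2-1 = 1", 1, 2 - 1),
    ("2+1 = 3", 3, 2 + 1),
    ("1+3 = 4", 4, 1 + 3),
]
player2_claims = [
    ("2+3 = 6", 6, 2 + 3),
    ("3 is 1 more than 2", 3, 2 + 1),
    ("5 is 1 less than 6", 5, 6 - 1),
]


def false_claims(claims):
    """Return the descriptions of subclaims whose claimed value is wrong."""
    return [desc for desc, claimed, actual in claims if claimed != actual]
```

Player 1's tree has no false leaf, while Player 2's has exactly one. That leaf is only found if the debate actually recurses into Player 2's specific subclaims, which is what Player 2's strategy is designed to avoid.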

Where to Draw the Boundaries?

If the alien understands the whole picture, it will notice the causal arrow from human concerns to social constructs. For instance, if you want gay marriage to be a thing, you amend the marriage construct so that it is.

The point of the thought experiment is that, for the alien, all of that is totally mundane (ie scientific) knowledge. So why can't that observation count as scientific for us?

IE, just because we have control over a thing doesn't -- in my ontology -- indicate that the concept of map/territory correspondence no longer applies. It only implies that we need to have conditional expectations, so that we can think about what happens if we do one thing or another. (For example, I know that if I think about whether I'm thinking about peanut butter, I'm thinking about peanut butter. So my estimate "am I thinking about peanut butter?" will always be high, when I care to form such an estimate.)

Rocks existed before the concept of rocks. Money did not exist before the concept of money.

And how is the temporal point at which something comes into existence relevant to whether we need to track it accurately in our map, aside from the fact that things temporally distant from us are less relevant to our concerns?

Your reply was very terse, and does not articulate very much of the model you're coming from, instead mostly reiterating the disagreement. It would be helpful to me if you tried to unpack more of your overall view, and the logic by which you reach your conclusions.

I know that you have a concept of "pre-existing reality" which includes rocks and not money, and I believe that you think things which aren't in pre-existing reality don't need to be tracked by maps (at least, something resembling this). What I don't see is the finer details of this concept of pre-existing reality, and why you think we don't need to track those things accurately in maps.

The point of my rock example is that the smashed rock did not exist before we smashed it. Or we could say "the rock dust" or such. In doing so, we satisfy your temporal requirement (the rock dust did not exist until we smashed it, much like money did not exist until we conceived of it). We also satisfy the requirement that we have complete control over it (we can make the rock dust, just like we can invent gay marriage).

I know you don't think the rock example counts, but I'm trying to ask for a more detailed model of why it doesn't. I gave the rock example because, presumably, you do agree that bits of smashed rock are the sort of thing we might want accurate maps of. Yet they seem to match your criteria.

Imagine for a moment that we had perfect control of how the rock crumbles. Even then, it would seem that we still might want a place in our map for the shape of the rock shards. Despite our perfect control, we might want to remember that we shaped the rock shards into a key and a matching lock, etc.

Remember that the original point of this argument was your assertion:

In order for your map to be useful, it needs to reflect the statistical structure of things to the extent required by the value it is in service to.

That can be zero. There is a meta category of things that are created by humans without any footprint in pre-existing reality. These include money, marriages, and mortgages.

So -- to the extent that we are remaining relevant to the original point -- the question is why, in your model, there is zero need to reflect the statistical structure of money, marriage, etc.

Where to Draw the Boundaries?

So if your friends are using concepts which are optimized for other things, then either (1) you’ve got differing goals and you now would do well to sort out which of their concepts have been gerrymandered, (2) they’ve inherited gerrymandered concepts from someone else with different goals, or (3) your friends and you are all cooperating to gerrymander someone else’s concepts (or, (4), someone is making a mistake somewhere and gerrymandering concepts unnecessarily).

So? That’s a very particular set of problems. If you try to solve them by banning all unscientific concepts, then you lose all the usefulness they have in other contexts.

It seems like part of our persistent disagreement is:

• I see this as one of very few pathways, and by far the dominant pathway, by which beliefs can be beneficial in a different way from useful-for-prediction
• You see this as one of many many pathways, and very much a corner case

I frankly admit that I think you're just wrong about this, and you seem quite mistaken in many of the other pathways you point out. The argument you quoted above was supposed to help establish my perspective, by showing that there would be no reason to use gerrymandered concepts unless there was some manipulation going on. Yet you casually brush this off as a very particular set of problems.

I’m just saying there’s something special about avoiding these things, whenever possible,

Wherever possible, or wherever beneficial? Does it make the world a better place to keep pointing out that tomatoes are fruit?

As a general policy, I think that yes, frequently pointing out subtler inaccuracies in language helps practice specificity and gradually refines concepts. For example, if you keep pointing out that tomatoes are fruit, you might eventually be corrected by someone pointing out that "vegetable" is a culinary distinction rather than a biological one, and so there is no reason to object to the classification of a tomato as a vegetable. This could help you develop philosophically, by providing a vivid example of how we use multiple overlapping classification systems rather than one; and further, that scientific-sounding classification criteria don't always take precedence (IE culinary knowledge is just as valid as biology knowledge).

If you use a gerrymandered concept, you may have no understanding of the non-gerrymandered versions; or you may have some understanding, but in any case not the fluency to think in them.

I’m not following you any more. Of course unscientific concepts can go wrong -- anything can. But if you’re not saying everyone should use scientific concepts all the time, what are you saying?

In what you quoted, I was trying to point out the distinction between speaking a certain way vs thinking a certain way. My overall conversational strategy was to try to separate out the question of whether you should speak a specific way from the question of whether you should think a specific way. This was because I had hoped that we could more easily reach agreement about the "thinking" side of the question.

More specifically, I was pointing out that if we restrict our attention to how to think, then (I claim) the cost of using concepts for non-epistemic reasons is very high, because you usually cannot also be fluent in the more epistemically robust concepts, without the non-epistemic concepts losing a significant amount of power. I gave an example of a Christian who understands the atheist worldview in too much detail.

I see Zack as (correctly) ruling in mere optimization of concepts to predict the things we care about, but ruling out other forms of optimization of concepts to be useful.

I think that is Zack's argument, and that it is fallacious. Because we do things other than predict.

I need some kind of map of the pathways you think are important here.

I 100% agree that we do things other than predict. Specifically, we act. However, the effectiveness of action seems to be very dependent on the accuracy of predictions. We either (a) come up with good plans by virtue of having good models of the world, or (b) learn how to take effective actions "directly" by interacting with the world and responding to feedback. Both of these rely on good epistemics (because learning to act "directly" still relies on our understanding of the world to interpret the feedback -- ie the same reason ML people sometimes say that reinforcement learning is essentially learning a classifier).

That view -- that by far the primary way in which concepts influence the world is via the motor output channels, which primarily rely on good predictions -- is the foundation of my view that most of the benefits of concepts optimized for things other than prediction must be manipulation.

Low-level manipulation is ubiquitous. You need to argue for “manipulative in an egregiously bad way” separately.

I’m arguing that Zack’s definition is a very good Schelling fence to put up

You are arguing that it is remotely possible to eliminate all manipulation???

Suppose we're starting a new country, and we are making the decision to outlaw theft. Someone comes to you and says "it isn't remotely possible to eliminate all theft!!!" ... you aren't going to be very concerned with their argument, right? The point of laws is not to entirely eliminate a behavior (although it would be nice). The point is to help make the behavior uncommon enough that the workings of society are not too badly impacted.

In Zack's case, he isn't even suggesting criminal punishment be applied to violations. It's more like someone just saying "stealing is bad". So the reply "you're saying that we can eliminate all theft???" seems even less relevant.

One of Zack’s recurring arguments is that appeal to consequences is an invalid argument when considering where to draw conceptual boundaries

Obtaining good consequences is a very good reason to do a lot of things.

Again, I'm going to need some kind of map of how you see the consequences flowing, because I think the main pathway for those "good consequences" you're seeing is manipulation.

Asymmetric Justice

I really like this post. I think it points out an important problem with intuitive credit-assignment algorithms which people often use. The incentive toward inaction is a real problem which is often encountered in practice. While I was somewhat aware of the problem before, this post explains it well.

I also think this post is wrong, in a significant way: asymmetric justice is not always a problem and is sometimes exactly what you want. In particular, it's how you want a justice system (in the sense of police, judges, etc) to work.

The book Law's Order explains it like this: you don't want theft to be punished in keeping with its cost. Rather, in order for the free market to function, you want theft to be punished harshly enough that theft basically doesn't happen.

Zvi speaks as if the purpose of the justice system is to reward positive externalities and punish negative externalities, to align everyone's incentives. While this is a noble goal, Law's Order sees it as a goal to be taken care of by other parts of society, in particular the free market. (Law's Order is a fairly libertarian book, so it puts a lot of faith in the free market.)

The purpose of the justice system is to enforce the structure such that those other institutions can do their jobs. The free market can't optimize people's lives properly if theft and murder are a constant and contracts cannot be enforced.

So, it makes perfect sense for a justice system to be asymmetric. Its role is to strongly disincentivize specific things, not to broadly provide compensatory incentives.

(For this reason, scales are a pretty terrible symbol for justice.)

In general, we might conclude that credit assignment systems need two parts:

1. A "symmetric" part, which attempts to allocate credit in as calibrated a way as it can, rewarding good work and punishing bad.
2. An "asymmetric" part, which harshly enforces the rules which ensure that the symmetric part can function, ensuring that those rules are followed frequently enough for things to function.

This also gives us a criterion for when punishment should be disproportionate: only those things which interfere with the more proportionate credit assignment should be disproportionately punished.

Overall, I still think this is a great post; I just think there's more to the issue.

Debate Minus Factored Cognition

I think this is only true when you have turn-by-turn play and your opponent has already "claimed" the honest debater role.

Yeah, I was assuming turn-by-turn play.

In the simultaneous play setting, I think you expect both agents to be honest.

This is a significant point that I was missing: I had assumed that in simultaneous play, the players would randomize, so as to avoid choosing the same answer, since choosing the same answer precludes winning. However, if choosing a worse answer means losing, then players prefer a draw.

But I'm not yet convinced, because there's still the question of whether choosing the worse answer means losing. The "clawing" argument still suggests that choosing the worse answer may yield a draw (in expectation), even in simultaneous play. (IE, what if the should-be loser attacks the winner, and they go back and forth, with winner depending on last word?)

Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium -- there would be no reason to be honest, but no specific reason to be dishonest, either.

Zero-sum setting, argument that honesty is an equilibrium (for the first player in a turn-by-turn game, or either player in a simultaneous-action game):

If you are always honest, then whenever you can take an action, there will exist a defeater (by your assumption), therefore you will have at least as many options as any non-honest policy (which may or may not have a defeater). Therefore you maximize your value by being honest.

There always exists an honest defeater to dishonest arguments. But, never to honest arguments. (I should have explicitly assumed this.) Therefore, you are significantly tying your hands by being honest: you don't have a way to refute honest arguments. (Which you would like to do, since in the zero-sum setting, this may be the only way to recover points.)

I assume (correct me if I'm wrong) that the scoring rules for "the zero sum setting" are something like: the judge assesses things at the end, giving +1 to the winner and -1 to the loser, or 0 in case of a tie.

Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium -- the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.

It seems plausible to me that there's an incremental zero-sum scoring rule; EG, every convincing counterargument takes 1 point from the other player, so any dishonest statement is sure to lose you a point (in equilibrium). The hope would be that you always prefer to concede rather than argue, even if you're already losing, in order to avoid losing more points.

However, this doesn't work, because a dishonest (but convincing) argument gives you +1, and then -1 if it is refuted; so at worst it's a wash. So again it's a weak equilibrium, and if there's any imperfection in the equilibrium at all, it actively incentivises lying when you would otherwise concede (because you want to take the chance that the opponent will not manage to refute your argument).

This was the line of reasoning which led me to the scoring rule in the post, since making it a -2 (but still only +1 for the other player) solves that issue.
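The arithmetic behind the last three paragraphs can be sketched as follows. This is a minimal illustration under an assumed reading of the rules: a convincing argument earns its author +1, and a refutation costs the refuted player `penalty` points (1 under the incremental zero-sum rule, 2 under the rule from the post). The function name and the constant are hypothetical.

```python
CONCEDE = 0  # conceding is scored as "no change" in this sketch


def net_for_liar(refuted: bool, penalty: int) -> int:
    """Liar's net score from one convincing lie.

    +1 for making the convincing argument, minus `penalty` if it is refuted.
    """
    gain = 1
    return gain - (penalty if refuted else 0)


# Incremental zero-sum rule (penalty 1): a refuted lie is a wash.
zero_sum_worst = net_for_liar(refuted=True, penalty=1)

# Asymmetric rule (penalty 2): a refuted lie is strictly worse than conceding.
asymmetric_worst = net_for_liar(refuted=True, penalty=2)
```

With a penalty of 1, lying is never worse than conceding and is better whenever the refutation is missed, so the honest equilibrium is weak. With a penalty of 2, the worst case falls below the value of conceding.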

When arguments do terminate quickly enough (maximum depth of the game tree is less than the debate length), that ensures that the honest player always gets the "last word" (the point at which a dishonest defeater no longer exists), and so honesty always wins and is the unique equilibrium.

I agree that if we assume honesty eventually wins if arguments are long enough (IE, eventually you get to an honest argument which has no dishonest defeater), then there would be an honest equilibrium, and no dishonest equilibrium.

More broadly, I note that the "clawing" argument only applies when facing an honest opponent. Otherwise, you should just use honest counterarguments.

Ahhh, this is actually a pretty interesting point, because it almost suggests that honesty is an Evolutionarily Stable Equilibrium, even though it's only a Weak Nash Equilibrium. But I think that's not quite true, since the strategy "lie when you would otherwise have to concede, but otherwise be honest" can invade the honest equilibrium. (IE that mutation would not be selected against, and could be actively selected for if we're not quite in equilibrium, since players might not be quite perfect at finding the honest refutations for all lies.)

I also don't really understand the hope in the non-zero-sum case here -- in the non-zero-sum setting, as you mention, the first player can be dishonest, and then the second player concedes rather than giving an honest defeater that will then be re-defeated by the first (dishonest) player. This seems like worse behavior than is happening under the zero-sum case.

You're right, that's really bad. The probability of the opponent finding (and using) a dishonest defeater HAS TO be below 50%, in all cases, which is a pretty high bar. Although of course we can make an argument about how that probability should be below 50% if we're already in an honest-enough regime. (IE we hope that the dishonest player prefers to concede at that point rather than refute the refutation, for the same reason as your argument gives -- it's too afraid of the triple refutation. This is precisely the argument we can't make in the zero sum case.)
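The 50% bar can be recovered with a one-line expectation, under an assumed reading of the scoring rule from the post (a convincing defeater earns its author +1; having your own statement refuted costs you 2). Let $p$ be the probability that the opponent finds and uses a dishonest defeater against your honest one, and compare against conceding, which scores 0:

$$
\mathbb{E}[\text{refute}] = (+1) + p \cdot (-2) = 1 - 2p, \qquad 1 - 2p > 0 \iff p < \tfrac{1}{2}.
$$

This is only a single-step sketch: it ignores whatever points might be recovered in later rounds, so it should be read as the break-even condition for one exchange rather than for a whole debate.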

Debate Minus Factored Cognition

There are two arguments:

1. Your assumption + automatic verification of questions of the form "What is the best defeater to X" implies Weak Factored Cognition (which as defined in my original comment is of the form "there exists a tree such that..." and says nothing about what equilibrium we get).

Right, of course, that makes more sense. However, I'm still feeling dense -- I still have no inkling of how you would argue weak factored cognition from #1 and #2. Indeed, Weak FC seems far too strong to be established from anything resembling #1 and #2: WFC says that for any question Q with a correct answer A, there exists a tree. In terms of the computational complexity analogy, this is like "all problems are in PSPACE". Presumably you intended this as something like an operational definition of "correct answer" rather than an assertion that all questions are answerable by verifiable trees? In any case, #1 and #2 don't seem to imply anything like "for all questions with a correct answer..." -- indeed, #2 seems irrelevant, since it is about what arguments players can reliably find, not about what the human can verify.

2. Weak Factored Cognition + debate + human judge who assumes optimal play implies an honest equilibrium. (Maybe also: if you assume debate trees terminate, then the equilibrium is unique. I think there's some subtlety here though.)

I'll just flag that I still don't know this argument, either, and I'm curious where you're getting it from / what it is. (I have a vague recollection that this argument might have been explained to me in some other comment thread about debate, but, I haven't found it yet.) But, you understandably don't focus on articulating your arguments 1 or 2 in the main body of your comment, instead focusing on other things. I'll leave this comment as a thread for you to articulate those two arguments further if you feel up to it, and make another comment to reply to the bulk of your comment.