Vladimir_Nesov's Comments

Why Ranked Choice Voting Isn't Great

Maybe "near-fatal" is too strong a word; the comment I replied to also had examples. But the existence of examples doesn't distinguish winning from mere survival, from seeing some use. I understand the statement I replied to as meaning something like "In 200 years, if the world remains mostly as we know it, the probability that most elections use cardinal voting methods is above 50%". This seems implausible to me for the reasons I listed, hence my question about what you actually meant; perhaps my interpretation of the statement is not what you intended. (Is "long run" something like 200 years? Is "winning" something like "most elections of some kind use cardinal voting methods"?)

Why Ranked Choice Voting Isn't Great

Cardinal voting methods will win in the long run.

(What kind of long run? Why is this to be expected?) Popularity is not based only on merit, and being more complicated than the simplest, most familiar method sounds like a near-fatal disadvantage. Voting's entanglement with politics makes it even harder for good arguments to influence what actually happens.
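
For readers unfamiliar with the terminology, here is a minimal sketch (all ballots and candidate names invented) of what distinguishes a cardinal method like score voting from plurality: cardinal ballots carry independent ratings for every candidate, so a broadly acceptable compromise can win even when it is nobody's first choice.

```python
# Invented example: each voter rates each candidate 0-5 (a cardinal ballot).
ballots = [
    {"A": 5, "B": 4, "C": 0},
    {"A": 5, "B": 4, "C": 0},
    {"A": 5, "B": 4, "C": 0},
    {"A": 0, "B": 4, "C": 5},
    {"A": 0, "B": 4, "C": 5},
]

def score_winner(ballots):
    """Score (range) voting: sum each candidate's ratings; highest total wins."""
    totals = {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
    return max(totals, key=totals.get)

def plurality_winner(ballots):
    """Plurality: each voter names only their top-rated candidate."""
    counts = {}
    for ballot in ballots:
        favorite = max(ballot, key=ballot.get)
        counts[favorite] = counts.get(favorite, 0) + 1
    return max(counts, key=counts.get)

print(score_winner(ballots))      # -> B (totals: A=15, B=20, C=10)
print(plurality_winner(ballots))  # -> A (favorites: A=3, C=2, B=0)
```

The extra expressiveness is exactly the complexity cost the comment points at: the cardinal ballot asks more of every voter than naming a single favorite.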

Maybe Lying Doesn't Exist

The problem with unrestrained consequentialism is that it accepts no principles in its designs. An agent that only serves a purpose has no knowledge of the world or mathematics; it makes no plans and maintains no goals. It is what it needs to be, and no more. All these things are expressed only as aspects of its behavior, godshatter of the singular purpose, but no part of it seeks excellence in any of the aspects.

For an agent designed around multiple aspects, its parts rely on each other in dissimilar ways, not as subagents with different goals. Access to knowledge is useful for planning and can represent goals. Exploration and reflection refine knowledge and formulate goals. Planning optimizes exploration and reflection, and leads to achievement of goals.

If the part of the design that should hold knowledge accepts a claim for reasons other than arguments about its truth, the rest of the agent can no longer rely on its claims as reflecting knowledge.

Of course you'd have to also patch the specification

In my comment, I meant the situation where the specification is not patched (and by "specification" in the programming example I meant the informal description, on the level of procedures or datatypes, that establishes some principles of what the code should be doing).

In the case of appeal to consequences, the specification is a general principle that a map reflects the territory to the best of its ability, so it's not a small thing to patch. Optimizing a particular belief according to the consequences of holding it violates this general specification. If the general specification is patched to allow this, you no longer have access to straightforwardly expressed knowledge (there is no part of cognition that satisfies the original specification).

Alternatively, specific beliefs could be marked as motivated, so the specification is to have two kinds of beliefs, with some of them surviving to serve the original purpose. This might work, but then actual knowledge that corresponds to the motivated beliefs won't be natively available, and it's unclear what the motivated beliefs should be doing. Will curiosity act on the motivated beliefs, should they be used for planning, can they represent goals? A more developed architecture for reliable hypocrisy might actually do something sensible, but it's not a matter of merely patching particular beliefs.
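
The two-kinds-of-beliefs architecture can be made concrete with a toy sketch (every name here is invented, not a real proposal): a belief store whose original specification is "claims are accepted only on evidence about their truth", extended with a separately marked channel for motivated beliefs. The sketch makes visible the cost the comment describes: once a claim lives in the motivated channel, the evidence-driven knowledge that would have corresponded to it simply isn't there for planning to use.

```python
# Illustrative toy only; all names invented.
class BeliefStore:
    def __init__(self):
        self._beliefs = {}    # claim -> credence, evidence-driven updates only
        self._motivated = {}  # claim -> credence, held for its consequences

    def update_on_evidence(self, claim, credence):
        """The original specification: only truth-tracking updates."""
        self._beliefs[claim] = credence

    def adopt_for_consequences(self, claim, credence):
        """The 'patch': a belief held because holding it is useful,
        marked so the rest of the agent can tell it apart."""
        self._motivated[claim] = credence

    def credence_for_planning(self, claim):
        """Planning must ignore motivated beliefs, or the world-model it
        uses is optimized for something other than accuracy."""
        return self._beliefs.get(claim)

store = BeliefStore()
store.update_on_evidence("it will rain", 0.7)
store.adopt_for_consequences("I will win", 0.99)
print(store.credence_for_planning("it will rain"))  # -> 0.7
print(store.credence_for_planning("I will win"))    # -> None: no actual knowledge
```

The open questions in the comment map onto the undefined parts of the sketch: nothing here says whether curiosity, exploration, or goal representation should ever read from `_motivated`.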

Maybe Lying Doesn't Exist

correctly weigh these kinds of considerations against each other on a case-by-case basis

The very possibility of intervention based on weighing map-making and planning against each other destroys their design, if they are to have a design at all. It's similar to patching a procedure in a way that violates its specification in order to improve the program's overall performance or to fix an externally observable bug. In theory this can be beneficial; in practice the ability to reason about what's going on deteriorates.
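
The procedure analogy can be shown in miniature (everything here is an invented toy, not taken from the post): a function with a clear specification, and a patch that improves some externally visible behavior by quietly violating that specification, after which callers can no longer reason locally about what the function returns.

```python
def clamp(x, lo, hi):
    """Specification: return x limited to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def clamp_patched(x, lo, hi):
    """Patched to 'fix' a hypothetical downstream bug triggered when the
    result lands exactly on hi. The visible symptom goes away, but the
    function no longer satisfies its own specification: it can now return
    a value slightly outside the documented behavior's guarantees."""
    y = max(lo, min(hi, x))
    return y - 1e-9 if y == hi else y

print(clamp(5, 0, 3))          # -> 3: the spec holds
print(clamp_patched(5, 0, 3))  # -> 2.999999999: the spec is quietly broken
```

Every caller that relied on "the result can equal `hi`" is now subtly wrong, which is the deterioration of reasoning the comment describes.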

Towards a mechanistic understanding of corrigibility

I agree that exotic decision algorithms or preference transformations are probably not going to be useful for alignment, but I think this kind of activity is currently more fruitful for theory building than directly trying to get decision theory right. It's just that the usual framing is suspect: instead of being presented as exploration of the decision theory landscape through clearly broken, insane-acting, or useless but not yet well-understood constructions, these things are pitched (and chosen) for their perceived usefulness in alignment.

TurnTrout's shortform feed

I suspect that it doesn't matter how accurate or straightforward a predictor is in modeling people. What would make prediction morally irrelevant is that it's not noticed by the predicted people, irrespective of whether this happens because the predictor spreads the moral weight conferred to them over many possibilities (giving inaccurate prediction), keeps the representation sufficiently baroque, or for some other reason. In the case of inaccurate prediction or a baroque representation, it probably does become harder for the predicted people to notice being predicted, and I think this is the actual source of moral irrelevance, not those things on their own. A more direct way of getting the same result is to predict counterfactuals in which the people you reason about don't notice the fact that you are observing them, which also gives a form of inaccuracy (imagine that your predicting them is part of their prior; that will drive the counterfactual further from reality).

Open & Welcome Thread - September 2019

This still puts these comments in Recent Comments on GreaterWrong, and the fact that they can't be seen on the LessWrong All Comments page is essentially a bug.

A Critique of Functional Decision Theory

By the way, selfish values seem related to the reward vs. utility distinction. An agent that pursues a reward concerning particular events in the world, rather than a more holographic valuation, seems more like a selfish agent in this sense than a maximizer of a utility function with small-in-space support. If a reward-seeking agent looks for reward-channel-shaped patterns instead of the instance of a reward channel in front of it, it might tile the world with reward channels, or search the world for more of them, or something like that.

Proving Too Much (w/ exercises)

"I think, therefore I am."

(This is also incorrect, because considering a thinking you in a counterfactual makes sense. Many UDTish examples demonstrate that this principle doesn't hold.)

Formalising decision theory is hard

I was never convinced that "logical ASP" is a "fair" problem. I once joked with Scott that we can consider a "predictor" that is just the single line of code "return DEFECT" but in the comments it says "I am defecting only because I know you will defect."

I'm leaning this way as well, but I think it's an important clue to figuring out commitment races. ASP Predictor, DefectBot, and a more general agent will make different commitments, and these things are already algorithms specialized for certain situations. How is the chosen commitment related to what the thing making the commitment is?

When an agent can manipulate a predictor in some sense, what should the predictor do? If it starts scheming with its thoughts, it's no longer a predictor; it's just another agent that wants to do something "predictory". Maybe it can only give up, as in ASP, which acts as a precommitment that's more thematically fitting for a predictor than for a general agent. It's still a commitment race then, but possibly the meaning of something being a predictor is preserved by restricting the kind of commitment that it makes: the commitment of a non-general agent is what it is rather than what it does, and a general agent is only committed to its preference. Thus a general agent loses all knowledge in an attempt to out-commit others, because it hasn't committed to that knowledge, didn't make it part of what it is.
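
The joke about the one-line "predictor" can be made concrete with a toy sketch (all function names invented): behaviorally, a bot that always defects and a limited predictor that gives up on opponents it cannot simulate produce the same move against a defector; what differs is the structure that produced the move.

```python
# Toy open-source prisoner's dilemma; invented illustration only.
COOPERATE, DEFECT = "C", "D"

def defect_bot(opponent):
    # "I am defecting only because I know you will defect."
    return DEFECT

def limited_predictor(opponent):
    """Cooperates iff it can actually predict cooperation. Against an
    opponent it cannot finish simulating, it gives up and defects,
    roughly in the spirit of the ASP predictor."""
    try:
        prediction = opponent(limited_predictor)
    except RecursionError:
        return DEFECT  # simulation didn't terminate: give up
    return COOPERATE if prediction == COOPERATE else DEFECT

def cooperate_bot(opponent):
    return COOPERATE

print(defect_bot(cooperate_bot))         # -> D, regardless of opponent
print(limited_predictor(cooperate_bot))  # -> C: the prediction succeeded
print(limited_predictor(defect_bot))     # -> D: same observable move as defect_bot
```

Against `defect_bot`, the two are observationally identical, which is why the comment-in-the-code "predictor" is hard to rule out as unfair from behavior alone.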
