Vladimir_Nesov

Comments

Bob Jacobs's Shortform

(I think making arguments clear is more meaningful than using them for persuasion.)

Bob Jacobs's Shortform

It's not clear what "subjective idealism is correct" means, because it's not clear what "a given thing is real" means (at least in the context of this thread). It should be clearer what a claim means before it makes sense to discuss levels of credence in it.

If we are working with credences assigned to hypotheticals, the fact that the number of disjoint hypotheticals incompatible with some hypothetical S is large doesn't in itself make them (taken together) more probable than S. (A sum of infinitely many small numbers can still be small.)
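A toy illustration (the specific numbers are chosen only for the example): assign the disjoint alternatives $H_1, H_2, \dots$ the credences $P(H_n) = 2^{-(n+2)}$. Then

$$\sum_{n=1}^{\infty} P(H_n) = \sum_{n=1}^{\infty} 2^{-(n+2)} = \frac{1}{4},$$

so infinitely many incompatible alternatives, taken together, can still be less probable than a single hypothetical with $P(S) = \frac{1}{2}$.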

Working with credences in hypotheticals is not the only possible way to reason. If we are talking about weird things like subjective idealism, assumptions about epistemics are not straightforward and should be considered.

Null-boxing Newcomb’s Problem

If the trickster god personally reads the prediction, their behavior can depend on the prediction, which makes diagonalization possible (ask the trickster god what the prediction was, then do the opposite). This calls the claim of 100% precision of the predictor into question (or at least makes the details of its meaning relevant).
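A minimal sketch of the diagonalization (the two-action framing and the names are chosen only for the illustration):

```python
def trickster(prediction: str) -> str:
    # The agent learns the prediction, then does the opposite.
    return "two-box" if prediction == "one-box" else "one-box"

# Whichever prediction is produced, the agent's actual action differs,
# so no prediction about this agent can be 100% precise.
for prediction in ("one-box", "two-box"):
    assert trickster(prediction) != prediction
```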

Maximal Ventilation

Not doubting it at all

I don't think doubting should be socially frowned upon, or considered uncalled-for, where one doesn't see an argument that makes a claim evident.

Why Ranked Choice Voting Isn't Great

Maybe "near-fatal" is too strong a word, the comment I replied to also had examples. Existence of examples doesn't distinguish winning from survival, seeing some use. I understand the statement I replied to as meaning something like "In 200 years, if the world remains mostly as we know it, the probability that most elections use cardinal voting methods is above 50%". This seems implausible to me for the reasons I listed, hence the question about what you actually meant, perhaps my interpretation of the statement is not what you intended. (Is "long run" something like 200 years? Is "winning" something like "most elections of some kind use cardinal voting methods"?)

Why Ranked Choice Voting Isn't Great

Cardinal voting methods will win in the long run.

(What kind of long run? Why is this to be expected?) Popularity is not based on merit alone; being more complicated than the simplest, most familiar method sounds like a near-fatal disadvantage. Voting's entanglement with politics makes it even harder for good arguments to influence what actually happens.

Maybe Lying Doesn't Exist

The problem with unrestrained consequentialism is that it accepts no principles in its designs. An agent that only serves a purpose has no knowledge of the world or of mathematics; it makes no plans and maintains no goals. It is what it needs to be, and no more. All these things are only expressed as aspects of its behavior, godshatter of the singular purpose, but there is no part that seeks excellence in any of the aspects.

For an agent designed around multiple aspects, its parts rely on each other in dissimilar ways, not as subagents with different goals. Access to knowledge is useful for planning and can represent goals. Exploration and reflection refine knowledge and formulate goals. Planning optimizes exploration and reflection, and leads to achievement of goals.

If the part of the design that should hold knowledge accepts a claim for reasons other than arguments about its truth, the rest of the agent can no longer rely on its claims as reflecting knowledge.

Of course you'd have to also patch the specification

In my comment, I meant the situation where the specification is not patched (and by specification in the programming example I meant the informal description on the level of procedures or datatypes that establishes some principles of what it should be doing).

In the case of appeal to consequences, the specification is a general principle that a map reflects the territory to the best of its ability, so it's not a small thing to patch. Optimizing a particular belief according to the consequences of holding it violates this general specification. If the general specification is patched to allow this, you no longer have access to straightforwardly expressed knowledge (there is no part of cognition that satisfies the original specification).

Alternatively, specific beliefs could be marked as motivated, so the specification is to have two kinds of beliefs, with some of them surviving to serve the original purpose. This might work, but then actual knowledge that corresponds to the motivated beliefs won't be natively available, and it's unclear what the motivated beliefs should be doing. Will curiosity act on the motivated beliefs, should they be used for planning, can they represent goals? A more developed architecture for reliable hypocrisy might actually do something sensible, but it's not a matter of merely patching particular beliefs.
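A rough sketch of what two kinds of beliefs might look like as a data structure (all names are hypothetical; the unanswered questions above reappear as the cases the code doesn't handle):

```python
from dataclasses import dataclass

@dataclass
class Belief:
    claim: str
    credence: float
    motivated: bool  # True if held for its consequences, not its truth

beliefs = [
    Belief("it will rain tomorrow", 0.3, motivated=False),
    Belief("I will certainly succeed", 0.9, motivated=True),
]

def beliefs_for_planning(store: list[Belief]) -> list[Belief]:
    # Planning can only rely on beliefs held for truth-tracking reasons;
    # what, if anything, should consume the motivated ones is unclear.
    return [b for b in store if not b.motivated]
```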

Maybe Lying Doesn't Exist

correctly weigh these kinds of considerations against each other on a case-by-case basis

The very possibility of intervention based on weighing map-making and planning against each other destroys their design, if they are to have a design. It's similar to patching a procedure in a way that violates its specification in order to improve overall performance of the program or to fix an externally observable bug. In theory this can be beneficial, but in practice the ability to reason about what's going on deteriorates.
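A minimal illustration of the analogy (hypothetical code; the hard-coded branch stands in for an intervention justified by overall consequences rather than by the procedure's own specification):

```python
def sqrt_floor(n: int) -> int:
    """Specification: return the largest integer r with r * r <= n (n >= 0)."""
    if n == 7:
        # Patch: returning 3 here happens to cancel an off-by-one bug in
        # one caller, improving the program's observable behavior, but it
        # violates the specification (the correct value is 2).
        return 3
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

# Every other caller that reasoned from the specification is now suspect:
# local correctness can no longer be established without global knowledge.
```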

Towards a mechanistic understanding of corrigibility

I agree that exotic decision algorithms or preference transformations are probably not going to be useful for alignment, but I think this kind of activity is currently more fruitful for theory building than directly trying to get decision theory right. It's just that the usual framing is suspect: instead of being presented as exploration of the decision theory landscape through clearly broken, insane-acting, or useless but not yet well-understood constructions, these things are pitched (and chosen) for their perceived use in alignment.
