Rationalism before the Sequences

I recently learned of a free (donation-funded) service, siftrss.com, wherein you can take an RSS feed and do text-based filtering on any of its fields to produce a new RSS feed.  (I've made a few feeds with it and it seems to work well.)  I suspect you could filter based on the "category" field.
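
To make the "filter on the category field" idea concrete, here is a minimal local sketch of that kind of filtering. It is not siftrss's implementation or API; the sample feed, the function names, and the case-insensitive substring match are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, made up for this sketch.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Post A</title><category>Rationality</category></item>
    <item><title>Post B</title><category>Fiction</category></item>
    <item><title>Post C</title><category>Rationality</category><category>History</category></item>
  </channel>
</rss>"""


def filter_by_category(rss_text: str, wanted: str) -> str:
    """Return a new RSS document keeping only items with a matching <category>."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    # Copy the item list first, since we remove items while looping.
    for item in list(channel.findall("item")):
        categories = [c.text or "" for c in item.findall("category")]
        # Assumed matching rule: case-insensitive substring match.
        if not any(wanted.lower() in c.lower() for c in categories):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


def item_titles(rss_text: str) -> list:
    """Helper for inspecting the result: titles of all items in the feed."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]


filtered = filter_by_category(SAMPLE_FEED, "rationality")
kept_titles = item_titles(filtered)
print(kept_titles)
```

Whether a hosted filter matches on exact text, substrings, or regular expressions would depend on the service; the substring rule here is just the simplest thing to show.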

Defending the non-central fallacy

At a glance, I don't think I've seen the following points made, so I'll do so:

  • The general approach from math and the sciences is to make the definitions rigorous, from which the intended conclusions will necessarily follow.  For example, any object of mass 10kg near Earth's surface will experience a force of roughly 98.1 N toward Earth's center due to Earth's gravity.  There is no "non-central example" of "an object of mass 10kg near Earth's surface".  Well, perhaps I should specify what "near" is, for example as being within 1% of 6371 km of Earth's center.  Then quantifying that allows me to quantify "roughly" as well, by plugging the minimum and maximum of the radius values into GmM/r^2.
  • Observe how we refine both the conditions and the conclusions in the above.
  • Arguments that have been refined in this way will be of the form "1. Object X meets the conditions of belonging to group Y.  2. It is a theorem that statement Z applies to all objects in group Y.  3. Therefore Z(X)."
  • If you have the proof of the theorem at hand, then it should be easy to taboo the name of group Y and just plug X into the text of the proof and get an equally rigorous argument.  Sometimes that would be a better approach to the whole issue; though if group Y is common enough, it might be worth the work of establishing the definition.
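
As a rough sketch of the "quantify near, then quantify roughly" step in the gravity example: the code below plugs the minimum and maximum radius into GmM/r^2, taking "near" to mean a radius within 1% of 6371 km.  It assumes a simple point-mass model of Earth (no rotation, no oblateness) and uses the standard gravitational parameter GM; the function and variable names are mine.

```python
# Standard gravitational parameter of Earth (G * M), in m^3/s^2.
GM_EARTH = 3.986004418e14
# Mean radius of Earth in metres (the 6371 km figure from the example).
R_MEAN = 6.371e6


def gravity_force(mass_kg: float, radius_m: float) -> float:
    """Newtonian force G*m*M/r^2 on a mass at distance radius_m from Earth's center."""
    return GM_EARTH * mass_kg / radius_m**2


mass = 10.0
f_nominal = gravity_force(mass, R_MEAN)
# "Near" is assumed to mean within 1% of the mean radius.
f_max = gravity_force(mass, 0.99 * R_MEAN)  # closest to the center: strongest pull
f_min = gravity_force(mass, 1.01 * R_MEAN)  # farthest from the center: weakest pull

print(f"nominal: {f_nominal:.2f} N, range: {f_min:.2f} N to {f_max:.2f} N")
```

Under these assumptions "roughly 98.1 N" becomes an interval of about 96.3 N to 100.2 N, so both the condition and the conclusion now have explicit bounds.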

To apply it to the "taxation is theft" argument: Well, basically, we want to make a rigorous definition of "theft" (or possibly a new term) that covers taxation, and see how much we can retain.

Taxation is nonconsensual taking of someone's rightful property (some might argue about "consent" in a democracy; let's assume that Bob is objecting to being taxed, and he did everything he could to vote against it, to support politicians who said they would reduce or even abolish taxes, etc., yet was unsuccessful).  We would therefore be able to make arguments like "under certain moral systems, taxation is immoral", and "because it doesn't require the consent of those being taxed, even if some instances of taxation were net good in some way, we'd expect to end up with a lot more instances than that unless there were strong barriers preventing it", and "taxation reduces people's incentive to trade their labor for property, because some of that property will go missing", and "taxation incentivizes people to spend energy arranging their possessions in ways that are less likely to get taken, which is a waste".

On the other hand, certain other characteristics that are common in theft are not the case: taxes are generally mostly known in advance, while theft is mostly unpredictable; and where theft might take a poor-ish person whose primary asset is a car and suddenly bankrupt them, taxes are unlikely to do that.

Taxation is centralized, systematized theft.  The systematization has its benefits and civilizing effects: economies of scale and reduced risk and variation in the collection process—similar to what you get when you industrialize other processes.  Also, we probably benefit from a "tragedy of the commons" among those who receive the taxes: most individuals don't have a strong incentive to raise taxes a lot.

For murder and capital punishment: Let's first note that the legal profession has made distinctions: "first-degree murder", "second-degree", etc. (it seems to vary by jurisdiction), not to mention manslaughter, and of course there are cases like self-defense where it may not even be a crime.  "Homicide" is what they call "killing" without implying anything about the legality.  First-degree murder, the worst, seems to mean "murder pre-meditated in cold blood", while the lesser degrees apply when there are extenuating circumstances and less pre-meditation.

Executing a prisoner is 100% pre-meditated in cold blood.  The argument for it to be legal is to cast it as self-defense and/or revenge.  Are revenge killings legal?  It seems like they sort of used to be, and then at some point States generally disallowed individuals from doing it, while arrogating that function to themselves.  As for self-defense... one could argue that the criminal, having committed their crimes (like murder), has shown they are a threat, but really that's not a strong enough data point.  (What fraction of murderers do it again? ... A Google result says between 2% and 16% for different groups.  What if they killed their brother out of enmity that began in childhood, and they have no more brothers? Also, "got drunk and angry in a bar argument and killed a stranger" is a lot more likely to recur, yet would probably be second-degree, while the "brother" scenario might be first-degree.)

Capital punishment is centralized, systematized revenge-killing.  Once again, the systematization brings benefits and civilizing effects: economies of scale, reduced risk and variation.  I would not say that this changes the morality of it, only the tactical utility.  (I haven't actually said whether I think revenge-killing itself is moral.)

Anyway, on the subject of the original frame: "capital punishment is murder", given the definition "murder = killing without proper justification", assumes the conclusion that capital punishment should be illegal.  If you want a different definition, I would say use a different term.  If it were "capital punishment is killing", that would be an uncontroversial statement of fact; nor would the argument "killing is necessarily bad" persuade more than a few pacifists.

"Capital punishment is revenge-killing" would be the closest to an argument we can break into its pieces, "killing people for retaliation is bad (to the point where we should have a policy against it)" and "capital punishment is a policy of killing in retaliation", and attempt to justify both pieces.  Though some of the arguments people would like to make, like "revenge killings generally lead to generations-long family feuds", would not extend to the State's centralized revenge killings; in constructing or evaluating such arguments, the key technique of rigor is to notice statements that are actually "(we've seen in the past that) revenge killings (often) lead to family feuds" when they should be "(we can prove that) revenge killings (necessarily create conditions that likely) lead to family feuds", and in trying to prove that you should either notice that the definition of "revenge killing" doesn't specify that it's carried out by a family member—or, if it does, then notice that clause of the definition doesn't apply to capital punishment.

Utility Maximization = Description Length Minimization

Hmm.  If we bring actual thermodynamics into the picture, then I think that energy stored in some very usable way (say, a charged battery) has a small number of possible states, whereas when you expend it, it generally ends up as waste heat that has a lot of possible states.  In that case, if someone wants to take a bunch of stored energy and spend it on, say, making a robot rotate a huge die made of rock into a certain orientation, then that actually leads to a larger state space than someone else's preference to keep the energy where it is, even though we'd probably say that the former is costlier than the latter.  We could also imagine a third person who prefers to spend the same amount of energy arranging 1000 smaller dice—same "cost", but exponentially (in the mathematical sense) different state space shrinkage.

It seems that, no matter how you conceptualize things, it's fairly easy to construct a set of examples in which state space shrinkage bears little if any correlation to either "expected utility" or "cost".

Utility Maximization = Description Length Minimization

There is not any meaningful sense in which utility changes are "large" or "small" in the first place, except compared to other changes in the same utility function.

We can establish a utility scale by tweaking the values a bit.  Let's say that in my favored 3/4 of the state space, half the values are 1 and the other half are 2.  Then we can set the disfavored 1/4 to 0, to -100, to -10^100, etc., and get utility functions that aren't equivalent.  Anyway, in practice I expect we would already have some reasonable unit established by the problem's background—for example, if the payoffs are given in terms of number of lives saved, or in units of "the cost of the action that 'optimizes' the situation".

Satisfying your preferences requires shrinking the world-space by a relatively tiny amount, and that's important. [...] satisfying your preferences is "easy" and "doesn't require optimizing very much"; you have a very large target to hit.

So the theory is that the fraction by which you shrink the state space is proportional (or maybe its logarithm is proportional) to the effort involved.  That might be a better heuristic than none at all, but it is by no means true in general.  If we say I'm going to type 100 digits, and then I decide what those digits are and type them out, I'm shrinking the state-space by a factor of 10^100.  If we say my net worth is between $0 and $10^12, and then I make my net worth be $10^12, I'm shrinking the state-space (in that formulation of the world) by a factor of only 10^12 (or perhaps 10^14 if cents are allowed); but the former is enormously easier for me to do than the latter.  In practice, again, I think the problem's background would give much better ways to estimate the cost of the "optimization" actions.
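
For a sense of scale, the two shrinkage factors can be put in bits.  This is only a sketch of the arithmetic, assuming each scenario ends in exactly one state out of the stated number of equally weighted possibilities; the function name is mine.

```python
import math


def shrink_bits(states_before: int, states_after: int = 1) -> float:
    """Bits of state-space shrinkage when going from states_before to states_after states."""
    return math.log2(states_before) - math.log2(states_after)


# Typing 100 chosen digits: one outcome out of 10^100.
digits_bits = shrink_bits(10**100)
# Setting net worth to a specific whole-dollar value out of 10^12 possibilities.
net_worth_bits = shrink_bits(10**12)
# Same, but counting cents (10^14 possibilities).
net_worth_cents_bits = shrink_bits(10**14)

print(f"100 digits: {digits_bits:.1f} bits")
print(f"net worth (dollars): {net_worth_bits:.1f} bits")
print(f"net worth (cents): {net_worth_cents_bits:.1f} bits")
```

Typing the digits is the far larger "optimization" by this measure (a bit over 330 bits against roughly 40 to 47), even though it is by far the cheaper act.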

(Edit: If you want an entirely self-contained example, consider: A wall with 10 rows of 10 cubby-holes, and you have 10 heavy rocks.  One person wants the rocks to fill out the bottom row, another wants them to fill out the left column, and a third wants them on the top row.  At least if we consider the state space to just be the positions of the rocks, then each of these people wants the same amount of state-space shrinking, but they cost different amounts of physical work to arrange.)
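
The cubby-hole example can be checked with a short sketch.  The assumptions here are mine: the rocks are identical, each cubby holds at most one rock, all rocks start on the floor level with the bottom row, and "work" is counted as the number of rows each rock is lifted (in units of rock weight times row height).

```python
import math

ROWS = COLS = 10
N_ROCKS = 10

# Number of ways to place 10 identical rocks into 100 cubbies, one rock per cubby.
total_states = math.comb(ROWS * COLS, N_ROCKS)
# Each person's target is a single arrangement, so the shrinkage is the same for all.
shrink_bits = math.log2(total_states)


def lifting_work(cells):
    """Total rows lifted, assuming every rock starts at the height of row 0."""
    return sum(row for row, _col in cells)


# Cells are (row, column) pairs; row 0 is the bottom row, column 0 the left column.
targets = {
    "bottom row": [(0, c) for c in range(COLS)],
    "left column": [(r, 0) for r in range(ROWS)],
    "top row": [(ROWS - 1, c) for c in range(COLS)],
}
work = {name: lifting_work(cells) for name, cells in targets.items()}

print(f"shrinkage for every target: {shrink_bits:.1f} bits")
print(work)
```

All three targets shrink the state space by the same number of bits, while the lifting work comes out as 0, 45, and 90 units respectively.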

I'm guessing that the best application of the idea would be as one of the basic first lenses you'd use to examine/classify a completely alien utility function.

Utility Maximization = Description Length Minimization

The title, "Utility Maximization = Description Length Minimization", and likewise the bolded statement, "to “optimize” a system is to reduce the number of bits required to represent the system state using a particular encoding", strike me as wrong in the general case, or as only true in a degenerate sense that can't imply much.  This is unfortunate, because it inclines me to dismiss the rest of the post.

Suppose that the state of the world can be represented in 100 bits.  Suppose my utility function assigns a 0 to each of 2^98 states (which I "hate"), and a 1 to all the remaining (2^100 - 2^98) states (which I "like").  Let's imagine I chose those 2^98 states randomly, so there is no discernible pattern among them.

You would need 99.58 bits to represent one state out of the states that I like.  So "optimizing" the world would mean reducing it from a 100-bit space to a 99.58-bit space (which you would probably end up encoding with 100 bits in practice).  While it's technically true that optimizing always implies shrinking the state space, the amount of shrinking can be arbitrarily tiny, and is not necessarily proportional to the amount by which the expected utility changes.  Thus my objection to the title and early statement.
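
The 99.58 figure can be reproduced with a few lines.  This sketch assumes a uniform distribution over the 2^100 states before optimizing and over the liked states afterwards; the variable names are mine.

```python
import math

TOTAL_BITS = 100
hated_states = 2**98                          # utility 0
liked_states = 2**TOTAL_BITS - hated_states   # utility 1; equals 3 * 2^98

# Bits needed to pick out one liked state: log2(3 * 2^98) = 98 + log2(3).
bits_liked = math.log2(liked_states)
bits_saved = TOTAL_BITS - bits_liked

# Expected utility under a uniform distribution, before and after restricting to liked states.
eu_before = liked_states / 2**TOTAL_BITS      # 0.75
eu_after = 1.0

print(f"bits for a liked state: {bits_liked:.3f}")
print(f"bits saved: {bits_saved:.3f}")
print(f"expected utility: {eu_before} -> {eu_after}")
```

So moving expected utility from 0.75 to its maximum of 1 corresponds to saving well under half a bit of description length in this construction.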

It probably is true in practice that most real utility functions are much more constraining than the above scenario.  (For example, if you imagine all the possible configurations of the atoms that make up a human, only a tiny fraction of them correspond to a living human.)  There might be interesting things to say about that.  However, the post doesn't seem to base its central arguments on that.

Given what is said later about using K-L divergence to decompose the problem into "reducing entropy" + "changing between similar-entropy distributions", I could say that the post makes the case for me: that a more accurate title would be "Utility Maximization = Description Length Minimization + Other Changes" (I don't have a good name for the second component).

Luna Lovegood and the Chamber of Secrets - Part 12

Yeah, I assumed the same.  The chapter specifies "episodic memory" (although, somewhat confusingly, it says "everything" earlier in the sentence):

Everything, forget everything, Tom Riddle, Professor Quirrell, forget your whole life, forget your entire episodic memory, forget the disappointment and the bitterness and the wrong decisions, forget Voldemort -

It seems this is a real thing that can happen.  "In the case of dissociative amnesia, individuals are separated from their memories ... they may forget who they are and everything about themselves and their personal history", yet they can walk and talk and do everything well enough to "move to a new location and establish a new identity" as an adult: https://www.psychologytoday.com/us/conditions/dissociative-amnesia

Luna Lovegood and the Chamber of Secrets - Part 12

Also, "Quirrell" is globally missing its second l.

Luna Lovegood and the Chamber of Secrets - Part 12

Lord Voldemort should be unable to do anything due to being supposedly killed for good by Obliviation

HPMOR chapter 115 says: "After future-Harry had figured out what to do with an almost-completely-amnesiac wizard who still had some bad habits of thought and some highly negative emotional patterns - a dark side, as 'twere - plus a great deal of declarative and procedural knowledge about powerful magic. Harry had tried his best not to Obliviate that part, because he might need it, someday."