


This isn't precisely what Daniel_Burfoot was talking about but it's a related idea based on "sparse coding" and it has recently obtained good results in classification:

Here the "theories" are hierarchical dictionaries (a discrete hierarchical index set plus a set of vectors) which perform a compression by creating reconstructions of the data. Although they weren't developed with this in mind, support vector machines do this as well, since one finds a small number of "support vectors" that essentially allow you to compress the information about decision boundaries in classification problems (support vector machines are one of the very few things from machine learning that have had significant and successful impacts elsewhere since neural networks).
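To make "compression by reconstruction" concrete, here is a minimal sketch, not the method from the work referred to above. It assumes a small fixed (not learned, not hierarchical) dictionary of Haar-like atoms and uses greedy matching pursuit, which is a stand-in technique chosen for brevity. The names `DICTIONARY` and `matching_pursuit` and the toy signal are hypothetical.

```python
import math

S = 1 / math.sqrt(2)

# Assumed toy dictionary: four orthonormal Haar-like atoms in R^4.
DICTIONARY = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [S, -S, 0.0, 0.0],
    [0.0, 0.0, S, -S],
]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_nonzero):
    """Greedily pick atoms that best explain the residual.

    Returns (code, residual), where code maps atom index -> coefficient.
    """
    residual = list(signal)
    code = {}
    for _ in range(n_nonzero):
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coef = scores[best]
        code[best] = code.get(best, 0.0) + coef
        residual = [r - coef * a for r, a in zip(residual, atoms[best])]
    return code, residual


signal = [3.0, 3.0, 1.0, 1.0]
code, residual = matching_pursuit(signal, DICTIONARY, n_nonzero=2)
print(code)      # two coefficients stand in for four numbers
print(residual)  # what the sparse code fails to reconstruct
```

The "theory" in this toy case is the dictionary itself: a signal that the dictionary explains well needs only a few coefficients, and the residual measures what the theory misses.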

The hierarchical dictionaries learned do contain a "theory" of the visual world in a sense, although an important idea is that they do so in a way that is sensitive to the application at hand. There is much left out by Daniel_Burfoot about how people actually go about implementing this line of thought.


(A text with some decent discussion on the topic)[]. At least one group that has a shot at winning a major speech recognition benchmark competition uses information-theoretic ideas for the development of their speech recognizer. Another development has been the use of error-correcting codes to assist in multi-class classification problems (google "error correcting codes machine learning")[] (arguably this has been the clearest example of a paradigm shift that comes from thinking about compression and that had a big impact in machine learning). I don't know how many people think about these problems in terms of information-theory questions (since I don't have much access to their thoughts), but I do know at least two very competent researchers who, although they never bring it outright into their papers, have an information-theory and compression-oriented way of posing and thinking about problems.
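The error-correcting-code idea for multi-class classification can be sketched roughly as follows. Each class gets a binary codeword, one binary classifier is trained per bit, and a prediction is decoded to the nearest codeword, so a few mistaken bit classifiers can be tolerated. The code book, class names, and helper functions below are hypothetical, and the bit classifiers are simulated by flipping bits rather than trained.

```python
# Assumed 7-bit code book for four classes; every pair of codewords
# differs in 4 positions, so any single bit error can be corrected.
CODEBOOK = {
    "class_a": (1, 1, 1, 1, 1, 1, 1),
    "class_b": (0, 0, 0, 0, 1, 1, 1),
    "class_c": (0, 0, 1, 1, 0, 0, 1),
    "class_d": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def flip(bits, i):
    """Simulate one bit classifier making a mistake at position i."""
    return tuple(b ^ 1 if j == i else b for j, b in enumerate(bits))


noisy = flip(CODEBOOK["class_c"], 2)  # third bit classifier is wrong
print(decode(noisy))
```

The redundancy in the codewords plays the same role here as in a communication channel: the label is the message and the bank of imperfect classifiers is the noisy channel.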

I often try to think of how humans process speech in terms of information theory (which is inspired by a couple of great thinkers in the area), and thus I think that it is useful for understanding and probing the questions of sensory perception.

There's also a whole literature on "sparse coding" (another compression-oriented idea originally developed by biologists but since ported over by computer vision and a few speech researchers) whose promise in machine learning may not have been realized yet, but I have seen at least a couple of somewhat impressive applications of related techniques appearing.


I have a minor disagreement, which I think supports your general point. There is definitely a type of compression going on in the algorithm; it's just that the key insight in the compression is not to just "minimize entropy" but rather to make the outputs of the encoder behave similarly to the observed data. Indeed, one of the major insights of information theory is that one wants the encoding scheme to capture the properties of the distribution over the messages (and hence over alphabets).

Namely, in Hinton's algorithm the outputs of the encoder are fed through a logistic function and then the cross-entropy is minimized (essentially the KL divergence). It seems that he is providing something more like a reparameterization of a probability mass function for pixel intensities, which is a logistic distribution when conditioned on the "deeper" nodes. Minimizing that KL divergence means that the distribution is made to be statistically indistinguishable from the distribution over the data intensities (since minimizing the KL divergence minimizes the expected log-likelihood ratio, which means minimizing the power of the uniformly most powerful test).

Minimizing entropy blindly would mean the neural network nodes would give constant output, which is very compressive but utterly useless.
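A small numerical sketch of this distinction, using the standard identity H(p, q) = H(p) + KL(p || q). The three-bin "pixel intensity" distributions below are made-up illustrative numbers, not anything taken from Hinton's model.

```python
import math


def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_i p_i log q_i, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


data = [0.5, 0.3, 0.2]              # hypothetical data distribution
matched = [0.5, 0.3, 0.2]           # model that mimics the data
near_constant = [0.98, 0.01, 0.01]  # low-entropy, nearly constant output

for name, q in [("matched", matched), ("near_constant", near_constant)]:
    print(name, entropy(q), cross_entropy(data, q), kl_divergence(data, q))
```

The nearly constant model has much lower entropy than the matched one, yet its cross-entropy against the data is far higher, which is why the training objective is the cross-entropy rather than the entropy of the outputs.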


This attacks a straw-man utilitarianism, in which you need to compute precise results and get the one correct answer. Functions can be approximated; this objection isn't even a problem.

Not every function can be approximated efficiently, though. I see the scope of morality as addressing human activity where human activity is a function space itself. In this case the "moral gradient" that the consequentialist is computing is based on a functional defined over a function space. There are plenty of function spaces and functionals which are very hard to efficiently approximate (the Bayes predictors for speech recognition and machine vision fall into this category) and often naive approaches will fail miserably.

I think the critique of utility functions is not that they don't provide meaning, but that they don't necessarily capture the meaning which we would like. The incoherence argument is that there is no utility function which can represent the thing we want to represent. I don't buy this argument mostly because I've never seen a clear presentation of what it is that we would preferably represent, but many people do (and a lot of these people study decision-making and behavior whereas I study speech signals). I think it is fair to point out that there is only a very limited biological theory of "utility" and generally we estimate "utility" phenomenologically by studying what decisions people make (we build a model of utility and try to refine it so that it fits the data). There is a potential that no utility model is actually going to be a good predictor (i.e. that there is some systematic bias). So, I put a lot of weight on the opinions of decision experts in this regard: some think utility is coherent and some don't.

The deontologist's rules seem to do pretty well as many of them are currently sitting in law books right now. They form the basis for much of the morality that parents teach their children. Most utilitarians follow most of them all the time, anyway.

My personal view is to do what I think most people do: accept many hard constraints on one's behavior and attempt to optimize over estimates of projections of a moral gradient along a few dimensions of decision-space. I.e. I try to think about how my research may be able to benefit people, I also try to help out my family and friends, I try to support things good for animals and the environment. These are areas where I feel more certain that I have some sense where some sort of moral objective function points.


I would like you to elaborate on the incoherence of deontology so I can test out how my optimization perspective on morality can handle the objections.


To be clear I see the deontologist optimization problem as being a pure "feasibility" problem: one has hard constraints and zero gradient (or approximately zero gradient) on the moral objective function given all decisions that one can make.

Of the many, many critiques of utilitarianism, some argue that it's not sensible to actually talk about a "gradient" or marginal improvement in moral objective functions. Some argue this on the basis of computational constraints: there's no way that you could ever reasonably compute a moral objective function (because the consequences of any activity are much too complicated). Other critiques argue that the utilitarian notion of "utility" is ill-defined and incoherent (hence the moral objective function has no meaning). These sorts of arguments undermine the possibility of soft constraints and moral objective functions with gradients.

The deontological optimization problem, on the other hand, is not susceptible to such critiques because the objective function is constant, and the satisfaction of constraints is a binary event.
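As a toy illustration of the pure feasibility view, here is a sketch in which options are only filtered by binary rules and never ranked. The options, their attributes, and the two rules are hypothetical placeholders, not a claim about what the right rules are.

```python
from typing import Callable, Dict, List

# Hypothetical options, each described by a few illustrative attributes.
options: Dict[str, Dict[str, bool]] = {
    "tell the truth": {"lies": False, "harms": False},
    "white lie": {"lies": True, "harms": False},
    "stay silent": {"lies": False, "harms": False},
    "deceive for gain": {"lies": True, "harms": True},
}

# Hard constraints: each returns True when the option is permissible.
constraints: List[Callable[[Dict[str, bool]], bool]] = [
    lambda o: not o["lies"],
    lambda o: not o["harms"],
]


def feasible(opts, cons):
    """Return the options that satisfy every hard constraint."""
    return [name for name, attrs in opts.items() if all(c(attrs) for c in cons)]


permissible = feasible(options, constraints)
print(permissible)
```

Because the objective is constant, every surviving option ties; nothing in the procedure needs an estimate of how much better one permissible act is than another.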

I would also argue that the most hard-core utilitarian practically acts pretty similarly to a deontologist. The reason is that we only consider a tiny subspace of all possible decisions, our estimate of the moral gradient will be highly inaccurate over most possible decision axes (I buy the computational-constraint critique), and it's not clear that we have enough information about human experience to compute those gradients. So, practically speaking, we only consider a small number of different ways to live our lives (hence we optimize over a limited range of axes), and the directions we optimize over are not random for the most part. Think about how most activists and most individuals who perform any sort of advocacy focus on a single issue.

Also consider the fact that most people don't murder or perform certain forms of horrendous crimes. These single-issue, law-abiding types may not think of themselves as deontologists, but a deontologist would behave very similarly to them, since neither attempts to estimate moral gradients over most decisions and both treat many moral rules as binary events.

The utilitarian and the deontologist are distinguished in practice in that the utilitarian computes a noisy estimate of the moral gradient along a few axes of their potential decision-space, while everywhere else they think in terms of hard constraints and no gradients on the moral objective. The pure utilitarian is at best a theoretical concept that has no potential basis in reality.


I would argue that deriving principles using the categorical imperative is a very difficult optimization problem and that there is a very meaningful sense in which one is a deontologist and not a utilitarian. If one is a deontologist then one needs to solve a series of constraint-satisfaction problems with hard constraints (i.e. they cannot be violated). In the Kantian approach, given a situation, one has to derive the constraints under which one must act in that situation via moral thinking, and then one must act in accord with those constraints.

This is very closely related to combinatorial optimization problems. I would argue that often there is a "moral dual" (in the sense of a dual program) where those constraints are no longer treated as absolute and you can assign different costs to each violation and you can then find a most moral strategy. I think very often we have something akin to strong duality where the utilitarian dual is equivalent to the deontological problem, but it's an important distinction to remember that the deontologist has hard constraints and zero gradient on their objective functions (by some interpretations).
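A rough sketch of that relationship, loosely in the spirit of a penalty or Lagrangian relaxation rather than a proof of strong duality. The options, the utility estimates, and the violation counts are all invented for illustration.

```python
# option -> (hypothetical utility estimate, number of rule violations)
options = {
    "keep promise": (1.0, 0),
    "break promise for small gain": (1.5, 1),
    "break promise and lie": (2.5, 2),
}


def hard_constrained_best(opts):
    """Hard-constraint version: discard any option with a violation."""
    allowed = {name: u for name, (u, v) in opts.items() if v == 0}
    return max(allowed, key=allowed.get)


def penalized_best(opts, penalty):
    """Soft version: each violation costs `penalty` units of utility."""
    return max(opts, key=lambda name: opts[name][0] - penalty * opts[name][1])


print(hard_constrained_best(options))
print(penalized_best(options, penalty=0.1))   # violations are cheap
print(penalized_best(options, penalty=10.0))  # violations are very costly
```

With a large enough price on violations the penalized problem picks the same option as the hard-constrained one, while a small price lets the utility estimate override the rules. That is the sense in which the two formulations can agree in ordinary cases and diverge in the contrived ones.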

The utilitarian performs a search over a continuous space for the greatest expected utility, while the deontologist (in an extreme case) has a discrete set of choices, from which the immoral ones are successively weeded out.

Both are optimization procedures, and can be shown to produce very similar output behavior but the approach and philosophy are very different. The predictions of the behavior of the deontologist and the utilitarian can become quite different under the sorts of situations that moral philosophers love to come up with.


I agree with the beginning of your comment. I would add that the authors may believe they are attacking utilitarianism, when in fact they are commenting on the proper methods for implementing utilitarianism.

I disagree that attacking utilitarianism involves arguing for a different optimization theory. If a utilitarian believed that the free market was more efficient at producing utility then the utilitarian would support it: it doesn't matter by what means that free market, say, achieved that greater utility.

Rather, attacking utilitarianism involves arguing that we should optimize for something else: for instance something like the categorical imperative. A famous example of this is Kant's argument that one should never lie (since it could never be willed to be a universal law, according to him), and the utilitarian philosopher loves to retort that lying is essential if one is hiding a Jewish family from the Nazis. But Kant would be unmoved (if you believe his writings); all that would matter are these universal principles.


Bear in mind that having more fat means that the brain gets starved of (glucose)[] and blood sugar levels have (impacts on the brain generally)[]. Some research has indicated that the amount of sugar available to the brain has a relationship with self-control. A moderately obese person may have fat cells that steal so much glucose from their brain that their brain is incapable of mustering the will to get them to stop eating poorly. Additionally, the marginal fat person is likely fat because of increased sugar consumption (sugar being the main sort of food whose intake has increased since the origins of the obesity epidemic in the 1970s). In particular, there has been a great increase in the consumption of fructose, which is capable of raising insulin levels (which signal to the body to start storing energy as fat) while at the same time not activating leptin (which makes you feel full). Thus, people are consuming a substance that may be kicking their bodies into full gear to produce more fat, which leaves them with no energy or will to perform any exercise.

The individuals most affected by the obesity epidemic are the poor, and recall that some of the cheapest sources of calories available on the market are foods like fructose and processed meats. While there is a component of volition regardless, if the body works as the evidence suggests, they may have a diet that is pushing them quite hard towards being obese, sedentary, and unable to do anything about it.

Think about it this way: if you constantly whack me over the head you can probably get me to do all sorts of things that I wouldn't normally do, but it wouldn't be right to call my behavior in that situation "voluntary". Fat people may be in a similar situation.


I think that this post has something to say about political philosophy. The problem as I see it is that we want to understand how our local decision-making affects the global picture and what constraints we should put on our local decisions. This is extremely important because, arguably, people make a lot of local decisions that make us globally worse off, such as pollution ("externalities" in econo-speak). I don't buy the author's belief that we should ignore these global constraints: they are clearly important. Indeed, it's the fear of the potential global outcomes of careless local decision-making that arguably led to the creation of this website.

However, just like computers, we have a lot of trouble integrating the global constraints into our decision-making (which is necessarily a local operation), and we probably have a great deal of bias in our estimates of what the morally best set of choices for us to make is. Just as with the algorithm, we would like to find some way to lessen the computational burden on us in order to achieve these moral ends.

There is an approach in economics to understanding social norms, advocated by Herbert Gintis [PDF], that is able to analyze these sorts of scenarios. The essential idea is this: agents can play any of multiple correlated equilibria (a generalization of Nash equilibria) made possible by various social norms. These correlated equilibria are, in a sense, patched together by a social norm from the decisions of "rational" agents (self-interested, local expected-utility maximizers). Human rights could definitely be understood in this light (I think: I haven't actually worked out the model).

Similar reasoning may also be used to understand certain types of laws and government policies. It is via these institutions (norms, human organizations, etc.) that we may efficiently impose global constraints on people's local decision-making. The karma system on Less Wrong, for instance, probably changes the way that people make their decision to comment.

There is probably a computer science and economics crossover paper here that would describe how institutions can lower the computational burden on individuals in their decision-making, so that when individuals make decisions in these simpler domains we can be sure that we will still be globally better off.

One word of caution is that this is precisely the rationale behind "command economies", and these didn't work out so well during the 20th century. So choosing the "patching together" institution well is absolutely essential.
