Friendly AI ideas needed: how would you ban porn?

by Stuart_Armstrong, 1 min read, 17th Mar 2014, 80 comments

6

Personal Blog

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.

There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?

Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.

The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?

To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?


But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?

Not saying that I would endorse this as a regulatory policy, but it's my understanding that the strategy used by e.g. the Chinese government is to not give any explicit guidelines. Rather, they ban things which they consider to be out of line and penalize the people who produced/distributed them, but only give a rough reason. The result is that nobody tries to pull tricks like obeying the letter of the regulations while avoiding the spirit of them. Quite the opposite, since nobody knows what exactly is safe, people end up playing it as safe as possible and avoiding anything that the censors might consider a provocation.

Of course this errs on the side of being too restrictive, which is a problem if the eroticism is actually somet... (read more)

3Risto_Saarelma7yIf the problem is about classifier design, I supposed that in the least convenient possible world both the porn and the high culture media were being beamed from Alpha Centauri. Instead of being able to affect the production in any way, all you could do was program the satellite relay that propagates the stuff to terrestrial networks to filter the porny bits while feeding the opera bits to lucrative pay-per-view, while trying not to think too hard about just what is going on at Alpha Centauri and why it is resulting in pitch-perfect human porn movies and opera performances being narrowcast at Earth in high resolution video.
2Gunnar_Zarncke7yWhich is exactly "refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts".
2Kaj_Sotala7yHmm, I read the original post rather quickly, so I actually missed the fact that the analogy was supposed to map to value loading. I mistakenly assumed that this was about how to ban/regulate AGI while still allowing more narrow AI.

Short answer: Mu.

Longer answer: "Porn" is clearly underspecified, and to make matters worse there's no single person or interest group that we can try to please with our solution: many different groups (religious traditionalists, radical feminists, /r/nofap...) dislike it for different and often conflicting reasons. This wouldn't be such a problem -- it's probably possible to come up with a definition broad enough to satisfy all parties' appetites for social control, distasteful as such a thing is to me -- except that we're also trying to leave ... (read more)

5Oscar_Cunningham7yNote that this is the general method for dealing with confused concepts.
2Nornagest7yYeah. An earlier version of my post started by saying so, but I decided that the OP had been explicit enough in asking for an object-level solution that I'd be better off laying out more of the reasoning behind going meta.
0itaibn07yThis all sounds reasonable to me. Now what happens when you apply the same reasoning to Friendly AI?
1Nornagest7yNothing particularly new or interesting, as far as I can tell. It tells us that defining a system of artificial ethics in terms of the object-level prescriptions of a natural ethic is unlikely to be productive; but we already knew that. It also tells us that aggregating people's values is a hard problem and that the best approaches to solving it probably consist of trying to satisfy underlying motivations rather than stated preferences; but we already knew that, too.

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.

I don't think that this is true. Reductionist solutions to philosophical problems typically pick some new concepts which can be crisply defined, and then rephrase the problem in terms of those, throwing out the old fuzzy concepts in the process. What they don't do is to take the fuzzy concepts and try to rework them.

For example, nowhere in the ... (read more)

distinguish between pornography and eroticism

Aren't you assuming these two are at different sides of a "reality joint"?

I tend to treat these words as more or less synonyms in that they refer to the same thing but express a different attitude on the part of the speaker.

2Stuart_Armstrong7yI chose those examples because the edge cases seem distinct, but the distinction seems very hard to formally define.
3Lumifer7yI don't think the edge cases are as distinct as they seem to you to be. Generally speaking, pornography and eroticism are two-argument things, the first argument is the object (the text/image/movie), and the second argument is the subject (the reader/watcher) together with all his cultural and personal baggage. Trying to assume away the second argument isn't going to work. The cultural and individual differences are too great.
0Gunnar_Zarncke7yDoesn't seem lawmakers see it the same way.

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.

Strongly disagree. The whole point of Bayesian reasoning is that it allows us to deal with uncertainty. And one huge source of uncertainty is that we don't have precise understandings of the concepts we use. When we first learn a new concept, we have a ton of uncertainty about its location in thingspace. As we collect more data (either thro... (read more)

1Kaj_Sotala7yRelated paper [http://lesswrong.com/lw/1ty/mental_crystallography/1oot]. Also sections 1 and 2 of this paper [http://users.ics.aalto.fi/tho/online-papers/TKK-ICS-R41.pdf].

In terms of AI, this is equivalent with "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you

But it is, and the contrary approach of teaching humans to recognize things doesn't have an obvious relation to FAI, unless we think that the details of teaching human brains by instruction and example are relevant to how you'd set up a similar training ... (read more)

0itaibn07yWell, conflicting values is obviously relevant, and disagreements seem so as well to a lesser extent (consider the problem of choosing priors for an AI), for starters.
0Stuart_Armstrong7yI'm just fishing in random seas for new ideas.
0Gunnar_Zarncke7yThe movie prediction question is complicated because it includes feedback cycles over styles and tastes and is probably cross-linked to other movies airing at the same time. See e.g. http://www.stat.berkeley.edu/~aldous/157/Old_Projects/kennedy.pdf The "predict taste buds?" question is better. But even that one contains feedback cycles over tastes. At least in some domains like wine and probably cigarettes and expensive socially consumed goods.

This is a sorites problem and you want to sort some pebbles into kinds. You've made clear that externalism about porn may be true (something may begin or stop being porn in virtue of properties outside its own inherent content, such as where it is and contextual features).

It seems to me that you have to prioritize your goals in this case. So the goal "ban porn" is much more important than the goal "leave eroticism alone". My response would be to play safe and ban all footage including genitals, similar to what the Japanese already do... (read more)

0Stuart_Armstrong7yI feel this kind of idea could have some AI potential in some form or other. Let me think about it...

The distinction between eroticism and pornography is that it's porn if a typical viewer wanks to it. Like the question of whether something is art, the property is not intrinsic to the thing itself.

That this question was so easy very slightly decreases my difficulty estimate for Friendliness.

3CronoDAS7ySo if you only show your porn somewhere that makes it inconvenient to masturbate to (such as at the Met) then it's no longer porn? ;)
5jimrandomh7yYes.
2knb7yIn the same sense that a broken toilet [http://en.wikipedia.org/wiki/File:Duchamp_Fountaine.jpg] in an art gallery is a powerful dadaist work of art, but a broken toilet in an alley is a broken toilet.
2Stuart_Armstrong7yIf I port this type of idea over to AI, I would get things like "the definition of human pain is whether the typical sufferer desires to scream or not". Those definitions can be massively gamed, of course; but it does hint that if we define a critical mass of concepts correctly (typical, desires...) we can ground some undefined concepts in those ones. It probably falls apart the more we move away from standard human society (eg will your definition of porn work for tenth generation uploads?). So in total, if we manage to keep human society relatively static, and we have defined a whole host of concepts, we may be able to ground extra ambiguous concepts using what we've already defined. The challenge seems to be keeping human society (and humans!) relatively static.

I'll cite the comment section on this post to friends whenever I need to say: "And this, my friends, is why you don't let rationalists discuss porn" http://imgs.xkcd.com/comics/shopping_teams.png

[anonymous]7y 3

I would guess that eroticism is supposed to inspire creativity while pornography supposedly replaces it. So if the piece in question were to be presented to people while their brain activity is being monitored I would expect to see an increase of activity throughout the brain for eroticism while I'd expect a decrease or concentration of activity for pornography. Although I have no idea if that is actually the case.

Without reference to sexual stimulation this would include a lot of things that are not currently thought of as pornography, but that might actually be intentional depending on the reason why someone would want to ban pornography.

This thread should provide interesting examples of the Typical Mind Fallacy... :-D

[anonymous]7y 3

An alternative to trying to distinguish between porn and erotica on the basis of content or user attitudes: teach the AI to detect infrastructures of privacy and subterfuge, and to detect when people are willing to publicly patronize and self-identify with something. Most people don't want others to know that they enjoy porn. You could tell your boss about the nude Pirates you saw last weekend, but probably not the porn. Nude Pirates shows up on the Facebook page, but not so much the porn. An online video with naked people that has half a million views, but is discussed nowhere where one's identity is transparent, is probably porn. It's basic to porn that it's enjoyed privately, erotica publicly.
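The public-versus-private signal described above could be sketched roughly as follows. This is only an illustration of the idea: the function name, the `min_views` cutoff and the `max_public_ratio` threshold are assumptions of this sketch and do not come from the comment, and counting "identity-transparent" discussion in practice would be far messier than a single integer.

```python
def looks_like_porn(views: int,
                    public_mentions: int,
                    min_views: int = 1000,
                    max_public_ratio: float = 0.001) -> bool:
    """Flag items that are widely viewed but almost never discussed in public.

    views            -- total view count for the item
    public_mentions  -- mentions made where the speaker's identity is visible
    min_views        -- hypothetical cutoff below which there is too little data
    max_public_ratio -- hypothetical threshold on mentions per view
    """
    if views < min_views:
        # Too few views to say anything; treat as not flagged.
        return False
    return public_mentions / views < max_public_ratio
```

With these made-up thresholds, a video with half a million views and three identifiable mentions would be flagged, while one with the same views and twenty thousand mentions would not.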

1Gunnar_Zarncke7yExcept that this doesn't hold in all social circles. Once there is a distinction people will start to use it to make a difference.
2[anonymous]7yWell, it's sufficient for our purposes that it holds in most. Proud and public porn consumers are outliers, and however an AI might make ethical distinctions, there will always be a body of outliers to ignore. But I grant that my approach is culturally relative. In my defense, a culture ordered such that this approach wouldn't work at all probably wouldn't seek a ban on porn anyway, and might not even be able to make much sense of the distinction we're working with here.
1Gunnar_Zarncke7yHandling sub cultures is difficult. We can ignore outliers because our rules are weak and don't reach those who make their own local rules and accept the price of violating (or bending) some larger society's rules. But an AI may not treat them the same. An AI will be able to enforce the rules on the outliers and effectively kill those sub cultures. Do we want this? One size fits all? I don't think so. The complex value function must also allow 'outliers' - only the concept must be made stronger.
0Stuart_Armstrong7yI like this kind of indirect approach. I wonder if such ideas could be ported to AI...

Just criminalize porn, and leave it to the jury to decide whether or not it's porn. That's how we handle most moral ambiguities, isn't it?

I will assume that the majority of the population shares my definition of porn and is on board with this, creating low risk of an activist jury (otherwise this turns into the harder problem of "how to seize power from the people".)

Edit: On more careful reading, I guess that's not allowed since it would fall in the "I know it when I see it" category. But then, since we obviously are not going to write ... (read more)

I think there is no fundamental difference between porn and erotica, it's just that one is low status and the other is high status (and what's perceived as high status depends greatly on the general social milieu so it's hard to give any kind of stand-alone definition to delineate the two). It only seems like there are two "clusters in thingspace" because people tend to optimize their erotic productions to either maximize arousal or maximize status, without much in between (unless there is censorship involved, in which case you might get shows that are optimized to just barely pass censorship). Unfortunately I don't think this answer helps much with building FAI.

A couple of thoughts here:

  1. Set a high minimum price for anything arousing (say $1000 a ticket). If it survives in the market at that price, it is erotica; if it doesn't, it was porn. This also works for $1000 paintings and sculptures (erotica) compared to $1 magazines (porn).

  2. Ban anything that is highly arousing for males but not generally liked by females. Variants on this: require an all-female board of censors; or invite established couples to view items together, and then question them separately (if they both liked it, it's erotica). Train the AI on examples until it can classify independently of the board or couples.
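Both proposals above are simple decision rules, so they can be written down directly. The sketch below is a minimal illustration under stated assumptions: the `Item` fields, the `quorum` parameter and the way couple verdicts are recorded are all hypothetical, and the sketch covers only the price-floor rule and the couples variant, not the censor board or the later AI training step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Item:
    title: str
    # Hypothetical observation: does the item still sell at the price floor
    # (the "$1000 a ticket" figure from the proposal)?
    sells_at_min_price: bool
    # Hypothetical survey data: one (partner_a_liked, partner_b_liked) pair
    # per established couple, each partner questioned separately.
    couple_verdicts: List[Tuple[bool, bool]]


def classify_by_price(item: Item) -> str:
    """Proposal 1: whatever survives the price floor counts as erotica."""
    return "erotica" if item.sells_at_min_price else "porn"


def classify_by_couples(item: Item, quorum: float = 0.5) -> str:
    """Proposal 2, couples variant: erotica if enough couples both liked it.

    quorum is an assumed parameter; the proposal does not say how many
    couples have to agree.
    """
    if not item.couple_verdicts:
        raise ValueError("no couple verdicts recorded")
    both_liked = sum(1 for a, b in item.couple_verdicts if a and b)
    share = both_liked / len(item.couple_verdicts)
    return "erotica" if share > quorum else "porn"
```

The two rules can disagree on the same item, which is one way to see that they are measuring different things (market status versus shared approval).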

4Nornagest7yI can see the headline now: "Yaoi Sales Jump on Controversial FCC Ruling".
0mwengler7yI doubt that that works. What makes you think there are no rich guys who want to see pornography? They will simply buy it at the $1000 price. I can think of no reason why price discrimination would favor "art" over porn.
0drnickbone7yA "few" rich guys buying (overpriced) porn is unlikely to sustain a real porn industry. Also, using rich guy logic, it is probably a better investment to buy the sculptures, paintings, art house movies etc, amuse yourself with those for a while, then sell them on. Art tends to appreciate over time.

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.

I don't think that's what the solution to FAI will look like. I think the solution to FAI will look like "Look, this is a human (or maybe an uploaded human brain), it is an agent, it has a utility function. You should be maximizing that."

0Stuart_Armstrong7yThe clearer we make as many concepts as we can, the more likely it is that "look, this is..." is going to work.
0Squark7yWell, I think the concept we have to make clear is "agent with given utility function". We don't need any human-specific concepts, and they're hopelessly complex anyway: let the FAI figure out the particulars on its own. Moreover, the concept of an "agent with given utility function" is something I believe I'm already relatively near to formalizing [http://lesswrong.com/lw/jub/updateless_intelligence_metrics_in_the_multiverse/] .
0Eugine_Nier7yIf the agent in question has a well-defined utility function, why is he deferring to the FAI to explain it to him?
0Squark7yBecause he is bad at introspection and his only access to the utility function is through a noisy low-bandwidth sensor called "intuition".
0Stuart_Armstrong7yAgain, the more we can do ahead of time, the more likely it is that the FAI will figure these things out correctly.
0Squark7yWhy do you think the FAI can figure these things out incorrectly, assuming we got "agent with given utility function" right? Maybe we can save it time by providing it with more initial knowledge. However, since the FAI has superhuman intelligence, it would probably take us much longer to generate that knowledge than it would take the FAI. I think that to generate an amount of knowledge which would be non-negligible from the FAI's point of view would take a timespan large with respect to the timescale on which UFAI risk becomes significant. Therefore in practice I don't think we can wait for it before building the FAI.
1Stuart_Armstrong7yBecause values are not physical facts, and cannot be deduced from mere knowledge.
0Squark7yI'm probably explaining myself poorly. I'm suggesting that there should be a mathematical operator which takes a "digitized" representation of an agent, either in white-box form (e.g. uploaded human brain) or in black-box form (e.g. chatroom logs) and produces a utility function. There is nothing human-specific in the definition of the operator: it can as well be applied to e.g. another AI, an animal or an alien. It is the input we provide the operator that selects a human utility function.
0asr7yI don't understand how such an operator could work. Suppose I give you a big messy data file that specifies neuron state and connectedness. And then I give you a big complicated finite-element simulator that can accurately predict what a brain would do, given some sensory input. How do you turn that into a utility function? I understand what it means to use utility as a model of human preference. I don't understand what it means to say that a given person has a specific utility function. Can you explain exactly what the relationship is between a brain and this abstract utility function?
0Squark7ySee the last paragraph in this comment [http://lesswrong.com/lw/jda/friendly_ai_ideas_needed_how_would_you_ban_porn/aqzo] .
0asr7yI don't see how that addresses the problem. You're linking to a philosophical answer, and this is an engineering problem. The claim you made, some posts ago, was "we can set an AI's goals by reference to a human's utility function." Many folks objected that humans don't really have utility functions. My objection was "we have no idea how to extract a utility function, even given complete data about a human's brain." Defining "utility function" isn't a solution. If you want to use "the utility function of a particular human" in building an AI, you need not only a definition, but a construction. To be convincing in this conversation, you would need to at least give some evidence that such a construction is possible. You are trying to use, as a subcomponent, something we have no idea how to build and that seems possibly as hard as the original problem. And this isn't a good way to do engineering.
0Squark7yThe way I expect AGI to work is receiving a mathematical definition of its utility function as input. So there is no need to have a "construction". I don't even know what a "construction" is, in this context. Note that in my formal definition of intelligence, we can use any appropriate formula* in the given formal language as a utility function, since it all comes down to computing logical expectation values. In fact I expect a real seed AGI to work through computing logical expectation values (by an approximate method, probably some kind of Monte Carlo). Of course, if the AGI design we will come up with is only defined for a certain category of utility functions then we need to somehow project into this category (assuming the category is rich enough for the projection not to lose too much information). The construction of this projection operator indeed might be very difficult. * In practice, I formulated the definition with utility = Solomonoff expectation value of something computable. But this restriction isn't necessary. Note that my proposal [http://lesswrong.com/lw/jyv/logical_thermodynamics_towards_a_theory_of/] for defining logical probabilities admits self reference in the sense that the reasoning system is allowed to speak of the probabilities it assigns (like in Christiano et al [http://intelligence.org/wp-content/uploads/2013/03/Christiano-et-al-Naturalistic-reflection-early-draft.pdf] ).
0Stuart_Armstrong7yHumans don't follow anything like a utility function, which is a first problem, so you're asking the AI to construct something that isn't there. Then you have to knit this together into a humanity utility function, which is very non-trivial (this is one feeble and problematic way of doing this: http://lesswrong.com/r/discussion/lw/8qb/cevinspired_models/). The other problem is that you haven't actually solved many of the hard problems. Suppose the AI decides to kill everyone, then replay, in an endless loop, the one upload it has, having a marvellous experience. Why would it not do that? We want the AI to correctly balance our higher order preferences (not being reduced to a single mindless experience) with our lower order preferences (being happy). But that desire is itself a higher order preference - it won't happen unless the AI already decides that higher order preferences trump lower ones. And that was one example I just thought of. It's not hard to come up with "the AI does something stupid" scenarios in this model (eg: replaces everyone with chatterbots that describe their ever increasing happiness and fulfilment) that are compatible with the original model but clearly stupid - clearly stupid to our own judgement, though, not to the AI's. You may object that these problems won't happen - but you can't be confident of this, as you haven't defined your solution formally, and are relying on common sense to reject those pathological solutions. But nowhere have you assumed the AI has common sense, or said how it will use it. The more details you put in your model, I think, the more the problems will become apparent.
0Squark7yThank you for the thoughtful reply! In the white-box approach it can't really hide. But I guess it's rather tangential to the discussion. What do you mean by "follow a utility function"? Why do you think humans don't do it? If it isn't there, what does it mean to have a correct solution to the FAI problem? The main problem with Yvain's thesis is in the paragraph: What does Yvain mean by "give the robot human level intelligence"? If the robot's code remained the same, in what sense does it have human level intelligence? This is the part of the CEV proposal which always seemed redundant to me. Why should we do it? If you're designing the AI, why wouldn't you use your own utility function? At worst, an average utility function of the group of AI designers? Why do we want / need the whole humanity there? Btw, I would obviously prefer my utility function in the AI but I'm perfectly willing to settle on e.g. Yudkowsky's. It seems that you're identifying my proposal with something like "maximize pleasure". The latter is a notoriously bad idea, as was discussed endlessly. However, my proposal is completely different. The AI wouldn't do something the upload wouldn't do because such an action is opposed to the upload's utility function. Actually, I'm not far from it (at least I don't think I'm further than CEV). Note that I have already defined formally I(A, U) where I=intelligence, A=agent, U=utility function. Now we can do something like "U(A) is defined to be U s.t. the probability that I(A, U) > I(R, U) for random agent R is maximal". Maybe it's more correct to use something like a thermal ensemble with I(A, U) playing the role of energy: I don't know, I don't claim to have solved it all already. I just think it's a good research direction.
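Written out in symbols, the definition stated verbally at the end of the comment above would read roughly as below. The distribution over the random agent R and the set of candidate utility functions are not specified in the comment, so both are left abstract here; this is a transcription of the verbal statement, not a finished formula.

```latex
U(A) \;=\; \operatorname*{arg\,max}_{U} \; \Pr_{R}\bigl[\, I(A, U) > I(R, U) \,\bigr]
```

Here I(A, U) is the intelligence of agent A with respect to utility function U, as in the comment. Whether the maximum exists or is unique is left open by the comment as well.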
0Stuart_Armstrong7yHumans are neither independent nor transitive. Human preferences change over time, depending on arbitrary factors, including how choices are framed. Humans suffer because of things they cannot affect, and humans suffer because of details of their probability assessment (eg ambiguity aversion). That bears repeating - humans have preferences over their state of knowledge. The core of this is that "assessment of fact" and "values" are not disconnected in humans, not disconnected at all. Humans feel good when a team they support wins, without them contributing anything to the victory. They will accept false compliments, and can be flattered. Social pressure changes most values quite easily. Need I go on? A utility function which, if implemented by the AI, would result in a positive, fulfilling, worthwhile existence for humans. Even if humans had a utility, it's not clear that a ruling FAI should have the same one, incidentally. The utility is for the AI, and it aims to capture as much of human value as possible - it might just be the utility of a nanny AI (make reasonable efforts to keep humanity from developing dangerous AIs, going extinct, or regressing technologically, otherwise, let them be).
0Squark7yYou still haven't defined "follow a utility function". Humans are not ideal rational optimizers of their respective utility functions. It doesn't mean they don't have them. Deep Blue often plays moves which are not ideal, nevertheless I think it's fair to say it optimizes winning. If you make intransitive choices, it doesn't mean your terminal values are intransitive. It means your choices are not optimal. This is probably the case. However, the changes are slow, otherwise humans wouldn't behave coherently at all. The human utility function is only defined approximately, but the FAI problem only makes sense in the same approximation. In any case, if you're programming an AI you should equip it with the utility function you have at that moment. Why do you think it is inconsistent with having a utility function? How can you know that a given utility function has this property? How do you know the utility function I'm proposing doesn't have this property? Isn't it? Assume your utility function is U. Suppose you have the choice to create a superintelligence optimizing U or a superintelligence optimizing something other than U, let's say V. Why would you choose V? Choosing U will obviously result in an enormous expected increase of U, which is what you want to happen, since you're a U-maximizing agent. Choosing V will almost certainly result in a lower expectation value of U: if the V-AI chooses strategy X that leads to higher expected U than the strategy that would be chosen by a U-AI then it's not clear why the U-AI wouldn't choose X.
2Stuart_Armstrong7yThen why claim that they have one? If humans have intransitive preferences (A>B>C>A), as I often do, then why claim that actually their preferences are secretly transitive but they fail to act on them properly? Nothing we know about the brain points to there being a hidden box with a pristine and pure utility function, that we then implement poorly.
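The A>B>C>A case above is easy to check mechanically: if strict preferences contain a cycle, no real-valued utility u can satisfy u(A) > u(B) > u(C) > u(A). The sketch below is a minimal illustration, assuming preferences are given as a hypothetical mapping from each option to the set of options it is strictly preferred to; it uses an ordinary depth-first search for a cycle in a directed graph and says nothing about how such preferences would be elicited.

```python
from typing import Dict, List, Optional, Set


def find_preference_cycle(prefers: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle of strict preferences, or None if there is none.

    prefers[x] is the set of options that x is strictly preferred to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for worse in sorted(prefers.get(node, ())):
            state = colour.get(worse, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from `worse` onwards.
                return path[path.index(worse):] + [worse]
            if state == WHITE:
                found = visit(worse)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for option in sorted(prefers):
        if colour.get(option, WHITE) == WHITE:
            found = visit(option)
            if found:
                return found
    return None
```

For the example in the comment, `find_preference_cycle({"A": {"B"}, "B": {"C"}, "C": {"A"}})` returns `["A", "B", "C", "A"]`, which is also the sequence of trades a money pump would exploit.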
1Stuart_Armstrong7yThey have preferences like ambiguity aversion, eg being willing to pay to find out, during a holiday, whether they were accepted for a job, while knowing that they can't make any relevant decisions with that early knowledge. This is not compatible with following a standard utility function.
0Squark7yI don't know what you mean by "standard" utility function. I don't even know what you mean by "following". We want to find out since uncertainty makes you nervous, being nervous is unpleasant and pleasure is a terminal value. It is entirely consistent with having a utility function and with my formalism in particular. In what epistemology are you asking this question? That is, what is the criterion according to which the validity of an answer would be determined? If you don't think human preferences are "secretly transitive", then why do you suggest the following: What is the meaning of asking a person to resolve intransitivities if there are no transitive preferences underneath?
0Stuart_Armstrong7yThose are questions for you, not for me. You're claiming that humans have a hidden utility function. What do you mean by that, and what evidence do you have for your position?
0Squark7yI'm claiming that it is possible to define the utility function of any agent. For unintelligent "agents" the result is probably unstable. For intelligent agents the result should be stable. The evidence is that I have a formalism which produces this definition in a way compatible with intuition about "agent having a utility function". I cannot present evidence which doesn't rely on intuition since that would require having another more fundamental definition of "agent having a utility function" (which AFAIK might not exist). I do not consider this to be a problem since all reasoning falls back to intuition if you ask "why" sufficiently many times. I don't see any meaningful definition of intelligence or instrumental rationality without a utility function. If we accept that humans are (approximately) rational / intelligent, they must (in the same approximation) have utility functions. It also seems to me (again, intuitively) that the very concept of "preference" is incompatible with e.g. intransitivity. In the approximation it makes sense to speak of "preferences" at all, it makes sense to speak of preferences compatible with the VNM axioms ergo utility function. Same goes for the concept of "should". If it makes sense to say one "should" do something (for example build a FAI), there must be a utility function according to which she should do it. Bottom line, eventually it all hits philosophical assumptions which have no further formal justification. However, this is true of all reasoning. IMO the only valid method to disprove such assumptions is either by reductio ad absurdum or by presenting a different set of assumptions which is better in some sense. If you have such an alternative set of assumptions for this case or a wholly different way to resolve philosophical questions, I would be very interested to know.
0Stuart_Armstrong7yIt is trivially possible to do that. Since no two choices are strictly identical, you just add enough details to make each choice unique, and then choose a utility function that will always reach that choice ("subject has a strong preference for putting his left foot forwards when seeing an advertisement for deodorant on Tuesday mornings that are the birthdays of prominent Dutch politicians"). A good simple model of human behaviour is that of different modules expressing preferences and short-circuiting the decision making in some circumstances, and a more rational system ("system 2") occasionally intervening to prevent loss through money pumps. So people are transitive in their ultimate decisions, often and to some extent, but their actual decisions depend strongly on which choices are presented first (i.e. their low level preferences are intransitive, but the rational part of them prevents loops). Would you say these beings have no preferences?
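The trivial construction described in the comment above can be sketched roughly as follows. This is only an illustration, not anyone's actual formalism: the names `rationalizing_utility`, `Situation` and `Option`, and the sample observations, are all hypothetical. The idea is that once each situation is described in enough detail to be unique, a lookup table is enough to make every observed choice "utility-maximising", even when the choices look intransitive with the context stripped away.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Situation = Hashable  # hypothetical: a description detailed enough to be unique
Option = Hashable


def rationalizing_utility(
    observations: Sequence[Tuple[Situation, Option]],
) -> Callable[[Situation, Option], float]:
    """Build a utility function under which every observed choice is optimal."""
    chosen: Dict[Situation, Option] = dict(observations)

    def utility(situation: Situation, option: Option) -> float:
        # 1.0 for whatever was actually picked in this exact situation, else 0.0.
        return 1.0 if chosen.get(situation) == option else 0.0

    return utility


# Choices that look intransitive once context is ignored:
# A over B, B over C, C over A.
observations = [
    (("A vs B", "Tuesday morning"), "A"),
    (("B vs C", "Tuesday afternoon"), "B"),
    (("C vs A", "Wednesday"), "C"),
]
u = rationalizing_utility(observations)
```

Every observed choice scores 1.0 under `u` and every alternative scores 0.0, so the fit is perfect and tells us nothing. That is the sense in which such a utility function is trivial.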
0Squark7yMy formalism doesn't work like that since the utility function is a function over possible universes, not over possible choices. There is no trivial way to construct a utility function with respect to which the given agent's intelligence is close to maximal. However it still might be the case we need to give larger weight to simple utility functions (otherwise we're left with selecting a maximum in an infinite set and it's not clear why it exists). As I said, I don't have the final formula. I'd say they have a utility function. Imagine a chess AI that selects moves by one of two strategies. The first strategy ("system 1") uses simple heuristics like "check when you can" that produce an answer quickly and save precious time. The second strategy ("system 2") runs a minimax algorithm with a 10-move deep search tree. Are all of the agent's decisions perfectly rational? No. Does it have a utility function? Yes: winning the game.
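A toy version of the two-strategy agent in the comment above, as a rough sketch under stated assumptions: chess is swapped for a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins) so the search fits in a few lines, and the search runs to the end of the game rather than to a fixed depth of 10. All function names here are hypothetical.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumed rule: a player removes 1 to 3 stones per turn


@lru_cache(maxsize=None)
def minimax_value(stones: int) -> int:
    """Return +1 if the player to move can force a win, -1 otherwise."""
    if stones == 0:
        return -1  # the previous player took the last stone and won
    return max(
        -minimax_value(stones - take)
        for take in range(1, min(MAX_TAKE, stones) + 1)
    )


def system1_move(stones: int) -> int:
    """Fast heuristic ("system 1"): grab as many stones as allowed."""
    return min(MAX_TAKE, stones)


def system2_move(stones: int) -> int:
    """Slow search ("system 2"): pick the move with the best minimax value."""
    return max(
        range(1, min(MAX_TAKE, stones) + 1),
        key=lambda take: -minimax_value(stones - take),
    )


def agent_move(stones: int, time_pressure: bool) -> int:
    """Use the heuristic when short on time, the search otherwise."""
    return system1_move(stones) if time_pressure else system2_move(stones)
```

With 5 stones left, the heuristic takes 3 and hands the opponent a win, while the search takes 1 and leaves a lost position for the opponent. Both strategies are scored against the same goal (winning), which is the point of the example: the agent's behaviour is not always optimal, yet the objective it is measured against is well defined.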
0pengvado7yThere are many such operators, and different ones give different answers when presented with the same agent. Only a human utility function distinguishes the right way of interpreting a human mind as having a utility function from all of the wrong ways of interpreting a human mind as having a utility function. So you need to get a bunch of Friendliness Theory right before you can bootstrap.
0Squark7yWhy do you think there are many such operators? Do you believe the concept of "utility function of an agent" is ill-defined (assuming the "agent" is actually an intelligent agent rather than e.g. a rock)? Do you think it is possible to interpret a paperclip maximizer as having a utility function other than maximizing paperclips?
0Stuart_Armstrong7yDeducing the correct utility of a utility maximiser is one thing (which has a low level of uncertainty, higher if the agent is hiding stuff). Assigning a utility to an agent that doesn't have one is quite another. See http://lesswrong.com/lw/6ha/the_blueminimizing_robot/ Key quote:
0Squark7yReplied in the other thread.
[-][anonymous]7y 0

After refining my thoughts, I think I see the problem:

1: The Banner AI must ban all transmissions of naughty Material X.

1a: Presumably, the Banner must also ban all transmissions of encrypted naughty Material X.

2: The people the Banner AI is trying to ban from sending naughty transmissions have an entire field of thought (knowledge of human values) the AI is not allowed to take into account: It is secret.

3: Presumably, the Banner AI has to allow some transmissions. It can't just shut down all communications.

Edit: 4: The Banner AI needs a perfect success ... (read more)

what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other

There's a heuristic I use to distinguish between the two that works fairly well: in erotica, the participants are the focus of the scene. In pornography, the camera (and by implication the viewer) is the true focus of the scene.

That being said, I have a suspicion that trying to define the difference explicitly is a wrong question. People seem to use a form of fuzzy logic[1] when thinking about the two. What we're really looking at is gradatio... (read more)

0Stuart_Armstrong7yThis seems like a very high level solution - I don't think "where is the real focus of the scene (in a very abstract sense)" is simpler than "is this pornography".
0Gunnar_Zarncke7yYour heuristic is bound to be gamed. But that is a problem with any definition that isn't true to the underlying complex value function.
0Error7yI agree. I wasn't suggesting it for serious, literal use; that's why I specified that it was a heuristic.

Well, there's Umberto Eco's famous essay on the subject. (The essay is not long so read the whole thing.)

One notable thing about his criterion is that it makes no reference to nudity. It is therefore a horrendous predictor on the set of all possible movies; it just happens to work well on the subset of possible movies a human would actually want to watch.

Suppose you're put in change of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.

I have no idea what distinction you're trying to draw here. And I say this ... (read more)
