All of DefectiveAlgorithm's Comments + Replies

I didn't say I knew which parts of the brain would differ, but to conclude therefore that it wouldn't is to confuse the map with the territory.

We can't conclude that they would not differ. We could postulate it and then ask: could we measure whether equal copies have equal qualia? And we can't measure it. And here we return to the "hard question": we don't know whether different qualia imply different combinations of atoms.

I would like to suggest zombies of a second kind. This is a person with an inverted spectrum. It could even be my copy, who speaks all the same philosophical nonsense as I do, but any time I see green, he sees red, yet names it green. Is he possible?

Such an entity is possible, but would not be an atom-exact copy of you.

We don't know how qualia are encoded in the brain, nor how to distinguish a person from his copy with an inverted spectrum.

...Has someone been mass downvoting you?

Yup. For quite a long time and with ever-increasing frequency. (Lots of sockpuppets.)
Yes, The Artist Formerly Known As Eugine_Nier is doing it.
Yes. We are working on it.

What if you're like me and consider it extremely implausible that even a strong superintelligence would be sentient unless explicitly programmed to be so (or at least deliberately created with a very human-like cognitive architecture), and that any AI that is sentient is vastly more likely than a non-sentient AI to be unfriendly?

I think you would be relatively exceptional, at least in how you would be suggesting that one should treat a sentient AI, and so people like you aren't likely to be the determining factor in whether or not an AI is allowed out of the box.

I've never heard of 'Dust Theory' before, but I should think it follows trivially from most large multiverse theories, does it not?

Nah, it's not just vanilla anthropics, it also contains the claim that measure doesn't matter and/or causal structure doesn't matter. Since these claims don't seem to hold for me, I'm not going to migrate to the dust anytime soon.

Trigger warning: memetic hazard.

Abj guvax nobhg jung guvf zrnaf sbe nalbar jub unf rire qvrq (be rire jvyy).

I'm not too concerned, but primarily because I still have a lot of uncertainty as to how to approach that sort of question. My mind still spits out some rather nasty answers.

EDIT: I just realized that you were probably intentionally implying exactly what I just said, which makes this comment rather redundant.

I assume that inside the simulation spaces of Cthulhu, you are going to be on some level aware of all the deaths that you have already experienced, and the ones that await you. Otherwise you are clearly not suffering enough. :-)
Zl nccebnpu gb gur fb-pnyyrq dhnaghz vzzbegnyvgl: Vs lbh qvr va avargl-avar crepragf bs Rirergg oenapurf, naq fheivir va bar creprag, sebz gur cbvag bs ivrj va gung bar-creprag oenapu, lbh fheivirq, ohg sebz gur cbvag bs ivrj JURER LBH NER ABJ (juvpu vf gur bar lbh fubhyq hfr), lbh ner avargl-avar creprag qrnq. Gurersber, dhnaghz vzzbegnyvgl vf n fryrpgvba ovnf rkcrevraprq ol crbcyr va jrveq jbeyqf; sebz bhe cbvag bs ivrj, vg cenpgvpnyyl qbrf abg rkvfg, naq lbh fvzcyl qvr naq prnfr gb rkvfg. V qba'g cergraq gb haqrefgnaq pbzcyrgryl jung guvf zrnaf -- va fbzr frafr, nyy cbffvoyr pbasvthengvbaf bs cnegvpyrf "rkvfg" fbzrjurer va gur gvzryrff culfvpf, naq vs gurl sbez n fragvrag orvat, gung orvat vf cresrpgyl erny sebz gurve bja cbvag bs ivrj (juvpu vf ABG bhe cbvag bs ivrj) -- ohg va gur fcvevg bs "vg nyy nqqf hc gb abeznyvgl", jr fubhyq bayl pner nobhg pbasvthengvbaf juvpu sbyybj sebz jurer jr ner abj, naq jr fubhyq bayl pner nobhg gurz nf zhpu, nf ynetr vf gur senpgvba bs bhe nzcyvghqr juvpu sybjf gb gurz. Gur senpgvba tbvat gb urnira/uryy jbeyqf sebz zl pheerag jbeyq vf sbe nyy cenpgvpny checbfrf mreb, gurersber V jvyy gerng vg nf mreb. Qbvat bgurejvfr jbhyq or yvxr cevivyrtvat n ulcbgurfvf; vs V gnxr nyy pbcvrf bs "zr-zbzragf", jrvtugrq ol ubj zhpu nzcyvghqr gurl unir, gur infg znwbevgl bs gurz yvir cresrpgyl beqvanel yvirf. Gubfr pbcvrf jvgu snagnfgvpnyyl ybat yvirf nyfb unir snagnfgvpnyyl fznyy nzcyvghqrf ng gur ortvaavat, fb vg pnapryf bhg. Vs gurer vf n snagnfgvpnyyl ybat yvsr npuvrinoyr ol angheny zrnaf, fhpu nf pelbavpf be zvaq hcybnqvat, fhpu "zr-zbzragf" jvyy unir zber nzcyvghqr guna gur snagnfgvpnyyl ybat yvirf npuvrirq ol zvenphybhf zrnaf. Ohg fgvyy, rira gurfr anghenyyl ybat yvirf jvyy zbfg yvxryl bayl trg n fznyy senpgvba bs gur nzcyvghqr V unir urer naq abj; zbfg bs zl shgher zr'f jvyy or qrnq. gy;qe -- V qvqa'g bevtvanyyl jnag gb fgneg n qrongr ba guvf gbcvp, bayl gb abgr gung sbe "rkcybengvba bs cbffvoyr raqvatf" lbh qba'g npghnyyl arrq n fvzhyngbe

What bullet is that? I implicitly agreed that murder is wrong (as per the way I use the word 'wrong') when I said that your statement wasn't a misinterpretation. It's just that as I mentioned before, I don't care a whole lot about the thing that I call 'morality'.

What I meant when I called myself a nihilist was essentially that there was no such thing as an objective, mind-independent morality. Nothing more. I would still consider myself a nihilist in that sense (and I expect most on this site would), but I don't call myself that because it could cause confusion.

Can you explain how the statement 'A world in which everyone but me does not murder is preferable to a world in which everyone including me does not murder' is a misinterpretation of this quotation?

It isn't, although that doesn't mean I would necessaril... (read more)

I agree that morality is not in the quarks. That doesn't seem like a huge bullet to bite?

That's my point. You're saying the 'nihilists' are wrong, when you may in fact be disagreeing with a viewpoint that most nihilists don't actually hold on account of them using the words 'nihilism' and/or 'morality' differently to you. And yeah, I suppose in that sense my 'morality' does tie into my actual values, but only my values as applied to an unrealistic thought experiment, and then again a world in which everyone but me adhered to my notions of morality (and I wasn't penalized for not doing so) would still be preferable to me than a world in which everyone including me did.

But you still have yet to explicitly describe what you mean by nihilism. Could you? How have I misrepresented whom you believe to be the average self-identifying nihilist? Can you explain how the statement 'A world in which everyone but me does not murder is preferable to a world in which everyone including me does not murder' is a misinterpretation of this quotation?

I mean that what I call my 'morality' isn't intended to be a map of my utility function, imperfect or otherwise. Along the same lines, you're objecting that self-proclaimed moral nihilists have an inaccurate notion of their own utility function, when it's quite possible that they don't consider their 'moral nihilism' to be a statement about their utility function at all. I called myself a moral nihilist for quite a while without meaning anything like what you're talking about here. I knew that I had preferences, I knew (roughly) what those preferences were... (read more)

It sounds like you agree with me, but are just using the words morality and nihilism differently, and are particularly using nihilism in a way that I don't understand or that you have yet to explicate. It also seems to me that you're already talking about what you value when you talk about desirable worlds.

Personally, when I use the word 'morality' I'm not using it to mean 'what someone values'. I value my own morality very little, and developed it mostly for fun. Somewhere along the way I think I internalized it at least a little, but it still doesn't mean much to me, and seeing it violated has no perceivable impact on my emotional state. Now, this may just be unusual terminology on my part, but I've found that a lot of people at least appear based on what they say about 'morality' to be using the term similarly to myself.

You say what you do not mean by 'morality,' but not what you do mean. If you mean that you have a verbal, propositional sort of normative ethical theory that you have 'developed mostly for fun and the violation of which has no perceivable impact on your emotional state,' then that does not mean that you are lacking in morality, it just means that your verbal normative theory is not in line with your wordless one. I do not believe that there is an arbitrary thing that you currently truly consider horrifying that you could stop experiencing as horrifying by the force of your will; or that there is an arbitrary horrible thing that you could prevent that would currently cause you to feel guilty for not preventing, and that you could not-prevent that horrible thing and stop experiencing the subsequent guilt by the force of your will. I do not believe that your utility function is open season.

I think a big part of it is that I don't really care about other people except instrumentally. I care terminally about myself, but only because I experience my own thoughts and feelings first-hand. If I knew I were going to be branched, then I'd care about both copies in advance as both are valid continuations of my current sensory stream. However, once the branch had taken place, both copies would immediately stop caring about the other (although I expect they would still practice altruistic behavior towards each other for decision-theoretic reasons). I s... (read more)

Approximately the same extent to which I'd consider myself to exist in the event of any other form of information-theoretic death. Like, say, getting repeatedly shot in the head with a high powered rifle, or having my brain dissolved in acid.

Right. This is why I said that total obliviation is worse than death. Not only are you removed, you can later be used to support purposes outright opposed to your goals, as Harry intends to do with Voldemort.

I mean the sufficiency of the definition given. Consider a universe which absolutely, positively, was not created by any sort of 'god', the laws of physics of which happen to be wired such that torturing people lets you levitate, regardless of whether the practitioner believes he has any sort of moral justification for the act. This universe's physics are wired this way not because of some designer deity's idea of morality, but simply by chance. I do not believe that most believers in objective morality would consider torturing people to be objectively good in this universe.

Hm. I'll acknowledge that's consistent (though I maintain that calling that 'morality' is fairly arbitrary), but I have to question whether that's a charitable interpretation of what modern believers in objective morality actually believe.

If you actually believe that burning a witch has some chance of saving her soul from eternal burning in hell (or even only provide a sufficient incentive for others to not agree to pacts with Satan and so surrender their soul to eternal punishment), wouldn't you be morally obligated to do it?

Ok, I understand it in that context, as there are actual consequences. Of course, this also makes the answer trivial: Of course it's relevant, it gives you advantages you wouldn't otherwise have. Though even in the sense you've described, I'm not sure whether the word 'morality' really seems applicable. If torturing people let us levitate, would we call that 'objective morality'?

EDIT: To be clear, my intent isn't to nitpick. I'm simply saying that patterns of behavior being encoded, detected and rewarded by the laws of physics doesn't obviously seem to equate those patterns with 'morality' in any sense of the word that I'm familiar with.

Sure, see e.g. good Christians burning witches.

I have no idea what 'there is an objective morality' would mean, empirically speaking.

"There is objective morality" basically means that morality is part of physics and just like there are natural laws of, say, gravity or electromagnetism, there are natural laws of morals because the world just works that way. Consult e.g. Christian theology for details. Think of a system where, for example, a yogin can learn to levitate (which is a physical phenomenon) given that he diligently practices and leads a moral life. If he diligently practices but does not lead a moral life, he doesn't get to levitate. In such a system morality would be objective. Note that this comment is not saying that objective morality exists, it just attempts to explain what the concept means.

More concerning to me than outright unfriendly AI is AI the creators of which attempted to make it friendly but only partially succeeded such that our state is relevant to its utility calculations but not necessarily in ways we'd like.

I don't think Harry meant to imply that actually running this test would be nice, but rather that one cannot even think of running this test without first thinking of the possibility of making a horcrux for someone else (something which is more-or-less nice-ish in itself, the amorality inherent in creating a horcrux at all notwithstanding).

Adam Zerner (8y):
I wonder why EY chose to use this example. It seems that a big reason he writes the book is to promote rationality and goodness. This seems like a huge opportunity to make the point that "the otherwise smart dark wizard is missing out by not being good". But the point is much less clear because of the fact that actually running the test wouldn't really be that nice.

A paperclip maximizer won't wirehead because it doesn't value world states in which its goals have been satisfied, it values world states that have a lot of paperclips.

In fact, taboo 'values'. A paperclip maximizer is an algorithm the output of which approximates whichever output leads to world states with the greatest expected number of paperclips. This is the template for maximizer-type AGIs in general.
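The "algorithm whose output approximates whichever output leads to the most expected paperclips" framing can be sketched directly. This is a toy illustration, not anything from the original comments: the action names and payoff numbers are invented, and the "world model" is just a noisy lookup table. The point it shows is that the maximizer scores world states by paperclip count, not by any internal "goal satisfied" signal, so a wireheading action that merely fakes the signal scores zero.

```python
import random

def expected_paperclips(action, n_samples=1000):
    # Toy world model (illustrative assumption): each action yields a
    # noisy number of paperclips; we estimate the expectation by sampling.
    payoff_mean = {"build_factory": 100, "wirehead": 0, "idle": 1}
    samples = (payoff_mean[action] + random.gauss(0, 5) for _ in range(n_samples))
    return sum(samples) / n_samples

def choose_action(actions):
    # The agent picks whichever action leads to the paperclip-richest
    # expected world state. "Wirehead" produces no paperclips, so it loses.
    return max(actions, key=expected_paperclips)

print(choose_action(["build_factory", "wirehead", "idle"]))  # build_factory
```

Nothing in the loop refers to the agent's own reward signal; the evaluation function is defined over world states, which is exactly why wireheading is not selected.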

I am not as confident as you that valuing worlds with lots of paperclips will continue once an AI goes from "kind of dumb AI" to "super-AI." Basically, I'm saying that all values are instrumental values and that only mashing your "value met" button is terminal. We only switched over to talking about values to avoid some confusion about reward mechanisms. This is a definition of paperclip maximizers. Once you try to examine how the algorithm works you'll find that there must be some part which evaluates whether the AI is meeting its goals or not. This is the thing that actually determines how the AI will act. Getting a positive response from this module is what the AI is actually going for (is my contention). The actions that configure world states will only be relevant to the AI insofar as they trigger this positive response from this module. Since we already have infinite ability to self-modify as a given in this scenario, why wouldn't the AI just optimize for positive feedback? Why continue with paperclips?

Because I terminally value the uniqueness of my identity.

Really? Can you say a little more about why you think you have that value? I guess I'm not convinced that it's really a terminal value if it varies so widely across people of otherwise similar beliefs. Presumably that's what lalartu meant as well, but I just don't get it. I like myself, so I'd like more of myself in the world!

What would an AI that 'cares' in the sense you spoke of be able to do to address this problem that a non-'caring' one wouldn't?

Kind of. I wouldn't defect against my copy without his consent, but I would want the pool trimmed down to only a single version of myself (ideally whichever one had the highest expected future utility, all else equal). The copy, being a copy, should want the same thing. The only time I wouldn't be opposed to the existence of multiple instances of myself would be if those instances could regularly synchronize their memories and experiences (and thus constitute more a single distributed entity with mere synchronization delays than multiple diverging entities).

Why would you want to actively avoid having a copy?

Leaving aside other matters, what does it matter if an FAI 'cares' in the sense that humans do so long as its actions bring about high utility from a human perspective?

Because what any human wants is a moving target. As soon as someone else delivers exactly what you ask for, you will be disappointed unless you suddenly stop changing. Think of the dilemma of eating something you know you shouldn't. Whatever you decide, as soon as anyone (AI or human) takes away your freedom to change your mind, you will likely rebel furiously. Human freedom is a huge value that any FAI of any description will be unable to deliver until we are no longer free agents.

This post starts off on a rather spoiler-ish note.

My first thought (in response to the second question) is 'immediately terminate myself, leaving the copy as the only valid continuation of my identity'.

Of course, it is questionable whether I would have the willpower to go through with it. I believe that my copy's mind would constitute just as 'real' a continuation of my consciousness as would my own mind following a procedure that removed the memories of the past few days (or however long since the split) whilst leaving all else intact (which is of course just a contrived-for-the-sake-of-the-thought-experiment variety of the sort of forgetting that we undergo all the time), but I have trouble alieving it.

This is a lot more interesting a response if you would also agree with Lalartu in the more general case.

Even leaving aside the matters of 'permission' (which lead into awkward matters of informed consent) as well as the difficulties of defining concepts like 'people' and 'property', define 'do things to X'. Every action affects others. If you so much as speak a word, you're causing others to undergo the experience of hearing that word spoken. For an AGI, even thinking draws a miniscule amount of electricity from the power grid, which has near-negligible but quantifiable effects on the power industry which will affect humans in any number of different ways. I... (read more)

I know what terminal values are and I apologize if the intent behind my question was unclear. To clarify, my request was specifically for a definition in the context of human beings - that is, entities with cognitive architectures with no explicitly defined utility functions and with multiple interacting subsystems which may value different things (ie. emotional vs deliberative systems). I'm well aware of the huge impact my emotional subsystem has on my decision making. However, I don't consider it 'me' - rather, I consider it an external black box which i... (read more)

That upon ideal rational deliberation and when having all the relevant information, a person will choose to pursue pleasure as a terminal value.

Can you define 'terminal values', in the context of human beings?

Terminal values are what are sought for their own sake, as opposed to instrumental values, which are sought because they ultimately produce terminal values.

If the universe is infinite, then there are infinitely many copies of me, following the same algorithm

Does this follow? The set of computable functions is infinite, but has no duplicate elements.

The measure of simple computable functions is probably larger than the measure of complex computable functions and I probably belong to the simpler end of computable functions.
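The intuition that simple programs dominate the measure can be made concrete with a Solomonoff-style simplicity prior. This sketch is my own illustration, not the commenter's: it assumes each binary program of length L gets prior weight 4**-L, so the 2**L programs of a given length jointly hold 2**-L of the total measure, and short lengths dominate.

```python
def length_share(cutoff, horizon=60):
    # Total prior measure held by programs of length <= cutoff, under the
    # (assumed) weighting of 4**-L per program: each length L contributes
    # 2**L * 4**-L = 2**-L, a geometric series dominated by small L.
    weight = lambda L: 2**L * 4.0 ** (-L)
    total = sum(weight(L) for L in range(1, horizon + 1))
    short = sum(weight(L) for L in range(1, cutoff + 1))
    return short / total

print(round(length_share(5), 3))  # ~0.969
```

Under this weighting, programs of length at most 5 already hold about 97% of the measure, which is the sense in which "the measure of simple computable functions is probably larger".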

"Comments (1)"

"There doesn't seem to be anything here."


What Vladimir_Nesov said. The single comment was spam, I banned it.

A spammer was banned. Deleted comments still count towards the total, see bug 330 on the issue tracker.

Maybe someone got shadowbanned? Reddit does that to fight spam and (if I understand correctly) LessWrong is based on the Reddit source code.
I, too, am confused.

I think this should get better and better for P1 the closer P1 gets to (2/3)C (1/3)B (without actually reaching it).

I do think 'a disagreement on utility calculations' may indeed be a big part of it. Are you a total utilitarian? I'm not. A big part of that comes from the fact that I don't consider two copies of myself to be intrinsically more valuable than one - perhaps instrumentally valuable, if those copies can interact, sync their experiences and cooperate, but that's another matter. With experience-syncing, I am mostly indifferent to the number of copies of myself to exist (leaving aside potential instrumental benefits), but without it I evaluate decreasing utility... (read more)

That line of thinking leads directly to recommending immediate probabilistic suicide, or at least indifference to it. No thanks.

Well, ok, but if you agree with this then I don't see how you can claim that such a system would be particularly useful for solving FAI problems.

Well, I don't know about the precise construction that would be used. Certainly I could see a human being deliberately focusing the system on some things rather than others.

Ok, but a system like you've described isn't likely to think about what you want it to think about or produce output that's actually useful to you either.

Well yes. That's sort of the problem with building one. Utility functions are certainly useful for specifying where logical uncertainty should be reduced.

an Oracle AI you can trust

That's a large portion of the FAI problem right there.

EDIT: To clarify, by this I don't mean to imply that FAI is easy, but that (trustworthy) Oracle AI is hard.

In-context, what was meant by "Oracle AI" is a very general learning algorithm with some debug output, but no actual decision-theory or utility function whatsoever built in. That would be safe, since it has no capability or desire to do anything.

No. Clippy cannot be persuaded away from paperclipping because maximizing paperclips is its only terminal goal.


If acquiring bacon was your ONLY terminal goal, then yes, it would be irrational not to do absolutely everything you could to maximize your expected bacon. However, most people have more than just one terminal goal. You seem to be using 'terminal goal' to mean 'a goal more important than any other'. Trouble is, no one else is using it this way.

EDIT: Actually, it seems to me that you're using 'terminal goal' to mean something analogous to a terminal node in a tree search (if you can reach that node, you're done). No one else is using it that way either.

Feel free to offer the correct definition. But note that you can't define it as overridable, since non-terminal goals are already defined that way. There is no evidence that people have one or more terminal goals. At least you need to offer a definition such that multiple TGs don't collide, and are distinguishable from non-TGs.

Consider an agent trying to maximize its Pacman score. 'Getting a high Pacman score' is a terminal goal for this agent - it doesn't want a high score because that would make it easier for it to get something else, it simply wants a high score. On the other hand, 'eating fruit' is an instrumental goal for this agent - it only wants to eat fruit because that increases its expected score, and if eating fruit didn't increase its expected score then it wouldn't care about eating fruit.

That is the only difference between the two types of goals. Knowing that one of an agent's goals is instrumental and another terminal doesn't tell you which goal the agent values more.
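The Pacman example can be reduced to a few lines. This is a hedged sketch of my own (the score numbers and `fruit_bonus` parameter are invented for illustration): the agent has no preference about fruit as such; it "wants" fruit exactly when, and only when, eating fruit raises its expected score.

```python
def expected_score(eats_fruit, fruit_bonus):
    # Toy score model: a base score plus a bonus if the agent eats fruit.
    base = 10
    return base + (fruit_bonus if eats_fruit else 0)

def wants_fruit(fruit_bonus):
    # The instrumental goal exists only via its effect on the terminal one:
    # the agent prefers eating fruit iff that raises expected score.
    return expected_score(True, fruit_bonus) > expected_score(False, fruit_bonus)

print(wants_fruit(fruit_bonus=5))  # True
print(wants_fruit(fruit_bonus=0))  # False
```

Set the bonus to zero and the "goal" of eating fruit evaporates, while the terminal goal of score maximization is untouched; that asymmetry is the whole distinction.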

a terminal goal of interpreting instructions correctly

There is a huge amount of complexity hidden beneath this simple description.

I'll say it again: absolute complexity is not relative complexity. Everything in AGI is very complex in absolute terms. In relative terms, language is less complex than language+morality.

Isn't this equivalent to total utilitarianism that only takes into account the utility of already extant people? Also, isn't this inconsistent over time (someone who used this as their ethical framework could predict specific discontinuities in their future values)?

I suppose you could say that it's equivalent to "total utilitarianism that only takes into account the utility of already extant people, and only takes into account their current utility function [at the time the decision is made] and not their future utility function". (Under mere "total utilitarianism that only takes into account the utility of already extant people", the government could wirehead its constituency.)

Yes, this is explicitly inconsistent over time. I actually would argue that the utility function for any group of people will be inconsistent over time (as preferences evolve, new people join, and old people leave) and any decision-making framework needs to be able to handle that inconsistency intelligently. Failure to handle that inconsistency intelligently is what leads to the Repugnant Conclusion.

The primary issue? No matter how many times I read your post, I still don't know what your claim actually is.

No, but I think there are times that semantics matter.

While I do still find myself quite uncertain about the concept of 'quantum immortality', not to mention the even stronger implications of certain multiverse theories, these don't seem to be the kind of thing that you're talking about. I submit that 'there is an extant structure not found within our best current models of reality isomorphic to a very specific (and complex) type of computation on a very specific (and complex) set of data (ie your memories and anything else that comprises your 'identity')' is not a simple proposition.

After reading this, I became incapable of giving finite time estimates for anything. :/

Isn't expected value essentially 'actual value, to the extent that it is knowable in my present epistemic state'? Expected value reduces to 'actual value' when the latter is fully knowable.

EDIT: Oh, you said this in the post. This is why I should read a post before commenting on it.
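The reduction the comment describes is mechanical: expected value is a probability-weighted sum, and with a single certain outcome the weighting disappears. A minimal sketch (my own illustration, not from the post):

```python
def expected_value(outcomes):
    # outcomes: list of (probability, value) pairs summing to probability 1
    return sum(p * v for p, v in outcomes)

# Under uncertainty, expected value averages over what might be true.
uncertain = expected_value([(0.5, 10), (0.5, 0)])  # 5.0

# When the outcome is fully knowable (probability 1), expected value
# collapses to the actual value itself.
certain = expected_value([(1.0, 7)])  # 7.0

print(uncertain, certain)
```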

This is (one of the reasons) why I'm not a total utilitarian (of any brand). For future versions of myself, my preferences align pretty well with average utilitarianism (albeit with some caveats), but I haven't yet found or devised a formalization which captures the complexities of my moral intuitions when applied to others.

Could you explain? Those sound like awfully big caveats. If I consider the population of "future versions of myself" as unchangeable, then average utilitarianism and total utilitarianism are equivalent. If I consider that population as changeable, then average utilitarianism seems to suggest changing it by removing the ones with lowest utility: e.g. putting my retirement savings on the roulette wheel and finding some means of painless suicide if I lose.

A proper theory of population ethics should be complex, as our population intuitions are complex...

This sounds a lot like quantum suicide, except... without the suicide. So those versions of yourself who don't get what they want (which may well be all of them) still end up in a world where they've experienced not getting what they want. What do those future versions of yourself want then?

EDIT: Ok, this would have worked better as a reply to Squark's scenario, but it still applies whenever this philosophy of yours is applied to anything directly (in the practical sense) observable.

Scott Garrabrant (9y):
I think you are misunderstanding me. Also see this comment from Squark in the other thread. It has nothing to do with wanting one world more than another. It is all about thinking that one world is more important than another. If I observe that I am not in an important world, I work to make the most important world that I can change as good as possible.

If L-zombies have conscious experience (even when not being 'run'), does the concept even mean anything? Is there any difference, even in principle, between such an L-zombie and a 'real' person?
