Erhannis

Comments

While I (a year late) tentatively agree with you (though a million years of suffering is a hard thing to swallow compared to the instinctually almost mundane matter of death), I think there's an assumption in your argument that bears inspection. Namely, I believe you are maximizing happiness at a single instant in time: the present, or the limit as time approaches infinity, etc. (Or perhaps you are predicating the calculations on the possibility of escaping the heat death of the universe and being truly immortal for eternity.)

A (possibly) alternate optimization goal: maximize human happiness summed over time. I was thinking the other day that it seems possible we may never evade the heat death of the universe. In that case, if you only value the final state, nothing we do matters, whether we suffer or go extinct tomorrow. At the very least, this metric is not helpful, because it cannot distinguish between any two states. So a different metric must be chosen. A reasonable substitute, it seems to me, is to take the integral of human happiness over time; that is, to sum it up. The happy week you had last week is not canceled out by a mildly depressing day today, for instance; it still counts. Conversely, suffering for a long time may not be automatically balanced out the moment you stop suffering (though I'll grant this goes a little against my instincts).

If you DO assume infinite time, your argument may return to being automatically true, though I'm not sure that's an assumption that should be confidently made. If you don't assume infinite time, I think it matters again what precise value you put on death versus incredible suffering, and that may simply be a matter of opinion, of precise differences in two people's terminal goals.
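To make the contrast between "value only the final state" and "sum happiness over time" concrete, here is a minimal sketch. It assumes, purely for illustration, that total human happiness can be reduced to one number per time step; the two trajectories are hypothetical. The summed score is a discrete stand-in for the integral of happiness over time.

```python
# Toy comparison of two ways to score a history of total human happiness.
# The trajectories below are hypothetical, chosen only to illustrate the point.


def final_state_score(happiness: list[float]) -> float:
    """Score a history by its last value only (the 'limit' view)."""
    return happiness[-1]


def summed_score(happiness: list[float], dt: float = 1.0) -> float:
    """Score a history by summing happiness over time (a discrete integral)."""
    return sum(h * dt for h in happiness)


# Both histories end at 0: nobody is left after the heat death of the universe.
flourishing = [5.0, 6.0, 7.0, 7.0, 6.0, 0.0]
extinct_early = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(final_state_score(flourishing), final_state_score(extinct_early))  # 0.0 0.0
print(summed_score(flourishing), summed_score(extinct_early))            # 31.0 5.0
```

The final-state metric scores both histories identically, which is exactly the "cannot distinguish between any two states" problem; the summed metric prefers the history in which people were happy for longer.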

(Side note: I've idly speculated about expanding the above optimization criterion to the case of all possible universes. I forget the exact train of thought, but it ended up, more or less, optimizing the probability-weighted ratio of good outcomes to bad outcomes (summed across time, I guess). It needs more thought before it becomes rigorous.)

"Complex" doesn't imply "hard to emulate". We likely won't need to understand the encoded systems, just the behavior of the neurons. In high school I wrote a simple simulator of charged particles. The rules I needed to encode were simple, but it displayed behaviors I hadn't programmed in or expected, and those behaviors turned out to be real phenomena that actually happen.
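As a rough illustration of how little needs to be encoded, here is a sketch of the kind of rules such a simulator might contain. This is not the original high school program; the 2D setting, the unit constant `K`, the `softening` term, and the semi-implicit Euler integrator are all assumptions made for the sketch. The only physics written down is Coulomb's law plus Newton's second law; anything else you see in a many-particle run is emergent.

```python
import math
from dataclasses import dataclass

K = 1.0  # Coulomb constant in arbitrary simulation units (assumption)


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float
    charge: float
    mass: float = 1.0


def step(particles: list[Particle], dt: float, softening: float = 1e-3) -> None:
    """Advance all particles by one time step using pairwise Coulomb forces."""
    forces = []
    for i, a in enumerate(particles):
        fx = fy = 0.0
        for j, b in enumerate(particles):
            if i == j:
                continue
            dx, dy = a.x - b.x, a.y - b.y
            # Softening keeps r2 > 0 if two particles land on the same point.
            r2 = dx * dx + dy * dy + softening
            r = math.sqrt(r2)
            # Magnitude K*qa*qb/r^2 along the direction from b toward a:
            # positive (repulsive) for like charges, negative (attractive) otherwise.
            f = K * a.charge * b.charge / r2
            fx += f * dx / r
            fy += f * dy / r
        forces.append((fx, fy))
    # Semi-implicit Euler: update velocities first, then positions.
    for p, (fx, fy) in zip(particles, forces):
        p.vx += fx / p.mass * dt
        p.vy += fy / p.mass * dt
        p.x += p.vx * dt
        p.y += p.vy * dt


def separation(a: Particle, b: Particle) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


# Usage: a like-charged pair and an opposite-charged pair, each starting 2 units apart at rest.
like = [Particle(-1.0, 0.0, 0.0, 0.0, 1.0), Particle(1.0, 0.0, 0.0, 0.0, 1.0)]
unlike = [Particle(-1.0, 0.0, 0.0, 0.0, 1.0), Particle(1.0, 0.0, 0.0, 0.0, -1.0)]
for _ in range(100):
    step(like, 0.01)
    step(unlike, 0.01)
print(separation(*like) > 2.0)    # True: like charges pushed apart
print(separation(*unlike) < 2.0)  # True: opposite charges pulled together
```

Scaling the particle list up is all it takes to start seeing behavior that was never written down explicitly; the per-step cost is O(n²) in this naive form.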

I agree with you almost perfectly. I'd been working on a (very long-shot) plan for it myself, but having recently realized that other people may be working on it too, I've started looking for them. Do you (or anyone reading this) know of anybody seriously willing to pursue, or already engaged in, this avenue of work? Specifically, working toward whole brain emulation (WBE) with the explicit intent of averting an unaligned-AGI disaster.

It bears mention that, compared to the median predicted unaligned AGI, I'd hands-down accept Hitler as supreme overlord.  It seems probable that humans would still exist under Hitler, and in a fairly recognizable form, even if there were many troubling things about their existence.  Furthermore, I suspect that an average human would be better than Hitler, and I'm fairly optimistic that most individuals striving to prevent the AGI apocalypse would make for downright pleasant overseers (or whatever).

I'm not convinced "want to modify their utility functions" is the most useful perspective. I think it might be more helpful to say that we each have multiple utility functions, which conflict to varying degrees and have voting power in different areas of the mind. I've had first-hand experience with such conflicts (as essentially everyone probably has, knowingly or not), and it feels like fighting yourself.

Consider a hypothetical example: "Do I eat that extra donut?" Part of you wants the donut; that part feels more like an instinct, a visceral urge. Part of you knows you'll be ill afterwards and will feel guilty about cheating on your diet; this part feels more like "you", the part that thinks in words. You stand there and struggle, trying to make yourself walk away, as your hand reaches out for the donut. I've been in similar situations where (though I balked at the possible philosophical ramifications) I felt that if I had a button to make me stop wanting the thing, I'd push it, yet often it was the other function that won.

I suspect that if you gave an agent the ability to modify its utility functions, which one won would depend on which one had access to the mechanism (do you merely think the thought? push a button?) and whether it understood what the mechanism means. (The word "donut" doesn't evoke nearly as strong a reaction as a picture of a donut, for instance; your donut-craving subsystem doesn't inherently understand the word.)

On the other hand, one might argue that cravings for donuts are hardwired instincts rather than part of the "mind", and so don't count... but I feel like 1. finding a true dividing line is gonna be real hard, and 2. even setting that aside, I expect many or most people have goals localized in the same part of the mind that are nevertheless not internally consistent, and in some cases there may be reasonable-sounding goals that turn out to be completely incompatible with more important goals. In such a case I could imagine an agent deciding it's better to stop wanting the thing it can't have.
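A toy model of the "multiple utility functions with different voting power" idea, assuming (purely for illustration) that each subsystem's vote is scaled by how strongly the current stimulus activates it. The `Subsystem` class, the `decide` function, the "picture" vs "word" stimuli, and every number are hypothetical; this is a sketch of the shape of the argument, not a model of real cognition.

```python
from dataclasses import dataclass


@dataclass
class Subsystem:
    name: str
    # How strongly each kind of stimulus activates this subsystem (hypothetical values).
    activation: dict[str, float]
    # How much this subsystem values each action (hypothetical values).
    preference: dict[str, float]


def decide(subsystems: list[Subsystem], stimulus: str, actions: list[str]) -> str:
    """Pick the action with the highest activation-weighted vote across subsystems."""

    def total_vote(action: str) -> float:
        return sum(
            s.activation.get(stimulus, 0.0) * s.preference.get(action, 0.0)
            for s in subsystems
        )

    return max(actions, key=total_vote)


craving = Subsystem("craving", {"picture": 1.0, "word": 0.1}, {"eat": 1.0, "walk away": -1.0})
planner = Subsystem("planner", {"picture": 0.5, "word": 0.5}, {"eat": -1.0, "walk away": 1.0})
actions = ["eat", "walk away"]

print(decide([craving, planner], "picture", actions))  # eat
print(decide([craving, planner], "word", actions))     # walk away
```

The same two subsystems reach opposite decisions depending only on which one the stimulus activates. Giving either subsystem a "button" that zeroes the other's activation would lock in its own preference, which is one way to read the point about who controls the modification mechanism.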

I feel like the concept of "neural address" is incompletely described, and the specifics may matter. For example, a specific point in the skull, yeah, is a bad way to address a specific concept across individuals. However, there might be, say, particular matching structures that tend to form around certain ideas, and searching for those structures might be a better way of addressing a particular concept. (Probably still not good, but it hints that there may be better ways of formulating a neural address that maybe WOULD be sufficiently descriptive. I don't know any particularly good methods off the top of my head, though, and your point may turn out to be correct.)

"When you are finished reading this, you will see Bayesian problems in your dreams."

Whaddaya know; he was right.

Also, yes, the other version (on Arbital) is better, with more information, though this one has a point or two that aren't in the other version, like the discussion of degrees of freedom.

Ah, thanks; that looks pretty relevant. I'll try to read it in the next day or so.

Yeah, and it would also cut out close contact with a number of people. It's actually looking pretty likely the second of the mentioned alternatives will happen (assuming I go at all) - it's possible I'll stay at home with an immediate family member, rather than go to Utah. This reduces my close contact from ~20 people (in ~4 families) down to like 3 people (1 family), and should significantly reduce the odds of my catching it. I started writing a huge response, but moved it into an update.
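A rough sense of why fewer close contacts matters, using the simplest possible model: each contact independently transmits with the same probability, so the chance of at least one transmission is 1 - (1 - p)^n. The per-contact probability `p` below is a made-up number for illustration, and real household contacts are correlated (one infected family member exposes the others), so only the direction of the effect carries over, not the specific values.

```python
def p_infected(p_per_contact: float, n_contacts: int) -> float:
    """Chance of at least one transmission, assuming each contact independently
    transmits with the same probability p_per_contact (a strong simplification)."""
    return 1.0 - (1.0 - p_per_contact) ** n_contacts


p = 0.05  # hypothetical per-contact probability, for illustration only
print(f"~20 contacts: {p_infected(p, 20):.2f}")  # ~20 contacts: 0.64
print(f"~3 contacts:  {p_infected(p, 3):.2f}")   # ~3 contacts:  0.14
```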

My own spin on where the article goes wrong: I think some forms of procrastination and laziness are valuable. Sweeping every day will only make the floor so clean. Some tasks truly DO go away if you ignore them long enough.

...But overall, I do firmly agree with the intent of your article.
