All of grobstein's Comments + Replies

I read on r/MagicArena that, at least based on public information from Wizards, we don't *know* that "You draw two hands, and it selects the hand with the amount of lands closest to the average for your deck."

What we know is closer to: "You draw two hands, and there is some (unknown, but possibly not absolute) bias towards selecting the hand with the amount of lands closest to the average for your deck."

I take it that, if the bias is less than absolute, the consequences for deck-building are in the same direction but less extreme.
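If that's right, the deck-building consequences are easy to probe with a quick simulation. The sketch below assumes the two-hand rule as described; the `bias` parameter is my own stand-in for the unknown strength of the selection bias (1.0 is the absolute version, 0.7 a weaker one):

```python
import random

def draw_hand(deck, hand_size=7):
    return random.sample(deck, hand_size)

def smoothed_opener(deck, bias, hand_size=7):
    """Draw two hands; with probability `bias`, keep the one whose land
    count is closest to the deck's average for a hand of this size."""
    target = hand_size * sum(deck) / len(deck)  # expected lands per hand
    a, b = draw_hand(deck, hand_size), draw_hand(deck, hand_size)
    closer, other = sorted((a, b), key=lambda h: abs(sum(h) - target))
    return closer if random.random() < bias else other

def bad_hand_rate(hands):
    """Fraction of hands that are mana-screwed (<=1 land) or flooded (>=5)."""
    return sum(sum(h) <= 1 or sum(h) >= 5 for h in hands) / len(hands)

# 60-card deck with 24 lands, encoded as 1 = land, 0 = spell.
deck = [1] * 24 + [0] * 36
random.seed(0)
N = 20000
raw     = bad_hand_rate([draw_hand(deck) for _ in range(N)])
partial = bad_hand_rate([smoothed_opener(deck, bias=0.7) for _ in range(N)])
full    = bad_hand_rate([smoothed_opener(deck, bias=1.0) for _ in range(N)])
print(raw, partial, full)  # each step of extra bias cuts the bad-hand rate
```

As expected, a partial bias lands between no smoothing and the absolute rule: the same direction, less extreme.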

But I don't think "utility function" in the context of this post has to mean, a numerical utility explicitly computed in the code.

It could just be, the agent behaves as-if its utilities are given by a particular numerical function, regardless of whether this is written down anywhere.

People do not behave as if their utilities were given by a particular numerical function that collapses all of their hopes and goals into one number, and machines need not do it that way, either.

Often when we act, we end up 25% short of the optimum solution, but we have been hypothesizing systems with huge amounts of computing power. If they frequently end up 25% or even 80% short of behaving optimally, so what? In exchange for an AGI that stays under control, we should be willing to make the trade-off. In fact, if their efficiency falls by 95%, they are still wildly powerful.

Eliezer and Bostrom have discovered a variety of difficulties with AGIs that can be thought of as collapsing all of their goals into a single utility function. Why not also think about making other kinds of systems? An AGI could have a vast array of hedges, controls, limitations, conflicting tendencies and tropisms which frequently cancel each other out and prevent dangerous action. The book does scratch the surface on these issues, but it is not all about fail-safe mind design and managed roll-out. We can develop a whole literature on those topics.

In humans, goal drift may work as a hedging mechanism.

One possible explanation for the plasticity of human goals is that the goals that change aren't really final goals.

So me-now faces the question,

Should I assign any value to final goals that I don't have now, but that me-future will have because of goal drift?

If goals are interpreted widely enough, the answer should be, No. By hypothesis, those goals of me-future make no contribution to the goals of me-now, so they have no value to me. Accordingly, I should try pretty hard to prevent goal drift and / or reduce investment in the well-being of me-futur…

The first issue is that you don't know what they will be.

I am not that confident in the convergence properties of self-preservation as instrumental goal.

It seems that at least some goals should be pursued ballistically -- i.e., by setting an appropriate course in motion so that it doesn't need active guidance.

For example, living organisms vary widely in their commitment to self-preservation. One measure of this variety is the variety of lifespans and lifecycles. Organisms generally share the goal of reproducing, and they pursue this goal by a range of means, some of which require active guidance (like teachi…

Hard to see why you can't make a version of this same argument, at an additional remove, in the time travel case. For example, if you are a "determinist" and / or "n-dimensionalist" about the "meta-time" concept in Eliezer's story, the future people who are lopped off the timeline still exist in the meta-timeless eternity of the "meta-timeline," just as in your comment the dead still exist in the eternity of the past.

In the (seemingly degenerate) hypothetical where you go back in time and change the future, I'm not…

Any inference about "what sort of thingies can be real" seems to me premature. If we are talking about causality and space-time locality, it seems to me that the more parsimonious inference regards what sort of thingies a conscious experience can be embedded in, or what sort of thingies a conscious experience can be of.

The suggested inference seems to privilege minds too much, as if to say that only the states of affairs that allow a particular class of computation can possibly be real. (This view may reduce to empiricism, which people like, but…

Which of these is a major stressor on romantic relationships?

Not that it's happened to me, but I can easily see "autism through vaccination" fitting into the scenario.

(Wikipedia's article on tax incidence claims that employees pay almost all of payroll taxes, but cites a single paper that claims a 70% labor / 30% owner split for corporate income tax burden in the US, and I have no idea how or whether that translates to payroll tax burden or whether the paper's conclusions are generally accepted.)

There's no consensus on the incidence of the corporate income tax in the fully general case. It's split among too many parties.

The USA is not the best place to earn money.[2] My own experience suggests that at least Japan, New Zealand, and Australia can all be better. This may be shocking, but young professionals with advanced degrees can earn more discretionary income as a receptionist or a bartender in the Australian outback than as, say, a software engineer in the USA.

As a side question, when did a receptionist or bartender become a "professional"? Is "professional" just used as a class marker, standing for something like "person with a non-vocational…

I read it as "young people employed as professionals can make more money by being not-professionals in the Australian outback".

But to many, "professional" merely means "someone who is paid to do something". I think that usage came into the popular consciousness via "professional athlete", though I'm not sure if that's the first instance of the popular usage.

ETA: according to OED, the relevant distinction in this usage is "professional" vs. "amateur", and it was used somewhat in that sense as far back as maybe 1806 (I assert that their earlier citations were meant ironically, or merely by comparison to actual professions).

Note that a lot of the financial benefit described here comes from living somewhere remote -- in particular the housing and food costs. That's the reason for the strenuous warning not to live in "Sydney, Melbourne or any major Australian city." From a larger perspective, it partly accounts for choosing Australia over America (low population density --> low housing costs, etc.).

For a full analysis, the cost differentials of living in the Australian outback vs. an American city (or whatever) have to be decomposed into price level, consumption, …

There used to be a special "expatriation tax" that applied only to taxpayers who renounced their (tax) citizenship for tax avoidance purposes. However, under current law, I believe you are treated the same regardless of your reason for renouncing your (tax) citizenship. Here's an IRS page on the subject:,,id=97245,00.html

This is not an area of my expertise, though.

Hi. I am a very occasional participant, mostly because of competing time demands, but I appreciate the work done here and check it out when I can.

If there is an infinite number of conscious minds, how do the anthropic probability arguments work out?

In a big universe, there are infinitely many beings like us.

Caffeine, of course, is rather addictive.

So one might (and I do) find it difficult to optimize finely according to what tasks one is attempting. The addictive nature of the drug probably explains the "always or never" consumption pattern.

Personally I drink 2-3 espressos every other day or so and find it easy to maintain that level. In fact, I wish I were more addicted to caffeine, because I recently realized that I can write better with it, hands down. I would sometimes go for a week or two without drinking much caffeine because I thought it was unnecessary and slightly hurt my stomach, but now I regret neglecting it.

In the wild, people use these gambits mostly for social, rather than argumentative, reasons. If you are arguing with someone and believe their arguments are pathological, and engagement is not working, you need to be able to stop the debate. Hence, one of the above -- this is most clear with "Let's agree to disagree."

In practice, it can be almost impossible to get out of a degrading argument without being somewhat intellectually dishonest. And people generally are willing to be a little dishonest if it will get them out of an annoying and unprod…

It seems obvious that if the AI has the capacity to torture trillions of people inside the box, it would have the capacity to torture *illions outside the box.

If EY is right, most failures of friendliness will produce an AI uninterested in torture for its own sake. It might try the same trick to escape to the universe simulating this one, but that seems unlikely for a number of reasons. (Edit: I haven't thought about it blackmailing aliens or alien FAIs.)

If that's true, what consequence does it have for your decision?

Agreed. If you are inside a box, the you outside the box did whatever it did. Whatever you do is simply a repetition of a past action. If anything, this would convince me to keep the AI in the box because if I'm a simulation I'm screwed anyway but at least I won't give the AI what it wants. A good AI would hopefully find a better argument.

The difficulty for me is that this technique is at war with having an accurate self-concept, and may conflict with good epistemic hygiene generally. For the program to work, one must seemingly learn to suppress one's critical faculties for selected cases of wishful thinking. This runs against trying to be just the right amount critical when faced with propositions in general. How can someone who is just the right amount critical affirm things that are probably not true?

Generally, I see no conflict here, assuming that the thing you're priming yourself with is not something that might displace your core rationalist foundations. If you're riding a horse, it is epistemically rational to incorporate the knowledge about the horse into your model of the world (to be aware how it will react to a pack of wolves or an attractive mare during a mating season), and it is instrumentally rational to be able to steer the horse where you want it to carry you. Same with your mind -- if you're riding an evolutionary kludge, it is epistemically rational to incorporate the knowledge about the kludge into your map of reality, and it is instrumentally rational to be able to steer it where you want it to be. What matters is where you draw the line between the agent and the environment.
Is an actor practicing poor epistemic hygiene when they play a role? Refraining from dispute is not the same thing as believing. Not discussing religion with your theist friends is not the same as becoming one yourself.

It's $45 from Amazon. At that price, I'm going to scheme to steal it back first.


Gosh. It's only £17 in the UK. (I wasn't meaning to suggest that you're crazy, but I did wonder about ... hmm, not sure whether there's a standard name for it. Being less prepared to spend X to get Y on account of having done so before and then lost Y. A sort of converse to the endowment effect.)

Rationality is made of win.



Eliezer's argument, if I understand it, is that any decision-making algorithm that results in two-boxing is by definition irrational due to giving a predictably bad outcome.

So he's assuming the conclusion that you get a bad outcome? Golly.

True, we don't know the outcome. But we should still predict that it will be bad, due to Omega's 99% accuracy rate. Don't mess with Omega.
The result of two-boxing is a thousand dollars. The result of one-boxing is a million dollars. By definition, a mind that always one-boxes receives a better payout than one that always two-boxes, and therefore one-boxing is more rational, by definition.
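The arithmetic behind "don't mess with Omega" is quick to check. This sketch assumes the 99% accuracy applies symmetrically to one-boxers and two-boxers, which the problem statement doesn't strictly pin down:

```python
# Assumption: the predictor is right 99% of the time, symmetrically for
# one-boxers and two-boxers.
ACC = 0.99

# One-boxer: if predicted correctly, box B holds $1,000,000; else it's empty.
ev_one_box = ACC * 1_000_000 + (1 - ACC) * 0

# Two-boxer: if predicted correctly, B is empty and you get A's $1,000;
# if mispredicted, you get both: $1,001,000.
ev_two_box = ACC * 1_000 + (1 - ACC) * 1_001_000

print(ev_one_box, ev_two_box)  # one-boxing comes out ~90x ahead
```

Even at much lower predictor accuracy the ordering holds: two-boxing only pulls ahead when the predictor is barely better than chance.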

This premise is not accepted by the 1-box contingent. Occasionally they claim there's a reason.

Can you please elaborate? I'm trying to catch up!
You mean they don't accept that the decision doesn't affect what's in box B?

Please ... Newcomb is a toy non-mathematizable problem and not a valid argument for anything at all.


As far as I can tell, the Newcomb problem exists only in English, and only because a completely aphysical causality loop is introduced. Every mathematization I've ever seen collapses it to either a trivial one-boxing problem or a trivial two-boxing problem. If anybody wants this problem to be treated seriously, maths first to show the problem is real! Otherwise, we're really not much better off than if we were discussing quotes from the Bible.

Actually the problem is an ambiguity in "right" -- you can take the "right" course of action (instrumental rationality, or ethics), or you can have "right" belief (epistemic rationality).

Here's a functional difference: Omega says that Box B is empty if you try to win what's inside it.

Yes! This functional difference is very important! In Logic, you begin with a set of non-contradicting assumptions and then build a consistent theory based on those assumptions. The deductions you make are analogous to being rational. If the assumptions are non-contradicting, then it is impossible to deduce something false in the system. (Analogously, it is impossible for rationality not to win.) However, you can get a paradox by having a self-referential statement. You can prove that every sufficiently complex theory is not closed -- there are things that are true that you can't prove from within the system. Along the same lines, you can build a paradox by forcing the system to try to talk about itself. What Grobstein has presented is a classic paradox and is the closest you can come to rationality not winning.

Your argument is equivalent to, "But what if your utility function rates keeping promises higher than a million orgasms, what then?"

The hypo is meant to be a very simple model, because simple models are useful. It includes two goods: getting home, and having $100. Any other speculative values that a real person might or might not have are distractions.

Simple models are fine as long as we don't forget they are only approximations. Rationalists should win in the real world.
Except that you mention both persons and promises in the hypothetical example, so both things factor into the correct decision. If you said that it's not a person making the decision, or that there's no promising involved, then you could discount integrity.

Right. The question of course is, "better" for what purpose? Which model is better depends on what you're trying to figure out.

I do think these problems are mostly useful for purposes of understanding and (moreso) defining rationality ("rationality"), which is perhaps a somewhat dubious use. But look how much time we're spending on it.

I very much recommend Reasons and Persons, by the way. A friend stole my copy and I miss it all the time.

Paul Crowley · 14y
OK, thanks! Your friend stole a book on moral philosophy? That's pretty special!
It's still in print and readily available. If you really miss it all the time, why haven't you bought another copy?

What is it, pray tell, that Omega cannot do?

Can he not scan your brain and determine what strategy you are following? That would be odd, because this is no stronger than the original Newcomb problem and does not seem to contain any logical impossibilities.

Can he not compute the strategy, S, with the property "that at each moment, acting as S tells you to act -- given (1) your beliefs about the universe at that point and (2) your intention of following S at all times -- maximizes your net utility [over all time]?" That would be very odd, since y…

Well, for instance, he cannot make 1+1=3. And, if one defines rationality as actually winning, then he cannot act in such a way that rational people lose. This is perfectly obvious; and, in case you have misunderstood what I wrote (as it looks like you have), that is the only thing I said that Omega cannot do.

In the discussion of strategy S, my claim was not about what Omega can do but about what you (a person attempting to implement such a strategy) can consistently include in your model of the universe. If you are an S-rational agent, then Omega may decide to screw you over, in which case you lose; that's OK (as far as the notion of rationality goes; it's too bad for you) because S doesn't purport to guarantee that you don't lose. What S does purport to do is to arrange that, in so far as the universe obeys your (incomplete, probabilistic, ...) model of it, you win on average. Omega's malfeasance is only a problem for this if it's included in your model. Which it can't be. Hence: (Actually, I think that's not quite right. You could probably consistently expect that, provided your expectations about how he's going to do it were vague enough.)

I did not claim, nor do I believe, that a regular person can compute a perfectly rational strategy in the sense I described. Nor do I believe that a regular person can play chess without making any mistakes. None the less, there is such a thing as playing chess well; and there is such a thing as being (imperfectly, but better than one might be) rational. Even with a definition of the sort Eliezer likes.

Yes, this seems unimpeachable. The missing piece is, rational at what margin? Once you are home, it is not rational at the margin to pay the $100 you promised.

This assumes no one can ever find out you didn't pay, as well. In general, though, it seems better to assume everything will eventually be found out by everyone. This seems like enough, by itself, to keep promises and avoid most lies.

It's a test case for rationality as pure self-interest (really it's like an altruistic version of the game of Chicken).

Suppose I'm purely selfish and stranded on a road at night. A motorist pulls over and offers to take me home for $100, which is a good deal for me. I only have money at home. I will be able to get home then IFF I can promise to pay $100 when I get home.

But when I get home, the marginal benefit to paying $100 is zero (under the assumption of pure selfishness). Therefore if I behave rationally at the margin when I get home, I cannot keep my pr…
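The backward-induction trap above can be sketched with made-up utility numbers (1000 for getting home, 100 for the payment -- purely illustrative):

```python
# Hypothetical utilities (my numbers, not from the post): getting home is
# worth 1000, and paying costs 100.
HOME, PAY_COST = 1000, 100

def hitchhiker_decision():
    """What a purely selfish agent does once already home."""
    pay = HOME - PAY_COST   # home, minus the $100
    dont_pay = HOME         # home, keeps the $100
    return "pay" if pay > dont_pay else "don't pay"

def driver_decision(predicted_choice):
    """The driver offers the ride only if she predicts payment."""
    return "ride" if predicted_choice == "pay" else "no ride"

choice = hitchhiker_decision()
print(choice, "->", driver_decision(choice))  # don't pay -> no ride
```

Because the final decision node favors not paying, a driver who can predict it never offers the ride, and the selfish agent is worse off than one who could bind himself to pay.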

Paul Crowley · 14y
Thank you, I too was curious. We need names for these positions; I'd use "hyper-rationalist" but I think that's slightly different. Perhaps a consequentialist does whatever has the maximum expected utility at any given moment, and a meta-consequentialist is a machine built by a consequentialist which is expected to achieve the maximum overall utility at least in part through being trustworthy to keep commitments a pure consequentialist would not be able to keep.

I guess I'm not sure why people are so interested in this class of problems. If you substitute Clippy for my lift, and up the stakes to a billion lives lost later in return for two billion saved now, there you have a problem; but when it's human beings on a human scale there are good ordinary consequentialist reasons to honour such bargains, and those reasons are enough for the driver to trust my commitment. Does anyone really anticipate a version of this situation arising in which only a meta-consequentialist wins, and if so can you describe it?
Ah, thanks. I'm of the school of thought that says it is rational both to promise to pay the $100, and to have a policy of keeping promises.

Yes, you are changing the hypo. Your Omega dummy says that it is the same game as Newcomb's problem, but it's not. As VN notes, it may be equivalent to the version of Newcomb's problem that assumes time travel, but this is not the classical (or an interesting) statement of the problem.

No. The point is that you actually want to survive more than you want to win, so if you are rational about Chicken you will sometimes lose (consult your model for details). Given your preferences, there will always be some distance \epsilon before the cliff where it is rational for you to give up.

Therefore, under these assumptions, the strategy "win or die trying" seemingly requires you to be irrational. However, if you can credibly commit to this strategy -- be the kind of person who will win or die trying -- you will beat a rational player every time.

This is a case where it is rational to have an irrational disposition, a disposition other than doing what is rational at every margin.

But a person who truly cares more about winning than surviving can be utterly rational in choosing that strategy.
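The Chicken logic above can be made concrete with a toy payoff matrix (my numbers; all that matters is that the crash penalty dwarfs the prize):

```python
# Illustrative Chicken payoffs as (row player, column player): the crash
# penalty just has to dwarf the value of winning for the argument to work.
PAYOFF = {
    ("swerve",   "swerve"):   (0, 0),
    ("swerve",   "straight"): (-1, 1),       # swerver loses face, other wins
    ("straight", "swerve"):   (1, -1),
    ("straight", "straight"): (-100, -100),  # crash
}

def best_response(opponent_move):
    """The rational (payoff-maximizing) reply to a known opponent move."""
    return max(("swerve", "straight"),
               key=lambda m: PAYOFF[(m, opponent_move)][0])

# Against a credible "win or die trying" commitment to go straight, the
# rational player's best response is to swerve -- so the committed
# player wins every time.
print(best_response("straight"))  # swerve
print(best_response("swerve"))    # straight
```

The commitment only pays because it is credible; a player known to be maximizing at the final margin can't make the same threat stick.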

Why don't you accept his distinction between acting rationally at a given moment and having the disposition which it is rational to have, integrated over all time?

EDIT: er, Parfit's, that is.

Obviously we can construct an agent who does this. I just don't see a reasonably parsimonious model that does it without including a preference for getting AIDS, or something similarly crazy. Perhaps I'm just stuck.

(likewise the fairness language of the parent post)

It's impossible to add substance to "non-pathological universe." I suspect circularity: a non-pathological universe is one that rewards rationality; rationality is the disposition that lets you win in a nonpathological universe.

You need to attempt to define terms to avoid these traps.

Pathological universes are ones like: where there is no order and the right answer is randomly placed. Or where the facts are maliciously arranged to entrap in a recursive red herring where the simplest well-supported answer is always wrong, even after trying to out-think the malice. Or where the whole universe is one flawless red herring ("God put the fossils there to test your faith"). "No free lunch" demands they be mathematically conceivable. But to assert that the real universe behaves like this is to go mad.
(likewise the fairness language of the parent post)

This is a classic point and clearer than the related argument I'm making above. In addition to being part of the accumulated game theory learning, it's one of the types of arguments that shows up frequently in Derek Parfit's discussion of what-is-rationality, in Ch. 1 of Reasons and Persons.

I feel like there are difficulties here that EY is not attempting to tackle.

Quoting myself:

(though I don't see how you identify any distinction between "properties of the agent" and "decisions . . . predicted to be made by the agent" or why you care about it).

I'll go further and say this distinction doesn't matter unless you assume that Newcomb's problem is a time paradox or some other kind of backwards causation.

This is all tangential, though, I think.

Yes, all well and good (though I don't see how you identify any distinction between "properties of the agent" and "decisions . . . predicted to be made by the agent" or why you care about it). My point is that a concept of rationality-as-winning can't have a definite extension say across the domain of agents, because of the existence of Russell's-Paradox problems like the one I identified.

This is perfectly robust to the point that weird and seemingly arbitrary properties are rewarded by the game known as the universe. Your proposed red…

Hello. My name is Omega. Until recently I went around claiming to be all-knowing/psychic/whatever, but now I understand lying is Wrong, so I'm turning over a new leaf. I'd like to offer you a game. Here are two boxes. Box A contains $1,000; box B contains $1,000,000. Both boxes are covered by a touch-sensitive layer. If you choose box B only (please signal that by touching box B), it will send out a radio signal to box A, which will promptly disintegrate. If you choose both boxes (please signal that by touching box A first), a radio signal will be sent out to box B, which will disintegrate its contents, so opening it will reveal an empty box. (I got the disintegrating technology from the wreck of a UFO that crashed into my barn, but that's not relevant here.)

I'm afraid that if I or my gadgets detect any attempt to tamper with the operation of my boxes, I will be forced to disqualify you. In case there is doubt, this is the same game I used to offer back in my deceitful days. The difference is, now the player knows the rules are enforced by cold hard electronics, so there's no temptation to try and outsmart anybody. So, what will it be?

What you give is far harder than a Newcomb-like problem. In Newcomb-like problems, Omega rewards your decisions, he isn't looking at how you reach them.

You misunderstand. In my variant, Omega is also not looking at how you reach your decision. Rather, he is looking at you beforehand -- "scanning your brain", if you will -- and evaluating the kind of person you are (i.e., how you "would" behave). This, along with the choice you make, determines your later reward.

In the classical problem, (unless you just assume backwards causation,) …

We have such an Omega: we just refer to it differently. After all, we are used to treating our genes and our environments as definite influences on our ability to Win. Taller people tend to make more money; Omega says "there will be $1mil in box B if you have alleles for height." If Omega makes decisions based on properties of the agent, and not on the decisions either made or predicted to be made by the agent, then Omega is no different from, well, a lot of the world.

Rationality, then, might be better redefined under these observations as "making the decisions that Win whenever such decisions actually affect one's probability of Winning," though I prefer Eliezer's more general rules plus the tacit understanding that we are only including situations where decisions make a difference.

I don't think I buy this for Newcomb-like problems. Consider Omega who says, "There will be $1M in Box B IFF you are irrational."

Rationality as winning is probably subject to a whole family of Russell's-Paradox-type problems like that. I suppose I'm not sure there's a better notion of rationality.

If one defines rationality in some way that isn't about winning, your example shows that rationalists-in-such-a-sense might not win. If one defines rationality as actually winning, your example shows that there are things that even Omega cannot do because they involve logical contradiction. If one defines rationality as something like "expected winning given one's model of the universe" (for quibbles, see below), your example shows that you can't coherently carry around a model of the universe that includes a superbeing who deliberately acts so as to invalidate that model. I find all three of these things rather unsurprising.

The traditional form of Newcomb's problem doesn't involve a superbeing deliberately acting so as to invalidate your model of the universe. That seems like a big enough difference from your version to invalidate inferences of the form "there's no such thing as acting rationally in grobstein's version of Newcomb's problem; therefore it doesn't make sense to use any version of Newcomb's problem in forming one's ideas about what constitutes acting rationally".

I think the third definition is pretty much what Eliezer is getting at when he declares that rationalists/rationality should win. Tightening it up a bit, I think we get something like this: rationality is a strategy S such that at each moment, acting as S tells you to act -- given (1) your beliefs about the universe at that point and (2) your intention of following S at all times -- maximizes your net utility (calculated in whatever way you prefer; that is mostly not a question of rationality). This isn't quite a definition, because there might turn out to be multiple such strategies, especially for people whose initial beliefs about the universe are sufficiently crazy. But if you add some condition to the effect that S and your initial beliefs shouldn't be too unlike what's generally considered (respectively) rational and right now, there might well be a unique solution to the equations.
What you give is far harder than a Newcomb-like problem. In Newcomb-like problems, Omega rewards your decisions, he isn't looking at how you reach them. This leaves you free to optimize those decisions.

"Passing out condoms increases the amount of sex but makes each sex act less dangerous. So theoretically it's indeterminant whether it increases or decreases the spread of AIDS."

Not quite -- on a rational choice model, passing out condoms may decrease or not impact the spread of AIDS (in principle), but it can't increase it. A rational actor who doesn't actively want AIDS might increase their sexual activity enough to compensate for the added safety of the condom, but they would not go further than that.

(This is different from the seatbelt case because car crashes result in costs, say to pedestrians who are struck, that are not internalized by the driver.)

Eliezer Yudkowsky · 14y
In theory - I say nothing of practice - this need not be true. If people get ten times as much sexual pleasure per unit risk, they may pay out more total risk. As a general principle of resource consumption this has an official name, but I forget it.
Paul Crowley · 14y
We might suppose that condom promotion has two effects: a replacement effect and an encouragement effect. So there will be some instances where what would have been unsafe sex becomes safe sex, and some instances where no sex at all becomes safe sex. Safe sex is so vastly less likely to transmit HIV that the latter effect would have to be hundreds of times larger than the former for condom promotion to have an overall increasing effect on HIV transmission; that doesn't seem plausible to me, and no evidence to support it has been presented. If you could show that condom promotion caused a lot of instances where no sex becomes unsafe sex, that would change the picture, but AFAIK there's no reason or evidence to suppose that.

It's pretty clear in this instance that the desire to bash the Pope-criticising liberals came first, and the arguments second.
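Some rough arithmetic shows how lopsided the two effects are. Assuming, purely for illustration, that condoms cut per-act transmission risk by a factor of 200 (the real figure is an empirical question, and the per-act probabilities below are made up):

```python
# Illustrative per-act transmission probabilities (assumptions, not data):
p_unsafe = 1e-3           # one unsafe act
p_safe = p_unsafe / 200   # condoms assumed ~200x safer here

# Risk removed by converting one unsafe act into a safe one:
removed = p_unsafe - p_safe

# How many brand-new safe acts does it take to add that much risk back?
breakeven = removed / p_safe
print(round(breakeven))  # ~199 new safe acts per converted act
```

So under these assumptions, each unsafe act replaced by a safe one buys room for roughly two hundred entirely new safe acts before total transmission risk even breaks even -- which is the "hundreds of times larger" threshold in the comment above.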

Cambridge, MA. Rarely venture beyond Boston metro area.

However, I'll be in the Pioneer Valley on Apr. 17-19, if anyone is interested in a meetup that Sunday (19th), say NoHo or Amherst.

The bus is less friendly on weekends, but I could get as far as Amherst Center (and back) without spending unduly long waiting for multiple buses.

Simply annoying from a usability point of view. It requires you to know too surely which posts you will want to vote on and which authors you'll want to know; if you care about the value of your karmic vote you'll wind up interfering with your enjoyment of the posts to preserve its value; etc.

Formalizations that take big chunks of arguments as black boxes are not that useful. Formalizations that instead map all of an argument's moving parts are very hard.

The reason that specialists learn formalizations for domain-specific arguments only is because formalizing truly general arguments[FN1] is an extremely difficult problem -- difficult to design and difficult to use. This is why mathematicians work largely in natural language, even though their arguments could (usually or always) be described in formal logic. Specialized formal languages are pos…

Totally agree -- helps if you can convince them to read A Fire Upon the Deep, too. I'm not being facetious; the explicit and implicit background vocabulary (seems to) make it easier to understand the essays.

(EDIT: to clarify, it is not that I think Fire in particular must be elevated as a classic of rationality, but that it's part of a smart sci-fi tradition that helps lay the ground for learning important things. There's an Eliezer webpage about this somewhere.)

Clarity and transparency. One should be able to open the book to a page, read an argument, and see that it is right.

(Obviously this trades off against other values -- and is in some measure a deception --, but it's the kind of thing that impresses my friends.)
