All of Bunthut's Comments + Replies

That prediction may be true. My argument is that "I know this by introspection" (or, introspection-and-generalization-to-others) is insufficient. For a concrete example, consider your 5-year-old self. I remember some pretty definite beliefs I had about my future self that turned out wrong, and if I ask myself how aligned I am with it I don't even know how to answer, he just seems way too confused and incoherent.

I think it's also not absurd that you do have perfect caring in the sense relevant to the argument. This does not require that you don't make mista... (read more)


This prediction seems flatly wrong: I wouldn’t bring about an outcome like that. Why do I believe that? Because I have reasonably high-fidelity access to my own policy, via imagining myself in the relevant situations.

This seems like you're confusing two things here, because the thing you would want is not knowable by introspection. What I think you're introspecting is that if you'd noticed that the-thing-you-pursued-so-far was different from what your brother actually wants, you'd do what he actually wants. But the-thing-you-pursued-so-far doesn't play the... (read more)

I want to know whether, as a matter of falsifiable fact, I would enact good outcomes by my brother's values were I very powerful and smart. You seem to be sympathetic to the falsifiable-in-principle prediction that, no, I would not. (Is that true?) Anyways, I don't really buy this counterargument, but we can consider the following variant (from footnote 2):  "True" values: My own (which I have access to) "Proxy" values: My brother's model of my values (I have a model of his model of my values, as part of the package deal by which I have a model of him) I still predict that he would bring about a good future by my values. Unless you think my predictive model is wrong? I could ask him to introspect on this scenario and get evidence about what he would do? 

The idea is that we can break any decision problem down by cases (like "insofar as the predictor is accurate, ..." and "insofar as the predictor is inaccurate, ...") and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases.

Doesn't this also require that all the decision theories agree that the conditioning fact is independent of your decision?

Otherwise you could break down the normal prisoner's dilemma into "insofar as the opponent makes the same move as me" and "insofar as the opponent makes the opposite move" and con... (read more)

Would a decision theory like this count as "giving up on probabilities" in the sense in which you mean it here?

I think your assessments of what's psychologically realistic are off.

I do not know what it feels like from the inside to feel like a pronoun is attached to something in your head much more firmly than "doesn't look like an Oliver" is attached to something in your head.

I think before writing that, Yud imagined calling [unambiguously gendered friend] either pronoun, and asked himself if it felt wrong, and found that it didn't. This seems realistic to me: I've experienced my emotional introspection becoming blank on topics I've put a lot of thinking into. This... (read more)

I don't think the analogy to biological brains is quite as strong. For example, biological brains need to be "robust" not only to variations in the input, but also in a literal sense, to forceful impact or to parasites trying to control it. It intentionally has very bad suppressability, and this means there needs to be a lot of redundancy, which makes "just stick an electrode in that area" work. More generally, it is under many constraints that an ML system isn't, probably too many for us to think of, and it generally prioritizes safety over performance. Bo... (read more)

Quintin Pope (2y):
Firstly, thank you for your comment. I'm always glad to have good faith engagement on this topic. However, I think you're assuming the worst case scenario in regards to interpretability. Artificial networks are often trained with randomness applied to their internal states (dropout, gradient noise, etc). These seem like they'd cause more internal disruption (and are FAR more common) than occasional impacts. Evolved resistance to parasite control seems like it should decrease interpretability, if anything. E.g., having multiple reward centers that are easily activated is a terrible idea for resisting parasite control. And yet, the brain does it anyways. No it doesn't. One of my own examples of brain interpretability was: Which is a type of suppressability. Various forms of amnesia, aphasia and other perceptual blocks are other examples of suppressability in one form or another. So more reliable ML systems will be more interpretable? Seems like a reason for optimism. Yes, it would be terrible. I think you've identified a major reason why larger neural nets learn more quickly than smaller nets. The entire point of training neural nets is that you constantly change each part of the net to better process the data. The internal representations of different circuits have to be flexible, robust and mutually interpretable to other circuits. Otherwise, the circuits won't be able to co-develop quickly. One part of Knowledge Neurons in Pretrained Transformers I didn't include (but probably should have) is the fact that transformers re-use their input embedding in their internal knowledge representations. I.e., the feed forward layers that push the network to output target tokens represent their target token using the input embeddings of the target tokens in question. This would be very surprising if you thought that the network was just trying to represent its internal states using the shortest possible encoding. However, it's very much what you'd expect if you thought

Probably way too old here, but I had multiple experiences relevant to the thread.

Once I had a dream and then, in the dream, I remembered I had dreamt this exact thing before, and wondered if I was dreaming now, and everything looked so real and vivid that I concluded I was not.

I can create a kind of half-dream, where I see random images and moving sequences at most 3 seconds or so long, in succession. I am really dimmed but not sleeping, and I am aware in the back of my head that they are only schematic and vague.

I would say the backstories in dreams are d... (read more)

I think it's still possible to have a scenario like this. Let's say each trader would buy or sell a certain amount when the price is below/above what they think it to be, but with the transition being very steep instead of instant. Then you could still have long price intervals where the amounts bought and sold remain constant, and then every point in there could be the market price.
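To make the scenario concrete, here is a minimal Python sketch. Everything specific in it is an assumption for illustration and not part of the original discussion: two hypothetical traders, prices in whole cents, a fixed order size per trader, and a linear ramp of half-width `ramp` standing in for the "very steep" transition.

```python
from fractions import Fraction

def net_demand(price, belief, size, ramp):
    """Units a trader wants to buy (positive) or sell (negative) at `price`.

    The trader buys `size` units well below `belief`, sells `size` units well
    above it, and moves linearly between the two over a band of half-width
    `ramp` (a steep but continuous transition; the linear shape is an assumption).
    """
    slope = Fraction(belief - price, ramp)
    return size * max(Fraction(-1), min(Fraction(1), slope))

def clearing_prices(traders, ramp, prices):
    """Return every candidate price at which total net demand is exactly zero."""
    return [
        p for p in prices
        if sum(net_demand(p, belief, size, ramp) for belief, size in traders) == 0
    ]

# Hypothetical traders: (believed value in cents, order size).
traders = [(30, 1), (70, 1)]
cleared = clearing_prices(traders, ramp=5, prices=range(0, 101))
print(cleared[0], cleared[-1])  # 35 65
```

In this toy setup every price from 35 to 65 cents balances buys and sells, so the supply-and-demand picture alone does not pick out a single market price.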

I'm not sure if this is significant. I see no reason to set the traders up this way other than the result in the particular scenario that kicked this off, and adding traders who don'... (read more)

So I'm not sure what's going on with my mental sim. Maybe I just have a super-broad 'crypto-moral detector' that goes off way more often than yours (w/o explicitly labeling things as crypto-moral for me).

Maybe. How were your intuitions before you encountered LW? If you already had a hypocrisy intuition, then trying to internalize the rationalist perspective might have led it to ignore the morality-distinction.

My father playing golf with me today, telling me to lean down more to stop them going out left so much.

Ok. My mental sim doesn't expect any backlash in this type of situation. My first thought is it's just super obvious why the advice might apply to you and not to him; but, this doesn't really seem correct. For one thing, it might not be super obvious. For another, I think there are cases where it's pretty obvious, but I nonetheless anticipate a backlash. So I'm not sure what's going on with my mental sim. Maybe I just have a super-broad 'crypto-moral detector' that goes off way more often than yours (w/o explicitly labeling things as crypto-moral for me).

I don't strongly relate to any of these descriptions. I can say that I don't feel like I have to pretend advice from equals is more helpful than it is, which I suppose means it's not face. The most common way to reject advice is a comment like "eh, whatever" and ignoring it. Some nerds get really mad at this and seem to demand intellectual debate. This is not well received. Most people give advice with the expectation of intellectual debate only on crypto-moral topics (this is also not well received generally, but the speaker seems to accept that as an "identity cost"), or not at all.

You mean advice to diet, or "technical" advice once it's established that the person wants to diet? I don't have experience with either, but the first is definitely crypto-moral.

What's definitely not crypto-moral?

This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you're treated as if you have $2 to spend.

I agree you can arbitrage inconsistencies this way, but it seems very questionable. For one, it means the market maker needs to interpret the output of the deductive process semantically. And it makes him go bankrupt if that logic is inconsistent. And there could be a case where a proposit... (read more)

Why is the price of the un-actualized bet constant? My argument in the OP was to suppose that PCH is the dominant hypothesis, so, mostly controls market prices.

Thinking about this in detail, it seems like what influence traders have on the market price depends on a lot more of their inner workings than just their beliefs. I was thinking in a way where each trader only had one price for the bet, below which they bought and above which they sold, no matter how many units they traded (this might contradict "continuous trading strategies" because of finite wea... (read more)

The continuity property is really important.

But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive?

No, I think finding such a no-learning-needed method would be great. It just means your learning-based approach wouldn't be needed.

You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory.

No. I'm saying if our "good" reasoning can't tell us where in Troll Bridge the mistake is, then something that learns to make "good" inferences would have to fall for it.

But there are decisi

... (read more)

So I don't see how we can be sure that PCH loses out overall. LCH has to exploit PCH -- but if LCH tries it, then we're seemingly in a situation where LCH has to sell for PCH's prices, in which case it suffers the loss I described in the OP.

So I've reread the logical induction paper for this, and I'm not sure I understand exploitation. Under 3.5, it says:

On each day, the reasoner receives 50¢ from T, but after day t, the reasoner must pay $1 every day thereafter.

So this sounds like before day t, T buys a share every day, and those shares never pay out - ot... (read more)

Again, my view may have drifted a bit from the LI paper, but the way I think about this is that the market maker looks at the minimum amount of money a trader has "in any world" (in the sense described in my other comment). This excludes worlds which the deductive process has ruled out, so for example if A∨B has been proved, all worlds will have either A or B. So if you had a bet which would pay $10 on A, and a bet which would pay $2 on B, you're treated as if you have $2 to spend. It's like a bookie allowing a gambler to make a bet without putting down the money because the bookie knows the gambler is "good for it" (the gambler will definitely be able to pay later, based on the bets the gambler already has, combined with the logical information we now know). Of course, because logical bets don't necessarily ever pay out, the market maker realistically shouldn't expect that traders are necessarily "good for it". But doing so allows traders to arbitrage logically contradictory beliefs, so, it's nice for our purposes. (You could say this is a difference between an ideal prediction market and a mere betting market; a prediction market should allow arbitrage of inconsistency in this way.)
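The $10/$2 example can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function name `spendable`, the representation of bets as (atom, payout) pairs, and the use of a Python predicate to stand in for what the deductive process has proved are all hypothetical, and the purchase price of the bets is ignored.

```python
from itertools import product

def spendable(bets, atoms, constraint):
    """Minimum total payout over all truth assignments satisfying `constraint`.

    bets: list of (atom, payout) pairs; a bet pays `payout` if `atom` is true.
    constraint: function from a world (dict atom -> bool) to bool, standing in
    for what the deductive process has proved so far (an assumption of this sketch).
    """
    worlds = (
        dict(zip(atoms, values))
        for values in product([True, False], repeat=len(atoms))
    )
    payouts = [
        sum(pay for atom, pay in bets if world[atom])
        for world in worlds
        if constraint(world)
    ]
    return min(payouts)

bets = [("A", 10), ("B", 2)]
before = spendable(bets, ["A", "B"], lambda w: True)             # nothing proved yet
after = spendable(bets, ["A", "B"], lambda w: w["A"] or w["B"])  # A∨B proved
print(before, after)  # 0 2
```

Before A∨B is proved, the world where both sentences are false leaves the trader with nothing guaranteed; once it is ruled out, the worst remaining world pays $2. If the predicate rules out every world, `min` is called on an empty list and raises an error, which is one place where an inconsistent deductive process would show up in a sketch like this.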
Hm. It's a bit complicated and there are several possible ways to set things up. Reading that paragraph, I'm not sure about this sentence either. In the version I was trying to explain, where traders are "forced to sell" every morning before the day of trading begins, the reasoner would receive 50¢ from the trader every day, but would return that money next morning. Also, in the version I was describing, the reasoner is forced to set the price to $1 rather than 50¢ as soon as the deductive process proves 1+1=2. So, that morning, the reasoner has to return $1 rather than 50¢. That's where the reasoner loses money to the trader. After that, the price is $1 forever, so the trader would just be paying $1 every day and getting that $1 back the next morning. I would then define exploitation as "the trader's total wealth (across different times) has no upper bound". (It doesn't necessarily escape to infinity -- it might oscillate up and down, but with higher and higher peaks.) Now, the LI paper uses a different definition of exploitation, which involves how much money a trader has within a world (which basically means we imagine the deductive process decides all the sentences, and we ask how much money the trader would have; and, we consider all the different ways the deductive process could do this). This is not equivalent to my definition of exploitation in general; according to the LI paper, a trader 'exploits' the market even if its wealth is unbounded only in some very specific world (eg, where a specific sequence of in-fact-undecidable sentences gets proved). However, I do have an unpublished proof that the two definitions of exploitation are equivalent for the logical induction algorithm and for a larger class of "reasonable" logical inductors. This is a non-trivial result, but, justifies using my definition of exploitation (which I personally find a lot more intuitive). My basic intuition for the result is: if you don't know the future, the only way to ensure y
Now I feel like you're trying to have it both ways; earlier you raised the concern that a proposal which doesn't overtly respect logic could nonetheless learn a sort of logic internally, which could then be susceptible to Troll Bridge. I took this as a call for an explicit method of avoiding Troll Bridge, rather than merely making it possible with the right prior. But now, you seem to be complaining that a method that explicitly avoids Troll Bridge would be too restrictive? I think there is a mistake somewhere in the chain of inference from cross→−10 to low expected value for crossing. Material implication is being conflated with counterfactual implication. A strong candidate from my perspective is the inference from ¬(A∧B) to C(A|B)=0 where C represents probabilistic/counterfactual conditional (whatever we are using to generate expectations for actions). You seem to be arguing that being susceptible to Troll Bridge should be judged as a necessary/positive trait of a decision theory. But there are decision theories which don't have this property, such as regular CDT, or TDT (depending on the logical-causality graph). Are you saying that those are all necessarily wrong, due to this? I'm not sure quite what you meant by this. For example, I could have a lot of prior mass on "crossing gives me +10, not crossing gives me 0". Then my +10 hypothesis would only be confirmed by experience. I could reason using counterfactuals, so that the troll bridge argument doesn't come in and ruin things. So, there is definitely a way. And being born with this prior doesn't seem like some kind of misunderstanding/delusion about the world. So it also seems natural to try and design agents which reliably learn this, if they have repeated experience with Troll Bridge.

So your social experience is different in this respect?

I've never experienced this example in particular, but I would not expect such a backlash. Can you think of another scenario with non-moral advice that I have likely experienced?

Can you tell me anything about the "advice culture" you have experience with? For example, I've had some experience with Iranian culture, and it is very different from American culture. It's much more combative (in the sense of combat vs nurture, not necessarily real combativeness -- although I think they have a higher preference/tolerance for heated arguments as well). I was told several times that the bad thing about American culture is that if someone has a problem with you they won't tell you to your face, instead they'll still try to be nice. I sometimes found the blunt advice (criticism) from Iranians overwhelming and emotionally difficult to handle.
Diet advice?

It seems to me that this habit is universal in American culture, and I'd be surprised (and intrigued!) to hear about any culture where it isn't.

I live in Austria. I would say we do have norms against hypocrisy, but your example with the driver's license seems absurd to me. I would be surprised (and intrigued!) if agreement with this one in particular is actually universal in American culture. In my experience, hypocrisy norms are for moral and crypto-moral topics.

For normies, morality is an imposition. Telling them of new moral requirements increases how mu... (read more)

My current take is that anti-hypocrisy norms naturally emerge from micro status battles: giving advice naturally has a little undercurrent of "I'm smarter than you", and pointing out that the person is not following their own advice counters this. Therefore, a hypocrisy check naturally becomes a common response, because it's a pretty good move in status games. Therefore, people expect a hypocrisy check, and check themselves. On the one hand, I was probably blind to the moral aspect and over-generalized to some extent. On the other hand, do you really imagine me telling someone they should get a driver's license (in a context where there is common knowledge that I don't have one), and not expect a mild backlash? I expect phrases like "look who's talking" and I expect the 'energy in the room' after the backlash to be as if my point was refuted. I expect to have to reiterate the point, to show that I'm undeterred, if I still want it to be considered seriously in the conversation. (Particularly if the group isn't rationalists.) So your social experience is different in this respect?

The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the "payout" for a share is essentially what you can sell it for.

If a sentence is undecidable, then you could have two traders who disagree on its value indefinitely: one would have a highest price to buy that's below the other's lowest price to sell. But then anything between those two prices could be the "market price", in the classical supply and demand sense. If you say that the "payout" of a share is what you can sell it for... well, the ... (read more)

This sounds like doing optimality results poorly. Unfortunately, there is a lot of that (EG how the different optimality notions for CDT and EDT don't help decide between them). In particular, the "don't be a stupid frequentist" move has blinded Bayesians (although frequentists have also been blinded in a different way). Solomonoff induction has a relatively good optimality notion (that it doesn't do too much worse than any computable prediction). AIXI has a relatively poor one (you only guarantee that you take the subjectively best action according to Solomonoff induction; but this is hardly any guarantee at all in terms of reward gained, which is supposed to be the objective). (There are variants of AIXI which have other optimality guarantees, but none very compelling afaik.) An example of a less trivial optimality notion is the infrabayes idea, where if the world fits within the constraints of one of your partial hypotheses, then you will eventually learn to do at least as well (reward-wise) as that hypothesis implies you can do.
Hmm. Well, I didn't really try to prove that 'physical causation' would persist as a hypothesis. I just tried to show that it wouldn't, and failed. If you're right, that'd be great! But here is what I am thinking: Firstly, yes, there is a market maker. You can think of the market maker as setting the price exactly where buys and sells balance; both sides stand to win the same amount if they're correct, because that amount is just the combined amount they've spent. Causality is a little funky because of fixed point stuff, but rather than imagining the traders hold shares for a long time, we can instead imagine that today's shares "pay out" overnight (at the next day's prices), and then traders have to re-invest if they still want to hold a position. (But this is fine, because they got paid the next day's prices, so they can afford to buy the same number of shares as they had.) But if the two traders don't reinvest, then tomorrow's prices (and therefore their profits) are up to the whims of the rest of the market. So I don't see how we can be sure that PCH loses out overall. LCH has to exploit PCH -- but if LCH tries it, then we're seemingly in a situation where LCH has to sell for PCH's prices, in which case it suffers the loss I described in the OP. Thanks for raising the question, though! It would be very interesting if PCH actually could not maintain its position. I have been thinking a bit more about this. I think it should roughly work like this: you have a 'conditional contract', which is like normal conditional bets, except normally a conditional bet (a|b) is made up of a conjunction bet (a&b) and a hedge on the negation of the condition (not-b); the 'conditional contract' instead gives the trader an inseparable pair of contracts (the a&x bet bound together with the not-b bet). Normally, the price of anything that's proved goes to one quickly (and zero for anything refuted), because traders are getting $1 per share (and $0 per share for what's been re

Because we have a “basic counterfactual” proposition for what would happen if we 1-box and what would happen if we 2-box, and both of those propositions stick around, LCH’s bets about what happens in either case both matter. This is unlike conditional bets, where if we 1-box, then bets conditional on 2-boxing disappear, refunded, as if they were never made in the first place.

I don't understand this part. Your explanation of PCDT at least didn't prepare me for it, it doesn't mention betting. And why is the payoff for the counterfactual-2-boxing determined b... (read more)

Not sure how to best answer. I'm thinking of all this in an LIDT setting, so all learning occurs through traders making bets. The payoff for 2-boxing is dependent on beliefs after 1-boxing because all share prices update every market day and the "payout" for a share is essentially what you can sell it for. Similarly, if a trader buys a share of an undecidable sentence (let's say, the consistency of PA) then the only "payoff" is whatever you can sell it for later, based on future market prices, because the sentence will never get fully decided one way or the other. My claim is: eventually, if you observe enough cases of "crossing" in similar circumstances, your expectation for "cross" should be consistent with the empirical history (rather than, say, -10 even though you've never experienced -10 for crossing). To give a different example, I'm claiming it is irrational to persist in thinking 1-boxing gets you less money in expectation, if your empirical history continues to show that it is better on average. And I claim that if there is a persistent disagreement between counterfactuals and evidential conditionals, then the agent will in fact experimentally try crossing infinitely often, due to the value-of-information of testing the disagreement (that is, this will be the limiting behavior of reduced temporal discounting, under the assumption that the agent isn't worried about traps). So the two will indeed converge (under those assumptions). The hope is that we can block the troll argument completely if proving B->A does not imply cf(A|B)=1, because no matter what predicate the troll uses, the inference from P to cf fails. So what we concretely need to do is give a version of counterfactual reasoning which lets cf(A|B) not equal 1 in some cases where B->A is proved. Granted, there could be some other problematic argument. However, if my learning-theoretic ideas go through, this provides another safeguard: Troll Bridge is a case where the agent never learns the em

are the two players physically precisely the same (including environment), at least insofar as the players can tell?

In the examples I gave, yes. Because that's the case where we have a guarantee of equal policy, from which people try to generalize. If we say players can see their number, then the twins in the prisoner's dilemma needn't play the same way either.

But this is one reason why correlated equilibria are, usually, a better abstraction than Nash equilibria.

The "signals" players receive for correlated equilibria are already semantic. So I'm suspicious t... (read more)

It's not something we would naively expect, but it does further speak in favor of CE, yes? In particular, if you look at those learnability results, it turns out that the "external signal" which the agents are using to correlate their actions is the play history itself. IE, they are only using information which must be available to learning agents (granted, sufficiently forgetful learning agents might forget the history; however, I do not think the learnability results actually rely on any detailed memory of the history -- the result still holds with very simple agents who only remember a few parameters, with no explicit episodic memory (unlike, eg, tit-for-tat)).

Hum, then I'm not sure I understand in what way classical game theory is neater here?

Changing the labels doesn't make a difference classically.

As long as the probabilistic coin flips are independent on both sides


Do you have examples of problems with copies that I could look at and that you think would be useful to study?

No, I think you should take the problems of distributed computing, and translate them into decision problems, that you then have a solution to.

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem

No. I think:

...the reasoning presented is correct in both cases, and the lesson here is for our expectations of rationality...

As outlined in the last paragraph of the post. I want to convince people that TDT-like decision theories won't give a "neat" game theory, by giving an example where they're even less neat than classical game theory.

Actually it could. 

I think you're thinking about a realistic case (same algorithm, similar environment... (read more)

Hum, then I'm not sure I understand in what way classical game theory is neater here? As long as the probabilistic coin flips are independent on both sides (you also mention the case where they're symmetric, but let's put that aside for the example), then you can apply the basic probabilistic algorithm for leader election: both copies flip a coin n times to get an n-bit number, which they exchange. If the numbers are different, then the copy with the smallest one says 0 and the other says 1; otherwise they flip a coin and return the answer. With this algorithm, you have probability ≥ 1 − 1/2^n of deciding different values, and so you can get as close as you want to 1 (by paying the price in more random bits). Do you have examples of problems with copies that I could look at and that you think would be useful to study?
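The leader-election procedure described above can be sketched in a few lines of Python. This is a rough simulation under assumptions, not a distributed implementation: the function name `break_symmetry` is hypothetical, the "exchange" of numbers is modelled by computing both draws in one process, and independence of the two copies' coin flips is modelled by separate draws from one random generator.

```python
import random

def break_symmetry(n_bits, rng):
    """Return the values (0 or 1) chosen by two copies running the same code.

    Each copy draws an n-bit number; the copy with the smaller number says 0
    and the other says 1. On a tie, each copy falls back to a single coin flip.
    """
    a = rng.getrandbits(n_bits)  # first copy's n coin flips
    b = rng.getrandbits(n_bits)  # second copy's n coin flips, drawn independently
    if a != b:
        return (0, 1) if a < b else (1, 0)
    # Tie: each copy flips one more coin and returns the result.
    return (rng.randint(0, 1), rng.randint(0, 1))

rng = random.Random(0)
trials = 10_000
different = sum(
    1 for _ in range(trials) if len(set(break_symmetry(4, rng))) == 2
)
print(different / trials)  # should be close to, and typically above, 1 - 1/2**4
```

The two copies fail to differ only if their n-bit numbers tie (probability 1/2^n) and the fallback flips also agree, which is where the 1 − 1/2^n lower bound comes from.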

The link would have been to better illustrate how the proposed system works, not about motivation. So, it seems that you understood the proposal, and wouldn't have needed it.

I don't exactly want to learn the cartesian boundary. A cartesian agent believes that its input set fully screens off any other influence on its thinking, and the outputs screen off any influence of the thinking on the world. It's very hard to find things that actually fulfill this. I explain how PDT can learn cartesian boundaries, if there are any, as a sanity/conservative extension check. But it can also learn that it controls copies or predictions of itself, for example.

The apparent difference is based on the incoherent counterfactual "what if I say heads and my copy says tails"

I don't need counterfactuals like that to describe the game, only implications. If you say heads and your copy tails, you will get one util, just like how if 1+1=3, the circle can be squared.

The interesting thing here is that superrationality breaks up an equivalence class relative to classical game theory, and people's intuitions don't seem to have incorporated this.

"The same" in what sense? Are you saying that what I described in the context of game theory is not surprising, or outlining a way to explain it in retrospect? 

Communication won't make a difference if you're playing with a copy.

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem, and so rationality should be able to solve them both if it can solve one. I disagree with that, because from the perspective of distributed computing (which I'm used to), these two problems are exactly the two kinds of problems that are fundamentally distinct in a distributed setting: agreement and symmetry-breaking. Actually it could. Basically all of distributed computing assumes that every process is running the same algorithm, and you can solve symmetry-breaking in this case with communication and additional constraints on the scheduling of processes (the difficulty here is that the underlying graph is symmetric, whereas if you had some form of asymmetry (like three processes in a line, such that the one in the middle has two neighbors but the others only have one), then you can directly use that asymmetry to solve symmetry-breaking). (By the way, you just gave me the idea that maybe I can use my knowledge of distributed computing to look at the sort of decision problems where you play with copies? Don't know if it would be useful, but that's interesting at least)

What is and isn't an isomorphism depends on what you want to be preserved under isomorphism. If you want everything that's game-theoretically relevant to be preserved, then of course those games won't turn out equivalent. But that doesn't explain anything. If my argument had been that the correct action in the prisoner's dilemma depends on sunspot activity, you could have written your comment just as well.

It's easy to get confused between similar equivalence relations, so it's useful to formally distinguish them. See the other thread's arguing about sameness. Category theory language is relevant here because it gives a short description of your anomaly, so it may give you the tools to address it. And it is in fact unusual: For the cases of the underlying sets of a graph, group, ring, field, etc., one can find a morphism for every function. We can construct a similar anomaly for the case of rings by saying that every ring's underlying set contains 0 and 1, and that these are its respective neutral elements. Then a function that swaps 0 and 1 would have no corresponding ring morphism. The corresponding solution for your case would be to encode the structure not in the names of the elements of the underlying set, but in something that falls away when you go to the set. This structure would encode such knowledge as which decision is called heads and which tails. Then for any game and any function from its underlying set you could push the structure forward.
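The ring example can be checked by brute force on a small case. The Python sketch below is only an illustration: the choice of the integers mod 5 as the ring, and the helper names `is_ring_morphism` and `swap01`, are assumptions made here and do not come from the comment.

```python
def is_ring_morphism(f, n):
    """Check by exhaustion whether f: Z/nZ -> Z/nZ is a unital ring morphism."""
    elems = range(n)
    # The neutral elements must map to the neutral elements.
    preserves_units = f(0) == 0 and f(1) == 1
    # Addition and multiplication must be preserved for every pair of elements.
    preserves_ops = all(
        f((a + b) % n) == (f(a) + f(b)) % n
        and f((a * b) % n) == (f(a) * f(b)) % n
        for a in elems
        for b in elems
    )
    return preserves_units and preserves_ops

def identity(x):
    return x

def swap01(x):
    """Swap the elements named 0 and 1; leave everything else fixed."""
    return {0: 1, 1: 0}.get(x, x)

print(is_ring_morphism(identity, 5), is_ring_morphism(swap01, 5))  # True False
```

The swap is a perfectly good function on the underlying set, but no ring morphism corresponds to it, because the names 0 and 1 carry structure.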

Right, but then, are all other variables unchanged? Or are they influenced somehow? The obvious proposal is EDT -- assume influence goes with correlation.

I'm not sure why you think there would be a decision theory in that as well. Obviously when BDT decides its output, it will have some theory about how its output nodes propagate. But the hypothesis as a whole doesn't think about influence. It's just a total probability distribution, and it includes that some things inside it are distributed according to BDT. It doesn't have beliefs about "if the output of ... (read more)

Adding other hypotheses doesn't fix the problem. For every hypothesis you can think of, there's a version of it with "but I survive for sure" tacked on. This version can never lose evidence relative to the base version, but it can gain evidence anthropically. Eventually, these will get you. Yes, there are all sorts of considerations that are more relevant in a realistic scenario, but that's not the point.

You don't need to add other hypotheses to know that there might be unknown additional hypotheses.

The problem, as I understand it, is that there seem to be magical hypotheses you can't update against from ordinary observation, because by construction the only time they make a difference is in your odds of survival. So you can't update them from observation, and anthropics can only update in their favour, so eventually you end up believing one and then you die.

2 Charlie Steiner 3y
The amount that I care about this problem is proportional to the chance that I'll survive to have it.

Maybe the disagreement is in what we consider the alternative hypotheses to be? I'm not imagining a broken gun - you could examine your gun and notice it isn't, or just shoot into the air a few times and see it firing. But even after you eliminate all of those, there's still the hypothesis "I'm special for no discernible reason" (or is there?) that can only be tested anthropically, if at all. And this seems worrying.

Maybe here's a stronger way to formulate it: Consider all the copies of yourself across the multiverse. They will sometimes face situations where... (read more)

2 Charlie Steiner 3y
I think in the real world, I am actually accumulating evidence against magic faster than I am trying to commit elaborate suicide.

To clarify, do you think I was wrong to say UDT would play the game? I've read the two posts you linked. I think I understand Wei's, and I think the UDT described there would play. I don't quite understand yours.

2 Charlie Steiner 3y
I agree with faul sname, ADifferentAnonymous, shminux, etc. If every single person in the world had to play russian roulette (1 bullet and 5 empty chambers), and the firing pin was broken on exactly one gun in the whole world, everyone except the person with the broken gun would be dead after about 125 trigger pulls. So if I remember being forced to pull the trigger 1000 times, and I'm still alive, it's vastly more likely that I'm the one human with the broken gun, or that I'm hallucinating, or something else, rather than me just getting lucky. Note that if you think you might be hallucinating, and you happen to be holding a gun, I recommend putting it down and going for a nap, not pulling the trigger in any way. But for the sake of argument we might suppose the only allowed hypotheses are "working gun" and "broken gun." Sure, if there are miraculous survivors, then they will erroneously think that they have the broken gun, in much the same way that if you flipped a coin 1000 times and just so happened to get all heads, you might start to think you had an unfair coin. We should not expect to be able to save this person. They are just doomed. It's like poker. I don't know if you've played poker, but you probably know that the basic idea is to make bets that you have the best hand. If you have 4 of a kind, that's an amazing hand, and you should be happy to make big bets. But it's still possible for your opponent to have a royal flush. If that's the case, you're doomed, and in fact when the opponent has a royal flush, 4 of a kind is almost the worst hand possible! It makes you think you can bet all your money when in fact you're about to lose it all. It's precisely the fact that four of a kind is a good hand almost all the time that makes it especially bad that remaining tiny amount of the time. The person who plays russian roulette and wins 1000 times with a working gun is just that poor sap who has four of a kind into a royal flush. (P.S.: My post is half explan
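
The "about 125 trigger pulls" figure can be checked with a quick calculation. This is only a sketch: the world population of roughly 8 billion is my assumption (the comment does not state one), and the variable names are illustrative.

```python
import math

# Assumption (not stated in the comment): roughly 8 billion players.
population = 8_000_000_000
p_survive = 5 / 6  # one bullet, five empty chambers

# Solve population * p_survive**n == 1 for n: the number of pulls after
# which the expected number of lucky survivors drops to one.
pulls = math.log(population) / math.log(1 / p_survive)

# Expected number of people still alive by luck alone after 1000 pulls.
expected_lucky_survivors = population * p_survive ** 1000

print(round(pulls), expected_lucky_survivors)
```

Under that assumption the expected number of lucky survivors after 1000 pulls is astronomically small, which is the sense in which "I'm the one with the broken gun" dominates "I just got lucky".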

Another problem with this is that it isn't clear how to form the hypothesis "I have control over X".

You don't. I'm using talk about control sometimes to describe what the agent is doing from the outside, but the hypotheses it believes all have a form like "The variables such and such will be as if they were set by BDT given such and such inputs".

One problem with this is that it doesn't actually rank hypotheses by which is best (in expected utility terms), just how much control is implied.

For the first setup, where it's trying to learn what it has control ov... (read more)

Right, but then, are all other variables unchanged? Or are they influenced somehow? The obvious proposal is EDT -- assume influence goes with correlation. Another possible answer is "try all hypotheses about how things are influenced."

From my perspective, Radical Probabilism is a gateway drug.

This post seemed to be praising the virtue of returning to the lower-assumption state. So I argued that in the example given, it took more than knocking out assumptions to get the benefit.

So, while I agree, I really don't think it's cruxy. 

It wasn't meant to be. I agree that logical inductors seem to de facto implement a Virtuous Epistemic Process, with attendant properties, whether or not they understand that. I just tend to bring up any interesting-seeming thoughts that are triggered during ... (read more)

Agreed. Simple Bayes is the hero of the story in this post, but that's more because the simple bayesian can recognize that there's something beyond.

Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows.

Ok. I suppose my point could then be made as "#2 type approaches aren't very useful, because they assume something that's no easier than what they provide".

I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm. 

Well, you certainly know more about that than me. Where did the criterion come from in your view?

This part seems entirely addressed by logical induction, to me.

Quite p... (read more)

I wanted to separate what work is done by radicalizing probabilism in general, vs logical induction specifically. 

From my perspective, Radical Probabilism is a gateway drug. Explaining logical induction intuitively is hard. Radical Probabilism is easier to explain and motivate. It gives reason to believe that there's something interesting in the direction. But, as I've stated before, I have trouble comprehending how Jeffrey correctly predicted that there's something interesting here, without logical uncertainty as a motivation. In hindsight, I feel hi... (read more)

One of the most important things I learned, being very into nutrition-research, is that most people can't recognize malnutrition when they see it, and so there's a widespread narrative that it doesn't exist. But if you actually know what you're looking for, and you walk down an urban downtown and look at the beggars, you will see the damage it has wrought... and it is extensive.

Can someone recommend a way of learning to recognize this without having to spend effort on nutrition-in-general?

I think giving reasons made this post less effective. Reasons make naive!rationalist more likely to yield on this particular topic, but that's no longer a live concern, and it probably inhibits learning the general lesson.

What is actually left of Bayesianism after Radical Probabilism? Your original post on it was partially explaining logical induction, and introduced assumptions from that in much the same way as you describe here. But without that, there doesn't seem to be a whole lot there. The idea is that all that matters is resistance to dutch books, and for a dutch book to be fair the bookie must not have an epistemic advantage over the agent. Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of thi... (read more)

Part of the problem is that I avoided getting too technical in Radical Probabilism, so I bounced back and forth between different possible versions of Radical Probabilism without too much signposting. I can distinguish at least three versions:
1. Jeffrey's version. I don't have a good source for his full picture. I get the sense that the answer to "what is left?" is "very little!" -- EG, he didn't think agents have to be able to articulate probabilities. But I am not sure of the details.
2. The simplification of Jeffrey's version, where I keep the Kolmogorov axioms (or the Jeffrey-Bolker axioms) but reject Bayesian updates.
3. Skyrms' deliberation dynamics. This is a pretty cool framework and I recommend checking it out (perhaps via his book The Dynamics of Rational Deliberation). The basic idea of its non-bayesian updates is, it's fine so long as you're "improving" (moving towards something good).
4. The version represented by logical induction.
5. The Shafer & Vovk version. I'm not really familiar with this version, but I hear it's pretty good.
(I can think of more, but I cut myself off.) Making a broad generalization, I'm going to stick things into camp #2 above or camp #4. Theories in camp #2 have the feature that they simply assume a solid notion of "what the agent could have known at the time". This allows for a nice simple picture in which we can check Dutch Book arguments. However, it does lend itself more easily to logical omniscience, since it doesn't allow a nuanced picture of how much logical information the agent can generate. Camp #4 means we do give such a nuanced picture, such as the poly-time assumption. Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows. I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm. The criterion came first, and the construction was finished soon after. So the hard part was the criteri

Definition (?). A non-anthropic update is one based on an observation E that has no (or a negligible) bearing on how many observers in your reference class there are.

Not what I meant. I would say anthropic information tells you where in the world you are, and normal information tells you what the world is like. An anthropic update, then, reasons about where you would be, if the world were a certain way, to update on world-level probabilities from anthropic information. So sleeping beauty with N outsiders is a purely anthropic update by my count. Big worlds ... (read more)

I have thought about this before posting, and I'm not sure I really believe in the infinite multiverse. I'm not even sure if I believe in the possibility of being an individual exception for some other sort of possibility. But I don't think just asserting that without some deeper explanation is really a solution either. We can't just assign zero probability willy-nilly.

That link also provides a relatively simple illustration of such an update, which we can use as an example:

I didn't consider that illustrative of my question because "I'm in the sleeping beauty problem" shouldn't lead to a "normal" update anyway. That said I haven't read Anthropic Bias, so if you say it really is supposed to be the anthropic update only then I guess. The definition in terms of "all else equal" wasn't very informative for me here.

To fix this issue we would need to include in your reference class whoever has the same background knowledge as

... (read more)
1 Dmitriy Vasilyuk 3y
Learning that "I am in the sleeping beauty problem" (call that E) when there are N people who aren't is admittedly not the best scenario to illustrate how a normal update is factored into the SSA update, because E sounds "anthropicy". But ultimately there is not really much difference between this kind of E and the more normal sounding E* = "I measured the CMB temperature to be 2.7K". In both cases we have: 1. Some initial information about the possibilities for what the world could be: (a) sleeping beauty experiment happening, N + 1 or N + 2 observers in total; (b) temperature of CMB is either 2.7K or 3.1K (I am pretending that physics ruled out other values already). 2. The observation: (a) I see a sign by my bed saying "Good morning, you in the sleeping beauty room"; (b) I see a print-out from my CMB apparatus saying "Good evening, you are in the part of spacetime where the CMB photons hit the detector with energies corresponding to 3.1K ". In either case you can view the observation as anthropic or normal. The SSA procedure doesn't care how we classify it, and I am not sure there is a standard classification. I tried to think of a possible way to draw the distinction, and the best I could come up with is: Definition (?). A non-anthropic update is one based on an observation E that has no (or a negligible) bearing on how many observers in your reference class there are. I wonder if that's the definition you had in mind when you were asking about a normal update, or something like it. In that case, the observations in 2a and 2b above would both be non-anthropic, provided N is big and we don't think that the temperature being 2.7K or 3.1K would affect how many observers there would be. If, on the other hand, N = 0 like in the original sleeping beauty problem, then 2a is anthropic.  Finally, the observation that you survived the Russian roulette game would, on this definition, similarly be anthropic or not depending on who you put in the reference class. If i

In most of the discussion from the above link, those fractions are 100% on either A or B, resulting, according to SSA, in your posterior credences being the same as your priors.

For the anthropic update, yes, but isn't there still a normal update? Where you just update on the gun not firing, as an event, rather than your existence? Your link doesn't have examples where that would be relevant either way. But if we didn't do this normal updating, then it seems like you could only learn from an observation if some people in your reference class make the opposit... (read more)

2 Dmitriy Vasilyuk 3y
You have described some bizarre issues with SSA, and I agree that they are bizarre, but that's what defenders of SSA have to live with. The crucial question is: The normal updates are factored into the SSA update. A formal reference would be the formula for P(H|E) on p.173 of Anthropic Bias, which is the crux of the whole book. I won't reproduce it here because it needs a page of terminology and notation, but instead will give an equivalent procedure, which will hopefully be more transparently connected with the normal verbal statement of SSA, such as one given in That link also provides a relatively simple illustration of such an update, which we can use as an example: In this case, the reference class is not trivial, it includes N + 1 or N + 2 observers (observer-moments, to be more precise; and N = trillion), of which only 1 or 2 learn that they are in the sleeping beauty problem. The effect of learning new information (that you are in the sleeping beauty problem or, in our case, that the gun didn't fire for the umpteenth time) is part of the SSA calculation as follows:
* Call the information our observer learns E (in the example above E = you are in the sleeping beauty problem)
* You go through each possibility for what the world might be according to your prior. For each such possibility i (with prior probability Pi) you calculate the chance Qi of having your observations E assuming that you were randomly selected out of all observers in your reference class (set Qi = 0 if there are no such observers).
* In our example we have two possibilities: i = A, B, with Pi = 0.5. On A, we have N + 1 observers in the reference class, with only 1 having the information E that they are in the sleeping beauty problem. Therefore, QA = 1 / (N + 1) and similarly QB = 2 / (N + 2).
* We update the priors Pi based on these probabilities, the lower the chance Qi of you having E in some possibility i, the stronger you penal
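
The procedure described above can be sketched in a few lines. The comment is cut off before the final step, so the renormalisation used here (posterior proportional to Pi times Qi) is my assumption about how it continues, and `ssa_posterior` is a hypothetical helper name:

```python
from fractions import Fraction

def ssa_posterior(priors, q):
    """Weight each possibility's prior P_i by Q_i, then renormalise.

    Assumption: posterior_i is proportional to P_i * Q_i, which is the usual
    Bayesian reading of "penalize possibilities with low Q_i".
    """
    weights = {w: priors[w] * q[w] for w in priors}
    total = sum(weights.values())
    return {w: weights[w] / total for w in weights}

N = 10**12  # "N = trillion" outside observers, as in the comment
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
q = {"A": Fraction(1, N + 1), "B": Fraction(2, N + 2)}

posterior = ssa_posterior(priors, q)
print(float(posterior["A"]), float(posterior["B"]))
```

With N large, B ends up close to twice as likely as A; setting N = 0 in the same sketch gives equal posteriors, since QA and QB are then both 1.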

Hm. I think your reason here is more or less "because our current formalisms say so". Which is fair enough, but I don't think it gives me an additional reason - I already have my intuition despite knowing it contradicts them.

What if the game didn't kill you, it just made you sick? Would your reasoning still hold?

No. The relevant gradual version here is forgetting rather than sickness. But yes, I agree there is an embedding question here.

In that case, after every game, 1 in 6 of you die in the A scenario, and 0 in the B scenario, but in either scenario there are still plenty of "you"s left, and so SSA would say you shouldn't increase your credence in B (provided you remove your corpses from your reference class, which is perfectly fine a la Bostrom).

Can you spell that out more formally? It seems to me that so long as I'm removing the corpses from my reference class, 100% of people in my reference class remember surviving every time so far just like I do, so SSA just does normal bayesian up... (read more)

1 Dmitriy Vasilyuk 3y
Sure, as discussed for example here:, if there are two theories, A and B, that predict different (non-zero) numbers of observers in your reference class, then on SSA that doesn't matter. Instead, what matters is what fraction of observers in your reference class have the observations/evidence you do. In most of the discussion from the above link, those fractions are 100% on either A or B, resulting, according to SSA, in your posterior credences being the same as your priors. This is precisely the situation we are in for the case at hand, namely when we make the assumptions that: * The reference class consists of all survivors like you (no corpses allowed!) * The world is big (so there are non-zero survivors on both A and B). So the posteriors are again equal to the priors and you should not believe B (since your prior for it is low). I completely agree, it seems very strange to me too, but that's what SSA tells us. For me, this is just one illustration of serious problems with SSA, and an argument for SIA.  If your intuition says to not believe B even if you know the world is small then SSA doesn't reproduce it either. But note that if you don't know how big the world is you can, using SSA, conclude that you now disbelieve the combination small world + A, while keeping the odds of the other three possibilities the same - relative to one another - as the prior odds. So basically you could now say: I still don't believe B but I now believe the world is big. Finally, as I mentioned, I don't share your intuition, I believe B over A if these are the only options. If we are granting that my observations and memories are correct, and the only two possibilities are: I just keep getting incredibly lucky OR "magic", then with every shot I'm becoming more and more convinced in magic.

Isn't the prior probability of B the sum over all specific hypotheses that imply B?

I would say there is also a hypothesis that just says that your probability of survival is different, for no apparent reason, or only similarly stupid reasons like "this electron over there in my pinky works differently from other electrons" that are untestable for the same anthropic reasons.

Okay. So, we agree that your prior says that there's a 1/N chance that you are unkillable by Russian Roulette for stupid reasons, and you never get any evidence against this. And let's say this is independent of how much Russian Roulette one plays, except insofar as you have to stop if you die. Let's take a second to sincerely hold this prior.  We aren't just writing down some small number because we aren't allowed to write zero; we actually think that in the infinite multiverse, for every N agents (disregarding those unkillable for non-stupid reasons), there's one who will always survive Russian Roulette for stupid reasons. We really think these people are walking around the multiverse. So now let K be the base-5/6 log of 1/N. If N people each attempt to play K games of Russian Roulette (i.e. keep playing until they've played K games or are dead), one will survive by luck, one will survive because they're unkillable, and the rest will die (rounding away the off-by-one error). If N^2 people across the multiverse attempt to play 2K games of Russian Roulette, N of them will survive for stupid reasons, one of them will survive by luck, and the rest will die. Picture that set of N immortals and one lucky mortal, and remember how colossal a number N must be. Are the people in that set wrong to think they're probably immortals? I don't think they are.
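
A rough numerical sketch of the counting argument above. The value N = 10^6 is purely illustrative (the comment only says N is colossal), and the helper names are hypothetical:

```python
import math

def games_needed(n):
    """K such that (5/6)**K == 1/n, i.e. the base-5/6 log of 1/n."""
    return math.log(1 / n) / math.log(5 / 6)

def expected_survivors(players, games, n):
    """Expected (immortal, lucky) survivors among `players` after `games` rounds."""
    immortals = players / n  # one unkillable player per n, per the stated prior
    lucky = (players - immortals) * (5 / 6) ** games
    return immortals, lucky

N = 10**6  # illustrative only
K = games_needed(N)

immortals_1, lucky_1 = expected_survivors(N, K, N)           # N players, K games
immortals_2, lucky_2 = expected_survivors(N**2, 2 * K, N)    # N^2 players, 2K games

print(K, (immortals_1, lucky_1), (immortals_2, lucky_2))
```

In the first case the survivors split roughly one immortal to one lucky mortal; in the second it is roughly N immortals to one lucky mortal, matching the picture in the comment (up to the off-by-one error it mentions).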

You're going to have some prior on "this is safer for me, but not totally safe, it actually has a 1/1000 chance of killing me." This seems no less reasonable than the no chance of killing you prior.

If you've survived often enough, this can go arbitrarily close to 0.

I think that playing this game is the right move

Why? It seems to me like I have to pick between the theories "I am an exception to natural law, but only in ways that could also be produced by the anthropic effect" and "It's just the anthropic effect". The latter seems obviously more reasonable to me, and it implies I'll die if I play.

2 Donald Hobson 3y
Work out your prior on being an exception to natural law in that way. Pick a number of rounds such that the chance of you winning by luck is even smaller. You currently think that the most likely way for you to be in that situation is if you were an exception.   What if the game didn't kill you, it just made you sick? Would your reasoning still hold? There is no hard and sharp boundary between life and death. 

Sure, but with current theories, even after you've gotten an infinite amount of evidence against every possible alternative consideration, you'll still believe that you're certain to survive. This seems wrong.

Isn't the prior probability of B the sum over all specific hypotheses that imply B? So if you've gotten an arbitrarily large amount of evidence against all of those hypotheses, and you've won at Russian Roulette an arbitrarily high number of times... well, you'll just have to get more specific about those arbitrarily large quantities to say what your posterior is, right?

Hence, I (standing outside of PA) assert that (since I think PA is probably consistent) agents who use PA don't know whether PA is consistent, but, believe the world is consistent.

There are two ways to express "PA is consistent". The first is . The other is a complicated construct about Gödel-encodings. Each has a corresponding version of "the world is consistent" (indeed, this "world" is inside PA, so they are basically equivalent). The agent using PA will believe only the former. The Troll expresses the consistency of PA using provabilit... (read more)

If you're reasoning using PA, you'll hold open the possibility that PA is inconsistent, but you won't hold open the possibility that . You believe the world is consistent. You're just not so sure about PA.

Do you? This sounds like PA is not actually the logic you're using. Which is realistic for a human. But if PA is indeed inconsistent, and you don't have some further-out system to think in, then what is the difference to you between "PA is inconsistent" and "the world is inconsistent"? In both cases you just believe everything and its negatio... (read more)

Maybe this is the confusion. I'm not using PA. I'm assuming (well, provisionally assuming) PA is consistent. If PA is consistent, then an agent using PA believes the world is consistent -- in the sense of assigning probability 1 to tautologies, and also assigning probability 0 to contradictions. (At least, 1 to tautologies it can recognize, and 0 to contradictions it can recognize.) Hence, I (standing outside of PA) assert that (since I think PA is probably consistent) agents who use PA don't know whether PA is consistent, but, believe the world is consistent. If PA were inconsistent, then we need more assumptions to tell us how probabilities are assigned. EG, maybe the agent "respects logic" in the sense of assigning 0 to refutable things. Then it assigns 0 to everything. Maybe it "respects logic" in the sense of assigning 1 to provable things. Then it assigns 1 to everything. (But we can't have both. The two notions of "respect logic" are equivalent if the underlying logic is consistent, but not otherwise.) But such an agent doesn't have much to say for itself anyway, so it's more interesting to focus on what the consistent agent has to say for itself. And I think the consistent agent very much does not "hold open the possibility" that the world is inconsistent. It actively denies this.