All of Eigil Rischel's Comments + Replies

I mean, "is a large part of the state space" is basically what "high entropy" means!

For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John - the vast majority of initial states for your computer don't include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer)

5 Alex Flint 1y
But the vast majority of initial states for a lump of carbon/oxygen/hydrogen/nitrogen atoms do not include a person programming a computer with the intention of taking over the internet. Shouldn't you apply the same logic there that you apply to the case of a computer? In fact a single zero-day exploit is certainly much simpler than a full human, so a priori it's more likely for a computer with a zero-day exploit to form from the void than for a computer with a competent human intent on taking over the internet to form from the void.

I think we're basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)

Most of the heavy lifting in these proofs seems to be done by the Lean tactics. The comment "arguments to nlinarith are fully invented by our model" above a proof which is literally the single line nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)] makes me feel like they're trying too hard to convince me this is impressive.

The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of "traditional" search methods for proving algebraic inequalities, rather than an impressive AI milestone. ... (read more)
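For a sense of how much the tactic does on its own, here is a sketch in Lean 4 with Mathlib (the original proofs were written for an earlier Lean version). The inequality is an illustrative assumption of mine, chosen because it follows from exactly these three hints; the comment does not quote the original problem statement.

```lean
import Mathlib

-- Illustrative statement (an assumption, not the problem from the original post).
-- The three squares sum to 2(a² + b² + c²) - 2(ab + bc + ca) ≥ 0,
-- so once the hints are supplied only linear arithmetic remains.
example (a b c : ℝ) : a * b + b * c + c * a ≤ a ^ 2 + b ^ 2 + c ^ 2 := by
  nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)]
```

The part the model has to supply is the list of hint terms; the search over how to combine them is done by the tactic.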

I think it's worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.

Like, even if something can be written in a compact form, there might be only a dozen of combinations of ~10 tokens that give us a correct solution like nlinarith (b- a), ..., where by token I count "nlinarith", "sq_nonneg", "b", "-", "a", etc., and the actual search space for something of length 10 is probably ~(gram... (read more)

Yes, that's right. It's the same basic issue that leads to the Anvil Problem

Compare two views of "the universal prior"

  • AIXI: The external world is a Turing machine that receives our actions as input and produces our sensory impressions as output. Our prior belief about this Turing machine should be that it's simple, i.e. the Solomonoff prior
  • "The embedded prior": The "entire" world is a sort of Turing machine, which we happen to be one component of in some sense. Our prior for this Turing machine should be that it's simple (again, the Solomonoff prior), but we have to condition on the observation that it's complicated enough to c
... (read more)
Why wouldn't they be the same? Are you saying AIXI doesn't ask 'where did I come from?'

Right, that's essentially what I mean. You're of course right that this doesn't let you get around the existence of non-measure-preserving automorphisms. I guess what I'm saying is that, if you're trying to find a prior on , you should try to think about what system of finite measurements this is idealizing, and see if you can apply a symmetry argument to those bits. Which isn't always the case! You can only apply the principle of indifference if you're actually indifferent. But if it's the case that is generated in a way where "there's no reaso... (read more)

I strongly recommend Escardó's Seemingly impossible functional programs, which constructs a function search : ((Nat -> Bool) -> Bool) -> (Nat -> Bool) which, given a predicate on infinite bitstrings, finds an infinite bitstring satisfying the predicate if one exists. (In other words, if p : (Nat -> Bool) -> Bool and any bitstring at all satisfies p, then p (search p) == True.)

(Here I'm intending Nat to be the type of natural numbers, of course.)
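A rough Python transcription of the idea, as a sketch rather than Escardó's own code (his is in Haskell and much more efficient). Bitstrings are modelled as functions from int to bool; the names cons, search and exists are my own, and the whole thing assumes p is total and only ever inspects finitely many positions of its argument.

```python
def cons(bit, seq):
    """Prepend one bit to an infinite bitstring (a function from int to bool)."""
    return lambda n: bit if n == 0 else seq(n - 1)


def search(p):
    """Return a bitstring satisfying p, if any bitstring does.

    Assumption: p is total and looks at only finitely many positions,
    otherwise the recursion below never bottoms out.
    """
    head_cache = []

    def head():
        # The first bit can be False exactly when some bitstring starting
        # with False satisfies p; the recursive search builds the best
        # candidate of that shape, and p is then run on it.
        if not head_cache:
            candidate = cons(False, search(lambda s: p(cons(False, s))))
            head_cache.append(not p(candidate))
        return head_cache[0]

    def bits(n):
        first = head()
        if n == 0:
            return first
        # Fix the first bit and search for the remainder of the string.
        return search(lambda s: p(cons(first, s)))(n - 1)

    return bits


def exists(p):
    """Decide whether any infinite bitstring satisfies p."""
    return p(search(p))


witness = search(lambda s: s(2) != s(0))
print(witness(0), witness(2))
print(exists(lambda s: s(0) and not s(0)))
```

Nothing is memoized beyond the first bit, so this is exponentially slow and only suitable for predicates that look a few positions deep.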

Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say ) by the following argument: "Measuring a real number " is obviously simply impossible if interpreted literally, containing an infinite amount of data. Instead this is supposed to be some sort of idealization of a situation where you can observe "as many bits as you want" of the binary expansion of the number (choosing another base gives the same measure). If you now apply the princ... (read more)

Given any particular admissible representation of a topological space, I do agree you can generate a Borel probability measure by pushing forward the Haar measure of the digit-string space ΣN (considered as a countable product of ω copies of Σ, considered as a group with the modular-arithmetic structure of Z/|Σ|) along the representation. This construction is studied in detail in (Mislove, 2015). But, actually, the representation itself (in this case, the Cantor map) smuggles in Lebesgue measure, because each digit happens to cut the target space "in half" according to Lebesgue measure. If I postcompose, say, x↦√x after the Cantor map, that is also an admissible representation of [0,1], but it no longer induces Lebesgue measure. This works for any continuous bijection, so any absolutely continuous probability measure on [0,1] can be induced by such a representation. In fact, this is why the inverse-CDF algorithm for drawing samples from arbitrary distributions, given only uniform random bits, works. That being said, you can apply this to non-compact spaces. I could get a probability measure on R via a decimal representation, where, say, the number of leading zeros encodes the exponent in unary and the rest is the mantissa. [Edit: I didn't think this through all the way, and it can only represent real numbers ≥1. No big obstacle; post-compose x↦log(x−1).] The reason there doesn't seem to be a "correct" way to do so is that, because there's no Haar probability measure on non-compact spaces (at least, usually?), there's no digit representation that happens to cut up the space "evenly" according to such a canonical probability measure.
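The digit-string construction and the inverse-CDF remark can be sketched together in Python. This is a minimal illustration under my own assumptions: 53 fair bits stand in for the infinite binary expansion, and the function names are hypothetical.

```python
import math
import random


def uniform_from_bits(rng, n_bits=53):
    """Read n_bits fair coin flips as the binary expansion of a number in [0, 1)."""
    x = 0.0
    for i in range(1, n_bits + 1):
        x += rng.getrandbits(1) * 2.0 ** -i
    return x


def sample(inverse_cdf, rng):
    """Inverse-CDF sampling: push a uniform draw through the quantile function."""
    return inverse_cdf(uniform_from_bits(rng))


rng = random.Random(0)

# Post-composing sqrt after the digit map gives density 2t on [0, 1] (mean 2/3),
# so the induced measure is no longer Lebesgue measure.
sqrt_draws = [sample(math.sqrt, rng) for _ in range(20_000)]

# The quantile function of the Exponential(1) distribution (mean 1) turns the
# same bits into a probability measure on a non-compact space.
exp_draws = [sample(lambda u: -math.log1p(-u), rng) for _ in range(20_000)]

print(sum(sqrt_draws) / len(sqrt_draws))
print(sum(exp_draws) / len(exp_draws))
```

The same uniform bits yield three different measures depending only on which map is applied afterwards, which is the sense in which the representation carries the measure.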

This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utility-optimizers, yes, but what this really proves is that almost any behavior fits into the model "utility-optimizer", not that coherent agents must necessarily look like our intuitive picture of a utility-optimizer.

Paraphrasing Rohin's arguments somewhat, the arguments for universal convergence say something like "for "most" "natural" utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, et... (read more)

My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they're not catastrophically worse than the baseline "Google, Microsoft and Facebook" strategy you outlined, but I don't think they're better in any real way either.

This is really cool!

The example of inferring from the independence of and reminds me of some techniques discussed in Elements of Causal Inference. They discuss a few different techniques for 2-variable causal inference.

One of them, which seems to be essentially analogous to this example, is that if are real-valued variables, then if the regression error (i.e for some constant ) is independent of , it's highly likely that is downstream of . It sounds like factored sets (or some extension to capture continuous-valued variables) migh... (read more)

Thanks (to both of you), this was confusing for me as well.

At least one explanation for the fact that the Fall of Rome is the only period of decline on the graph could be this: data becomes more scarce the further back in history you go. This has the effect of smoothing the historical graph as you extrapolate between the few datapoints you have. Thus the overall positive trend can more easily mask any short-term period of decay.

Lsusr ran a survey here a little while ago, asking people for things that "almost nobody agrees with you on". There's a summary here

Starting over ten years ago, there were some similar posts about an "irrationality game", starting here. 

This argument proves that

  • Along a given time-path, the average change in entropy is zero
  • Over the whole space of configurations of the universe, the average difference in entropy between a given state and the next state (according to the laws of physics) is zero. (Really this should be formulated in terms of derivatives, not differences, but you get the point).

This is definitely true, and this is an inescapable feature of any (compact) dynamical system. However, somewhat paradoxically, it's consistent with the statement that, conditional on any given (n... (read more)

1 Marco Discendenti 3y
"conditional on any given (nonmaximal) level of entropy, the vast majority of states have increasing entropy" I don't think this statement can be true in any sense that would produce a non-symmetric behavior over a long time, and indeed it has some problems if you try to express it in a more accurate way: 1) what does "non-maximal" mean? You don't really have a single maximum, you have an average maximum and random oscillations around it 2) the "vast majority" of states are actually little oscillations around an average maximum value, and the downward oscillations are as frequent as the upward oscillations 3) any state of low entropy must have been reached in some way and the time needed to go from the maximum to the low entropy state should be almost equal to the time needed to go from the low entropy to the maximum: why should it be different if the system has time symmetric laws? In your graph you take very little time to reach low entropy states from high entropy - compared to the time needed to reach high entropy again, but would this make the high-low transition look more natural or more "probable"? Maybe it would look even more unnatural and improbable!

This argument doesn't work because limits don't commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).

To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function - it is on the point and everywhere else. The expected value/integral of ... (read more)

lim(EV(fn)) != EV(lim(fn)) Oooooh. Neat. Thank you. I guess... how do we know EV(lim(fn))=0? I don't know enough analysis anymore to remember how to prove this. [reads internet] Well, Wikipedia tells me two functions with the same values everywhere but measure 0 even if those values are +inf have the same integral, so looks good. :D 

The source of disagreement seems to be about how to compute the EV "in the limit of infinite bets". I.e., given bets with a chance of winning each, where you triple your stake with each bet, the naive EV maximization strategy gives you a total expected value of , which is also the maximum achievable overall EV. Does this entail that the EV at infinite bets is ? No, because with probability one, you'll lose one of the bets and end up with zero money.
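The gap between the two quantities is easy to tabulate. The sketch below assumes a win probability of 1/2, since the actual figure was lost from the text above; only the tripling of the stake comes from the comment, and the function names are mine.

```python
from fractions import Fraction

P_WIN = Fraction(1, 2)  # assumed win probability (not stated in the text above)
PAYOUT = 3              # a winning bet triples the stake


def all_in_expected_value(n_bets, stake=1):
    """EV after n bets when the whole bankroll is staked every time."""
    return stake * (P_WIN * PAYOUT) ** n_bets


def survival_probability(n_bets):
    """Probability of still holding any money after n all-in bets."""
    return P_WIN ** n_bets


for n in (1, 10, 100):
    print(n, float(all_in_expected_value(n)), float(survival_probability(n)))
```

Under this assumption the EV grows like (3/2)^n while the chance of ending with anything at all shrinks like (1/2)^n, so the sequence of EVs diverges even though the pointwise limit of the payout is zero almost surely.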

I don't find this argument for Kelly super convincing.

  • You can't actually bet an infinite number o

... (read more)
It's worse than that. The EV at infinite bets is actually ∞ even for naive EV maximization. WolframAlpha link

I don't wanna clutter the comments too much, so I'll add this here: I assume there were supposed to be links to the various community discussions of Why We Sleep (hackernews, r/ssc, etc), but these are just plain text for me.

Oh gosh, you're right... both here and on my blog. Sometimes things go wrong during translation to markdown and I don't even notice. Thanks for pointing it out, corrected.

This seems prima facie unlikely. If you're not worried about the risk of side effects from the "real" vaccine, why not just take it, too (since the efficacy of the homemade vaccine is far from certain)? On the other hand, if you're the sort of person who worries about the side effects of a vaccine that's been through clinical trials, you're probably not the type to brew something up in your kitchen based on a recipe that you got off the internet and snort it.

This is great!

An idea which has picked up some traction in some circles of pure mathematicians is that numbers should be viewed as the "shadow" of finite sets, which is a more fundamental notion.

You start with the notion of finite set, and functions between them. Then you "forget" the difference between two finite sets if you can match the elements up to each other (i.e if there exists a bijection). This seems to be vaguely related to your thing about being invariant under permutation - if a property of a subset of positions (i.e those positions that are s... (read more)

Ooh, I like that formulation. It's cleaner - it jumps straight to numbers rather than having to extract them from counts.

My mom is a translator (mostly for novels), and as far as I know she exclusively translates into Danish (her native language). I think this is standard in the industry - it's extremely hard to translate text in a way that feels natural in the target language, much harder than it is to tease out subtleties of meaning from the source language.

This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called "The Pareto Frontier of Capability". Simply put:

  1. By an efficient markets-type argument, you shouldn't expect to have any particularly good ways of achieving money/status/whatever - if there was an unusually good way of doing that, somebody else would already be exploiting it.
  2. The exception to this is that if only a small number of people can exploit an opportunity, you may have a shot. So you s
... (read more)
This pointed out some things I hadn't noticed about how the post relates to comparative advantage. Thanks.

I hadn't, thanks!

I took the argument about the large-scale "stability" of matter from Jaynes (although I had to think a bit before I felt I understood it, so it's also possible that I misunderstood it).

I think I basically agree with Eliezer here?

The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small. This doesn't mean Liouville's Theorem is violated with small probability; a theorem

... (read more)
5 Steven Byrnes 3y
Sure. I think even more interesting than the ratio / frequency argument is the argument that if you check whether the ice cube has coalesced, then that brings you into the system too, and now you can prove that the entropy increase from checking is, in expectation, larger than the entropy decrease from the unlikely chance that you find an ice cube. Repeat many times and the law of large numbers guarantees that this procedure increases entropy. Hence no perpetual motion. Well anyway, that's the part I like, but I'm not disagreeing with you. :-)

This may be poorly explained. The point here is that

  • is supposed to be always well-defined. So each state has a definite next state (since X is finite, this means it will eventually cycle around).
  • Since is well-defined and bijective, each is for exactly one .
  • We're summing over every , so each also appears on the list of s (by the previous point), and each also appears on the list of s (since it's in )

E.g. suppose and when , and . Then is . But ... (read more)
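The cancellation can also be checked numerically. In this sketch the state space, the bijection and the "entropy" values are all arbitrary choices of mine (the symbols from the comment were lost in extraction); the point is only that the sum of the differences vanishes for any bijection on a finite set.

```python
import random


def total_entropy_change(states, step, entropy):
    """Sum of S(T(x)) - S(x) over every state x."""
    return sum(entropy[step[x]] - entropy[x] for x in states)


rng = random.Random(1)
states = list(range(8))

# A random bijection T: X -> X, stored as a dict.
images = states[:]
rng.shuffle(images)
step = dict(zip(states, images))

# An arbitrary integer-valued "entropy" function S on the states.
entropy = {x: rng.randint(0, 10) for x in states}

print(total_entropy_change(states, step, entropy))  # 0 for any bijection
```

Each S(x) appears once with a plus sign (as the image of its unique predecessor) and once with a minus sign, which is all the argument uses.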

Having implicit closed timelike curves seems highly irregular. In such a setup it is doubtful whether stepping "advances" time. That explains why the math works out. T gives each state a future, but the unintuitive part is that this future is guaranteed to be among the events. Most regular scenarios are open towards the future, i.e. they have future edges where causation can run away from the region. One would expect each event to have a cause and an effect, but the cause of the earliest event and the effect of the latest event to lie outside of the region. Having CTCs probably will not extend to any "types of dynamics that actually show up in physics".

But then shouldn't there be a natural biextensional equivalence ? Suppose , and denote . Then the map is clear enough, it's simply the quotient map. But there's not a unique map - any section of the quotient map will do, and it doesn't seem we can make this choice naturally.

I think maybe the subcategory of just "agent-extensional" frames is reflective, and then the subcategory of "environment-extensional" frames is coreflective. And there's a canonical (i.e natural) zig-zag

2 Scott Garrabrant 3y
You might be right, I am not sure. It looks to me like it satisfies the definition on wikipedia, which does not require that the morphism rB is unique, only that it exists.

Does the biextensional collapse satisfy a universal property? There doesn't seem to be an obvious map either or (in each case one of the arrows is going the wrong way), but maybe there's some other way to make it universal?

2 Scott Garrabrant 3y
I think the right way to think about biextensional collapse categorically is as a reflector.

What do you think about "cognitive biases as an edge"?

One story we can tell about the markets and coronavirus is this: It was not hard to come to the conclusion, by mid-to-late February, that a global COVID-19 pandemic was extremely likely, and that it was highly probable it would cause a massive catastrophe in the US. A few people managed to take this evidence seriously enough to trade on it, and made a killing, but the vast majority of the market simply didn't predict this fairly predictable course of events. Why not? Because it didn't feel like the sort... (read more)

Sure that could be an edge, but likely only if it doesn't happen too often. E.g. people are often averse to exiting their losing trades. I think in most markets an Intuitive trader wouldn't find an easy way to profit on that because Algorithmic traders already took care of that.
  • What are some reputable activist short-sellers?
  • Where do you go to identify Robinhood bubbles? (Maybe other than "lurk r/wallstreetbets and inverse whatever they're hyping").

I guess this question is really a general question about where you go for information about the market, in a general sense. Is it just reading a lot of "market news" type sites?

9 Wei Dai 4y
I'm reluctant to give out specific names because I'm still doing "due diligence" on them myself. But generally, try to find activist short-sellers who have a good track record in the past, and read/listen to some of their interviews/reports/articles to see how much sense they make. I was using but it seems that Robinhood has stopped providing the underlying data. So now I've set up a stock screener to look for big recent gains, and then check whether the stock has any recent news to justify the rally, and check places like SeekingAlpha, Reddit, and StockTwits to see what people are saying about it. Also just follow general market news because really extreme cases like Hertz will be reported. Podcasts seem to be a good source, especially ones that interview a variety of guests so I can get diverse perspectives without seeking them out myself. I currently follow "Real Vision Daily", "Macro Voices", and "What Goes Up".

Thank you very much!

I guess an argument of this type rules out a lot of reasonable-seeming inference rules - if a computable process can infer "too much" about universal statements from finite bits of evidence, you do this sort of Gödel argument and derive a contradiction. This makes a lot of sense, now that I think about it.

There is also predictionbook, which seems to be a similar sort of thing.

Of course, there's also metaculus, but that's more of a collaborative prediction aggregator, not so much a personal tool for tracking your own predictions.

If anyone came across this comment in the future - the CFAR Participant Handbook is now online, which is more or less the answer to this question.

The Terra Ignota sci-fi series by Ada Palmer depicts a future world which is also driven by "slack transportation". The mechanism, rather than portals, is a super-cheap global network of autonomous flying cars (I think they're supposed to run on nuclear engines? The technical details are not really developed). It's a pretty interesting series, although it doesn't explore the practical implications so much as the political/sociological ones (and this is hardly the only thing driving the differences between the present world and the depicted future)

I think, rather than "category theory is about paths in graphs", it would be more reasonable to say that category theory is about paths in graphs up to equivalence, and in particular about properties of paths which depend on their relations to other paths (more than on their relationship to the vertices)*. If your problem is most usefully conceptualized as a question about paths (finding the shortest path between two vertices, or counting paths, or something in that genre), you should definitely look to the graph theory literature instead.

* I realize this i

... (read more)

As an algebraic abstractologist, let me just say this is an absolutely great post. My comments:

Category theorists don't distinguish between a category with two objects and an edge between them, and a category with two objects and two identified edges between them (the latter object doesn't really even make sense in the usual account). In general, the extra equivalence relation that you have to carry around makes certain things more complicated in this version.

I do tend to agree with you that thinking of categories as objects, edges and an equivalence relat

... (read more)

Thanks for pointing out the identified edges thing, I hadn't noticed it before. I'll update the examples once I've updated my intuition.

Also I'm glad you like it! :)

UPDATE: fixed it.

This is a reasonable way to resolve the paradox, but note that you're required to fix the max number of people ahead of time - and it can't change as you receive evidence (it must be a maximum across all possible worlds, and evidence just restricts the set of possible worlds). This essentially resolves Pascal's mugging by fixing some large number X and assigning probability 0 to claims about more than X people.

I understand why this is from a theoretical perspective: if you define X as a finite number, then an "infinite" gamble with low probability can have lower expected value than a finite gamble. It also seems pretty clear that increasing X if the probability of an X-achieving event gets too low is not great. But from a practical perspective, why do we have to define X in greater detail than just "it's a very large finite number but I don't know what it is" and then compare analytically? That is to say

  • comparing infinite-gambles to finite gambles by analytically showing that, for large enough X, one of them is higher value than the other
  • comparing infinite-gambles to finite gambles by analytically showing that, for large enough X, the infinite-gamble is higher value than the finite gamble
  • compare finite gambles to finite gambles as normal

Another way to think about this is that, when we decide to take an action, we shouldn't use the function "If lim_{X→∞} EV_{Action A}(X) > lim_{X→∞} EV_{Action B}(X) then return Action A, otherwise Action B", because we know X is a finite number and taking the limit washes out the importance of any terms that don't scale with X. Instead, we should put the decision output inside the limit, in keeping with the definition that X is just an arbitrarily large finite number: to the query "should I Action A over Action B", output lim_{X→∞} (EV_{Action A}(X) > EV_{Action B}(X)).

If we analogize Action A and Action B to wager A and wager B, we see that the ">" evaluator returns FALSE for all X larger than some value of X. Per the epsilon-delta definition of a limit, this concludes that we should not take wager A over wager B and gives us the appropriate decision. However, if we analogize Action A to "take Pascal's Mugging" and Action B to "Don't do that", we see that at some finite X, the "EV(Pascal's Mugging) > EV(No Pascal's Mugging)" function will return TRUE and always return TRUE for larger values of X. Thus we conclude that we should be Pascally mugged. A

Just to sketch out the contradiction between unbounded utilities and gambles involving infinitely many outcomes a bit more explicitly.

If your utility function is unbounded, we can consider the following wager: You win 2 utils with probability 1/2, 4 utils with probability 1/4, and so on. The expected utility of this wager is infinite. (If there are no outcomes with utility exactly 2, 4, etc, we can award more - this is possible because utility is unbounded).
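To see the divergence concretely, here is a small Python sketch of the partial sums of this wager's expected utility. The function name is mine; the payoffs and probabilities are the ones stated above.

```python
from fractions import Fraction


def truncated_expected_utility(n_terms):
    """Expected utility counting only the first n_terms outcomes of the wager."""
    # Outcome i pays 2**i utils with probability 1 / 2**i, so each term is 1.
    return sum(Fraction(1, 2**i) * 2**i for i in range(1, n_terms + 1))


for n in (1, 10, 1000):
    print(n, truncated_expected_utility(n))
```

Every outcome contributes exactly one util in expectation, so the partial sums grow without bound.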

Now consider these wagers on a (fair) coinflip:

  • A: Play the above game if heads, pay out 0 util
... (read more)
Yes, thanks, I didn't bother including it in the body of the post but that's basically how it goes. Worth noting that this: kind of shortcutting a bit (at least as Savage/Fishburn[0] does it; he proves indifference between things of infinite expected utility separately after proving that expected utility works when it's finite), but that is the essence of it, yes. (As for the actual argument... eh, I don't have it in front of me and don't feel like rederiving it...) [0]I initially wrote Savage here, but I think this part is actually due to Fishburn. Don't have the book in front of me right now though.
Is there a reason we can't just solve this by proposing arbitrarily large bounds on utility instead of infinite bounds? For instance, if we posit that utility is bounded by some arbitrarily high value X, then the wager can only pay out values X for probabilities below 1/X. This gives the following summation for the total expected value: sum(from i=1 to i=log2(X)) (1/2^i)*2^i + sum(from i=log2(X) to i=infty) (1/2^i)*X. The above, for any arbitrarily large X, is clearly finite (the former term is a bounded summation and the latter term is a convergent geometric series). So, we can believe that wager B is better for any arbitrarily large bound on our utility function. This might seem unsatisfactory but for problems like it seems easier to just reject the claim that our universe can contain infinite people and instead just go with the assumption that it can contain X people, where X is an arbitrarily large but finite number.
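A quick sketch of the bounded version, assuming the bound is a power of two, X = 2^k. One deliberate difference from the summation in the reply above: the tail here starts at i = k+1 so that the i = k term is not counted twice. The function name is hypothetical.

```python
from fractions import Fraction


def capped_expected_utility(k):
    """Expected utility of the doubling wager when payouts are capped at X = 2**k."""
    cap = 2**k
    # Outcomes 1..k pay their full 2**i utils, contributing 1 each.
    head = sum(Fraction(1, 2**i) * 2**i for i in range(1, k + 1))
    # Outcomes i > k all pay the cap: the sum of cap / 2**i over i > k
    # is the geometric series cap / 2**k, which equals 1.
    tail = Fraction(cap, 2**k)
    return head + tail


for k in (1, 10, 100):
    print(k, capped_expected_utility(k))
```

The total is log2(X) + 1: finite for every finite bound, but growing without limit as the bound is raised.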

Information about people behaving erratically/violently is better at grabbing your brain's "important" sensor? (Noting that I had exactly the same instinctual reaction). This seems to be roughly what you'd expect from naive evopsych (which doesn't mean it's a good explanation, of course)

4 mako yass 4y
I'd guess there weren't as many nutcases in the average ancestral climate, as there are in modern news/rumor mills. We underestimate how often it's going to turn out that there wasn't really a reason they did those things.

CFAR must have a lot of information about the efficacy of various rationality techniques and training methods (compared to any other org, at least). Is this information, or recommendations based on it, available somewhere? Say, as a list of techniques currently taught at CFAR - which are presumably the best ones in this sense. Or does one have to attend a workshop to find out?


There's some recent work in the statistics literature exploring similar ideas. I don't know if you're aware of this, or if it's really relevant to what you're doing (I haven't thought a lot about the comparisons yet), but here are some papers.

It is indeed relevant, I'll probably have a review of the Beckers & Halpern paper at some point (as well as their more recent extension). I'm working on essentially the same problem as them. Also thanks for the link to the Chalupka-Perona-Eberhardt paper, I hadn't seen that one yet.

A thought about productivity systems/workflow optimization:

One principle of good design is "make the thing you want people to do, the easy thing to do". However, this idea is susceptible to the following form of Goodhart: often a lot of the value in some desirable action comes from the things that make it difficult.

For instance, sometimes I decide to migrate some notes from one note-taking system to another. This is usually extremely useful, because it forces me to review the notes and think about how they relate to each other and to the new system. If I m

... (read more)

This is a great list.

The main criticism I have is that this list overlaps way too much with my own internal list of high-quality sites, making it not very useful.

The example of associativity seems a little strange, I'm not sure what's going on there. What are the three functions that are being composed?

Should there be an arrow going from n*f(n-1) to f (around n==0?)? The output of the system also depends on n*f(n-1), not just on whether or not n is zero.

The "n==0?" node is intended to be a ternary operator; its output is n*f(n-1) in the case where n is not 0 (and when n is 0, its output is hardcoded to 1).
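Written out as code, the diagram being discussed would look something like the following Python sketch. Reading it as the factorial function is my inference from the n*f(n-1) recursion and the hardcoded 1; the original diagram is not reproduced here.

```python
def f(n):
    # The "n==0?" node is a ternary: it outputs 1 when n is 0,
    # and otherwise passes through n * f(n - 1).
    return 1 if n == 0 else n * f(n - 1)


print(f(5))
```

Seen this way, the value n*f(n-1) flows into the ternary as one of its inputs, which is the arrow the question above is asking about.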

A simple remark: we don't have access to all of , only up until the current time. So we have to make sure that we don't get a degenerate pair which diverges wildly from the actual universe at some point in the future.

Maybe this is similar to the fact that we don't want AIs to diverge from human values once we go off-distribution? But you're definitely right that there's a difference: we do want AIs to diverge from human behaviour (even in common situations).

I'm curious about the remaining 3% of people in the 97% program, who apparently both managed to smuggle some booze into rehab, and then admitted this to the staff while they were checking out. Lizardman's constant?

Those people may be the same 4-5% of people who give the "Yes, shapeshifting lizardmen run the world" answer in the surveys discussed in one of the earlier articles of this series.
7 snog toddgrass 4y
Even in the deepest darkness, there are warriors for truth.

I've noticed a sort of tradeoff in how I use planning/todo systems (having experimented with several such systems recently). This mainly applies to planning things with no immediate deadline, where it's more about how to split a large amount of available time between a large number of tasks, rather than about remembering which things to do when. For instance, think of a personal reading list - there is no hurry to read any particular things on it, but you do want to be spending your reading time effectively.

On one extreme, I make a commitment to myself to

... (read more)
I've addressed this sort of problem with a fairly ruthless "when updating my todo-lists, they always start empty rather than full of previous stuff. I have to make a constant choice to keep old things around if they still feel 'alive.'"
5 Sunny from QAD 4y
I used to be big on todo lists, and I always had the exact same problem. I mostly hung out on the "keeping old tasks around for too long" end of the spectrum. Now, I no longer struggle with this nearly as much. The solution turned out to be a paradigm shift that occurred when I read Nate Soares's Replacing Guilt series. If you aren't already familiar with it, I highly recommend reading it sometime.

I've managed to implement this for computer monitors, but not for glasses. But my glasses seem to get smudged frequently enough that I need to wipe them about every day anyways. I guess I fidget with them much more than you?
