I think we're basically in agreement here (and I think your summary of the results is fair, I was mostly pushing back on a too-hyped tone coming from OpenAI, not from you)
Most of the heavy lifting in these proofs seem to be done by the Lean tactics. The comment "arguments to nlinarith
are fully invented by our model" above a proof which is literally the single line
nlinarith [sq_nonneg (b - a), sq_nonneg (c - b), sq_nonneg (c - a)]
makes me feel like they're trying too hard to convince me this is impressive.
The other proof involving multiple steps is more impressive, but this still feels like a testament to the power of "traditional" search methods for proving algebraic inequalities, rather than an impressive AI milestone. ...
I think it's worth distinguishing how hard it is for a lean programmer to write the solution, how hard it is to solve the math problem in the first place, and how hard it is to write down an ML algorithm that spits out the right lean tactics.
Like, even if something can be written in a compact form, there might be only a dozen of combinations of ~10 tokens that give us a correct solution like nlinarith (b- a), ..., where by token I count "nlinarith", "sq_nonneg", "b", "-", "a", etc., and the actual search space for something of length 10 is probably ~(gram...
Compare two views of "the universal prior"
Right, that's essentially what I mean. You're of course right that this doesn't let you get around the existence of nonmeasurepreserving automorphisms. I guess what I'm saying is that, if you're trying to find a prior on , you should try to think about what system of finite measurements this is idealizing, and see if you can apply a symmetry argument to those bits. Which isn't always the case! You can only apply the principle of indifference if you're actually indifferent. But if it's the case that is generated in a way where "there's no reaso...
I strongly recommend Escardó's Seemingly impossible functional programs, which constructs a function search : ((Nat -> Bool) -> Bool) -> (Nat -> Bool)
which, given a predicate on infinite bitstrings, finds an infinite bitstring satisfying the predicate if one exists. (In other words, if p : (Nat -> Bool) -> Bool
and any bitstring at all satisfies p
, then p (search p) == True
(Here I'm intending Nat
to be the type of natural numbers, of course) .
Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say ) by the following argument: "Measuring a real number " is obviously simply impossible if interpreted literally, containing an infinite amount of data. Instead this is supposed to be some sort of idealization of a situation where you can observe "as many bits as you want" of the binary expansion of the number (choosing another base gives the same measure). If you now apply the princ...
This seems somewhat connected to this previous argument. Basically, coherent agents can be modeled as utility-optimizers, yes, but what this really proves is that almost any behavior fits into the model "utility-optimizer", not that coherent agents must necessarily look like our intuitive picture of a utility-optimizer.
Paraphrasing Rohin's arguments somewhat, the arguments for universal convergence say something like "for "most" "natural" utility functions, optimizing that function will mean acquiring power, killing off adversaries, acquiring resources, et...
My impression from skimming a few AI ETFs is that they are more or less just generic technology ETFs with different branding and a few random stocks thrown in. So they're not catastrophically worse than the baseline "Google, Microsoft and Facebook" strategy you outlined, but I don't think they're better in any real way either.
This is really cool!
The example of inferring from the independence of and reminds me of some techniques discussed in Elements of Causal Inference. They discuss a few different techniques for 2-variable causal inference.
One of them, which seems to be essentially analogous to this example, is that if are real-valued variables, then if the regression error (i.e for some constant ) is independent of , it's highly likely that is downstream of . It sounds like factored sets (or some extension to capture continuous-valued variables) migh...
At least one explanation for the fact that the Fall of Rome is the only period of decline on the graph could be this: data becomes more scarce the further back in history you go. This has the effect of smoothing the historical graph as you extrapolate between the few datapoints you have. Thus the overall positive trend can more easily mask any short-term period of decay.
Eg see also the bronze age collapse: https://en.wikipedia.org/wiki/Late_Bronze_Age_collapse?wprov=sfla1
This argument proves that
This is definitely true, and this is an inescapable feature of any (compact) dynamical system. However, somewhat paradoxically, it's consistent with the statement that, conditional on any given (n...
This argument doesn't work because limits don't commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).
To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function - it is on the point and everywhere else. The expected value/integral of ...
The source of disagreement seems to be about how to compute the EV "in the limit of infinite bets". I.e given bets with a chance of winning each, where you triple your stake with each bet, the naive EV maximization strategy gives you a total expect value of , which is also the maximum achievable overall EV. Does this entail that the EV at infinite bets is ? No, because with probability one, you'll lose one of the bets and end up with zero money.
I don't find this argument for Kelly super convincing.
You can't actually bet an infinite number o
I don't wanna clutter the comments too much, so I'll add this here: I assume there was supposed to be links to the various community discussions of Why We Sleep (hackernews, r/ssc, etc), but these are just plain text for me.
(John made a post, I'll just post this here so others can find it: https://www.lesswrong.com/posts/Dx9LoqsEh3gHNJMDk/fixing-the-good-regulator-theorem)
This seems prima facie unlikely. If you're not worried about the risk of side effects from the "real" vaccine, why not just take it, too (since the efficacy of the homemade vaccine is far from certain)?. On the other hand, if you're the sort of person who worries about the side effects of a vaccine that's been through clinical trials, you're probably not the type to brew something up in your kitchen based on a recipe that you got off the internet and snort it.
This is great!
An idea which has picked up some traction in some circles of pure mathematicians is that numbers should be viewed as the "shadow" of finite sets, which is a more fundamental notion.
You start with the notion of finite set, and functions between them. Then you "forget" the difference between two finite sets if you can match the elements up to each other (i.e if there exists a bijection). This seems to be vaguely related to your thing about being invariant under permutation - if a property of a subset of positions (i.e those positions that are s...
My mom is a translator (mostly for novels), and as far as I know she exclusively translates into Danish (her native language). I think this is standard in the industry - it's extremely hard to translate text in a way that feels natural in the target language, much harder than it is to tease out subtleties of meaning from the source language.
This post introduces a potentially very useful model, both for selecting problems to work on and for prioritizing personal development. This model could be called "The Pareto Frontier of Capability". Simply put:
I hadn't, thanks!
I took the argument about the large-scale "stability" of matter from Jaynes (although I had to think a bit before I felt I understood it, so it's also possible that I misunderstood it).
I think I basically agree with Eliezer here?
...The Second Law of Thermodynamics is actually probabilistic in nature - if you ask about the probability of hot water spontaneously entering the "cold water and electricity" state, the probability does exist, it's just very small. This doesn't mean Liouville's Theorem is violated with small probability; a theorem
This may be poorly explained. The point here is that
E.g. suppose and when , and . Then is . But ...
But then shouldn't there be a natural biextensional equivalence ? Suppose , and denote . Then the map is clear enough, it's simply the quotient map. But there's not a unique map - any section of the quotient map will do, and it doesn't seem we can make this choice naturally.
I think maybe the subcategory of just "agent-extensional" frames is reflective, and then the subcategory of "environment-extensional" frames is coreflective. And there's a canonical (i.e natural) zig-zag
Does the biextensional collapse satisfy a universal property? There doesn't seem to be an obvious map either or (in each case one of the arrows is going the wrong way), but maybe there's some other way to make it universal?
What do you think about "cognitive biases as an edge"?
One story we can tell about the markets and coronavirus is this: It was not hard to come to the conclusion, by mid-to-late February, that a global COVID-19 pandemic was extremely likely, and that it was highly probable it would cause a massive catastrophe in the US. A few people managed to take this evidence seriously enough to trade on it, and made a killing, but the vast majority of the market simply didn't predict this fairly predictable course of events. Why not? Because it didn't feel like the sort...
I guess this question is really a general question about where you go for information about the market, in a general sense. Is it just reading a lot of "market news" type sites?
Thank you very much!
I guess an argument of this type rules out a lot of reasonable-seeming inference rules - if a computable process can infer "too much" about universal statements from finite bits of evidence, you do this sort of Gödel argument and derive a contradiction. This makes a lot of sense, now that I think about it.
There is also predictionbook, which seems to be a similar sort of thing.
Of course, there's also metaculus, but that's more of a collaborative prediction aggregator, not so much a personal tool for tracking your own predictions.
If anyone came across this comment in the future - the CFAR Participant Handbook is now online, which is more or less the answer to this question.
The Terra Ignota sci-fi series by Ada Palmer depicts a future world which is also driven by "slack transportation". The mechanism, rather than portals, is a super-cheap global network of autonomous flying cars (I think they're supposed to run on nuclear engines? The technical details are not really developed). It's a pretty interesting series, although it doesn't explore the practical implications so much as the political/sociological ones (and this is hardly the only thing driving the differences between the present world and the depicted future)
I think, rather than "category theory is about paths in graphs", it would be more reasonable to say that category theory is about paths in graphs up to equivalence, and in particular about properties of paths which depend on their relations to other paths (more than on their relationship to the vertices)*. If your problem is most usefully conceptualized as a question about paths (finding the shortest path between two vertices, or counting paths, or something in that genre, you should definitely look to the graph theory literature instead)
* I realize this i
...As an algebraic abstractologist, let me just say this is an absolutely great post. My comments:
Category theorists don't distinguish between a category with two objects and an edge between them, and a category with two objects and two identified edges between them (the latter object doesn't really even make sense in the usual account). In general, the extra equivalence relation that you have to carry around makes certain things more complicated in this version.
I do tend to agree with you that thinking of categories as objects, edges and an equivalence relat
...Thanks for pointing out the identified edges thing, I hadn't noticed it before. I'll update the examples once I've updated my intuition.
Also I'm glad you like it! :)
UPDATE: fixed it.
This is a reasonable way to resolve the paradox, but note that you're required to fix the max number of people ahead of time - and it can't change as you receive evidence (it must be a maximum across all possible worlds, and evidence just restricts the set of possible worlds). This essentially resolves Pascal's mugging by fixing some large number X and assigning probability 0 to claims about more than X people.
Just to sketch out the contradiction between unbounded utilities and gambles involving infinitely many outcomes a bit more explicitly.
If your probability function is unbounded, we can consider the following wager: You win 2 utils with probability 1/2, 4 utils with probability 1/4, and so on. The expected utility of this wager is infinite. (If there are no outcomes with utility exactly 2, 4, etc, we can award more - this is possible because utility is unbounded).
Now consider these wagers on a (fair) coinflip:
Information about people behaving erratically/violently is better at grabbing your brain's "important" sensor? (Noting that I had exactly the same instinctual reaction). This seems to be roughly what you'd expect from naive evopsych (which doesn't mean it's a good explanation, of course)
CFAR must have a lot of information about the efficacy of various rationality techniques and training methods (compared to any other org, at least). Is this information, or recommendations based on it, available somewhere? Say, as a list of techniques currently taught at CFAR - which are presumably the best ones in this sense. Or does one have to attend a workshop to find out?
There's some recent work in the statistics literature exploring similar ideas. I don't know if you're aware of this, or if it's really relevant to what you're doing (I haven't thought a lot about the comparisons yet), but here are some papers.
A thought about productivity systems/workflow optimization:
One principle of good design is "make the thing you want people to do, the easy thing to do". However, this idea is susceptible to the following form of Goodhart: often a lot of the value in some desirable action comes from the things that make it difficult.
For instance, sometimes I decide to migrate some notes from one note-taking system to another. This is usually extremely useful, because it forces me to review the notes and think about how they relate to each other and to the new system. If I m
...This is a great list.
The main criticism I have is that this list overlaps way too much with my own internal list of high-quality sites, making it not very useful.
The example of associativity seems a little strange, I'm note sure what's going on there. What are the three functions that are being composed?
Should there be an arrow going from n*f(n-1) to f (around n==0?) ? The output of the system also depends on n*f(n-1), not just on whether or not n is zero.
A simple remark: we don't have access to all of , only up until the current time. So we have to make sure that we don't get a degenerate pair which diverges wildly from the actual universe at some point in the future.
Maybe this is similar to the fact that we don't want AIs to diverge from human values once we go off-distribution? But you're definitely right that there's a difference: we do want AIs to diverge from human behaviour (even in common situations).
I'm curious about the remaining 3% of people in the 97% program, who apparently both managed to smuggle some booze into rehab, and then admitted this to the staff while they were checking out. Lizardman's constant?
I've noticed a sort of tradeoff in how I use planning/todo systems (having experimented with several such systems recently). This mainly applies to planning things with no immediate deadline, where it's more about how to split a large amount of available time between a large number of tasks, rather than about remembering which things to do when. For instance, think of a personal reading list - there is no hurry to read any particular things on it, but you do want to be spending your reading time effectively.
On one extreme, I make a commitment to myself to
...I've managed to implement this for computer monitors, but not for glasses. But my glasses seem to get smudged frequently enough that I need to wipe them about every day anyways. I guess I fidget with them much more than you?
I mean, "is a large part of the state space" is basically what "high entropy" means!
For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John - the vast majority of initial states for your computer don't include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer)