All of gwern's Comments + Replies

What are the gears of gluten sensitivity?

If you are concerned about gluten sensitivity, why not directly test for the antibodies or celiac-related genetic variants (eg 23andMe)? You can do both at home via mail for like $200 total. That information sounds much more dispositive than reducing gluten and maybe observing some effect, and given the long-term harms of problems like celiac, this is not a problem one wants to cheap out on solving.

2G Gordon Worley III14dI'll look into it. I was unaware until this comment that such testing existed. My 23andMe results show I don't have the markers for celiac disease, and SNPedia findings don't show anything likely to be related to gluten. I'll see if it's possible to find testing that might indicate non-allergic food sensitivities that isn't also bogus.
1bvbvbvbvbvbvbvbvbvbvbv14dMy several cents: It's not just about celiac disease. There are several other disorders that seem to be worsened by gluten; the first one that comes to my mind is endometriosis. The way I see it, it seems that some people are susceptible to pro-inflammatory effects of gluten, whereas most are not. So if you already have an inflammatory disorder or autoimmune condition AND have this inflammatory sensitivity then gluten will worsen it. I don't know if this susceptibility to gluten's inflammatory effect is genetic, or genetically identified / identifiable. Whereas you can indeed identify celiac-related antibodies using various tests.
6ChristianKl14dMany people who speak about gluten sensitivity (and other allergic symptoms) think that even when antibody levels are too low to be detected by antibody tests, there can still be gluten sensitivity, and that the experiment of reducing gluten to see whether there's an effect is more sensitive than the antibody tests. While running tests can be useful if you can rationally deal with the results, actually doing the experiment of whether reducing gluten will have a noticeable effect is worthwhile. If you do run antigen tests, do blood tests. Skin tests don't really provide useful medical data but are easy to bill to insurance companies. When it comes to running the experiment, sometimes heart rate or HRV are sensitive enough to detect allergic reactions that you wouldn't notice otherwise. The real high-cost action would be "Eat only rice for a week, see if symptoms disappear, then add additional foods one-by-one to see when symptoms reappear".
"Decision Transformer" (Tool AIs are secret Agent AIs)

Rewards need not be written in natural language as crudely as "REWARD: +10 UTILONS". Something to think about as you continue to write text online.

And what of the dead? I own that I thought of myself, at times, almost as dead. Are they not locked below ground in chambers smaller than mine was, in their millions of millions? There is no category of human activity in which the dead do not outnumber the living many times over. Most beautiful children are dead. Most soldiers, most cowards. The fairest women and the most learned men – all are dead. Their bodi

... (read more)
Curated conversations with brilliant rationalists

IMO, that's shockingly cheap, and there's little reason to not do transcripts for any podcast which has a listening audience larger than "your gf and your dog" and pretensions to being more than tissue-level entertainment to be discarded after use. If a podcast is worth taking hours to do and expecting hundreds/thousands of listeners to sit through spending man-hours apiece and trying to advertise or spread it in any way, then it's almost certainly also then worth $100 to transcribe it. A transcript buys you search-engine visibility (as well as easy search... (read more)

The EMH is False - Specific Strong Evidence

There are currently high return trades (5% a month at least, possibly more) with extremely low risk (you can lose 1-2% max, probably less depending on execution).

Worth noting that a new Metaculus market estimates ~50% chance of Polymarket being a counterparty risk in some sense 2021-2022.

Article on IQ: The Inappropriately Excluded

The sample consisted of mid-level leaders from multinational private-sector companies.

This sort of pre-filtered sample suffers from issues like Berkson's paradox. For example, for those managers who have IQ>120, why are they underperforming? Perhaps for lack of leadership qualities, which they make up for on intelligence. On the flip side, for managers who have unimpressive IQs (as low as <100), why are they so successful? This is why longitudinal samples like SMPY are so much more useful when you want to talk about what high IQs are or are not go... (read more)
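
A minimal simulation sketch of the selection effect described above, under loudly stated assumptions: the two traits ("iq" and "leadership") are modeled as independent standard normals, and the promotion rule (sum above a cutoff) is hypothetical, not taken from the study. It only illustrates how filtering on a combined score can induce a negative correlation among those selected.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=100_000, cutoff=1.5, seed=0):
    rng = random.Random(seed)
    # Assumption: the two traits are independent in the full population.
    people = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    # Assumption (hypothetical rule): you become a manager only if the
    # sum of the two traits clears a cutoff.
    managers = [(iq, lead) for iq, lead in people if iq + lead > cutoff]
    r_all = pearson(*zip(*people))
    r_managers = pearson(*zip(*managers))
    return r_all, r_managers


r_all, r_managers = simulate()
print(f"correlation in full population: {r_all:+.3f}")
print(f"correlation among managers:     {r_managers:+.3f}")
```

In the full population the correlation is near zero by construction; among the selected "managers" it comes out clearly negative, which is the pattern a pre-filtered sample can produce even when the traits are unrelated.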

1Jay24dThis isn't really my field, and I see your point. The poster asked for other studies so I linked a study I'd recently seen. It's less about me endorsing the study than about trying to provide an entry point into the relevant literature.
Re: Fierce Nerds

"Fierce nerd" sounds a bit like rediscovering Eysenck's paradigm of genius: intelligence, energy, and Psychoticism (essentially, low Agreeableness).

Get your gun license

Considering how frequent mental issues are around here, this post seems to buy entirely the wrong kinds of optionality.

EDIT: oh look what's on the main page a day later

Strongly agree. I think it was Rob Wiblin (or maybe Katja Grace) who wrote a post once about how they'd investigated the statistically-most-probable ways they could die in the next decade. And the answer (given various demographic facts) turned out to be suicide. Instead of dismissing this, they took seriously the fact that some people who haven't previously considered suicide later do so (but in a bad moment, such that following through would definitively be a mistake). So they took steps to decrease their suicide risk, the way one might take steps to dec... (read more)

What will 2040 probably look like assuming no singularity?

The 3 babies from He Jiankui will be adults by then, definitely; one might quibble about how 'designer' they are, but most people count selection as 'designer' and GenPred claims to have at least one baby so far selected on their medical PGSes (unclear if they did any EDU/IQ PGSes in any way, but as I've always pointed out, because of the good genetic correlations of those with many diseases, any selection on complex diseases will naturally also boost those).

How to determine the value of optionality?

The value of optionality is defined by drawing out the decision tree for the scenarios with and without the option, doing backwards induction for the optimal strategy and estimating the value of each. (In financial option theory, you calculate the price of a literal option by simulating out all of the possible price trajectories and how you would respond to them, to figure out what would be a too cheap or too expensive price.) Because scenarios can be arbitrarily complex, no general answer is possible. If an option wouldn't be used at any state of the worl... (read more)
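
A toy sketch of the decision-tree procedure described above, assuming a made-up project (all payoffs and probabilities are hypothetical): the same tree is valued by backwards induction with and without an option to abandon, and the difference is the option's value.

```python
def value(node):
    """Backwards induction over a decision tree.

    A node is either a number (terminal payoff), ("choice", [children])
    where we pick the best child, or ("chance", [(prob, child), ...]).
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, branches = node
    if kind == "choice":
        return max(value(child) for child in branches)
    if kind == "chance":
        return sum(p * value(child) for p, child in branches)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical project: pay 100 up front, then learn whether it is worth 180 or 40.
without_option = ("chance", [(0.5, 180 - 100), (0.5, 40 - 100)])

# Same project, plus an option to abandon after seeing the outcome
# and recover 70 of the initial 100.
with_option = ("chance", [
    (0.5, ("choice", [180 - 100, 70 - 100])),
    (0.5, ("choice", [40 - 100, 70 - 100])),
])

option_value = value(with_option) - value(without_option)
print(value(without_option), value(with_option), option_value)  # 10.0 25.0 15.0
```

Here the option is only exercised in the bad state, which is the whole source of its value; real scenarios can have far larger trees, so this is an illustration of the method and not a general formula.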

Agency in Conway’s Game of Life

OP said I can initialize a large chunk as I like (which I initialize to be empty aside from my constructors to avoid interfering with placing the pixels), and then the rest might be randomly or arbitrarily initialized, which is why I brought up the wall of still-life eaters to seal yourself off from anything that might then disrupt it. If his specific values don't give me enough space, but larger values do, then that's an answer to the general question as nothing hinges on the specific values.

4alexflint1moI was imagining that the goal configuration would be defined over the whole grid, so that it wouldn't be possible to satisfy the objective within the initial region, since that seems most analogous to constructing an AI in, say, a single room on Earth and having it eventually influence the overall arrangement of matter and energy in the Milky Way.
Agency in Conway’s Game of Life

My immediate impulse is to say that it ought to be possible to create the smiley face, and that it wouldn't be that hard for a good Life hacker to devise it.

I'd imagine it to go something like this. Starting from a Turing machine or simpler, you could program it to place arbitrary 'pixels': either by finding a glider-like construct which terminates at specific distances into a still, so the constructor can crawl along an x/y axis, shooting off the terminating-glider to create stable pixels in a pre-programmed pattern. (If that doesn't exist, then one could... (read more)

7Oscar_Cunningham1moThis sounds like you're treating the area as empty space, whereas the OP specifies that it's filled randomly outside the area where our AI starts.
Self-Predicting Markets

To update on this: Hertz stock is now worth $5-8 as it comes out of bankruptcy. I hope OP didn't short it, because he would've lost his shorts based on his belief that EMH is false and he's smarter than the markets.

Challenge: know everything that the best go bot knows about go

An even more pointed example: chess endgame tables. What does it mean to 'fully understand' it beyond understanding the algorithms which construct them, and is it a reasonable goal to attempt to play chess endgames as well as the tables?

4paulfchristiano1moIf you have a "lazy" version of the goal, like "have a question-answerer that can tell you anything the model knows" or "produce a locally human-legible but potentially giant object capturing everything the model knows" then chess endgame tables are a reasonably straightforward case ("position X is a win for white").
[link] If something seems unusually hard for you, see if you're missing a minor insight

This reminds me of pg:

If you think something's supposed to hurt, you're less likely to notice if you're doing it wrong. That about sums up my experience of graduate school.

"How To Do What You Love"

(Of course, there's a certain aspect of learned-helplessness here: because so many things are terrible, people often assume that something is just another broken malicious tool or workflow, when it's quite the opposite.)

But really the single most important way to learn to use a search engine is this: Know people who are better at using search engines than

... (read more)
gwern's Shortform

2-of-2 escrow: what is the exploding Nash equilibrium? Did it really originate with NashX? I've been looking for the history & real name of this concept for years now and have failed to refind it. Anyone?

gwern's Shortform

Humanities satirical traditions: I always enjoy the CS/ML/math/statistics satire in the annual SIGBOVIK and Ig Nobels; physics has Arxiv April Fools papers (like "On the Impossibility of Supersized Machines") & journals like Special Topics; and medicine has the BMJ Christmas issue, of course.

What are the equivalents in the humanities, like sociology or literature? (I asked a month ago on Twitter and got zero suggestions...)

gwern's Shortform

Normalization-free Bayes: I was musing on Twitter about what the simplest possible still-correct computable demonstration of Bayesian inference is, that even a middle-schooler could implement & understand. My best candidate so far is ABC Bayesian inference*: simulation + rejection, along with the 'possible worlds' interpretation.

Someone noted that rejection sampling is simple but needs normalization steps, which adds complexity back. I recalled that somewhere on LW many years ago someone had a comment about a Bayesian interpretation where you don't nee... (read more)
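
A minimal sketch of the simulation + rejection idea, assuming a coin-flipping example that is not from the original comment: draw a possible world (a coin bias) from the prior, simulate data in that world, and keep the world only if its simulated data matches what was observed. The function name and parameters are hypothetical.

```python
import random


def abc_posterior(observed_heads, flips, draws=50_000, seed=0):
    """Approximate Bayesian computation by exact-match rejection."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        p = rng.random()  # prior: coin bias uniform on [0, 1]
        # Simulate the experiment in this possible world.
        heads = sum(rng.random() < p for _ in range(flips))
        # Keep only worlds that reproduce the observed data.
        if heads == observed_heads:
            accepted.append(p)
    return accepted


posterior = abc_posterior(observed_heads=7, flips=10)
mean = sum(posterior) / len(posterior)
print(f"kept {len(posterior)} worlds, posterior mean bias ~ {mean:.2f}")
```

The surviving worlds are themselves the posterior sample; with a uniform prior and 7 heads in 10 flips the exact posterior is Beta(8, 4), so the mean should land near 2/3.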

4Wei_Dai2moDoing another search, it seems I made at least one comment that is somewhat relevant, although it might not be what you're thinking of.
1eigen2moFunny that you have your great LessWrong whale as I do, and that you recall that it may be from Wei Dai as well (while he doesn't recall it).
Gradations of Inner Alignment Obstacles

I claim that if we're clever enough, we can construct a hypothetical training regime T' which trains the NN to do nearly or exactly the same thing on T, but which injects malign behavior on some different examples. (Someone told me that this is actually an existing area of study; but, I haven't been able to find it yet.)

I assume they're referring to data poisoning backdoor attacks.

How can we increase the frequency of rare insights?

Have you looked at the "incubation effect"?

3eigen2moA talk given by Roger Penrose is apt here: The Problem of Modelling the Mathematical Mind. He tries to define how the mind of a sufficiently good mathematician may work with emphasis on parallelization of mathematical solutions. And an interesting book may be The Mathematician's Mind: The Psychology of Invention in the Mathematical Field by Jacques Hadamard.
Parameter count of ML systems through time?

It's not the numerical precision but the model architecture being sparse such that you only activate a few experts at runtime, and only a small fraction of the model runs for each input. It may be 1.3t parameters or whatever, but then at runtime, only, I dunno, 20b parameters actually compute anything. This cheapness of forward passes/inferencing is the big selling point of MoE for training and deployment: that you don't actually ever run 1.3t parameters. But it's hard for parameters which don't run to contribute anything to the final result, whereas in GPT-... (read more)
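
A toy sketch of sparse activation, with everything in it an assumption for illustration: the "experts" are one-weight functions, the gate scores are given by hand instead of learned, and the chosen experts' outputs are simply averaged (real MoE layers typically weight them by gate probability). It shows only that total parameters and parameters actually run per input can differ by a large factor.

```python
def route(gate_scores, top_k):
    """Return the indices of the top_k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:top_k])


def moe_forward(x, experts, gate_scores, top_k=1):
    """Run only the routed experts; the others never execute for this input."""
    chosen = route(gate_scores, top_k)
    outputs = [experts[i](x) for i in chosen]
    return sum(outputs) / len(outputs), chosen


def param_counts(n_experts, params_per_expert, top_k, shared_params=0):
    """Total parameters stored vs. parameters used on a single forward pass."""
    total = shared_params + n_experts * params_per_expert
    active = shared_params + top_k * params_per_expert
    return total, active


# Hypothetical experts: each multiplies its input by a single weight.
experts = [lambda x, w=w: w * x for w in (1.0, 2.0, 3.0, 4.0)]
output, chosen = moe_forward(2.0, experts,
                             gate_scores=[0.1, 0.7, 0.15, 0.05], top_k=1)
total, active = param_counts(n_experts=4, params_per_expert=1, top_k=1)
print(output, chosen, total, active)  # 4.0 [1] 4 1
```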

LessWrong help desk - free paper downloads and more

If Reddit falls through, email me and I can order a scan for you. (Might want to delete your duplicate comments here too.) EDIT: ordered a scan

Parameter count of ML systems through time?

You should probably also be tracking the kind of parameter. I see you have Switch and Gshard in there, but, as you can see in how they are visibly outliers, MoEs (and embeddings) use much weaker 'parameters', as it were, than dense models like GPT-3 or Turing-NLG. Plotting by FLOPS would help correct for this - perhaps we need graphs like training-FLOPS per parameter? That would also help correct for comparisons across methods, like to older architectures such as SVMs. (Unfortunately, this still obscures that the key thing about Transformers is better scaling laws than RNNs or n-grams etc, where the high FLOPS-per-parameter translates into better curves...)

1Jsevillamol2moThank you for the feedback, I think what you say makes sense. I'd be interested in seeing whether we can pin down exactly in what sense are Switch parameters "weaker". Is it because of the lower precision? Model sparsity (is Switch sparse on parameters or just sparsely activated?)? What do you think, what typology of parameters would make sense / be useful to include?
March 2021 newsletter

"'Nash equilibrium strategy' is not necessarily synonymous to 'optimal play'. A Nash equilibrium can define an optimum, but only as a defensive strategy against stiff competition. More specifically: Nash equilibria are hardly ever maximally exploitive. A Nash equilibrium strategy guards against any possible competition including the fiercest, and thereby tends to fail taking advantage of sub-optimum strategies followed by competitors. Achieving maximally exploitive play generally requires deviating from the Nash strategy, and allowing for defensive leaks in one's own strategy."

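A small worked sketch of the quoted point, using rock-paper-scissors as an assumed example (the biased opponent below is hypothetical): the uniform Nash strategy cannot be exploited but also earns nothing from a weak opponent, while the maximally exploitive reply earns more and opens a "defensive leak".

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a, b):
    """+1 if move a beats move b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def expected_payoff(mine, theirs):
    """Expected payoff of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(mine[a] * theirs[b] * payoff(a, b) for a in MOVES for b in MOVES)


def worst_case(mine):
    """Payoff against the opponent's best pure-strategy reply."""
    return min(
        expected_payoff(mine, {m: Fraction(int(m == b)) for m in MOVES})
        for b in MOVES
    )


nash = {m: Fraction(1, 3) for m in MOVES}
# Hypothetical sub-optimal opponent who over-plays rock.
biased = {"rock": Fraction(1, 2), "paper": Fraction(1, 4), "scissors": Fraction(1, 4)}
# Maximally exploitive reply to that opponent: always play paper.
exploit = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}

print(expected_payoff(nash, biased), worst_case(nash))        # 0 0
print(expected_payoff(exploit, biased), worst_case(exploit))  # 1/4 -1
```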
2020 AI Alignment Literature Review and Charity Comparison

That's interesting. I did see YC listed as a major funding source, but given Sam Altman's listed loans/donations, I assumed, because YC has little or nothing to do with Musk, that YC's interest was Altman, Paul Graham, or just YC collectively. I hadn't seen anything at all about YC being used as a cutout for Musk. So assuming the Guardian didn't screw up its understanding of the finances there completely (the media is constantly making mistakes in reporting on finances and charities in particular, but this seems pretty detailed and specific and hard to get... (read more)

The best frequently don't rise to the top

One of the most interesting media experiments I know of is the Yahoo Media experiments:

  1. "Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market", Salganik et al 2006:

    We investigated this paradox experimentally, by creating an artificial ‘‘music market’’ in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants’ choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determi

... (read more)
2adamzerner3moThis is fantastic. Thank you!
The EMH is False - Specific Strong Evidence

I knew someone was going to ask that. Yes, it's impure indexing, it's true. The reason is the returns to date on the whole-world indexes have been lower, the expense is a bit higher, and after thinking about it, I decided that I do have a small opinion about the US overperforming (mostly due to tech/AI and a general sense that people persistently underestimate the US economically) and feel pessimistic about the rest of the world. Check back in 20 years to see how that decision worked out...

1Rune3moThanks. I'm just trying to understand what people's reasons are. My portfolio is also US-weighted more than market caps would dictate.
Against evolution as an analogy for how humans will create AGI

As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.

For a post criticizing the use of evolution for end to end ML, t... (read more)

7Richard_Ngo3moI personally found this post valuable and thought-provoking. Sure, there's plenty that it doesn't cover, but it's already pretty long, so that seems perfectly reasonable. I particularly dislike your criticism of it as strawmanish. Perhaps that would be fair if the analogy between RL and evolution were a standard principle in ML. Instead, it's a vague idea that is often left implicit, or else formulated in idiosyncratic ways. So posts like this one have to do double duty in both outlining and explaining the mainstream viewpoint (often a major task in its own right!) and then criticising it. This is most important precisely in the cases where the defenders of an implicit paradigm don't have solid articulations of it, making it particularly difficult to understand what they're actually defending. I think this is such a case. If you disagree, I'd be curious what you consider a non-strawmanish summary of the RL-evolution analogy. Perhaps Clune's AI-GA paper? But from what I can tell opinions of it are rather mixed, and the AI-GA terminology hasn't caught on.

Thanks for all those great references!

My current thinking is: (1) Outer-loop meta-learning is slow, (2) Therefore we shouldn't expect to get all that many bits of information out of it, (3) Therefore it's a great way to search for parameter settings in a parameterized family of algorithms, but not a great way to do "the bulk of the real design work", in the sense that programmers can look at the final artifact and say "Man, I have no idea what this algorithm is doing and why it's learning anything at all, let alone why it's learning things very effectively... (read more)

4adamShimi3moJust wanted to say that this comment made me add a lot of things on my reading list, so thanks for that (but I'm clearly not well-read enough to go into the discussion).
Thirty-three randomly selected bioethics papers
Promoted by Raemon

This was exactly what I expected. The problem with the field of bioethics has never been the papers being 100% awful, but how it operates in the real world, the asymmetry of interventions, and what its most consequential effects have been. I would have thought 2020 made this painfully clear. (That is, my grandmother did not die of coronavirus while multiple highly-safe & highly-effective vaccines sat on the shelf unused, simply because some bioethicist screwed up a p-value in a paper somewhere. If only!)

The actual day-to-day churn of publishing bioethics papers/research... Well, HHGttG said it best in describing humans in general:

Mostly Harmless.

The EMH is False - Specific Strong Evidence

I haven't heard that claim before. My understanding was that such a claim would be improbable or cherrypicking of some sort, as a priori risk-adjusted etc returns should be similar or identical but by deliberately narrowing your index, you do predictably lose the benefits of diversification. So all else equal (such as fees and accessibility of making the investment), you want the broadest possible index.

The EMH is False - Specific Strong Evidence

Since we're discussing EMH and VTSAX, seems as good a place to add a recent anecdote:

Chatting with someone, investments came up and they asked me where I put mine. I said 100% VTSAX. Why? Because I think the EMH is as true as it needs to be, I don't understand why markets rise and fall when they do even when I think I'm predicting future events accurately (such as, say, coronavirus), and I don't think I can beat the stock markets, at least not without investing far more effort than I care to. They said they thought it wasn't that hard, and had (unlike me) ... (read more)

1Srdjan Miletic2moOut of curiosity, why a US tracker fund instead of a global one like FTSE all-world?
4emanuele ascani3moWhy not S&P500?
3jmh3moI think this points to two very important things about investing and trading regardless of EMH. 1. The psychology of the person. 2. We tend to weigh the losses we avoided (that is, the money we kept) much more heavily than the gains we missed (the money we actually lost by not playing in the game). Unless someone has a good plan for how to manage and overcome those two aspects of their own mind I suspect they will find it difficult to ever commit to any investing or trading program/strategy. It will not take too much to push them back into the behavior reflected in your comments.
3Rune3moIs there a reason you only invest in the US stock market and not the whole world (VTWAX)? Or is VTSAX good enough and it's not worth the effort to decide whether you should globally diversify and in what proportion?
4deluks9173moI think you need to follow good practices. Getting out with no plan is not following good practices. I will write up my opinion on best practices soon. Though it is nothing too crazy.
What's a good way to test basic machine learning code?

ALE is doubtless the Atari Learning Environment. I've never seen an 'ALE' in DRL discussions which refers to something else.

1Kenny3moThanks! Of course you would know :)
[AN #142]: The quest to understand a network well enough to reimplement it by hand

It is quite possible that CLIP “knows” that the image contains a Granny Smith apple with a piece of paper saying “iPod”, but when asked to complete the caption with a single class from the ImageNet classes, it ends up choosing “iPod” instead of “Granny Smith”. I’d caution against saying things like “CLIP thinks it is looking at an iPod”; this seems like too strong a claim given the evidence that we have right now.

Yes, it's already been solved. These are 'attacks' only in the most generous interpretation possible (since it does know the difference), and ... (read more)

6rohinmshah3moAh excellent, thanks for the links. I'll send the Twitter thread in the next newsletter with the following summary:

Harper's has a new article on meditation which delves into some of these issues. It doesn't mention PNSE or Martin by name, but some of the mentioned results parallel them, at least:

...Compared with an eight-person control group, the subjects who meditated for more than thirty minutes per day experienced shallower sleep and woke up more often during the night. The more participants reported meditating, the worse their sleep became... A 2014 study from Carnegie Mellon University subjected two groups of participants to an interview with openly hostile eval

... (read more)
2Davidmanheim3moFor betting markets, the market maker may need to manage the odds differently, and for prediction markets, it's because otherwise you're paying people in lower brier scores for watching the games, rather than being good predictors beforehand. (The way that time-weighted brier scores work is tricky - you could get it right, but in practice it seems that last minute failures to update are fairly heavily penalized.)
Resolutions to the Challenge of Resolving Forecasts

In such cases, perhaps the rules would be to pick a probability based on the resolution of past games - with the teams tied, it resolves at 50%, and with one team up by 3 runs in the 7th inning, it resolves at whatever percentage of games where a team is up by 3 runs at that point in the game wins.

Sounds like Pascal's problem of the points, where the solution is to provide the expected value of winnings, and not merely allocate all winnings to whichever player has the highest probability of victory. Suppose 1 team has 51% probability of winning - should the... (read more)
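
A short sketch of the problem-of-the-points calculation, assuming independent rounds with a fixed per-round win probability `p` (a simplification; a real baseball game would need a fitted model, and the function names here are hypothetical). The stake is split by each side's probability of eventually winning instead of going entirely to the favorite.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def win_probability(a_needs, b_needs, p=0.5):
    """Probability that A wins, given how many more rounds each side needs."""
    if a_needs == 0:
        return 1.0
    if b_needs == 0:
        return 0.0
    return (p * win_probability(a_needs - 1, b_needs, p)
            + (1 - p) * win_probability(a_needs, b_needs - 1, p))


def fair_split(stake, a_needs, b_needs, p=0.5):
    """Divide the stake in proportion to each side's chance of winning."""
    share_a = stake * win_probability(a_needs, b_needs, p)
    return share_a, stake - share_a


# Classic interrupted game: A needs 1 more round, B needs 2, fair coin.
print(fair_split(100, 1, 2))  # (75.0, 25.0)
```

Under winner-take-all resolution A would receive the full 100 here; the expected-value split pays 75, and a side with only a 51% chance would receive 51 and not everything.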

2Davidmanheim3moYes, that was exactly what I was thinking of, but 1) I didn't remember the name, and 2) I wanted a concrete example relevant to prediction markets. And I agree it's hard to estimate in general, but the problem can still be relevant in many cases - which is why I used my example. In the baseball game, if the market closes before the game begins - we don't have a model as good as the market, but once the game is 7/9th complete, we can do better than the pre-game market prediction.
February 2021 newsletter

No; I've only seen the first season of AoT, if there are armored trains in the rest I am unaware of that. It's actually from someone on either DSL or Naval Gazing, I think, linking to a short history of Zaamurets which is patchy but interesting in its own right.

The average North Korean mathematician

To noodle a bit more about tails coming apart: asymptotically, no matter how large r, the probability of a 'double max' (a country being the top/max on variable A correlated r with variable B also being top/max on B) decreases to 1/n. The decay is actually quite rapid, even with small samples you need r>0.9 to get anywhere.

A concrete example here: you can't get 100%, but let's say we only want a 50% chance of a double-max. And we're considering just a small sample like 192 (roughly the number of countries in the world, depending on how you count). What ... (read more)

The average North Korean mathematician

The tails coming apart is "Nigeria has the best Scrabble players in the world, but the persons with the richest English vocabulary in the world are probably not Nigerian"

No. The tails coming apart here would be "gameplaying of game A correlates with national variable B but the top players of game A are not from the top country on variable B".

I say it's borderline circular because while they aren't the same explanation, they can be made trivially the same depending on how you shuffle your definitions to save the appearances. For example, consider the hyp... (read more)

The average North Korean mathematician

There are many countries besides Nigeria where English is an official language, elite language, or widely taught. And language proficiency apparently has little to do with Scrabble success at pro levels where success depends on memorizing an obsolete dictionary's words (apparently even including not really-real words, to the point where I believe someone won the French Scrabble world championship or something without knowing any French beyond the memorized dictionary words).

Above the Narrative

I assume you're referring to the 'vault' thing WP mentions there as "Recently credited by Alan Sherman"? Then no, Chaum is irrelevant to Satoshi except inasmuch as his Digicash was a negative example to the cryptopunks about the vulnerability of trusted third parties & centralization to government interference & micromanagers (some of whom, like Szabo, worked for him). The vault thing didn't inspire Satoshi because it inspired no one; if it had, it wouldn't need any Alan Sherman to dig it up in 2018. You will not find it cited in the Bitcoin whitep... (read more)

2cousin_it3moHuh, interesting. Indeed it seems post hoc-ed in this case, I should've looked at bitcoin's history more closely before making a confident statement. Thanks!
The average North Korean mathematician

While the greater male variance hypothesis, and tail effects in general, are always interesting, I'm not sure if it's too illuminating here. It is not surprising that there are some weird outliers at the top of the IMO list, 'weird' in the sense of 'outperforming' what you'd expect given some relevant variable like GDP, intellectual freedom, HDI index, national IQ, or whatever. That's simply what it means for the correlation between IMO scores & that variable to be <1. If the IMO list was an exact rank-order correspondence, then the correlation woul... (read more)

2gwern3moTo noodle a bit more about tails coming apart: asymptotically, no matter how large r, the probability of a 'double max' (a country being the top/max on variable A correlated r with variable B also being top/max on B) decreases to 1/n. The decay is actually quite rapid, even with small samples you need r>0.9 to get anywhere. A concrete example here: you can't get 100%, but let's say we only want a 50% chance of a double-max. And we're considering just a small sample like 192 (roughly the number of countries in the world, depending on how you count). What sort of r do we need? We turn out to need r ~ 0.93! There are not many correlations like that in the social sciences (not even when you are taking multiple measurements of the same construct).
--------------------------------------------------------------------------------
Some R code for Monte Carlo estimates of the necessary r for n = 1-193 & top-p = 50%:

    p_max_bivariate_montecarlo <- function(n,r,iters=60000) {
        library(MASS) # for 'mvrnorm'
        mean(replicate(iters, {
            sample <- mvrnorm(n, mu=c(0,0), Sigma=matrix(c(1, r, r, 1), nrow=2))
            which.max(sample[,1])==which.max(sample[,2])
        }))
    }
    find_r_for_topp <- function(n, topp_target=0.5) {
        r_solver <- function(r) {
            topp <- p_max_bivariate_montecarlo(n, r)
            return(abs(topp_target - topp))
        }
        optim(0.925, r_solver, method="Brent", lower=0, upper=1)$par
    }
    library(parallel); library(plyr) # parallelism
    rs <- ldply(mclapply(2:193, find_r_for_topp))$V1
    # c(0.0204794413, 0.4175067131, 0.5690806174, 0.6098019663, 0.6994770020, 0.7302042200, 0.7517989571, 0.7652371794, 0.7824824776, 0.7928299227, 0.7911903664, 0.8068905240, 0.8177673342, 0.8260679686, 0.8301939461, 0.8258472869, 0.8314810573, 0.8457114147, 0.8477265340, 0.8599239760, 0.8541010795, 0.8539345369, 0.8578597015, 0.8581440013, 0.8584451493, 0.8612079626, 0.8640382310, 0
3ChristianKl3moIt might not just be specialization on Scrabble. English is the official language in Nigeria. I think it's plausible that Nigerian elite English education focuses more strongly on learning a lot of words than US English education.
6Elmer of Malmesbury3moAs far as I understand, the tails coming apart and the moment attribution are two different, superimposed problems. The tails coming apart is "Nigeria has the best Scrabble players in the world, but the persons with the richest English vocabulary in the world are probably not Nigerian". The moment attribution is "the best Scrabble players in the world are Nigerian, but Nigerians are probably not the best Scrabble players in the world". In the first case, we are talking about the distribution of country scores for two correlated variables, in the second we are talking about the distribution of individuals within a country for a single variable. Also, thank you for bringing up Nigerian Scrabble, that would have made a somehow funnier example than NK's math olympiads.
1Oskar Mathiasen4moThe comparison to chess is maybe more accurate than you think. See stuff like: "Beginnings: The first IMO was held in Romania in 1959. It was initially founded for eastern European member countries of the Warsaw Pact, under the USSR bloc of influence, but later other countries participated as well." (source) Also, classic geometry is (to my knowledge) taught more generally in many eastern European countries (and makes up 1/6-1/3 of the IMO). Also, the note about incentives being larger in North Korea applies, to a lesser degree, to much of eastern Europe, where qualifying for the IMO is seemingly enough to get access to any university (source: Saint Petersburg University gave an open offer at Baltic Way (a regional math competition), and I know someone who used something like that to get into Moscow University). (Romania, Serbia, Poland, Russia, Ukraine, and Hungary are the eastern European countries with consistently good results.)
Fun with +12 OOMs of Compute

One man's a priori is another man's a posteriori, one might say; there are many places one can acquire informative priors... Learning 'tacit knowledge' can be so fast as to look instantaneous. An example here would be OA's Dactyl hand: it learns robotic hand manipulation in silico, using merely a model simulating physics, with a lot of randomization of settings to teach it to adapt on the fly, to whatever new model it finds itself in. This enables it to, without ever once training on an actual robot hand (only simulated ones), successfully run on an actual... (read more)

Takeaways from one year of lockdown

I was recently tracking down a reference in the Sequences and found that the author was so afraid of COVID that he failed to seek medical care for appendicitis and died of sepsis.

Wow! Who was that?

and the faint but pretty smell of vanilla.

I think you mean "...and a presumption that once our eyes watered." (As time passes, this is increasingly how I feel about my grandmother dying of coronavirus.)

Justin Corwin (obituary, LW account), quoted in this post. I'm sorry about your grandmother. And about Justin, and that death exists in general :(

Mentorship, Management, and Mysterious Old Wizards

Michael Nielsen calls something similar "volitional philanthropy", with some examples.

What happens to variance as neural network training is scaled? What does it imply about "lottery tickets"?

'Variance' is used in an amusing number of ways in these discussions. You use 'variance' in one sense (the bias-variance tradeoff), but "Explaining Neural Scaling Laws", Bahri et al 2021 talks about a different kind of variance limit in scaling, while "Learning Curve Theory", Hutter 2021's toy model provides statements on yet other kinds of variance about scaling curves themselves (and I think you could easily dig up a paper from the neural tangent kernel people about scaling approximating infinite width models which only need to make infinitesimally sma... (read more)

Meetup Notes: Ole Peters on ergodicity

So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters' general idea, but isn't just log-wealth maximization?

Yes. As I've pointed out before, a lot of these problems go away if you simply solve the actual problem instead of a pseudo-problem. Decision theory, and Bayesian decision theory, has no problem with multi-step processes, like POMDPs/MDPs - or at least, I have yet to see anyone explain what, if anything, of Peters/Taleb's 'criticisms' of expected-value goes away i... (read more)

4abramdemski4moI like the "Bellman did it better" retort ;p FWIW, I remain pretty firmly in the expected-utility camp; but I'm quite interested in looking for cracks around the edges, and exploring possibilities. I agree that there's no inherent decision-theory issue with multi-step problems (except for the intricacies of tiling issues!). However, the behavior of Bayesian agents with utility linear in money, on the Kelly-betting-style iterated investment game, for a high number of iterations, seems viscerally wrong. I can respect treating it as a decision-theoretic counterexample, and looking for decision theories which don't "make that mistake". I'm interested in seeing what the proposals look like.
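To make the "viscerally wrong" behavior concrete, here is a small Python sketch under assumed parameters, none of which come from the thread: a repeated even-money bet won with probability p = 0.6, played for 100 rounds from a starting wealth of 1. In this toy game, an agent whose utility is linear in money maximizes expected final wealth by staking everything each round, while the Kelly bettor stakes the fraction 2p - 1. All function names are hypothetical.

```python
import math


def kelly_fraction(p):
    """Kelly stake for an even-money bet won with probability p."""
    return max(0.0, 2.0 * p - 1.0)


def expected_wealth(f, p, rounds):
    """E[final wealth] when staking fraction f every round, starting at 1.
    Each round multiplies wealth by (1+f) w.p. p and (1-f) w.p. 1-p,
    so the per-round expected factor is 1 + f*(2p - 1)."""
    return (1.0 + f * (2.0 * p - 1.0)) ** rounds


def log_growth(f, p):
    """Expected log-growth per round; assumes 0 < p < 1."""
    if f >= 1.0:
        return -math.inf  # a single loss wipes out all wealth
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)


def all_in_survival(p, rounds):
    """Probability the all-in bettor is not ruined: must win every round."""
    return p ** rounds


if __name__ == "__main__":
    p, rounds = 0.6, 100  # illustrative assumptions
    for name, f in (("all-in", 1.0), ("Kelly", kelly_fraction(p))):
        print(name, f, expected_wealth(f, p, rounds), log_growth(f, p))
    print("P(all-in bettor survives):", all_in_survival(p, rounds))
```

The all-in policy has the larger expected wealth, yet that expectation sits entirely on the vanishingly unlikely path of winning every round; the Kelly policy has a smaller expectation but positive expected log-growth. That gap is the tension the comment points at, and the sketch takes no position on which decision theory resolves it.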