
gwern's Comments

Under what circumstances is "don't look at existing research" good advice?

Since you mention physics, it's worth noting that Feynman was a big proponent of this approach in physics, and seemed to have multiple reasons for it.

Minicamps on Rationality and Awesomeness: May 11-13, June 22-24, and July 21-28

If you have relatively few choices and properties are correlated (as of course they are), I'm not sure how much it matters. I did a simulation of this for embryo selection with n=10, and partially randomizing the utility weights made little difference.
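For intuition, here's a minimal sketch of that kind of check (the trait count, correlation, and weight noise are illustrative toy numbers, not the settings of my original simulation): draw n=10 candidates with correlated traits, score them under the true utility weights and under partially randomized ones, and see how often the top pick changes and how much utility is lost when it does.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, trials = 5, 10, 10_000                    # 5 correlated traits, 10 embryos
cov = np.full((k, k), 0.5) + 0.5 * np.eye(k)    # pairwise r = 0.5 between traits
w = np.ones(k)                                  # 'true' utility weights

same_pick, util_lost = 0, []
for _ in range(trials):
    X = rng.multivariate_normal(np.zeros(k), cov, size=n)
    w_noisy = w * rng.uniform(0.5, 1.5, k)      # partially randomized weights
    best_true = (X @ w).argmax()
    best_noisy = (X @ w_noisy).argmax()
    same_pick += best_true == best_noisy
    util_lost.append((X @ w).max() - (X @ w)[best_noisy])

# the regret stays small when the traits are correlated
print(same_pick / trials, np.mean(util_lost))
```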

Planned Power Outages

(Quite a lot is public outside Google, I've found. It's not necessarily easy to find, but whenever I talk to Googlers or visit, I learn less that's new than I expected. Only a few things I've been told genuinely surprised me, and honestly, I had half-suspected even those. Google's transparency is considerably underrated.)

Why the tails come apart

Heh. I've sometimes thought it'd be nice to have a copy of Eureqa or one of the other symbolic-regression tools, to feed the Monte Carlo results into and see if I could deduce any exact formula given their hints. I don't need exact formulas often, but it's nice to have them. I've noticed people can do apparently magical things with Mathematica in this vein. All proprietary AFAIK, though.
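A rough sketch of that workflow with an open-source symbolic-regression library (PySR here, purely as a stand-in for Eureqa; the `monte_carlo` function is hypothetical, and the operators and settings are illustrative): feed the simulation estimates in as (inputs, output) pairs and let it search for a closed form.

```python
import numpy as np
from pysr import PySRRegressor

# X: the (r, n) grid; y: Monte Carlo estimates of the quantity of interest.
X = np.array([[r, n] for r in np.linspace(0, 0.95, 20) for n in (10, 100, 1000)])
y = monte_carlo(X)   # hypothetical: your simulation's estimate for each (r, n)

model = PySRRegressor(
    niterations=100,
    binary_operators=["+", "-", "*", "/", "^"],
    unary_operators=["log", "sqrt", "exp"],
)
model.fit(X, y)
print(model.sympy())   # best candidate closed-form expression found
```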

Why the tails come apart

You can simulate it out easily, yeah, but the exact answer seems more elusive. I asked on CrossValidated whether anyone knew the formula for the 'probability of the maximum on both variables given r and n', since it seems like something that order statistics researchers would've solved long ago because it's interesting and relevant to contests/competitions/searches/screening, but no one's given an answer yet.
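The simulation really is just a few lines; a minimal sketch (bivariate normal with correlation r, checking how often the same draw is the maximum on both variables):

```python
import numpy as np

def p_same_max(r, n, trials=20_000, seed=0):
    """Monte Carlo estimate of P(the same item is the maximum on both
    variables) for n i.i.d. draws from a bivariate normal with correlation r."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, r], [r, 1.0]]
    # shape (trials, n, 2): `trials` independent samples of n items
    draws = rng.multivariate_normal([0.0, 0.0], cov, size=(trials, n))
    same = draws[..., 0].argmax(axis=1) == draws[..., 1].argmax(axis=1)
    return same.mean()

for r in (0.0, 0.5, 0.8, 0.95):
    print(r, p_same_max(r, n=100))   # r=0 should give ~1/n = 0.01; r=1 gives 1
```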

Bíos brakhús

PDFs support hyperlinks: they can define anchors at arbitrary internal points for a hyperlink to target, and they can hyperlink out. You can even specify a target page in a PDF which doesn't define any usable anchors (which is dead useful and I use it all the time in references): e.g. https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/pdf_open_parameters.pdf#page=5

So I guess the issue here is having a tool which parses and edits PDFs to insert hyperlinks. That's hard. Even if you solve the lookup problem by going through something like Semantic Scholar (the way I use https://ricon.dev/ on gwern.net for reverse citation search), PDFs aren't made for this: when you look at a bit of text which is the name of a book or paper, it may not even be text, it may just be an image... Plus, your links will die. You shouldn't trust any of those sites to stay up long-term at the exact URLs they are at.
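When the reference does occur as real selectable text, the annotation-insertion half is at least tractable; a minimal sketch with PyMuPDF (the title string and resolved URL here are placeholders, and real PDFs will often defeat the text search for exactly the reasons above):

```python
import fitz  # PyMuPDF

doc = fitz.open("paper.pdf")
title = "The Design of Everyday Things"          # placeholder reference title
url = "https://example.com/resolved-citation"    # placeholder resolved link

for page in doc:
    for rect in page.search_for(title):          # fails if the 'text' is an image
        page.insert_link({"kind": fitz.LINK_URI, "from": rect, "uri": url})

doc.save("paper-linked.pdf")
```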

Is there a website for tracking fads?

Calling something a 'fad' has many of the same problems as calling something a 'bubble': it's an invitation to selective reasoning. As Sumner likes to point out, most of the things which get called a 'bubble' never turn out to be one; the label was just an insult, backed by cherrypicked examples and flexible reasoning (think of all the people who called Bitcoin a bubble when it collapsed to a price far higher than the price at which they first called it a bubble).

I think you could get something more useful from a more neutral formulation, like tracking specific cultural artifacts. Two recent relevant papers come to mind: https://www.nature.com/articles/s41467-019-09311-w and http://barabasi.com/f/995.pdf . You could do a post hoc analysis and operationalize 'fad' as anything which rose and fell with a sufficiently steep average slope. (Obviously, anything which rises rapidly but then never decays, or decays only slowly, doesn't match what anyone would think of as a 'fad'; nor does anything which rises slowly and decays slowly, etc.)
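A crude sketch of that operationalization (the slope threshold is arbitrary): given a popularity time series, call it a 'fad' only if the average slope up to the peak and the average slope down from it are both steep relative to the peak's height.

```python
import numpy as np

def is_fad(series, min_slope=0.05):
    """Label a popularity time series a 'fad' if it rose to its peak and
    fell from it with sufficiently steep average slopes.
    `min_slope` is an arbitrary threshold in peak-heights per time step."""
    y = np.asarray(series, dtype=float)
    peak = int(y.argmax())
    if peak == 0 or peak == len(y) - 1:
        return False                    # never rose, or never fell
    scale = max(y[peak] - y.min(), 1e-9)
    rise = (y[peak] - y[0]) / peak / scale
    fall = (y[peak] - y[-1]) / (len(y) - 1 - peak) / scale
    return rise >= min_slope and fall >= min_slope

print(is_fad([1, 5, 20, 60, 90, 40, 10, 3]))   # True: sharp rise and fall
print(is_fad([1, 3, 8, 20, 45, 60, 70, 75]))   # False: rises and never decays
```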

Reading list: Starting links and books on studying ontology and causality

I recently read gwern's excellent "Why Correlation Usually ≠ Causation" notes, and, like any good reading, felt a profound sense of existential terror that caused me to write up a few half-formed thoughts on it. How exciting!

It may not be directly related, but I'd like to highlight that I just added a new section contextualizing the essay as a whole and explaining how it connects to the rest of my beliefs about correlation & causality: https://www.gwern.net/Causality#overview-the-current-situation

As for the broader question of where our ontologies come from: I'd take a pragmatic point of view and point out that they must have evolved like the rest of us, because thinking is for acting.

Is there a website for tracking fads?

How would you define 'fad' in an objective and non-pejorative way?

What are some non-purely-sampling ways to do deep RL?

You mean stuff like model-predictive control and planning? You can use backprop to do gradient ascent over a sequence of actions if you have a differentiable environment and/or reward model. This also has a lot of applications to image CNNs: reversing GANs to encode an image for editing, optimizing an input to maximize a particular class (like maximally 'dog' or 'NSFW' images), etc. I cover some of the uses and history in https://www.gwern.net/Faces#reversing-stylegan-to-control-modify-images
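A minimal sketch of the action-sequence case (the tiny MLP here is a made-up stand-in for a learned differentiable reward model): treat the action sequence itself as the optimizable parameters, freeze the model, and run gradient ascent on the predicted reward.

```python
import torch

# Toy differentiable reward model standing in for a learned surrogate:
# maps a (T, d) action sequence to a scalar predicted reward.
torch.manual_seed(0)
reward_model = torch.nn.Sequential(
    torch.nn.Flatten(start_dim=0),
    torch.nn.Linear(10 * 4, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1))
for p in reward_model.parameters():
    p.requires_grad_(False)            # the model is fixed; only the actions move

actions = torch.randn(10, 4, requires_grad=True)  # T=10 steps, d=4 action dims
opt = torch.optim.Adam([actions], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = -reward_model(actions).sum()  # negate: gradient *ascent* on reward
    loss.backward()
    opt.step()
```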

My most recent suggestion in this vein was about OA/Christiano's preference learning. From my notes:

It's interesting how all my preference learning PPO runs for both music & poetry keep diverging into highly repetitive adversarial instances like a GAN mode collapse.

While waiting for the PPO, I began to wonder: why couldn't one remove the blackbox optimizers entirely and optimize directly? It seems like a waste to go to all this effort to build a differentiable surrogate (reward) model of the environment (the human), and then treat it like just another blackbox. But it's not, that's the whole point of preference learning! Since GPT-2 is differentiable, it ought to be possible to backprop through it to do planning and optimization like MPC.

For example, one could generate high-scoring music pieces by generating a random sequence, text-embedding it into the vector for the reward model, and then doing gradient ascent on the vector. No PPO cluster required. Ratings can be done pairwise on the various optimized sequences (random pairs of high-scoring sequences, although before/after comparisons might be more informative) and then the reward model trained.
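The rating-and-retraining half of that loop is the standard pairwise preference loss; a minimal sketch (the linear head and random embeddings are made up, but the Bradley-Terry form is the one used in Christiano et al.-style preference learning):

```python
import torch
import torch.nn.functional as F

# Hypothetical reward model over fixed-size sequence embeddings (d=128).
reward_model = torch.nn.Linear(128, 1)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# emb_win / emb_lose: embeddings of the preferred / rejected sequence in each
# human-rated pair (a made-up batch of 32 comparisons).
emb_win, emb_lose = torch.randn(32, 128), torch.randn(32, 128)

# Bradley-Terry pairwise preference loss: maximize log sigmoid(r_win - r_lose).
loss = -F.logsigmoid(reward_model(emb_win) - reward_model(emb_lose)).mean()
opt.zero_grad(); loss.backward(); opt.step()
```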

If gradient ascent is too slow for routine use, then one can just distill the reward model by training the GPT-2 on successively better corpuses in the usual fast likelihood-training (imitation-learning) way, for a sort of 'expert iteration': generate improved versions of a corpus by generating & selecting new datapoints above a threshold (perhaps using a corpus of human datapoints as starting points), and train the model to generate that improved corpus.
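A toy, self-contained version of that loop (a 1-D Gaussian 'generator' refit to the improved corpus stands in for likelihood-training GPT-2, and the score function is a made-up stand-in for the learned reward model):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    return -abs(x - 3.0)   # made-up stand-in for the learned reward model

# Toy 'generator': a Gaussian whose parameters get refit ('distilled') to the
# improved corpus each round -- standing in for finetuning GPT-2 on it.
mu, sigma = 0.0, 1.0
corpus = list(rng.normal(mu, sigma, 100))        # stand-in human starting corpus
for _ in range(10):
    samples = rng.normal(mu, sigma, 1000)
    threshold = np.percentile([score(x) for x in corpus], 90)
    corpus += [x for x in samples if score(x) > threshold]
    mu, sigma = np.mean(corpus), np.std(corpus)  # 'finetune' = refit to corpus
print(mu)   # drifts toward the reward optimum at 3.0
```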

This could be a lot faster since it exploits the whitebox nature of a learned reward model instead of treating it as a high-variance blackbox.

If nothing else, I think it would help with the adversarial instances. Part of the problem with them is that each PPO run seems to collapse into a single specific adversarial instance. I do a bunch of ratings which penalize that instance and fix it, but then I wait another day or two, and that run collapses into a new single adversarial instance. The reward model seems to gradually get better and the adversarial instances seem to gradually increase in complexity, but the process is very slow and serial. The gradient ascent approach may also run into the problem that it will find adversarial instances for the reward model, but at least it will do so in parallel: if I can take a minibatch of n=11 random initial sequences and do gradient ascent on each in parallel against the GPT-2-117M reward model, they will probably find several different adversarial instances at once, while the PPO would only find the one it collapses on. So one would get many more useful adversarial instances to rate per run.
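The parallel-ascent point is easy to illustrate on a toy reward surface with many sharp maxima (standing in for a reward model with many adversarial instances): a batch of n=11 random starts, all ascended at once, scatters across several distinct optima instead of collapsing onto one.

```python
import torch

# Toy frozen reward surface with many local maxima, standing in for a learned
# reward model that has multiple adversarial instances.
def reward(x):                       # x: (batch, 2)
    return torch.sin(3 * x).prod(dim=1)

torch.manual_seed(0)
x = torch.randn(11, 2, requires_grad=True)   # 11 random starting sequences
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    (-reward(x).sum()).backward()    # ascend all 11 starts in parallel
    opt.step()
print(x.detach())   # different inits land on several distinct maxima
```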
