ErickBall

I am currently a nuclear engineer with a focus on nuclear plant safety and probabilistic risk assessment. I am also an aspiring EA, interested in X-risk mitigation and the intersection of science and policy.

Comments

Covid 9/10: Vitamin D

I would love to have a link to send my parents to convince them to take Vitamin D as a prophylactic. The one RCT, as noted above, has various issues that make it not ideal for that purpose. Does anyone know of an article (by some sort of expert) that makes a good case for supplementation?

How hard would it be to change GPT-3 in a way that allows audio?

Since the same transformer architecture works on images with basically no modification, I suspect it would do well on audio prediction too. Finding a really broad representative dataset for speech might be difficult, but I guess audiobooks are a good start. The context window might cause problems: 2000 byte pairs of text is only about 4000 bytes, but the same content takes up far more space in audio form. But I bet it would be able to mimic voices pretty well even with a small window. (edit: Actually probably not, see Gwern's answer.)
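A rough back-of-envelope on that mismatch, assuming 16 kHz 8-bit audio, ~0.75 words per BPE token, and ~150 spoken words per minute (all assumed figures, not from any source):

```python
# Back-of-envelope: how much raw audio corresponds to GPT-3's text window?
# Assumptions (mine): 16 kHz 8-bit audio, ~0.75 words per BPE token,
# ~150 spoken words per minute.
context_tokens = 2048                 # GPT-3 window, in byte-pair tokens
words = context_tokens * 0.75         # ~1536 words of text
speech_seconds = words / 150 * 60     # ~10 minutes to speak them
audio_bytes = speech_seconds * 16000  # 1 byte per sample at 16 kHz

print(f"{words:.0f} words ~ {speech_seconds/60:.1f} min of speech")
print(f"~ {audio_bytes/1e6:.1f} MB of audio vs ~{context_tokens*2/1000:.0f} KB of text")
```

So a window that covers ten minutes of text would cover only a second or two of raw audio at the same byte budget.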

If your question is whether the trained GPT-3 model could be modified to work with audio, I suspect not. In principle there are layers of abstraction that a transformer should be able to take advantage of, so that word prediction is mostly uncoupled from audio processing, but there's not a perfect separation, and we wouldn't know how to interface them. Maybe you could train a separate transformer model that just transcribes audio into text, and stitch them together that way, but there's not much reason to think it would be a big improvement over existing speech recognition systems.
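To illustrate the stitching idea (purely a sketch; both models here are hypothetical stand-ins, not real APIs):

```python
def transcriber(audio):
    """Stand-in for a separately trained audio-to-text transformer."""
    return "the quick brown fox"                # pretend transcription

def text_model(prompt):
    """Stand-in for GPT-3's text-in, text-out interface."""
    return prompt + " jumps over the lazy dog"  # pretend continuation

def audio_autocomplete(audio):
    # The only interface between the two models is text, so voice,
    # tone, and timing are all lost at this boundary.
    return text_model(transcriber(audio))

print(audio_autocomplete(b"\x00" * 16000))      # one second of silence at 16 kHz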

The Fusion Power Generator Scenario

Weapons grade is kind of a nebulous term. In the broadest sense it means anything isotopically pure enough to make a working bomb, and in that sense Little Boy obviously qualifies. However, standard enrichment for later uranium bombs is typically around 90%, and according to Wikipedia, Little Boy was around 80% average enrichment.

It is well known that once you have weapons-grade fissile material, building a crude bomb requires little more than a machine shop. Isotopic enrichment is historically slow and expensive (and hard to hide), but there could certainly be tricks not yet widely known...

Noise on the Channel

I'm a big fan of crowd noises for improving concentration when you need to drown out other voices, especially a TV. Much more effective than other forms of white noise.

List of public predictions of what GPT-X can or can't do?

I think your formatting with the semicolons and the equals sign has confused the transformer. All the strange words, plurals, and weird possessives may also be confusing. On TTT, if I use common words and switch to colon and linebreak as the separators, it at least picks up that the pattern is gibberish: words.

For example:

kobo: book
ntthrgeS: Strength
rachi: chair
sviion: vision
drao: road
ntiket: kitten
dewdngi: wedding
lsptaah: asphalt
veon: oven
htoetsasu: southeast
rdeecno: encoder
lsbaap1: phonetics
nekmet: chic-lookin'
zhafut: crinkly
lvtea: question mark
cnhaetn: decorated
gelsek: ribbon
odrcaa: ribbon
nepci: ball
plel: half
cls: winged
redoz: brightness
star: town
moriub:
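If anyone wants to reproduce this, a minimal Python sketch of that prompt format (pair list abridged):

```python
# Build a "gibberish: word" prompt using colon and linebreak separators.
pairs = [("kobo", "book"), ("ntthrgeS", "Strength"), ("rachi", "chair"),
         ("sviion", "vision"), ("drao", "road"), ("ntiket", "kitten")]

prompt = "\n".join(f"{gibberish}: {word}" for gibberish, word in pairs)
prompt += "\nmoriub:"  # leave the final answer blank for the model to complete
print(prompt)
```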

Everyday Lessons from High-Dimensional Optimization

It seems like the situation with bridges is roughly analogous to neural networks: the cost has nothing to do with how much you change the design (distance) but instead is proportional to how many times you change the design. Evaluating any change, big or small, requires building a bridge (or more likely, simulating a bridge). So you can't just take a tiny step in each of n directions, because it would still have n times the cost of taking a step in one direction. E. coli is actually pretty unusual in that the evaluation is nearly free, but the change in position is expensive.
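A toy sketch of that cost asymmetry (the quadratic cost function and the two-point random probe are my stand-ins, not anything from the post):

```python
import numpy as np

def bridge_cost(design):
    """Stand-in objective: each call is one 'bridge' built or simulated."""
    return np.sum((design - 1.0) ** 2)

n = 1000
design = np.zeros(n)

# Probing each of the n coordinate directions costs n+1 evaluations...
evals_coordinate = n + 1

# ...while a single random-direction probe costs 2, regardless of n
# (but yields far less information per step).
direction = np.random.randn(n)
delta = 1e-3
slope = (bridge_cost(design + delta * direction)
         - bridge_cost(design - delta * direction)) / (2 * delta)

print(f"estimated slope along the random direction: {slope:.2f}")
print(f"bridges built: {evals_coordinate} (coordinate-wise) vs 2 (one random direction)")
```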

Sparsity and interpretability?

I like this visualization tool. There are some very interesting things going on here when you look into the details of the network in the second-to-last MNIST figure. One is that it seems to mostly identify each digit by ruling out the others. For instance, the first two dot-product boxes (on the lower left) could be described as "not-a-0" detectors, and will give a positive result if they detect pixels in the center, near the corners, or at the extreme edges. The next two boxes could be loosely called "not-a-9" detectors (though they also contribute to the 0 and 4 classifications) and the three after that are "not-a-4" detectors. (The use of ReLU makes this functionally different from having a "4" detector with a negative weight.)
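To unpack that parenthetical, a toy illustration with made-up numbers, where pre is the "not-a-4" unit's pre-activation and -pre stands in for a hypothetical "4" unit's:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Made-up numbers: after ReLU, a "not-a-4" unit with a positive weight
# is NOT equivalent to a "4" unit with a negative weight.
for label, pre in [("input is a 4", -3.0), ("input is not a 4", +3.0)]:
    not4_contrib = +1.0 * relu(pre)   # fires only on non-4s, adds evidence
    four_contrib = -1.0 * relu(-pre)  # fires only on 4s, subtracts evidence
    print(f"{label}: not-4 path {not4_contrib:+.0f}, neg-weight 4 path {four_contrib:+.0f}")
```

With the ReLU in place, the "not-a-4" path can only add evidence on non-4s and stays silent on 4s, while the negative-weight path can only subtract evidence on 4s.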

Now, take a look at the two boxes that contribute to output 1 (I would call these "not-a-1" detectors). If you select the "7" input and see how those two boxes respond to it, they both react pretty strongly to the very bottom of the 7 (the fact that it's near the edge) and that's what allows the network to distinguish a 7 from a 1. Intuitively, this seems like a shared feature--so why is the network so sure that anything near the bottom cannot be part of a 1?

It looks to me like it's taking advantage of the way the MNIST images are preprocessed, with each digit's center-of-mass translated to the center of the image. Because the 7 has more of its mass near the top, its lower extremity can reach farther from the center. The 1, on the other hand, is not top-heavy or bottom-heavy, so it won't be translated by any significant amount in preprocessing and its extremities can't get near the edges.
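To make that preprocessing concrete, a minimal numpy/scipy sketch (the toy top-heavy "7" glyph is mine; as I understand it, MNIST's actual pipeline also size-normalizes before centering):

```python
import numpy as np
from scipy import ndimage

def center_by_mass(img):
    """Translate a glyph so its center of mass sits at the image center."""
    cy, cx = ndimage.center_of_mass(img)
    dy = (img.shape[0] - 1) / 2 - cy
    dx = (img.shape[1] - 1) / 2 - cx
    return ndimage.shift(img, (dy, dx), order=0)

# Toy top-heavy "7": heavy bar up top, thin stroke descending.
seven = np.zeros((28, 28))
seven[4, 8:20] = 1.0   # top bar
seven[5:22, 14] = 1.0  # descending stroke

centered = center_by_mass(seven)
rows = np.nonzero(centered.sum(axis=1))[0]
print(rows.min(), rows.max())  # after centering, the tail reaches near the bottom edge
```

Because the glyph's mass is concentrated at the top, centering its center of mass pushes the whole thing down, so the stroke's tail lands much closer to the bottom edge than a 1's ever would.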

The same thing happens with the "not-a-3" detector box when you select input 2. The "not-a-3" detector triggers quite strongly because of the tail that stretches out to the right. That area could never be occupied by a 3, because the 3 has most of its pixel mass near its right edge and will be translated left to get its center of mass centered in the image. The "7" detector (an exception to the pattern of ruling digits out) mostly identifies a 7 by the fact that it does not have any pixels near the top of the image (and to a lesser extent, does not have pixels in the lower-right corner).

What does this pattern tell us? First, that a different preprocessing technique (centering a bounding box in the image, for instance, instead of the digit's center of mass) would require a very different strategy. I don't know offhand what it would look like--maybe there's some trick for making the problems equivalent, maybe not. Second, that it can succeed without noticing most of what humans would consider the key features of these digits. For the most part it doesn't need to know the difference between straight lines and curved lines, or whether the parts connect the way they're supposed to, or even whether lines are horizontal or vertical. It can use simple cues like how far each digit extends in different directions from its center of mass. Maybe with so few layers (and no convolution) it has to use those simple cues.

As far as interpretability goes, this approach seems difficult to generalize to non-visual data, since humans won't intuitively grasp what's going on as easily. But it certainly seems worthwhile to explore ideas for how it could work.

Open & Welcome Thread—May 2020

I like LearnObit so far. We should talk sometime about possible improvements to the interface. Are you familiar with quantum.country? Similar goal, different methods, possible synergy. They demo what is (in my opinion) a very effective teaching technique, but provide no good method for people to create new material. I think with some minor tweaks LearnObit might be able to fill that gap.

Restricted Diet and Longevity, does eating pattern matter?

Take a look at the Fasting-Mimicking Diet, which has some decent evidence going for it. It's a 5-day period of low calorie consumption with restricted carb and protein intake, repeated every few months.

Some people actually think the benefits of caloric restriction (to the extent there are any benefits in humans beyond just avoiding excess body fat) may result from incidental intermittent fasting. I'm no expert, but my fairly vague understanding is that the re-feeding period after a fast promotes some kind of cellular repair process that doesn't occur if you're continuously well-fed. I guess people who restrict calories overall would generally get little doses of this every once in a while as their food intake fluctuates by chance.

Covid-19: My Current Model

Thank you for the dose of empiricism. However, I see that the abstract says they found "little geographic variation in transmissibility" and do not draw any specific conclusions about heterogeneity in individuals (which obviously must exist to some extent).

They suggest that the R0 of the pandemic flu increased from one wave to the next, but there's considerable overlap in their confidence intervals so it's not totally clear that's what happened. Their waves are also a full year each, so some loss of immunity seems plausible. I wonder, too, if heterogeneity among individuals is more extreme when most people are taking precautions (as they are now).
