May 2021 News

May 2021 Gwern.net newsletter with links on AI hardware, diffusion models, optogenetics, brain scanning.

May 2021’s Gwern.net newsletter is now out; previous, April 2021 (archives). This is a collation of links and summary of major changes, overlapping with my Changelog; brought to you by my donors on Patreon.

Note: I will be in Denver 12–13 June 2021 for a conference.

Writings


  1. What is a diffusion model like DDPM? To try to explain it as simply as possible without the math:

    DDPM is a neural net which is trained to fix noise in an image: it takes a noisy image and ‘sharpens’ it to produce a new image. You train it by adding dirt to a normal image and teaching it to turn the dirty version back into the original. As it gets better, it learns what the images all tend to look like, so it can ‘see through’ ever more noise, turning smudged hints of the original image into its best guess. Once it’s done training, what happens if you give it a completely dirty photo, which is pure static noise? Well, it produces a slightly less dirty ‘photo’. And if you do it again? It’s a little cleaner still. Now, what if you do this many times? It has to get cleaner each time. The end result: static noise goes in, and a face pops out! The DDPM has hallucinated a face out of the noise. One little blob of static here turned into a nose, another blob turned into an ear, and it went from there. (A minimal code sketch of this iterative denoising loop follows the list.)

  2. Why do larger animals need so many more neurons to control their bodies, when one would expect hierarchical structure to be efficient? One possibility, from an ANN perspective, is the tradeoff between width & depth (wide vs deep models learn different things): wide shallow nets have low latency, but tend to be parameter-inefficient compared to deeper nets (perhaps because they learn more redundant but parallel representations?). Because larger animals live in the same world as smaller ones and still need to act with reasonable latency on the millisecond-to-second timescale, they are presumably forced towards wider nets, and away from a latency-unconstrained parameter-optimal or FLOPS-optimal architecture & scaling. (A toy width-vs-depth comparison follows the list.)
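
A minimal sketch of the iterative denoising loop from item 1, assuming a trained DDPM stands behind the hypothetical `denoise_step` callable; the real DDPM sampler also re-injects a calibrated amount of noise at every step, which this toy loop omits:

```python
import torch

def sample(denoise_step, num_steps=1000, shape=(1, 3, 64, 64)):
    """Toy sampling loop: start from pure static and repeatedly 'sharpen' it."""
    x = torch.randn(shape)  # the completely dirty 'photo': pure static noise
    for t in reversed(range(num_steps)):
        # each pass produces a slightly less dirty image; after many passes
        # the static has been hallucinated into a plausible face
        x = denoise_step(x, t)
    return x
```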
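
To make the width-vs-depth tradeoff in item 2 concrete, here is a toy PyTorch comparison (sizes are arbitrary illustrations, not a claim about real nervous systems): two MLPs with the same total number of hidden units, where the wide shallow net needs only 3 sequential layers (a rough proxy for latency) but spends roughly 8× the parameters of the deep narrow one.

```python
import torch.nn as nn

def mlp(width, depth, d_in=128, d_out=128):
    """A plain fully-connected stack: `depth` hidden layers of size `width`."""
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

# Both nets have 8,192 hidden units in total, but divide them differently:
wide_shallow = mlp(width=4096, depth=2)   # 3 sequential Linear layers (low latency)
deep_narrow  = mlp(width=256,  depth=32)  # 33 sequential Linear layers (high latency)

print(n_params(wide_shallow))  # ~17.8M parameters
print(n_params(deep_narrow))   # ~2.1M parameters
```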