Streaming Science on Twitch

A Ray

Recently I was watching a livestream of a poker professional. I was surprised by, and interested in, how the play wasn’t purely gut calls and wasn’t purely technical decisions, but a blend of both (with some other stochasticity thrown in).

I’ve been thinking about how we get more good scientists, especially scientists earlier in their careers.

I think it should be possible to stream science — the actual practice of it.

I think this would be disproportionately useful for younger/earlier-career people, who I expect would find Twitch streams more appealing, and who lack the access to advisement/mentorship that often comes later.

This probably only works first with sciences (like mine — I’m biased here) that can happen mostly/exclusively on computers. Software sciences, data sciences, and machine learning sciences seem like good candidates for this.

A bunch of it would probably be boring stuff. Debugging experiments, sorting/processing data, formatting and reformatting plots.

Also, a bunch of the good stuff probably happens internally in ways that wouldn’t be stream-able. There’s a lot of processing in my head that I don’t have access to, and I assume this is true for a lot of people who work on science.

There are also probably tasks or modes of work for which recording/streaming/broadcasting would be distracting. I assume this would also be true of video games, so maybe this effect is not that bad.

The stream could also make science better! It’s possible that (if people watch it and interact live) someone in the audience spots a bug or issue that would have been missed, or proposes a better method of doing something.

A big limitation here is that probably a lot of great science is not share-able (due to company secrets or academic results that cannot be shared before publication to avoid scoops, etc). This would necessitate working on projects of secondary quality/importance, and using a totally different code base and set of tools.

As I’m writing this, I notice that “science” feels like a bit too small of a category. I would expect policy research, governance, and other sorts of topics would also fit here. I expect these (and many others) are also fields where young/early-career people have a hard time finding advisement/mentorship.

What could be done in this direction soon:

I expect it would be worthwhile to go over and see how some high-quality programming streams have gone. Handmade Hero comes to mind, but I would bet there are a lot of others. Any best practices from programming streams probably carry over.

If there are scientists that stream their work, it’s probably worth going over a few of those to see what works well.

My experience recently is pretty narrow in terms of “technical alignment on language models” but I expect there’s a bunch which would be interesting and share-able. Without going into my research that I can’t share, I could do a bunch of work-relevant things:

Stream reading / analyzing / annotating papers, with narration of thought process/etc. This is probably pretty amenable to interacting with chat/questions.
Building basic research tools and utilities. In my experience, a bunch of research is this, so it’s a fine thing to share, even if it’s not “science-y”. Examples here are tools to visualize evaluations of data, or even just utilities like fast tokenizers.
Debugging/Launching experiments. There’s a run-up before bigger (>1hr) runs where there needs to be a bunch of iteration to make sure the larger run goes correctly. These skills are super useful (and often non-obvious) because they allow for much more rapid iteration and feedback than the default “run the whole experiment and see at the end if there was a failure”.
Analyzing experimental results. After experiments have run, there’s a bunch that needs to be done to gather and explore the data. This is one of my favorite parts of the job, because this is where the first glimpses of confirmation/refutation happen for various hypotheses.
Re-implementations of papers. I think this one is actually pretty boring and not very useful, but is a great step for debugging why things aren’t working. I’m pretty opinionated that most results aren’t real (in that they don’t extrapolate to general utility beyond the narrow domain of the paper) and their utility usually fades in the limit of compute and data.
Building evaluations. One of the hard problems with large language models (and modern machine learning) is that now that we have more general and robust systems, evaluating them becomes more difficult. Relying on existing evaluations, or naive metrics like loss, works up to a point, but developing new evaluations can unlock a bunch of better research.
Scaling laws. More generally, doing quantitative analysis of machine learning results that allows us to propose predictive laws over resources like compute, data, or model size. This is a pretty powerful research technique, in that it goes beyond the “A/B test” paradigm and lets us try to predict what future models might be like.
Analyzing / synthesizing / augmenting datasets. Data is a foundation for machine learning research, so it matters both to understand what’s in your data and to modify it to contain more of what you want the model to learn. Understanding and creating better datasets seems to me to be under-valued in the industry currently.
Test time tricks. “Prompt engineering” is in this category, and it involves what can be done after training is finished in order to better understand the model, or get better results. This is particularly useful to people outside of major institutions, since it takes fewer resources than training the models from scratch.
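
To make the scaling-laws item a bit more concrete, here is a minimal sketch of the kind of thing that could be done on stream: fitting a power law of the form loss = a * compute^b by linear regression in log-log space, then extrapolating to a larger budget. The function names, the functional form, and the numbers are all assumptions for illustration; the data is synthetic and is not a real result.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**b by least squares in log-log space.

    Returns (a, b). Assumes all inputs are strictly positive.
    """
    log_c = np.log(np.asarray(compute, dtype=float))
    log_l = np.log(np.asarray(loss, dtype=float))
    # A straight line in log-log space: log(loss) = b * log(compute) + log(a)
    slope, intercept = np.polyfit(log_c, log_l, deg=1)
    return float(np.exp(intercept)), float(slope)


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at new compute budgets."""
    return a * np.asarray(compute, dtype=float) ** b


# Synthetic, noise-free data for illustration only (hypothetical values).
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = 20.0 * compute ** -0.05

a, b = fit_power_law(compute, loss)
extrapolated = predict_loss(a, b, 1e19)
print(f"a={a:.3f}, b={b:.4f}, predicted loss at 1e19: {float(extrapolated):.4f}")
```

Real measurements are noisy and often need an irreducible-loss term or a different functional form, so this is a starting point for the analysis and not a recipe.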

As I’m writing the list, I think I could go on for a while, so there’s probably a lot to do to start with.
