Buck — LessWrong

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

If we are ever arguing on LessWrong and you feel like it's kind of heated and would go better if we just talked about it verbally, please feel free to contact me and I'll probably be willing to call to discuss briefly.

Out of curiosity about usage, I ctrl-f'd through the Securing Model Weights report to see how they use the word "insider". I found:

"A key goal of SL3 is reducing the risks from insider threats (e.g., company employees)"
- This seems to imply that it's including non-employees
Almost everywhere that the report talks about insider threats, it only mentions employees.

I'd add that a common reason to choose not to act against someone is that many of those factors are combined.

I think situations where it's (e.g.) purely "they have power to hurt you" or "you lack legible evidence" are much rarer than situations where it's an awkward combination of those with other things, and so it's hard to even know whether you should take on the project of acting against someone carefully and well.

People who work on politics often have to deal with adversaries who are openly sneering internet trolls (or similar), and sometimes run across valuable opportunities that require cooperating with them.

When faced with tradeoffs, you should value the ones that mean you still get to make more trades. Never put that on the line.

What about this: you can press button A or button B. If you press button A, you get a million dollars but then have to sit out round two. In button B, you play a game of chess against someone and the winner gets $10. Surely you should press A?

The heuristic you've described is probably good in a lot of situations but it's definitely not universally applicable.

Sure; I think extra speed from practicing it (and e.g. more instantly knowing that 100M is 1e8) is worth it.

This is a great list!

Here's some stuff that isn't in your list that I think comes up often enough that aspiring ML researchers should eventually know it (and most of this is indeed universally known). Everything in this comment is something that I've used multiple times in the last month.

Linear algebra tidbits
- Vector-matrix-vector products
  - Probably einsums more generally
    - And the derivative of an einsum wrt any input
- Matrix multiplication of matrices of shape [A,B] and [B,C] takes 2ABC flops.
- This stuff comes up when doing basic math about the FLOPs of a neural net architecture.
Stuff that I use as concrete simple examples when thinking about ML
- A deep understanding of linear regression, covariance, correlation. (This is useful because it is a simple analogy for fitting a probabilistic model, and it lets you remember a bunch of important facts.)
- Basic facts about (multivariate) Gaussians; Bayesian updates on Gaussians
Variance reduction, importance sampling. Lots of ML algorithms, e.g. value baselining, are basically just variance reduction tricks. Maybe consider the difference between paired and unpaired t-tests as a simple example.
- This is relevant for understanding ML algorithms, for doing basic statistics to understand empirical results, and for designing sample-efficient experiments and algorithms.
Errors go as 1/sqrt(n) so sample sizes need to grow 4x if you want your error bars to shrink 2x
AUROC is the probability that a sample from distribution A will be greater than a sample from distribution B, this is the obvious natural way of comparing distributions over a totally ordered set
Maximum likelihood estimation, MAP estimation, full Bayes
The Boltzmann distribution (aka softmax)

And some stuff I'm personally very glad to know:

The Price equation/the breeder's equation--we're constantly thinking about how neural net properties change as you train them, it is IMO helpful to have the quantitative form of natural selection in your head as an example
SGD is not parameterization invariant; natural gradients
Bayes nets
Your half-power-of-ten times tables
(barely counts) Conversions between different units of time (e.g. "there are 30M seconds in a year, there are 3k seconds in an hour, there are 1e5 seconds in a day")

I think it's worth drilling your halfish-power-of-ten times tables, by which I mean memorizing the products of numbers like 1, 3, 10, 30, 100, 300, etc, while pretending that 3x3=10.

For example, 30*30=1k, 10k times 300k is 3B, etc.

I spent an hour drilling these on a plane a few years ago and am glad I did.

(Note that this message and its parent are talking about different things: the parent talked about whether the current value is negative, and the child talked about whether the total value has been negative.)

For your point 2, are you thinking about founders in organizations that have theories of change other than doing research? Or are you thinking of founders at research orgs?

I don't think I quite understand the distinction you are trying to draw between "founders" and (not a literal quote) "people who do object-level work and make intellectual contributions by writing".

If you're the CEO of a company, it's your job to understand the space your company works in and develop extremely good takes about where the field is going and what your company should do, and use your expertise in leveraged ways to make the company go better.

In the context of AI safety, the key product that organizations are trying to produce is often itself research, and a key input is hiring talented people. So I think it makes a lot of sense that e.g. I spend a lot of my time thinking about the research that's happening at my org.

Analogously, I don't think it should be considered surprising or foolish if Elon Musk knows a lot about rockets and spends a lot of his time talking to engineers about rockets.

I do think that I am personally more motivated to do novel intellectual work than would be optimal for Redwood's interests.

I also think that the status gradients and social pressures inside the AI safety community have a variety of distorting effects on my motivations that probably cause me to take worse actions.

I think you personally feel the status gradient problems more than other AI safety executives do because a lot of AI safety people undervalue multiplier efforts. And this has meant that working at MATS is less prestigious and therefore has more trouble hiring than I'd like.

Anthropic is (probably) not meeting its RSP security commitments