I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks... I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

He then explains, in the second paragraph, that the essay is meant to sketch what the world could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.

I think this is a coherent thing to do?

Dario Amodei — Machines of Loving Grace

Adam Jermyn24d110

I get 1e7 using 16 bit-flips per bfloat16 operation, an operating temperature of 300 K, and 312 Tflop/s (from Nvidia's spec sheet). My guess is that this is a little high because a float multiplication involves more operations than just flipping 16 bits, but it's the right order of magnitude.
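
For readers who want to check the arithmetic, here is a rough sketch of one way to arrive at a number like this. It assumes the 1e7 is the ratio of a GPU's actual power draw to the Landauer limit (k_B T ln 2 per bit flip) at the stated throughput. The 400 W power draw is an assumption that does not appear in the comment, so treat the result as order-of-magnitude only.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant in J/K
TEMPERATURE_K = 300.0     # operating temperature, from the comment
BIT_FLIPS_PER_OP = 16     # one bit flip per bit of a bfloat16, from the comment
OPS_PER_SECOND = 312e12   # 312 Tflop/s, from the comment
ASSUMED_POWER_W = 400.0   # ASSUMPTION: GPU power draw, not stated in the comment

# Landauer limit: minimum energy needed to erase one bit at temperature T.
landauer_j_per_bit = K_B * TEMPERATURE_K * math.log(2)

# Minimum power needed to sustain the stated bit-flip rate at that limit.
landauer_power_w = landauer_j_per_bit * BIT_FLIPS_PER_OP * OPS_PER_SECOND

# How far the assumed hardware sits above the thermodynamic floor.
ratio = ASSUMED_POWER_W / landauer_power_w
print(f"Landauer-limited power: {landauer_power_w:.2e} W")
print(f"Actual / Landauer ratio: {ratio:.1e}")
```

Under these assumptions the ratio comes out at a few times 1e7, the same order of magnitude as the figure quoted above.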

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

Adam Jermyn5mo62

Another objection is that you can minimize the wrong cost function. Making "cost" go to zero could mean making "the thing we actually care about" go to (negative huge number).

MIRI 2024 Communications Strategy

Adam Jermyn5mo2512

One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations with their senses of the world.

It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab) but it’s not an invalid approach.

The LessWrong 2022 Review

Adam Jermyn1y20

I'm guessing that the sales numbers aren't high enough to make $200k if sold at plausible markups?

How useful is mechanistic interpretability?

Adam Jermyn1y70

In Towards Monosemanticity we also did a version of this experiment, and found that the SAE was much less interpretable when the transformer weights were randomized (https://transformer-circuits.pub/2023/monosemantic-features/index.html#appendix-automated-randomized).

RSPs are pauses done right

Adam Jermyn1yΩ474

Anthropic’s RSP includes evals after every 4x increase in effective compute and after every 3 months, whichever comes sooner, even if this happens during training, and the policy says that these evaluations include fine-tuning.

Alignment Grantmaking is Funding-Limited Right Now

Adam Jermyn1yΩ9163

This matches my impression. At EAG London I was really stunned (and heartened!) at how many skilled people are pivoting into interpretability from non-alignment fields.

EIS IX: Interpretability and Adversaries

Adam Jermyn1yΩ230

Second, the measure of “features per dimension” used by Elhage et al. (2022) might be misleading. See the paper for details of how they arrived at this quantity. But as shown in the figure above, “features per dimension” is defined as the Frobenius norm of the weight matrix before the layer divided by the number of neurons in the layer. But there is a simple sanity check that this doesn’t pass. In the case of a ReLU network without bias terms, multiplying a weight matrix by a constant factor will cause the “features per dimension” to be increased by that factor squared while leaving the activations in the forward pass unchanged up to linearity until a non-ReLU operation (like a softmax) is performed. And since each component of a softmax’s output is strictly increasing in that component of the input, scaling weight matrices will not affect the classification.

It's worth noting that Elhage+2022 studied an autoencoder with tied weights and no softmax, so there isn't actually freedom to rescale the weight matrix without affecting the loss in their model, making the scale of the weights meaningful. I agree that this measure doesn't generalize to other models/tasks though.

They also define a more fine-grained measure (the dimensionality of each individual feature) in a way that is scale-invariant and which broadly agrees with their coarser measure...
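
A small numerical sketch of both halves of this exchange, in plain Python. It is illustrative only: the layer sizes, the random weights, and the helper names are all made up, and "features per dimension" is taken here to be the squared Frobenius norm divided by the number of neurons, which is an assumption chosen to match the "factor squared" scaling in the quoted passage. The first part shows that in an untied, bias-free ReLU network, rescaling a weight matrix changes the measure but not the predicted class. The second part shows that in a tied-weight autoencoder of the form ReLU(WᵀWx), the same rescaling does change the reconstruction loss.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def frob_sq(W):
    # Squared Frobenius norm: sum of squared entries.
    return sum(w * w for row in W for w in row)

def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
n_in, n_hidden, n_out = 5, 8, 3
c = 3.0
x = [random.random() for _ in range(n_in)]  # nonnegative toy input

# Part 1: untied ReLU classifier without biases.
W1 = rand_matrix(n_hidden, n_in)
W2 = rand_matrix(n_out, n_hidden)
W1_scaled = [[c * w for w in row] for row in W1]

logits = matvec(W2, relu(matvec(W1, x)))
logits_scaled = matvec(W2, relu(matvec(W1_scaled, x)))

# Assumed measure: squared Frobenius norm per neuron.
measure = frob_sq(W1) / n_hidden
measure_scaled = frob_sq(W1_scaled) / n_hidden
measure_ratio = measure_scaled / measure          # grows by c squared
same_class = argmax(logits) == argmax(logits_scaled)

# Part 2: tied-weight autoencoder, reconstruction = ReLU(W^T W x), no softmax.
def tied_loss(W, v):
    recon = relu(matvec(transpose(W), matvec(W, v)))
    return sum((a - r) ** 2 for a, r in zip(v, recon))

W = rand_matrix(2, n_in)
W_scaled = [[c * w for w in row] for row in W]
loss = tied_loss(W, x)
loss_scaled = tied_loss(W_scaled, x)

print("measure ratio:", measure_ratio)
print("same predicted class:", same_class)
print("tied loss before/after scaling:", loss, loss_scaled)
```

In the untied network the logits are just multiplied by c, so the measure can be inflated freely without changing the classification. In the tied model the reconstruction scales by c squared, so the loss pins down the scale of W.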
