Adam Jermyn — LessWrong

Tracing the Thoughts of a Large Language Model

As long as you make it clear at the header that it's your unofficial translation, go for it!

Tracing the Thoughts of a Large Language Model

I would guess that models plan in this style much more generally. It's just useful in so many contexts. For instance, if you're trying to choose what article goes in front of a word, and that word is fixed by other constraints, you need a plan of what that word is ("an astronomer" not "a astronomer"). Or you might be writing code and have to know the type of the return value of a function before you've written the body of the function, since Python type annotations come at the start of the function in the signature. Etc. This sort of thing just comes up all over the place.

Tracing the Thoughts of a Large Language Model

Adam Jermyn7mo80

It's not so much that we didn't think models plan ahead in general, as that we had various hypotheses (including "unknown unknowns") and this kind of planning in poetry wasn't obviously the best one until we saw the evidence.

[More generally: in Interpretability we often have the experience of being surprised by the specific mechanism a model is using, even though with the benefit of hindsight it seems obvious. E.g. when we did the work for Towards Monosemanticity we were initially quite surprised to see the "the in <context>" features, thought they were indicative of a bug in our setup, and had to spend a while thinking about them and poking around before we realized why the model wanted them (which now feels obvious).]

evhub's Shortform

Adam Jermyn10mo105

I can also confirm (I have a 3:1 match).

Personal AI Planning

Adam Jermyn1y136

Unless we build more land (either in the ocean or in space)?

Dario Amodei — Machines of Loving Grace

Adam Jermyn1y84

There is Dario's written testimony before Congress, which mentions existential risk as a serious possibility: https://www.judiciary.senate.gov/imo/media/doc/2023-07-26_-_testimony_-_amodei.pdf

He also signed the CAIS statement on x-risk: https://www.safe.ai/work/statement-on-ai-risk

Dario Amodei — Machines of Loving Grace

Adam Jermyn1y97

He does start out by saying he thinks & worries a lot about the risks (first paragraph):

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks... I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

He then explains (second paragraph) that the essay is meant to sketch out what things could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.

I think this is a coherent thing to do?

Dario Amodei — Machines of Loving Grace

Adam Jermyn1y130

I get 1e7 using 16 bit-flips per bfloat16 operation, 300K operating temperature, and 312Tflop/s (from Nvidia's spec sheet). My guess is that this is a little high because a float multiplication involves more operations than just flipping 16 bits, but it's the right order-of-magnitude.

Yann LeCun: We only design machines that minimize costs [therefore they are safe]

Adam Jermyn1y62

Another objection is that you can minimize the wrong cost function. Making "cost" go to zero could mean making "the thing we actually care about" go to (negative huge number).

MIRI 2024 Communications Strategy

Adam Jermyn1y2516

One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations with their senses of the world.

It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab) but it’s not an invalid approach.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments