Adam Jermyn

Comments (sorted by newest)

Tracing the Thoughts of a Large Language Model
Adam Jermyn · 5mo

As long as you make it clear in the header that it's your unofficial translation, go for it!

Tracing the Thoughts of a Large Language Model
Adam Jermyn · 6mo

I would guess that models plan in this style much more generally; it's just useful in so many contexts. For instance, if you're choosing which article to put in front of a word, and that word is fixed by other constraints, you need a plan for what the word is ("an astronomer", not "a astronomer"). Or you might be writing code and need to know the type of a function's return value before you've written its body, since Python type annotations appear in the signature, ahead of the body. This sort of thing comes up all over the place.
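
For the Python case, here is a minimal illustrative sketch (the function name and types are hypothetical, chosen just to show the point):

```python
# The return type is declared in the signature, before any of the
# body exists -- so the writer must already "know" what the body
# will eventually return.
def parse_port(raw: str) -> int:
    # The body, written afterwards, has to conform to the `int`
    # annotation committed to above.
    return int(raw.strip())

print(parse_port(" 8080 "))  # -> 8080
```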

Tracing the Thoughts of a Large Language Model
Adam Jermyn · 6mo

It's not so much that we didn't think models plan ahead in general, as that we had various hypotheses (including "unknown unknowns"), and this kind of planning in poetry wasn't obviously the best one until we saw the evidence.

[More generally: in Interpretability we often have the experience of being surprised by the specific mechanism a model is using, even though with the benefit of hindsight it seems obvious. E.g. when we did the work for Towards Monosemanticity we were initially quite surprised to see the "the in <context>" features, thought they were indicative of a bug in our setup, and had to spend a while thinking about them and poking around before we realized why the model wanted them (which now feels obvious).]

evhub's Shortform
Adam Jermyn · 8mo

I can also confirm (I have a 3:1 match).

Personal AI Planning
Adam Jermyn · 10mo

Unless we build more land (either in the ocean or in space)?

Dario Amodei — Machines of Loving Grace
Adam Jermyn · 11mo

There is Dario's written testimony before Congress, which mentions existential risk as a serious possibility: https://www.judiciary.senate.gov/imo/media/doc/2023-07-26_-_testimony_-_amodei.pdf

He also signed the CAIS statement on x-risk: https://www.safe.ai/work/statement-on-ai-risk

Dario Amodei — Machines of Loving Grace
Adam Jermyn · 11mo

He does start out by saying he thinks & worries a lot about the risks (first paragraph):

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks... I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

He then explains (second paragraph) that the essay is meant to sketch out what things could look like if things go well:

In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.

I think this is a coherent thing to do?

Dario Amodei — Machines of Loving Grace
Adam Jermyn · 11mo

I get ~1e7 using 16 bit-flips per bfloat16 operation, a 300 K operating temperature, and 312 Tflop/s (from Nvidia's spec sheet). My guess is that this is a little high, because a float multiplication involves more operations than just flipping 16 bits, but it's the right order of magnitude.
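
For reference, a quick sketch of that estimate in Python. The 400 W board power is my assumption (roughly an A100's TDP); only the 16 bit-flips, 300 K, and 312 Tflop/s figures come from the comment above:

```python
import math

# Landauer limit: minimum energy to erase one bit at temperature T.
k_B = 1.380649e-23                        # Boltzmann constant, J/K
T = 300.0                                 # operating temperature, K
landauer_per_bit = k_B * T * math.log(2)  # ~2.87e-21 J

# Assume each bfloat16 operation flips 16 bits.
landauer_per_op = 16 * landauer_per_bit   # ~4.6e-20 J

# Actual energy per operation: assumed ~400 W board power at the
# spec-sheet 312 Tflop/s.
actual_per_op = 400.0 / 312e12            # ~1.3e-12 J

print(f"{actual_per_op / landauer_per_op:.1e}")  # ~2.8e7, i.e. order 1e7
```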

Yann LeCun: We only design machines that minimize costs [therefore they are safe]
Adam Jermyn · 1y

Another objection is that you can minimize the wrong cost function. Making "cost" go to zero could mean driving the thing we actually care about to a hugely negative number.

MIRI 2024 Communications Strategy
Adam Jermyn · 1y

One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations with their senses of the world.

It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab), but it’s not an invalid approach.

Wikitag Contributions

Tensor Networks · 3 years ago

Posts

305 · Tracing the Thoughts of a Large Language Model · Ω · 5mo · 24 comments
141 · Auditing language models for hidden objectives · Ω · 6mo · 15 comments
36 · Conditioning Predictive Models: Open problems, Conclusion, and Appendix · Ω · 3y · 3 comments
28 · Conditioning Predictive Models: Deployment strategy · Ω · 3y · 0 comments
32 · Conditioning Predictive Models: Interactions with other approaches · Ω · 3y · 2 comments
27 · Conditioning Predictive Models: Making inner alignment as easy as possible · Ω · 3y · 2 comments
20 · Conditioning Predictive Models: The case for competitiveness · Ω · 3y · 3 comments
72 · Conditioning Predictive Models: Outer alignment via careful conditioning · Ω · 3y · 15 comments
89 · Conditioning Predictive Models: Large language models as predictors · Ω · 3y · 4 comments
30 · Underspecification of Oracle AI · Ω · 3y · 12 comments