
dhar174
2020

Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments
GPT-4 Specs: 1 Trillion Parameters?
dhar174 · 2y · 10

You're missing the possibility that the model used during training had more parameters than the models used for inference. It is now common practice to train a large model and then distill it into a series of smaller models, which are deployed according to the needs of the task.
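
For concreteness, here is a minimal sketch of that distillation recipe: a frozen large teacher produces soft targets, and a smaller student is trained against a mix of those soft targets and the hard labels. It assumes PyTorch, and every name and hyperparameter below is illustrative, not anything known about GPT-4's actual pipeline.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: KL divergence between teacher and student distributions,
    # both softened by temperature T (scaled by T^2 to keep gradient magnitudes comparable).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the true targets.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Sketch of the loop: the teacher is frozen, only the student's parameters update.
# for batch in loader:
#     with torch.no_grad():
#         teacher_logits = teacher(batch.input_ids)   # large frozen model
#     student_logits = student(batch.input_ids)       # smaller deployed model
#     distillation_loss(student_logits, teacher_logits, batch.labels).backward()
#     optimizer.step(); optimizer.zero_grad()
```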

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading
dhar174 · 2y · 34

To those who believe language models do not have internal representations of concepts:

I can help at least partially disprove the assumptions behind that.

There is convincing evidence otherwise, demonstrated in an experiment with a GPT model trained on Othello move sequences:

https://thegradient.pub/othello/

The researchers' conclusion:

"Our experiment provides evidence supporting that these language models are developing world models and relying on the world model to generate sequences."
