ghost-in-the-weights

Comments

Aaron_Scher's Shortform
ghost-in-the-weights · 13d · 20

Can we use similar methods to estimate the size and active parameters of GPT-4.5?

Naively extrapolating from the 1800B-A280B estimate for GPT-4 and the fact that GPT-4.5 costs about 2.5x as much, we get 4500B-A700B.

I have no idea if that's a good guess, but hopefully someone can come up with a better one.
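For concreteness, the naive scaling is just the following (all numbers here are unverified rumors, and the assumption that price scales linearly with parameter count is itself a big simplification):

```python
# Naive cost-proportional extrapolation; every number is a rumor or rough estimate.
gpt4_total_b, gpt4_active_b = 1800, 280  # rumored GPT-4 total / active params, in billions
cost_ratio = 2.5                         # rough GPT-4.5 : GPT-4 API price ratio

est_total_b = gpt4_total_b * cost_ratio    # 4500
est_active_b = gpt4_active_b * cost_ratio  # 700
print(f"GPT-4.5 naive estimate: ~{est_total_b:.0f}B total, ~{est_active_b:.0f}B active")
```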

Why do LLMs hallucinate?
ghost-in-the-weights · 2mo · 60

I would recommend looking at the Hallucinations section of Anthropic's Tracing the Thoughts of a Large Language Model:

https://www.anthropic.com/research/tracing-thoughts-language-model

They found that Claude has a refusal/"I don't know" circuit that is active by default and gets suppressed by a "known entities" feature when the model recognizes what it's being asked about.

They hypothesize that hallucinations are often caused by faulty suppression of this circuit, e.g. the "known entities" feature firing for a name the model recognizes but actually knows very little about.
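To make the default-on / suppression logic concrete, here is a toy cartoon of the hypothesized mechanism (purely my own illustration; the scores, threshold, and function are made up, not Anthropic's actual features):

```python
def respond(known_entity_score: float, has_real_knowledge: bool, threshold: float = 0.5) -> str:
    """Cartoon of the hypothesized circuit: refusal is on by default, and only a strong
    "known entity" signal suppresses it; hallucination = suppression without real knowledge."""
    refusal_active = known_entity_score < threshold  # refusal/"I don't know" is the default
    if refusal_active:
        return "I don't know."
    return "<correct answer>" if has_real_knowledge else "<confident hallucination>"

print(respond(known_entity_score=0.2, has_real_knowledge=False))  # "I don't know."
print(respond(known_entity_score=0.9, has_real_knowledge=True))   # "<correct answer>"
print(respond(known_entity_score=0.9, has_real_knowledge=False))  # faulty suppression -> hallucination
```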

ChristianKl's Shortform
ghost-in-the-weights · 3mo · 10

Confusingly, I believe the o stands for "Omni" in the context of GPT-4o, since it's "omni-modal". Based on some quick googling, the o in o1/o3/o4 seems to emphasize that o1 reset the model-naming counter back to 1 (so it's more like "zero1").

A Technique of Pure Reason
ghost-in-the-weights · 3mo* · 80

This seems similar to the "platonic ideal" model that Sam Altman described in a recent talk:

https://www.reddit.com/r/singularity/s/zzo8NhT9bd

https://m.youtube.com/watch?v=qhnJDDX2hhU (full talk)

Basically, his (and by extension OpenAI's) ideal model:

  • is very small and fast
  • has "super-human reasoning capabilities"
  • has a very long ("trillion token") context from which it can retrieve information
  • has access to a massive number of tools

The big question for these pure reasoning models is obviously how to build them. I think pure reasoning would probably have to be designed into the pretraining rather than bolted on afterwards. Two very high-level directions in which I could see it being pursued:

  • For each token or sequence in the pretraining corpus, we somehow supply the model with the "knowledge" required to predict the next token, without telling it the "reasoning". This would force the model's weights to compress reasoning strategies, while the knowledge is hopefully squeezed out of them because it's redundant.
  • Optimize something other than the standard cross-entropy (GPT) loss. Cross-entropy is inherently mean-seeking, meaning the model is incentivized to know every high-probability next token (which requires knowledge). A mode-seeking loss, such as the reverse KL divergence (GAN objective) or RL rewards, would only incentivize the model to know at least one next token with high reward or probability, which requires much less knowledge (see the sketch after this list).
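As a minimal sketch of the mean-seeking vs. mode-seeking distinction (toy numbers of my own, not anything measured from a real model): forward KL, which cross-entropy training minimizes in expectation, heavily punishes a model that assigns near-zero probability to any plausible continuation, while reverse KL barely punishes collapsing onto a single plausible continuation.

```python
import numpy as np

# Toy "true" next-token distribution over four plausible continuations (made-up numbers).
p = np.array([0.4, 0.3, 0.2, 0.1])

# A model that covers every plausible token vs. one that collapses onto the single top token.
q_cover = np.array([0.4, 0.3, 0.2, 0.1])
q_mode = np.array([0.97, 0.01, 0.01, 0.01])

def forward_kl(p, q):
    # D_KL(p || q): explodes if q misses any token that p considers likely (mean-seeking pressure).
    return float(np.sum(p * np.log(p / q)))

def reverse_kl(p, q):
    # D_KL(q || p): only cares that q's mass sits where p is non-negligible (mode-seeking pressure).
    return float(np.sum(q * np.log(q / p)))

for name, q in [("mode-covering", q_cover), ("mode-collapsed", q_mode)]:
    print(f"{name:14s} forward KL = {forward_kl(p, q):.2f}   reverse KL = {reverse_kl(p, q):.2f}")
# The collapsed model is punished hard by forward KL (~1.5) but only mildly by reverse KL
# (~0.8): a mode-seeking objective doesn't require knowing all the plausible continuations.
```
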
How Self-Aware Are LLMs?
ghost-in-the-weights · 3mo · 40

The best example of LLM metacognition that I've seen is this (unverified) reddit post:

https://www.reddit.com/r/ChatGPT/s/eLNe5BBM1Q

Essentially, a ChatGPT instance was fine-tuned to start each line of its responses with letters that spell "HELLO". When asked what made it special, the model was able to correctly deduce its own pattern. Notably, it described the pattern on only the second line of its response, before enough of its own output existed to infer the pattern from context.

This is really interesting because the model was not trained to describe the pattern, nor were there any examples in its context. It was somehow able to figure out its own characteristics just from the changes in its parameters.

silentbob's Shortform
ghost-in-the-weights · 4mo · 112

There has actually been some work visualizing this process, with a method called the "logit lens".

The first example that I know of: https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens

A more thorough analysis: https://arxiv.org/abs/2303.08112 
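For anyone who wants to poke at this directly, here is a minimal logit-lens sketch (my own illustration, not code from either link): decode each layer's residual stream through GPT-2's final layer norm and unembedding to see the model's "current best guess" at every depth. The prompt is arbitrary.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[i] is the residual stream after block i (index 0 = embeddings);
# the final entry already has the last layer norm (ln_f) applied.
n = len(out.hidden_states)
for layer, h in enumerate(out.hidden_states):
    resid = h[0, -1]  # residual stream at the last token position
    if layer < n - 1:
        resid = model.transformer.ln_f(resid)
    logits = model.lm_head(resid)
    print(f"layer {layer:2d}: top next-token guess = {tok.decode(logits.argmax().item())!r}")
```

The interesting part, per the linked posts, is watching how that per-layer guess sharpens and shifts as you move up through the layers.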
