skybrian — LessWrong

Yes, I agree that confabulation happens a lot, and also that our explanations of why we do things aren't particularly trustworthy; they're often self-serving. I think there's also pretty good evidence that we remember our thoughts at least somewhat, though. A personal example: when thinking about how to respond to someone online, I tend to write things in my head when I'm not at a computer.

AI chatbots don't know why they did it

skybrian2y10

That's a good question! I don't know but I suppose it's possible, at least when the input fits in the context window. How well it actually does at this seems like a question for researchers?

There's also a question of why it would do it when the training doesn't have any way of rewarding accurate explanations over human-like explanations. We also have many examples of explanations that don't make sense.

There are going to be deductions about previous text that are generally useful, though, and would need to be reconstructed. This will be true even if the chatbot didn't write the text in the first place (it doesn't know either way). The deductions couldn't be reconstructing the original thought process, though, when the chatbot didn't write the text.

So I think this points to a weakness in my explanation that I should look into, though it's likely still true that it confabulates explanations.

Contra LeCun on "Autoregressive LLMs are doomed"

skybrian3y20

I'm wondering what "doom" is supposed to mean here. It seems a bit odd to think that longer context windows will make things worse. More likely, LeCun meant that things won't improve enough? (Problems we see now don't get fixed with longer context windows.)

So then, "doom" is a hyperbolic way of saying that other kinds of machine learning will eventually win, because LLMs don't improve enough.

Also, there's an assumption that longer sequences are exponentially more complicated and I don't think that's true for human-generated text? As documents grow longer, they do get more complex, but they tend to become more modular, where each section depends less on what comes before it. If long-range dependencies grew exponentially then we wouldn't understand them or be able to write them.

GPTs are Predictors, not Imitators

skybrian3y*10

Okay, but I'm still wondering if Randall is claiming he has private access, or is it just a typo?

Edit: looks like it was a typo?

At MIT, Altman said the letter was “missing most technical nuance about where we need the pause” and noted that an earlier version claimed that OpenAI is currently training GPT-5. “We are not and won’t for some time,” said Altman. “So in that sense it was sort of silly.”

https://www.theverge.com/2023/4/14/23683084/openai-gpt-5-rumors-training-sam-altman

GPTs are Predictors, not Imitators

skybrian3y10

Base64 encoding is a substitution cipher. Large language models seem to be good at learning substitutions.
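To make the "substitution" claim concrete, here is a minimal Python sketch. It assumes the standard RFC 4648 alphabet, and the helper name `encode_by_substitution` is made up for illustration. One caveat it makes visible: the fixed table applies to 6-bit groups of the input, not to the input characters themselves, so the bytes are regrouped first and then each group is looked up.

```python
import base64
import string

# Standard Base64 alphabet (RFC 4648): value 0-63 -> output character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_by_substitution(data: bytes) -> str:
    """Hypothetical helper: Base64 as a table lookup on 6-bit groups."""
    # Write the input as one long bit string, 8 bits per byte.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Zero-pad so the length is a multiple of 6.
    bits += "0" * (-len(bits) % 6)
    # Substitute each 6-bit group with its fixed character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output with "=" to a multiple of 4 characters.
    return "".join(chars) + "=" * (-len(chars) % 4)


if __name__ == "__main__":
    sample = b"substitution"
    # Compare against the standard library's encoder.
    print(encode_by_substitution(sample) == base64.b64encode(sample).decode())
```

Because the table never changes, the same 6-bit group always maps to the same character, which is plausibly the kind of regularity a model could pick up.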

GPTs are Predictors, not Imitators

skybrian3y30

Did you mean GPT-4 here? (Or are you from the future :-)

GPTs are Predictors, not Imitators

skybrian3y10

Yes, predicting some sequences can be arbitrarily hard. But I have doubts that LLM training will try to predict very hard sequences.

Suppose that some sequences are not only difficult but impossible to predict, because they're random? I would expect that with enough training, it would overfit and memorize them, because they get visited more than once in the training data. Memorization rather than generalization seems likely to happen for anything particularly difficult?

Meanwhile, there is a sea of easier sequences. Wouldn't it be more "evolutionarily profitable" to predict those instead? Pattern recognizers that predict easy sequences seem more likely to survive than pattern recognizers that predict hard sequences. Maybe the recognizers for hard sequences would be so rarely used and make so little progress that they'd get repurposed?

Thinking like a compression algorithm, a pattern recognizer needs to be worth its weight, or you might as well leave the data uncompressed.
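A rough illustration of the compression point, using Python's standard `zlib` module. This is only a sketch of the analogy, not a claim about what LLM training does; the helper name `compressed_ratio` and the sample inputs are made up for the example. Patterned text shrinks a lot, while random bytes come out about the same size or slightly larger, so a "recognizer" for them doesn't pay for itself.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Hypothetical helper: compressed size divided by original size."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive input: one short sentence repeated many times.
patterned = b"the cat sat on the mat. " * 400

# Pseudo-random input of the same length (seeded so runs are repeatable).
rng = random.Random(0)
random_bytes = bytes(rng.getrandbits(8) for _ in range(len(patterned)))

if __name__ == "__main__":
    print("patterned ratio:", compressed_ratio(patterned))
    print("random ratio:   ", compressed_ratio(random_bytes))
```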

I'm reasoning by analogy here, so these are only possibilities. Someone will need to actually research what LLMs do. Does it work to think of LLM training as pattern-recognizer evolution? What causes pattern recognizers to be kept or dropped?

Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.

skybrian3y10

I find that explanation unsatisfying because it doesn't help with other questions I have about how well ChatGPT works:

How does the language model represent countries and cities? For example, does it know which cities are near each other? How well does it understand borders?
Are there any capitals that it gets wrong? Why?
How well does it understand history? Sometimes a country changes its capital. Does it represent this fact as only being true at some times?
What else can we expect it to do with this fact? Maybe there are situations where knowing the capital of France helps it answer a different question?

These aren't about a single prompt; they're about how well its knowledge generalizes to other prompts, and what's going to happen when you go beyond the training data. Explanations that generalize are more interesting than one-off explanations of a single prompt.

Knowing the right answer is helpful, but it only helps you understand what it will do if you assume it never makes mistakes. There are situations (like Clever Hans) where the way the horse got the right answer is actually pretty interesting. Or consider knowing that visual AI algorithms rely on textures more than shape (though this is changing).

Do you realize that you're arguing against curiosity? Understanding hidden mechanisms is inherently interesting and useful.

Want to predict/explain/control the output of GPT-4? Then learn about the world, not about transformers.

skybrian3y10

I agree that for users of a black-box app, it makes sense to think this way. In particular, I'm a fan of thinking of what ChatGPT does in literary terms.

But I don't think it results in satisfying explanations of what it's doing. Ideally, we wouldn't settle for fan theories of what it's doing; we'd have some kind of debug access that lets us see how it does it.

The Waluigi Effect (mega-post)

skybrian3y10

Fair enough; comparing to quantum physics was overly snarky.

However, unless you have debug access to the language model and can figure out what specific neurons do, I don't see how the notion of superposition is helpful? When figuring things out from the outside, we have access to words, not weights.
