I realize that my position might seem increasingly flippant, but I really think it is necessary to acknowledge that you've stated a core assumption as a fact.
Alignment doesn't run on some nega-math that can't be cast as an optimization problem.
I am not saying that the concept of "alignment" is some bizarre metaphysical idea that cannot be approximated by a computer because something something human souls, or some other nonsense.
However, the assumption that "alignment is representable in math" directly implies "alignment is representable as an optimization problem" seems potentially false to me, and I'm not sure why you're certain it is true.
There exist systems that 1.) can be represented mathematically, 2.) perform computations, and 3.) do not correspond to some type of min/max optimization, e.g. various analog computers or cellular automata.
I don't think it is ridiculous to suggest that what the human brain does is 1.) representable in math, 2.) representable in a way that we could actually understand and re-implement on hardware / software systems, and 3.) not an optimization problem where there exists some reward function to maximize or some loss function to minimize.
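To make the third point concrete, here is a minimal sketch of an elementary cellular automaton — Rule 110, which is known to be Turing-complete — performing computation with no reward or loss function anywhere in the update loop (the initial tape is an arbitrary choice for illustration):

```python
# Rule 110: a Turing-complete elementary cellular automaton.
# Each step is a pure lookup on local neighborhoods -- there is no
# objective being maximized or minimized anywhere in the update.

RULE = 110

def step(cells):
    """Apply one Rule 110 update to a list of 0/1 cells (fixed 0 boundary)."""
    padded = [0] + cells + [0]
    out = []
    for i in range(len(cells)):
        neighborhood = (padded[i] << 2) | (padded[i + 1] << 1) | padded[i + 2]
        out.append((RULE >> neighborhood) & 1)
    return out

tape = [0] * 20 + [1]  # arbitrary initial condition
for _ in range(10):
    tape = step(tape)
```

The system computes (in the strongest possible sense), yet describing it as minimizing some loss adds nothing to the description.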
I wasn't intending for a metaphor of "biomimicry" vs "modernist".
(Claim 1) Wings can't work in space because there's no air. The lack of air is a fundamental reason for why no wing design, no matter how clever it is, will ever solve space travel.
If TurnTrout is right, then the equivalent statement is something like (Claim 2) "reward functions can't solve alignment because alignment isn't maximizing a mathematical function."
The difference between Claim 1 and Claim 2 is that we have a proof of Claim 1, and therefore don't bother debating it anymore. With Claim 2, we only have an arbitrarily long list of examples of reward functions being gamed, exploited, or otherwise failing in spectacular ways, but no general proof that reward functions will never work. So we keep arguing about a Sufficiently Smart Reward Function That Definitely Won't Blow Up as if that is a thing that can be found if we try hard enough.
As of right now, I view "shard theory" sort of like a high-level discussion of chemical propulsion without the designs for a rocket or a gun. I see the novelty of it, but I don't understand how you would build a device that can use it. Until someone can propose actual designs for hardware or software that would implement "shard theory" concepts without just becoming an obfuscated reward function prone to the same failure modes as everything else, it's not incredibly useful to me. However, I think it's worth engaging with the idea because if correct then other research directions might be a dead-end.
Does that help explain what I was trying to do with the metaphor?
To some extent, I think it's easy to pooh-pooh finding a flapping wing design (not maximally flappy, merely way better than the best birds) when you're not proposing a specific design for building a flying machine that can go to space. Not in the tone of "how dare you not talk about specifics," but more like "I bet this chemical propulsion direction would have to look more like birds when you get down to brass tacks."
(1) The first thing I did when approaching this was think about how the message is actually transmitted: the preamble at the start of the transmission to synchronize clocks, the headers for source & destination, the parity bits after each byte, an inverted parity on the header so that a true header can be distinguished from bytes within a message that happen to look like a header, and optional checksum calculations.
(2) I then thought about how I would actually represent the data so it wasn't just traditional 8-bit bytes -- I created encoders & decoders for 36/24/12/6 bit unsigned and signed ints, and 30 / 60 bit non-traditional floating point, etc.
Finally, I created a mock telemetry stream that consisted of a bunch of time-series data from many different sensors, with all of the sensor values packed into a single frame with all of the data types from (2), and repeatedly transmitted that frame over the varying time series, using (1), until I had >1 MB.
And then I didn't submit that, and instead swapped to a single message using the transmission protocol that I designed first, and shoved an image into that message instead of the telemetry stream.
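As a sketch of what (1) and (2) looked like in practice — the function names, field widths, and framing here are illustrative stand-ins, not my actual implementation — packing a value into a non-8-bit field with a parity bit might look like:

```python
def to_bits(value, width):
    """Big-endian bit list for an unsigned int of the given width."""
    assert 0 <= value < (1 << width)
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def with_parity(bits, invert=False):
    """Append an even-parity bit; invert=True gives the 'inverted header
    parity' trick for distinguishing real headers from header-like data."""
    p = sum(bits) % 2
    return bits + [p ^ 1 if invert else p]

def encode_frame(header, payload_values, width=12):
    """Frame = 8-bit header (inverted parity) + fixed-width fields (normal parity)."""
    bits = with_parity(to_bits(header, 8), invert=True)
    for v in payload_values:
        bits += with_parity(to_bits(v, width))
    return bits
```

A receiver that guesses the wrong field width will see parity failures almost immediately, which is exactly the property that makes the framing deducible.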
[0, +N], when I should have allowed the noise to be positive or negative. I am less convinced that the uniform distribution is incorrect. In my experience, ADC noise is almost always uniform (and only present in a few LSB), unless there's a problem with the HW design, in which case you'll get dramatic non-uniform "spikes". I was assuming that the alien HW is not so poorly designed that they are railing their ADC channels with noise of that magnitude.
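The corrected noise model I'm describing — uniform over a few LSB and centered on zero, rather than over [0, +N] — can be sketched as follows (the 12-bit channel width and 2-LSB magnitude are made-up parameters for illustration):

```python
import random

def noisy_adc_reading(true_value, n_lsb=2, full_scale=4095):
    """Simulate a well-behaved 12-bit ADC: uniform noise in [-n_lsb, +n_lsb]
    (signed, unlike the original [0, +N] mistake), clamped to the channel range."""
    noise = random.randint(-n_lsb, n_lsb)
    return min(full_scale, max(0, true_value + noise))
```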
My understanding of faul_sname's claim is that for the purpose of this challenge we should treat the alien sensor data output as an original piece of data.
In reality, yes, there is a source image that was used to create the raw data that was then encoded and transmitted. But in the context of the fiction, the raw data is supposed to represent the output of the alien sensor, and the claim is that the decompressor + payload is less than the size of just an ad-hoc gzipping of the output by itself. It's that latter part of the claim that I'm skeptical towards. There is so much noise in real sensors -- almost always the first part of any sensor processing pipeline is some type of smoothing, median filtering, or other type of noise reduction. If a solution for a decompressor involves saving space on encoding that noise by breaking a PRNG, it's not clear to me how that would apply to a world in which this data has no noise-less representation available. However, a technique of measuring & subtracting noise so that you can compress a representation that is more uniform and then applying the noise as a post-processing op during decoding is definitely doable.
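That measure-subtract-reapply pipeline can be sketched in a few lines. Here I use 1-D samples, a moving average as the smoother, and zlib standing in for whatever real compressor you'd use — all illustrative choices, not a claim about what the winning entry would do:

```python
import zlib

def smooth(samples, k=3):
    """Simple moving average as a stand-in for a real denoising filter."""
    return [round(sum(samples[max(0, i - k):i + k + 1]) /
                  len(samples[max(0, i - k):i + k + 1]))
            for i in range(len(samples))]

def encode(samples):
    """Compress the smooth part and the (small, regular) residuals separately."""
    base = smooth(samples)
    residual = [s - b for s, b in zip(samples, base)]
    return (zlib.compress(bytes(base)),
            zlib.compress(bytes((r & 0xFF) for r in residual)))

def decode(base_blob, residual_blob):
    """Reconstruct exactly: base + residual, residual re-signed from a byte."""
    base = list(zlib.decompress(base_blob))
    residual = [b - 256 if b > 127 else b
                for b in zlib.decompress(residual_blob)]
    return [b + r for b, r in zip(base, residual)]
```

The reconstruction is lossless; the win (if any) comes from the residual stream being lower-entropy than the raw samples.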
Assuming that you use the payload of size 741809 bytes, and are able to write a decompressor + "transmitter" for that in the remaining ~400 KB (which should be possible, given that 7z is ~450 KB, zip is 349 KB, other compressors are in similar size ranges, and you'd be saving space since you just need the decoder portion of the code), how would we rate that against the claims?
- It would be possible for me, given some time to examine the data, to create a decompressor and a payload such that running the decompressor on the payload yields the original file, and the decompressor program + the payload have a total size of less than the original gzipped file
- The decompressor would legibly contain a substantial amount of information about the structure of the data.
(1) seems obviously met, but (2) is less clear to me. Going back to the original claim, faul_sname said 'we would see that the winning programs would look more like "generate a model and use that model and a similar rendering process to what was used to original file, plus an error correction table" and less like a general-purpose compressor'.
So far though, this solution does use a general purpose compressor. My understanding of (2) is that I was supposed to be looking for solutions like "create a 3D model of the surface of the object being detected and then run lighting calculations to reproduce the scene that the camera is measuring", etc. Other posts from faul_sname in the thread, e.g. here, seem to indicate that this was their thinking as well, since they suggested using ray tracing as a method to describe the data in a more compressed manner.
What are your thoughts?
Regarding the sensor data itself
I alluded to this in my post here, but I was waffling and backpedaling a lot on what would be "fair" in this challenge. I gave a bunch of examples in the thread of what would make a binary file difficult to decode -- e.g. non-uniform channel lengths, an irregular data structure, multiple types of sensor data interwoven into the same file -- and then did basically none of that, because I kept feeling like the file was unapproachable. Anything that was >1 MB of binary data but not a 2D image (or series of images) seemed impossible. For example, the first thing I suggested in the other thread was a stream of telemetry from some alien system.
I thought this file would strike a good balance, but I now see that I made a crucial mistake: I didn't expect that you'd be able to view it with the wrong number of bits per byte (7 instead of 6) and then skip almost every byte and still find a discernible image in the grayscale data. Once you can "see" what the image is supposed to be, the hard part is done.
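The trick being described — reslicing a raw bitstream at an arbitrary symbol width and stride to see whether an image falls out — is only a short loop. The width and skip values below are placeholders, not the challenge file's actual parameters:

```python
def resymbolize(raw, bits_per_symbol=7, skip=0):
    """Reslice a byte string into symbols of an arbitrary bit width,
    optionally keeping only every (skip+1)-th symbol -- enough to make a
    hidden image 'pop' when the values are rendered as grayscale rows."""
    bitstream = ''.join(f'{byte:08b}' for byte in raw)
    symbols = [int(bitstream[i:i + bits_per_symbol], 2)
               for i in range(0, len(bitstream) - bits_per_symbol + 1,
                              bits_per_symbol)]
    return symbols[::skip + 1]
```

Sweeping `bits_per_symbol` and `skip` while rendering the output as grayscale is cheap, which is why even the "wrong" parameters can reveal a discernible picture.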
I was assuming that more work would be needed for understanding the transmission itself (e.g. deducing the parity bits by looking at the bit patterns), and then only after that would it be possible to look at the raw data by itself.
I had a similar issue when I was playing with LIDAR data as an alternative to a 2D image. I found that a LIDAR point cloud is eerily similar enough to image data that you can stumble upon a depth map representation of the data almost by accident.
Which question are we trying to answer?
I don't think (1) is a particularly interesting question, because last weekend I convinced myself that the answer is yes, you can transfer data in a way that it can be decoded, with very few assumptions on the part of the receiver. I do have a file I created for this purpose. If you want, I'll send it to you.
I started creating a file for (2), but I'm not really sure how to gauge what is "fair" vs "deliberately obfuscated" in terms of encoding. I am conflicted. Even if I stick to encoding techniques I've seen in the real world, I feel like I can make choices on this file encoding that make the likelihood of others decoding it very low. That's exactly what we're arguing about on (2). However, I don't think it will be particularly interesting or fun for people trying to decode it. Maybe that's ok?
What are your thoughts?
It depends on what you mean by "didn't work". The study described is published in a paper only 16 pages long. We can just read it: http://web.mit.edu/curhan/www/docs/Articles/biases/67_J_Personality_and_Social_Psychology_366,_1994.pdf
First, consider the question of, "are these predictions totally useless?" This is an important question because I stand by my claim that the answer of "never" is actually totally useless due to how trivial it is.
Despite the optimistic bias, respondents' best estimates were by no means devoid of information: The predicted completion times were highly correlated with actual completion times (r = .77, p < .001). Compared with others in the sample, respondents who predicted that they would take more time to finish actually did take more time. Predictions can be informative even in the presence of a marked prediction bias.
Respondents' optimistic and pessimistic predictions were both strongly correlated with their actual completion times (rs = .73 and .72, respectively; ps < .01).
Yep. Matches my experience.
We know that only 11% of students met their optimistic targets, and only 30% of students met their "best guess" targets. What about the pessimistic target? It turns out, 50% of the students did finish by that target. That's not just a quirk, because it's actually related to the distribution itself.
However, the distribution of difference scores from the best-guess predictions were markedly skewed, with a long tail on the optimistic side of zero, a cluster of scores within 5 or 10 days of zero, and virtually no scores on the pessimistic side of zero. In contrast, the differences from the worst-case predictions were noticeably more symmetric around zero, with the number of markedly pessimistic predictions balancing the number of extremely optimistic predictions.
In other words, asking people for a best guess or an optimistic prediction yields a biased prediction that is almost always earlier than the real delivery date. The pessimistic question is not more accurate (it has the same absolute error margins), but it is unbiased: per the study, people asked the pessimistic question were equally likely to over-estimate their completion time as to under-estimate it. If you don't think a question that gives you a distribution centered on the right answer is useful, I'm not sure what to tell you.
The paper actually did a number of experiments. That was just the first.
In the third experiment, the study tried to understand what people are thinking about when estimating.
Proportionally more responses concerned future scenarios (M = .74) than relevant past experiences (M = .07), t(66) = 13.80, p < .001. Furthermore, a much higher proportion of subjects' thoughts involved planning for a project and imagining its likely progress (M = .71) rather than considering potential impediments (M = .03), t(66) = 18.03, p < .001.
This seems relevant considering that the idea of premortems or "worst case" questioning is to elicit impediments, and the project managers / engineering leads doing that questioning are intending to hear about impediments and will continue their questioning until they've been satisfied that the group is actually discussing that.
In the fourth experiment, the study tries to understand why it is that people don't think about their past experiences. They discovered that just prompting people to consider past experiences was insufficient; people actually needed additional prompting to make their past experience "relevant" to their current task.
Subsequent comparisons revealed that subjects in the recall-relevant condition predicted they would finish the assignment later than subjects in either the recall condition, t(79) = 1.99, p < .05, or the control condition, t(80) = 2.14, p < .04, which did not differ significantly from each other, t(81) < 1.
Further analyses were performed on the difference between subjects' predicted and actual completion times. Subjects underestimated their completion times significantly in the control (M = -1.3 days), t(40) = 3.03, p < .01, and recall conditions (M = -1.0 day), t(41) = 2.10, p < .05, but not in the recall-relevant condition (M = -0.1 days), t(39) < 1. Moreover, a higher percentage of subjects finished the assignments in the predicted time in the recall-relevant condition (60.0%) than in the recall and control conditions (38.1% and 29.3%, respectively), χ²(2, N = 123) = 7.63, p < .01. The latter two conditions did not differ significantly from each other.
The absence of an effect in the recall condition is rather remarkable. In this condition, subjects first described their past performance with projects similar to the computer assignment and acknowledged that they typically finish only 1 day before deadlines. Following a suggestion to "keep in mind previous experiences with assignments," they then predicted when they would finish the computer assignment. Despite this seemingly powerful manipulation, subjects continued to make overly optimistic forecasts. Apparently, subjects were able to acknowledge their past experiences but disassociate those episodes from their present predictions.
In contrast, the impact of the recall-relevant procedure was sufficiently robust to eliminate the optimistic bias in both deadline conditions.
How does this compare to the first experiment?
Interestingly, although the completion estimates were less biased in the recall-relevant condition than in the other conditions, they were not more strongly correlated with actual completion times, nor was the absolute prediction error any smaller. The optimistic bias was eliminated in the recall-relevant condition because subjects' predictions were as likely to be too long as they were to be too short. The effects of this manipulation mirror those obtained with the instruction to provide pessimistic predictions in the first study: When students predicted the completion date for their honor's thesis on the assumption that "everything went as poorly as it possibly could" they produced unbiased but no more accurate predictions than when they made their "best guesses."
It's common in engineering to perform group estimates. Does the study look at that? Yep, the fifth and last experiment asks individuals to estimate the performance of others.
As hypothesized, observers seemed more attuned to the actors' base rates than did the actors themselves. Observers spontaneously used the past as a basis for predicting actors' task completion times and produced estimates that were later than both the actors' estimates and their completion times.
So observers are more pessimistic. In fact, observers are so pessimistic that you have to average their estimates with the actors' optimistic estimates to get an unbiased estimate.
One of the most consistent findings throughout our investigation was that manipulations that reduced the directional (optimistic) bias in completion estimates were ineffective in increasing absolute accuracy. This implies that our manipulations did not give subjects any greater insight into the particular predictions they were making, nor did they cause all subjects to become more pessimistic (see Footnote 2), but instead caused enough subjects to become overly pessimistic to counterbalance the subjects who remained overly optimistic. It remains for future research to identify those factors that lead people to make more accurate, as well as unbiased, predictions. In the real world, absolute accuracy is sometimes not as important as (a) the proportion of times that the task is completed by the "best-guess" date and (b) the proportion of dramatically optimistic, and therefore memorable, prediction failures. By both of these criteria, factors that decrease the optimistic bias "improve" the quality of intuitive prediction.
At the end of the day, there are certain things that are known about scheduling / prediction.
(Gestures vaguely at all of the projects that overrun their schedule.) There are many interesting reasons why that happens, and why I don't think it's a massive failure of rationality, but I'm not sure this comment is a good place to go into detail on that. The quick answer is that comical overruns of a schedule have less to do with an inability to create correct schedules from an engineering / evidence-based perspective, and much more to do with a bureaucratic or organizational refusal to accept an evidence-based schedule when a totally false but politically palatable "optimistic" schedule is preferred.
I'm reminded of this thread from 2022: https://www.lesswrong.com/posts/27EznPncmCtnpSojH/link-post-on-deference-and-yudkowsky-s-ai-risk-estimates?commentId=SLjkYtCfddvH9j38T#SLjkYtCfddvH9j38T