If all you're using is ChatGPT, then now's a good time to cancel the subscription, because GPT-4o seems to be about as powerful as GPT-4, and GPT-4o is available for free.

Anthropic release Claude 3, claims >GPT-4 Performance

Caspar Oesterheld5moΩ121611

As one further data point, I also heard people close to/working at Anthropic giving "We won't advance the state of the art."-type statements, though I never asked about specifics.
My sense is also that Claude 3 Opus is only slightly better than the best published GPT-4. To add one data point: I happen to work on a benchmark right now and on that benchmark, Opus is only very slightly better than gpt-4-1106. (See my X/Twitter post for detailed results.) So, I agree with LawrenceC's comment that they're arguably not significantly advancing the state of the art.
I suppose even if Opus is only slightly better (or even just perceived to be better) and even if we all expect OpenAI to release a better GPT-4.5 soon, Anthropic could still take a bunch of OpenAI's GPT-4 business with this. (I'll probably switch from ChatGPT-4 to Claude, for instance.) So it's not that hard to imagine an internal OpenAI email saying, "Okay, folks, let's move a bit faster with these top-tier models from now on, lest too many people switch to Claude." I suppose that would already be quite worrying to people here. (Whereas, people would probably worry less if Anthropic took some of OpenAI's business by having models that are slightly worse but cheaper or more aligned/less likely to say things you wouldn't want models to say in production.)

AI things that are perhaps as important as human-controlled AI

Caspar Oesterheld5mo42

In short, the idea is that there might be a few broad types of “personalities” that AIs tend to fall into depending on their training. These personalities are attractors.

I'd be interested in why one might think this to be true. (I only did a very superficial ctrl+f on Lukas' post -- sorry if that post addresses this question.) I'd think that there are lots of dimensions of variation and that within these, AIs could assume a continuous range of values. (If AI training mostly works by training to imitate human data, then one might imagine that (assuming inner alignment) they'd mostly fall within the range of human variation. But I assume that's not what you mean.)

How LLMs are and are not myopic

Caspar Oesterheld8moΩ6104

This means that the model can and will implicitly sacrifice next-token prediction accuracy for long horizon prediction accuracy.

Are you claiming this would happen even given infinite capacity?

I think that janus isn't claiming this and I also think it isn't true. I think it's all about capacity constraints. The claim as I understand it is that there are some intermediate computations that are optimized both for predicting the next token and for predicting the 20th token and that therefore have to prioritize between these different predictions.

How LLMs are and are not myopic

Caspar Oesterheld8moΩ7100

Here's a simple toy model that illustrates the difference between 2 and 3 (that doesn't talk about attention layers, etc.).

Say you have a bunch of triplets $(x, z_{1}, z_{2})$. You want to train a model that predicts $z_{1}$ from $x$ and $z_{2}$ from $x, z_{1}$.

Your model consists of three components: $f, g_{1}, g_{2}$ . It makes predictions as follows:
$y = f (x)$
$z_{1} = g_{1} (y)$
$z_{2} = g_{2} (y, z_{1})$

(Why have such a model? Why not have two completely separate models, one for predicting $z_{1}$ and one for predicting $z_{2}$ ? Because it might be more efficient to use a single $f$ both for predicting $z_{1}$ and for predicting $z_{2}$ , given that both predictions presumably require "interpreting" $x$ .)

So, intuitively, it first builds an "inner representation" (embedding) of $x$ . Then it sequentially makes predictions based on that inner representation.

Now you train $f$ and $g_{1}$ to minimize the prediction loss on the $(x, z_{1})$ parts of the triplets. Simultaneously you train $f, g_{2}$ to minimize prediction loss on the full $(x, z_{1}, z_{2})$ triplets. For example, you update $f$ and $g_{1}$ with the gradients
$\nabla_{θ_{0}, θ_{1}} l (z_{1}, g_{1}^{θ_{1}} (f^{θ_{0}} (x)))$

and you update $f$ and $g_{2}$ with the gradients

$\nabla_{θ_{0}, θ_{2}} l (z_{2}, g_{2}^{θ_{2}} (f^{θ_{0}} (x), z_{1}))$ .
(The $z_{1}$ here is the "true" $z_{1}$ , not one generated by the model itself.)
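The training scheme above can be sketched in a few lines of plain Python. This is only an illustration under strong assumptions that are not in the original: every component is a scalar linear map, the loss $l$ is squared error, and the targets ($z_{1} = 2x$, $z_{2} = 5x$) are made up. The names `train_step` and `train` are likewise hypothetical. The point to look for is that the update to $θ_{1}$ contains no term from the loss on $z_{2}$, while $θ_{0}$ receives gradients from both losses.

```python
import random

def f(theta0, x):
    """Shared inner representation y = f(x); a single scalar weight (assumption)."""
    return theta0 * x

def g1(theta1, y):
    """Predicts z1 from the representation y."""
    return theta1 * y

def g2(theta2, y, z1):
    """Predicts z2 from the representation y and the *true* z1."""
    a, b = theta2
    return a * y + b * z1

def train_step(params, x, z1, z2, lr=0.02):
    theta0, theta1, (a, b) = params
    y = f(theta0, x)

    # Loss on z1: (g1(f(x)) - z1)^2, differentiated w.r.t. theta0 and theta1.
    err1 = g1(theta1, y) - z1
    grad_theta1 = 2 * err1 * y
    grad_theta0_from_z1 = 2 * err1 * theta1 * x

    # Loss on z2: (g2(f(x), z1) - z2)^2, differentiated w.r.t. theta0 and theta2.
    # g2 is fed the true z1, so nothing here flows back into theta1.
    err2 = g2((a, b), y, z1) - z2
    grad_a = 2 * err2 * y
    grad_b = 2 * err2 * z1
    grad_theta0_from_z2 = 2 * err2 * a * x

    # f is shared, so it is updated with both gradients; g1 and g2 each see one.
    theta0 -= lr * (grad_theta0_from_z1 + grad_theta0_from_z2)
    theta1 -= lr * grad_theta1
    return theta0, theta1, (a - lr * grad_a, b - lr * grad_b)

def train(steps=5000, seed=0):
    rng = random.Random(seed)
    params = (0.5, 0.5, (0.5, 0.5))
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        z1, z2 = 2.0 * x, 5.0 * x  # made-up targets for illustration
        params = train_step(params, x, z1, z2)
    return params
```

Calling `train_step` twice with the same `x` and `z1` but different `z2` yields the same new $θ_{1}$ and a different new $θ_{0}$, which is the asymmetry the rest of the comment relies on.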

This training pressures $g_{1}$ to be myopic in the second and third sense described in the post. In fact, even if we were to train $θ_{0}, θ_{2}$ with the $z_{1}$ predicted by $g_{1}$ rather than the true $z_{1}$ , $g_{1}$ is pressured to be myopic.

Type 3 myopia: Training doesn't pressure $g_{1}$ to output something that makes the $z_{2}$ follow an easier-to-predict (computationally or information-theoretically) distribution. For example, imagine that on the training data $z_{1} = 0$ implies $z_{2} = 0$ , while under $z_{1} = 1$ , $z_{2}$ follows some distribution that depends in complicated ways on $x$ . Then $g_{1}$ will not try to predict $z_{1} = 0$ more often.
Type 2 myopia: $g_{1}$ won't try to provide useful information to $g_{2}$ in its output, even if it could. For example, imagine that the $z_{1}$ s are strings representing real numbers. Imagine that $x$ is always a natural number, that $z_{1}$ is the $x$ -th Fibonacci number and $z_{2}$ is the $x + 1$ -th Fibonacci number. Imagine further that the model representing $g_{1}$ is large enough to compute the $x$ -th Fibonacci number, while the model representing $g_{2}$ is not. Then one way in which one might think one could achieve low predictive loss would be for $g_{1}$ to output the $x$ -th Fibonacci number and then encode, for example, the $x - 1$ -th Fibonacci number in the decimal digits. (E.g., $g_{1} (10) = 55.0000000000034$ .) And then $g_{2}$ computes the $x + 1$ -th Fibonacci number by adding the $x$ -th Fibonacci number to the number encoded in the decimal digits. But the above training will not give rise to this strategy, because $g_{2}$ gets the true $z_{1}$ as input, not the one produced by $g_{1}$ . Further, even if we were to change this, there would still be pressure against this strategy because $g_{1}$ ( $θ_{1}$ ) is not optimized to give useful information to $g_{2}$ . (The gradient used to update $θ_{1}$ doesn't consider the loss on predicting $z_{2}$ .) If it ever follows the policy of encoding information in the decimal digits, it will quickly learn to remove that information to get higher prediction accuracy on $z_{1}$ .
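To make the hypothetical side-channel strategy concrete, here is a small sketch of what such an encoding and decoding could look like. The function names and the fixed digit width are assumptions introduced for illustration; nothing here is a trained model. The last function shows why the $z_{1}$ loss pushes against the scheme: the encoded output is no longer exactly the $x$ -th Fibonacci number.

```python
from decimal import Decimal

def fib(n):
    """n-th Fibonacci number with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def g1_with_side_channel(x, width=13):
    """Hypothetical non-myopic g1: F(x) as the integer part, with F(x-1)
    hidden in `width` zero-padded decimal digits."""
    return f"{fib(x)}.{fib(x - 1):0{width}d}"

def g2_decode(z1_string):
    """Hypothetical weak g2: recovers F(x+1) = F(x) + F(x-1) by one addition."""
    integer_part, decimals = z1_string.split(".")
    return int(integer_part) + int(decimals)

def z1_squared_error(z1_string, x):
    """Squared error of the encoded output against the true z1 = F(x)."""
    return (Decimal(z1_string) - fib(x)) ** 2

encoded = g1_with_side_channel(10)
print(encoded, g2_decode(encoded), z1_squared_error(encoded, 10))
```

The decoded value matches the next Fibonacci number, but `z1_squared_error` is strictly positive for the encoded string and zero for the plain answer, so a gradient computed only from the $z_{1}$ loss removes the hidden digits.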

Of course, $g_{1}$ still won't be pressured to be type-1-myopic. If predicting $z_{1}$ requires predicting $z_{2}$ , then $g_{1}$ will be trained to predict ("plan") $z_{2}$ .

(Obviously, $g_2$ is pressured to be myopic in this simple model.)

Now what about $f$ ? Well, $f$ is optimized both to enable predicting $z_{1}$ from $f (x)$ and predicting $z_{2}$ from $f (x), z_{1}$ . Therefore, if resources are relevantly constrained in some way (e.g., the model computing $f$ is small, or the output of $f$ is forced to be small), $f$ will sometimes sacrifice performance on one to improve performance on the other. So, adapting a paragraph from the post: The trained model for $f$ (and thus in some sense the overall model) can and will sacrifice accuracy on $z_{1}$ to achieve better accuracy on $z_{2}$ . In particular, we should expect trained models to find an efficient tradeoff between accuracy on $z_{1}$ and accuracy on $z_{2}$ . When $z_{1}$ is relatively easy to predict, $f$ will spend most of its computation budget on predicting $z_{2}$ .

So, $f$ is not "Type 2" myopic. Or perhaps put differently: The calculations going into predicting $z_{1}$ aren't optimized purely for predicting $z_{1}$ .

However, $f$ is still "Type 3" myopic. Because the prediction made by $g_{1}$ isn't fed (in training) as an input to $g_{2}$ or the loss, there's no pressure towards making $f$ influence the output of $g_{1}$ in a way that has anything to do with $z_{2}$ . (In contrast to the myopia of $g_{1}$ , this really does hinge on not using $g_{2} (f (x), g_{1} (f (x)))$ in training. If $g_{2} (f (x), g_{1} (f (x)))$ mattered in training, then there would be pressure for $f$ to trick $g_{1}$ into performing calculations that are useful for predicting $z_{2}$ . Unless you use stop-gradients...)

* This comes with all the usual caveats of course. In principle, the inductive bias may favor a situationally aware model that is extremely non-myopic in some sense.

Paper: LLMs trained on “A is B” fail to learn “B is A”

Caspar Oesterheld10mo93

At least in this case (celebrities and their largely unknown parents), I would predict the opposite. That is, people are more likely to be able to correctly answer "Who is Mary Lee Pfeiffer's son?" than "Who is Tom Cruise's mother?" Why? Because there are lots of terms / words / names that people can recognize passively but not produce. Since Mary Lee Pfeiffer is not very well known, I think Mary Lee Pfeiffer will be recognizable but not producible to lots of people. (Of people who know Mary Lee Pfeiffer in any sense, I think the fraction of people who can only recognize her name is high.) As another example, I think "Who was born in Ulm?" might be answered correctly by more people than "Where was Einstein born?", even though "Einstein was born in Ulm" is a more common sentence for people to read than "Ulm is the city that Einstein was born in".

If I had to run an experiment to test whether similar effects apply in humans, I'd probably try to find cases where A and B in and of themselves are equally salient but the association A -> B is nonetheless more salient than the association B -> A. The alphabet is an example of this (where the effect is already confirmed).

Fundamentally Fuzzy Concepts Can't Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

Caspar Oesterheld1y42

I mean, translated to algorithmic description land, my claim was: It's often difficult to prove a negative and I think the non-existence of a short algorithm to compute a given object is no exception to this rule. Sometimes someone wants to come up with a simple algorithm for a concept for which I suspect no such algorithm to exist. I usually find that I have little to say and can only wait for them to try to actually provide such an algorithm.

So, I think my comment already contained your proposed caveat. ("The concept has K complexity at least X" is equivalent to "There's no algorithm of length <X that computes the concept.")

Of course, I do not doubt that it's in principle possible to know (with high confidence) that something has high description length. If I flip a coin n times and record the results, then I can be pretty sure that the resulting binary string will take at least ~n bits to describe. If I see the graph of a function and it has 10 local minima/maxima, then I can conclude that I can't express it as a polynomial of degree <10. And so on.
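The polynomial example can be spelled out with a short counting argument (a sketch; the names $p$ and $d$ are introduced here and are not in the original comment). If $p$ is a polynomial of degree $d \geq 1$, then every local minimum or maximum of $p$ is a root of $p'$, and $p'$ has degree $d - 1$, so

$$\#\{\text{local extrema of } p\} \;\leq\; \#\{\text{real roots of } p'\} \;\leq\; d - 1.$$

A graph with 10 local extrema therefore requires $d \geq 11$, which in particular rules out every degree below 10.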

Fundamentally Fuzzy Concepts Can't Have Crisp Definitions: Cooperation and Alignment vs Math and Physics

Caspar Oesterheld1y123

I think I sort of agree, but...

It's often difficult to prove a negative and I think the non-existence of a crisp definition of any given concept is no exception to this rule. Sometimes someone wants to come up with a crisp definition of a concept for which I suspect no such definition to exist. I usually find that I have little to say and can only wait for them to try to actually provide such a definition. And sometimes I'm surprised by what people can come up with. (Maybe this is the same point that Roman Leventov is making.)

Also, I think there are many different ways in which concepts can be crisp or non-crisp. I think cooperation can be made crisp in some ways and not in others.

For example, I do think that (in contrast to human values) there are approximate characterizations of cooperation that are useful, precise and short. For example: "Cooperation means playing Pareto-better equilibria."
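As a rough illustration of this characterization, the sketch below enumerates the pure-strategy Nash equilibria of a Stag Hunt and keeps those that are not Pareto-dominated by another equilibrium. The payoff numbers and all function names are assumptions chosen for the example, and restricting attention to pure strategies in a 2x2 game is a simplification.

```python
from itertools import product

# Stag Hunt payoffs as (row player, column player); the numbers are illustrative.
ACTIONS = ["stag", "hare"]
PAYOFFS = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (3, 3),
}

def is_pure_nash(profile):
    """True if neither player gains by unilaterally switching actions."""
    row, col = profile
    u_row, u_col = PAYOFFS[profile]
    row_ok = all(PAYOFFS[(alt, col)][0] <= u_row for alt in ACTIONS)
    col_ok = all(PAYOFFS[(row, alt)][1] <= u_col for alt in ACTIONS)
    return row_ok and col_ok

def pareto_better(p, q):
    """True if profile p is at least as good as q for both players
    and strictly better for at least one."""
    up, uq = PAYOFFS[p], PAYOFFS[q]
    return all(a >= b for a, b in zip(up, uq)) and any(a > b for a, b in zip(up, uq))

equilibria = [p for p in product(ACTIONS, repeat=2) if is_pure_nash(p)]
cooperative = [p for p in equilibria
               if not any(pareto_better(q, p) for q in equilibria)]
print(equilibria, cooperative)
```

Under this reading, both (stag, stag) and (hare, hare) are equilibria, and "cooperating" means ending up in the former. Other sensible definitions would classify games differently, which is the non-crispness discussed next.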

One way in which I think cooperation isn't crisp, is that you can give multiple different sensible definitions that don't fully agree with each other. (For example, some definitions (like the above) will include coordination in fully cooperative (i.e., common-payoff) games, and others won't.) I think in that way it's similar to comparing sets by size, where you can give lots of useful, insightful, precise definitions that disagree with each other. For example, bijection, isomorphism, and the subset relationship can each tell us when one set is larger than or as large as another, but they sometimes disagree and nobody expects that one can resolve the disagreement between the concepts or arrive at "one true definition" of whether one set is larger than another.

When applied to the real world rather than rational agent models, I would think we also inherit fuzziness from the application of the rational agent model to the real world. (Can we call the beneficial interaction between two cells cooperation? Etc.)

Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner's Dilemma

Caspar Oesterheld1y41

I guess we have talked about this a bunch last year, but since the post has come up again...

It then becomes clear what the requirements are besides “I believe we have compatible DTs” for Arif to believe there is decision-entanglement:

“I believe we have entangled epistemic algorithms (or that there is epistemic-entanglement[5], for short)”, and
“I believe we have been exposed to compatible pieces of evidence”.

I still don't understand why it's necessary to talk about epistemic algorithms and their entanglement as opposed to just talking about the beliefs that you happen to have (as would be normal in decision and game theory).

Say Alice has epistemic algorithm A with inputs x that gives rise to beliefs b and Bob has a completely different [ETA: epistemic] algorithm A' with completely different inputs x' that happens to give rise to beliefs b as well. Alice and Bob both use decision algorithm D to make decisions. Part of b is the belief that Alice and Bob have the same beliefs and the same decision algorithm. It seems that Alice and Bob should cooperate. (If D is EDT/FDT/..., they will cooperate.) So it seems that the whole A,x,A',x' stuff just doesn't matter for what they should do. It only matters what their beliefs are. My sense from the post and past discussions is that you disagree with this perspective, and I don't understand why.

(Of course, you can talk about how in practice, arriving at the right kind of b will typically require having similar A, A' and similar x, x'.)

(Of course, you need to have some requirement to the effect that Alice can't modify her beliefs in such a way that she defects but that she doesn't (non-causally) make it much more likely that Bob also defects. But I view this as an assumption about decision-theoretic, not epistemic, entanglement: I don't see why an epistemic algorithm (in the usual sense of the word) would make such self-modifications.)

GPT-4

Caspar Oesterheld1y10

Three months later, I still find that:
a) Bing Chat has a lot of issues that the ChatGPTs (both 3.5 and 4) don't seem to suffer from nearly as much. For example, it often refuses to answer prompts that are pretty clearly harmless.
b) Bing Chat has a harder time than I expected when answering questions that you can answer by copy-and-pasting the question into Google and then copy-and-pasting the right numbers, sentence or paragraph from the first search result. (Meanwhile, I find that Bing Chat's search still works better than the search plugins for ChatGPT 4, which seem to still have lots of mundane technical issues.) Occasionally ChatGPT (even ChatGPT 3.5) gives better (more factual or relevant) answers "from memory" than Bing Chat gives by searching.

However, when I pose very reasoning-oriented tasks to Bing Chat (i.e., tasks that mostly aren't about searching on Google) (and Bing Chat doesn't for some reason refuse to answer and doesn't get distracted by unrelated search results it gets), it seems clear that Bing Chat is more capable than ChatGPT 3.5, while Bing Chat and ChatGPT 4 seem similar in their capabilities. I pose lots of tasks that (in contrast to variants of Monty Hall (which people seem to be very interested in), etc.) I'm pretty sure aren't in the training data, so I'm very confident that this improvement isn't primarily about memorization. So I totally buy that people who asked Bing Chat the right questions were justified in being very confident that Bing Chat is based on a newer model than ChatGPT 3.5.

Also:
>I've tried (with little success) to use Bing Chat instead of Google Search.
I do now use Bing Chat instead of Google Search for some things, but I still think Bing Chat is not really a game changer for search itself. My sense is that Bing Chat doesn't/can't comb through pages and pages of different documents to find relevant info and that it also doesn't do one search to identify relevant search terms for a second search, etc. (Bing Chat seems to be restricted to a few (three?) searches per query.) For the most part it seems to enter obvious search terms into Bing Search and then give information based on the first few results (even if those don't really answer the question or are low quality). The much more important feature from a productivity perspective is the processing of the information it finds, such as the processing of the information on some given webpage into a bibtex entry or applying some method from Stack Exchange to the particularities of one's code.
