bayesed

66

What do you mean when you say you're "willing to bet according to the Kelly Criterion"? If you're proposing a bet with 99% odds and your actual belief that you'll win the bet is also 99%, then the Kelly Criterion would advise against betting (since the EV of such a bet would be zero, i.e. merely neutral).

Perhaps you mean that the other person should come up with the odds, and then you'll determine your bet amount using the Kelly criterion, assuming a 99% probability of winning for yourself.
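To make the point above concrete, here is a minimal sketch of the Kelly calculation. It assumes a simple binary bet where you win `net_odds` times your stake or lose the stake; the function names `kelly_fraction` and `fair_net_odds` are just illustrative, not from anyone's post.

```python
def fair_net_odds(implied_prob: float) -> float:
    """Net payout per unit staked when the odds imply `implied_prob` of winning."""
    if not 0.0 < implied_prob < 1.0:
        raise ValueError("implied_prob must be strictly between 0 and 1")
    return (1.0 - implied_prob) / implied_prob


def kelly_fraction(p_win: float, net_odds: float) -> float:
    """Kelly stake as a fraction of bankroll: f* = p - (1 - p) / b."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be between 0 and 1")
    if net_odds <= 0.0:
        raise ValueError("net_odds must be positive")
    return p_win - (1.0 - p_win) / net_odds


# Belief of 99% and a bet offered at 99% odds: the Kelly stake is (numerically) zero.
stake_at_matching_odds = kelly_fraction(0.99, fair_net_odds(0.99))

# Same belief, but the counterparty offers odds implying only 90%: now Kelly says bet.
stake_at_better_odds = kelly_fraction(0.99, fair_net_odds(0.90))
```

So the stake is only positive once the offered odds are more favorable than your own probability, which is why the odds have to come from the other side before the Kelly amount can be worked out.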

86

Yeah, based on EY's previous tweets regarding this, it seemed like it was supposed to be a TED talk.

43

Example origin scenario of this Nash equilibrium from GPT-4:

In this hypothetical scenario, let's imagine that the prisoners are all part of a research experiment on group dynamics and cooperation. Prisoners come from different factions that have a history of rivalry and distrust.

Initially, each prisoner sets their dial to 30 degrees Celsius, creating a comfortable environment. However, due to the existing distrust and rivalry, some prisoners suspect that deviations from the norm—whether upward or downward—could be a secret signal from one faction to another, possibly indicating an alliance, a plan to escape, or a strategy to undermine the other factions.

To prevent any secret communication or perceived advantage, the prisoners agree that any deviation from the agreed-upon temperature of 30 degrees Celsius should be punished with a higher temperature of 100 degrees Celsius in the next round. They believe that this punishment will deter any secret signaling and maintain a sense of fairness among the factions.

Now, imagine an external party with access to the temperature control system decides to interfere with the experiment. This person disables the dials and changes the temperature to 99 degrees Celsius for a few rounds, heightening the prisoners' confusion and distrust.

When the external party re-enables the dials, the prisoners regain control of the temperature. However, their trust has been severely damaged, and they are now unsure of each other's intentions. In an attempt to maintain fairness and prevent further perceived manipulation, they decide to adopt a strategy of punishing any deviations from the new 99 degrees Celsius temperature.

As a result, the prisoners become trapped in a suboptimal Nash equilibrium where no single prisoner has an incentive to deviate from the 99 degrees Celsius strategy, fearing retaliation in the form of higher temperatures. In this scenario, a combination of technical glitches, external interference, miscommunication, and distrust leads to the transition from an agreed-upon temperature of 30 to a suboptimal Nash equilibrium of 99 degrees Celsius.

As time goes on, this strategy becomes ingrained, and the prisoners collectively settle on an equilibrium temperature of 99 degrees Celsius. They continue to punish any deviations—upward or downward—due to their entrenched suspicion and fear of secret communication or potential advantage for one faction over another. In this situation, the fear of conspiracy and the desire to maintain fairness among the factions lead to a suboptimal Nash equilibrium where no single prisoner has an incentive to deviate from the 99 degrees Celsius strategy.

30

Hmm, I've not seen people refer to (ChatGPT + Code execution plugin) as an LLM. IMO, an LLM is supposed to be a language model consisting of just a neural network with a large number of parameters.

3-2

I'm a bit confused about this post. Are you saying it is theoretically impossible to create an LLM that can do 3*3 matrix multiplication without using chain of thought? That seems false.

The amount of computation an LLM has done so far will be a function of both the size of the LLM (call it the *s* factor) and the number of tokens generated so far (*n*). Let's say multiplying two *n* × *n* matrices requires *cn^3* computation (actually, there are more efficient algos, but it doesn't matter).

You can do this either by using a small LLM and *n^3* tokens, so that *s · n^3 > cn^3*, or by using a bigger LLM, so that *s_big · n > cn^3*. Then you just need *n* tokens.

In general, you can always get a bigger and bigger constant factor to solve problems with higher n.

If your claim is that, for any LLM that works in the same way as GPT, there will exist a value of *n* for which it will stop being capable of doing *n* × *n* matrix multiplication without chain of thought/extra work, I'd cautiously agree.
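A rough sketch of the budget argument, under the toy model used above: each token buys *s* units of computation and the multiplication costs *cn^3*. The helper names and the integer units are my own assumptions for illustration; real models don't have a clean per-token compute unit like this.

```python
import math


def tokens_needed(n: int, s: int, c: int = 1) -> int:
    """Smallest token count t such that s * t >= c * n**3."""
    return -(-(c * n**3) // s)  # ceiling division on integers


def max_n_within_n_tokens(s: int, c: int = 1) -> int:
    """Largest n with s * n >= c * n**3, i.e. n <= sqrt(s / c)."""
    return math.isqrt(s // c)


# A small model (s = 1) needs n^3 tokens for 3 x 3 matrices.
small_model_tokens = tokens_needed(3, s=1)

# A model with 9x the per-token compute fits the same problem into n = 3 tokens.
big_model_tokens = tokens_needed(3, s=9)

# For any fixed s there is still a ceiling: with s = 100, n tokens suffice only up to n = 10.
ceiling_for_s_100 = max_n_within_n_tokens(100)
```

Under this model, a bigger *s* always pushes the ceiling out (it grows like the square root of *s*), but for a fixed model the ceiling exists, which matches the "cautiously agree" case.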

An LLM takes the same amount of computation for each generated token, regardless of how hard it is to predict. This limits the complexity of any problem an LLM is trying to solve.

For a given LLM, yes, there will be a limit on the amount of computation it can do while generating a token. But this amount can be arbitrarily large. And you can always make a bigger/smarter LLM.

Consider two statements:

- "The richest country in North America is the United States of ______"
- "The SHA1 of 'abc123', iterated 500 times, is _______"

An LLM's goal is to predict the best token to fill in the blank given its training and the previous context. Completing statement 1 requires knowledge about the world but is computationally trivial. Statement 2 requires a lot of computation. Regardless, the LLM performs the same amount of work for either statement.

Well, it might do the same amount of "computation" but for problem 1 that might mostly be filler work while for problem 2 it can do intelligent work. You can always use more compute than necessary, so why does statement 1 using the same amount of compute as statement 2 imply any sort of limitation on the LLM?

It cannot correctly solve computationally hard statements like #2. Period. If it could, that would imply that all problems can be solved in constant time, which is provably (and obviously) false.

Why does this matter? It puts some bounds on what an LLM can do.

But it's easy to imagine a huge LLM capable of doing 500 iterations of SHA1 on small strings in one shot (even without memorization). Why do you think that's impossible? (Just imagine a transformer circuit calculating SHA1, repeated 500 times.) This doesn't imply that all problems can be solved in constant time. It just means that the LLM will only be able to do this until the length of the string is bigger than a certain limit. After that, you'll need to make the LLM bigger/smarter.
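For reference, a minimal sketch of what "SHA1 iterated 500 times" costs. I'm assuming "iterated" means feeding each raw 20-byte digest back into SHA1 (the original statement doesn't say whether it's the raw digest or the hex string), and `iterated_sha1`, `sha1_block_count` and `compression_calls` are names I made up for this example.

```python
import hashlib


def iterated_sha1(text: str, rounds: int = 500) -> str:
    """Apply SHA1 `rounds` times, re-hashing the raw digest each round (assumed convention)."""
    data = text.encode("utf-8")
    for _ in range(rounds):
        data = hashlib.sha1(data).digest()
    return data.hex()


def sha1_block_count(num_bytes: int) -> int:
    """Number of 64-byte blocks SHA1 processes: message + 1 padding byte + 8 length bytes."""
    return (num_bytes + 9 + 63) // 64


def compression_calls(input_bytes: int, rounds: int = 500) -> int:
    """Total compression-function calls: the first round sees the input, the rest see 20-byte digests."""
    return sha1_block_count(input_bytes) + (rounds - 1) * sha1_block_count(20)


result = iterated_sha1("abc123")          # 40 hex characters
short_cost = compression_calls(6)         # 'abc123' is 6 bytes
longer_cost = compression_calls(100)      # a 100-byte input needs one extra block
```

The work is a fixed, bounded amount for any input up to a given length, so a big enough fixed circuit could in principle do it in one pass; the cost only grows once the input string gets longer.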

51

I don't think GPT4 can be used with plugins in ChatGPT. It seems to be a different model, probably based on GPT3.5 (evidence: the color of the icon is green, not black; it seems faster than GPT4; there are no limits or quota; there's no explicit mention of GPT4 anywhere in the announcement).

So I think there's a good chance the title is wrong.

10

Additional comments on creative mode by Mikhail (from today):

https://twitter.com/MParakhin/status/1636350828431785984

We will {...increase the speed of creative mode...}, but it probably always be somewhat slower, by definition: it generates longer responses, has larger context.

https://twitter.com/MParakhin/status/1636352229627121665

Our current thinking is to keep maximum quality in Creative, which means slower speed.

https://twitter.com/MParakhin/status/1636356215771938817

Our current thinking about Bing Chat modes:

- Balanced: best for the most common tasks, like search, maximum speed
- Creative: whenever you need to generate new content, longer output, more expressive, slower
- Precise: most factual, minimizing conjectures

So creative mode definitely has larger context size, and might also be a larger model?

118

Based on Mikhail's Twitter comments, 'precise' and 'creative' don't seem to be too much more than simply the 'temperature' hyperparameter for sampling. 'Precise' would presumably correspond to very low, near-zero or zero, highly deterministic samples.

Nope, Mikhail has said the opposite: https://twitter.com/MParakhin/status/1630280976562819072

Nope, the temperature is (roughly) the same.

So I'd guess the main difference is in the prompt.
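For anyone unfamiliar with the hyperparameter being discussed, here's a minimal sketch of what sampling temperature does to a next-token distribution. The logits are made-up numbers, and this is the generic textbook softmax, not anything known about how Bing Chat is actually configured.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

near_greedy = softmax_with_temperature(toy_logits, 0.05)   # almost all mass on the top token
default = softmax_with_temperature(toy_logits, 1.0)        # ordinary softmax
near_uniform = softmax_with_temperature(toy_logits, 100.0) # almost flat
```

If the modes really do share roughly the same temperature, then differences in style have to come from somewhere else, such as the prompt.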

But if that's the case, he could simply mention the amount he's willing to bet. The phrasing kinda suggested to me that he doesn't have all the info needed to do the Kelly calculation yet.