Ege Erdil

If you have any questions for me or just want to talk, feel free to reach out by sending a private message on this site or by sending an e-mail to egeerdil96@gmail.com.

You can also find me on Metaculus at https://www.metaculus.com/accounts/profile/116023/, or on Discord with the username starfall7651.

Comments

It is not a tautology.

Can you explain to me the empirical content of the claim, then? I don't understand what it's supposed to mean.

About the rest of your comment, I'm confused about why you're discussing what happens when both chess engines and humans have a lot of time to do something. For example, what's the point of this statement?

My understanding is that it is not true that if you ran computers for a long time that they would beat the human also running for a long time, and that historically, it's been quite the opposite...

I don't understand how this statement is relevant to any claim I made in my comment. Humans beating computers at equal time control is perfectly consistent with the computers being slower than humans. If you took a human and slowed them down by a factor of 10, you would see exactly the same pattern.

Are you instead trying to find examples of tasks where computers were beaten by humans when given a short time to do the task but could beat the humans when given a long time to do the task? That's a very different claim from "in every case where we've successfully gotten AI to do a task at all, AI has done that task far far faster than humans".

Yes, that's what I'm trying to say, though I think in actual practice the numbers you need would have been much smaller for the Go AIs I'm talking about than they would be for the naive tree search approach.

Sure, but in that case I would not say the AI thinks faster than humans; I would say the AI is faster than humans at a specific range of tasks, namely those it can do in a "reasonable" amount of time.

As I've said elsewhere, there is a tradeoff in ML systems between quality (or breadth) and serial speed: a system that only does one narrow and simple task can do that task at a high serial speed, but as you make systems more general and get them to handle more complex tasks, serial speed tends to fall. The same logic people are using to claim GPT-4 thinks faster than humans should also lead them to conclude that a calculator thinks faster than GPT-4, which is an unproductive way to use the one-dimensional abstraction of "thinking faster vs. slower".

You might ask "Well, why use that abstraction at all? Why not talk about how fast the AIs can do specific tasks instead of trying to come up with some general notion of if their thinking is faster or slower?" I think a big reason is that people typically claim the faster "cognitive speed" of AIs can have impacts such as "accelerating the pace of history", and I'm trying to argue that the case for such an effect is not as trivial to make as some people seem to think.

True, but isn't this almost exactly analogously true for neuron firing speeds? The corresponding period for neurons (10 ms - 1 s) does not generally correspond to the timescale of any useful cognitive work or computation done by the brain.

Yes, which is why you should not be using that metric in the first place.

But even the top-line number is (at least theoretically) a very concrete measure of something that you can actually get out of the system. In contrast, when used in "computational equivalence" estimates of the brain, FLOP/s are (somewhat dubiously, IMO) repurposed as a measure of what the system is doing internally.

Will you still be saying this if future neural networks are running on specialized hardware that, much like the brain, can only execute forward or backward passes of a particular network architecture? I think talking about FLOP/s in this setting makes a lot of sense, because we know the capabilities of neural networks are closely linked to how much training and inference compute they use, but maybe you see some problem with this also?

So even if the 1e15 "computational equivalence" number is right, AND all of that computation is irreducibly a part of the high-level cognitive algorithm that the brain is carrying out, all that means is that it necessarily takes at least 1e15 FLOP/s to run or simulate a brain at neuron-level fidelity. It doesn't mean that you can't get the same high-level outputs of that brain through some other much more computationally efficient process.

I agree, but even if we think future software progress will enable us to get a GPT-4 level model with 10x smaller inference compute, it still makes sense to care about what inference with GPT-4 costs today. The same is true of the brain.

Separately, I think your sequential tokens per second calculation actually does show that LLMs are already "thinking" (in some sense) several OOM faster than humans? 50 tokens/sec is about 5 lines of code per second, or 18,000 lines of code per hour. Setting aside quality, that's easily 100x more than the average human developer can usually write (unassisted) in an hour, unless they're writing something very boilerplate or greenfield.

Yes, but they are not thinking 7 OOM faster. My claim is not that AIs can't think faster than humans; indeed, I think they can. However, current AIs are not thinking faster than humans once you take into account the "quality" of the thinking as well as the rate at which it happens, which is why I think FLOP/s is a more useful measure here than token latency. GPT-4 has higher token latency than GPT-3.5, but I think it's fair to say that GPT-4 is the model that "thinks faster" when asked to accomplish some nontrivial cognitive task.

The main issue with current LLMs (which somewhat invalidates this whole comparison) is that they can pretty much only generate boilerplate or greenfield stuff. Generating large volumes of mostly-useless / probably-nonsense boilerplate quickly doesn't necessarily correspond to "thinking faster" than humans, but that's mostly because current LLMs are only barely doing anything that can rightfully be called thinking in the first place.

Exactly, and the empirical trend is that there is a tradeoff between quality and token latency: if you want to generate tokens at random, it's very easy to do that at extremely high speed. As you increase your demands on the quality you want those tokens to have, you must take more time per token to generate them. So it's not fair to compare a model like GPT-4 to the human brain on grounds of "token latency": I maintain that throughput comparisons (training compute and inference compute) are going to be more informative in general, though software differences between ML models and the brain can still make those comparisons difficult to interpret.

Sure, but from the point of view of per token latency that's going to be a similar effect, no?

I think you might have accidentally linked to your comment instead of the LessWrong post you intended to link to.

Ege Erdil · 3mo

Don't global clock speeds have to go down as die area goes up due to the speed of light constraint?

For instance, if you made a die with 1e15 MAC units and the area scaled linearly, you would be looking at a die that's ~2e9 times larger than the H100's die, which is about 1000 mm^2. Such a die would have an area of around 2 km^2, i.e. linear dimensions on the order of 1.4 km, so the speed of light would limit global clock frequencies to something on the order of c/(1 km) ~= 300 kHz, which is not 1 million times faster than the 1 kHz you attribute to the human brain. If you need multiple round trips for a single clock, the frequencies get even lower.
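
For concreteness, here is a minimal sketch of that back-of-the-envelope calculation. The ~5e5 MACs-per-clock figure for the H100 is my own rough inference from its tensor throughput and clock speed, not an official number, and the single light-speed traversal per clock is an optimistic assumption:

```python
import math

# Minimal sketch of the die-area / speed-of-light estimate above.
# Assumptions (mine, not official figures): the H100 performs ~5e5 MACs per clock,
# its die is ~1000 mm^2, and one light-speed traversal of the die is needed per clock.
C = 3e8                    # speed of light, m/s
H100_DIE_AREA_MM2 = 1000   # approximate H100 die area
H100_MACS_PER_CLOCK = 5e5  # assumed
TARGET_MACS = 1e15         # hypothetical die with 1e15 MAC units

scale = TARGET_MACS / H100_MACS_PER_CLOCK         # ~2e9
die_area_km2 = scale * H100_DIE_AREA_MM2 * 1e-12  # mm^2 -> km^2, ~2 km^2
linear_dim_m = math.sqrt(die_area_km2) * 1e3      # ~1.4e3 m

max_clock_hz = C / linear_dim_m                   # ~2e5 Hz, a few hundred kHz
print(f"die area ~ {die_area_km2:.1f} km^2, side ~ {linear_dim_m / 1e3:.1f} km")
print(f"light-speed-limited global clock ~ {max_clock_hz / 1e3:.0f} kHz")
```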

Maybe when the clock frequencies get this low, you're dissipating so little heat that you can go 3D without worrying too much about heating issues and that buys you something. Still, your argument here doesn't seem that obvious to me, especially if you consider the fact that one round trip for one clock is extremely optimistic if you're trying to do all MACs at once. Remember that GPT-3 is a sequential model; you can't perform all the ops in one clock because later layers need to know what the earlier layers have computed.

Overall I think your comment here is quite speculative. It may or may not be true (I think we'll see), but people shouldn't treat it as if this is obviously something that's feasible to do.

Ege Erdil · 3mo

I think counterexamples are easy to find. For example, chess engines in 1997 could play at the level of top human chess players on consumer hardware, but only if they were given orders of magnitude more time to think than the top humans had available. Around 1997 Deep Blue was of similar strength to Kasparov, but it had to run on a supercomputer; on consumer hardware, chess engines were still only around 2400-2500 Elo. If you ran them for long enough, though, they would obviously have been stronger than even Deep Blue was.
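
As a rough illustration of how large the time-odds multiplier would need to be, here is a sketch assuming a gain of somewhere around 50-100 Elo per doubling of thinking time for engines of that era. Both the Elo figures and the per-doubling gain are rough assumptions on my part, not measured values; on these assumptions the required handicap comes out to roughly one to two orders of magnitude of extra time:

```python
# Illustrative only: how much extra thinking time a ~2450-Elo engine on 1997
# consumer hardware would need to reach ~2800 Elo (roughly Kasparov / Deep Blue).
ENGINE_ELO = 2450  # assumed strength of top engines on consumer hardware
TARGET_ELO = 2800  # assumed strength of Kasparov / Deep Blue

for elo_per_doubling in (50, 70, 100):  # assumed Elo gain per doubling of time
    doublings = (TARGET_ELO - ENGINE_ELO) / elo_per_doubling
    print(f"{elo_per_doubling} Elo/doubling: {doublings:.1f} doublings, "
          f"~{2 ** doublings:.0f}x time odds")
```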

I think the claim that "in every case where we've successfully gotten AI to do a task at all, AI has done that task far far faster than humans" is a tautology because we only say we've successfully gotten AI to do a task when AI can beat the top humans at that task. Nobody said "we got AI to play Go" when AI Go engines were only amateur dan strength, even though they could have equally well said "we got AI to play Go at a superhuman level but it's just very slow".

A non-tautological version might say that the decrease over time in the compute multiplier the AIs need to compete with the top humans is steep, so it takes a short time for the AIs to transition from "much slower than humans" to "much faster than humans" when they are crossing the "human threshold". I think there's some truth to this version of the claim, but it's not really due to any advantage in serial speed on the part of the AIs.

Ege Erdil · 3mo

If there are people who say “current AIs think many orders of magnitude faster than humans”, then I agree that those people are saying something kinda confused and incoherent, and I am happy that you are correcting them.

Eliezer himself has said (e.g. in his 2010 debate with Robin Hanson) that one of the big reasons he thinks CPUs can beat brains is that CPUs run at 1 GHz while brains run at 1-100 Hz, and that the only barrier is that the CPUs are currently running "spreadsheet algorithms" rather than the algorithm used by the human brain. I can find the exact timestamp from the video of the debate if you're interested, but I'm surprised you've never heard this argument from anyone before.

There’s a different claim, “we will sooner or later have AIs that can think and act at least 1-2 orders of magnitude faster than a human”. I see that claim as probably true, although I obviously can’t prove it.

I think this claim is too ill-defined to be true, unfortunately, but insofar as it has the shape of something I think will be true, it will be because of throughput or software progress, not because of latency.

I agree that the calculation “1 GHz clock speed / 100 Hz neuron firing rate = 1e7” is not the right calculation (although it’s not entirely irrelevant). But I am pretty confident about the weaker claim of 1-2 OOM, given some time to optimize the (future) algorithms.

If the claim here is that "for any task, there will be some AI system using some unspecified amount of inference compute that does the task 1-2 OOM faster than humans", I would probably agree with that claim. My point is that if this is true, it won't be because of the calculation “1 GHz clock speed / 100 Hz neuron firing rate = 1e7”, which as far as I can tell you seem to agree with.

Ege Erdil · 3mo

As far as I know, in every case where we've successfully gotten AI to do a task at all, AI has done that task far far faster than humans. When we had computers that could do arithmetic but nothing else, they were still much faster at arithmetic than humans. Whatever your view on the quality of recent AI-generated text or art, it's clear that AI is producing it much much faster than human writers or artists can produce text/art.

"Far far faster" is an exaggeration that conflates vastly different orders of magnitude with each other. When compared against humans, computers are many orders of magnitude faster at doing arithmetic than they are at generating text: a human can write perhaps one word per second when typing quickly, while an LLM's serial speed of 50 tokens/sec maybe corresponds to 20 words/sec or so. That's just a ~ 1.3 OOM difference, to be contrasted with 10 OOMs or more at the task of multiplying 32-bit integers, for instance. Are you not bothered at all by how wide the chasm between these two quantities seems to be, and whether it might be a problem for your model of this situation?

In addition, we know that this could be faster if we were willing to accept lower-quality outputs, for example by having fewer layers in an LLM. There is a tradeoff between quality and serial speed, so ignoring quality and just looking at the speed at which text is generated is not a good comparison. There's a reason GPT-3.5 has lower per-token latency than GPT-4.
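
As a toy illustration of why deeper (and typically higher-quality) models are slower per token on the same hardware, here is a minimal sketch assuming each layer must execute sequentially for every generated token. The layer counts and per-layer times are made up for illustration, not real GPT-3.5/GPT-4 figures:

```python
def tokens_per_sec(num_layers: int, per_layer_ms: float) -> float:
    # Toy model: per-token latency is just num_layers * per_layer_ms,
    # since each layer has to wait for the previous layer's output.
    return 1000.0 / (num_layers * per_layer_ms)

# Made-up illustrative numbers for a shallower vs. a deeper model on the same hardware.
print(f"shallow model (40 layers):  ~{tokens_per_sec(40, 0.5):.0f} tokens/sec")   # ~50
print(f"deep model   (120 layers):  ~{tokens_per_sec(120, 0.5):.0f} tokens/sec")  # ~17
```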
