Ege Erdil

If you have any questions for me or just want to talk, feel free to reach out by sending a private message on this site or by sending an e-mail to egeerdil96@gmail.com.

You can also find me on Metaculus at https://www.metaculus.com/accounts/profile/116023/, or on Discord with the username starfall7651.

Comments

This brings up another important point which is that a lot of externalities are impossible to calculate, and therefore such approaches end up fixating on the part that seems calculable without even accounting for (or even noticing) the incalculable part. If the calculable externalities happen to be opposed to larger incalculable externalities, then you can end up worse off than if you had never tried.

I think this is correct as a conditional statement, but I don't think one can deduce the unconditional implication that attempting to price some externalities in domains where many externalities are difficult to price is generally bad.

As applied to the gun externality question, you could theoretically offer a huge payday to the gun shop that sold the firearm used to stop a spree shooting in progress, but you still need a body to count before paying out.

The nice feature of positive payments by the government (instead of fines, i.e. negative payments by the government) is that the judgment-proof defendant problem goes away, so there's no reason to actually make these payments to the gun shop at all: you can just directly pay the person who stops the shooting, which probably provides much better incentives to be a Good Samaritan without the shop trying to pass along this incentive to gun buyers.

I think this applies well to AI, because absent a scenario where gray goo rearranges everyone into paperclips (in which case everyone pays with their life anyway), a lot of the benefits and harms are likely to be illegible. If AI chatbots end up swaying the next election, what is the dollar value we need to stick on someone? How do we know if it's even positive or negative, or if it even happened? If we latch onto the one measurable thing, that might not help.

I don't agree that most of the benefits of AI are likely to be illegible. I expect plenty of them to take the form of new consumer products that were not available before, for example. "A lot of the benefits" is a weaker phrasing and I don't quite know how to interpret it, but I thought it was worth flagging my disagreement with the adjacent phrasing I used.

Ege Erdil0-2

In general, I don't agree with arguments of the form "it's difficult to quantify the externalities so we shouldn't quantify anything and ignore all external effects" modulo concerns about public choice ("what if the policy pursued is not what you would recommend but some worse alternative?"), which are real and serious, though out of the scope of my argument. There's no reason a priori to suppose that any positive or negative effects not currently priced will be of the same order of magnitude.

If you think there are benefits to having a population where most people own guns that are not going to be captured by the incentives of individuals who purchase guns for their own purposes, it's better to try to estimate what that effect size is and then provide appropriate incentives to people who want to purchase guns. The US government pursues such policies in other domains: for example, one of the motivations that led to the Jones Act was the belief that the market would not assign sufficient value to the US maintaining a large domestic shipbuilding industry in peacetime.

In addition, I would dispute that some of these are in fact external effects by necessity. You can imagine some of them being internalized, e.g. by governments offering rewards to citizens who prevent crime (which gives an extra incentive to such people to purchase guns as it would make their interventions more effective). Even the crime prevention benefit could be internalized to a great extent by guns being sold together with a kind of proof-of-ownership that is hard to counterfeit, similar to the effect that open carry policies have in states which have them.

There's a more general public choice argument against this kind of policy, which is that governments lack the incentives to actually discover the correct magnitude of the externalities and then intervene in the appropriate way to maximize efficiency or welfare. I think that's true in general, and in the specific case of guns it might be a reason to not want the government to do anything at all, but in my opinion that argument becomes less compelling when the potential harms of a technology are large enough.

Ege Erdil3-1

If the risk is sufficiently high, then the shops would simply not sell guns to anyone who seemed like they might let their guns be stolen, for example. Note that the shops would still be held liable for any harm that occurs as a result of any gun they have sold, irrespective of whether the buyer was also the perpetrator of the harm.

In practice, the risk of a gun sold to a person with a safe background being used in such an act is probably not that large, so such a measure doesn't need to be taken: the shop can just sell the guns at a somewhat inflated price to compensate for the risk of the gun being misused in some way, and this is efficient. If you were selling e.g. nuclear bombs instead of guns, then you would demand any prospective buyer meet a very high standard of safety before selling them anything, as the expected value of the damages in this case would be much higher.
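The pricing logic above can be sketched as a simple expected-value calculation. This is only an illustration: the function name `liability_markup` and every number below are hypothetical assumptions chosen to show the shape of the argument, not estimates of real risks or damages.

```python
def liability_markup(p_misuse: float, expected_damages: float) -> float:
    """Expected liability per sale: probability of misuse times damages owed."""
    return p_misuse * expected_damages


# Hypothetical inputs: the same small misuse probability for both products,
# but very different damages if misuse does occur.
gun_markup = liability_markup(p_misuse=1e-4, expected_damages=5_000_000)
bomb_markup = liability_markup(p_misuse=1e-4, expected_damages=1e12)

print(f"gun markup per sale:  ${gun_markup:,.0f}")
print(f"bomb markup per sale: ${bomb_markup:,.0f}")
```

With inputs like these, the gun markup is a modest surcharge a shop could simply add to the price, while the markup for the far more destructive product is so large that screening buyers heavily becomes the cheaper option.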

The police arresting people who steal guns does nothing to fix the problem of shootings if the gun is used shortly after it is stolen, and police are not very good at tracking down stolen items to begin with, so I don't understand the point of your example.

Open source might be viable if it's possible for the producers to add safeguards into the model that cannot be trivially undone by cheap fine-tuning, but yeah, I would agree with that given the current lack of techniques for doing this successfully.

The shop has the ability to invest more in security if they will be held liable for subsequent harm. They can also buy insurance themselves and pass on the cost to people who do purchase guns legally as an additional operating expense.

It is not a tautology.

Can you explain to me the empirical content of the claim, then? I don't understand what it's supposed to mean.

About the rest of your comment, I'm confused about why you're discussing what happens when both chess engines and humans have a lot of time to do something. For example, what's the point of this statement?

My understanding is that it is not true that if you ran computers for a long time that they would beat the human also running for a long time, and that historically, it's been quite the opposite...

I don't understand how this statement is relevant to any claim I made in my comment. Humans beating computers at equal time control is perfectly consistent with the computers being slower than humans. If you took a human and slowed them down by a factor of 10, that's the same pattern you would see.

Are you instead trying to find examples of tasks where computers were beaten by humans when given a short time to do the task but could beat the humans when given a long time to do the task? That's a very different claim from "in every case where we've successfully gotten AI to do a task at all, AI has done that task far far faster than humans".

Yes, that's what I'm trying to say, though I think in actual practice the numbers you need would have been much smaller for the Go AIs I'm talking about than they would be for the naive tree search approach.

Sure, but in that case I would not say the AI thinks faster than humans, I would say the AI is faster than humans at a specific range of tasks where the AI can do those tasks in a "reasonable" amount of time.

As I've said elsewhere, there is a quality or breadth vs serial speed tradeoff in ML systems: a system that only does one narrow and simple task can do that task at a high serial speed, but as you make systems more general and get them to handle more complex tasks, serial speed tends to fall. The same logic that people are using to claim GPT-4 thinks faster than humans should also lead them to think a calculator thinks faster than GPT-4, which is an unproductive way to use the one-dimensional abstraction of "thinking faster vs. slower".

You might ask "Well, why use that abstraction at all? Why not talk about how fast the AIs can do specific tasks instead of trying to come up with some general notion of whether their thinking is faster or slower?" I think a big reason is that people typically claim the faster "cognitive speed" of AIs can have impacts such as "accelerating the pace of history", and I'm trying to argue that the case for such an effect is not as trivial to make as some people seem to think.

True, but isn't this almost exactly analogously true for neuron firing speeds? The corresponding period for neurons (10 ms - 1 s) does not generally correspond to the timescale of any useful cognitive work or computation done by the brain.

Yes, which is why you should not be using that metric in the first place.

But even the top-line number is (at least theoretically) a very concrete measure of something that you can actually get out of the system. In contrast, when used in "computational equivalence" estimates of the brain, FLOP/s are (somewhat dubiously, IMO) repurposed as a measure of what the system is doing internally.

Will you still be saying this if future neural networks are running on specialized hardware that, much like the brain, can only execute forward or backward passes of a particular network architecture? I think talking about FLOP/s in this setting makes a lot of sense, because we know the capabilities of neural networks are closely linked to how much training and inference compute they use, but maybe you see some problem with this also?

So even if the 1e15 "computational equivalence" number is right, AND all of that computation is irreducibly a part of the high-level cognitive algorithm that the brain is carrying out, all that means is that it necessarily takes at least 1e15 FLOP/s to run or simulate a brain at neuron-level fidelity. It doesn't mean that you can't get the same high-level outputs of that brain through some other much more computationally efficient process.

I agree, but even if we think future software progress will enable us to get a GPT-4 level model with 10x smaller inference compute, it still makes sense to care about what inference with GPT-4 costs today. The same is true of the brain.

Separately, I think your sequential tokens per second calculation actually does show that LLMs are already "thinking" (in some sense) several OOM faster than humans? 50 tokens/sec is about 5 lines of code per second, or 18,000 lines of code per hour. Setting aside quality, that's easily 100x more than the average human developer can usually write (unassisted) in an hour, unless they're writing something very boilerplate or greenfield.

Yes, but they are not thinking 7 OOM faster. My claim is not that AIs can't think faster than humans; indeed, I think they can. However, current AIs are not thinking faster than humans when you take into account the "quality" of the thinking as well as the rate at which it happens, which is why I think FLOP/s is a more useful measure here than token latency. GPT-4 has higher token latency than GPT-3.5, but I think it's fair to say that GPT-4 is the model that "thinks faster" when asked to accomplish some nontrivial cognitive task.
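The arithmetic in the quoted tokens-per-second estimate can be checked with a short sketch. The figure of 10 tokens per line of code is an assumption inferred from the quoted "50 tokens/sec is about 5 lines of code per second", and the variable names are hypothetical.

```python
import math

TOKENS_PER_SECOND = 50   # figure from the quoted comment
TOKENS_PER_LINE = 10     # assumption implied by "about 5 lines of code per second"
CLAIMED_SPEEDUP = 100    # "easily 100x more than the average human developer"

lines_per_second = TOKENS_PER_SECOND / TOKENS_PER_LINE
lines_per_hour = lines_per_second * 3600

# Human output rate implied by the claimed 100x ratio (not a measured value).
implied_human_lines_per_hour = lines_per_hour / CLAIMED_SPEEDUP

# Express the claimed speedup in orders of magnitude (OOM).
speedup_in_oom = math.log10(CLAIMED_SPEEDUP)

print(f"LLM: {lines_per_hour:.0f} lines/hour")
print(f"Implied human rate: {implied_human_lines_per_hour:.0f} lines/hour")
print(f"Speedup: about {speedup_in_oom:.0f} OOM")
```

Under these assumptions the quoted estimate works out to a speedup of roughly 2 OOM in raw line output, well short of the 7 OOM figure disputed above.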

The main issue with current LLMs (which somewhat invalidates this whole comparison) is that they can pretty much only generate boilerplate or greenfield stuff. Generating large volumes of mostly-useless / probably-nonsense boilerplate quickly doesn't necessarily correspond to "thinking faster" than humans, but that's mostly because current LLMs are only barely doing anything that can rightfully be called thinking in the first place.

Exactly, and the empirical trend is that there is a quality-token latency tradeoff: if you want to generate tokens at random, it's very easy to do that at extremely high speed. As you increase your demands on the quality you want these tokens to have, you must take more time per token to generate them. So it's not fair to compare a model like GPT-4 to the human brain on grounds of "token latency": I maintain that throughput comparisons (training compute and inference compute) are going to be more informative in general, though software differences between ML models and the brain can still make it not straightforward to interpret those comparisons.

Sure, but from the point of view of per token latency that's going to be a similar effect, no?
