All of Max H's Comments + Replies

My point is that there is a conflict for divergent series though, which is why 1 + 2 + 3 + … = -1/12 is confusing in the first place. People (wrongly) expect the extension of + and = to infinite series to imply stuff about approximations of partial sums and limits even when the series diverges.

My own suggestion for clearing up this confusion is that we should actually use less overloaded / extended notation even for convergent sums, e.g.  seems just as readable as the usual  and  notation.

 In precisely the same sense that we can write


despite the fact that no real-world process of "addition" involving infinitely many terms may be performed in a finite number of steps, we can write

1 + 2 + 3 + … = -1/12

Well, not precisely. Because the first series converges, there's a whole bunch more we can practically do with the equivalence-assignment in the first series, like using it as an approximation for the sum of any finite number of terms. -1/12 is a terrible approximation for any of the partial sums of the second series.
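A quick numeric sketch of this point (my own illustration, not from the original thread): late partial sums of a convergent series approximate its limit, while partial sums of 1 + 2 + 3 + … race away from -1/12.

```python
def partial_sums(terms, n):
    """Return the first n partial sums of the series whose k-th term is terms(k)."""
    total = 0.0
    sums = []
    for k in range(1, n + 1):
        total += terms(k)
        sums.append(total)
    return sums

# Convergent: 1/2 + 1/4 + 1/8 + ... -> 1, so late partial sums approximate the limit.
geometric = partial_sums(lambda k: 0.5 ** k, 20)
print(geometric[-1])  # ~1.0

# Divergent: 1 + 2 + 3 + ...; the partial sums grow without bound,
# so -1/12 is a terrible approximation for any of them.
naturals = partial_sums(lambda k: float(k), 20)
print(naturals[-1])  # 210.0, nowhere near -1/12
```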

IMO the use o... (read more)

Shankar Sivarajan · 6d
The point I was trying to make is that we already have perfectly good notation for sums, namely the + and = signs, that we've already extended well beyond the (apocryphal) original use of adding finite sets of positive integers. As long as there's no conflict in meaning (where saying "there's no answer" or "it's divergent" doesn't count) extending it further is fine.

True, but isn't this almost exactly analogously true for neuron firing speeds? The corresponding period for neurons (10 ms - 1 s) does not generally correspond to the timescale of any useful cognitive work or computation done by the brain.

Yes, which is why you should not be using that metric in the first place.

Well, clock speed is a pretty fundamental parameter in digital circuit design. For a fixed circuit, running it at a 1000x slower clock frequency means an exactly 1000x slowdown. (Real integrated circuits are usually designed to operate in a specific ... (read more)

The clock speed of a GPU is indeed meaningful: there is a clock inside the GPU that provides some signal that's periodic at a frequency of ~ 1 GHz. However, the corresponding period of ~ 1 nanosecond does not correspond to the timescale of any useful computations done by the GPU.

True, but isn't this almost exactly analogously true for neuron firing speeds? The corresponding period for neurons (10 ms - 1 s) does not generally correspond to the timescale of any useful cognitive work or computation done by the brain.

The human brain is estimated to do the comp

... (read more)
Ege Erdil · 1mo
Yes, which is why you should not be using that metric in the first place.

Will you still be saying this if future neural networks are running on specialized hardware that, much like the brain, can only execute forward or backward passes of a particular network architecture? I think talking about FLOP/s in this setting makes a lot of sense, because we know the capabilities of neural networks are closely linked to how much training and inference compute they use, but maybe you see some problem with this also?

I agree, but even if we think future software progress will enable us to get a GPT-4 level model with 10x smaller inference compute, it still makes sense to care about what inference with GPT-4 costs today. The same is true of the brain.

Yes, but they are not thinking 7 OOM faster. My claim is not that AIs can't think faster than humans; indeed, I think they can. However, current AIs are not thinking faster than humans when you take into account the "quality" of the thinking as well as the rate at which it happens, which is why I think FLOP/s is a more useful measure here than token latency. GPT-4 has higher token latency than GPT-3.5, but I think it's fair to say that GPT-4 is the model that "thinks faster" when asked to accomplish some nontrivial cognitive task.

Exactly, and the empirical trend is that there is a quality-token latency tradeoff: if you want to generate tokens at random, it's very easy to do that at extremely high speed. As you increase your demands on the quality you want these tokens to have, you must take more time per token to generate them. So it's not fair to compare a model like GPT-4 to the human brain on grounds of "token latency": I maintain that throughput comparisons (training compute and inference compute) are going to be more informative in general, though software differences between ML models and the brain can still make it not straightforward to interpret those comparisons.

I haven't read every word of the 200+ comments across all the posts about this, but has anyone considered how active heat sources in the room could confound / interact with efficiency measurements that are based only on air temperatures? Or be used to make more accurate measurements, using a different (perhaps nonstandard) criterion for efficiency?

Maybe from the perspective of how comfortable you feel, the only thing that matters is air temperature.

But consider an air conditioner that cools a room with a bunch of servers or space heaters in it to an equili... (read more)

Part of this is that I don't share other people's picture about what AIs will actually look like in the future. This is only a small part of my argument, because my main point is that we should use analogies much less frequently, rather than switch to different analogies that convey different pictures.

You say it's only a small part of your argument, but to me this difference in outlook feels like a crux. I don't share your views of what the "default picture" probably looks like, but if I did, I would feel somewhat differently about the use of analogie... (read more)

a position of no power and moderate intelligence (where it is now)

Most people are quite happy to give current AIs relatively unrestricted access to sensitive data, APIs, and other powerful levers for effecting far-reaching change in the world. So far, this has actually worked out totally fine! But that's mostly because the AIs aren't (yet) smart enough to make effective use of those levers (for good or ill), let alone be deceptive about it.

To the degree that people don't trust AIs with access to even more powerful levers, it's usually because they fear ... (read more)

Is it "inhabiting the other's hypothesis" vs. "finding something to bet on"?

Yeah, sort of. I'm imagining two broad classes of strategy for resolving an intellectual disagreement:

  • Look directly for concrete differences of prediction about the future, in ways that can be suitably operationalized for experimentation or betting. The strength of this method is that it almost-automatically keeps the conversation tethered to reality; the weakness is that it can lead to a streetlight effect of only looking in places where the disagreement can be easily operationali
... (read more)

The Cascading Style Sheets (CSS) language that web pages use for styling HTML is a pretty representative example of surprising Turing Completeness:


Haha. Perhaps higher entities somewhere in the multiverse are emulating human-like agents on ever more exotic and restrictive computing substrates, the way humans do with Doom and Mario Kart.

(Front page of 5-D aliens' version of Hacker News: "I got a reflective / self-aware / qualia-experiencing consciousness running on a recycled first-gen smart toaster".)

Semi-related to the idea of substrate ultimately n... (read more)

ok, so not attempting to be comprehensive:

  • Energy abundance...

I came up with a similar kind of list here!

I appreciate both perspectives here, but I lean more towards kave's view: I'm not sure how much overall success hinges on whether there's an explicit Plan or overarching superstructure to coordinate around.

I think it's plausible that if a few dedicated people / small groups manage to pull off some big enough wins in unrelated areas (e.g. geothermal permitting or prediction market adoption), those successes could snowball in lots of different directions p... (read more)

I think one of the potential cruxes here is how many of the necessary things are fun or difficult in the right way. Like, sure, it sounds neat to work at a geothermal startup and solve problems, and that could plausibly be better than playing video games. But, does lobbying for permitting reform sound fun to you? The secret of video games is that all of the difficulty is, in some deep sense, optional, and so can be selected to be interesting. ("What is drama, but life with the dull bits cut out?") The thing that enlivens the dull bits of life is the bigger meaning, and it seems to me like the superstructure is what makes the bigger meaning more real and less hallucinatory.

This seems possible to me, but I think most of the big successes that I've seen have looked more like there's some amount of meta-level direction. Like, I think Elon Musk's projects make more sense if your frame is "someone is deliberately trying to go to Mars and fill out the prerequisites for getting there". Lots of historical eras have people doing some sort of meta-level direction like this. But also we might just remember the meta-level direction that was 'surfing the wave' instead of pushing the ocean, and many grand plans have failed.


Does anyone who knows more neuroscience and anatomy than me know if there are any features of the actual process of humans learning to use their appendages (e.g. an infant learning to curl / uncurl their fingers) that correspond to the example of the robot learning to use its actuator?

Like, if we assume certain patterns of nerve impulses represent different probabilities, can we regard human hands as "friendly actuators", and the motor cortex as learning the fix points (presumably mostly during infancy)?

That would be really cool.

But that's not really where we are at---AI systems are able to do an increasingly good job of solving increasingly long-horizon tasks. So it just seems like it should obviously be an update, and the answer to the original question


One reason that current AI systems aren't a big update about this for me is that they're not yet really automating stuff that couldn't in-principle be automated with previously-existing technology. Or at least the kind of automation isn't qualitatively different.

Like, there's all sorts of technologies that enable increasing ... (read more)

Yeah, I don't think current LLM architectures, with ~100s of attention layers or whatever, are actually capable of anything like this.

But note that the whole plan doesn't necessarily need to fit in a single forward pass - just enough of it to figure out what the immediate next action is. If you're inside of a pre-deployment sandbox (or don't have enough situational awareness to tell), the immediate next action of any plan (devious or not) probably looks pretty much like "just output a plausible probability distribution on the next token given the current c... (read more)

A language model itself is just a description of a mathematical function that maps input sequences to output probability distributions on the next token.

Most of the danger comes from evaluating a model on particular inputs (usually multiple times using autoregressive sampling) and hooking up those outputs to actuators in the real world (e.g. access to the internet or human eyes).
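A minimal toy sketch of this framing (the model and its rule are invented for illustration; real LLMs differ in every detail): the "model" is a pure function from a token sequence to a next-token probability distribution, and autoregressive sampling is a loop that repeatedly evaluates that function and feeds its own output back in.

```python
import random

def toy_model(tokens):
    """Map a token sequence to a probability distribution over the next token."""
    # Invented rule: after "a" prefer "b"; otherwise prefer "a".
    if tokens and tokens[-1] == "a":
        return {"a": 0.1, "b": 0.8, "<eos>": 0.1}
    return {"a": 0.8, "b": 0.1, "<eos>": 0.1}

def sample(model, prompt, max_steps=10, rng=None):
    """Autoregressive sampling: repeatedly evaluate the model on its own output."""
    rng = rng or random.Random(0)
    tokens = list(prompt)
    for _ in range(max_steps):
        dist = model(tokens)
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

print(sample(toy_model, ["a"]))
```

The function definition on its own is inert; it's the sampling loop, plus whatever the emitted tokens are hooked up to, that acts on the world.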

A sufficiently capable model might be dangerous if evaluated on almost any input, even in very restrictive environments, e.g. during training when no human is even looking at the o... (read more)

Alexander Gietelink Oldenziel · 3mo
I notice I am confused by this. Seems implausible that an LLM can execute a devious x-risk plan in a single forward pass based on a wrong prompt.

Related to We don’t trade with ants: we don't trade with AI.

The original post was about reasons why smarter-than-human AI might (not) trade with us, by examining an analogy between humans and ants.

But current AI systems actually seem more like the ants (or other animals), in the analogy of a human-ant (non-)trading relationship.

People trade with OpenAI for access to ChatGPT, but there's no way to pay a GPT itself to get it do something or perform better as a condition of payment, at least in a way that the model itself actually understands and enforces. (W... (read more)

Also seems pretty significant:

As a part of this transition, Greg Brockman will be stepping down as chairman of the board and will remain in his role at the company, reporting to the CEO.

The remaining board members are:

OpenAI chief scientist Ilya Sutskever, independent directors Quora CEO Adam D’Angelo, technology entrepreneur Tasha McCauley, and Georgetown Center for Security and Emerging Technology’s Helen Toner.

Has anyone collected their public statements on various AI x-risk topics anywhere?

Adam D'Angelo via X:

Oct 25

This should help access to AI diffuse throughout the world more quickly, and help those smaller researchers generate the large amounts of revenue that are needed to train bigger models and further fund their research.

Oct 25

We are especially excited about enabling a new class of smaller AI research groups or companies to reach a large audience, those who have unique talent or technology but don’t have the resources to build and market a consumer application to mainstream consumers.

Sep 17

This is a pretty good articulation of the uni... (read more)

Thanks, edited.

Has anyone collected their public statements on various AI x-risk topics anywhere?

A bit, not shareable.

Helen is an AI safety person. Tasha is on the Effective Ventures board. Ilya leads superalignment. Adam signed the CAIS statement

I couldn't remember where from, but I know that Ilya Sutskever at least takes x-risk seriously. I remember him recently going public about how failing alignment would essentially mean doom. I think it was published as an article on a news site rather than an interview, which is what he usually does. Someone with a way better memory than me could find it.

EDIT: Nevermind, found them.

But as a test, may I ask what you think the x-axis of the graph you drew is? Ie: what are the amplitudes attached to?

Position, but it's not meant to be an actual graph of a wavefunction pdf; just a way to depict how the concepts can be sliced up in a way I can actually draw in 2 dimensions.

If you do treat it as a pdf over position, a more accurate way to depict the "world" concept might be as a line which connects points on the diagram for each time step. So for a fixed time step, a world is a single point on the diagram, representing a sample from the pdf defined by the wavefunction at that time.

"position" is nearly right. The more correct answer would be "position of one photon".  If you had two electrons, say, you would have to consider their joint configuration. For example, one possible wavefunction would look like the following, where the blobs represent high amplitude areas: This is still only one dimensional: the two electrons are at different points along a line. I've entangled them, so if electron 1 is at position P, electron 2 can't be.  Now, try and point me to where electron 1 is on the graph above.  You see, I'm not graphing electrons here, and neither were you. I'm graphing the wavefunction. This is where your phrasing seems a little weird: you say the electron is the collection of amplitudes you circled: but those amplitudes are attached to configurations saying "the electron is at position x1" or "the electron is at position x2". It seems circular to me. Why not describe that lump as "a collection of worlds where the electron is in a similar place"?  If you have N electrons in a 3d space, the wavefunction is not a vector in 3d space (god I wish, it would make my job a lot easier). It's a vector in 3N+1 dimensions, like the following: where r1, r2, etc are pointing to the location of electron 1, 2, 3, etc, and each possible configuration of electron 1 here, electron 2 there, etc, has an amplitude attached, with configurations that are more often encountered experimentally empirically having higher amplitudes. 

Here's a crude Google Drawing of t = 0 to illustrate what I mean:



Both the concept of a photon and the concept of a world are abstractions on top of what is ultimately just a big pile of complex amplitudes; illusory in some sense.

I agree that talking in terms of many worlds ("within the context of world A...") is normal and natural. But sometimes it makes sense to refer to and name concepts which span across multiple (conceptual) worlds.

I'm not claiming the conceptual boundaries I've drawn or terminology I've used in the diagram above are standa... (read more)

Nice graph!

But as a test, may I ask what you think the x-axis of the graph you drew is? Ie: what are the amplitudes attached to?

I think you've already agreed (or at least not objected to) saying that the detector "found the photon" is fine within the context of world A. I assume you don't object to me saying that I will find the detector flashing with probability 0.5. And I assume you don't think me and the detector should be treated differently.

So I don't think there's any actual objection left here, you just seem vaguely annoyed that I mentioned the empirical fact that amplitudes can be linked to probabilities of outcomes. I'm not gonna apologise for that.

I don't think that will happen as a foregone conclusion, but if we pour resources into improved methods of education (for children and adults), global health, pronatalist policies in wealthy countries, and genetic engineering, it might at least make a difference. I wouldn't necessarily say any of this is likely to work or even happen, but it seems at least worth a shot.

Seth Herd · 3mo
I was thinking more of the memetic spread of "wisdom" - principles that make you effectively smarter in important areas. Rationality is one vector, but there are many others. I realize the internet is making us dumber on average in many ways, but I think there's a countercurrent of spreading good advice that's recognized as good advice. Anyone with an internet connection can now get smarter by watching attractively-packaged videos. There's a lot of misleading and confusing stuff, but my impression is that a lot of the cream does rise to the top if you're actively searching for wisdom in any particular area.

This post received a lot of objections of the flavor that many of the ideas and technologies I am a fan of either won't work or wouldn't make a difference if they did.

I don't even really disagree with most of these objections, which I tried to make clear up front with apparently-insufficient disclaimers in the intro that include words like "unrealistic", "extremely unlikely", and "speculative".

Following the intro, I deliberately set aside my natural inclination towards pessimism and focused on the positive aspects and possibilities of non-AGI technology.

H... (read more)

If the photon were only a quantum of energy which is entirely absorbed by the detector that actually fires, how could it have any causal effects (e.g. destructive interference) on the pathway where it isn't detected?

OTOH, if your definition of "quanta of energy" includes the complex amplitude in the unmeasured path, then I think it's more accurate to say that the detector finds or measures a component of the photon, rather than that it detects the photon itself. Why should the unmeasured component be any less real or less part of the photon than the measure... (read more)

Okay, let me break it down in terms of actual states, and this time, let's add in the actual detection mechanism, say an electron in a potential well. Say the detector is in the ground state energy, E=0, and the absorption of a photon will bump it up to the next highest state, E=1. We will place this detector in path A, but no detector in path B.

At time t = 0, our toy wavefunction is:

1/sqrt2 |photon in path A, detector E=0> + 1/sqrt2 |photon in path B, detector E=0>

If the photon in A collides with the detector at time t = 1, then at time t = 2, our evolved wavefunction is:

1/sqrt2 |no free photon, detector E=1> + 1/sqrt2 |photon in path B, detector E=0>

Within the context of world A, a photon was found by the detector. This is a completely normal way to think and talk about this.

I think it's straight up wrong to say "the photon is in the detector and in path B". Nature doesn't label photons, and it doesn't distinguish between them. And what is actually in world A is an electron in a higher energy state: it would be weird to say it "contains" a photon inside of it. Quantum mechanics does not keep track of individual objects, it keeps track of configurations of possible worlds, and assigns amplitudes to each possible way of arranging everything.
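For concreteness, the toy two-branch states in this comment can be written out numerically (same amplitudes as in the comment; the encoding as dictionaries mapping configurations to amplitudes is my own):

```python
import math

s = 1 / math.sqrt(2)

# t = 0: photon in a superposition of paths; detector in its ground state.
psi_t0 = {
    ("photon in path A", "detector E=0"): s,
    ("photon in path B", "detector E=0"): s,
}

# t = 2: in the path-A branch the photon's energy has been absorbed,
# bumping the detector's electron up to E=1.
psi_t2 = {
    ("no free photon", "detector E=1"): s,
    ("photon in path B", "detector E=0"): s,
}

# Born rule: each configuration's probability is |amplitude|^2,
# so each branch gets probability 0.5 at both times.
for psi in (psi_t0, psi_t2):
    probs = {cfg: abs(a) ** 2 for cfg, a in psi.items()}
    assert abs(sum(probs.values()) - 1.0) < 1e-12
```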

I'm a many-worlder, yes. But my objection to "finding a photon" is actually that it is an insufficiently reductive treatment of wave-particle duality - a photon can sometimes behave like a little billiard ball, and sometimes like a wave. But that doesn't mean photons themselves are sometimes waves and sometimes particles - the only thing that a photon can be that exhibits those different behaviors in different contexts is the complex amplitudes themselves.

The whole point of the theory is that detectors and humans are treated the same way. In one world, t

... (read more)
What part of "finding a photon" implies that the photon is a billiard ball? Wave-particle duality aside, a photon is a quantum of energy: the detector either finds that packet or it doesn't (or in many worlds, one branched detector finds it and the other branched detector doesn't).

I'm interested to hear more about how you interpret the "realness" of different branches. Say there is an electron in one of my pinky fingers that is in a superposition of spin up and spin down. Are there correspondingly two me's, one with pinky electron up and one with pinky electron down? Or is there a single me, described by the superposition of pinky electrons?
An important point about detecting the photon is that the detector absorbs all the energy of the photon: it's not as if it is classically sampling part of a distributed EM field. That's still true if the photon is never a point particle.

I'm not updating about what's actually likely to happen on Earth based on dath ilan.

It seems uncontroversially true that a world where the median IQ was 140 or whatever would look radically different (and better) than the world we currently live in. We do not in fact, live in such a world.

But taking a hypothetical premise and then extrapolating what else would be different if the premise were true, is a generally useful tool for building understanding and pumping on intuitions in philosophy, mathematics, science, and forecasting.

If you say "but the premise is false!!111!" you're missing the point.

Seth Herd · 3mo
So I think the remaining implied piece is that humans are getting smarter? I actually think we are and will continue to, in the relevant ways. But that's quite debatable.

What you should have said, therefore, is "Dath ilan is fiction; it's debatable whether the premises of the world would actually result in the happy conclusion depicted. However, I think it's probably directionally correct -- it does seem to me that if Eliezer was the median, the world would be dramatically better overall, in roughly the ways depicted in the story."

The world population is set to decline over the course of this century.

This is another problem that seems very possible to solve or at least make incremental progress on without AGI, if there's a will to actually try. Zvi has written a bunch of stuff about this.

I mildly object to the phrase "it will find a photon". In my own terms, I would say that you will observe the detector going off 50% of the time (with no need to clarify what that means in terms of the limit of a large # of experiments), but the photon itself is the complex amplitudes of each configuration state, which are the same every time you run the experiment.

Note that I myself am taking a pretty strong stance on the ontology question, which you might object to or be uncertain about.

My larger point is that if you (or other readers of this post) don't... (read more)

I am assuming you are referring to the many worlds interpretation of quantum mechanics, where superpositions extend up to the human level, and the alternative configurations correspond to real, physical worlds with different versions of you that see different results on the detector.

Which is puzzling, because then why would you object to "the detector finding a photon"? The whole point of the theory is that detectors and humans are treated the same way. In one world, the detector finds the photon, and then spits out a result, and then one You sees the result, and in a different world, the detector finds the photon, spits out the other result, and a different result is seen. There is no difference between "you" and "it" here.

As for the photon "being" the complex amplitudes... That doesn't sound right to me. Would you say that "you" are the complex amplitudes assigned to world 1 and world 2? It seems more accurate to say that there are two yous, in two different worlds (or many more).

Assuming you are a many worlder, may I ask which solution to the Born probabilities you favour?

We never see these amplitudes directly, we infer them from the fact that they give correct probabilities via the Born rule. Or more specifically, this is the formula that works. That this formula works is an empirical fact, all the interpretations and debate are a question of why this formula works. 

Sure, but inferring underlying facts and models from observations is how inference in general works; it's not specific to quantum mechanics. Probability is in the Mind, even when those probabilities come from applying the Born rule.

Analogously, you could t... (read more)

The essay Probability is in the Mind doesn't prove that probability is only in the mind.
I'm a little confused by what your objection is. I'm not trying to stake out an interpretation here, I'm describing the calculation process that allows you to make predictions about quantum systems. The ontology of the wavefunction is a matter of heated debate, I am undecided on it myself.

Would you object to the following modification:

Climate change is exactly the kind of problem that a functional civilization should be able to solve on its own, without AGI as a crutch.

Until a few years ago, we were doing a bunch of geoengineering by accident, and the technology required to stop emitting a bunch of greenhouse gases in the first place (nuclear power) has been mature for decades.

I guess you could have an AGI help with lobbying or public persuasion / education. But that seems like a very "everything looks like a nail" approach to problem solving, before you even have the supposed tool (AGI) to actually use.

O O · 4mo
We use fossil fuels for a lot more than energy and there's more to climate change than fossil fuel emissions. Energy usage is roughly 75% of emissions. 25% of oil is used for manufacturing. My impression is we are way over targets for fossil fuel usage that would result in reasonable global warming. Furthermore, a lot of green energy will be a hard sell to developing nations. Maybe replacing as much oil with nuclear as politically feasible reduces it but does it reduce it enough? Current models[1] assume we invent carbon capture technology somewhere down the line, so things are looking dire.

It's clear we have this idea that we will partially solve this issue in time with engineering, and it does seem that way if you look at history. However, recent history has the advantage that there was constant population growth with an emphasis on new ideas and entrepreneurship. If you look at what happened to a country like Japan, when age pyramids shifted, you can see that the country gets stuck in backward tech as society restructures itself to take care of the elderly. So I think any assumptions that we will have exponential technological progress are "trend chasing" per se. A lot of our growth curves almost require mass automation or AGI to work. Without that you probably get stagnation. Economists have projected this in 2015 and it seems not much has changed since. [2]. Now [3].

I think it's fine to have the opinion that AGI risk of failure could be higher than the risks from stagnation and other existential risks, but I also think having an unnecessarily rose tinted view of progress isn't accurate. For example, you may be overestimating AGI risk relative to other risks in that case.

1. 2. 3. GDP growth%2C 202

Ah, you're right that the surrounding text is not an accurate paraphrase of the particular position in that quote.

The thing I was actually trying to show with the quotes is that "AGI is necessary for a good future" is a common view, but the implicit and explicit time limits that are often attached to such views might be overly short. I think such views (with attached short time limits) are especially common among those who oppose an AI pause.

I actually agree that AGI is necessary (though not sufficient) for a good future eventually. If I also believed that... (read more)

Indeed, when you add an intelligent designer with the ability to precisely and globally edit genes, you've stepped outside the design space available to natural selection, and you can end up with some pretty weird results! I think you could also use gene drives to get an IGF-boosting gene to fixation much faster than would occur naturally.

I don't think gene drives are the kind of thing that would ever occur via iterative mutation, but you can certainly have genetic material with very high short-term IGF that eventually kills its host organism or causes extinction of its host species.

Some people will end up valuing children more, for complicated reasons; other people will end up valuing other things more, again for complicated reasons.

Right, because somewhere pretty early in evolutionary history, people (or animals) which valued stuff other than having children for complicated reasons eventually had more descendants than those who didn't. Probably because wanting lots of stuff for complicated reasons (and getting it) is correlated with being smart and generally capable, which led to having more descendants in the long run.

If evolution ... (read more)

Nora Belrose · 4mo
CRISPR gene drives reach fixation even faster, even if they seriously harm IGF.

Another possible configuration of the untrusted smart model / trusted weak model setup is to have the weak model be the "driver" which makes most or all of the relevant choices (in an agent scaffold, say) and relies on the untrusted model (or humans) only for hints.

For example, the untrusted model could be used only to provide a probability distribution over the immediate next token in some context, which is supplied to the weaker model as a hint. The weaker model can only make use of the hint if it can understand the purpose of the hint, and explain the r... (read more)

If you now put a detector in path A, it will find a photon with probability 1/2, and same for path B. This means that there is a 50% chance of the configuration |photon in path A only>, and 50% chance of the configuration |photon in path B only>. The arrow direction still has no effect on the probability.

Isn't this kind of assertion implicitly taking a pretty strong stance on a particular philosophical interpretation?

We have some observations (counts of how many times each detector went off in past experiments), and a theory which expl... (read more)

Apologies for the late reply, but thank you for your detailed response.

Responding to your objection to my passage, I disagree, but I may edit it slightly to be clearer. I was simply trying to point out the empirical fact that if you put a detector in path A and a detector in path B, and repeat the experiment a bunch of times, you will find the photon in detector A 50% of the time, and the photon in detector B 50% of the time. If the amplitudes had different values, you would empirically find them in different proportions, as given by the squared amplitudes.

I don't find these probabilities to be an "afterthought". This is the whole point of the theory, and the reason we consider quantum physics to be "true". We never see these amplitudes directly, we infer them from the fact that they give correct probabilities via the Born rule. Or more specifically, this is the formula that works. That this formula works is an empirical fact; all the interpretations and debate are a question of why this formula works.

Regarding the defense of the original sequence, I'm sorry, but incorrect math is incorrect math. The people who figured out the mistake in the comments figured it out from other sources. If anything, it is even more damning that people pointed the mistake out 10 years ago, and it still hasn't been fixed. For every person who figured out the problem or sifted through hundreds of comments to figure out the issue, there are dozens more who accepted the incorrect framework, or decided they were too dumb to understand the math when it was the author who was wrong.

My problem is that the post is misinforming people. I will make no apology for being harsh about that.

I will restrain my opinion on Eliezer's other quantum posts for a future post when I tackle the overstated case for many worlds theories.
Some readers figuring out what's going on is consistent with many of them being unnecessarily confused.
Max H

That does clarify, thanks.

Response in two parts: first, my own attempt at clarification over terms / claims. Second, a hopefully-illustrative sketch / comparison for why I am skeptical that current GPTs having anything properly called a "motivational structure", human-like or otherwise, and why I think such skepticism is not a particularly strong positive claim about anything in particular.

The clarification:

At least to me, the phrase "GPTs are [just] predictors" is simply a reminder of the fact that the only modality available to a model itself is that it ... (read more)

This is an excellent reply, thank you! I think I broadly agree with your points. I'm more imagining "similarity to humans" to mean "is well-described by shard theory; e.g. its later-network steering circuits are contextually activated based on a compositionally represented activation context." This would align with greater activation-vector steerability partway through language models (not the only source I have for that). However, interpreting GPT: the logit lens and e.g. DoLA suggest that predictions are iteratively refined throughout the forward pass, whereas presumably shard theory (and inner-optimizer threat models) would predict that most sophisticated steering happens later in the network.

I don't dispute any of that, but I also don't think RLHF is a workable method for building or aligning a powerful AGI.

Zooming out, my original point was that there are two problems humanity is facing, quite different in character but both very difficult:

  • a coordination / governance problem, around deciding when to build AGI and who gets to build it
  • a technical problem, around figuring out how to build an AGI that does what the builder wants at all.

My view is that we are currently on track to solve neither of those problems. But if you actually consider what ... (read more)

How does a solution to the above solve the coordination/governance problem?

In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory.

The premise of this hypothetical is that all the technical problems are solved - if an AI lab wants to build an AI to pursue the collective CEV of humanity or whatever, they can just get it to do that. Maybe they'll settle on something other than CEV that is a bit better or worse or just different, but my point was that I don't expect them to choose something ridiculous like "our CEO becomes god-emperor forever" or whatever.

I'm not sure they

... (read more)

In the Manhattan project, there was no disagreement between the physicists, the politicians / generals, and the actual laborers who built the bomb, on what they wanted the bomb to do. They were all aligned around trying to build an object that would create the most powerful explosion possible.

As for who had control over the launch button, of course the physicists didn't have that, and never expected to. But they also weren't forced to work on the bomb; they did so voluntarily and knowing they wouldn't be the ones who got any say in whether and how it would... (read more)

Well these systems aren't programmed. Researchers work on architecture and engineering, goal content is down to the RLHF that is applied and the wishes of the user(s), and the wishes of the user(s) are determined by market forces, user preferences, etc. And user preferences may themselves be influenced by other AI systems. Closed source models can have RLHF and be delivered via an API, but open source models will not be far behind at any given point in time. And of course prompt injection attacks can bypass the RLHF on even closed source models. The decisions about what RLHF to apply on contentious topics will come from politicians and from the leadership of the companies, not from the researchers. And politicians are influenced by the media and elections, and company leadership is influenced by the market and by cultural trends. Where does the chain of control ultimately ground itself? Answer: it doesn't. Control of AI in the current paradigm is floating. Various players can influence it, but there's no single source of truth for "what's the AI's goal".
M. Y. Zuo
Where did you learn of this? From what I know it was the opposite: there were so many disagreements, even just among the physicists, that they decided to duplicate nearly all effort and produce two different types of nuclear device designs, the gun type and the implosion type, simultaneously. E.g., both plutonium and uranium processing supply chains were set up at massive expense (and later environmental damage), just in case one design didn't work.
In that they wanted the bomb to explode? I think the analogous level of control for AI would be unsatisfactory.

I'm not sure they thought this; I think many expected that by playing along they would have influence later. Tech workers today often seem to care a lot about how products made by their companies are deployed.

A bit of anecdotal impressions, yes, but mainly I just think that in humans being smart, conscientious, reflective, etc. enough to be the brightest researcher a big AI lab is actually pretty correlated with being Good (and also, that once you actually solve the technical problems, it doesn't take that much Goodness to do the right thing for the collective and not just yourself).

Or, another way of looking at it, I find Scott Aaronson's perspective convincing, when it is applied to humans. I just don't think it will apply at all to the first kinds of AIs that people are actually likely to build, for technical reasons.

Roman Leventov
I think there are way more transhumanists and post-humanists at AGI labs than you imagine. Richard Sutton is a famous example (btw, I've just discovered that he moved from DeepMind to Keen Technologies, John Carmack's venture), but I believe there are many more of them, but they disguise themselves for political reasons.

And I'm saying that, assuming all the technical problems are solved, AI researchers would be the ones in control, and I (mostly) trust them to just not do things like build an AI that acts like an invasive species, or argues for its own rights, or build something that actually deserves such rights.

Maybe some random sociologists on Twitter will call for giving AIs rights, but in the counterfactual world where AI researchers have fine control of their own creations, I expect no one in a position to make decisions on the matter to give such calls any weight.

E... (read more)

No. You have simplistic and incorrect beliefs about control. If there are a bunch of companies (Deepmind, Anthropic, Meta, OpenAI, ...) and a bunch of regulation efforts and politicians who all get inputs, then the AI researchers will have very little control authority, as little perhaps as the physicists had over the use of the H-bomb. Where does the control really reside in this system? Who made the decision to almost launch a nuclear torpedo in the Cuban Missile Crisis?
What is the basis of this trust? Anecdotal impressions of a few that you know personally in the space, opinion polling data, something else?

In order for humans to survive the AI transition I think we need to succeed on the technical problems of alignment (which are perhaps not as bad as Less Wrong culture made them out to be), and we also need to "land the plane" of superintelligent AI on a stable equilibrium where humans are still the primary beneficiaries of civilization, rather than a pest species to be exterminated or squatters to be evicted.

Do we really need both? It seems like either a technical solution OR competent global governance would mostly suffice.

Actually-competent global govern... (read more)

Nathan Helm-Burger
Do we need both? Perhaps not, in the theoretical case where we get a perfect instance of one. But I disagree that we should aim for one or the other, because I don't expect we will reach anywhere near perfection on either. I think we should expect to have to muddle through somehow with very imperfect versions of each.

I think we'll likely see some janky, poorly-organized international AI governance attempt, combined with just-good-enough tool AI and software and just-aligned-enough sorta-general AI, maintaining an uneasy temporary state of suppressing rogue AI explosions. How long will we manage to stay on top under such circumstances? Hopefully long enough to realize the danger we're in and scrape together some better governance and alignment solutions.

Edit: I later saw that Max H said he thought we should pursue both, so we disagree less than I thought. There is some difference, in that I still think we can't really afford a failure in either category, mainly because I don't expect us to do well enough in either for a single semi-success to carry us through.
As I said in the article, technically controllable ASIs are the equivalent of an invasive species which will displace humans from Earth politically, economically and militarily.

Without governance you're stuck trusting that the lead researcher (or whoever is in control) turns down near infinite power and instead act selflessly. That seems like quite the gamble.

But given that this example is so controversial, even if it were right why would you use it -- at least, why would you use it if you had any other example at all to turn to?

Humans are the only real-world example we have of human-level agents, and natural selection is the only process we know of for actually producing them.

SGD, singular learning theory, etc. haven't actually produced human-level minds or a usable theory of how such minds work, and arguably haven't produced anything that even fits into the natural category of minds at all, yet. (Maybe they w... (read more)

Hmm, I'm in favor of an immediate stop (and of people being more honest about their beliefs) but in my experience the lying / hiding frame doesn't actually describe many people.

This is maybe even harsher than what you said in some ways, but to me it feels more like even very bright alignment researchers are often confused and getting caught in shell games with alignment, postulating that we'll be able to build "human level" AI, which somehow just doesn't do a bunch of bad things that smart humans are clearly capable of. And if even the most technical peopl... (read more)

A very recent post that might add some concreteness to my own views: Human wanting

I think many of the bullets in that post describe current AI systems poorly or not at all. So current AI systems are either doing something entirely different from human wanting, or imitating human wanting rather poorly.

I lean towards the former, but I think some of the critical points about prosaic alignment apply in either case.

You might object that "having preferences" or "caring at all" are a lot simpler than the concept of human wanting that Tsvi is gesturing at in that ... (read more)

Max H

Taking my own stab at answers to some of your questions:

A sufficient condition for me to believe that an AI actually cared about something would be a whole brain emulation: I would readily accept that such an emulation had preferences and values (and moral weight) in exactly the way that humans do, and that any manipulations of that emulation were acting on preferences in a real way.

I think that GPTs (and every other kind of current AI system) are not doing anything that is even close to isomorphic to the processing that happens inside the human brain. Art... (read more)

Thanks for the reply. Let me clarify my position a bit. I didn't mean to (positively) claim that GPTs have near-isomorphic motivational structure (though I think it's quite possible). I meant to contend that I am not aware of any basis for confidently claiming that LLMs like GPT-4 are "only predicting what comes next", as opposed to "choosing" or "executing" one completion, or "wanting" to complete the tasks they are given, or, more generally, "making decisions on the basis of the available context, such that our ability to behaviorally steer LLMs (e.g. reducing sycophancy) is real evidence about our control over LLM motivations."

Concerning "GPTs are predictors", the best a priori argument I can imagine is: GPT-4 was pretrained on CE loss, which is related to entropy, which is related to information content, which Shannon's theorems isolate in the context of probabilities, which are themselves nailed down by Cox's theorems, which do axiomatically support the Bayesian account of beliefs and belief updates... But this long-winded, indirect, axiomatic justification of "beliefs" does not suffice to support an inference like "GPTs are just predicting things; they don't really want to complete tasks." That's a very strong claim about the internal structure of LLMs.

(Besides, the inductive biases probably have more to do with the parameter-to-function map than with the implicit regularization caused by the pretraining objective function; more a feature of the data, and less a feature of the local update rule used during pretraining...)
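For reference, the CE (cross-entropy) pretraining loss mentioned above is just the negative log-probability the model assigns to the actual next token, averaged over positions. A minimal sketch with toy numbers (not GPT-4's real training setup):

```python
import math

# Cross-entropy loss for a single next-token prediction: -log p(correct token).
def cross_entropy(pred_dist, target_index):
    return -math.log(pred_dist[target_index])

# Hypothetical 4-token vocabulary; the model assigns p=0.7 to the true next
# token (index 2).
pred = [0.1, 0.1, 0.7, 0.1]
loss = cross_entropy(pred, 2)
print(round(loss, 4))  # 0.3567

# A perfect predictor (p=1 on the right token) would have loss 0. Minimizing
# this objective pushes the model toward calibrated next-token probabilities,
# which is the sense in which "GPTs are predictors" -- a claim about the
# training signal, not necessarily about internal structure.
```

Note that this only characterizes the loss function; it licenses no direct conclusion about what circuitry or "wanting" the trained model implements internally, which is the point of the comment above.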

Or, another way of putting it:

It seems like you're imagining some sort of side-channel in which the LLM can take "free actions," which don't count as next-tokens, before coming back and making a final prediction about the next-tokens. This does not resemble anything in LM likelihood training, or in the usual user interaction modalities for LLMs.

These are limitations of current LLMs, which are GPTs trained via SGD. But there's no inherent reason you can't have a language model which predicts next tokens via shelling out to some more capable and more ag... (read more)

It seems like you're imagining some sort of side-channel in which the LLM can take "free actions," which don't count as next-tokens, before coming back and making a final prediction about the next-tokens. This does not resemble anything in LM likelihood training, or in the usual interaction modalities for LLMs.

I'm saying that the lack of these side-channels implies that GPTs alone will not scale to human-level.

If your system interface is a text channel, and you want the system behind the interface to accept inputs like the prompt above and return corre... (read more)

Max H
Or, another way of putting it: These are limitations of current LLMs, which are GPTs trained via SGD. But there's no inherent reason you can't have a language model which predicts next tokens via shelling out to some more capable and more agentic system (e.g. a human) instead. The result would be a (much slower) system that nevertheless achieves lower loss according to the original loss function.

But we have no evidence that this homunculus exists inside GPT-4, or any LLM. More pointedly, as LLMs have made remarkable strides toward human-level general intelligence, we have not observed a parallel trend toward becoming "more homuncular," more like a generally capable agent being pressed into service for next-token prediction.

"Remarkable strides", maybe, but current language models aren't exactly close to human-level in the relevant sense.

There are plenty of tasks a human could solve by exerting a tiny bit of agency or goal-directedness that are stil... (read more)

The example confuses me.

If you literally mean you are prompting the LLM with that text, then the LLM must output the answer immediately, as the string of next tokens right after the words "assuming I'm telling the truth, is:". There is no room in which to perform other, intermediate actions like persuading you to provide information.

It seems like you're imagining some sort of side-channel in which the LLM can take "free actions," which don't count as next-tokens, before coming back and making a final prediction about the next-tokens.  This does not rese... (read more)
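The point about interaction modalities can be made concrete with a toy generation loop. This is a sketch, not any real training or serving code; `model_step` is a hypothetical stand-in for a trained model:

```python
# In the usual LM interaction modality, the model's ONLY outputs are next
# tokens appended to the context; there is no side-channel for intermediate
# "free actions" before the prediction is committed.
def generate(model_step, prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        next_token = model_step(tokens)  # the model's entire action space
        tokens.append(next_token)
    return tokens

# Hypothetical stand-in "model" that always predicts token 0:
result = generate(lambda toks: 0, [5, 6, 7], 3)
print(result)  # [5, 6, 7, 0, 0, 0]
```

Anything the model "does" must be routed through this single channel, which is why the hypothesized side-channel does not resemble LM likelihood training or the usual user-facing interfaces.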

Why do you think they chose to lead off with these signatures and not Eliezer Yudkowsky's? If the push for individual withdrawal from capabilities work is a success, then any time a government-implemented pause is proposed the expert consensus will be that no pause is necessary and AI does not represent an existential risk.


The benefit of withdrawal is not a pause or a stop. As long as there is no consensus on AI risk, individual withdrawal cannot lead to a stop.

I think this is treating expert consensus and credibility as more fixed / independent / in... (read more)

We can go look for such structures in e.g. nets, see how well they seem to match our own concepts, and have some reason to expect they'll match our own concepts robustly in certain cases.

Checking my own understanding with an example of what this might look like concretely:

Suppose you have a language model that can play Chess (via text notation). Presumably, the model has some kind of internal representation of the game, the board state, the pieces, and strategy. Those representations are probably complicated linear combinations / superpositions of activati... (read more)
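A toy sketch of what probing for such a linear representation might look like. Everything here is synthetic and hypothetical: with a real model you would collect activations from forward passes on actual board positions, and a logistic-regression probe is more standard than least squares:

```python
import numpy as np

# Synthetic stand-in for "does the model linearly encode a board feature,
# e.g. 'there is a piece on e4'?" We plant a hidden direction in fake
# activations, then check that a linear probe recovers it.
rng = np.random.default_rng(0)

d = 64  # assumed activation dimension
feature_dir = rng.normal(size=d)  # hidden "true" direction for the feature
feature_dir /= np.linalg.norm(feature_dir)

n = 500
labels = rng.integers(0, 2, size=n)  # feature present (1) or absent (0)
acts = rng.normal(size=(n, d)) + 3.0 * labels[:, None] * feature_dir

# Fit a linear probe by least squares, then threshold its output.
w, *_ = np.linalg.lstsq(acts, labels.astype(float), rcond=None)
preds = (acts @ w) > 0.5
accuracy = (preds == labels).mean()
print(accuracy)  # typically well above 0.9 on this synthetic data
```

The interesting empirical questions are whether real activations contain such directions for the concepts we care about, and how robustly they match our own concepts, which is what the quoted proposal is about.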

For example, on complete preferences, here's a slightly more precise claim: any interesting and capable agent with incomplete preferences implies the possibility (via an often trivial construction) of a similarly-powerful agent with complete preferences, and that the agent with complete preferences will often be simpler and more natural in an intuitive sense.

This is not the case under things like invulnerable incomplete preferences, where they managed to weaken the axioms of EU theory enough to get a shutdownable agent:

... (read more)
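The "often trivial construction" for turning incomplete preferences into complete ones, mentioned a few comments up, can be sketched in a few lines. The options and preferences here are hypothetical, and this assumes the incomplete preferences are acyclic:

```python
import graphlib  # stdlib topological sorter, Python 3.9+

# Incomplete preferences: key is preferred to each value. A-vs-C and B-vs-C
# are deliberately left unspecified.
prefs = {
    "A": {"B"},
    "B": {"D"},
    "C": {"D"},
}

# graphlib treats values as predecessors, so each option is ordered after
# everything it is preferred to; reversing gives most-preferred first.
order = list(graphlib.TopologicalSorter(prefs).static_order())
order.reverse()
rank = {opt: i for i, opt in enumerate(order)}

def prefers(x, y):
    # The completed relation: total, transitive, and it extends every
    # originally-specified preference.
    return rank[x] < rank[y]

print(order)  # "A" first, "D" last; the B-vs-C gap has been filled in
```

This only shows that *a* completion exists; the quoted point about invulnerable incomplete preferences is precisely that refusing to complete them in this way can buy useful properties like shutdownability.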