Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See for a summary of my research and sorted list of writing. Email: Also on Twitter, Mastodon, Threads. Physicist by training.



If you have possible targets U₁, …, Uₙ, then as an alternative to maximin, you could also maximize ∑ᵢ log Uᵢ, or replace log with your favorite function with sufficiently-sharply-diminishing returns. This has the advantage that the AI will continue to take Pareto improvements rather than often feeling completely neutral about them.

(That’s just a fun idea that I think I got indirectly from Stuart Armstrong … but I’m skeptical that this would really be relevant: I suspect that if you have any uncertainty about the target at all, it would rapidly turn into a combinatorial explosion of possibilities, such that it would be infeasible for the AI to keep track of how an action scores on all gazillion of them.)
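The contrast between the two aggregation rules can be made concrete. Here is a minimal sketch (with made-up utility numbers, purely for illustration) showing why maximin is often indifferent to Pareto improvements while a sum-of-logs objective still rewards them:

```python
import math

def maximin(utilities):
    # Maximin scores an action by its worst-case target only.
    return min(utilities)

def log_sum(utilities):
    # Sum of logs: sharply diminishing returns in each target,
    # but still strictly increasing in every target.
    return sum(math.log(u) for u in utilities)

# Hypothetical utilities of two actions under three candidate targets.
action_a = [1.0, 5.0, 5.0]   # baseline
action_b = [1.0, 9.0, 9.0]   # Pareto improvement on targets 2 and 3

# Maximin is completely neutral between them (both bottleneck on target 1)...
print(maximin(action_a), maximin(action_b))   # 1.0 1.0

# ...but the log-sum objective still prefers the Pareto improvement.
print(log_sum(action_b) > log_sum(action_a))  # True
```

Any strictly increasing concave function in place of log gives the same qualitative behavior; the point is just that the objective stays sensitive to improvements on non-worst-case targets.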

Thanks for the comment!

Right, so my concern is that humans evidently don’t take societal resilience seriously, e.g. gain-of-function research remains legal in every country on earth (as far as I know) even after COVID. So you can either:

  • (1) try to change that fact through conventional means (e.g. be an activist for societal resilience, either directly or via advocating for prediction markets and numeracy or something, I dunno), per Section 3.3 — I’m very strongly in favor of people working on this but don’t hold out much hope for anything more than a marginal improvement;
  • (2) hope that “AI helpers” will convince people to take societal resilience seriously — I’m pessimistic per the Section 3.2 argument that people won’t use AI helpers that tell them things they don’t want to hear, in situations where there are no immediate consequences, and I think sacrificing immediate gains for uncertain future societal resilience is one such area;
  • (3) make AIs that take societal resilience seriously and act on it, not because any human told them to but rather because their hearts are in the right place and they figured this out on their own — this is adjacent to Section 3.5.2 where we make friendly autonomous AGI, and I’m probably most optimistic / least pessimistic about that path right now;
  • (4) suggest that actually this whole thing is not that important, i.e., it would be nice if humans were better at societal resilience, but evidently we’ve been muddling along so far and maybe we’ll continue to do so — I’m pessimistic for various reasons in the post but I hope I’m wrong!

I guess you’re suggesting (3) or (4) or maybe some combination of both, I’m not sure. You can correct me if I’m wrong.

Separately, in response to your “Mr. Smiles” thing, I think all realistic options on the table can be made to sound extremely weird and dystopian. I agree with you that “AI(s) that can prevent powerful out-of-control AI from coming into existence in the first place” seems pretty dystopian, but I’m also concerned that “AI(s) that does allow out-of-control AIs to come into existence, but prevents them from doing much harm by intervening elsewhere in the world” seems pretty dystopian too, once you think it through. And so does every other option. Or at least, that’s my concern.

I’m confused about your “pretend base world”. This isn’t a discussion about whether it’s good or bad that OAI exists. It’s a discussion about “Sam Altman’s chip ambitions”. So we should compare the world where OAI seems to be doing quite well and Sam Altman has no chip ambitions at all, to the world where OAI seems to be doing quite well and Sam Altman does have chip ambitions. Right?

I agree that if we’re worried about FOOM-from-a-paradigm-shifting-algorithmic-breakthrough (which as it turns out I am indeed worried about), then we would prefer to be in a world where there is a low absolute number of chips that are flexible enough to run a wide variety of algorithms, rather than a world where there are a large number of such chips. But I disagree that this would be the effect of Sam Altman’s chip ambitions; rather, I think Sam Altman’s chip ambitions would clearly move things in the opposite, bad direction, on that metric. Don’t you think?

By analogy, suppose I say “(1) It’s very important to minimize the number of red cars in existence. (2) Hey, there’s a massively hot upcoming specialized market for blue cars, so let’s build 100 massive car factories all around the world.” You would agree that (2) is moving things in the wrong direction for accomplishing (1), right?

This seems obvious to me, but if not, I’ll spell out a couple reasons:

  • For one thing, who’s to say that the new car factories won’t sell into the red-car market too? Back to the case at hand: we should strongly presume that whatever fabs get built by this Sam Altman initiative will make not exclusively ultra-specialized AI chips, but rather they will make whatever kinds of chips are most profitable to make, and this might include some less-specialized chips. After all, whoever invests in the fab, once they build it, they will try to maximize revenue, to make back the insane amount of money they put in, right? And fabs are flexible enough to make more than one kind of chip, especially over the long term.
  • For another thing, even if the new car factories don’t directly produce red cars, they will still lower the price of red cars, compared to the factories not existing, because the old car factories will produce extra marginal red cars when they would otherwise be producing blue cars. Back to the case at hand: the non-Sam-Altman fabs will choose to pump out more non-ultra-specialized chips if Sam-Altman fabs are flooding the specialized-chips market. Also, in the longer term, fab suppliers will be able to lower costs across the industry (from both economies-of-scale and having more money for R&D towards process improvements) if they have more fabs to sell to, and this would make it economical for non-Sam-Altman fabs to produce and sell more non-ultra-specialized chips.

Whether Eliezer or Paul is right about “sudden all at once jolt” is an interesting question but I don’t understand how that question is related to Sam Altman’s chip ambitions. I don’t understand why Logan keeps bringing that up, and now I guess you’re doing it too. Is the idea that “sudden all at once jolt” is less likely in a world with more chips and chip fabs, and more likely in a world with fewer chips and chip fabs? If so, why? I would expect that if the extra chips make any difference at all, it would be to push things in the opposite direction.

In other words, if “situations where there are a lot of chips piled up everywhere plugged into infrastructure” is the bad thing that we’re trying to avoid, then a good way to help avoid that is to NOT manufacture tons and tons of extra chips, right?

You’re saying a lot of things that seem very confused and irrelevant to me, and I’m trying to get to the bottom of where you’re coming from.

Here’s a key question, I think: In this comment, you drew a diagram with a dashed line labeled “capabilities ceiling”. What do you think determines the capabilities ceiling? E.g. what hypothetical real-world interventions would make that dashed line move left or right, or to get steeper or shallower?

In other words, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen sooner will push us towards slow takeoff. That would be internally-inconsistent, right? As a thought experiment, suppose all tech and economic and intellectual progress of the next century magically happened overnight tonight. Then we would have extremely powerful AGI tomorrow, right? And that is a very fast takeoff, i.e. one day.

Conversely, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen later will push us towards fast takeoff. Again, that would be internally-inconsistent, right? As a thought experiment, suppose every human on Earth simultaneously hibernated for 20 years starting right now. And then the entire human race wakes up in 2044, and we pick right back up where we were. That wouldn’t make a bit of difference to takeoff speed—the takeoff would be exactly as fast or slow whether or not the hibernation happens. Right? (Well, that’s assuming that takeoff hasn’t already started, I guess, but if it has, then the hibernation would technically make takeoff slower, not faster, right?)

If you agree with those two thought experiments, then you need to have in mind some bottleneck sitting between us and dangerous AGI, i.e. the “capabilities ceiling” dashed line. If there is such a bottleneck, then we can and hopefully would accelerate everything except that bottleneck, and we won’t get all the way to dangerous AGI until that bottleneck goes away, which (we hope) will take a long time, presumably because of the particular nature of that bottleneck. Most people in the prosaic AGI camp (including the slightly-younger Sam Altman I guess) think that the finite number of chips in the world is either the entirety of this bottleneck, or at least a major part of it, and that therefore trying to alleviate this bottleneck ASAP is the last thing you want to do in order to get maximally slow takeoff. If you disagree with that, then you presumably are expecting a different bottleneck besides chips, and I’d like to know what it is, and how you know.

Can you explain why? If that Paul excerpt is supposed to be an explanation, then I don’t follow it.

You previously linked this post by Hadshar & Lintz which is very explicit that the more chips there are in the world, the faster takeoff we should expect. (E.g. “slow, continuous takeoff is more likely given short timelines […partly because…] It’s plausible that compute overhang is low now relative to later, and this tends towards slower, more continuous takeoff.”) Do you think Hadshar & Lintz are incorrect on this point, or do you think that I am mischaracterizing their beliefs, or something else?

I'm not a fan of @Review Bot because I think that when people are reading a discussion thread, they're thinking and talking about object-level stuff, i.e. the content of the post, and that's a good thing. Whereas the Review Bot comments draw attention away from that good thing and towards the less-desirable meta-level / social activity of pondering where a post sits on the axis from "yay" to "boo", and/or from "popular" to "unpopular".

(Just one guy's opinion, I don't feel super strongly about it.)

I’m confused about how “pausing AI research (by banning large training runs)” is related to this conversation. The OP doesn’t even mention that, nor do any of the other comments, as far as I can see. The topic of discussion here on this page is a move by Sam Altman to raise tons of money and build tons of chip fabs. Right?

If you weren’t talking about “Sam Altman’s chip ambitions” previously, then, well, I propose that we start talking about it now. So: How do you think that particular move—the move by Sam Altman to raise tons of money and build tons of chip fabs—would affect (or not affect) takeoff speeds, and why do you think that? A.k.a. how would you answer the question I asked here?

I’m confused about whether you agree or disagree with the proposition: “By doing this chip thing, Sam Altman is likely to make takeoff faster (on the margin) than it otherwise would be.”

In other words, compare the current world to the world where Sam Altman did everything else in life the same, including leading OpenAI etc., but where he didn’t pursue this big chip project. Which world do you think has faster takeoff? If you already answered that question, then I didn’t understand it, sorry.

A fast takeoff occurs later in time than a slow takeoff because a slow takeoff involves gradual acceleration over a longer period of time, whereas a fast takeoff involves sudden takeoff over a shorter period of time.

I think you’re confused. If you hold the “singularity date” / “takeoff end time” fixed—e.g., you say that we somehow know for certain that “the singularity” will happen on August 29, 2047—then the later takeoff starts, the faster it is. I think that’s what you have in mind, right? (Yes I know that the singularity is not literally a specific day, but hopefully you get what I’m saying).

But that’s irrelevant. It’s not what we’re talking about. I.e., it is not true that the singularity will happen at a fixed, exogenous date. There are possible interventions that could make that date earlier or later.

Anyway, I understand your comment as suggesting: “As long as the singularity hasn’t happened yet, any intervention that would accelerate AI progress, such as Sam Altman’s trying to make more AI chips in the next 10ish years, pushes us in the direction of slow takeoff, not fast takeoff.” Am I understanding you correctly? If so, then I disagree because it’s possible to accelerate AI progress in a way that brings the “takeoff end time” sooner without affecting the “takeoff start time” (using the terminology of that post you linked), or at least in a way that brings the end time sooner more than it brings the start time sooner.

You see what I mean?
