Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms—see https://sjbyrnes.com/agi.html. Email: steven.byrnes@gmail.com. Twitter: @steve47285. Employer: https://astera.org/. Physicist by training.

Sequences

Intro to Brain-Like-AGI Safety

Comments

I’m finding this conversation frustrating. It seems to me that your grandparent comment was specifically talking about biotech & pandemics. For example, you said “wet nanotech/biotech”. And then in that context you said “"blight" that could very, very, very slowly damage our agriculture, cause disease in humans, and expand the AI's influence, on the scale of decades or centuries”. This sure sounds to me like a claim that a novel pandemic would spread over the course of decades or centuries. Right? And such a claim is patently absurd. It did not take decades or centuries for COVID to spread around the world. (Even before mass air travel, it did not take decades or centuries for Spanish Flu to spread around the world.) Instead of acknowledging that mistake, your response is “a pathogen is not grey nanotech, but biotech”, which is missing the point—I was disputing a claim that you made about biotech.

their deadliness is inverse to their contagiousness for obvious reasons (dead men don't travel very well).

Famously, when you catch COVID, you can become infectious a day or two before you become symptomatic. (That’s why it was so hard to contain.) And COVID could also cause nerve damage that presumably had nothing to do with its ability to spread. More generally, it seems perfectly possible for a disease to have a highly-contagious-but-not-too-damaging early phase and then a few days later it turns lethal, perhaps by spreading into a totally different part of the body. So I strongly disbelieve the claim that deadliness and contagiousness of engineered pathogens are inevitably inverse, let alone that this is “obvious”.

I also suggest reading this article.

Mainly, I expect fine-tuning to shift mask probabilities and only bias next-token prediction slightly and not particularly create an underlying goal.

If RLHF didn’t make a very noticeable difference in which tokens get emitted, then nobody would bother doing RLHF, right?

OP said:

I use "nanobots" to mean "self-replicating microscopic machines with some fundamental mechanistic differences from all biological life that make them superior".

(And I believe they’re using “grey goo” the same way.) So I think you’re using a different definition of “grey goo” from OP, and that under OP’s definition, biological life is not an existence proof.

I think the question of “whether grey-goo-as-defined-by-OP is possible” is an interesting question and I’d be curious to know the answer for various reasons, even if it’s not super-central in the context of AI risk.

If an AI arranged to release a highly-contagious deadly engineered pathogen in an international airport, it would not take "decades or centuries" to spread. Right????

OP said:

I use "nanobots" to mean "self-replicating microscopic machines with some fundamental mechanistic differences from all biological life that make them superior".

I think that there are lots of plausible “invasive species from hell” scenarios where an organism is sufficiently edited so as to have no natural viruses (because its genome is weird) and no natural predators (because its sugars are weird or it has an exotic new toxin) and so on. They would still have ecological niches where they wouldn’t be able to thrive, and they would still presumably get predators and diseases eventually. But a lot of destruction could happen in the meantime, including collapsing critical ecosystems etc., and it could happen fast (years not decades, but also not weeks) if the organism is introduced in lots of places at once, I would assume.

Those scenarios are important, but they’re not “nanobots” by OP’s definition.

In the context of AI x-risk, I’m mainly interested in

  • (1) can an AI use nanotech as a central ingredient of a plan to wipe out humanity, and
  • (2) can an AI use nanotech as a central ingredient of a plan to operate perpetually in a world without humans?

[(2) is obviously possible once you have a few billion human-level-intelligent robots, but the question is “can nanotech dramatically reduce the amount of time that the AI is relying on human help, compared to that baseline?”. Presumably “being able to make arbitrarily more chips or chip-equivalents” would be the most difficult ingredient.]

In both cases it seems to me that the answer is “obviously yes”: 

  • super-plagues / crop diseases / etc. are an existence proof for (1),
  • human brains are an existence proof for (2).

Therefore grey goo as defined in this post doesn’t seem too relevant for my AI-related questions. Like, if the AI doesn’t have a plan to make nanotech things that can exterminate / outcompete microbes living in rocks deep under the seafloor—man, I just don’t care.

None of this is meant to be a criticism of this post, which I’m glad exists, even if I’m not in a position to evaluate it. Indeed, I’m not even sure OP would disagree with my comment here (based on their main AI post).

If the 5 lovely plates were literally identical in the two sets, I think (for many people) it might serve as a sort of "hint" that they should consider the clever course of action, the one that involves splitting up the B set (i.e. doing one thing with the 10 cracked & chipped plates, and doing a different thing with the 5 other B plates). That same clever splitting idea might also pop into some people's heads for the B-versus-C comparison, but I think it would be less obvious / salient, so fewer people would think of that, leaving at least a subset of people who would choose both B-over-A if that were the choice, and C-over-B if that were the choice.

Well, I guess it wouldn't be a circular preference for you. :)

I think it wouldn't occur to many people that they could do one thing with the better 5 plates, and do a different thing with the worse 10 plates, if the plates are not presented in a way that makes the 5+10 division salient. Imagine the better and worse ones are all mixed up, and they're all the same design, such that they're obviously meant to be used as a set, but 2/3rds of the plates in the set have obvious cracks and chips. My impression (again see related experiments in the book chapter) is that many people would just take in the set of 15 plates as a whole and say "man, we can't eat off these, someone could get a cut, the sauce would leak onto the table etc.". The person would have to be kinda thinking outside the box and putting in some effort to notice that there are 5 plates in the set with no chips or cracks, and think of the strategy where they use those and throw out the other 10.

It’s generally hard to find one-size-fits-all responses for things like this. Instead I would first want to know: WHY does he think it’s astronomically unlikely to be <80 years away?

You’re thinking about inference, and I’m thinking about learning. When I spend my week trying to come up with the project, I’m permanently smarter at the end of the week than I was at the beginning. It’s a weights-versus-context-window thing. I think weight-learning can do things that context-window-“learning” can’t. In my mind, this belief is vaguely related to my belief that there is no possible combination of sensory inputs that will give a human a deep understanding of chemistry from scratch in 10 minutes. (And having lots of clones of that human working together doesn't help.)
