Eli Tyre



Looking back on my alignment PhD

When I get stuck on a problem (e.g. what is the type signature of human values?), I do not stay stuck. I notice I am stuck, I run down a list of tactics, I explicitly note what works, I upweight that for next time. 

What tactics in particular?

Conversation with Eliezer: What do you want the system to do?

Realistically I think the core issue is that Eliezer is very skeptical about the possibility of competitive AI alignment. That said, I think that even on Eliezer's pessimistic view he should probably just be complaining about competitiveness problems rather than saying pretty speculative stuff about what is needed for a pivotal act.

Isn't the core thing here that Eliezer expects that a local hard takeoff is possible? He thinks that a single AI system can rapidly gain enormous power relative to the rest of the world (by recursive self-improvement, by seizing compute, or just by deploying on more computers).

If this is a possible thing for an AGI system to do, it seems like ensuring a human future requires that you be able to prevent an unaligned AGI from undergoing a hard takeoff.

If you have aligned systems that are competitive in a number of different domains, that doesn't matter if 1) local hard takeoff is on the table and 2) you aren't able to produce systems whose alignment is robust to a hard takeoff.

It seems like the pivotal act ideology is a natural consequence of 1) expecting hard takeoff and 2) thinking that alignment is hard, full stop. Whether or not aligned systems will be competitive doesn't come into it. Or by "competitive" do you mean, specifically, "competitive, even across the huge relative capability gain of a hard takeoff"?

It seems like Eliezer's chain of argument is:

  • [Hard takeoff is likely] 
    • =>
  • [You need a pivotal act to preempt unaligned superintelligence] 
    • =>
  • [Your safe AI design needs to be able to do something concrete that can enable a pivotal act in order to be of strategic relevance.]
    • =>
  • [When doing AI safety work, you need to be thinking about the concrete actions that your system will do]

Where I agree and disagree with Eliezer

The notion of an AI-enabled “pivotal act” seems misguided. Aligned AI systems can reduce the period of risk of an unaligned AI by advancing alignment research, convincingly demonstrating the risk posed by unaligned AI, and consuming the “[free energy](https://www.lesswrong.com/posts/yPLr2tnXbiFXkMWvk/an-equilibrium-of-no-free-energy)” that an unaligned AI might have used to grow explosively. No particular act needs to be pivotal in order to greatly reduce the risk from unaligned AI, and the search for single pivotal acts leads to unrealistic stories of the future and unrealistic pictures of what AI labs should do.

On the face of it, this seems true, and it seems like a pretty big clarification to my thinking. You can buy more time or more safety a little bit at a time, instead of all at once, in roughly the way that you might hope to reach longevity escape velocity.

But it seems like this largely depends on whether you expect takeoff to be hard or soft. If AI takeoff is hard, you need pretty severe interventions, because they need either to prevent the deployment of AGI or to be sufficient to counter the actions of a superintelligence. Generally, it seems like the sharper takeoff is, the more good outcomes flow through pivotal acts, and the smoother takeoff is, the more we should expect good outcomes to flow through incremental improvements.

Are there any incremental actions that add up to a "pivotal shift" in a hard takeoff world?


Ngo and Yudkowsky on scientific reasoning and pivotal acts


And some deep principles governing engines, but not really very crucial ones to actually building (early versions of) those engines


that's... not historically true at all?

getting a grip on quantities of heat and their flow was critical to getting steam engines to work

it didn't happen until the math was there

Checking very quickly, this article, at least, disagrees with the claim that thermodynamics was developed a century after the invention of the steam engine.

Maybe Eliezer is referring to something more basic than thermodynamics? Or is this just an error?

On how various plans miss the hard bits of the alignment challenge

This comment seems to me to be pointing at something very important which I had not hitherto grasped.

My (shitty) summary:

There's a big difference between gains from improving the architecture / abilities of a system (the genome, for human agents) and gains from increasing knowledge developed over the course of an episode (or lifetime). In particular, they might differ in how easy it is to "get the alignment in".

If the AGI is doing consequentialist reasoning while it is still mostly getting gains from gradient descent, as opposed to from knowledge collected over an episode, then we have more ability to steer its trajectory.

Eli's shortform feed

When is an event surprising enough that I should be confused?

Today, I was reading Mistakes with Conservation of Expected Evidence. For some reason, I was under the impression that the post was written by Rohin Shah; but it turns out it was written by Abram Demski.

In retrospect, I should have been surprised that "Rohin" kept talking about what Eliezer says in the Sequences. I wouldn't have guessed that Rohin was that "culturally rationalist" or that he would be that interested in what Eliezer wrote in the Sequences. And indeed, I was updating toward Rohin being more of a rationalist, with more rationalist interests, than I had thought. If I had been more surprised, I could have noticed my surprise / confusion and made a better prediction.

But on the other hand, was my surprise so extreme that it should have triggered an error message (confusion), instead of merely an update? Maybe this was just fine reasoning after all?

From a Bayesian perspective, on observing this evidence I should have increased my credence both in Rohin being more rationalist-y than I thought and in the hypothesis that the post wasn't written by Rohin. But practically, I would have needed to generate the second hypothesis, and I don't think that I had strong enough reason to.

I feel like there's a semi-interesting epistemic puzzle here. What's the threshold at which an observation is surprising enough that you should be confused (never mind actually noticing your confusion)?
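To make the Bayesian point concrete, here is a minimal sketch with made-up numbers. The priors, the likelihoods, and the three-way split of hypotheses are all hypothetical illustrations, not anything from the original post. E is the observation "the author keeps referencing what Eliezer says in the Sequences." The sketch also prints the surprisal of E, -log2 P(E), as one possible (assumed, not canonical) proxy for how surprising the observation was.

```python
from math import log2

# Hypothetical, illustrative numbers only; nothing here comes from the original post.
# E = "the author keeps referencing what Eliezer says in the Sequences"
PRIORS = {
    "Rohin wrote it, as rationalist-y as I thought": 0.90,
    "Rohin wrote it, more rationalist-y than I thought": 0.08,
    "someone other than Rohin wrote it": 0.02,
}
LIKELIHOODS = {  # hypothetical P(E | H)
    "Rohin wrote it, as rationalist-y as I thought": 0.05,
    "Rohin wrote it, more rationalist-y than I thought": 0.50,
    "someone other than Rohin wrote it": 0.60,
}


def update(priors, likelihoods):
    """Return (posteriors, P(E)) via Bayes' rule over a fixed hypothesis set."""
    p_e = sum(priors[h] * likelihoods[h] for h in priors)
    posteriors = {h: priors[h] * likelihoods[h] / p_e for h in priors}
    return posteriors, p_e


posteriors, p_e = update(PRIORS, LIKELIHOODS)
for h, p in posteriors.items():
    print(f"{h}: {PRIORS[h]:.2f} -> {p:.3f}")
# Surprisal of E in bits: one rough proxy for "how surprised should I be?"
print(f"surprisal of E: {-log2(p_e):.2f} bits")
```

With these numbers, both the "more rationalist-y" hypothesis (0.08 to about 0.41) and the "someone else wrote it" hypothesis (0.02 to about 0.12) gain credence from the same observation. If the third hypothesis is left out of the dictionaries entirely, all of the update flows to the two Rohin hypotheses, which is the practical failure of not having generated the alternative in the first place.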

Air Conditioner Test Results & Discussion

Noting for myself: I didn't make an explicit prediction, but I emotionally expected John to be vindicated by this experiment. My emotional prediction was wrong, and that seems good to notice, even if I don't do much further reflection.

Air Conditioner Test Results & Discussion

This is a great comment. The graphs helped a lot.

Air Conditioner Test Results & Discussion

I just want to say that I found this comment personally helpful.

This is the problem with how the rationalist community approaches the concept of what it means to "make a rational decision" perfectly demonstrated in a single debate. You do not make a "rational decision" in the real world by reasoning in a vacuum.

Something about this seems on point to me. Rationalists, in general, are much more likely to be mathematicians than (for instance) mechanical engineers. It does seem right to me that when I look around, I see people drawn to abstract analyses, very plausibly at the expense of contextualized details that are crucial for making good calls. This seems like it could very well be a bias of my culture.

For instance, it's fun and popular to talk about civilizational inadequacy, or how the world is mad. I think that is pointing at something true and important, but I wonder how much of it is basically overlooking the fact that it is hard to do things in the real world with a bunch of different stakeholders and confusing mistakes.

In a lot of cases, civilizational inadequacy can be the result of engineers (broadly construed) who understand that "the perfect is the enemy of the good" pushing projects through to completion anyway. The outcome is sometimes so muddled as to be worse than having done nothing, but shipping things under constraints, even when they could be much better on some axes, is how civilization runs.

Anyway, this makes me think that I should attempt to do more engineering projects, or otherwise find ways to operate in domains where the goal is to get "good enough", within a bunch of not-always crisply-defined constraints. 
