Computing scientist and Systems architect. Currently doing self-funded AGI safety research.


Counterfactual Planning



AGI Ruin: A List of Lethalities

IMO the biggest hole here is "why should a superhuman AI be extremely consequentialist/optimizing"?

I agree this is a very big hole. My opinion here is not humble. My considered opinion is that Eliezer is deeply wrong in point 23, on many levels. (Edited to add: I guess I should include an informative link instead of just expressing my disappointment. Here is my 2021 review of the state of the corrigibility field).

Steven, in response to your line of reasoning to fix/clarify this point 23: I am not arguing for pivotal acts as considered and then rejected by Eliezer, but I believe that he strongly underestimates the chances of people inventing safe and also non-consequentialist optimising AGI. So I disagree with your plausibility claim in point (3).

AGI Ruin: A List of Lethalities

You are welcome. I carefully avoided mentioning my credentials as a rhetorical device.

I rank the credibility of my own informed guesses far above those of Eliezer.

This is to highlight the essence of how many of the arguments on this site work.

AGI Ruin: A List of Lethalities

Why do you rate yourself "far above" someone who has spent decades working in this field?

Well put, valid question. By the way, did you notice how careful I was in avoiding any direct mention of my own credentials above?

I see that Rob has already written a reply to your comments, making some of the broader points that I could have made too. So I'll cover some other things.

To answer your valid question: If you hover over my LW/AF username, you can see that I self-code as the kind of alignment researcher who is also a card-carrying member of the academic/industrial establishment. In both age and academic credentials, I am in fact a more-senior researcher than Eliezer is. So the epistemology, if you are outside of this field and want to decide which one of us is probably more right, gets rather complicated.

Though we have disagreements, I should also point out some similarities between Eliezer and me.

Like Eliezer, I spend a lot of time reflecting on the problem of crafting tools that other people might use to improve their own ability to think about alignment. Specifically, these are not tools that can be used for the problem of triangulating between self-declared experts. They are tools that can be used by people to develop their own well-founded opinions independently. You may have noticed that this is somewhat of a theme in section C of the original post above.

The tools I have crafted so far are somewhat different from those that Eliezer is most famous for. I also tend to target my tools more at the mainstream than at Rationalists and EAs reading this forum.

Like Eliezer, on some bad days I cannot escape having certain feelings of disappointment about how well this entire global tool crafting project has been going so far. Eliezer seems to be having quite a lot of these bad days recently, which makes me feel sorry, but there you go.

AGI Ruin: A List of Lethalities

Having read the original post and many of the comments made so far, I'll add an epistemological observation that I have not seen others make yet quite so forcefully. From the original post:

Here, from my perspective, are some different true things that could be said, to contradict various false things that various different people seem to believe, about why AGI would be survivable [...]

I want to highlight that many of the different 'true things' on the long numbered list in the OP are in fact purely speculative claims about the probable nature of future AGI technology, a technology nobody has seen yet.

The claimed truth of several of these 'true things' is often backed up by nothing more than Eliezer's best-guess informed-gut-feeling predictions about what future AGI must necessarily be like. These predictions often directly contradict the best-guess informed-gut-feeling predictions of others, as is admirably demonstrated in the 2021 MIRI conversations.

Some of Eliezer's best guesses also directly contradict my own best-guess informed-gut-feeling predictions. I rank the credibility of my own informed guesses far above those of Eliezer.

So overall, based on my own best guesses here, I am much more optimistic about avoiding AGI ruin than Eliezer is. I am also much less dissatisfied about how much progress has been made so far.

AGI Ruin: A List of Lethalities

I tried something like this much earlier with a single question, "Can you explain why it'd be hard to make an AGI that believed 222 + 222 = 555", and got enough pushback from people who didn't like the framing that I shelved the effort.

Interesting. I kind of like the framing here, but I have written a paper and sequence on the exact opposite question, on why it would be easy to make an AGI that believes 222+222=555, if you ever had AGI technology, and what you can do with that in terms of safety.

I can honestly say however that the project of writing that thing, in a way that makes the math somewhat accessible, was not easy.

Announcing the Alignment of Complex Systems Research Group

If you’re interested in conceptual work on agency and the intersection of complex systems and AI alignment

I'm interested in this agenda, and I have been working on this kind of thing myself, but I am not interested at this time in moving to Prague. I figure that you are looking for people interested in moving to Prague, but if you are issuing a broad call for collaborators in general, or are thinking about setting up a much more distributed group, please clarify.

A more technical question about your approach:

What we’re looking for is more like a vertical game theory.

I'm not sure if you are interested in developing very generic kinds of vertical game theory, or in very specific acts of vertical mechanism design.

I feel that vertical mechanism design where some of the players are AIs is deeply interesting and relevant to alignment, much more so than generic game theory. For some examples of the kind of mechanism design I am talking about, see my post and related paper here. I am not sure if my interests make me a nearest neighbour of your research agenda, or just a very distant neighbour.

Reshaping the AI Industry

There are some good thoughts here. I like this enough that I am going to comment on the effective strategies angle. You state that

The wider AI research community is an almost-optimal engine of apocalypse.


AI capabilities are advancing rapidly, while our attempts to align it proceed at a frustratingly slow pace.

I have to observe that, even though certain people on this forum definitely do believe the above two statements, even on this forum this extreme level of pessimism is a minority opinion. Personally, I have been quite pleased with the pace of progress in alignment research.

This level of disagreement, which is almost inevitable as it involves estimates about the future, has important implications for the problem of convincing people:

As per above, we'd be fighting an uphill battle here. Researchers and managers are knowledgeable on the subject, have undoubtedly heard about AI risk already, and weren't convinced.

I'd say that you would indeed be facing an uphill battle if you wanted to convince most researchers and managers that the recent late-stage Yudkowsky estimates about the inevitability of an AI apocalypse are correct.

The effective framing you are looking for, even if you believe yourself that Yudkowsky is fully correct, is that more work is needed on reducing long-term AI risks. Researchers and managers in the AI industry might agree with you on that, even if they disagree with you and Yudkowsky about other things.

Whether these researchers and managers will change their whole career just because they agree with you is a different matter. Most will not. This is a separate problem, and should be treated as such. Trying to solve both problems at once by making people deeply afraid about the AI apocalypse is a losing strategy.

Would (myopic) general public good producers significantly accelerate the development of AGI?

What are some of those [under-produced software] components? We can put them on a list.

Good question. I don't have a list, just a general sense of the situation. Making a list would be a research project in itself. Also, different people here would give you different answers. That being said,

  • I occasionally see comments from alignment research orgs that run actual software experiments, saying that they spend a lot of time just building and maintaining the infrastructure needed to run large-scale experiments. You'd have to talk to actual orgs to ask them what they would need most. I'm currently a more theoretical alignment researcher, so I cannot offer up-to-date actionable insights here.

  • As a theoretical researcher, I do reflect on what useful roads are not being taken, by industry and academia. One observation here is that there is an under-investment in public high-quality datasets for testing and training, and in the (publicly available) tools needed for dataset preparation and quality assurance. I am not the only one making that observation, see for example https://research.google/pubs/pub49953/ . Another observation is that everybody is working on open source ML algorithms, but almost nobody is working on open source reward functions that try to capture the actual complex details of human needs, laws, or morality. Also, where is the open source aligned content recommender?

  • On a more practical note, AI benchmarks have turned out to be a good mechanism for drawing attention to certain problems. Many feel that these benchmarks are having a bad influence on the field of AI, and I have a lot of sympathy for that view, but you might also go with the flow. A (crypto) market that rewards progress on selected alignment benchmarks may be a thing that has value. You can think here of benchmarks that reward cooperative behaviour, truthfulness and morality in answers given by natural language querying systems, playing games ethically ( https://arxiv.org/pdf/2110.13136.pdf ), etc. My preference would be to reward benchmark contributions that win by building strong priors into the AI to guide and channel machine learning; many ML researchers would consider this to be cheating, but these are supposed to be alignment benchmarks, not machine-learning-from-blank-slate benchmarks. I have some doubts about the benchmarks for fairness in ML which are becoming popular, if I look at the latest NeurIPS: the ones I have seen offer tests which look a bit too easy, if the objective is to reward progress on techniques that have the promise of scaling up to the more complex notions of fairness and morality you would like to have at the AGI level, or even for something like a simple content recommendation AI. Some cooperative behaviour benchmarks also strike me as being too simple, in their problem statements and mechanics, to reward the type of research that I would like to see. Generally, you would want to retire a benchmark from the rewards-generating market when the improvements on the score level out.
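
To make the retirement rule in the last bullet concrete, here is a minimal sketch of one way to detect that score improvements have levelled out. It is only an illustration under stated assumptions: the function name `should_retire`, the `window` of submission rounds, and the `min_gain` threshold are all hypothetical, and a real market would need to choose them per benchmark.

```python
from typing import Sequence


def should_retire(best_scores: Sequence[float],
                  window: int = 3,
                  min_gain: float = 0.01) -> bool:
    """Decide whether a benchmark should leave the rewards market.

    best_scores: best score seen after each submission round, oldest first
                 (higher is better). Hypothetical input format.
    window:      number of most recent rounds to look back over (assumption).
    min_gain:    smallest improvement over that window that still counts
                 as progress (assumption).
    """
    if len(best_scores) <= window:
        # Not enough history yet to say the score has levelled out.
        return False
    recent_gain = best_scores[-1] - best_scores[-1 - window]
    return recent_gain < min_gain


# Scores still climbing: keep the benchmark in the market.
print(should_retire([0.50, 0.60, 0.70, 0.80]))
# Scores flat over the last three rounds: retire it.
print(should_retire([0.50, 0.80, 0.801, 0.802, 0.803]))
```

A fixed window and threshold is the simplest possible choice; it ignores noise in the scores, so a noisy benchmark would probably need a longer window or a smoothed score.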

Would (myopic) general public good producers significantly accelerate the development of AGI?

I guess I got that impression from the 'public good producers significantly accelerate the development of AGI' in the title, and then looking at the impactcerts website.

I somehow overlooked the bit where you state that you are also wondering if that would be a good idea.

To be clear: my sense of the current AI open source space is that it definitely under-produces certain software components, software components that could be relevant for improving AI/AGI safety.

Would (myopic) general public good producers significantly accelerate the development of AGI?

If I am reading you correctly, you are trying to build an incentive structure that will accelerate the development of AGI. Many alignment researchers (I am one) will tell you that this is not a good idea; instead, you want to build an incentive structure that will accelerate the development of safety systems and alignment methods for AI and AGI.

There is a lot of open source production in the AI world, but you are right in speculating that a lot of AI code and know-how is never open sourced. Take a look at the self-driving car R&D landscape if you want to see this in action.

As already mentioned by Zac, for-profit companies release useful open source all the time for many self-interested reasons.

One reason not yet mentioned by Zac is that an open source release may be a direct attack to suck the oxygen out of the business model of one or more competitors, an attack which aims to commoditize the secret sauce (the software functions and know-how) that the competitor relies on to maintain profitability.

This motivation explains why Facebook started to release big data handling software and open source AI frameworks: they were attacking Google's stated long-term business strategy, which relied on Google being better at big data and AI than anybody else. To make this more complicated, Google's market power never relied as much on big data and advanced AI as it wanted its late-stage investors to believe, so the whole move has been somewhat of an investor story telling shadow war.

Personally, I am not a big fan of the idea that one might try to leverage crypto-based markets as a way to improve on this resource allocation mess.
