Michael Tontchev


Maybe our brains do a kind of expansion of a fact before memorizing it, storing both the fact and its neighbors in logic space.

Surviving AGI, and it solving aging, doesn't imply you specifically get to live forever. In my mental model, even safe AGI entails mass disempowerment: how do you earn an income when everything you do is done better and cheaper by an AI? And if the answer is UBI, what power do people have to argue for it and to keep it politically?

The main problem I see here is that supporting these efforts does epistemic damage. If you become known as the group that supports regulations for reasons it doesn't really believe, in order to further other, hidden goals, you lose others' trust in the truthfulness of your communication. You also erode the norms by which both you and your opponents play, which gives them access to a lot of nefarious policies and strategies as well.

That being said, there are probably other ideas within this space that are not epistemically damaging.

I feel like we can spin up stories like this that go any way we want. I'd rather look at trends and some harder analysis.

For example, we can tell an equally entertaining story where any amount of AI progress slowdown in the US pushes researchers to other countries that care less about alignment, so no amount of slowdown is effective. Likewise, any amount of safety work and deployment criteria can push the best capabilities people to the firms with the fewest safety restrictions.

But do we think these are plausible, and specifically more plausible than alternatives where slowdowns work?

Maybe the answer is something like a "UBI co-op": the mostly non-capitalist 99% band together in some voluntary alliance whose cumulative wealth is invested in running one of these AI empires for their benefit, with the returns split in some proportion.
Seems potentially promising, but it may face the challenges of historical co-ops. I haven't thought about it enough, but it's all I've got for now.

Yep, and I recognize that later in the article:

The paperclip maximizer problem that we discussed earlier was actually not initially proposed as an outer alignment problem of the kind that I presented (although it is also a problem of choosing the correct objective function, i.e., outer alignment). The original paperclip maximizer was an inner alignment problem: what if, in the course of training an AI, it learned a "preference" for items shaped like paperclips, deep in its connection weights?

But it's still useful as an outer alignment intuition pump.

Want to add this one:


This is the note I wrote internally at Meta - it's had over 300 reactions, as well as people reaching out to me saying it has convinced them to switch to working on alignment.

Thanks for your feedback! It turns out the Medium format matches really well with LessWrong and only needed 10 minutes of adjustment, so I copied it over :)

Do people really not do one extra click, even after the intro? :O
