LESSWRONG
is fundraising!
LW

Mikhail Samin

My name is Mikhail Samin (@Mihonarium on Twitter/X, @misha on Telegram).

Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.

I have takes on what seems to me to be the very obvious shallow stuff about the technical AI notkilleveryoneism; but many AI Safety researchers told me our conversations improved their understanding of the alignment problem.

I'm running two small nonprofits: AI Governance and Safety Institute and AI Safety and Governance Fund. Learn more about our results and donate: aisgf.us/fundraising

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

In the past, I've launched the most funded crowdfunding campaign in the history of Russia (it was to print HPMOR! we printed 21 000 copies =63k books) and founded audd.io, which allowed me to donate >$100k to EA causes, including >$60k to MIRI.

[Less important: I've also started a project to translate 80,000 Hours, a career guide that helps to find a fulfilling career that does good, into Russian. The impact and the effectiveness aside, for a year, I was the head of the Russian Pastafarian Church: a movement claiming to be a parody religion, with 200 000 members in Russia at the time, trying to increase separation between religious organisations and the state. I was a political activist and a human rights advocate. I studied relevant Russian and international law and wrote appeals that won cases against the Russian government in courts; I was able to protect people from unlawful police action. I co-founded the Moscow branch of the "Vesna" democratic movement, coordinated election observers in a Moscow district, wrote dissenting opinions for members of electoral commissions, helped Navalny's Anti-Corruption Foundation, helped Telegram with internet censorship circumvention, and participated in and organized protests and campaigns. The large-scale goal was to build a civil society and turn Russia into a democracy through nonviolent resistance. This goal wasn't achieved, but some of the more local campaigns were successful. That felt important and was also mostly fun- except for being detained by the police. I think it's likely the Russian authorities would imprison me if I ever visit Russia.]

Posts

Sorted by New

Wikitag Contributions

Comments

Sorted by

Newest

Mikhail Samin — LessWrong

Mikhail Samin

My name is Mikhail Samin (@Mihonarium on Twitter/X, @misha on Telegram).

Humanity's future can be enormous and awesome; losing it would mean our lightcone (and maybe the universe) losing most of its potential value.

I'm running two small nonprofits: AI Governance and Safety Institute and AI Safety and Governance Fund. Learn more about our results and donate: aisgf.us/fundraising

I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).

Wikitag Contributions

Comments

Sorted by

Newest

Mikhail Samin's Shortform

Mikhail Samin3d70

I noticed I have no clue how different positions of the tongue, the jaw, and the lips lead to different sounds.

So after talking to LLMs and a couple of friends who are into linguistics, I vibecoded https://contact.ms/fun/vowels.

I have no clue how valid any of it is. Would love for someone with a background in physics(/physiology/phonetics?) to fact-check it.

Clipboard Normalization

Mikhail Samin13d20

would love a version for windows!

Toss a bitcoin to your Lightcone – LW + Lighthaven's 2026 fundraiser

[+]Mikhail Samin19d-13-3

Toss a bitcoin to your Lightcone – LW + Lighthaven's 2026 fundraiser

[+]Mikhail Samin20d*-10-42

Mikhail Samin's Shortform

Mikhail Samin1mo20

Has anyone tried to do refusal training with early layers frozen/only on the last layers? I wonder if the result would be harder to jailbreak.

Sharing information about Lightcone Infrastructure

Mikhail Samin1mo20

Perhaps you’re right; I would love for that to be the case, and to have been wrong about all this. But this model- that it’s a there exists quantifier- is very surprised by a bunch of things from “lol, no, […]” to “I might use it that way. Like, I might tell someone who is worried about [third party] that they are planning to move into the space if it seems relevant. Or I might myself come to realize it's important and then actively tell people to maybe do something about it.”

And, like, he didn’t give any examples of when he would not use the information.

His position was pretty clear to me: he thought that the fact the third party is moving into that space is bad, and if there is a way to use the information to prevent them from doing it, he would do so (but he didn’t see any ways of doing that and didn’t find it very important overall).

Like, there’s nothing in the messages to suggest otherwise.

He didn’t give an isolated example of when he’d want to share information for different reasons, where it would have a side-effect of hurting the interests of the third party. Instead, it was an example where the reason to share information was specifically that it would lead to hurting the interests of the third party.

He did call the information “strategically relevant”. He did say that he would continue to share the information basically at his sole discretion. He did say he might use it if he realizes it’s strategically important.

I really don’t have a coherent model of an alternative explanation you’re trying to point at.

(If you- or someone else- is available for that, I would love to jump on a call with someone who has a good model of Oliver and can explain to me the alternative explanation for what generated the messages.)

Sharing information about Lightcone Infrastructure

Mikhail Samin1mo00

“Oliver is not a good counterparty” is my judgment of him and his character based on the interaction that we had. How is it a “deceptive attack”?

I did replace it with “Oliver Habryka is a counterparty I regret having; he doesn't follow planecrash!lawful neutral/good norms; be careful when talking to him” to communicate information more directly, but I do think that if you refer having someone as a counterparty, they’re not a good counterparty!

Unless its governance changes, Anthropic is untrustworthy

Mikhail Samin1mo50

Due to concerns with the validity of

Anthropic wants to stay near the front of the pack at AI capabilities so that their empirical research is relevant, but not at the actual front of the pack to avoid accelerating race-dynamics.
— From an Anthropic employee in a private conversation, early 2023

I decided to remove it from the section 0 of the post. (At first, I temporarily added “(approximate recollection)” at the end while checking with Raemon on the details, but decided to delete this entirely once I got the reply.)

I apologize to readers for having had it in the post.

Thanks to @DanielFilan for the flag to it and to Raemon for a quick response on the details and the clarification.

Unless its governance changes, Anthropic is untrustworthy

Mikhail Samin1mo30

Yep, thanks for flagging. That was not intentional. After checking with Raemon, I removed this entirely.

Unless its governance changes, Anthropic is untrustworthy

Mikhail Samin1mo62

I sent Mikhail the following via DM, in response to his request for "any particular parts of the post [that] unfairly attack Anthropic":
I think that the entire post is optimized to attack Anthropic, in a way where it's very hard to distinguish between evidence you have, things you're inferring, standards you're implicitly holding them to, standards you're explicitly holding them to, etc.

I asked you for any particular example; you replied that “the entire post is optimized in a way where it’s hard to distinguish…”. Could you, please, give a particular example of where it’s hard to distinguish between evidence that I have and things I’m inferring?