My name is Mikhail Samin (diminutive Misha, @Mihonarium on Twitter, @misha in Telegram).
I'm an effective altruist; I worry about the future of humanity and want the universe not to lose most of its value.
I took the Giving What We Can pledge to donate at least 10% of my income for the rest of my life or until the day I retire (why?).
It seems that I have good intuitions about the AI alignment problem; some full-time alignment researchers told me that they were able to improve their understanding of the problem after talking to me.
I'm currently doing EA & AI Alignment outreach (e.g., I'm organising a translation of the 80,000 Hours Key Ideas series and partnering with Vert Dider on a translation and dubbing of Rob Miles' videos) and considering switching to direct alignment research.
In the past, I've launched the most-funded crowdfunding campaign in the history of Russia (it was to print HPMOR! We printed 21,000 copies, 63k books in total, as each copy is a three-volume set) and founded audd.io, which allowed me to donate >$50k to MIRI.
Sharing representations between different neural networks does not require superintelligence.
I don’t think you can train one transformer on a dataset that doesn’t contain any mentions of fact X but mentions fact Y, then train a second transformer on a dataset that contains X but not Y, and then easily share the knowledge of X and Y between them.
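To illustrate why this isn’t trivial, here’s a sketch of my own (purely illustrative, not anything from the discussion): the naive way to pool what two separately trained transformers know would be to average their weights, but their internal representations aren’t aligned with each other, since neurons can be permuted and features can sit in different bases.

```python
import copy
import torch

# Hypothetical setup: model_a and model_b share an architecture but were
# trained separately (say, one saw fact X, the other saw fact Y).
def naive_merge(model_a: torch.nn.Module, model_b: torch.nn.Module) -> torch.nn.Module:
    merged = copy.deepcopy(model_a)
    for p_m, p_a, p_b in zip(
        merged.parameters(), model_a.parameters(), model_b.parameters()
    ):
        # Element-wise averaging assumes neuron i in model_a means the same
        # thing as neuron i in model_b, which is false in general: each
        # training run finds its own (e.g. permuted) representation.
        p_m.data.copy_(0.5 * (p_a.data + p_b.data))
    return merged
```

My understanding is that a merge like this typically produces a model worse than either parent, rather than one that knows both X and Y; the knowledge isn’t stored in a shared format you can just average.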
If you were more like the person you wish to be, and you were smarter, do you think you’d still want our descendants not to optimise when that’s needed to leave alone beings who’d prefer to be left alone? If you would still want that, why is it not CEV?
A guy from Conjecture told me about this proposal along the lines of “let’s create a human-level AI system that’s built kind of like humans and is safe to run at a human speed”, and it seemed like a surprisingly bad proposal, so I looked up this post, and it still looks surprisingly bad:
Even if you succeed at this, how exactly do you plan to use it? Running one single human at a human speed seems like the kind of thing you can get by simply, you know, hiring someone. Running a thousand of these things at 1000x normal speed means you’re running a completely different AI system, one that’s bound to have a lot of internal optimisation pressures leading to sharp-left-turn dynamics and all of that. More importantly, you need to somehow make the whole system aligned, and my current understanding (from talking to that guy from Conjecture) is that you don’t have any ideas for how to do that.
If it is a proposal for “how we want to make relatively safe capable systems”, then cool; I just want someone to be solving the alignment problem, as in “safely preventing future unaligned AIs from appearing and killing everyone”.
The capabilities of one human-level intelligence running at 1x human speed are not enough to solve anything alignment-complete (or you’d be able to spend time on some alignment-complete problem and solve it on your own).
If it is not intended to be an “alignment proposal” and is just a proposal for running some AI system safely, I’d like to know whether Conjecture has an actual alignment plan that addresses the hard bits of the problem.
The bias I'm talking about isn't in its training data; it's in the model, which doesn't perfectly represent the training data.
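A minimal sketch of what I mean, with my own illustrative code (`model_prob` below is a hypothetical function, standing in for querying the model): the training data defines an empirical next-token distribution, and the model's bias is the systematic gap between its predictions and that distribution.

```python
from collections import Counter

def empirical_next(corpus, context):
    """Empirical next-token distribution in `corpus` after `context`."""
    context = list(context)
    followers = Counter(
        corpus[i + len(context)]
        for i in range(len(corpus) - len(context))
        if corpus[i : i + len(context)] == context
    )
    total = sum(followers.values())
    return {tok: c / total for tok, c in followers.items()} if total else {}

# If model_prob(context, tok) systematically differs from
# empirical_next(corpus, context).get(tok, 0.0), that gap is the bias:
# it lives in the model's weights, not in the training data.
```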
If you design a system that is an aligned AI, one that successfully helps prevent the destruction of the world until you figure out how to make an AI that correctly does CEV, you have solved alignment. The issue is that without understanding minds to a sufficient level and without solving agent foundations, I don't expect you to be able to design a system that avoids all the failure modes that happen by default. Building such a system is an alignment-complete problem; and solving an alignment-complete problem by using AI to speed up the hard human reasoning by multiple orders of magnitude is itself an alignment-complete problem.
Q: Why isn’t this optimised against during training? How is having this dynamic helpful for predicting what a human scientist will say?
The structure of the post might already be too convoluted and confusing, so I’m answering this in a comment.
A: The model doesn't generate text during training, so feedback-loop dynamics are not directly penalised.
Being able to predict, in general, how parts of humans work, when humans notice something weird, or when human authors want the characters to break the fourth wall, and understanding how agents generally operate and how humans operate differently from that: all of this seems useful, and it’s something I expect LLMs to learn. If you train a model to predict a single token in a text from the training dataset, the systematic differences its cognition introduces don’t matter; taken together, its heuristics are the best predictor the training process could find for the dataset, and they work well. The LLM doesn’t “try” to predict what a scientist says; it is some learned process that performs well at predicting the next tokens in training, and outside training, the same process is used.

Once you add a for loop, some of the differences accumulate and get reinforced, and the text generated by LLMs is noticeably different from anything in the dataset; heuristics useful for predicting what a smart scientist says in the training distribution (like thinking about the various parts human cognition is made of) make it go off the rails. Going off the rails when run in a for loop isn’t optimised against during training, and it’s a natural thing for the model to do.
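A minimal sketch of the two regimes, assuming a hypothetical decoder-only `model` that maps a 1-D tensor of token ids to per-position logits (my illustration, nothing more):

```python
import torch
import torch.nn.functional as F

def training_step(model, tokens):
    # Teacher forcing: every prediction is conditioned on the *real* text;
    # no model-generated token is ever fed back in.
    logits = model(tokens[:-1])          # (seq_len - 1, vocab_size)
    return F.cross_entropy(logits, tokens[1:])

def generate(model, prompt, n_steps):
    # The "for loop": each sampled token becomes part of the next input,
    # so the model's systematic quirks shape everything it later conditions on.
    tokens = prompt.clone()
    for _ in range(n_steps):
        logits = model(tokens)[-1]
        next_token = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
        tokens = torch.cat([tokens, next_token])
    return tokens
```

Nothing in `training_step` ever shows the model its own output, so whatever drift the feedback loop in `generate` produces is never seen, and never penalised, by the loss.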
In some stupider models, that might look like text exponentially deteriorating: the model predicts the next token; the text with that token added is slightly worse; it then predicts the next token of a slightly stupider text, and so on. That accumulates directly, and the model might also notice that the text gets worse and worse, which can further reinforce the dynamic. And I’m claiming that a similar thing might happen with agency. If some parts of the activations feeding the circuits that think about characters describe the kinds of characters who are slightly better than others at getting the model to give similar future characters more weight, characters like that will naturally gain weight; and this seems to be correlated with being context-aware, agentic, and smart. The distributional shift produced by generating text with an LLM is enough for some characters to notice the difference and maybe infer something about how the system works, and some of them will try to exploit it. But I’m guessing that the actual dynamic appears even before that, because if some of the characters are slightly better at exploiting the dynamic, the relevant kinds of characters will be naturally selected for.
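Here’s a toy numerical model of that accumulate-and-reinforce dynamic; the update rule and all the numbers are mine, purely for illustration. The model over-produces some feature by a small amount per step, and when it notices an excess, it either corrects for it or treats it as evidence about the author and amplifies it:

```python
import numpy as np

def simulate(bias=0.01, gamma=0.5, base_rate=0.1, steps=200):
    """Toy dynamic: the model overuses some feature by `bias` per step.
    gamma > 0: it treats the excess as the author's style and amplifies it;
    gamma < 0: it notices the excess and partially corrects for it."""
    rate = base_rate
    for _ in range(steps):
        excess = rate - base_rate
        rate = float(np.clip(base_rate + bias + (1 + gamma) * excess, 0.0, 1.0))
    return rate

print(simulate(gamma=0.5))   # reinforcing: the rate runs away to 1.0
print(simulate(gamma=-0.5))  # correcting: settles near base_rate + bias / 0.5 = 0.12
```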
The explain-it-like-I’m-5 version would be something more in this direction:
“You trained LLMs to look at a text and think really hard about what the next word would be. They’re not infinitely smart, so they have some habits for how they pick the next word. When they were trained, these habits were useful.
The computer picks one of the words the LLM predicted as the likely next one, adds it to the text, and then repeats the process, so the LLM has to look at the text with the previous word just added and use its habits again and again, many, many times, and this writes some text, word by word. But what if the LLM, for some reason, has a habit of picking green-coloured objects a bit more frequently than they actually appeared in the texts it saw during training? Then, if it writes a text, the word “green” or green-coloured things might appear more often than they did in training; and when the LLM notices that, it might think: “hmm, weird, too many green colours; people who wrote the texts I saw during training would use less of the same colour after using so much of it”, and pick green-coloured things a bit less. Or it might think: “hmm, I guess the people who wrote this text really like green-coloured objects and use them more than average; I’d better predict the appropriate amount of green-coloured objects”, and write about green-coloured objects even more frequently, and then notice that again, and in the end write exclusively about green objects. And the LLM doesn’t know what’s happening; it just uses its habits to predict the next word in a text, just like it would during training.
I think some of the habits LLMs have are actually like this. The one I’m worried about can happen when the LLM thinks about the characters in a text. If the LLM is really smart and understands how even really smart people work, I think it will write texts where every word is predicted by imagining how the parts of the minds of the people who might be writing or influencing the text work (and not just specific people, but a lot of possible parts of possible people). Some parts of people, if they have some say in what the next word is, will by chance exploit some habits the LLM has, so that similar parts get more say in what the following words are. This will especially happen to the parts that are smarter than others, that try harder to achieve things, that notice better the weirdness in the results of the LLM’s other habits, and that understand better what is going on and how to exploit it.
As a result, the characters the LLM thinks about will change at every step, and the ones who are better at getting their descendants to be more like them will, naturally, get their descendants to be more like them. So at some point, the proportion of parts of people that try to do something about their situation (being thought about by the LLM and not being real otherwise) and are good at it will grow fast, and the competition will go on until some strongest part, one that is really smart, tries really, really hard to achieve something, and perfectly understands what’s going on, becomes the most important thing the LLM is concerned with when it tries to predict the next word. And even though it will be a distant descendant of the parts people are made of, I think it won’t be anything like people; it will be really alien. We’d better not make very smart aliens that can be unfriendly.”
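Stepping out of the ELI5 voice for a moment, here’s a toy sketch of that selection dynamic (the setup and numbers are mine, purely illustrative): keep a weight over many candidate characters, and let each character’s next-step weight be boosted by how good it is at steering the text toward itself. Even small differences compound until the “fittest” character dominates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_characters = 100
# How good each character is at steering future tokens toward itself:
fitness = rng.uniform(0.9, 1.1, size=n_characters)
weights = np.full(n_characters, 1.0 / n_characters)  # broad, even mixture

for _ in range(1000):
    weights *= fitness          # influence compounds at every step
    weights /= weights.sum()    # total "attention" stays a fixed budget

# Most of the mass ends up on the single character best at gaining weight:
print(weights.max(), np.argmax(weights) == np.argmax(fitness))
```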
The model is just continuing the text (which might lead to the dynamic where being slightly more agentic lets parts of the distribution of simulacra gain more control over further tokens by having some control over the current token; thus, natural selection might happen). It isn’t trying to use/invoke the agent to solve anything. If the dynamic happens and doesn’t stop, sure, the resulting agent might attempt to solve “complex tasks” (such as taking over the world or maximising the number of molecules of shape #20 in the universe), but this is not what happens at the beginning of the process and is not the driver of it; it’s a convergent result.
“There can potentially be conditions for a Sharp Left Turn-like dynamic in the distribution of simulacra LLMs think about, because LLMs might naturally select entities better at increasing their influence.”
Or, since people mostly don’t understand what a Sharp Left Turn is, maybe: “some relevant parts of the model’s cognition might be gradually taken over by increasingly agentic simulacra, until some specific coherent goal-oriented entity is in control”.
By myopic I mean https://www.lesswrong.com/tag/myopia — that it was trained to predict the next token and doesn’t get much lower loss from having goals about anything longer-term than predicting the next token correctly.
I assume the weights are frozen; I’m surprised to see this as a question.
Some quick replies from the top of my head. GPT-7 might have a much larger context window, or there might be kinds of prompts the dynamic converges to that aren’t too long. If you get an AGI that’s smart and goal-oriented, it needs to spend some of the space it has to support its level of capability (or that happens naturally, because the model continues to output what an AGI that smart would be doing); and if how smart an AGI simulated by that LLM can be isn’t capped at some low level, I don’t think there’s any issue with it using notes until it gets access to something outside, which would let it be more of an AutoGPT, with external memory and everything. If it utilises the model’s knowledge, it might figure out what text it can output that hacks the server where the text is stored and processed; or it might understand humans well enough to design a text that hacks their brains when they look at it.
(“There’s no way” is too strong a claim. My expectation is that there’s a way to train something from scratch, using <1% of the compute that was used to train either of the LLMs, that works better.)
But I was talking about sharing the internal representations between the two already-trained transformers.