LESSWRONG

Zach Stein-Perlman

AI strategy & governance. ailabwatch.org. ailabwatch.substack.com.

Comments

Help me understand: how do multiverse acausal trades work?

Zach Stein-Perlman16h40

Certainly not clear to me that acausal trade works but I don't think these problems are correct.

Consider a post-selection state — a civilization has stable control over a fixed amount of resources in its universe
idk but feels possible (and just a corollary of the "model the distribution of other civilizations that want to engage in acausal trade" problem)

xAI's Grok 4 has no meaningful safety guardrails

Zach Stein-Perlman1d40

Update: xAI says that the load-bearing thing for avoiding bio/chem misuse from Grok 4 is not inability but safeguards, and that Grok 4 robustly refuses "harmful queries." So I think Igor is correct. If the Grok 4 misuse safeguards are ineffective, that shows that xAI failed at a basic safety thing it tried (and either doesn't understand that or is lying about it).

I agree it would be a better indication of future-safety-at-xAI if xAI said "misuse mitigations for current models are safety theater." That's just not its position.

Zach Stein-Perlman's Shortform

Zach Stein-Perlman3d119

I think I'd prefer "within a month after external deployment" over "by the time of external deployment" because I expect the latter will lead to (1) evals being rushed and (2) safety people being forced to prioritize poorly.

Zach Stein-Perlman's Shortform

Zach Stein-Perlman3d30

Thanks. Sorry for criticizing without reading everything. I agree that, like, on balance, GDM didn't fully comply with the Seoul commitments re Gemini 2.5. Maybe I just don't care much about these particular commitments.

Zach Stein-Perlman's Shortform

Zach Stein-Perlman3d20

I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified "we'll publish within 2 months after external deployment" or "we'll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model" or something.

Zach Stein-Perlman's Shortform

Zach Stein-Perlman3dΩ1525-12

Some of my friends are signal-boosting this new article: "60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge." See also the open letter. I don't feel good about this critique or the implicit ask.

- Sharing information on capabilities is good but public deployment is a bad time for that, in part because most risk comes from internal deployment.
- Google didn't necessarily even break a commitment? The commitment mentioned in the article is to "publicly report model or system capabilities." That doesn't say it has to be done at the time of public deployment.
  1. The White House voluntary commitments included a commitment to "publish reports for all new significant model public releases"; same deal there.
  2. Possibly Google broke a different commitment (mentioned in the open letter): "Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system." Depends on your reading of "assess the risks" plus facts which I don't recall off the top of my head.
- Other companies are doing far worse in this dimension. At worst Google is 3rd-best in publishing eval results. Meta and xAI are far worse.

GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

Zach Stein-Perlman3d20

Yes
Out of date

Banning Said Achmiz (and broader thoughts on moderation)

Zach Stein-Perlman9d88

Many people would be much less inclined to vote if it was fully public, so you would lose a lot of signal.

Viliam's Shortform

Zach Stein-Perlman11d2011

Disagree: the problem with rationalist cults is that they're wrong about [the effects of their behavior / whether what they're doing is optimific], not that what they're doing is Pascalian. (And empirically you should expect that subcommunities that really hurt their members do so for bad reasons.)

adamzerner's Shortform

Zach Stein-Perlman11d62

Yeah that's progress. But then you realize that ~all commitments are too weak/vague (not to mention non-standardized) and you notice that words-about-the-future are cheap and you realize you should focus on stuff other than commitments.