LESSWRONG

Zach Stein-Perlman

AI strategy & governance. ailabwatch.org. ailabwatch.substack.com. 

Sequences

Slowing AI

4 · Zach Stein-Perlman's Shortform (Ω, 4y, 279 comments)

Comments (sorted by newest)
Help me understand: how do multiverse acausal trades work?
Zach Stein-Perlman · 16h

It's certainly not clear to me that acausal trade works, but I don't think these problems are correct.

  1. Consider a post-selection state: a civilization that has stable control over a fixed amount of resources in its universe.
  2. I don't know, but it feels possible (and it's just a corollary of the problem of modeling the distribution of other civilizations that want to engage in acausal trade).
xAI's Grok 4 has no meaningful safety guardrails
Zach Stein-Perlman · 1d

Update: xAI says that the load-bearing thing for avoiding bio/chem misuse from Grok 4 is not inability but safeguards, and that Grok 4 robustly refuses "harmful queries." So I think Igor is correct. If the Grok 4 misuse safeguards are ineffective, that shows that xAI failed at a basic safety thing it tried (and either doesn't understand that or is lying about it).

I agree it would be a better indication of future-safety-at-xAI if xAI said "misuse mitigations for current models are safety theater." That's just not its position.

Zach Stein-Perlman's Shortform
Zach Stein-Perlman · 3d

I think I'd prefer "within a month after external deployment" over "by the time of external deployment" because I expect the latter will lead to (1) evals being rushed and (2) safety people being forced to prioritize poorly.

Zach Stein-Perlman's Shortform
Zach Stein-Perlman · 3d

Thanks. Sorry for criticizing without reading everything. I agree that, like, on balance, GDM didn't fully comply with the Seoul commitments re Gemini 2.5. Maybe I just don't care much about these particular commitments.

Zach Stein-Perlman's Shortform
Zach Stein-Perlman · 3d

I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified "we'll publish within 2 months after external deployment" or "we'll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model" or something.

Zach Stein-Perlman's Shortform
Zach Stein-Perlman · 3d · Ω

Some of my friends are signal-boosting this new article: 60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge. See also the open letter. I don't feel good about this critique or the implicit ask.

  1. Sharing information on capabilities is good, but public deployment is a bad time for that, in part because most risk comes from internal deployment.
  2. Google didn't necessarily even break a commitment? The commitment mentioned in the article is to "publicly report model or system capabilities." That doesn't say it has to be done at the time of public deployment.
    1. The White House voluntary commitments included a commitment to "publish reports for all new significant model public releases"; same deal there.
    2. Possibly Google broke a different commitment (mentioned in the open letter): "Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system." Depends on your reading of "assess the risks" plus facts which I don't recall off the top of my head.
  3. Other companies are doing far worse on this dimension. At worst, Google is third-best at publishing eval results; Meta and xAI are far worse.
GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion
Zach Stein-Perlman · 3d
  1. Yes
  2. Out of date
Banning Said Achmiz (and broader thoughts on moderation)
Zach Stein-Perlman · 9d

Many people would be much less inclined to vote if voting were fully public, so you would lose a lot of signal.

Viliam's Shortform
Zach Stein-Perlman · 11d

Disagree: the problem with rationalist cults is that they're wrong about [the effects of their behavior / whether what they're doing is optimific], not that what they're doing is Pascalian. (And empirically you should expect that subcommunities that really hurt their members do so for bad reasons.)

adamzerner's Shortform
Zach Stein-Perlman · 11d

Yeah, that's progress. But then you realize that ~all commitments are too weak/vague (not to mention non-standardized), you notice that words about the future are cheap, and you realize you should focus on stuff other than commitments.

Posts

51 · AI companies have started saying safeguards are load-bearing (Ω, 5d, 2 comments)
15 · ChatGPT Agent: evals and safeguards (1mo, 0 comments)
33 · Epoch: What is Epoch? (2mo, 1 comment)
15 · AI companies aren't planning to secure critical model weights (2mo, 0 comments)
207 · AI companies' eval reports mostly don't support their claims (Ω, 3mo, 13 comments)
58 · New website analyzing AI companies' model evals (3mo, 0 comments)
72 · New scorecard evaluating AI companies on safety (3mo, 8 comments)
71 · Claude 4 (3mo, 24 comments)
36 · OpenAI rewrote its Preparedness Framework (5mo, 1 comment)
241 · METR: Measuring AI Ability to Complete Long Tasks (Ω, 5mo, 106 comments)
Wikitag Contributions

Ontology (2y, +45)
Ontology (2y, -5)
Ontology (2y)
Ontology (2y, +64/-64)
Ontology (2y, +45/-12)
Ontology (2y, +64)
Ontology (2y, +66/-8)
Ontology (2y, +117/-23)
Ontology (2y, +58/-21)
Ontology (2y, +41)