Davidmanheim — LessWrong

Unless its governance changes, Anthropic is untrustworthy

I agree that many of the worldviews being promoted are unrealistic - expecting companies in the current competitive race conditions would be a competitive disadvantage.

But I also think that there are worlds where Anthropic or OpenAI as companies cared enough to ensure that they can be trusted to keep their promises. And there are industries (financial auditing, many safety critical industries,) where this is already the case - where companies know that their reputation as careful and honest actors is critical to their success. In those industries, breaking the trust is a quick path to bankruptcy.

Clearly, the need for anything like that type of trustworthiness is not true in the AI industry. Moreover, coordinating a change in the status quo might be infeasible. So again, yes, this is an unrealistic standard.

However, I would argue that high-trust another viable equilibrium, one where key firms were viewed as trustworthy enough that anyone using less-trustworthy competitors would be seen as deeply irresponsible. Instead, we have a world stuck in the low-trust competition in AI, a world where everyone agrees that uploading sensitive material to an LLM is a breach of trust, and uploading patient information is a breach of confidentiality. The only reason to trust the firms is that they likely won't care or check, and certainly not that they can be trusted not to do so. And they are right to say that the firms have not made themselves trustworthy enough for such uses - and that is part of the reason the firms are not trying to rigorously prove themselves trustworthy.

And if AI is going to control the future, as seems increasingly likely, I'm very frustrated that attempts to move towards actually being able to trust AI companies are, as you said, "based on unrealistic and naive world views."

Unless its governance changes, Anthropic is untrustworthy

Davidmanheim3d89

Regardless of whether you think the company is net positive, or working for it is valuable, are you willing to explicitly disagree with the claim that as an entity, the company cannot be trusted to reliably fulfill all the safety and political claims which it makes, or has made? (Not as in inviolably never doing anything different despite changes, but in the same sense that you trust a person not to break a promise without. e.g., explaining to those it was made to about why it thinks the original promise isn't binding, or why the specific action isn't breaking their trust.)

I think that an explicit answer to this question would be more valuable than the reasonable caveats given.

Unless its governance changes, Anthropic is untrustworthy

Davidmanheim3d2110

I agree that treating corporations or governments or countries as single coherent individuals is a type error, since it's important to be able to decompose them into factions and actors to build a good gears-level model that is predictive, and you can easily miss that. I strongly disagree that treating them as actors which can be trusted or distrusted is a type error. You seem to be making the second claim, and I don't understand it; the company makes decisions, and you can either trust it to do what it says, or not - and this post says the latter is the better model for anthropic.

Of course, the fact that you can't trust a given democracy to keep its promises doesn't mean you can't trust any of the individuals in it, and the fact that you can't trust a given corporation doesn't necessarily mean that about the individuals working for the company either. (It doesn't even mean you can't trust each of the individual people in charge - clearly, trust isn't necessarily conserved over most forms of preference or decision aggregation.)

But as stated, the claims made seem reasonable, and in my view, the cited evidence shows it's basically correct, about the company as an entity and its trustworthiness.

Security Complacency Meets Frontier AI: The Coming Collapse of ‘Secure by Apathy’

Davidmanheim6d40

He implied there that in the short term the advantage will be asymmetric, even if he's hopeful that there will eventually be a defensive advantage. (I'm agnostic on the latter, and even if he's right, I think the times scale needed for it to emerge might be longer than it will matter.) But I should have linked to his recent piece, where he says this explicity, not that older one: https://www.schneier.com/crypto-gram/archives/2025/1015.html#cg18

I'll fix that now.

Out-paternalizing the government (getting oxygen for my baby)

Davidmanheim13d148

I think you might be failing to update generally enough here about how valuable it is for governments to.regulate the practice of medicine, in terms of directly benefitting people - who are, in general terms, overconfident about their competence to make decisions in complex domains they don't understand.

Of course, this is not to say there are no harms from governments doing so, nor that they currently strike the right balance.

Are AI time horizons inherently superexponential?

Davidmanheim20d20

The phrase "time horizons" was confusing to me for about the first third of the post. Perhaps you could say "AI task time horizons" instead, or mention "AI task success rates for a given time horizon" at some point to clarify what the phrase refers to.

Otherwise, good points!

Condensation

Davidmanheim23dΩ330

Typo?

how information is used, it might to be far enough

Probably should read "might not be"

AIs should also refuse to work on capabilities research

Davidmanheim1mo20

This is a good question, albeit only vaguely adjacent.

My answer would be that winners curse only applies if firms aren't actively minimizing the extent to which they overbid. In the current scenario, firms are trying (moderately) hard to prevent disaster, just not reliably enough to succeed indefinitely. However, once they fail, we could easily be far past the overhang point for the AI succeeding.

Assuming to start, implausibly, that the AI itself is not strategic enough to consider its chances of succeeding, we'll assume AI capabilities nonetheless keep increasing. The firms can also detect and prevent it from trying with some probability, but their ability to monitor and stop it from trying is decreasing. The better AI firms are at stopping the models, and the slower that their ability declines relative to model capability, the more likely it is that when they do fail, the AI will succeed. And if the AIs are strategic, they will be much less likely to try if they are likely to either fail or be detected, so they ill wait even longer.

Musings on Reported Cost of Compute (Oct 2025)

Davidmanheim1mo70

Data. Find out the answer.

https://www.wevolver.com/article/tpu-vs-gpu-a-comprehensive-technical-comparison

Looks like they arehwitin 2x of the H200s, albeit with some complexity in details.

Musings on Reported Cost of Compute (Oct 2025)

Davidmanheim1mo20

Because it's what they can get. A factor of two or more in compute is plausibly less important than a delay of a year.

This may or may not be the case, but the argument for why it can't be very different fails.

LESSWRONG
LW

LESSWRONG
LW

Sequences

Posts

Wikitag Contributions

Comments