One of the very admirable things about the LessWrong community is its willingness to take arguments seriously, regardless of who puts them forward. In many circumstances, this is an excellent discipline!
But if you're acting as a manager (or a voter), you often need to consider not just arguments, but also practical proposals made by specific agents:
Should X be allowed to pursue project Y?
Should I make decisions based on X claiming Z, when I cannot verify Z myself?
One key difference is that these are not abstract arguments. They're practical proposals involving some specific entity X. And in cases like this, the credibility of X becomes relevant: Will X pursue project Y honestly and effectively? Is X likely to make accurate statements about Z?
And in these cases, ignoring the known truthfulness of X can be a mistake.

My thinking on this matter was influenced by a classic 2004 post by Dan Davies, The D-Squared Digest One Minute MBA – Avoiding Projects Pursued By Morons 101:
Anyway, the secret to every analysis I’ve ever done of contemporary politics has been, more or less, my expensive business school education (I would write a book entitled “Everything I Know I Learned At A Very Expensive University”, but I doubt it would sell). About half of what they say about business schools and their graduates is probably true, and they do often feel like the most colossal waste of time and money, but they occasionally teach you the odd thing which is very useful indeed...
Good ideas do not need lots of lies told about them in order to gain public acceptance. I was first made aware of this during an accounting class...
Fibbers’ forecasts are worthless. Case after miserable case after bloody case we went through, I tell you, all of which had this moral. Not only that people who want a project will tend to make inaccurate projections about the possible outcomes of that project, but about the futility of attempts to “shade” downward a fundamentally dishonest set of predictions. If you have doubts about the integrity of a forecaster, you can’t use their forecasts at all. Not even as a “starting point”...
The Vital Importance of Audit. Emphasised over and over again. Brealey and Myers has a section on this, in which they remind callow students that like backing-up one’s computer files, this is a lesson that everyone seems to have to learn the hard way. Basically, it’s been shown time and again and again; companies which do not audit completed projects in order to see how accurate the original projections were, tend to get exactly the forecasts and projects that they deserve. Companies which have a culture where there are no consequences for making dishonest forecasts, get the projects they deserve. Companies which allocate blank cheques to management teams with a proven record of failure and mendacity, get what they deserve...
The entire post is excellent, though the Iraq-specific details probably make more sense to people who read the news in 2001-2004.
Application to AI Safety
We have reached a point where different AI labs have established track records. This allows us to investigate their credibility. Are any of the labs known "fibbers"? Do they have a history of making misleading statements, or of breaking supposedly binding commitments when those commitments become slightly inconvenient?
Similarly, when looking at this week's bit of AI-related politics (the confrontation between Anthropic and the leadership of the DoD), what are the established track records of the people involved? Do they have a history of misrepresentations, or are they generally honest?
But these principles are not specific to this week. Indeed, they may not be specific to humans. If an AI model has a track record of deception (like o3 did), then we should not assume that it is aligned. The opposite, sadly, is not entirely reliable—a model with a long track record of telling the truth might be setting up for a "treacherous turn". But at least you might have a chance.
The first step of epistemic hygiene is ignoring the mouth noises (or output tokens) of entities with a track record of lying, at least as it applies to project proposals or claims of fact. As Dan Davies claimed:
If you have doubts about the integrity of a forecaster, you can’t use their forecasts at all. Not even as a “starting point”.