
Yeah, that's correct on both counts (that does seem like an important distinction, and neither really matches my experience, though the former is more similar).

I spent about a decade at a company that grew from 3,000 to 10,000 people; I would guess the layers of management were roughly the logarithm in base 7 of the number of people. Manager selection was honestly kind of a disorganized process, but it was basically: impress your direct manager enough that they suggest you for management, then impress your division manager enough that they sign off on this suggestion.
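As a quick sanity check on that log-base-7 estimate (the function name and the span-of-control parameter here are my own, purely illustrative):

```python
import math

def estimated_layers(headcount: int, span_of_control: int = 7) -> float:
    """Rough layers of management if each manager has ~span_of_control reports."""
    return math.log(headcount, span_of_control)

# At both 3,000 and 10,000 people this comes out to roughly 4 to 5 layers,
# consistent with the estimate above.
for n in (3_000, 10_000):
    print(f"{n:>6} people -> ~{estimated_layers(n):.1f} layers")
```

This is just the back-of-the-envelope model implied by the comment, not a claim about how that company actually structured its org chart.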

I'm currently somewhere much smaller; I report to the top layer and have two layers below me. The process is roughly the same.

I realize I should have said that I found your Spotify example the most compelling: the problems I see/saw are less "manager screws over the business to personally advance" and more "helping the business would require the manager to take a personal hit, and they didn't want to do that."

For what it's worth, I think a naïve reading of this post would imply that moral mazes are more common than my experience indicates.

I've been in middle management at a few places, and in general people just do reasonable things because they are reasonable people, and they aren't ruthlessly optimizing enough to be super political even if that's the theoretical equilibrium of the game they are playing.[1]

1. This obviously doesn't mean that they are ruthlessly optimizing for the company's true goals, though. They are just kind of casually doing things they think are good for the business, because playing politics is too much work.

FYI, I think your first skepticism was mentioned in the "Safety from speed" section; she concludes that section:

> These [objections] all seem plausible. But also plausibly wrong. I don’t know of a decisive analysis of any of these considerations, and am not going to do one here. My impression is that they could basically all go either way.

She mentions your second skepticism near the top, but I don't see anywhere she directly addresses it.

> think about how humans most often deceive other humans: we do it mainly by deceiving ourselves... when that sort of deception happens, I wouldn't necessarily expect to be able to see deception in an AI's internal thoughts

The fact that humans will give different predictions when forced to make an explicit bet versus just casually talking seems to imply that it's theoretically possible to identify deception, even in cases of self-deception.

Basic question: why would the AI system optimize for X-ness?

I thought Katja's argument was something like:

1. Suppose we train a system to generate (say) plans for increasing the profits of your paperclip factory, similar to how we train GANs to generate faces
2. Then we would expect those paperclip factory planners to have analogous errors to face generator errors
3. I.e. they will not be "eldritch"

The fact that you could repurpose the GAN discriminator in this terrifying way doesn't really seem relevant if no one is in practice doing that?

Thanks for sharing this! Could you make it an actual sequence? I think that would make navigation easier.

As one example: Y Combinator companies show a roughly linear correlation between exit value and number of employees, and basically all companies with $100MM+ exits have >100 employees. My impression is that there are very few companies with even $1MM revenue/employee (though I don't have a data set easily available).