Just this guy, you know?
I reject essentialism and I'm very aware of attribution bias, both of which make it hard for me to accept that in most cases the wrong people are to blame, rather than bad culture and bad interactions (which you are part of, if you're there).
Roughly to the same extent that you have power to keep people out, you can ALSO influence behaviors of people you let in. Show them the better way. Share your soul in the game. Validate their soul in the game. Keep the conversations about impact and good (positive-sum aspects of the organization) rather than about relative position and authority (zero-sum).
Of course, some people start out closer than others to your preferred behaviors, and you really should _also_ keep the most-distant-from-desired out. I don't actually mean to say that everyone is fungible or equally valued to your purposes.
Hmm. Neither the focus on organization size nor the focus on keeping "the wrong people" out (keeping the wrong _behaviors_ out, I can get behind) is working for me. I think the relevant failure dimensions are about NOT having object-level objectives, and the resulting belief (and truth) that personal success comes through the appearance of success to your supervisors.
My main advice for avoiding it is: do something real! Whether that's entertaining people for hours by shipping your game, reducing shipping costs by calculating better warehouse locations, picking a better investment by using more/faster data, manufacturing mosquito nets that use less material or are slightly more durable, or anything else, it has to be in some way measured by outside forces.
Market discipline is incredibly powerful, and very hard to fool for very long. You probably _DO_ need to be aware of politics in any organization with more than 3 people, and more aware in larger ones. But as long as you're making object-level contributions, and those around you are primarily talking about that rather than the politics, you're in an OK place.
Seems reasonable. It also seems reasonable to predict others' future actions based on BOTH someone's intentions and their ability to understand consequences. You may not be able to separate these - after the third time someone yells "FIRE" and runs away, you don't really know or care if they're trying to cause trouble or if they're just mistaken about the results.
grin! I love this kind of analysis, thank you for challenging my assumptions!
But I'm confused why you're eating whipped cream rather than plain butter. Given you're only looking at one dimension (cost per calorie), why not go even cheaper?
Just to spoil the answer - it's the same reason you might prefer fancy butter to whipped cream, or to cheap butter: the composition is actually different. Cream still has some whey in it, and whipping it just suspends it, whereas fully churning or overwhipping it separates it out, so the remainder is purer fat.
And fancier butter is basically starting with a richer-fat cream, and processing it such that the butterfat is purer than in cheap butter.
In summary: butter is better than whipped cream (for some things)? no whey!
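To make the one-dimensional comparison concrete, here's a rough cost-per-calorie sketch. The fat fractions are roughly standard (US butter is at least 80% milkfat, heavy cream around 36%), but the prices are made-up placeholders, not real data:

```python
# Rough cost-per-calorie comparison. Fat fractions are approximately
# standard (US butter >= 80% fat, heavy cream ~36%); the prices are
# hypothetical placeholders.
FAT_KCAL_PER_G = 9.0

def kcal_per_gram(fat_fraction, other_kcal_per_g=0.3):
    # Fat dominates; lump protein/sugar/water into a small remainder.
    return fat_fraction * FAT_KCAL_PER_G + other_kcal_per_g

foods = {
    # name: (hypothetical price per 100 g in $, fat fraction)
    "heavy cream":  (0.50, 0.36),
    "cheap butter": (0.80, 0.80),
    "fancy butter": (1.60, 0.84),
}

for name, (price_per_100g, fat) in foods.items():
    cost_per_kcal = price_per_100g / (kcal_per_gram(fat) * 100)
    print(f"{name}: ${cost_per_kcal * 1000:.2f} per 1000 kcal")
```

With these placeholder prices, cheap butter wins on cost per calorie even at a higher sticker price per gram, because so much more of each gram is fat.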
You can also model the agent as failing to learn that its "unpredictable randomness" isn't. It's still the case that the simple analysis - "agents which can't learn a true fact will fail in cases where that fact matters" - is good enough.
I think it's not that the reward function is insufficient, it's the deeper problem that the situation is literally undefined. Can you explain why you think there _IS_ a "true" factor? Not "can a learning system find it", but "is there something to find"? If all known real examples have flags, flatness, and redness 100% correlated, there is no real preference for which one to use in the (counterfactual) case where they diverge. This isn't sampling error or bias, it's just not there.
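A toy sketch of that underdetermination (the features and rules here are made up to mirror the flag example): two hypotheses that weight different features fit the training data identically, so nothing in the data prefers one over the other - they only come apart on a counterfactual input that never occurs.

```python
# Training data where "flat" and "red" are 100% correlated:
# every observed flag is both flat and red; every non-flag is neither.
train = [
    ({"flat": 1, "red": 1}, 1),  # flags
    ({"flat": 1, "red": 1}, 1),
    ({"flat": 0, "red": 0}, 0),  # non-flags
    ({"flat": 0, "red": 0}, 0),
]

def flat_rule(x):  # hypothesis A: "a flag is whatever is flat"
    return x["flat"]

def red_rule(x):   # hypothesis B: "a flag is whatever is red"
    return x["red"]

# Both hypotheses fit the training data perfectly; no amount of
# additional data of this shape can distinguish them.
assert all(flat_rule(x) == y for x, y in train)
assert all(red_rule(x) == y for x, y in train)

# They only diverge on the counterfactual where the features come apart.
counterfactual = {"flat": 1, "red": 0}
print(flat_rule(counterfactual), red_rule(counterfactual))  # 1 0
```

The point is that the disagreement isn't sampling error: the training distribution literally contains no fact of the matter about which rule is "true."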
A lot of these examples are distinct from Roko's idea, in that they are self-reinforcing, but generally through other mechanisms than distinguishing supporters from non-supporters and targeting those groups specifically.
There's a pretty strong governance norm (and in many cases constitutional protection) against this kind of segregation and targeting, at least in nominally-free democratic societies. A politician who puts opponents in jail JUST because they are opposed (or proposes a law that punishes ONLY those who oppose it) won't last long in most civil societies. In fact, the ability to do stuff like this is a pretty strong indicator that civility is a sham in that area.
This, of course, doesn't apply to a hypothetical all-powerful AI, as it doesn't really care about democratic support or what its subjects think.
Useful observation/reminder. I might generalize it to "there is a conflict between expectations/preferences of the users and the implementer". It's not necessarily the policy/door/game that is in the wrong; there may be legitimate but un-obvious or conflicted reasons for the choice. Sometimes users really do need to go through the education (and perhaps grieving) process in order to match reality better.
I very much sympathize with the person who's sick of having the same conversation, and it happens with good policies as well as with bad - there's no information about the quality of the policy in the complaints and confusion. There _IS_ evidence that the policy rollout process is flawed. Controversial or unpleasant policies need to be brought out by senior management first, so they can answer questions (or punish troublemakers, depending on the maziness of the org).
Didn't mean to condescend, I was mostly pointing out that the complexity is in the iteration of simple rules with a fairly wide branching factor. I will still argue that all the heuristics and evaluation mechanisms used by standard engines are effectively search predictions, useful only because the full search is infeasible, and because the full search results have not been memoized (in the not-so-giant-by-today's-standards lookup table of position->value).
I may not have been clear enough. The evaluation _IS_ a search. The value of a position is exactly the value of a min-max adversarial search to a leaf (game end).
Compression and caching and prediction are ways to work around the fact that we don't actually have the lookup table available.
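A minimal sketch of that claim, using a toy Nim-like game rather than chess (the game and function names are illustrative, not from any engine): the "evaluation" of a position is defined as nothing but the min-max search to the leaves, and a memo table plays the role of the position->value lookup that is feasible here but not for chess.

```python
from functools import lru_cache

# Toy game: a pile of n stones; each player removes 1-3 stones,
# and taking the last stone wins. A position's value is +1 if the
# player to move wins with perfect play, -1 otherwise.
@lru_cache(maxsize=None)
def value(n):
    if n == 0:
        return -1  # no move available: the player to move has lost
    # Min-max to the leaves: my value is the best of the negated
    # values my opponent faces after each of my legal moves.
    return max(-value(n - take) for take in (1, 2, 3) if take <= n)

# The evaluation of any position IS this search result, and the
# lru_cache is the (small, feasible here) position->value table.
print([value(n) for n in range(8)])  # -> [-1, 1, 1, 1, -1, 1, 1, 1]
```

Because this game's state space is tiny, the full table fits in memory and no heuristic evaluation is needed - which is exactly the situation chess engines approximate with heuristics because they can't have the table.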