This is a special post for quick takes by damiensnyder.

I have only a broad overview of AI, AI risk, and the surrounding topics. I've encountered the idea that a superintelligent AI searching for an optimal future is likely to be lured by "siren worlds": worlds that are bad but optimized to seem optimal. I vaguely grasped why this might happen, but I didn't give the theory much credence. (Mainly, I thought a bad world was less likely than a good world to seem optimal.) However, I just discovered this comment:

This optimistic conjecture could be tested by looking to see what image maximally triggers a ML classifier. Does the perfect cat, the most cat-like cat according to ML actually look like a cat to us humans? If so, then by analogy the perfect utopia according to ML would also be pretty good. If not...

I have seen images optimized for the maximum possible "cat" score, and they don't look like cats. Usually they look like mutant cats with three faces and five eyes. The comment's question and the "siren world" concept seem related. Is the connection between image classifiers and siren worlds:

  • evidential (i.e., image classifiers should strengthen my belief that superintelligent AI would be lured by siren worlds);
  • not evidential, but useful as an analogy, because the scenarios share similar causes;
  • useful as an analogy, though the technical concerns of the two scenarios are essentially unrelated; or
  • not applicable?
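
For anyone unfamiliar with how "maximum possible score" images get made, here is a toy sketch of the idea: start from an input and repeatedly nudge it in whatever direction raises the classifier's score. Everything in it is an assumption for illustration. The six-"pixel" linear scorer, its weights, and the "typical cat" input are all made up, and real image classifiers are deep networks rather than a single weighted sum. It only shows the mechanism; it doesn't settle which of the options above is right.

```python
import math

# Hypothetical 6-"pixel" linear cat scorer. The weights and bias are invented
# for illustration and do not come from any real model.
WEIGHTS = [2.0, -1.5, 0.5, 3.0, -0.5, 1.0]
BIAS = -2.0

# A made-up "typical cat" input that the toy scorer already rates as cat-like.
TYPICAL_CAT = [0.7, 0.3, 0.5, 0.6, 0.4, 0.5]


def cat_score(pixels):
    """Sigmoid of a weighted sum: the toy classifier's confidence in 'cat'."""
    logit = sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def maximize_score(start, steps=500, lr=1.0):
    """Gradient ascent on the input, keeping every pixel inside [0, 1]."""
    pixels = list(start)
    for _ in range(steps):
        s = cat_score(pixels)
        # For a sigmoid over a linear logit, d(score)/d(pixel_i) = s * (1 - s) * w_i.
        grad = [s * (1.0 - s) * w for w in WEIGHTS]
        pixels = [min(1.0, max(0.0, p + lr * g)) for p, g in zip(pixels, grad)]
    return pixels


optimized = maximize_score(TYPICAL_CAT)

print("typical cat score:  ", round(cat_score(TYPICAL_CAT), 3))
print("optimized score:    ", round(cat_score(optimized), 3))
print("optimized 'image':  ", optimized)
```

In this toy, the highest-scoring input is not a slightly better version of the typical cat. Every pixel gets pushed to an extreme (fully on or fully off), because the scorer rewards "more of whatever correlates with cat" without any notion of what a plausible cat looks like.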