Happy to try to clarify, and this is helping me rethink my own thoughts, so appreciate the prompts. I'm playing with new trains of thought here and so have pretty low confidence in where I ended up, so greatly appreciate any further clarifications or responses you have.
There's a standard trick for scoring an uncertain prediction: It outputs its probability estimate p
Yup, understand that is how to effectively score uncertainty. I was very wrong to phrase this as "we still have to have some framework to map uncertainty to a state" because you don't strictly have to do anything, you can just use probabilities.
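To make "just use probabilities" concrete for myself, here is a minimal sketch. It assumes the scoring rule in question is log loss, which is my guess since the quote above is cut short; the function names `log_loss` and `expected_loss` are made up for illustration.

```python
import math

def log_loss(p: float, outcome: bool) -> float:
    """Loss for reporting probability p that the event happens.

    Assumption: log loss is the scoring rule under discussion.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    # Penalize by the negative log of the probability given to what happened.
    return -math.log(p if outcome else 1.0 - p)

def expected_loss(reported: float, true_p: float) -> float:
    """Average loss if the event really occurs with probability true_p."""
    return (true_p * log_loss(reported, True)
            + (1.0 - true_p) * log_loss(reported, False))

if __name__ == "__main__":
    # Compare an honest report of 0.6 with a hedge at 0.5 and an overclaim at 0.9.
    for reported in (0.5, 0.6, 0.9):
        print(reported, round(expected_loss(reported, true_p=0.6), 4))
```

Under this rule the honest report has the lowest expected loss, so no rounding to a state is needed just to score the predictor.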
Restricting this to discrete, binary states allows us to simplify the comparison between models for this discussion. I will claim we can do so with no loss of fidelity (leaning heavily on Shannon, i.e., this is all just information, and encoding it to binary and back out again doesn't mess anything up). Doing so is not obligatory, but it is useful.
I really shouldn't have said "you must X!" I should have said "it's kind of handy if you X," sorry for that confusion.
You're saying that giving it less information (by replacing its camera feed with a lower quality feed) is equivalent to sometimes lying to it? I don't see the equivalence!
We have a high quality information stream and a low quality information stream, and they both gesture vaguely at the ultimate high quality information stream, namely, the true facts of the matter of the world itself. Say, LQ < HQ < W.
LQ may be low quality because it is missing information that HQ has; it may just be a subset of HQ, like a lower-resolution video. Or it may contain actual noise, i.e., false information.
If we have a powerful algorithm, we may be able to, at least asymptotically, convert LQ to HQ, using processing power. So maybe in some cases LQ + processing = HQ exactly. But that makes the distinction uninteresting, and you would likely have to further degrade v′1 to get the effect you are looking for, so let's discard that and consider only cases where v′1 is strictly worse.
You can now use a NAND to sort the outputs of LQ and HQ into two buckets:
So for bucket 1, there are aspects of the world where there's effectively no loss in quality. But comparing HQ with HQ is not useful, so let's discard those cases, and examine the corners where LQ and HQ disagree.
LQ effectively has false information about some subset of reality there, that is in a sense what "LQ" means.
(Or just has gaps, which resolve to approximate HQ after processing, or fail and resolve to noise, either way.)
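A toy sketch of that sorting, assuming both feeds have already been reduced to equal-length lists of 0/1 states. The sketch does the split with a plain equality check on each pair of bits, which is just one simple way to do it; the function name `split_by_agreement` and the sample data are invented for illustration.

```python
def split_by_agreement(lq, hq):
    """Return (agree, disagree): index lists where the two feeds match or differ.

    Assumption: lq and hq are equal-length lists of 0/1 states.
    """
    if len(lq) != len(hq):
        raise ValueError("feeds must cover the same scenarios")
    agree, disagree = [], []
    for i, (low_bit, high_bit) in enumerate(zip(lq, hq)):
        # Bucket 1: no loss in quality here. Bucket 2: LQ is effectively wrong.
        (agree if low_bit == high_bit else disagree).append(i)
    return agree, disagree

if __name__ == "__main__":
    hq = [1, 0, 1, 1, 0]  # hypothetical high quality feed
    lq = [1, 0, 1, 0, 0]  # hypothetical low quality feed
    agree, disagree = split_by_agreement(lq, hq)
    print(agree)     # [0, 1, 2, 4]
    print(disagree)  # [3]
```

Only the second list is interesting for the argument, since that is where LQ carries false information relative to HQ.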
if you overfit on preventing human simulation, you let direct translation slip away
Rereading, I think HoldenK started down this path, "once the predictor is good enough that it can get data points right despite missing crucial information, it is also (potentially) good enough that it can learn how to imitate "what the human would think had happened if they had more information.""
So for your block -- in a sense you're giving the human some information the predictor lacks. You're giving the human "hints," in the form of higher quality input, which helps get the human closer to perfectly representing the actual world. (Not completely, sometimes there's still uncertainty, but closer than the predictor is likely to get.)
If that gets the human to "perfect", then the best the predictor can do is asymptotically approach human prediction and direct translation at the same time.
My Weak Spots
I think one likely objection to what I wrote here is that I am abusing Shannon. I've considered that and would be happy to discuss it more and carefully consider objections along those lines, but I think toy examples would get us there. And I don't want to take away from your note that "Sometimes the predictor’s probability is strictly between 0 and 1, so it gets some loss." If p(I eat soup) is 0.6 for all days, let's just ask ten discrete questions, "across n days the number of soups I eat will converge to n/1? (T/F), n/2? (T/F), ..." I would definitely try to preserve performance and scoring; I just want to run the NAND.
I think another likely objection is that when we apply models, trying to get m(HQ) = ~W, then it relies on interactions of states in complex ways where we can't slice them randomly into two groups without disrupting how models work at the basic level. I think the response is to simply group these states into bigger subsets of outcomes and treat those as atomic.
I think the biggest and most important objection would be that I've misunderstood your block. I would welcome any clarifications, and especially appreciate a toy example if you could, even if not involving diamonds, just to make sure I definitely get what you're saying in that part.
I'd be interested in other objections or weak spots here, appreciate your time helping me to think this through more carefully and completely.
This is really interesting.
To understand this more thoroughly I'm simplifying the high and low quality video feeds to lists of states that correspond to reality. (This simplification might be unfair so I'm not sure this is a true break of your original proposal, but I think it helped me think about general breaking strategies.)
Ok, video feeds compressed to arrays:
We consider scenarios in fixed order. If the diamond is present, we record a 1, and if not, a 0. The high quality feed gives us a different array than the low quality mode (otherwise the low quality mode is not helpful). E.g., High reports: (1,0,1,1,0, ...); Low: (1,0,1,?,0,...)
There are two possible ways that gap can get resolved.
In case one, the low quality predictor has a powerful enough model of reality to effectively derive the High quality data. (We might find this collapses to the original problem, because it has somehow reconstructed the high quality stream from the low quality stream, then proceeds as normal. You might argue that's computationally expensive, ok, then let's proceed to case two.)
In case two, the low quality datafeed predictor predicts wrongly.
(I know you are saying it predicts *uncertainly,* but we still have to have some framework to map uncertainty to a state, we have to round one way or the other. If uncertainty avoids loss, the predictor will be preferentially inconclusive all the time. If we round uncertainty up, effectively we're in case one. If we round down, effectively case two.)
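The rounding in that parenthetical can be sketched on the example arrays. This is only a toy, assuming the unknown entry is represented as `None` and that in this particular array the missing entry happens to be 1 in the high quality feed; the helper names are hypothetical.

```python
def resolve(feed, round_up):
    """Replace every unknown (None) entry with 1 if round_up, else 0."""
    fill = 1 if round_up else 0
    return [fill if state is None else state for state in feed]

def mismatches(feed, reference):
    """Indices where a fully resolved feed disagrees with the reference feed."""
    return [i for i, (a, b) in enumerate(zip(feed, reference)) if a != b]

if __name__ == "__main__":
    high = [1, 0, 1, 1, 0]
    low = [1, 0, 1, None, 0]

    # Rounding up reproduces the high quality array here (case one).
    print(mismatches(resolve(low, round_up=True), high))   # []
    # Rounding down leaves one wrong prediction (case two).
    print(mismatches(resolve(low, round_up=False), high))  # [3]
```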
So we could sharpen case two and say that sometimes the AI's camera intentionally lies to it on some random subset of scenarios. And the AI finds itself in a chaotic world where it is sometimes punished for predicting things it knows to be true.
In that case, although it's easy to show how it would diverge from human simulation, it also might not simulate reality very well either, since deriving the algorithm generating the lies might be too computationally complex. (Or maybe it can derive and counter the liar, in which case we're back at case one, i.e., the original problem.) If liar simulation is impossible, then the optimal predictor might just hit a ceiling and accept some level of noise. Effectively this means we have a new problem -- there is no direct translation possible, because the predictor is viewing a "different" world than the human.
I simplified your construct, possibly unfairly, and maybe that's a way you can salvage your original build. But this was a really illuminating exercise for me to generalize the strategy.
I think there are some classes of builds (maybe yours escapes this) where if you overfit on preventing human simulation, you let direct translation slip away. And then if you rehabilitate direct translation, you have to reexamine if there's an escape for human simulation. This sort of disjunctive analysis seems like an important strategy for adversarial breakers.
You still may be able to get the bedsheet over both corners, but I think other breakers in general will want to start with some disjunctive approach like this in other cases.
"A number of commenters, yesterday, claimed that the preference pattern wasn't irrational because of "the utility of certainty", or something like that. One commenter even wrote U(Certainty) into an expected utility equation."
It was not my intent to claim "the preference pattern wasn't irrational," merely that your algebraic modeling failed to capture what many could initially claim was a salient detail of the original problem. I hope a reread of my original comment will find it pleading, apologetic, limited to the algebraic construction, and sincere.
I should have mentioned that I thought the algebraic modeling was a very elegant way to show that the diminishing marginal utility of money was not at play. If that was its only purpose, then the rest of this is unnecessary, but I think you can use that construction to do more, with a little work.
Here's one possible response to this apparent weakness in the algebraic modeling:
If you can simply assert that Allais's point holds experimentally for arbitrarily increasing values in place of $24k and $27k (which I'm sure you can), then we find this proposed "utility of certainty" (or whatever more appropriate formulation you prefer*) increasing with no upper bound. The notion that we value certainty seems to hold intuitive appeal, and I see nothing wrong with that on its face. But the notion that we value certainty above all else is more starkly implausible (and I would suspect demonstrably untrue: would you really give your life just to become certain of the outcome of a coinflip?).
I was trying to make the argument stronger, not weaker, but I get the impression I've somehow pissed all over it. My apologies.
*I've read your post on Terminal Values three times and haven't yet grokked why I can't feed things like knowledge or certainty into a Utility function. Certainty seems like a "fixed, particular state of the world," it seems like an "outcome," not an "action," and most definitely unlike "1." If the worry is that certainty is an instrumental value, not a terminal value, why couldn't one make the same objection of the $24,000? Money has no inherent value, it is valuable only because it can be spent on things like chocolate pizza. You've since replaced the money with lives, but was the original use of money an error? I suspect not... but then what is the precise problem with U(Certainty)?
I should clarify that, once again, I bring up these objections not to show where you've gone wrong, but to show where I'm having difficulties in understanding. I hope you'll consider these comments a useful guide as to where you might go more slowly in your arguments for the benefit of your readers (like myself) who are a bit dull, and I hope you do not read these comments as combative, or deserving of some kind of excoriating reply.
I'll keep going over the Terminal Values post to see if I can get it to click.
I think I missed something on the algebraic inconsistency part...
If there is some rational independent utility to certainty, the algebraic claims should be more like this:
This seems consistent so long as U(Certainty) > 1/34 U($27,000).
I'm not committed to the notion there is a rational independent value to certainty, I'm just not seeing how it can be dismissed with quick algebra. Maybe that wasn't your goal. Forgive me if this is my oversight.
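Claims of this kind can be checked numerically. The following is a sketch under my own assumptions, not a restatement of the post's algebra: the gambles are the usual Allais ones (1A: $24,000 for sure, 1B: 33/34 chance of $27,000, 2A: 34% chance of $24,000, 2B: 33% chance of $27,000), U($0) = 0, and a flat bonus `u_cert` is added only to the sure option. The utility numbers are made up.

```python
from fractions import Fraction

def prefers_1a(u24, u27, u_cert):
    """1A (sure $24,000 plus the assumed certainty bonus) beats 1B (33/34 of $27,000)."""
    return u24 + u_cert > Fraction(33, 34) * u27

def prefers_2b(u24, u27):
    """2B (33% of $27,000) beats 2A (34% of $24,000); neither option is certain."""
    return Fraction(33, 100) * u27 > Fraction(34, 100) * u24

def pattern_is_consistent(u24, u27, u_cert):
    """True if one utility assignment supports both common preferences."""
    return prefers_1a(u24, u27, u_cert) and prefers_2b(u24, u27)

if __name__ == "__main__":
    u24, u27 = 32, 34  # hypothetical utilities, chosen so 1/34 of u27 equals 1
    for u_cert in (0, 1, 2):
        print(u_cert, pattern_is_consistent(u24, u27, u_cert))
```

With these particular numbers the pattern only becomes consistent once the bonus exceeds 1, i.e. 1/34 of the assumed U($27,000); other utility choices move that threshold.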
The Mantis Shrimp (http://en.wikipedia.org/wiki/Mantis_shrimp) forms a crude wheel to maneuver on land.
But I still can't think of any examples of wheels in nature that use axles and are large enough to be more than a free floating rotating object. Maybe this seems an arbitrary threshold, but I think usually when we marvel at the wheel, we're marvelling at axles, and their ability to support weight and radically reduce friction when moving big heavy things, all while holding the object basically level. While the cellular turbines that power us are pretty fascinating in their own right, it'd be interesting if anyone could think of biological wheels with axles that were a bit bigger. So far, I can't think of any outside of science fiction.*
*Pullman's "The Amber Spyglass" even gives a somewhat plausible evolutionary background for his axled and wheeled Mulefa. Any others?