While the particulars of your argument seem to me to have some holes, I very much agree with your observation that we don't know what the upper limit of properly orchestrated Claude instances is, and that targeted engineering of Claude-compatible cognitive tools could vastly increase their capabilities.
One idea I've been playing with for a long time is that the Claudes aren't the actual agents, but just small nodes or subprocesses in a higher-functioning mind. Loosely, imagine a hierarchy of Claudes, each corresponding roughly to a system-1 or subconscious deliberative process, with the ability to read and write files as a form of "long-term memory/processing space" for the whole system. If I further imagine that, by some magical oracle process, they coordinate and delegate as well as Claudes possibly can (subject to some vague notion of "how smart Claude itself is"), I see no reason such a system couldn't already be an AGI, and couldn't in principle be engineered into existence using contemporary LLMs.
(However, I will say that this thing sounds pretty hard to actually engineer; i.e., it being "just an engineering problem" doesn't mean it would happen soon. OTOH, maybe it could, if people pushed hard enough on the right approach. I can't imagine a clean way of applying optimization pressure to the Claudes in such a setup that isn't an extremely expensive and reward-sparse form of RL.)
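To make the shape of the idea concrete, here is a toy sketch of that hierarchy: a tree of "Claude" nodes that delegate subtasks downward and share a file as long-term memory. Everything here is assumed for illustration; in particular, `call_claude` is a hypothetical stand-in for a real LLM API call, stubbed out so the control flow can run at all.

```python
from pathlib import Path

MEMORY = Path("shared_memory.txt")  # assumed shared "long-term memory" file


def call_claude(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return f"[claude-response to: {prompt[:40]}]"


def node(task: str, depth: int, max_depth: int = 2) -> str:
    # Each node reads the shared memory and works on its task...
    memory = MEMORY.read_text() if MEMORY.exists() else ""
    result = call_claude(f"memory:\n{memory}\ntask:\n{task}")
    # ...optionally delegates subtasks to child nodes (the "subconscious"
    # deliberative processes)...
    if depth < max_depth:
        subtasks = [f"subtask {i} of: {task}" for i in range(2)]
        children = [node(t, depth + 1, max_depth) for t in subtasks]
        result = call_claude(f"combine {children} for task: {task}")
    # ...and writes its conclusion back to shared memory for the whole system.
    with MEMORY.open("a") as f:
        f.write(result + "\n")
    return result


answer = node("draft a research plan", depth=0)
```

The "magical oracle" coordination is exactly the part this sketch punts on: how tasks get decomposed and recombined is where all the actual difficulty lives.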
(paraphrasing would be a markov kernel here, and with the transitivity property I mentioned earlier, I'm asking that it achieve its stationary distribution in one iteration)
for this condition, if you also want symmetry, this is a very strong condition; you'd only accept "lossless" paraphrasings. i think then not only does the kernel reach its stationary distribution in one iteration, but the distribution cannot change at all, so either you have a separate markov kernel for every semantically distinct phrase, or the process isn't markov.
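One way to make this precise (my own notation, not from the thread): write $P$ for the paraphrase kernel over phrases and $\pi$ for its stationary distribution.

```latex
% "Achieves its stationary distribution in one iteration" says a single
% paraphrase from any starting phrase x already samples \pi:
P(x, \cdot) = \pi(\cdot) \quad \text{for all } x.
% Adding symmetry, P(x, y) = P(y, x), then forces
\pi(y) = P(x, y) = P(y, x) = \pi(x),
% i.e. \pi is uniform on the set of phrases it charges -- an equivalence
% class of mutually "lossless" paraphrases, one such kernel per class.
```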
There is some danger in this suggestion: it can improve the situational awareness of the LLM.
Why?
i think compute and networking speeds are honestly already beyond what most people can take advantage of (streaming video is about the most data-intensive thing a lot of people do, and what's above that is mostly actual computational tasks), so it would take significant additional innovation in converting these resources into better experiences for this to be tenable. the line usually seems to get drawn somewhere around gaming enthusiasts (e.g., there is a cohort of people who will buy a more powerful smartphone so it can render graphics better, so they can game on their phones more enjoyably; same for the display). this could be because economic incentives toward innovations in compute still favor commoditizable things, since compute is more generally useful: for the amount of work you could employ to make phones better for a small contingent of people who would buy them, you could make some similarly advanced/complex system better for some industrial/trad-tech purpose and make way more money.
i think there is a false premise assumed here. a lot of products are not luxury products because of superior quality that can be innovated past; their value is primarily raw signaling value, essentially fiat conferred by the brand.
alcohol is possibly cheating, because it's not really food but an expensive luxury category of its own? (but even there, wine you'd consume semi-regularly tops out around $1000 a bottle at the highest end, versus about $10 at the cheap end).
naively, fast food is about $30 a day (fast food is actually kind of expensive; the cheapest you can do, bulk beans-and-rice sorts of things, is about one OOM cheaper). i think $1000 a day is more than enough to hire an entire person to do skilled work full-time; you could push that up a bit if they were truly elite at what they do, but probably not by a full OOM.
(accounting tends to get tricky up there because I imagine a big part of the value proposition for your worker past a certain point is a sort of security, access to other commodities/conveniences, connections, etc.)
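Back-of-envelope check of the OOMs above (the beans-and-rice figure is my inferred one-OOM-cheaper number, not stated in the comment):

```python
beans_rice = 3        # $/day, assumed bulk-staples floor (one OOM below fast food)
fast_food = 30        # $/day, from the comment
skilled_labor = 1000  # $/day budget under discussion

assert fast_food / beans_rice == 10   # the claimed one-OOM gap
annual = skilled_labor * 365
print(annual)  # 365000 -> a ~$365k/yr budget, enough for full-time skilled work
```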
but to the original point, i really struggle to imagine how an iphone is meant to provide you more value. a lot of what an iphone is supposed to do for you in terms of productivity is better achieved by other means, and it's hard to improve on entertainment at its form factor. as for signaling value, here are two websites that sell phone cases in the $10k range, as a sort of jewelry:
https://caviar.global/ https://leronza.com/24k-gold-luxury-samsung-galaxy-z-fold7/
i guess you could switch between these regularly, as with jewelry, to effectively recover another OOM.
sadly I don't have any lemborexant, so can't compare; I originally picked daridorexant naively for its shorter half-life, thinking this corresponded to less daytime tiredness.
my naive understanding was actually also that lemborexant should be the one better at keeping you asleep, so it's interesting to hear that it doesn't seem to do that at all for you.
failing to Be Deliberate
One fun consequence of defining this concept is that now, when you try hard and don't succeed, you can also feel bad for failing to Be Deliberate.
have you tried daridorexant?
I think I see the logic. Were you thinking of making the model good at answering questions whose correct answers depend on the model itself, like "When asked a question of the form x, what proportion of the time would you tend to answer y?"
The previous remark about being a microscope into its dataset seemed benign to me, e.g., if the model were already good at answering questions like "What proportion of datapoints satisfying predicate X satisfy predicate Y?"
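For concreteness, the benign "microscope" query is just conditional-frequency counting over the dataset; the toy data and both predicates below are made up for illustration:

```python
# Toy dataset: 100 records with hypothetical fields "x" and "y".
data = [{"x": i, "y": i % 3} for i in range(100)]

def X(d):  # hypothetical predicate X
    return d["x"] % 2 == 0

def Y(d):  # hypothetical predicate Y
    return d["y"] == 0

matching_x = [d for d in data if X(d)]
proportion = sum(Y(d) for d in matching_x) / len(matching_x)
print(proportion)  # 0.34 -- "among datapoints satisfying X, 34% satisfy Y"
```

Answering this requires knowledge of the dataset, but nothing about the model's own behavior, which is what makes it feel benign relative to the self-referential version.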
But perhaps you also argue that the latter induces some small amount of self-awareness -> situational awareness?