I wasn't thinking of shards as reward prediction errors, but I can see how the language was confusing. What I meant is that when multiple shards are activated, they affect behavior according to how strongly and reliably they were reinforced in the past. Practically, this looks like competing predictions of reward (because past experience is strongly correlated with predictions of future experience), although technically it's not a prediction - the shard is just based on past experience and will influence behavior similarly even if you rationally know the context has changed. E.g. the cake shard will probably still push you toward eating cake even if you know that you just had mouth-changing surgery that means you don't like cake anymore.
(However, I would expect that shards evolve over time. So in this example, after enough repetitions reliably failing to reinforce cake eating, the cake shard would eventually stop making you crave cake when you see cake.)
So in my example, cleaner language might be: For example, I more reliably ate cake in the past if someone was currently offering me the slice of cake, compared to someone promising that they will bring a slightly better cake to the office party tomorrow. So when the "someone is currently offering me something" shard and the "someone is promising me something" shard are both activated, the first shard affects my decisions more, because it was rewarded more reliably in the past.
(One test of this theory might be whether people are more likely to take the bigger, later payout if they grew up in extremely reliable environments where they could always count on the adults to follow through on promises. In that case, their "someone is promising me something" shard should have been reinforced similarly to the "someone is currently offering me something" shard. This is basically one explanation given for the classic Marshmallow Experiment - kids waited if they trusted adults to follow through with the promised two marshmallows; kids ate the marshmallow immediately if they didn't trust adults.)
Cool, I'm happy if you're relaxing with a leisure activity you enjoy! The people I spoke with were explicitly not doing this for fun.
Time inconsistency example: You’ve described shards as context-based predictions of getting reward. One way to model the example would be to imagine there is one shard predicting the chance of being rewarded in the situation where someone is offering you something right now, and another shard predicting the chance you will be rewarded if someone is promising they will give you something tomorrow.
For example, I assign a substantially higher probability to getting to eat cake if someone is currently offering me the slice of cake, compared to someone promising that they will bring a slightly better cake to the office party tomorrow. (In the second case, they might get sick, or forget, or I might not make it to the party.)
I have lots of points of contact with the world, but it feels really effortful to always be mindful and note down observations (downright overwhelming if I don't narrow my focus to a single cluster of data points I'm trying to understand).
@Logan, how do you make space for practicing naturalism? It sounds like you rely on ways of easing yourself into curiosity, rather than forcing yourself to pay attention.
(Also, just saw the comment rules for the first time while copying these over - hope the mindfulness mention doesn't break them too hard.)
Speculating here, I'm guessing Logan is pointing at a subcategory of what I would call mindfulness - a data-point-centered version of mindfulness. One of my theories of how experts build their deep models is that they start with thousands of data points. I had been lumping frameworks in with individual observations, but maybe it's worth separating those out. If so, frameworks help you make connections more quickly, but the individual data points are how you notice discrepancies, uncover novel insights, and check that your frameworks are working in practice.
(Copying over FB reactions from while reading) Hmm, I'm confused about the Observation post. Logan seems to be using direct observation vs map like I would talk about mindfulness vs mindlessness. Except for a few crucial differences: I would expect mindfulness to mean paying attention fully to one thing, which could include lots of analysis/thinking/etc. that Logan would put in the map category. It feels like we're cutting reality up slightly differently.
Trying to sit with the thought of the territory as the thing that exists as it actually is regardless of whatever I expect. This feels easy to believe for textbook physics, somewhat harder for whatever I'm trying to paint (I have to repeatedly remind myself to actually look at the cloud I'm painting), and really hard for psychology. (Like, I recently told someone that, in theory, if their daily planning is calibrated they should have days where they get more done than planned, but in practice this gets complicated because of interactions between their plans and how quickly/efficiently they work.)
Yet, to the best of my knowledge, psychology isn't *not* physics... It's just that we humans aren't yet good enough at physics to understand psychology.
I feel like being the spymaster in Codenames is a good exercise for understanding this concept.