PSA: reward is part of the habit loop too

Alok Singh

5th Jan 2023

The usual setup of a habit is:

cue
craving
routine
reward

In my social circle, rewarding oneself seems to have fallen by the wayside. That's sadly funny for a crowd of people into reinforcement learning: they're trying to skip the reinforcement and then wondering why the habit doesn't stick¹. My reward is usually reading fiction or playing a video game. For brushing my teeth, it's the nice shiny feeling at the end, which I make a point of noticing. For going to the gym, it's the steam room afterwards.

Figure out what you like. Anything can be a reward as long as it feels good. Don't try to logic yourself into wanting "the right rewards". Has that worked before? Surrender. Let the soft animal of your body love what it loves.

¹ RL without the reward is just an environment (states, actions, transitions) that's Markovian. It has syntax (things are happening) but no semantics (without a reward function, there's no way to interpret whether something is good or bad).
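To make the syntax/semantics point concrete, here is a minimal sketch. The two-state environment, its action names, transition probabilities, and the reward function are all hypothetical illustrations, not anything from the post. The transitions alone let you generate trajectories, but nothing ranks one trajectory above another until a reward function is added.

```python
import random

# Hypothetical two-state environment: states, actions, transition probabilities.
# On its own this is a controlled Markov process: the next state depends only
# on the current state and the chosen action. There is no notion of "good".
STATES = ["tired", "rested"]
ACTIONS = ["gym", "couch"]
TRANSITIONS = {
    ("tired", "gym"): {"tired": 0.3, "rested": 0.7},
    ("tired", "couch"): {"tired": 0.8, "rested": 0.2},
    ("rested", "gym"): {"tired": 0.6, "rested": 0.4},
    ("rested", "couch"): {"tired": 0.5, "rested": 0.5},
}


def step(state, action, rng):
    """Sample the next state from the transition distribution."""
    dist = TRANSITIONS[(state, action)]
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def rollout(policy, start, n_steps, rng):
    """Generate a trajectory of (state, action) pairs: the 'syntax'."""
    trajectory = []
    state = start
    for _ in range(n_steps):
        action = policy(state)
        trajectory.append((state, action))
        state = step(state, action, rng)
    return trajectory


# Adding a reward function supplies the 'semantics': a way to score trajectories.
def reward(state, action):
    # Hypothetical reward: going to the gym is followed by the steam room.
    return 1.0 if action == "gym" else 0.0


def total_return(trajectory):
    """Sum of rewards along a trajectory (undiscounted)."""
    return sum(reward(s, a) for s, a in trajectory)


def always_gym(state):
    return "gym"


def always_couch(state):
    return "couch"


rng = random.Random(0)
gym_traj = rollout(always_gym, "tired", 5, rng)
couch_traj = rollout(always_couch, "tired", 5, rng)
print(total_return(gym_traj), total_return(couch_traj))  # prints 5.0 0.0
```

Both trajectories are equally valid outputs of the environment; only the reward function says one of them is better.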

Comment by noggin-scratcher:

> My reward is usually reading fiction or playing a video game.

How do you avoid noticing that you could do those things without doing the habit first?

Comment by [anonymous]:

I don't think that matters. If the purpose of the reward was to bribe yourself to do it (i.e. consciously thinking "I'll go to the gym because that way I'll get to read some fiction afterwards"), then yes, you'd have to find some way of withholding the fiction until you've been to the gym. But I think the behaviourist sense of "reward" is different; it's to reinforce the behaviour by creating a pleasant feeling which the brain then associates with the prior action.

To illustrate: I once tried to use this method to improve my punctuality (I wish I could say it worked, but I didn't keep it up long enough). I had a bag of sweets, and if I was set up to join a virtual meeting 5 minutes before the start, I would eat one. My friend said, "If that was me, I wouldn't be able to stop myself from eating them at other times." I said, "Well, if I do, I'll buy some more! It's my punctuality I'm trying to improve, not my waistline."