http://measureofdoubt.com/2011/04/12/pulling-levers-killing-monsters-the-lure-of-unpredictable-rewards/ (how do I put a link like this in a word with blue letters?)

I've read that unpredictable rewards associated with a behavior actually encourage that behavior more effectively than consistent rewards.

The optimal habit-forming figure given in the link above is a 25% chance of reward for each instance of performing the behavior.

My hypothesis, then, is that if I want to establish a habit by rewarding myself for successfully performing a certain task, I should reward myself only 25% of the time to ingrain the habit as forcefully as possible into my unconscious.

 

Anyone else think so, or have any other research to add?

[-][anonymous]9y50

That's known as a VR 4 schedule (variable-ratio 4) because the behavior is rewarded an average of every four times the correct response is given. Variable schedules maximize what is known as resistance to extinction; the probability a behavior will decrease in frequency goes down. Continuous schedules are best for establishing a new behavior. I would expect they use continuous reinforcement whenever a new skill is being learned in the game.

Upvote for content, but I think that there's a typo in your second sentence:

Variable schedules maximize what is known as resistance to extinction, the probability a behavior will decrease in frequency goes down.

Perhaps a semicolon instead of a comma, or "as frequency of rewards ... " instead of "in frequency ...", was intended?

[-][anonymous]9y00

Fixed

https://en.wikipedia.org/wiki/Reinforcement#Intermittent_reinforcement

Please use Open Thread next time for similar questions.

I had this at uni. It was a long time ago, so I can't really provide references.

If you want to learn something new, you need to reinforce each time.

Once you know you can do the whole thing, it's a good idea to start intermittent reinforcement. It should gradually go from 100% to 0% rewards, so you could, e.g., roll a d10: in the first week anything over 1 gives you the reward, in the second week anything over 2, and so on. The die is essential: you need to randomize the reward, not just say "every 1 out of 4 gets a reward", which in fact works worse than 100% rewards.
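A rough Python sketch of the d10 scheme, in case anyone wants to check the numbers. The function names are made up for illustration, and the rule "reward when the roll is strictly greater than the week number" is my reading of "first week anything over 1". Under that reading, week 1 already starts at 90% rather than 100%.

```python
import random


def reward_probability(week: int, sides: int = 10) -> float:
    """Chance that a fair die roll beats this week's threshold."""
    # Clamp so that week 0 means "always reward" and late weeks mean "never".
    threshold = min(max(week, 0), sides)
    return (sides - threshold) / sides


def roll_for_reward(week: int, sides: int = 10, rng=random) -> bool:
    """Roll the die once; True means you get the reward this time."""
    return rng.randint(1, sides) > week


if __name__ == "__main__":
    for week in range(1, 11):
        print(f"week {week:2d}: reward chance {reward_probability(week):.0%}")
```

With a d10 the chance drops by 10 percentage points per week and reaches 0% in week 10. A different die or a slower step would just change `sides` or how `week` is counted.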

I used it to potty train my kids, worked like a charm.

I don't understand what "similar questions" means. I don't see any problem with this question.

how do I put a link like this in a word with blue letters?

Type the text that you want to appear; select it; click the button in the LW article editor with a picture of some chain links. You'll get a popup box in which you can type the URL (and also, if you want, some "caption" text that will appear when readers hover their mouse pointer over the link).

This is completely different from how you get links in comments (text in square brackets immediately followed by URL in parentheses).

You cannot impose a reward schedule on yourself...

I complete a task, I roll a d4 once, and in case of success I eat a chocolate. What exactly is the problem with this?

You could in principle very easily ignore the dice and eat the chocolate regardless. You need to take it upon yourself to follow through with the scheme and forfeit the chocolate 3 times out of 4. If you start with the understanding that chocolate is a possibility 4 times out of 4 if you followed a more permissive scheme, then you are effectively punishing yourself 3/4 of the time, which I expect would work as negative reinforcement for said task or the reward scheme in general. And it would also require enough willpower, which some people won't have.

If you start with the understanding that chocolate is a possibility (...) then you are effectively punishing yourself 3/4 of the time

This makes sense and feels correct to me.

There were studies on this with kittens or puppies, and it seems that it in fact works like this:

"When I do this trick, I get a reward! Let's do the trick! I didn't get the reward? Maybe I should try again! I got the reward, the world works like it should, yay! Let's get another one! No reward? Maybe they didn't notice? I have to try harder! Still no reward? Let's try again, I'm sure I'll get it this time".

The puppies noticed the reward, not the punishment. If it was as regular as one out of four, they would notice this regularity and act according to the expectation of the result - not try when a punishment was due and try when a reward was due.
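To make the contrast concrete, here is a small Python sketch of the two schedules. The names are hypothetical, and `random_ratio` uses a constant 25% chance per attempt, which is just one simple way to get one reward per four attempts on average. That choice is my assumption, not something taken from the studies.

```python
import random


def fixed_ratio(n_trials: int, ratio: int = 4) -> list[bool]:
    """Reward exactly every `ratio`-th attempt (fully predictable)."""
    return [i % ratio == 0 for i in range(1, n_trials + 1)]


def random_ratio(n_trials: int, p: float = 0.25, rng=random) -> list[bool]:
    """Reward each attempt independently with probability `p`."""
    return [rng.random() < p for _ in range(n_trials)]


if __name__ == "__main__":
    show = lambda schedule: "".join("R" if hit else "." for hit in schedule)
    print("fixed :", show(fixed_ratio(20)))
    print("random:", show(random_ratio(20)))
```

Both give roughly the same number of rewards over many attempts, but in the fixed version the position in the sequence tells you exactly when the next reward is due, and in the random version it tells you nothing.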

If you start with the understanding that chocolate is a possibility (...) then you are effectively punishing yourself 3/4 of the time.

This seems counter-intuitive to me. I do like chocolate, a lot, so I do not eat chocolate every chance I get -- that wouldn't end well. I have to pick some way to choose when I get chocolate, and my usual method (and any proposed method that may involve dice, really) denies me chocolate more than just 75% of the times that I would like chocolate. So why not use an arbitrary but useful method of choosing when I get chocolate? I'm not going to be very disappointed when I roll "not chocolate", because I am usually in the "not chocolate" state by default...

On the other hand, I do understand why this system might not be good; increasing chocolate intake is not ideal, or I would be doing it anyway. So this reward system should be short term, not long term. But I think it would be motivating (for me).

I have observed that when gamblers know the odds, they don't gamble less. But my sample size is low.

I am not a psychologist, but it seems at least plausible that getting a chocolate one time in four because you saw yourself roll a die and chose whether to eat a chocolate depending on the result might have different motivational effects than getting a chocolate one time in four according to some mechanism completely opaque to you.

d4 are bitches to roll! A d8 is much more fulfilling...

It seems that in the case of casinos, unpredictability is an important part. If you make a conscious choice to reward yourself every 4th time, you lose that element of unpredictability.

I would also expect that the optimal percentage is context-dependent and that the 25% figure doesn't generalize to every task and reward.