Me and M&Ms

by coyotespike, 1 min read, 2nd Aug 2014, 12 comments


Personal Blog

Ah, delicious dark chocolate M&Ms, colorfully filling a glass jar with your goodness. How do I love thee? About four of you an hour. Here's a brief rundown of my most recent motivation hacking experiment. 

1. Gwern has an interesting article arguing that Massive Open Online Courses (MOOCs) may shift the learning advantage from intelligence toward conscientiousness (actually he's not sure about the intelligence part). This shift occurs because MOOCs select for higher-quality instruction and better feedback, broadly speaking and over time, but it's much harder to stay on task without a malevolent instructor and bad grades breathing down your neck. This thesis jibes with my own experience; if I get stuck on a math problem, I just google "an intuitive approach to x," and I usually find a couple of people begging to teach me the concept. But it's harder to get started and to stay focused than in a classroom.

2. Given that knowledge compounds and grants increasing advantages, I'd really like to keep taking advantage of MOOCs. Some MOOCs are better than others, but many are better than your standard college course - and they're free. For a non-technical guy getting technical, like me, it's a golden age of education. So, it would be great if I were highly conscientious. Gwern points out that conscientiousness is a relatively stable Big Five personality trait.

3. The question then becomes, can conscientiousness be developed? Well, I'm not a Cartesian agent, so wouldn't it make sense to reward myself for conscientiousness? Enter the M&Ms. I set a daily target for pomodoros. When I finish a pomodoro, I get a big peanut M&M or two small ones. If I finish two in a row, I get two servings, and so on. In this way, I encourage myself to get started, and then to keep going to build Deep Focus. Each pomodoro becomes cause for celebration, and I find my rapid progress through pomodoros (and chocolate) energizing, where long periods of distraction were tiring.

This has worked fantastically well for the last two weeks. I hit my pomodoro target for paid work, then switch to educational work. I plan to keep it up, and maybe I'll use chocolate as motivation somewhere else as well. Now back to my M&Ms, green, yellow, blue, orange, brown, red . . . 



12 comments

There's a nice conventional categorisation of behaviour modification programmes that goes like this:

Fixed-ratio: a reward is given after a fixed number of nonreinforced responses (e.g. an M&M after every pomodoro, or every fifth pomodoro).

Fixed-interval: a reward is given after a fixed interval of time (e.g. you might always set the pomodoro for 25 minutes as per convention).

Variable-ratio: a reward is given after a variable number of nonreinforced responses (e.g. you flip a coin after every pomodoro to decide whether you get an M&M).

Variable-interval: a reward is given after a variable time interval (i.e. you find some way to determine how long to set the pomodoro, perhaps with a lower bound).
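As a rough illustration of the two ratio schedules, here is a minimal Python sketch. The function names, the `every` parameter, and the probability `p` are hypothetical choices made for this example, not anything specified in the thread; a fair coin corresponds to `p = 0.5`.

```python
import random


def fixed_ratio_reward(completed: int, every: int = 1) -> bool:
    """Fixed-ratio: reward on every `every`-th completed pomodoro."""
    return completed > 0 and completed % every == 0


def variable_ratio_reward(rng: random.Random, p: float = 0.5) -> bool:
    """Variable-ratio: flip a (possibly biased) coin after each pomodoro."""
    return rng.random() < p


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run differs
    for n in range(1, 11):
        fr = fixed_ratio_reward(n, every=5)
        vr = variable_ratio_reward(rng)
        print(f"pomodoro {n}: fixed-ratio={fr}, variable-ratio={vr}")
```

With `every=5`, the fixed-ratio reward lands predictably on pomodoros 5 and 10, while the coin flip pays out about half the time on average with no way of knowing which pomodoros will be rewarded.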

The schedule of reinforcement you're using is left a bit vague. It looks like you're following an FR schedule but could also be doing an FI or VI schedule. But for the purposes of offering advice to people who might want to try something like this, I'll assume you're using either FR or FI.

Psychologists categorise schedules in that way because they want to study the effects of differences in reinforcement. In particular they've been interested in the effects of changes in schedules on the extinction of a behaviour. One major result from the literature (which is reported in most psych textbooks that include a chapter on learning theories) is that behaviours reinforced on variable schedules (using either ratios of responses or time intervals) are much more resistant to extinction than those reinforced on fixed schedules. As an example, consider a slot machine at a casino; it doesn't have a fixed ratio of 1 reward for every nth try. Instead it varies the ratio of attempts to rewarded attempts, taking advantage of the much stronger reinforcement effect.

So my first piece of advice is: do not use fixed schedules. Varying the rate of reinforcement (either as a function of time or number of completed pomodoros) will help make the good habit you're trying to build stick if your pomodoro use is ever disrupted (because you're busy, you somehow forget, or whatever).

Another result from the literature is that ratio schedules produce higher response rates. This occurs because faster responding increases the likelihood of being rewarded sooner, since ratio schedules don't depend on time but on attempts. In many situations you might want to take advantage of this and opt for a VR schedule (say if you wanted to encourage a child to behave). In this case, though, it would probably only lead to extinction or abuses. Extinction because if your time intervals are somewhat long (say around 30-60 minutes), then the rewards might be given too infrequently to build your motivation and give you energy. Abuses because the big spaces between rewards might encourage you to cheat the system and eat some M&Ms anyhow because you want the energy.

That leads me to my second bit of advice: don't use a VR schedule; instead vary the time interval. I suggest finding some way to randomise the selection (like rolling dice, throwing darts, or having an algorithm spit out a number) and putting a lower bound on the time intervals (to give yourself enough time to build some flow and focus).
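For anyone who wants to try the "algorithm spits out a number" option, a minimal Python sketch might look like the following. The function name and the 20 and 40 minute bounds are assumptions picked purely for illustration; the suggestion above only calls for some lower bound.

```python
import random


def next_interval_minutes(rng: random.Random, lower: int = 20, upper: int = 40) -> int:
    """Draw a work interval uniformly from [lower, upper] minutes, inclusive."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return rng.randint(lower, upper)


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so the interval is unpredictable
    print(f"Work for {next_interval_minutes(rng)} minutes, then take your M&M.")
```

The lower bound guarantees every session is long enough to build some focus, and the upper bound keeps rewards from becoming too sparse.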

Intervals and ratios are going to be essentially the same thing for conventional pomodoros. They are some time on, some time off, repeat. It might be weird to have variable pomodoros since the break is for mental fatigue, not reward. Perhaps some mechanism to reward you with an M&M at some time randomly in the second half of your pomodoros?

hmm, idea, how well'd this work: you have a machine that drops the reward with a certain low probability every second, but you have to put it back rather than eat it if you weren't doing the task?
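A quick way to get a feel for that machine is to simulate it. This is only a sketch under assumed details: the function name, the per-second boolean list `on_task`, and the drop probability `p_drop` are all hypothetical.

```python
import random


def simulate_dispenser(on_task, p_drop, rng):
    """Return (dropped, kept) reward counts for one session.

    on_task: one boolean per second, True if working during that second.
    p_drop:  per-second probability that the machine drops a reward.
    """
    dropped = kept = 0
    for working in on_task:
        if rng.random() < p_drop:
            dropped += 1
            if working:
                kept += 1  # rewards dropped while off task go back in the jar
    return dropped, kept


if __name__ == "__main__":
    seconds = 25 * 60  # one conventional 25-minute pomodoro
    session = [True] * seconds  # assume fully on task
    print(simulate_dispenser(session, 1 / seconds, random.Random()))
```

With `p_drop = 1 / 1500`, the machine drops one reward per 25-minute pomodoro on average, though any single session may see none or several.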

That's very interesting indeed.

I get one reward per pomodoro, unless I chain the pomodoros together, in which case the reward matches the number of pomodoros completed (so if I do three in a row, 75 minutes of work, then I get 3 M&Ms). If I want to take a break, then I accept that I'll only get 1 M&M, instead of 2 or 3, after the next pomodoro.

In practice, then, I'm using variable intervals. Based on your feedback, I'll experiment with eating all the rewards at the end of the time interval, instead of devouring them after each pom.
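One way to read the chaining rule is that the nth pomodoro in an unbroken chain earns n M&Ms, and a break resets the count. The Python sketch below encodes that reading; it is an interpretation rather than a confirmed rule, and the function name and event labels are hypothetical.

```python
def chained_rewards(events):
    """Return the M&Ms earned after each completed pomodoro.

    events: sequence of "pomodoro" or "break" strings, in order.
    """
    streak = 0
    rewards = []
    for event in events:
        if event == "pomodoro":
            streak += 1
            rewards.append(streak)
        elif event == "break":
            streak = 0  # a break ends the chain
        else:
            raise ValueError(f"unknown event: {event!r}")
    return rewards


if __name__ == "__main__":
    print(chained_rewards(["pomodoro", "pomodoro", "break", "pomodoro"]))
```

Under this reading, taking a break after the second pomodoro means the next one earns 1 M&M instead of 3.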

Can I ask a silly question? My understanding of your situation is that you want to get your work done, but sometimes you don't have the willpower, so you use your M&M system for motivation. But then you are faced with the possibility of just eating a bunch of M&M's without doing anything. And there is no meta-M&M system to motivate you to keep from eating M&M's. So I don't see how this can actually help you. Empirically, it clearly does, but I have trouble understanding how. Why is it easier to keep from eating M&M's "on your own" and leverage that ability to motivate you to do work, than it is to keep doing work "on your own" in the first place?

If I have just ruined the effect, I sincerely apologize...

Why is it easier to keep from eating M&M's "on your own" and leverage that ability to motivate you to do work, than it is to keep doing work "on your own" in the first place?

I'm assuming because "don't eat undeserved M&Ms" is a clear, simple and binary rule - breaches are obvious, so there's not much of a temptation to rationalize them. "Work on my stuff" is broad and fuzzy and has plenty of room for excuses like "I'm a bit tired today", "I deserve to rest", "I'll do it tomorrow", etc.

That makes sense.

What Emile said, although I do have to make sure I don't cheat! (Also, the M&Ms are in a desk drawer where I can't see them.) Before I tried this, every time I goofed off during a pomodoro, the mild buzz of surfing the internet served as a reward. Now, I tell myself, "don't goof off, or no M&M for you!"

There's a second reward as well, which may not apply to everyone equally. I work full-time, basically in legal research. I used to spread 10 pomodoros out over the day (okay, 8). Now I do 10 as fast as possible, and then switch to personal research. This makes the day much more pleasurable. The M&Ms reinforce this faster-moving, more engaging schedule.

Just a reminder that dental cavities are encouraged by increasing the length of time your teeth are exposed to sugar and that it takes about 20 minutes or so to really flush your mouth with saliva, so lots of small sugary snacks are significantly worse for your teeth than sitting down and just eating an entire bag each day.

With all due caution, you could also try using nicotine the same way you're currently using M&Ms (Gwern on the subject), perhaps occasionally dosing yourself in the second half of a pomodoro as someone else suggested. The downside of this is that nicotine's addictive potential may make it harder to not take it when you're slacking off. Definitely do not use tobacco as your nicotine delivery system.

This is a very good point, actually. It'd be better to get an instant hit without destroying the ol' teeth, and Gwern's material through your link is fascinating. I'll report back if I try it out.

Alicorn and Anna Salamon once rewarded Eliezer with an M&M every time he said something nice, and the experiment seems to have worked. I reward myself with points, but your mileage may vary.

On possible ways to increase conscientiousness, see this comment thread.

I've read about quite a few people who have bribed themselves with food in this way. I should try it out - I love food way too much, really, so a few extra calories will be worth it. I wonder if I could bribe myself with (very small amounts of) food to exercise?

EDIT: Spelling fix, post should make sense now.