Meta analysis of Writing Therapy

by jsalvatier · 1 min read · 1st Jan 2012 · 12 comments


Robin Hanson recently mentioned "writing therapy" as potentially having surprisingly large benefits. In the example he gives, recently unemployed engineers who wrote about their experience found jobs more quickly than those who did not.

The meta-analysis paper he links to was pretty lame, but I found another meta-analysis, "Experimental disclosure and its moderators: A meta-analysis", on the somewhat broader topic of experimental disclosure, which appears to be much better.

My judgment is non-expert, but it looks to me like a very high quality meta-analysis. The authors use a large number of studies (146) and include a large number of potential moderators, discuss their methodology in detail, and address publication bias intelligently. 

The authors find small to moderate positive effects on measures of psychological health, physiological health and general life outcomes. They also find a number of interesting moderating factors.


The fifth of the six outcome types, subjective impact of the intervention, was measured by 33 different studies, with a mean unweighted effect size of .159 and a mean weighted effect size of .152. This unweighted effect was significant in a random effects analysis with a significance level of .000035. Because subjective impact of the intervention is a rather broad domain, it was further broken up into four subcategories: positive attitude about intervention (26 studies), attempts to process/make sense of event (21 studies), intervention had no effect (1 study), and negative attitude about intervention (11 studies). Of these four subcategories, two were significant in a random effects analysis: positive attitude about intervention (r = .270) and attempts to process/make sense of event (r = .132). No other subjective impact outcomes approached significance in either the random or the fixed effects approach. ... Optimism was also found to be a significant moderator for psychological health (r = .340) and reported health effect sizes (r = .157), such that pessimists benefited more from the intervention, but it did not moderate overall or subjective impact effect sizes. The rest of the within-study moderators (age, gender, mood, neuroticism, alexithymia, and emotional inhibition) did not moderate any of the effect size categories. ... It is interesting that even though disclosure did not affect many objectively measured indicators of disease status, self-reported measures of disease activity were improved. For whatever reason, perhaps because they have improved psychological health, patients feel that they are doing better, even though lab results might indicate otherwise.

(Around page 22.) The psychological health, physical health, and reported health effect sizes were all around 0.05 weighted or less, compared to 0.15 (3x) for subjective impact; 'health behaviors' may have had a negative effect size. 'General functioning/life outcomes' was smaller still, at 0.036 weighted (most of it from 'work' and 'social relationships').

So, from the sound of it, the writing therapy helped most with personal relationships, but people considerably overestimate how much it helps. Which is interesting. I was thinking that this wasn't sounding terribly impressive, but the authors cover that point:

Some may argue that an effect of .075 is considered to be quite small by traditional standards (e.g., conventions by Cohen, 1988), as it accounts for only 0.56% of the variance in the measured outcomes. However, even Cohen (1988) himself stated that “there is a certain risk in offering conventional operational definitions for [the terms of small, medium, and large] for use in power analysis in as diverse a field of inquiry as behavioral science” (p. 112). Rather than relying on Cohen’s conventions, researchers have argued that the practical importance of an effect depends entirely on its relative costs and benefits (Glass, McGaw, & Smith, 1981). When one considers that the act of disclosing has virtually no costs—it is a free, noninvasive, independent activity and is perceived by participants to be helpful—it seems that any effect that is nonzero and in the positive direction is worth noting (see Prentice & Miller, 1992). ... As an example, consider the act of taking a daily aspirin after a heart attack to prevent death from a second heart attack: This treatment is widely regarded in the medical community as extremely valuable, and it has a r-effect size of .034 (Rosenthal, 1994), less than half of the effect size found for experimental disclosure. Similarly, because a number of experimental disclosure studies have been conducted on college students and some of the outcomes were measures of scholastic achievement, it would also be appropriate to consider the size of the effect in light of the educational literature. In particular, educators have recently argued that effect sizes as small as .050 are reasonable effects to expect in educational research and, although small, are nonetheless important in terms of improvements in learning and achievement (Lanahan, McGrath, McLaughlin, Burian-Fitzgerald, & Salganik, 2005); our r-effect size of .075 is even a bit higher than the reasonable and important effect of .050. ... M. L. Smith and Glass (1977), in a review of approximately 500 studies on psychotherapy, found the equivalent of a r-effect size of psychotherapy to be about .322. Clearly, this is quite a bit larger than the effect size of .075 that was found in the present study for experimental disclosure. However, given that psychotherapy typically takes place for 1 hr per week over the course of several months (sometimes years) and is conducted by a therapist who has had many years of education and training, of course it should be the case that spending only 20 min a day for 3 days on an independent writing (or talking) activity should have an effect size that is quite a bit smaller than months of time-consuming and expensive psychotherapy. To have arrived at any other result should cause the reader to be suspicious. Indeed, it does seem quite impressive that an intervention that is so easy (requires one only to write or talk), so brief (a total of about an hour), so cost efficient (completely free), and so well received by participants (most participants enjoy it or report it to be helpful) can improve so many facets of a person’s life (psychological, physical, social, academic), even if it is considered a very small improvement by conventional standards.
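For anyone who wants to check the arithmetic: the "0.56% of the variance" figure in the quote is just the square of the r-effect size. A minimal sketch (labels and comparison values are taken from the quoted passage, not computed independently):

```python
def variance_explained(r):
    """Proportion of outcome variance accounted for by a correlation-style effect size."""
    return r ** 2

effects = [
    ("experimental disclosure", 0.075),          # this meta-analysis
    ("daily aspirin (Rosenthal, 1994)", 0.034),  # quoted comparison
    ("psychotherapy (Smith & Glass, 1977)", 0.322),
]
for label, r in effects:
    print(f"{label}: r = {r:.3f}, variance explained = {variance_explained(r):.2%}")
# the first line reproduces the quote's figure: 0.075**2 = 0.56%
```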

The publication data is fun:

Published studies had significantly higher overall effect sizes (published, r = .095; unpublished, r = .054) and reported health effect sizes (published, r = .141; unpublished, r = .064). Publication status also marginally moderated the subjective impact effect size (published, r = .213; unpublished, r = .123), but there was no significant difference between published and unpublished studies for the psychological health effect size. Across the 146 studies, 52% were published. ... However, studies in which participants were paid had significantly higher subjective impact effect sizes than studies in which participants were given no payment (paid, r = .167; unpaid, r = -.006). ... With 112 studies containing psychological health effect sizes and a sum of zs of 40.31, there would need to be 488 studies with null effects hidden away in file drawers to make this psychological health effect size non-significant; it seems highly unlikely that such a large number of these studies exists. In addition, it is encouraging that this relation was found only with psychological health effect sizes and not with any of the other effect size categories.
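The "488 studies" file-drawer figure can be reproduced with Rosenthal's fail-safe N formula, assuming (as the quoted numbers suggest) a one-tailed p = .05 criterion:

```python
def fail_safe_n(sum_z, k, z_crit=1.645):
    """Rosenthal's fail-safe N: how many unpublished null-result studies would
    need to exist to drag the combined result below significance.
    N_fs = (sum of z scores)^2 / z_crit^2 - k, with z_crit = 1.645 for p = .05 one-tailed.
    """
    return (sum_z ** 2) / (z_crit ** 2) - k

# Figures from the quote: 112 studies with psychological health effect sizes,
# sum of zs = 40.31.
print(round(fail_safe_n(40.31, 112)))  # 488, matching the paper's estimate
```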

I was worried because early on it talked about the effect sizes being bigger in college students, but in the end:

No participant variables significantly moderated any of the effect size categories in the between-studies analysis.

Some useful advice:

Furthermore and perhaps more important, a number of moderators of experimental disclosure were identified with a random effects approach; effect sizes tended to be larger when studies

  • included only participants with physical health problems,
  • included only participants with a history of trauma or stressors,
  • did not draw from a college student sample,
  • had participants disclose at home,
  • had participants disclose in a private setting,
  • had more male participants,
  • had fewer participants,
  • paid the participants,
  • had follow-up periods of less than 1 month,
  • had at least three disclosure sessions,
  • had disclosure sessions that lasted at least 15 min,
  • had participants who wrote about more recent events,
  • instructed participants to discuss previously undisclosed topics,
  • gave participants directed questions or specific examples of what to disclose,
  • gave participants instructions regarding whether they should switch topics,
  • and did not collect the products of disclosure.

So it sounds like for us, the basic recipe should be: "write at home, in private, for non-distribution, for half an hour per session over at least three sessions, about something recent you haven't previously disclosed".

This actually sounds kind of similar to Alicorn's Luminosity journalling stuff, going on my vague memories.

I had the idea that journaling can be helpful and beneficial to rationalists, and thought to write about it, but first I looked to see if someone had done it already.

This post was really helpful, though I found it only through a different post, since it didn't have the keywords "journaling" and "diary".

Here's another benefit that isn't discussed and relevant specifically to rationality:

We know that our memory is far from perfect, and that whenever we remember something we change the memory a little, and maybe add after the fact analysis to it.

Long-term journaling gives you a window into how you thought in the past, and how your thinking has changed.

I think it would also be very valuable for someone who's just starting out with rationality to start a journal and, as part of it, track their journey with rationality. We're all familiar with how many of the ideas in the sequences feel obvious after some time getting used to them. A journal would record and remind you how it was for you to first grapple with these new ideas.

Another benefit is noticing patterns and principles:

I've noticed in my life that many times, after some event (meaningful, happy, sad, important, etc.), I note a principle to myself, a lesson from it. But also, many times when this happens, I notice that I had already "noted" this principle to myself, but since I hadn't written it down or talked with someone about it, I forgot it. Not making the same mistakes over and over again is a large part of rationality; being able to review such moments, and your thoughts on them at the time, would be very helpful.

This study seems rather peculiar.

Note that they weren't asked to write about their values with respect to science. Perhaps the context increased the likelihood they did so, or perhaps there was a place dependence to the effect - the feelings of value got anchored to the location you felt them in?

Otherwise, I'd expect to see the result generalize far and wide to their lives. On the other hand, that a 15-minute writing assignment would have such far-ranging effects in a person's life seems rather unlikely to me. That it would have such a wide-ranging effect in that one class seems miraculous in itself.

Hence, I find it all rather peculiar.

Notice how the men did significantly worse on their exam scores after values affirmation. What's the explanation for that?

And "stereotype threat" just seems like a non sequitur here. How is that in any way related to the writing task? I see in the abstract that they found that "Benefits were strongest for women who tended to endorse the stereotype that men do better than women in physics."

And the "control" is as much an experiment as the "treatment". Why shouldn't we conclude that the "control" had a large negative effect on women, and particularly women who believed the stereotype (and data) that men are better at math?

Maybe the women who believed the data that men were better at math showed the greatest jump because they believed in data, and so had greater aptitude thereby? Maybe those women were just more impressionable to the value of others, and so disheartened by contemplating things they didn't value that other people did.

The raw data seems odd, and the interpretation even more dubious. Just peculiar all the way around. It certainly warrants further study, and I'd particularly like to see it controlled for each individual with a test of their aptitude/achievement going into the class.

[This comment is no longer endorsed by its author]

What on earth are you talking about? Where do math exams come into the picture of jsalvatier's linked meta-analysis?

Sorry. I was replying to a link to an article below.

Here's the author's summary of the moderators of effect size:

perhaps more important, a number of moderators of experimental disclosure were identified with a random effects approach; effect sizes tended to be larger when studies included only participants with physical health problems, included only participants with a history of trauma or stressors, did not draw from a college student sample, had participants disclose at home, had participants disclose in a private setting, had more male participants, had fewer participants, paid the participants, had follow-up periods of less than 1 month, had at least three disclosure sessions, had disclosure sessions that lasted at least 15 min, had participants who wrote about more recent events, instructed participants to discuss previously undisclosed topics, gave participants directed questions or specific examples of what to disclose, gave participants instructions regarding whether they should switch topics, and did not collect the products of disclosure. Conversely, a number of variables that were originally hypothesized to moderate experimental disclosure were not significantly related to effect size: psychological health selection criteria, participant age, participant ethnicity, participant education level, warning participants in advance that they might disclose traumatic events, spacing of disclosure sessions, valence of disclosure topic, focus of disclosure instructions, time reference of disclosure instructions, and mode of disclosure (handwriting, typing, or talking). (p. 851)

The overall weighted effect size was .063 (p. 834)

Due to a lack of focus I could not read the whole document, but it does look pretty good to my untrained eyes.

The moderating factors seem to be pretty important, I was unable to collect them all but they should sum up nicely to a how to do writing therapy guideline.

Notice how the men did significantly worse on their exam scores after values affirmation. What's the explanation for that?

That difference looks to me to be within the margin of error.

Maybe the women who believed the data that men were better at math showed the greatest jump because they believed in data, and so had greater aptitude thereby?

Among the stereotyped group that most believed the stereotype, there was the greatest divergence between the effects of the two writing exercises. Your suggestion would predict that all of the stereotype-believing group would improve equally. Also, "they believed in data, and so had greater aptitude thereby"? It would be a lot less embarrassing if you just figured out and stated your true rejection of this study.

perhaps there was a place dependence to the effect - the feelings of value got anchored to the location you felt them in

I think that's the mechanism the authors believe in: the place or context of the science classroom becomes less intimidating when the first thing the students did in the semester was prime themselves with confidence.

You can defy the data if you like, but it seems pretty plausible to me.

I think that's the mechanism the authors believe in: the place or context of the science classroom becomes less intimidating when the first thing the students did in the semester was prime themselves with confidence.

What did they write that gave you that impression?