Re-edited to remove/integrate much of the added notes - Sep 5th:
This is a long post, and it was a first attempt to simply start trying to explain the whole topic, and see what kind of mistakes I made in the communication.
I did indeed make many mistakes, and started to feel that I should ask people not to read this original attempt at first, and so posted added notes to the beginning to say so and try to clear up the worst confusions.
But now that my audience is starting to close the inferential gap themselves thanks to amazingly wonderful people like Misha with “What Direct Instruction Is”, I think important points that I tried to express in this original foray might start to become more transparent to that audience.
It's still a very long post, with lots of new terminology, and, as Alicorn said, “sales-y enthusiasm”.
If you do read it, I must ask that you please don't skim. Give me the benefit of the doubt that anything that seems confusing or nonsensical might actually be important and meaningful in some non-obvious way that you do not yet understand, and that some of the “sales-y enthusiasm” and “applause lights” may have been intended to serve some useful purpose.
Again, please don't skim (although it is completely my fault if you feel like skimming!), because I just don't know how to do any better until I get more feedback on how the complete whole of what I wrote is understood.
If you do start skimming, and give up, just tell me where you did so.
[The “added notes” from the first edit I've removed, and will go through at a later time to extract anything that was original and useful and integrate it into the post itself or whatever.]
In this post, I'm going to introduce Direct Instruction, or DI (pronounced Dee-Eye, capital D, capital I, accept no imitations). DI is essentially the theory of how to find the best way to teach anything to anyone. And I mean a theory in the true scientific sense: parsimonious, rigorously pinned down by experiment, and with an impressive history of predictive successes.
Furthermore, it bestows upon the skillful wielder astonishing powers of engineering, allowing them to accomplish educational feats that are nothing short of spectacular compared to what's traditionally accepted as, well, acceptable. If DI were universally implemented in the school system, it would easily raise the average intelligence to a level that would be considered genius by the standards of today's average.
Or say someone wanted to set up a community that could consistently raise all its citizens (starting from children or adults) to formidable heights of intelligence, abilities, and rationality. DI would be one of the foundational tools they'd need to make it happen.
It's obvious how this should interest anyone with the LessWrongian mission of changing the world from the crazy-stupid mess it is now to a sane, smart, good place to live.
And that's my main purpose in writing this post: to interest. I'm not going to write a tutorial to teach or scholarly report to convince, because that would be redundant with resources already out there (and a lot more work!). Instead, I'm going to do my best to compress a very broad, very deep subject into a (relatively) short “Hey, check this out!”-style piece of writing, quickly hitting the highlights of the science and history well enough to explain why you should go follow the links I'll post and get your hands on the books I'll list.
[If you find my compression to be more confusing than intriguing in any place, please help me fix that with feedback.]
Once we're all on the same page with respect to this background, I'll be able to write another post on the details of how we could use this powerful tool to win. I'll talk about the highly unusual round-about way I originally came to find out about DI myself, and how that allowed me to notice some creative strategies that (I think) should allow DI to win against strongly established anti-rational forces in the field of education (“How we can help DI win”), as well as ways in which we could apply DI towards our goals ourselves, within our community (“How DI can help us win”).
Again, “what we can do for DI”/“what DI can do for us” in a later post. This post will just be a super-abridged intro to DI itself, pretty much saying “Hey, check this out!”.
And at this point, I feel I should move on to a *slightly* more concrete description of what DI looks like (but nowhere near as concrete as showing a particular program [like "Reading Mastery Signatures level K"] at this point).
So, what does DI look like in practice?
One thing that it's important to understand about DI is that, while it's certainly possible for someone who's very proficient with the theory to teach amazingly well, it is not necessary for the teacher to even know the difference between induction and deduction.
Because of the very logical nature of the sequences implied for teaching any particular learning outcome, following algorithms that are thoroughly understood by their designers, it is possible for a small number of expert 'educational engineers' to create scripted courses that whole schools of non-experts can easily be trained to use for great success. (In somewhat the same way we can have a sufficient number of pilots in the world without having to train each and every one of them to build their own planes, and certainly not expecting them to cobble them together from bits.)
These courses control what the teacher does and says, provide carefully matched expansion activities and independent student work, and even specify the correction procedures that may be necessary. They also provide tests to place new students at the appropriate level in the program, and frequent tests of mastery throughout the course.
These courses are designed using logical rules derived from the basic axioms of the theory (which have been empirically pinned down as correct!), and then, like any high-quality complex machine, field-tested to find any design-errors and correct them before full-scale production (or rather, printing).
[And yes, this logical, algorithmic aspect of the instruction means that DI would be extremely well-suited to creating computer-delivered lessons. If you remember the big fuss about 'Computer Assisted Learning' ages ago, yup, DI finally makes it possible for CAL to actually deliver on all those gushing promises.]
Unfortunately, this blessing of scalability-thanks-to-algorithmic-scriptability is also DI's curse. The very idea of using such tightly scripted lessons immediately sticks in the craw of the vast majority of teachers. Showing them graphs of the overwhelming data on how much better it is for the students is not very effective, and neither is explaining the theory or attempting to straighten out their philosophical confusions.
For instance, an often articulated concern is that these scripted lessons will restrict the creative freedom of the teacher. A common counter is that this is analogous to claiming the creative freedom of a driver is restricted by not having them design and build their vehicle as they drive it, knowing nothing about the science and engineering necessary.
But of course, this rhetoric, while pretty damn well aligned with evidence, has not been amazingly successful as a strategic tool. It seems that the only significant way in which normal teachers are converted over to DI is by actually using a program correctly themselves, and seeing the amazing difference in their own kids. Unfortunately this does not lead to a multiplication factor greater than one in the spread of DI.
But the subject of DI's historical and continuing struggle to overthrow the anti-scientific establishment of the field of education is covered in some of the resources I'll list at the end of this post, so I won't go into any more detail here. And again, I will discuss creative strategies for turning the tide of this struggle in a later post. At this point, knowing that this is an audience that does properly appreciate experimental evidence, I will move to a discussion of something called Project Follow-Through.
Evidence from Project Follow-Through:
Project Follow-Through was originally conceived as a social program in "The War Against Poverty", but, due to lack of funding, ended up morphing instead into the largest educational experiment in history. It ran nine years from 1968 to 1977, cost like a billion bucks, and involved over 200,000 students in over 170 communities across the US, from kindergarten through to grade three.
It had a 'planned variation' design, otherwise described as a 'horse race' between all the different models popular in the field of education at the time, comparing them as composite wholes to find which worked best (rather than prematurely trying to isolate the effects of different variables within the models). And despite some name changes, the range of ideas in these models is pretty much representative of the common ideas in the field today.
Each school site was 'sponsored' by one of the competing models, or was self-sponsored. Sponsors got funding and some support to make sure their models actually got implemented in all the schools they were responsible for.
The majority of involved communities had disadvantaged populations, and the average level of performance in the controls was at the 20th percentile.
Data was handled by two third parties, with Stanford Research Institute using a variety of standardized achievement tests to collect it all, and Abt Associates doing the analysis.
Here's a graph showing performance in different basic skills for the nine models with sufficient data to evaluate, relative to that 20th percentile baseline:
Pretty striking, eh?
Here's another graph showing children's gains/losses for the nine models in the area of basic skills (the above-graphed skills lumped together), cognitive skills (things like problem solving, creative thinking, etc), and affective skills (things like self-esteem, sense of responsibility for own learning, attitude towards school, etc). Baseline zero represents children who did not participate in Follow-Through.
You can see along the bottom that although models had been pre-classified as focused on primarily addressing one of these three areas, none of the 'affective' models had a positive effect on affective skills, and none of the models had a positive effect on cognitive skills except for 'basic skills-oriented' DI, which raised everything, and quite a lot.
Also, while I've never looked very deeply into the details of the other models (for rather the same reason you don't look very deeply into the details of various tribal witch-doctor systems when you want to know about physics), I'd bet that DI was the sponsor with the most problems getting sites to implement the model properly. These results by no means show the limits of what DI can do.
Follow Through was by far the largest and best-funded experiment, doing the most comprehensive comparison of DI and the other competing 'theories' in the field. It is also easy to tell as a dramatic story, hence my selection of it for an introduction.
However, there have been many other interesting experiments since, demonstrating impressive things that DI makes possible (and confirming that both low-performers and high-performers are best served by DI!). One researcher who conducted a meta-analysis of 34 studies making 173 direct experimental comparisons of DI and non-DI educational interventions said this:
The mean effect size average per study is more than .75, which confirms that the overall effect is substantial. … effects of .75 and above are rare in educational research. DI's consistent achievement of such scores is unique in educational research. [My emphasis]
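For anyone who hasn't met effect sizes before, here's a rough sketch of how one common kind is computed. I'm assuming a standardized mean difference (Cohen's d with a pooled standard deviation); the quote doesn't say which measure the meta-analysis actually used, and the scores below are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference: (mean_t - mean_c) / pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)  # sample standard deviations
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treatment) - mean(control)) / pooled_sd


# Hypothetical test scores, NOT data from any DI study.
control_scores = [50, 54, 58, 62, 66]
treatment_scores = [55, 59, 63, 67, 71]

d = cohens_d(treatment_scores, control_scores)
print(round(d, 2))  # about 0.79
```

On this reading, an effect size of .75 would mean the average student in the DI group scored about three quarters of a standard deviation above the average student in the comparison group.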
Again, I'll list resources at the end. At this point, I'm going to move on from evidence demonstrating DI's outstanding superiority in the field of education, to the theory of DI itself.
A quick sketch of the basic theory:
The LessWrong audience should be uniquely prepared to 'click' on DI theory, already understanding things like extensional/intensional definitions, 'looking into the dark', thingspace, and being more likely to respond with a "hm, that sounds like it might be interesting" than a blank look if someone says 'guided induction'.
Still, because of the depth of the subject, I had particular trouble in compressing this section, because I had to choose between:
a) Writing this section as a detailed-and-easy-to-follow intro to the very beginning, but leaving you with no clear idea of how far it goes from there
b) Writing it as a super-abridged whirlwind tour in order to better capture the full breadth, but with some risk of ending up burying you in an avalanche of new terminology and lightning jumps from concept to concept
I ended up opting for (b) here, since a tutorial for the basics already exists as an online open module at Athabasca University (as usual, link later), and my purpose in this post is, as I said, more to pique interest than to teach.
[But again, if you find the whirlwind below more confusing than intriguing, please help me fix that with some feedback.]
So, I'm gonna jump right in and kick things off the same way as the book Theory of Instruction: Principles and Applications, and tell you that 'the analysis of cognitive learning is at the intersection of three other analyses'. One is the behavioral analysis of the learner, also called the 'response-locus analysis' in DI. It's covered in DI theory, but all I'm gonna do here is note that and move on to the other two analyses: that of the communication used to teach, and that of the knowledge systems being taught.
These, the analysis of communications and the analysis of knowledge systems, form the 'stimulus-locus analysis', and are the utterly fascinating first focus of DI.
Imagine you want a student to learn something, so you present an instructional sequence, and it fails. You wonder, why? If you had a hundred copies of that student, and you presented the exact same sequence to all of them, would it fail 100% of the time? Or would it succeed some portion of the time because the student has some random chance of correctly 'guessing what it's supposed to mean' occasionally?
Of course you can't do an experiment like that, because there's no way you could control the variable of the learner finely enough. So what about controlling the variable of the stimulus used to communicate with the learner?
What you could do is create a 'logically faultless communication', with a structure that you know from logical analysis of the communication itself will be successful with a learner with certain characteristics. (Then, even if the instruction fails, you end up with some highly specific information about the learner, which you can then use to figure out how to create success, by applying it to a behavioral analysis of the learner).
[The term "logically faultless communication" does not suggest that if a learner fails to learn, then the learner is the problem and not the theory. In fact, the most common aphorism in the DI community is, "If the learner hasn't learned, the teacher hasn't taught". Until this seems perfectly consistent to you, you will know for sure you are not understanding the technical meaning of "logically faultless communication".]
The basic axioms of the stimulus-locus analysis, therefore, are:
1) The learning mechanism of the learner can learn any concept/quality from examples
2) The learning mechanism generalizes based on the samenesses of examples (it 'makes up a rule')
(Note that how exactly the 'learning mechanism' does these things is unimportant here; this isn't a theory of learning, but of instruction.)
Given these axioms, and a minimal amount of information about the learner's prior knowledge, it's now possible to design the logically faultless communication as a sequence of positive and negative examples of the concept to be taught. The major principles for doing this, which logically follow from the axioms, are:
- Signals of positive and negative must be clear and consistent
- Only the features the learner is supposed to generalize should be shared by the whole set of positive examples
- Greatly different positive examples must be juxtaposed to show the range of variation of the concept
- Minimally different positive and negative examples must be juxtaposed to show the borders of the concept
- The instruction must integrate a test of generalization
This is why I say that a huge part of the basics of DI is 'guided-induction' (my term, not used in the field).
Aside: If you're familiar with the logical induction game 'zendo', it's like you're playing some sort of backwards version where you as the Master are trying to communicate the koan to the Students as well as possible by first showing a sequence of koans with and without Buddha-nature, and marked so, and then presenting a sequence of unmarked koans for the Students to respond to.
That's a quick sketch of the basis of the analysis of communications (skipping over lots of details of how this plays out in controlling extrapolation, stipulation, and interpolation and stuff).
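To make the sequencing principles listed above a bit more concrete, here's a minimal sketch that checks a sequence of examples against three of them. Everything in it is my own illustrative assumption, not anything from Theory of Instruction: examples are represented as feature dictionaries, 'greatly different' is arbitrarily taken to mean 'differs in at least two features', 'juxtaposed' is taken to mean 'adjacent in the sequence', and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One example = (features shown to the learner, whether it is labeled positive).
Example = Tuple[Dict[str, str], bool]


def shared_by_positives(examples: List[Example]) -> Set[str]:
    """Features that have the same value in every positive example."""
    positives = [feats for feats, is_pos in examples if is_pos]
    if not positives:
        return set()
    first = positives[0]
    return {k for k, v in first.items() if all(p.get(k) == v for p in positives)}


def differing(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Features on which two examples differ."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def check_sequence(examples: List[Example], target: str) -> List[str]:
    """Return a list of violated principles (empty if none were found)."""
    problems = []
    # Only the target feature should be shared by all positives.
    extra = shared_by_positives(examples) - {target}
    if extra:
        problems.append(f"positives also share: {sorted(extra)}")
    neighbours = list(zip(examples, examples[1:]))
    # Greatly different positives must be juxtaposed (assumed: >= 2 features).
    if not any(p1 and p2 and len(differing(f1, f2)) >= 2
               for (f1, p1), (f2, p2) in neighbours):
        problems.append("no greatly different positives juxtaposed")
    # A positive and a negative differing only in the target must be juxtaposed.
    if not any(p1 != p2 and differing(f1, f2) == {target}
               for (f1, p1), (f2, p2) in neighbours):
        problems.append("no minimally different positive/negative pair juxtaposed")
    return problems


# Toy sequences for teaching "red" (made-up examples).
good = [
    ({"color": "red", "shape": "ball", "size": "small"}, True),
    ({"color": "red", "shape": "cube", "size": "large"}, True),
    ({"color": "blue", "shape": "cube", "size": "large"}, False),
]
bad = [
    ({"color": "red", "shape": "ball", "size": "small"}, True),
    ({"color": "red", "shape": "ball", "size": "large"}, True),
    ({"color": "blue", "shape": "cube", "size": "small"}, False),
]

print(check_sequence(good, "color"))
print(check_sequence(bad, "color"))
```

In the 'bad' sequence every positive is a ball, so a learner could just as reasonably conclude that "red" means "ball", and the only negative differs from its neighbor in everything at once, so it shows nothing about where the border of the concept is. The sketch says nothing about clear signals or the integrated test of generalization, which would need a model of the presentation itself.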
Now, as I said, the analysis of communications is the first part of the stimulus-locus analysis, and it leads directly to the second part: the analysis of knowledge systems.
The aim of the knowledge-systems analysis is to create a classification scheme that groups concepts by their samenesses in logical structure, so that samenesses in the logical structure of concepts are systematically related to samenesses in the logical structure of the communications used to teach the concepts. Thus classification of anything you want to teach in this scheme will tell you the basic template forms you must use and the steps you must go through to design effective instruction for it.
I won't go into any details of this hierarchy here, since that would involve explaining a lot of terminology and the concepts behind it, but that's all in Theory of Instruction, along with details on the response-locus analysis, and details of designing programs, field-testing them, and using the data from the field-tests to correct design-errors and optimize the whole thing.
I also want to share a quote from Siegfried Engelmann, 'the father of DI', about when he was writing the text Theory of Instruction with colleague Douglas Carnine:
If we drew a unique logical conclusion about behavior, Doug would indicate that he knew of no experimental data on this issue and would ask if I knew of any empirical data. The answer was usually "no," so Doug would conduct a study.
Ten studies alone were done on all the details of the template for teaching a basic non-comparative concept like "red". For instance, the presence of negative examples, the number of features they differed from positives by, the way examples were juxtaposed, variations in presentation wording, etc. In every case, DI theory's unique and detailed predictions were validated.
However, at this point I feel that I have probably focused enough on the very basic principles of the theory, the application of which is relatively obvious to the teaching of basic discriminations, but at a greater inferential distance to more advanced concepts. Therefore I'm going to hop over to one of my favorite short examples of the original kind of thinking that comes out of DI about how to teach things.
A quick example of a more advanced application of the stimulus-locus analysis:
This section is a quick adaptation of something I wrote elsewhere. I'm including it here as an example of how DI can produce unexpectedly original conclusions about how to teach various things, which differ greatly from what is intuitively obvious, but which, once understood, are obviously logically overwhelmingly superior.
This particular story starts with a list I once saw presented in a book as an example of possible long-term goals for a kindergarten class. In context, it was just used as an example of what an explicit list of goals might look like, but it was the very last bullet that caught my eye:
“Develop basic math concepts (for examples, numbers 1-20 and shapes)”
How can I best put this.... EPIC FAIL!
It’s not at all obvious at first why this is so wrong, so I’ll explain by outlining the correct way to teach the transformation relationship between numerals and their English names, and the rationale for this method.
You don’t teach 1-20 first, you teach 1-99. And you do it in a very special order:
- First, teach 1-10
- Then, do the 40s, 60s, 70s, 80s, and 90s
Why? Because this is the simplest, most regular, largest subset of this numeral-name transformation relationship.
The rule is simply, “First you say the number of tens, add a ‘ty’ (which is just a distorted ‘ten’), and then follow it with the number of ones if it’s not a zero”. So you see “41”, and you think “okay, that’s four-ty-one”.
[Note that this verbal explanation I just presented is not how this is presented to the kids. This is a description of what you’re teaching, not how.]
- Then you move on to teaching the 20s, 30s, and 50s.
Why? Because these form another large subset, which involves one more addition to the rules that governed the last subset: You simply distort “two” to “twen”, “three” to “thir”, and “five” to “fif”, so that you get “twen-ty-one” rather than “two-ty-one”.
- Then you can move on to 14, 16, 17, 18, and 19.
This subset is far smaller, and involves more complicated behaviors.
The part of the number’s name that tells the ones digit comes first, followed by the part that tells the tens digit, “-teen”, which is another, different distortion of ten.
You think: “14 -> ten-four -> teen-four (distort) -> four-teen (invert).”
- Then you can do 13 and 15 (the tiniest subset), which are the same as above, but also involve another distortion of “three” -> “thir” and “five” -> “fif”. (Luckily this distortion is already familiar to the kids from working the 30s and 50s! Clever, eh?)
- And finally, the wacky irregulars 12 and 11 can be thrown in.
This order is optimal for making clear to the learner that there is an orderly relationship here. They get the simple rules that cover the largest single group of cases, then they get the slightly more complicated rules that cover the next largest subtype, etc.
This makes clear what the basic pattern is, that the exceptions are exceptions, and exactly how they are exceptions.
Thus you can teach 1-99 far faster and easier than you can 1-20.
[Note that the student must not be worked to mastery on each subtype before the introduction of the next, because this would induce stipulation that the subtype was universal, but given proper pacing for any sequence of introduction, this order is optimal.]
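Here's a small sketch of the subtype analysis above written out as code, mostly to show how few rules the big subsets need. Like the verbal rule earlier, it describes what is being taught, not how it's presented to kids. The function names and stage numbering are my own, and the output uses the hyphenated 'spoken' notation from this post ("four-ty-one"), not standard English spelling.

```python
from collections import Counter

ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
# The single set of distortions reused by the 20s/30s/50s and by 13/15.
DISTORT = {"two": "twen", "three": "thir", "five": "fif"}
# Names that follow no rule and simply have to be memorized.
MEMORIZED = {10: "ten", 11: "eleven", 12: "twelve"}


def stage(n: int) -> int:
    """Teaching stage (1-6) in which the ordering above introduces n."""
    tens = n // 10
    if n <= 10:
        return 1
    if tens in (4, 6, 7, 8, 9):
        return 2  # regular: tens + "ty" (+ ones)
    if tens in (2, 3, 5):
        return 3  # same rule plus a distorted tens word
    if n in (14, 16, 17, 18, 19):
        return 4  # inverted order, "teen" instead of "ty"
    if n in (13, 15):
        return 5  # inverted order plus the familiar distortion
    return 6      # 11 and 12, the irregulars


def spoken(n: int) -> str:
    """Name of n (1-99) in the post's hyphenated spoken notation."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1-99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return ONES[ones]
    if n in MEMORIZED:
        return MEMORIZED[n]
    if tens == 1:
        # Ones part first (possibly distorted), then the "teen" form of ten.
        return DISTORT.get(ONES[ones], ONES[ones]) + "-teen"
    head = DISTORT.get(ONES[tens], ONES[tens]) + "-ty"
    return head if ones == 0 else head + "-" + ONES[ones]


# How many of the numbers 1-99 each stage covers.
sizes = Counter(stage(n) for n in range(1, 100))
print(sorted(sizes.items()))
print(spoken(41), spoken(21), spoken(14), spoken(13), spoken(12))
```

Counting the stages this way gives 10, 50, 30, 5, 2, and 2 numbers respectively, which is the 'largest and most regular subset first' pattern the ordering is built around.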
Can you imagine being a very young kid, truly naïve to this concept (not having had a ridiculous amount of informal exposure at home as a kid from a non-disadvantaged background), and having someone try to teach you 11 to 20 after just getting up to 1 to 10?
For 11 and 12 you’re thinking somewhere in your brain, “Okay, does every number have its own unique name as you keep counting up?” (You might also wonder: “How many numbers are there, anyway?”)
For 13 you really haven't seen anything that contradicts that ‘every number gets its own unique name’ hypothesis (how likely do you think it is to occur to you that the 'thir' is related to the '3' in '13' and the 'teen' to the '1'? Nah, ain't gonna happen).
At 14 it might occur to you to wonder if the ‘four’ in ‘fourteen’ has something to do with the ‘4’ in ‘14’, but since ‘FIFteen’ doesn’t seem to have a ‘five’ in it, you’ll move that hypothesis to the backburner.
At the introduction of 16 you'll go, “Hm, I wonder if the 'six' in 'sixteen' is related to the '6' in... Naaaah, I'm not gonna fall for that one again!”
At 17 you start to reconsider it. 18 and 19 bring it back up to full level of serious consideration, by which time you’re pretty sure there’s at least some bits with some sort of pattern in here...
And then they throw 'twenty' at you.
Huh? I mean, huh?
Now hopefully you can see how the obvious intuitive way of teaching something can be not merely, "Oh, maybe it could be better if you did, like, this or that", but actually downright horrifyingly logically broken and wrong and bad.
Whosoever adopts this crazy ‘teach my kindergarteners 1-20’ goal is going to horribly slow down and confuse their kids. Not just ‘they might be able to teach it better’. They’re doing it wrong.
In DI, a relationship like this one between numerals and their names is classified as a 'transformation concept', and the treatment I described above is called 'subtype analysis'. Hopefully it should now seem quite reasonable to think that this abstract concept could be applied to the teaching of many other not obviously related things (like grammatical conjugation rules in a foreign language, for instance), and that similarly surprising-yet-logical conclusions would be drawn by the stimulus-locus analysis for other concepts in the classification schemes as well as these 'transformations'.
Moving on, I feel at this point that I am unlikely to improve the quality of my super-abridged compression significantly per unit of additional agonizing over it, and I will now present the promised list of resources.
Resources on DI online and in print:
- The online open module on DI at Athabasca University
This provides a very short biography of Siegfried Engelmann (as I mentioned, the 'father of DI'), an overview of Project Follow-Through and associated history, and a much easier to follow introduction to the basics of the theory and the application of the stimulus-locus analysis to the first of the 'basic forms' in the classification hierarchy.
- The book, Theory of Instruction: Principles and Applications
This is pretty much the equivalent of Newton's Principia for the field of education, except luckily it's not written in Latin.
Ironically, in reading this text you will often find yourself wishing that the techniques in this book had been applied to the book itself (and seeing quite clearly how they could be). Understandably though, the authors had to first articulate it all rigorously for themselves, and having done so, and given the low interest in the field in a true scientific theory, they decided to focus their engineering efforts on creating more programs for school children instead.
Nevertheless, the LessWrong audience shouldn't find it too difficult. The AthabascaU module largely covers the basics presented in the first few modules, and having read that and thus already having the concepts in mind, it's quite easy to adapt to the language, after which the extremely logical nature of the ideas presented makes it quite easy to follow.
- The book, Research on Direct Instruction: 25 Years Beyond DISTAR [DISTAR was an early set of DI programs focused on arithmetic and reading]
This is the source of the quote from the meta-analysis I mentioned. It also covers studies on other things such as a program for teaching deaf and non-deaf people to interpret spoken words transformed into tactile vibrations, and some experiments that falsified Piaget's developmental theory (!)
Theory of Instruction also has a section on research.
- Engelmann has also written two books intended for a popular audience with titles that may be overly provocative from a strategic standpoint, but are definitely spot-on in terms of accuracy: “War Against the School's Academic Child Abuse” and “Teaching Needy Kids in Our Backwards System”.
These books deal with many educational issues which are more often than not both historical and current. Much of the material is presented in a partially autobiographical context.
Although I believe we are going to be able to largely step around most of the frustrating quagmires of institutionalized irrationality detailed in these books, I believe it's still worth understanding exactly what it is we're side-stepping, and many interesting bits of science and things are tied into a common narrative framework too. And finally, since they're written for a popular audience and are quite easy reads, I would definitely recommend these books as worthwhile.
- Engelmann's personal website zigsite.com has many interesting short (and not-so-short) documents. I would recommend "Curriculum as the cause of failure", (a couple pages are duplicated in that pdf) and its contextual prologue, for instance, and the video interviews.
How much of the material you're interested in depends on how much you just want to know only about the science itself, and how much you want to know about the horrible lack of science in the field outside of DI.
That is probably a sufficient amount of material for now. I'm hoping that your first taste will draw you in to voraciously devouring everything you can get your hands on, as happened for me.
However, there is one very significant way in which my experience will differ from yours. As I mentioned in passing, I originally found out about DI in a very unusual round-about way. In fact, I became interested in it long before I first heard of it.
To make a long, complicated story as short and streamlined as possible:
Some years back, I decided I wanted to teach myself French, and after much failure, eventually stumbled upon a set of audio lessons that used something called the “Michel Thomas Method”.
The difference between these lessons and all the other 'teach yourself' and formal instruction I'd messed about with was simply incredible. In about a month of using these audio lessons on my mp3 player while walking or riding the bus, I had a strong grasp of the entire structure of the language, and could use it to express my own ideas in a conversational context.
Needless to say, I was a) very excited, and b) very angry that none of the supposed experts in language learning had told me about this sooner.
I wanted to know what this “Michel Thomas Method” actually was. Would it work for everyone, or just learners like me? Would it work for subjects other than languages?
I eventually tracked down a book called “The Learning Revolution” by Jonathan Solity (which I had to order from the UK), and it was here that I first found references to “Direct Instruction” and “Ziggy Engelmann”. I googled it and, like I said, was soon hooked.
But what I got from Solity's book in the end (although not exactly what he said), was that everything in the Michel Thomas lessons that made them so unusually effective was an approximation of DI.
To explain the way I usually think about it now, I'll make a short digression to summarize one of Engelmann's articles criticizing “research-based” educational reforms in reading ("The Dalmatian and Its Spots" on zigsite.com if you want to read it yourself).
In it he basically says this:
- These reforms were targeted at mandating that reading instruction have certain features (e.g. paying some attention to phonemic awareness), because research had shown that instruction that was effective had these features, and therefore if instruction had these features it would be effective.
- However, this is like saying that all dalmatians have spots, and therefore if something has spots, it's a dalmatian.
So I would now say that the Michel Thomas programs were dalmatians rather than merely spotted. A bit mangy, with some mutt in them, but dalmatian enough to suffice for many practical purposes.
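If the logical gap in that reform argument isn't jumping out at you, here's a toy check. The three 'programs' below are entirely made up (they are not real programs or real findings); the point is only that "every effective program has feature X" can be true while "every program with feature X is effective" is false.

```python
# Hypothetical programs, invented purely to illustrate the logic.
programs = [
    {"name": "A", "phonemic_awareness": True, "effective": True},
    {"name": "B", "phonemic_awareness": True, "effective": False},
    {"name": "C", "phonemic_awareness": False, "effective": False},
]


def all_imply(items, premise, conclusion):
    """True if every item satisfying `premise` also satisfies `conclusion`."""
    return all(conclusion(x) for x in items if premise(x))


def has_feature(p):
    return p["phonemic_awareness"]


def is_effective(p):
    return p["effective"]


# "All dalmatians have spots": every effective program has the feature.
effective_implies_feature = all_imply(programs, is_effective, has_feature)
# "Anything with spots is a dalmatian": every program with the feature is effective.
feature_implies_effective = all_imply(programs, has_feature, is_effective)

print(effective_implies_feature, feature_implies_effective)  # True False
```

Program B is the spotted non-dalmatian: it has the mandated feature and still doesn't work.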
Aside: Nobody knows whether Michel Thomas, now deceased, was ever directly aware of Engelmann's work, but he must have at least started developing some of the principles independently, given details of his rather dramatic biography in Europe during WWII which I won't go into. And independent recreations of the same things are common in science and technology, after all.
At any rate, my personal emotional experience - failing very hard at learning something I wanted to do, and then finally succeeding quickly and easily thanks to, surprise, an instructor that actually had a clue how to teach - is unquestionably responsible for a lot of the enthusiasm I have for this subject. And I just felt I should mention that.
A final aside: If you're interested in learning a language yourself, I can personally recommend both the French and Spanish courses. (I haven't used the German and Italian, and don't know about the courses for other languages made by other people after Michel Thomas's death.)
I can't recommend that you simply download these from the internet, since that may be illegal in some jurisdictions, but there's a good chance you can find a copy at a local library, as I originally did.
Having used these courses does provide an enlightening additional perspective on DI, as well as being, as I mentioned, the context in which I originally thought of some strategies that could allow DI to finally win against irrational forces in the educational establishment, which I will talk about in a later post.
Of course it's not necessary to have the same experience yourself in order to understand what I'll talk about, if you are not particularly interested in learning (or already know) French or Spanish, but if you are, then it would definitely be worthwhile.
And that, I believe, wraps up this super-sized “Hey, check this out!”
I look forward to your feedback.