# 28

I've been teaching math to people one or two levels below me my entire life. Although this seems like a limitation, I think it's the natural state of affairs.

On the Kiseido Go Server (KGS), there's a room called the KGS Teaching Ladder where players can find teaching games with players just a few stones stronger than them. The few times I participated, it was extraordinarily positive. Because of the relative linearity of progression in Go, losing to a slightly stronger player is legible: they will usually play a move you considered but just barely don't understand, or find the simplest good moves that you don't know yet. Losing to a much stronger player, however, is completely illegible. Much stronger players will often play completely incorrect moves ("overplays") just to test your instincts, or play otherwise incomprehensibly complicated variations and traps that you immediately fall into.

I distinguish between two models of teaching:

1. (Traditional) The master teaches everyone.
2. (Teaching Ladder) The students one or two stages up from you teach you.

Previously, I noted that many progressions come in three stages: "naive, cynical, naive but wise," where the third stage bears more resemblance to the first than the second. The value of Teaching Ladders is that they naturally mesh with the three stages: Stage 3's have a difficult time teaching Stage 1's, and Stage 2's are needed to fill that gap.

## Scaffolding and Assimilation

The history of every major galactic civilization tends to pass through three distinct and recognizable phases, those of Survival, Inquiry and Sophistication, otherwise known as the How, Why, and Where phases. For instance, the first phase is characterized by the question ‘How can we eat?’, the second by the question ‘Why do we eat?’ and the third by the question, ‘Where shall we have lunch?’ (Douglas Adams, “The Hitchhiker’s Guide to the Galaxy”)

In Singularity Mindset, I articulated the following model of development without explaining its origins:

Oftentimes, progress curves look like “naive, cynical, naive but wise”:

- For mathematicians, the curve is pre-rigor, rigor, post-rigor.
- Picasso said, “It took me four years to paint like Raphael, but a lifetime to paint like a child.”
- Scott Alexander foretold that idealism is the new cynicism.
- Knowing about biases can hurt you.

This is a general phenomenon which applies not just at the level of an entire field, but also at the level of individual skills. With Terry Tao's example of mathematics (pre-rigor, rigor, post-rigor) in mind, the stages look like:

1. (naive) The student has bad instincts. He thinks that proof by example is a valid argument. Saying "trust your instincts" doesn't help and deeply frustrates him. Progress is achieved by dropping the instincts, acquiring explicit knowledge, and following fixed and deliberate rules.
2. (cynical) The student has the knowledge and understands the fixed and deliberate rules. He starts every proof with a cookie-cutter template for proof by induction or proof by contradiction and fills in the logic line by line. Unfortunately, doing everything by System 2 is slow and clunky. Progress is achieved by pushing acquired knowledge back down to System 1 via practice, metaphor, and exploration.
3. (naive but wise) The student has successfully integrated skills into System 1. He produces intuitive arguments that only mention the salient details. A completely rigorous proof can be reconstructed on demand, but requires effort. At this point the explicit structures originally built to progress to Stage 2 are unnecessary, and are slowly taken down.

This model resembles, and perhaps generalizes, the interaction of Babble and Prune, where conscious Prune filters are slowly pushed down into the subconscious Babble. Learning occurs as superior algorithms are constructed in System 2 and then pushed back down to System 1, the instinctual level. After the algorithm is constructed, however, the remaining machinery in System 2 is outdated scaffolding. The farther along a student is past Stage 2, the more of this scaffolding is forgotten.

Let's call the transition between Stage 1 and Stage 2 Scaffolding and the transition between Stage 2 and Stage 3 Assimilation. Every progression in every domain looks roughly like a ladder built out of alternating Scaffolding and Assimilation rungs. In the Scaffolding stage, bad instincts are explicitly corrected with procedure and hard-and-fast rules. In the Assimilation stage, the explicit Scaffolding built is now practiced and stretched until it becomes instinct. Afterwards, although there is rarely an explicit call to remove the Scaffolding, it is no longer in use and slowly crumbles, leaving only the pure instinct behind.

## Teach Scaffolding

In a traditional teaching model, the master teaches students at all levels of development, from precalculus to (infinity, 1)-categories. The basic pitfall of this model is usually described as Expecting Short Inferential Distances, i.e. the master has a hard time reaching back down the tall tower of inferences to meet her students. She may even be a great speaker, throwing down her instincts in the form of quirky metaphors in an attempt to boost her students up. But she is no Rapunzel and the students are left staring up longingly from the bottom of the tower. Every so often, one of them tries to hop up and make progress by asking what that symbol means, but the tower is too damn high.

Long inferential distances are certainly part of the problem, but even if the master is sufficiently humble to back down a hundred steps, she may lack key pieces of Scaffolding that are required to convey ideas to the students. A master looks down the tower of inference and sees only the transition between Stage 1 and Stage 3, as if Assimilation can be achieved without the Scaffolding. She would never dream of teaching proof by contradiction with a cookie-cutter mad-lib proof template, but that seems to be an effective starting point for students who've never handled proofs.
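To make the "cookie-cutter mad-lib proof template" concrete, here is one hypothetical shape such a template could take, followed by a filled-in instance. The blanks and wording are illustrative assumptions rather than a standard form, and the sketch assumes the `amsthm` package for the `proof` environment.

```latex
% Hypothetical mad-lib template for proof by contradiction (illustrative only).
% Assumes \usepackage{amsthm} for the proof environment.
\textbf{Claim.} \underline{\hspace{3cm}} % the statement P to be proved

\begin{proof}
Suppose, for the sake of contradiction, that \underline{\hspace{3cm}}. % the negation of P
Then \underline{\hspace{3cm}}. % consequences, derived one line at a time
But this contradicts \underline{\hspace{3cm}}. % a known fact or an earlier line
Hence the assumption is false, and the claim holds.
\end{proof}

% The same template filled in for a standard example.
\textbf{Claim.} $\sqrt{2}$ is irrational.

\begin{proof}
Suppose, for the sake of contradiction, that $\sqrt{2} = p/q$
for integers $p, q$ with $p/q$ in lowest terms.
Then $p^2 = 2q^2$, so $p$ is even; writing $p = 2k$ gives $q^2 = 2k^2$,
so $q$ is even as well.
But this contradicts the assumption that $p/q$ is in lowest terms.
Hence the assumption is false, and $\sqrt{2}$ is irrational.
\end{proof}
```

A Stage 3 reader would compress the filled-in proof to a sentence or two; the template is exactly the kind of Scaffolding that only matters on the way up.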

Teaching Ladders, on the other hand, not only reduce inferential distance but introduce teachers who still have their Stage 2 Scaffolding mostly intact. That's why (in my experience) the TA in an undergraduate-level math course is usually more effective than the lecturer. Unless they are explicitly trained in teaching, even lecturers who understand and correct for the inferential gulfs involved lack the mental machinery to convey Scaffolding.

Often when I'm asked to teach a concept I recently learned, I have the urge to punt it to a known master's writing or lectures. Today I'm learning to fight that urge. However incomplete my knowledge, I convey it with Scaffolding intact, and that will do more good than harm.

I generally like this sort of thing, as much for building community as anything else, and also for learning by teaching, but counterpoint: I have a dance teacher who runs a group dance lesson every week, and he goes out of his way to tell us not to teach each other in this way (e.g. he generally discourages us from giving each other feedback except in specific feedback exercises). I find it annoying, but I also see where he's coming from: my sense is that he thinks we'll give each other bad advice and reinforce bad habits, and he thinks he can do better by pointing out our bad habits himself.

A relevant feature of this domain is that it's hard to tell when you're dancing poorly in many ways - you might be doing flashy stuff that looks cool but trains bad habits that will screw you down the road - and part of the skill my teacher has is that he's much better at diagnosing this sort of thing than we are. This is less relevant in domains with clearer and tighter feedback loops, although even in e.g. programming people have subtle opinions about style and how bad style can screw you down the road.

One way to say it is that there's a separate ladder for ability to teach the thing. You can get to e.g. the postrigorous stage in mathematics and still be bad at teaching, which requires different skills, like the ability to model students and a sense of what their most likely struggles are with various topics.

Here we go: the pattern of this conversation is "first correction, second correction, accurate belief" (see growth triplets).

Naive view: "learn from masters"

The OP is the first correction: "learn from people just above you"

Your comment is the second correction: "there are cases where teacher's advice is better quality"

The accurate belief takes all of this into account: "it's best to learn from multiple people in a way that balances wisdom against accessibility"

I worry that some kind of fallacy of grey is going on here which loses despite being technically more accurate.

[Note: I'm not sure if this was your concern - let me know if what I write below seems off the mark.]

The most accurate belief is rarely the best advice to give; there is a reason why these corrections tend to happen in a certain order. People holding the naive view need to hear the first correction, those who overcompensated need to hear the second correction. The technically most accurate view is the one that the fewest people need to hear.

I invoke this pattern to forestall a useless conversation about whose advice is objectively best.

In fact, I think it would be a good practice to always before giving advice, do your best to trace back to the naive view and count the reversals, and inform your reader on which level you are advising. (This is surprisingly doable if you bother to do it.)

I quite like this.

gjm:

This is in no way a substantive comment on the actual material here, but it amused me and might amuse someone else:

When I saw the title of this post on the front page, I thought "Hmm, I wonder whether that's ladders in the usual sort of metaphorical sense, or ladders as in Go -- perhaps there's some interesting analogy between those and teaching. Probably just the usual metaphor and nothing to do with Go." Then I started reading the article and saw the words "Kiseido Go Server" and thought "Huh, I guess it's some clever Go analogy after all." And then I read the rest of the sentence and was enlightened :-). A nice little seesaw, and I half-wonder whether the ambiguity in the title was deliberate.

The example of Go is worth diving into in more detail. There's a type of game a master can play with a student which is explicitly called a teaching game, in which the master's goal is to optimize for the student's learning as opposed to winning. Done right these can probably be extremely educational, but they don't scale well since masters just don't have that much time.

Unfortunately I didn't notice that and can't think of a good way to make it work.

I think this works well for mental activities like math or Go, but it's much less effective for physical skills. Say you're trying to learn tennis as an absolute beginner. If you try to learn it from someone only slightly better than you, you will inevitably learn bad/incorrect/inefficient habits when you should have just learned from the 'master' to start.