This post is an exercise in "identifying with the algorithm." I'm a big fan of the probabilistic method and randomized algorithms, so my biases will show.
How do human beings produce knowledge? When we describe rational thought processes, we tend to think of them as essentially deterministic, deliberate, and algorithmic. After some self-examination, however, I've come to think that my process is closer to babbling many random strings and later filtering by a heuristic. I think verbally, and my process for generating knowledge is virtually indistinguishable from my process for generating speech, and also quite similar to my process for generating writing.
Here's a simplistic model of how this works. I try to build a coherent sentence. At each step, to pick the next word, I randomly generate words in the category (correct part of speech, relevance) and sound them out one by one to see which continues the sentence most coherently. So, instead of deliberately and carefully generating sentences in one go, the algorithm is something like:
- Babble. Use a weak and local filter to randomly generate a lot of possibilities. Is the word the right part of speech? Does it lie in the same region of thingspace? Does it fit the context?
- Prune. Use a strong and global filter to test for the best, or at least a satisfactory, choice. With this word in the blank, do I actually believe this sentence? Does the word have the right connotations? Does the whole thought read smoothly?
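The two steps above can be sketched in a few lines of Python. This is only a toy illustration: the vocabulary, the part-of-speech tags, the coherence scores, and the function names (`babble`, `prune`, `fill_blank`) are all assumptions made up for the example, not anything claimed about how the brain actually does it.

```python
import random

# Hypothetical toy vocabulary: word -> part of speech (an assumption for illustration).
VOCAB = {"mat": "noun", "moon": "noun", "idea": "noun", "sofa": "noun",
         "run": "verb", "quickly": "adverb", "blue": "adjective"}

# Hypothetical "do I believe this sentence?" scores for "The cat sat on the ___".
COHERENCE = {"mat": 0.9, "sofa": 0.8, "moon": 0.2, "idea": 0.1}


def babble(vocab, weak_filter, n, rng):
    """Weak, local filter: randomly draw n words that merely fit the slot."""
    pool = [w for w in vocab if weak_filter(w)]
    return [rng.choice(pool) for _ in range(n)] if pool else []


def prune(options, score):
    """Strong, global filter: keep the best-scoring option, or None if there is none."""
    return max(options, key=score, default=None)


def fill_blank(n=200, seed=0):
    """Babble n candidate nouns, then Prune down to the most coherent one."""
    rng = random.Random(seed)
    options = babble(VOCAB, lambda w: VOCAB[w] == "noun", n, rng)
    return prune(options, lambda w: COHERENCE.get(w, 0.0))


print(fill_blank())
```

Shrinking `n` makes the Babble pool thinner, so Prune has less to choose from and may settle for a worse word. With `n=0` there is nothing to say at all and `fill_blank` returns `None`.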
This is a babble about embracing randomness.
Research on language development suggests that baby babble is a direct forerunner to language. You might imagine that infants learn by imitation, and that baby babble is just an imperfect imitation of words the baby hears, and progress occurs as they physiologically adapt to better produce those sounds. You would be wrong.
Instead, infants are initially capable of producing all the phonemes that exist in all human languages, and they slowly prune out which ones they need via reinforcement learning. Based on the sounds that their parents produce and respond to, babies slowly filter out unnecessary phonemes. Their babbles begin to drift as they prune out more and more phonemes, and they start to combine syllables into proto-words. Babble is the process of generating random sounds, and looking for clues about which ones are useful. Something something reinforcement learning partially observable Markov decision process I'm in over my head.
So, we've learned that babies use the Babble and Prune algorithm to learn language. But this is quite a general algorithm, and evolution is a conservative force. It stands to reason that human beings might learn other things by a similar algorithm. I don't think it's a particularly controversial suggestion that human thought proceeds roughly by cheaply constructing a lot of low-resolution hypotheses and then sieving from them by allowing them to play out to their logical conclusions.
The point I want to emphasize is that the algorithm has two distinct phases, both of which can be independently optimized. The stricter and stronger your Prune filter, the higher quality content you stand to produce. But one common bug is related to this: if the quality of your Babble is much lower than that of your Prune, you may end up with nothing to say. Everything you can imagine saying or writing sounds cringey or content-free. Ten minutes after the conversation moves on from that topic, your Babble generator finally returns that witty comeback you were looking for. You'll probably spend your entire evening waiting for an opportunity to force it back in.
Your pseudorandom Babble generator can also be optimized, and in two different ways. On the one hand, you can improve the weak filter you're using, to increase the probability of generating higher-quality thoughts. The other way is one of the things named "creativity": you can try to eliminate systematic biases in the Babble generator, with the effect of hitting a more uniform subset of relevant concept-space. Exercises that might help include expanding your vocabulary, reading outside your comfort zone, and engaging in the subtle art of nonstandard sentence construction.
Poetry is Babble Study
Poetry is at its heart an isolation exercise for your Babble generator. When creating poetry, you replace your complex, inarticulate, and highly optimized Prune filter with a simple, explicit, and weird one that you're not attached to. Instead of picking words that maximize meaning, relevance, or social signals, you pick words with the right number of syllables that rhyme correctly and follow the right meter.
Now, with the Prune filter simplified and fixed, all the attention is placed on the Babble. What does it feel like to write a poem (not one of those free-form modern ones)? Probably most of your effort is spent Babbling almost-words that fit the meter and rhyme scheme. If you're anything like me, it feels almost exactly like playing a game of Scrabble, fitting letters and syllables onto a board by trial and error. Scrabble is just like poetry: it's all about being good at Babble. And no, I graciously decline to write poetry in public, even though Scrabble does conveniently rhyme with Babble.
Puns and word games are Babble. You'll notice that when you Babble, each new word isn't at all independent from its predecessors. Instead, Babble is more like initiating a random walk in your dictionary, one letter or syllable or inferential step at a time. That's why word ladders are so appealing - because they stem from a natural cognitive algorithm. I think Scott Alexander's writing quality is great partly because of his love of puns, a sure sign he has a great Babble generator.
If poetry and puns are phonetic Babble, then "Deep Wisdom" is semantic Babble. Instead of randomly arranging words by sound, we're arranging a rather small set of words to sound wise. More often than not, "deep wisdom" boils down to word games anyway, e.g. wise old sayings:
"A blind person who sees is better than a seeing person who is blind."
"A proverb is a short sentence based on long experience."
"Economy is the wealth of the poor and the wisdom of the rich."
Reading is Outsourcing Babble
Reading and conversation outsource Babble to others. Instead of using your own Babble generator, you flood your brain with other people's words, and then apply your Prune filter. Because others have already Pruned once, the input is particularly high-quality Babble, and you reap particularly beautiful fruit. How many times have you read a thousand-page book, only to fixate on a handful of striking lines or passages?
Prune goes into overdrive when you outsource Babble. A bug I mentioned earlier is having way too strict of a Prune filter, compared to the quality of your Babble. This occurs particularly to people who read and listen much more than they write or speak. When they finally trudge into the attic and turn on that dusty old Babble generator, it doesn't produce thoughts nearly as coherent, witty, or wise as their hyper-developed Prune filter is used to processing.
Impose Babble tariffs. Your conversation will never be as dry and smart as something from a sitcom. If you can't think of anything to say, relax your Prune filter at least temporarily, so that your Babble generator can catch up. Everyone starts somewhere - Babbling platitudes is better than being silent altogether.
Conversely, some people have no filter, and these are exactly the kind of people who don't read or listen enough. If all your Babble goes directly to your mouth, you need to install a better Prune filter. Impose export tariffs.
The reason the Postmodernism Generator is so fun to read is because computers are now capable of producing great Babble. Reading poetry and randomly generated postmodernism, talking to chatbots, these activities all amount to frolicking in the uncanny valley between Babble and the Pruned.
Tower of Babble
A wise man once said, "Do not build Towers out of Babble. You wouldn't build one out of Pizza, would you?"
NP is the God of Babble. His law is: humans will always be much better at verifying wisdom than producing it. Therefore, go forth and Babble! After all, how did Shakespeare write his famous plays, except by randomly pressing keys on a keyboard?
NP has a little brother called P. The law of P is: never try things you don't understand completely. Randomly thrashing around will get you nowhere.
P believes himself to be a God, an equal to his brother. He is not.
I just re-read this sequence. Babble has definitely made its way into my core vocabulary. I think of "improving both the Babble and Prune of LessWrong" as being central to my current goals, and I think this post was counterfactually relevant for that. Originally I had planned to vote weakly in favor of this post, but am currently positioning it more at the upper-mid-range of my votes.
I think it's somewhat unfortunate that the Review focused only on posts, as opposed to sequences as a whole. I just re-read this sequence, and I think the posts More Babble, Prune, and Circumambulation have more substance/insight/gears/hooks than this one. (I didn't get as much out of Write.) But this one was sort of "the Schelling post to nominate" if you were going to nominate one of them.
The piece as a whole succeeds very much as both Art as well as pedagogy.
Some aspects of this remind me of generative adversarial networks (GANs).
In one use case: The Generator network (Babbler) takes some noise as input and generates an image. The Discriminator network (sorta Pruner) tries to say if that image came from the set of actual photographs or from the Generator. The Discriminator wins if it guesses correctly, the Generator wins if it fools the Discriminator. Both networks get trained up and get better and better at their task. Eventually (if things go right) the Generator makes photorealistic images.
So the pruning happens in two ways: first the Discriminator learns to recognize bad Babble by comparing the Babble with "reality". Then the Generator learns the structure behind what the Discriminator catches and learns a narrower target for what to generate so that it doesn't produce that kind of unrealistic Babble in the first place. And the process iterates - once the Generator learns not to make more obvious mistakes, then the Discriminator learns to catch subtler mistakes.
GANs share the failure mode of a too-strict Prune filter, or more specifically a Discriminator that is much better than the Generator. If every image that the Generator produces is recognized as a fake then it doesn't get feedback about some pieces of Babble being better than others so it stops learning.
(Some other features of Babble aren't captured by GANs.)
Yes, and this concept and these connections have already been discussed in 5 or 10 different posts on LW and related blogs, see e.g. this though I won't bother to compile a full list.
(Note that I still like the post, converging on a "catchy" way to put an important concept is valuable.)
Cool, it seems like we're independently circumambulating the same set of ideas. I'm curious how much your models agree with the more fleshed out version I described in the other post.
I find the similarities between modern chatbots and the babble/prune model more appropriate. For example, the recent MILA chatbot uses several response models to generate candidate responses based on the dialogue history, and then a response selection policy to select which of the responses to return.
More generally, the concept of separate algorithms for action proposal and action evaluation is quite widespread in modern deep learning. For example, you can think of AlphaGo's policy network as serving the action proposal/babble role, while the MCTS procedure does action evaluation/pruning. (More generally, you can see this with any sort of game tree search algorithm that uses a heuristic to expand promising nodes.) Or, with some stretching, you can think of actor-critic based reinforcement learning algorithms as being composed of babble/prune parts.
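To make the proposal/evaluation split concrete, here is a minimal sketch of a search that uses a heuristic to expand promising nodes first. It is a toy best-first search over integers, not AlphaGo or MCTS; the moves (`+1` and `*2`), the distance heuristic, and the names `propose` and `evaluate` are all assumptions chosen for illustration.

```python
import heapq


def propose(n):
    """Babble: cheaply list successor states (hypothetical toy moves)."""
    return [n + 1, n * 2]


def evaluate(n, target):
    """Prune signal: a smaller distance to the target is more promising."""
    return abs(target - n)


def best_first_search(start, target, max_expansions=10_000):
    """Always expand the most promising proposed state next."""
    frontier = [(evaluate(start, target), start, [start])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state, path = heapq.heappop(frontier)
        if state == target:
            return path
        for nxt in propose(state):
            # Skip repeats and states that overshoot far past the target.
            if nxt not in seen and nxt <= 2 * target:
                seen.add(nxt)
                heapq.heappush(frontier, (evaluate(nxt, target), nxt, path + [nxt]))
    return None


print(best_first_search(1, 10))  # prints [1, 2, 4, 8, 9, 10]
```

The proposal step knows nothing about the goal; all the steering comes from the evaluation step deciding which proposals get expanded.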
GANs fall into the Babble/Prune model mainly insofar as there are two parts, one serving as action proposal and the other serving as action evaluation; beyond this high level, the fit feels very forced. I think that from modern deep learning, both the MILA chatbot and AlphaGo's MCTS procedure are much better fits to the babble/prune model than GANs.
Curated for being a highly readable and memorable explanation of a useful concept. I especially like the concept handles “babble and prune” rather than “generate and evaluate/filter”. (Also extra awesome points for the post being an example of itself.)
This post opens with the claim that most human thinking amounts to babble-and-prune. My reaction was (1) that's basically right, (2) babble-and-prune is a pretty lame algorithm, (3) it is possible for humans to do better, even though we usually don't. More than anything else, "Babble" convinced me to pay attention to my own reasoning algorithms and strive to do better. I wrote a couple posts which are basically "how to think better than babble" - Mazes and Crayon and Slackness and Constraints Exercises - and will probably write more on the topic in the future.
"Babble" is the baseline for all that. It's a key background concept; the reason the techniques in "Mazes and Crayon" or "Slackness and Constraints" are important is because without them, we have to fall back on babble-and-prune. That's the mark to beat.
Wow, I really enjoyed reading this post. I am a bit unsure whether the post was just generally well written and I enjoyed the aesthetic experience of it, or whether it actually had a lot of insights. But I do actually feel that you've very compellingly explained an important concept. We will see how useful the concept will be over the coming weeks, but my immediate system 1 response to this post is very positive.
Thank you for highlighting that dichotomy; I pay a lot of attention to optimizing aesthetic experience over insight. A quote of Tolstoy's really stuck with me, "If I were told that I could write a novel whereby I might irrefutably establish what seemed to me the correct point of view on all social problems, I would not even devote two hours to such a novel; but if I were to be told that what I should write would be read in about twenty years’ time by those who are now children and that they would laugh and cry over it and love life, I would devote all my own life and all my energies to it." I'm not sure I quite agree with the first half of his sentiment, but I definitely agree with the second.
I weakly think this post should be included in Best of LessWrong 2018. Although I'm not an expert, the post seems sound. The writing style is nice and relaxed. The author highlights a natural dichotomy; thinking about Babble/Prune has been useful to me on several occasions. For example, in a research brainstorming / confusion-noticing session, I might notice I'm not generating any ideas (Prune is too strong!). Having this concept handle lets me notice that more easily.
One improvement to this post could be the inclusion of specific examples of how the author used this dichotomy to improve their idea generation process.
This post made a lot of things click for me. Also it made me realize I am one of those with an "overdeveloped" Prune filter compared to the Babble filter. How could I not notice this? I knew something was wrong all along, but I couldn't pin down what, because I wasn't Babbling enough. I've gotta Babble more. Noted.
Cool post! I've found that spending a few minutes away from all information sources and trying to think up new thoughts (optimized for not repeating anything I ever thought before) works well for me. After a minute my babble generator goes "okay, time to get creative!"
Update after two years: trying to think thoughts I never thought before (or words, images, sounds, etc) still works amazingly every time I try it. It's pretty much the key to creativity in every area. I find myself not doing it as often as I could, but when I know someone else will be interested (like when some friends want me to write lyrics for a song), it comes easily and naturally. So everyone, please demand more creativity from me and from each other!
Thanks for this awesome post! I like the babble/prune distinction, but the analogy to randomized algorithms was probably the more helpful idea in here for me. It made perfect sense, since a lot of probabilistic algorithms are really simple combinations of random babble and efficient pruning.
This analogy makes me wonder: given that many in complexity theory assume that BPP = P, what is the consequence of derandomization on Babble and Prune? Will we eventually be able to babble deterministically, such that we have a high guaranteed probability of finding what we looked for while pruning?
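A standard illustration of "random babble plus efficient pruning" is generating a prime by guessing random numbers and testing each guess. The sketch below is a minimal version under stated assumptions: it uses plain trial division as the test, which is only practical for small bit sizes, and the function names and the `max_tries` cutoff are invented for the example.

```python
import random


def is_prime(n):
    """Prune: a cheap, deterministic check by trial division (fine for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def random_prime(bits, rng, max_tries=10_000):
    """Babble random odd numbers with the given bit length until one passes."""
    for _ in range(max_tries):
        # Force the top bit (exact bit length) and the low bit (odd).
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate
    return None  # Babble ran out of tries before Prune accepted anything


p = random_prime(16, random.Random(0))
print(p, is_prime(p))
```

Neither half is clever on its own. The approach works because primes are dense enough that random guessing hits one quickly, and checking a guess is far cheaper than constructing a prime directly.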
A slight issue with the post: I disagree that poetry is pure babble/phonetic babble. Some parts of poetry are only about the sounds and images, but many poems try to compress and share a feeling, an idea, an intuition. That is to say, meaning matters in poetry.
This post changed how I think about everything from what creativity is to why my friend loves talking one-on-one but falls silent in 5 person groups. I will write a longer review in December.
What alternatives to the babble-prune algorithm are there? Every search algorithm with a vast search space (such as the space of all sentences) is going to look somewhat random, at least until you understand how it works. Ultimately babble is about selecting some candidates from the infinite search space and prune is about selecting good sentences from those candidates. The two steps are not that distinct. I propose that the reason you see a difference between babble and prune steps is that pruning is to some extent conscious while babble is always fully automatic.
I think you're right, this is essentially the main distinction, and that it is still an important one. In other discussion we've roughly agreed that useful filters in Prune are eventually pushed down into the subconscious Babble layer. Ultimately you don't have that much control over Babble but you do have control over in what direction and how tightly you Prune and that's where the leverage in the system is.
A good indication that this terminology is useful is that I immediately have an urge to use it to describe my thoughts. Specifically, does anyone else worry that less(er)wrong is particularly babble-unfriendly (particularly to the form of babble which involves multiple people)? And if so, is there anything which can/should be done about it?
EDIT: this probably should be in meta instead but I don't know how to delete it.
Yes, I have this worry. I'm trying to shift the norm by writing more poetic comments, and it looks like some other people are doing this too.
I've been thinking about this same idea, and I thought your post captured the heart of the algorithm (and damn you for beating me to it 😉). But I think you got the algorithm slightly wrong, or simplified the idea a bit. The “babble” isn't random; there are too many possible thoughts for random thought generation to ever arrive at something the prune filter would accept. Instead, the babble is the output of a pattern matching process. That's why computers have become good at producing babble: neural networks have become competent pattern matchers.
This means that the algorithm is essentially the hypothetical-deductive model from philosophy of science, most obvious when the thoughts you're trying to come up with are explanations of phenomena: you produce an explanation by pattern matching, then prune the ones that make no goddamn sense (then if you're doing science you take the explanations you can't reject for making no sense and prune them again by experiment). That's why I've been calling your babble-prune algorithm “psychological abductivism.”
Your babble's pattern matching gets trained on what gets accepted by the prune filter, that's why it gets better over time. But if your prune filter is so strict that it seldom accepts any of your babble's output, your babble never improves. That's why you must constrain the tyranny of your prune filter if you find yourself with nothing to say. If you never accept any of your babble, then you will never learn to babble better. You can learn to babble better by pattern matching off of what others say, but if your prune filter is so strict, you're going to have a tough time finding other people who say things that pass your prune filter. You'll think “that's a fine thing to say, but I would never say it, certainly not that way.” Moreover, listening to other people is how your prune filter is trained, so your prune filter will be getting better (that is to say, more strict) at the same time as your straggling babble generator is.
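The feedback loop described above can be sketched as a toy simulation. Everything here is an assumption made for illustration: babble is modelled as draws from a bell curve centred on `mu`, "quality" is just the number drawn, prune is a fixed threshold, and learning means nudging `mu` toward whatever got accepted.

```python
import random


def train_babble(threshold, rounds=20, samples=50, lr=0.5, seed=0):
    """Return the babble generator's centre after training on what prune accepts."""
    rng = random.Random(seed)
    mu = 0.0  # centre of the babble distribution (hypothetical quality scale)
    for _ in range(rounds):
        babble = [rng.gauss(mu, 1.0) for _ in range(samples)]
        accepted = [x for x in babble if x >= threshold]  # prune
        if accepted:
            target = sum(accepted) / len(accepted)
            mu += lr * (target - mu)  # learn from accepted output only
        # With nothing accepted there is no signal, so mu stays where it was.
    return mu


lenient = train_babble(threshold=0.5)
strict = train_babble(threshold=50.0)
print(lenient, strict)
```

With the lenient threshold some babble gets through and `mu` moves upward. With the absurdly strict one nothing is ever accepted, so the generator ends exactly where it started.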
I've had success over the past year with making my prune filter less strict in conversational speech, and I think my babble has improved enough that I can put my prune filter back up to its original level. But I need to do the same with my writing and I find it harder to do that. With conversational speech, you have a time constraint, so if your prune filter is too strict you simply end up saying nothing: the other person will say something or leave before you come up with a sufficiently witty response. In writing, you can just take your time. If it takes you an hour to come up with the next sentence, then you sit down and wait that god-forsaken hour out. You can get fine writing out with that process, but it's slow. My writing is good enough that I never proofread (I could certainly still get something out of proofreading, but it isn't compulsory, even for longer pieces of writing), but to get that degree of quality takes me forever, and I can't produce lower quality writing faster (which would be very useful for finishing my exams on time).
I think I mostly agree and tried to elaborate a lot more in the followup. Could you provide more detail about your hypothetical-deductive model and in what ways that's different?
I've made a reply to your followup.
That's an odd definition of poetry. It seems, at least to me, that people want it to make sense, maybe whimsical, unapologetic, comfy, soaring, blunt or some other specific kind of sense, which is hardly a filter that maximizes "rational thought", but - pure rhyme and meter? Seriously?
Fair enough, that's definitely an oversimplification on my part. I think the broader point that plugging into a weird filter is a general method (perhaps THE method) of stretching your Babble generator still stands.
Have you read Impro? You'd like it, a lot.
Starting to read it, fantastic stuff. One initial comment:
One day, when I was eighteen, I was reading a book
and I began to weep. I was astounded. I'd had no idea that literature
could affect me in such a way. If I'd have wept over a poem in class
the teacher would have been appalled. I realised that my school had
been teaching me not to respond.
(In some universities students unconsciously learn to copy the
physical attitudes of their professors, leaning back away from the play
or film they're watching, and crossing their arms tightly, and tilting
their heads back. Such postures help them to feel less 'involved', less
'subjective'. The response of untutored people is infinitely superior.)
I've always felt this is quite awful about mathematics classes, and very vindicated when I can get a bit of a reaction from people with a particularly nice proof. Tangentially related is an idea I've had that it might be possible and useful to build certain mathematical ideas or techniques into muscle memory.
Since reading this I've noticed a connection between overactive pruning and depression-y feelings in my own life. When I feel "depressed" (used in the non-clinical sense) and really pay attention, I notice that I still have many fleeting impulses but that some part of my mind is smothering them almost below the level of conscious awareness. Just noticing this seems to help let those impulses flow to action, which in turn seems to counter the higher-level feeling of depression. In particular, letting these impulses manifest in physical babble--doing silly things with my body, like falling over or dancing--substantially improves my mood. I wonder what this mechanism is.
Babble and Prune has stuck as a concept in my mind since reading this sequence. The need for both of them has informed how I approach intellectual work personally and also how I think about the communal process I try to support on LessWrong. Like so many things, it's a balance we need to keep juggling.
I read more than I talk, and yet I have almost no filter. This would seem to be a counterexample, although with a sample size of only one. However, this could be the result of other factors with the opposite sign of correlation - from one perspective, "not talking much" leads naturally to "having an underdeveloped filter". Alternatively, we're using the word "filter" differently. Perhaps I do have a filter (what I say is usually relevant), but my lack of social interaction/skills means that the filter is misaligned with social expectations or desires (it's not often what they want to hear).
This quote perfectly describes a problem of mine. I have, however, already received advice in the same vein as what's mentioned: let out more of your ideas/words regardless of how good they are--my "good pruning" will eventually leave me with the best of my slew of ideas. While I can't concretely say that I've experienced great improvement in my thought-quality, I can say that I've been able to actually "come up" with ideas, rather than, well, staying silent and having none whatsoever. Upvoted.
Cool. This reminds me of something I've been thinking about lately: the default mode network. I plan to circle around at some point and say some things about it in the context of Robert Wright's Why Buddhism is True, but for now might as well point out the association so you can explore it on your own if you are interested.