The MIT Mystery Hunt is a collection of puzzles, solved in teams over a long weekend every year. The prize for winning is that your team gets to write next year's hunt. Mystery Hunt puzzles are generally designed to take a few hours for a few people. A hunt typically has around 100 such puzzles, organized into a dozen or so rounds, each with a metapuzzle; a metapuzzle can typically be solved with only a subset of the answers from its round's puzzles, so not every puzzle needs to be solved to win.
My team, Codex, won in 2011 and thus wrote the 2012 hunt, which has just concluded. I wanted to share some thoughts about the hunt, and also share one of the puzzles that didn't make it in, but that I think Less Wrong will appreciate.
Edward Z. Yang compared the process of solving puzzles to science. It's not always that way -- in particular, Duck Konundrum is the prototype of a class of puzzle which merely requires following a very complicated set of instructions, while Square Mess is a simple matter of programming (well, and univat n ovt rabhtu qvpgvbanel). But it's a pretty good way of looking at things.
This year, I was a puzzle editor as well as an author. One of the things I learned about puzzles is that authors always think their puzzles are solvable, whether or not they are. This is the Illusion of Transparency in action -- it's obvious to the author how the puzzle ought to be solved. One job of editors is to ensure that every aha is properly clued, and that there is internal confirmation that solvers are on the right track. Internal confirmation means that when there are two steps to solving a puzzle, the intermediate result contains something intelligible even with omissions or errors. For example, if an intermediate result is a set of trigrams, those trigrams should be plausibly English-like. In nature, internal confirmation comes naturally, since all of nature follows a single set of rules. But in a puzzle, the rules are entirely arbitrary, so internal confirmation must be added.
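To make the trigram example concrete, here is a toy sketch of the kind of sanity check a solver might run on an intermediate result. The trigram set below is a small illustrative sample I made up, not a real corpus; a real check would use frequency data from a large body of English text.

```python
# Toy internal-confirmation check: does a set of intermediate trigrams
# look plausibly English-like? (Sample set is illustrative only.)
COMMON_TRIGRAMS = {"THE", "AND", "ING", "ENT", "ION", "HER", "FOR", "THA"}

def english_likeness(trigrams):
    """Return the fraction of trigrams that appear in the common set."""
    hits = sum(1 for t in trigrams if t.upper() in COMMON_TRIGRAMS)
    return hits / len(trigrams)
```

A high score suggests the solvers are on the right track even if a few extractions are wrong; a score near zero suggests a wrong turn somewhere earlier in the puzzle.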
In past hunts, a number of puzzles went completely unsolved, because there wasn't a rigorous testsolving process. Some puzzles were released with serious undetected errors, and some puzzles were simply too hard. In 2012, every puzzle was solved forwards (that is, without inferring the answer from the constraints in the metapuzzle) at least once.
The only way to tell whether a puzzle really works is to have solvers test it. Of course, these solvers can't just be people picked off the street -- they should be familiar with the conventions of the form (for instance, when converting between numbers and letters, A=1, and A+A=B, generally). Sometimes specialized knowledge is needed; some of the puzzles I wrote could not have been solved by non-programmers, and one of Codex's puzzles which failed testing required a solver with perfect pitch. But generally, it should be clear from looking at a puzzle what kind of knowledge is needed (at least for the first step). Codex avoided the problems of the past by testing every puzzle. Every puzzle that wasn't solved cleanly (and some that were) got revised and tested until it either passed or was cut.
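The A=1 convention mentioned above can be sketched in a few lines of Python; the function names are my own, but the convention itself (letters map to alphabet positions, and arithmetic wraps around past Z) is standard in puzzle hunts.

```python
# Standard puzzle-hunt convention: A=1, B=2, ..., Z=26.
def letter_to_number(c):
    """Convert a letter to its alphabet position (A=1)."""
    return ord(c.upper()) - ord('A') + 1

def number_to_letter(n):
    """Convert a number back to a letter, wrapping past Z (27 -> A)."""
    return chr((n - 1) % 26 + ord('A'))

# A + A = 1 + 1 = 2 = B, as described above.
assert number_to_letter(letter_to_number('A') + letter_to_number('A')) == 'B'
```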
One of the puzzles that failed testing was one that I wrote with Danielle Sucher and Emily Morgan: Write More. We think Less Wrong readers might appreciate it anyway, so I'm posting it here.