I strongly suspect that there is a possible art of rationality (attaining the map that reflects the territory, choosing so as to direct reality into regions high in your preference ordering) which goes beyond the skills that are standard, and beyond what any single practitioner singly knows.  I have a sense that more is possible.

The degree to which a group of people can do anything useful about this, will depend overwhelmingly on what methods we can devise to verify our many amazing good ideas.

I suggest stratifying verification methods into 3 levels of usefulness:

  • Reputational
  • Experimental
  • Organizational

If your martial arts master occasionally fights realistic duels (ideally, real duels) against the masters of other schools, and wins or at least doesn't lose too often, then you know that the master's reputation is grounded in reality; you know that your master is not a complete poseur.  The same would go if your school regularly competed against other schools.  You'd be keepin' it real.

Some martial arts fail to compete realistically enough, and their students go down in seconds against real streetfighters.  Other martial arts schools fail to compete at all—except based on charisma and good stories—and their masters decide they have chi powers.  In this latter class we can also place the splintered schools of psychoanalysis.

So even just the basic step of trying to ground reputations in some realistic trial other than charisma and good stories, has tremendous positive effects on a whole field of endeavor.

But that doesn't yet get you a science.  A science requires that you be able to test 100 applications of method A against 100 applications of method B and run statistics on the results.  Experiments have to be replicable and replicated.  This requires standard measurements that can be run on students who've been taught using randomly-assigned alternative methods, not just realistic duels fought between masters using all of their accumulated techniques and strength.
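As a minimal sketch of what "test 100 applications of method A against 100 applications of method B and run statistics" could look like, here is a permutation test on the difference in mean scores. Everything in it is an assumption for illustration: the student scores are simulated, the function names are made up, and a permutation test is just one reasonable choice of statistic, not a method the post prescribes.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(scores_a, scores_b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign students to the two methods at random and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical data: 100 students per method, scored on some standard measurement.
data_rng = random.Random(42)
method_a = [data_rng.gauss(55.0, 10.0) for _ in range(100)]
method_b = [data_rng.gauss(50.0, 10.0) for _ in range(100)]

difference, p_value = permutation_test(method_a, method_b)
print(f"mean difference = {difference:.2f}, p = {p_value:.4f}")
```

The shuffle step only makes sense if students really were randomly assigned to the two methods, which is the requirement the paragraph above insists on.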

The field of happiness studies was created, more or less, by realizing that asking people "On a scale of 1 to 10, how good do you feel right now?" was a measure that statistically validated well against other ideas for measuring happiness.  And this, despite all skepticism, looks like it's actually a pretty useful measure of some things, if you ask 100 people and average the results.

But suppose you wanted to put happier people in positions of power—pay happy people to train other people to be happier, or employ the happiest at a hedge fund?  Then you're going to need some test that's harder to game than just asking someone "How happy are you?"

This question of verification methods good enough to build organizations, is a huge problem at all levels of modern human society.  If you're going to use the SAT to control admissions to elite colleges, then can the SAT be defeated by studying just for the SAT in a way that ends up not correlating to other scholastic potential?  If you give colleges the power to grant degrees, then do they have an incentive not to fail people?  (I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.)  If a hedge fund posts 20% returns, are they really that much better than the indices, or are they selling puts that will blow up in a down market?

If you have a verification method that can be gamed, the whole field adapts to game it, and loses its purpose.  Colleges turn into tests of whether you can endure the classes.  High schools do nothing but teach to statewide tests.  Hedge funds sell puts to boost their returns.

On the other hand—we still manage to teach engineers, even though our organizational verification methods aren't perfect.  So what perfect or imperfect methods could you use for verifying rationality skills, that would be at least a little resistant to gaming?

(Added:  Measurements with high noise can still be used experimentally, if you randomly assign enough subjects to have an expectation of washing out the variance.  But for the organizational purpose of verifying particular individuals, you need low-noise measurements.)
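The arithmetic behind "washing out the variance" can be sketched in a few lines, assuming the noise on each subject is independent and has the same standard deviation (an idealization; the function names and numbers below are illustrative only). Under that assumption the uncertainty in a group average shrinks like one over the square root of the group size, while a single individual's score keeps the full noise.

```python
import math

def standard_error(noise_sd, n_subjects):
    """Standard error of a group mean under independent, equal-variance noise."""
    return noise_sd / math.sqrt(n_subjects)

def subjects_needed(noise_sd, target_se):
    """Smallest group size whose mean has a standard error at or below target_se."""
    return math.ceil((noise_sd / target_se) ** 2)

# Hypothetical noisy measurement: noise of 2 points on a 10-point scale.
print(standard_error(2.0, 1))     # one individual: the full noise remains
print(standard_error(2.0, 100))   # average over 100 randomly assigned subjects
print(subjects_needed(2.0, 0.5))  # group size needed for a standard error of 0.5
```

This is why the same instrument can be good enough for an experiment and still useless for certifying one particular person.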

So I now put to you the question—how do you verify rationality skills?  At any of the three levels?  Brainstorm, I beg you; even a difficult and expensive measurement can become a gold standard to verify other metrics.  Feel free to email me at sentience@pobox.com to suggest any measurements that are better off not being publicly known (though this is of course a major disadvantage of that method).  Stupid ideas can suggest good ideas, so if you can't come up with a good idea, come up with a stupid one.

Reputational, experimental, organizational:

  • Something the masters and schools can do to keep it real (realistically real);
  • Something you can do to measure each of a hundred students;
  • Something you could use as a test even if people have an incentive to game it.

Finding good solutions at each level determines what a whole field of study can be useful for—how much it can hope to accomplish.  This is one of the Big Important Foundational Questions, so—

Think!

(PS:  And ponder on your own before you look at the other comments; we need breadth of coverage here.)

3 Levels of Rationality Verification

Well, you asked for DUMB ideas, so here's mine. It has the advantage that I'm sure no one else will suggest it. This is based on an accidental discovery (so far as I know, unpublished) that one can compare two arbitrary documents for similarity (even if they are in different word-processor formats) by running them both through a recognizer built out of a random state machine and comparing bit masks of all the states traversed. The more common they are, the more states will be traversed in both.

So, let's assume we have a panel of highly rational individuals who are our control group. We generate a random multiple-choice questionnaire consisting of nonsensical questions and answers. Things like:

1) How Green is the Smell of Bacon?

a) 7.5

b) Neon

c) Introspection

d) Larger

You then do a correlation over how your panel of experts chose their answers and see if there is a common pattern. You then score students who take the test based on how similar to the common pattern they are.
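A rough sketch of the scoring step, under a simplifying assumption: instead of a full correlation analysis, each student answer earns the fraction of the panel that picked the same option, averaged over questions. The function names and the toy data are hypothetical, and this is only one way to turn "similarity to the common pattern" into a number.

```python
from collections import Counter

def consensus_key(panel_answers):
    """Build, per question, the share of panelists choosing each option.

    panel_answers: list of answer sheets, one per panelist, each a list of
    choices in question order.
    """
    key = []
    for choices in zip(*panel_answers):
        counts = Counter(choices)
        total = len(choices)
        key.append({choice: n / total for choice, n in counts.items()})
    return key

def consensus_score(student_answers, key):
    """Average share of the panel that agreed with the student's choices."""
    shares = [question.get(answer, 0.0)
              for answer, question in zip(student_answers, key)]
    return sum(shares) / len(key)

# Toy panel of three on a two-question nonsense questionnaire.
panel = [["a", "c"], ["a", "b"], ["a", "c"]]
key = consensus_key(panel)
print(consensus_score(["a", "c"], key))
print(consensus_score(["d", "d"], key))
```

Note that this scoring has the weakness the comment itself points out: anyone who reasons differently from the panel, for better or worse, scores low.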

Assuming this idea works at all, the advantage of this is that it would be extremely difficult to game. The disadvantage would be that it would penalize those who are significantly more rational than the 'norm'. It...

NOT CRAZY ENOUGH! We need EVEN STUPIDER ideas!

(Voted up for being the best try so far, though.)

5MichaelVassar
I think that this resembles the MMPI methodology. http://en.wikipedia.org/wiki/Minnesota_Multiphasic_Personality_Inventory
1[anonymous]
What is the MMPI supposed to test?
1[anonymous]
There are similarities. What I observed when doing an MMPI was that it seemed altogether gameable. I believe I have more than enough knowledge about psychology, including the type of metrics that MMPI uses, to more or less choose whatever result I desired.
1thomblake
I've actually proposed something like this to test for personality type. The main reason it never got implemented is there isn't really a good, workable theory of persistent personality.
0[anonymous]
That scares me! It sounds altogether too much like the famous beauty pageant, with a bit of "guess the teacher's answer" and randomly generated poetry thrown in for good measure. I know I'd be far happier if it was shown to be a really stupid idea. I have a hunch, however, that a correlation of the kind you hypothesize would exist. The part that scares me is that there could well be more than one style of thinking of equal merit, with one being far more common than the other. Naturally the suspicion that I'd end up in the minority and downgraded for it is troublesome. There is more than enough of that sort of bias in schools already! Upvoted for being the right kind of idea, and incidentally my answer to the example question is a) 7.5. The other three make absolutely no sense, while I acknowledge that there is a possibility (though it is improbable) that the way the brain functions could make a quantisation of said greenness at least have some meaning.
2swestrup
When I look at my question there, the only answer that seems appropriate is 'Introspection' as that's at least a step towards an answer.

Occasionally, well-respected community members could say things that are intentionally false, but persuasive and subtle, a la http://www.overcomingbias.com/2008/02/my-favorite-lia.html.

You get points for catching these mistakes. Perhaps you submit your busts privately to some arbiter so others have the same challenge.

Later, the error is revealed and discussed.

This would also have the benefit of causing everyone to read the most-respected members' writings ultra-critically, rather than sitting back and being spoon-fed.

One key thing this idea has is short term feedback. Frequent, rapid feedback is essential for getting good at this kind of thing. (IMO that's why economics is still so useless relative to the other sciences: the experiments take fifty years to run.)

5Jiro
This doesn't work, because people here say controversial things. By definition, controversial means that many people think they are wrong, but they do not think they are wrong themselves. Anyone who finds a mistake might have found one of the intentional mistakes, or might happen to disagree on a controversial issue and believe the community member made a mistake where the community member thinks otherwise. Unless you think that community members are perfectly correct 100% of the time on controversial issues or at least always recognize their own mistakes when pointed out to them (and no human being is like that), the idea will become unworkable. Everyone will have to think "is this an intentional mistake, or is it an unintentional mistake that the community member won't recognize as such, earning me demerits for pointing it out?"
6[anonymous]
There are objective ways of finding out some classes of mistakes. Fallacies are well-defined and most of them can be easily diagnosed. I often do this at Facebook to blow off steam. Even better: the website can accommodate this. It's as easy as adding a "report logical fallacy" button next to each comment. Moderators can award points to all who noticed the correct fallacy. A leaderboard can be put up. It can be made a sport. Another benefit is that those who make mistakes receive detailed feedback. Edit: I'd like to learn why this was downvoted. How might I be wrong?
2DPiepgrass
Nothing makes me want to upvote someone like a downvote-without-comment on a post that seems vaguely reasonable.
3MBlume
I can see the need for anonymity to avoid spoilers, but I think doing the thing publicly has benefits too -- that way there's the risk on the other side of having publicly denounced the Great Teacher when he was speaking truthfully.
4Eliezer Yudkowsky
You could have private points subtracted off and that gives you the same incentive not to make uncertain accusations. Attach confidence levels and take Bayes-score.
3JGWeissman
With the Bayes-score being always negative, I don't see what incentive one would have to submit a mistake report. I think it would be better to test for better than, for example, 90% confidence, by awarding 1 point for a correct report and deducting 9 points for an incorrect report. This achieves the goal of detecting ability to detect bad arguments. Measuring calibration would have to be a separate test.
1jyasskin
Treat not submitting a mistake report as the "I have no idea" claim: that you've assigned a probability of "mistakes/total emails" to this particular email being a mistake.
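The scoring rules discussed in this thread can be sketched side by side. This is an illustration under assumptions, not anyone's actual scoring system: the function names are made up, "Bayes-score" is read here as the logarithmic score, and scoring relative to a base rate is just one way to make "no report" come out as exactly zero.

```python
import math

def fixed_payoff_expected_value(p_correct, reward=1.0, penalty=9.0):
    """Expected points from filing a report under the +1 / -9 rule."""
    return p_correct * reward - (1.0 - p_correct) * penalty

def break_even_confidence(reward=1.0, penalty=9.0):
    """Confidence above which filing a report has positive expected value."""
    return penalty / (reward + penalty)

def log_score(stated_probability, was_mistake):
    """Logarithmic score: log of the probability given to what actually happened."""
    p = stated_probability if was_mistake else 1.0 - stated_probability
    return math.log(p)

def log_score_vs_base_rate(stated_probability, was_mistake, base_rate):
    """Log score relative to always answering with the base rate of mistakes."""
    return (log_score(stated_probability, was_mistake)
            - log_score(base_rate, was_mistake))

print(break_even_confidence())                     # the +1 / -9 rule breaks even at 90%
print(log_score(0.5, True))                        # raw log scores are never positive
print(log_score_vs_base_rate(0.8, True, 0.1))      # confident and right: positive
print(log_score_vs_base_rate(0.8, False, 0.1))     # confident and wrong: negative
```

Under the relative version, staying silent (implicitly claiming the base rate) scores zero, so a report is worth filing only when you expect to beat the base rate.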

For 'hot' political and religious biases, create materials in which apparent advocates of different ideologies or parties are arguing for some particular empirical prediction, e.g. about the relationship between different tax rate changes and economic growth, with some predictions being right and some wrong. The subject then needs to make his or her own prediction about some easily-verifiable but obscure empirical fact related to the argument, e.g. whether a graph of GDP and tax rates matches Norway or Iceland.

Scoring would reflect the degree to which the ideological affiliation in the prompt biased the results. If it was being gamed you might need to add in scoring for accuracy. Challenges would be producing a large enough inventory of test items, keeping them secret, and the need to tailor tests to locally popular ideologies or ideologies of interest.

More surveys that study the relationship between knowledge about verifiable facts and values. What sorts of information do those with different values tend to have, and what are the values of those whose knowledge covers the pet facts of all camps? There is a fair amount of this literature in political science aimed at the electorat...

3Roko
Hot political/religious issues seem like a great way to tempt people into saying/believing irrational things. This is a good idea.
1[anonymous]
Very solid example of how to test for that bias.

People tend to compartmentalize. We need to bear in mind that anything we come up with that involves testing someone when they know they're being tested can only check how rational they can be if they put their mind to it, not how rational they are when they're not being tested.

5Roko
It is possible to test people for one thing, and claim that you are testing them for another thing. E.g. Asch's experiments wouldn't have worked if he had told people the truth about what he was testing for. As long as the person doesn't know they're being tested for rationality, it should be OK. You could test people for ability to make money, ability to get some task done, etc. http://www.overcomingbias.com/2007/12/aschs-conformit.html
4swestrup
I agree. The only solutions to this that I can see are either to not let students know when they are being tested, or to have a system of continual testing.

The key is probably to test someone without letting them know you are testing them. If I ran a martial arts dojo and wanted to make sure my students were really super badass ninjas, I would give them a convincing-looking "test" that included things you would expect to see: strength, speed, form, technique, success in actual matches, etc.

This would have very little weighting in the actual grade, however. The real test would be some sort of surprise fight or fights where the student has no idea that the fight is actually one of the tests. Perhaps he (or she) is followed by the assailant until an opportunity to pick a fight arises.

The main advantage of the surprise test is that it is much harder to game. Imperfect metrics are much more likely to say something meaningful about the student in this surprise situation than if the student knows the test is coming.

When it comes to the rationality dojo, there are numerous normally easy-to-game heuristics that could be used, for example:

  • how susceptible the student is to group-think
  • what they do in some sort of strenuous situation (e.g., do they blow up the Huygens?) The situation must seem real to them.
  • are they willing to b
...
2[anonymous]
See artemis fowl and the butler training.
1[anonymous]
An insurmountable problem?

I think that the most important skill a rationalist can have is the ability to assess the quality of other rationalists, and to participate effectively in team projects. A measurement of individual rationality has to include how well a randomly selected team including that individual performs on team rationality tests.

So, I think that a rationalist 'decathlon' would consist of a variety of competitions between individuals and small teams including math/logic problems, general knowledge tests, cooperative and non-cooperative game theory games, prediction markets, and engineering challenges (egg drops, programming robots to compete in some arena, etc.)

But then there would be a second level, in which individuals and teams would compete in a prediction market in which they observe (by video recording) the deliberations of other teams on first-level problems and bet on their relative performance.

And even a third level, in which individuals observe the deliberations of second-level teams and bet on their performance in that second-level prediction market.

There are a variety of other things that might be interesting to measure - for example, what team sizes perform best, whether individual rationalism and team-participant rationalism are different skills, and whether team performance is best predicted by strongest member, average member, or weakest member.

1TheOtherDave
This is a brilliant idea.

I'm not sure why "teaching to the test" is so disparaged for its effects on the learning process. Obviously that is a different use for tests than evaluation of ability, as is the main goal here.

Studying for the LSAT taught me to feel genuine physical unease when I read a bad argument, then becalm it by the next problem. It's very hard to turn that off when reading the newspaper.

The third stage of my growth as a rationalist was discovering this site. I no longer go through the day thinking of things I read and hear: "Wrong (fallacy), wrong (incorrect premise), wrong (fallacy), true (but irrelevant)." Now it's more like: "Wrong (fallacy), not even wrong (internally inconsistent), wrong (map/territory confusion), wrong (fallacy), not even wrong (argument from definition)."

I propose thinking of ways to hijack the human mental machinery as an alternative to overcoming it, akin to what evolution does.

Hrm... Well, one initial notion I have is along the lines of this: Rationality training should improve how good one can become at other stuff, or at least improve ability to gain skills/etc in other fields.

So, maybe tests could be something along the lines of find various subjects/fields a student is unfamiliar with and basically assign them to "get some knowledge and skill in this field."

How efficiently students can basically bootstrap up into something they're unfamiliar with should vary with their rationality, right? So something like this may be a starting point.

(Yes, I can see a bunch of details that would need to be worked out, but it seems to me that this notion may at least be somewhere to start for developing rationality tests.)

3MichaelVassar
I think Tim Ferriss was going to display this ability as the theme of a TV show.
1[anonymous]
This biases towards fast learners. A different problem.
[-]Emile160

Organize large games/contests where a lot of candidates are locked up in an area, and have a finite time to reach a certain point / find a certain object.

The exact rules would be specially designed each time for that year's challenge, by a group of rationalists and game designers. So the details would vary, but some common themes would be:

  • physical prowess does not come into play (beyond maybe moving around faster, not getting tired as easily etc.)
  • some people would be liars / saboteurs, and not real candidates

For example, the candidates are blindfolded and brought into a large underground circular room, whose only unlocked exits are twenty slides along the edge (so, one-way exit only). The goal is to take the exit that's due north.

Or, the players are dropped in a maze, and each player is given twenty balls with his name written on them. In the maze are tall glass tubes in which the player can drop their balls. The players know that at the end of the games everyone gets points for the balls with his name that are in "good" tubes (from 10 to 1 points, depending on whether his ball is at the bottom or top - only ten balls fit in a tube), and loses points for balls in ...

4Nebu
Voted up if only because this reads like a description for the first reality TV show I would actually want to watch.
4MichaelHoward
Here you go :) (and here's the kids' version)
1Charles Paul
Love this idea. Here is another game: two teams, a red team and a blue team. The blue team plays as computer scientists who are trying to build an AI to help them do something about an asteroid heading towards Earth (or some other existential threat that would justify building an AGI without knowing if it's friendly), but they build it so fast they have no idea if it's friendly. They win if they save humanity. The red team plays as the AI, and gets a point for each paperclip in its future light cone. You would have to have rules like: the AI is contained in a box, the AI must execute all orders given to it by the blue team, etc.
1[anonymous]
Fascinating concept.

Frank Mager, in various books, including "Preparing Instructional Objectives", suggests working backward from evidence that would make you conclude that someone is, e.g. a Bayesian Master Rationalist, to the tests (and instructional objectives) for a course of instruction intended to turn someone into a Bayesian Master Rationalist (or whatever you want to turn them into).

3pjeby
After skimming some of his stuff on Amazon, I bought the whole "Mager Six-Pack" and am eagerly devouring it. I can already tell it's going to make a huge difference in the way I teach mind-hacking. One of the first ones I read, Goal Analysis, is particularly relevant to LW discussions: how to turn "fuzzies" (abstract qualities, adjectives, and adverbs) into concrete, measurable specifications of behavior. One minor catch: goal analysis can't make people magically agree on the True Meaning of a term, it can only expose the things they do or don't agree on... ...which probably makes it an incredibly valuable Rationality Tool in its own right. Anyway, thanks for mentioning Mager's books -- I'd never heard of them before your comment.
2Eliezer Yudkowsky
Example?
7Johnicholas
Telephone operators were supposed to have good "tone of service". So then the education people asked "What does good tone of service mean? What evidence would help you conclude whether an operator has good tone of service?" And drilling down, they found that there was an entire list of behaviors implicit in the phrase "tone of service", like inflection as the operator reads the standardized phrases, such as "I'm sorry". One of the behaviors amused me - no banging - that is, hitting the telephone handset against something, presumably in anger at a frustrating customer. So you can test for "good tone of service" by testing the observable behaviors. If your concept of a Master Rationalist includes an "aura of competence", then probably we can break that down into concrete evidence that would cause you to conclude that someone has an "aura of competence". The concrete items become instructional objectives. If evidence that someone failed a bias or calibration test would cause you to conclude that they're NOT a Master Rationalist, then passing the bias or calibration test can be one of the instructional objectives.
4MichaelHoward
Bearing in mind the human tendency to favor authority over quality given a choice between the two, I think it's important when testing to distinguish between "aura of competence" and ability to achieve useful results, and after testing to connect the former to the latter.
5Johnicholas
Right. EY has mentioned a couple of times that he expects graduates of the hypothetical Rationality Dojo to exude their abilities, like Taking a Level in Badass, or his hedge-fund elites. I want to clarify that I do not agree with this notion, and I suspect that individuals who exude preternatural skills are primarily good at exuding, not at performing. The example was just an example.

Compile a large enough database of historical events that nobody could memorize more than a fraction of it. For the test, choose a few events at random, describe the initial conditions and ask the candidate to predict the outcomes.

2pcm50
This is a good idea. Though I think that the condition that 'nobody could memorize more than a fraction of it' is actually quite hard to meet. E.g. legal training seems analogous, and lawyers seem to be able to remember a lot of examples. If the corpus could be kept secret or ever changing that might help. When I was thinking of something similar, I had a concern about the task length. E.g. will this result only in relatively short or simple tasks?
1[anonymous]
That would actually work.

Carry around a notepad, form probabilistic opinions on lots of little questions that you can find out the answer to soon after, record all the probabilities assigned to correct answers, where applicable add tags like "politics", "project completion", "my social status", "trivia", put into a spreadsheet or something and see if you're miscalibrated globally and for different tags.
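The spreadsheet step might look something like the sketch below. It assumes a slightly different record format than the comment describes: each row stores a tag, the probability you gave to a prediction, and whether it came true. The function name, the reserved "all" tag for the global summary, and the choice of the Brier score are all illustrative assumptions.

```python
from collections import defaultdict

def calibration_by_tag(records):
    """Summarize calibration globally and per tag.

    records: iterable of (tag, stated_probability, came_true).
    The tag "all" is reserved here for the global summary.
    """
    groups = defaultdict(list)
    for tag, prob, came_true in records:
        groups[tag].append((prob, came_true))
        groups["all"].append((prob, came_true))

    report = {}
    for tag, rows in groups.items():
        n = len(rows)
        mean_confidence = sum(p for p, _ in rows) / n
        hit_rate = sum(1 for _, ok in rows if ok) / n
        # Brier score: mean squared gap between stated probability and outcome.
        brier = sum((p - (1.0 if ok else 0.0)) ** 2 for p, ok in rows) / n
        report[tag] = {
            "n": n,
            "mean_confidence": mean_confidence,
            "hit_rate": hit_rate,
            "brier": brier,
        }
    return report

# Hypothetical notepad entries.
notes = [
    ("trivia", 0.9, True),
    ("trivia", 0.9, False),
    ("politics", 0.6, True),
    ("project completion", 0.8, False),
]
for tag, summary in calibration_by_tag(notes).items():
    print(tag, summary)
```

A tag whose mean confidence sits well above its hit rate suggests overconfidence in that area; with only a handful of entries per tag, though, the gap is mostly noise.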

1Fhyve
This can get gamed pretty easily though, by selecting things that you have more previous knowledge of, or know the actual probabilities of, over things that you know you are more likely to get wrong... (realization) Except that could be exactly the point: the ability to identify when you are likely to assign accurate probabilities and when you aren't. However, there is still the problem of just not reporting certain things to boost your scores. There could be something that takes into account or measures the ability to identify when you are likely to be wrong.
3ejstheman
If you break the habit of claiming confidence you don't really have, to improve your score, then it seems the exercise has had the intended effect, no?
0datadataeverywhere
Or: guess confidence intervals. 95% might not be as useful as 50%; test yourself not only on how often you are under or over, but make sure that 50% (or 5%) of the time it is outside the range you guessed. If you try to guess things that you're really sure about, this forces you to quantify how sure you are about that, and makes those guesses no more or less useful than those that you are much less sure about.
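Checking interval guesses is a small calculation. The sketch below assumes each guess is recorded as a low bound, a high bound, and the true value found out later; the function names and the sample numbers are hypothetical.

```python
def interval_coverage(guesses):
    """Fraction of guesses whose true value fell inside the stated interval.

    guesses: iterable of (low, high, true_value).
    """
    guesses = list(guesses)
    inside = sum(1 for low, high, truth in guesses if low <= truth <= high)
    return inside / len(guesses)

def coverage_gap(guesses, nominal):
    """Observed coverage minus the nominal level (e.g. 0.5 for 50% intervals).

    Negative means the intervals were too narrow (overconfident);
    positive means they were too wide (underconfident).
    """
    return interval_coverage(guesses) - nominal

# Hypothetical 50% intervals and the values that turned out to be true.
fifty_percent_guesses = [
    (0, 10, 5),
    (0, 10, 11),
    (1, 2, 1.5),
    (3, 4, 0),
]
print(interval_coverage(fifty_percent_guesses))
print(coverage_gap(fifty_percent_guesses, 0.5))
```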
1[anonymous]
How do I tell?

Here's a stupid idea: Evaluate people by auditing their domiciles. I've read (and from personal experience, I believe it) that you get really solid insight into someone's personal qualities by inspecting their home, as good as interviewing them and all of their friends and family. (I googled a bit, but I can't find the source.)

Anyway, it can probably be gamed.

4[anonymous]
deleted
2David_Gerard
Heh. I have recently applied this to our house, which is remarkably better after just a few months, and visitors remark upon it. Doing so is the origin of this rant, which is made of hard-won anecdotal experience.
1[anonymous]
That's a test women do. I game it.
[-]MBlume110

Here's an immoral one: crack a rationalist

Most, if not all, human minds are vulnerable to hacking, eg by cults, religions, pseudoscience, etc. The minds of rationalists should be harder to hack than others.

Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way.

The myths carried by the missionaries should be invented for the challenge so everyone can agree that they are false, but should, of course, be significantly more plausible than today's religions.

Make a copy of a (would-be) rationalist, subject the copy to significant emotional stress, and then send missionaries his way.

Moral qualms aside, we should probably have a back-up plan just in case we don't solve human uploading before we want to start testing.

3JGWeissman
"crack a rationalist" made me think of the AI-Box Experiment (http://yudkowsky.net/singularity/aibox). Maybe a rationality test could be something like how long the subject lasts as the gatekeeper before letting the AI out.
3gwern
What ciphergoth said. Also, we can't derive an 'ought' from an 'is' - we don't actually know whether letting the AI out is the right thing to do (unless the contest had a stipulation that the AI was evil and the box keeper knew it, which I don't remember being the case). Perhaps the rational thing is to let the AI out! Further, this could also just be a test of stubbornness or patience. Which aren't neither of them rationality. But good try anyway.
3JGWeissman
For the first objection, that the AI Box experiment has too many unknowns, let us instead construct an argument based on psychological tricks for any bad conclusion to try on the subject. For the second objection, that this tests stubbornness rather than rationality, use a sequence of tests, some using tricks to argue for false conclusions, and some using Bayesian evidence for a good conclusion. The score should reward being convinced when, and only when, the subject should be convinced. Stubbornness can only meet half this requirement. The task of compiling arguments of both types, which would not be readily available to the subject ahead of time, remains.
2Paul Crowley
The means by which EY persuades people to let the AI out of the box are secret. We shouldn't draw any conclusions from that experiment except that it is plausible to think a boxed AI could talk its way out of the box.
1[anonymous]
Brilliant.
1Roko
The same complaint applies to this comment as to the wife-cheating test. It may actually (under certain really bad circumstances) be rational (in the "winning" sense) to believe in religion.
[-]MBlume480

I'll be honest -- my life has taken a sharp downturn since I deconverted. My theist girlfriend, with whom I was very much in love, couldn't deal with this change in me, and after six months of painful vacillation, she left me for a co-worker. That was another six months ago, and I have been heartbroken, miserable, unfocused, and extremely ineffective since.

Perhaps this is an example of the valley of bad rationality of which PhilGoetz spoke, but I still hold my current situation higher in my preference ranking than happiness with false beliefs.

You have my sympathy and my praise.

If anyone's unusually good at deconversions, there might be a market for deconversion attempts aimed at the friends and family of atheists.

[-]MBlume480

Thank you. You taught me (a large chunk of) everything I know, so that means a lot.

Honestly, thinking back, I suspect the best opportunity I ever had to deconvert her was when I myself did not yet identify as atheist -- when the crisis of faith was still in full swing. I'd have been perceived as sharing my doubts, rather than as "attacking" her with arguments.

Of course, back then I feared atheism -- I saw it as something terrible happening to me, that I should avoid doing to her. If I'd done a better job of leaving a line of retreat, I might have made better choices -- I might have shared each doubt as it occurred to me, instead of winding up 30 inferential steps removed from the woman I loved.

(And no, explaining that there is an inferential distance between you greater than is likely to be encountered in the ancestral environment really does not help in a fight)

I've been thinking lately of trying to write something addressed specifically to those beginning to question their religions. Life doesn't come with save points, but standing at the spot you went wrong, calling out advice to passers-by seems like the next best thing.

5DSimon
That last sentence is just ludicrously dense with both important advice and good tips for game design. It's excellent, is what I'm saying, and thanks for writing it. :-)
0MBlume
Thanks XD
1[anonymous]
Please do.
5MartinB
Isn't there already one to get people out of not widely accepted cults? The market might explode once public perception changes.

My empathies: that happened to me about 6 years ago (though thankfully without as much visible vacillation).

My sister, who had some Cognitive Behaviour Therapy training, reminded me that relationships are forming and breaking all the time, and given I wasn't unattractive and hadn't retreated into monastic seclusion, it wasn't rational to think I'd be alone for the rest of my life (she turned out to be right). That was helpful at the times when my feelings hadn't completely got the better of me. I suppose we can be haunted by stuff that is real.

4MBlume
Thank you. I've been struggling with that haunting myself. I think part of the problem is that when you're in a relationship long enough, you wind up with a term in your utility function for that person. And even if you know you could wind up with someone objectively better, better suited, the outcome doesn't seem like good news to your mind. A job for self-modification, I suppose, even if it's the slow, manual kind. Very glad to hear she was right =)

There are two problems with measuring rationality, one of which is difficult but manageable, the other of which might be insurmountable. The first problem is that most conceivable tests of rationality require using information from other fields (such as finance, physics, or psychology), such that you can gain a considerable advantage on the test by studying things from that field which don't actually make you more rational. This can be solved with sufficient cleverness.

The second problem is that how rational someone is depends on how well they maintain it under stress. Pressure, fatigue, emotionally charged situations, alcohol, and/or deliberate manipulation, can make the best rationalists act completely insane. (About a year ago, I went on a reality television show, which was in a way like a series of rationality tests. I didn't do all that well, rationality-wise, but some people who should have known better did dramatically worse.)

1patrissimo
Yes, the maintaining under stress aspect is key. This is a large part of why poker is hard - it has many characteristics which maximize stress by triggering bad primal instincts.
0handoflixue
This suggests a very easy way of inducing conditions appropriate to a more thorough testing of rationality. Any student who insists on leaving (which I think you'd be ethically obliged to allow for) would receive a failing grade. See how well the rest manage to be rational despite the circumstances. This one is probably also eminently doable, especially in a casual setting. I'm sure enough people would object to "Binge drinking night" that you couldn't make it a course requirement in modern-day US, alas. (There's possibly also more ideal drugs than alcohol for these purposes - at a minimum, given individual reactions and tolerances vary, using a variety of pharmaceuticals would probably reduce noise some)
1beoShaffer
I'm not sure how well this would carry over to mental stuff, but I know that some martial arts schools and many police and military organizations use physical exercise to create fatigue and/or adrenaline highs during training.

Give the students sodium pentothal and ask if they're one of the top 50% of rationalists in their school. However many out of 200 say 'no', that's the school's percentage score. Schools scoring over 100% are thrown out for cheating.
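
For concreteness, here is a minimal Python sketch of the scoring rule described above. The function names are made up for illustration, and the probabilistic variant is only one reading of the "sum of probabilities" suggestion made in a reply further down; neither is anyone's actual test.

```python
# Hypothetical sketch of the "top 50%" test; all names are illustrative.

def school_score(says_top_half):
    """Percentage score from yes/no answers.

    says_top_half: list of bools, True if the student claims to be in the
    top 50% of rationalists at the school. With 200 students the score is
    simply the number of 'no' answers, as in the comment above.
    """
    n = len(says_top_half)
    if n == 0:
        raise ValueError("need at least one student")
    no_count = sum(1 for answer in says_top_half if not answer)
    # Half the students really are outside the top 50%, so n/2 'no'
    # answers corresponds to a score of 100%.
    return 100.0 * no_count / (n / 2)


def probabilistic_score(prob_top_half):
    """Variant: each student reports P(I am in the top 50%).

    The summed probabilities stand in for the count of 'yes' answers.
    """
    n = len(prob_top_half)
    if n == 0:
        raise ValueError("need at least one student")
    expected_no = n - sum(prob_top_half)
    return 100.0 * expected_no / (n / 2)


def is_disqualified(score):
    """Schools scoring over 100% are thrown out for cheating."""
    return score > 100.0
```

A school where 80 of 200 students say 'no' scores 80%; one where 120 say 'no' scores 120% and is thrown out.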

3JGWeissman
A school that reports to each student their class ranking easily games this test. The test could even favor schools that don't teach students enough to question an arbitrary class rank. Also, this doesn't consider the possibility that students can be good rationalists, but don't interact with enough of the other students to make a good assessment of their relative strengths.
5Eliezer Yudkowsky
Good rationalists, taken as a group, shouldn't be systematically optimistic.
[-]pjeby110

Good rationalists, taken as a group, shouldn't be systematically optimistic.

They should be if they want to win in practice, as opposed to just getting theoretically-correct answers. See, e.g., the studies referenced in Seligman's "Learned Optimism", that show optimists consistently out-perform pessimists (i.e., realists) in a wide variety of fields and endeavors.

(Of course, Seligman's definition of optimism may be different from yours.)

1JGWeissman
Perhaps we can still test for this systematic optimism, while filtering for the noise I objected to, by instead of asking a "yes" or "no" question, asking for the probability that the student is in the top 50%. Treat the sum of these probabilities as the count of "yes" answers in the original version. Then a rational student should be able to account for his ignorance of other students in his answer.
0jschulter
This is even easier to game: assuming the school has any merit, any individual you ask should have good incentive to simply say "50%" guaranteeing a perfect score. The very first time you used the test it might be okay, but only if nobody knew that the school's reputation was at stake.
1[anonymous]
haha
[-]MBlume100

Ask a thousand married rationalists of a given school to estimate the probability that their spouses have cheated on them. Confidentially ask their spouses if they have. Measure group calibration.

ETA: This applies to any potentially painful, but verifiable question. Ask them to draw a probability distribution over their date of death, or the longevity of their marriages. Estimate the probability of various kinds of cancer appearing over the next (5,10,15) years, etc. etc.
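
One way "measure group calibration" could be made concrete, as a rough sketch: pool everyone's stated probabilities, bucket them, and compare each bucket's average forecast with how often the event actually happened. The Brier score and the ten-bucket layout below are assumptions chosen for illustration, not something specified in the comment.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Lower is better; always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally many forecasts and outcomes")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability buckets.

    Returns a list of (mean forecast, observed frequency, count) tuples,
    one per non-empty bucket, ordered from low to high probability.
    A well-calibrated group has mean forecast close to observed frequency
    in every bucket.
    """
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bucket
        buckets[index].append((p, o))
    table = []
    for index in sorted(buckets):
        pairs = buckets[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_forecast, observed, len(pairs)))
    return table
```

The same two functions would apply unchanged to the other painful-but-verifiable questions, since they only need a stated probability and an eventual yes/no outcome per person.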

5Roko
I've thought of a problem with this: if rationality is about /Winning/, then it may be rational to not consider the hypothesis that your wife cheats on you. You may better serve your preferences if you remain in blissful ignorance. Also, human relationships have a very Newcomb-like feel to them, because other humans are very good at ascertaining your true beliefs. If you are entertaining the hypothesis seriously, your wife will probably detect it. So in this case winning and having a map that accurately reflects the territory may be anti-aligned.
3MBlume
There is a difference between wanting not to be a cuckold and wanting not to believe that you are a cuckold. I want the former. Presumably, if you are entertaining the hypothesis -- at least beyond a societal average, or some such -- there is a root problem already in play. But yes, this does have some self-fulfilling aspects which make it rather hard to model well.
2Roko
For me the biggest problem is that many people's preferences will be: (a) wanting to not be cheated on AND (b) wanting to trust the other person so much that the possibility doesn't even arise. i.e. your preferences in this area are a function of your own mind-state.
2MBlume
On introspection, this does agree with my preferences, yes. That does complicate things -- I'm not sure how to resolve this one. I think we are using the word "rationalist" to cover too many meanings. One highly socially useful meaning for the word would be "person who can be reliably expected to speak the truth". Whatever you choose to call those, it'd certainly be useful to have some around for any society you'd like to build. We would want to have some tests to identify them.
2swestrup
You'd have to define 'cheated on'. A fair number of the most rational folks I know live in non-traditional marriage arrangements.
4MBlume
This is entirely true. We're going for emotional effect, so on that test, I'd keep it to the self-identified monogamists
1[anonymous]
Perhaps because they realise the real probability of cheating.

(I consider it drop-dead obvious that the task of verifying acquired skills and hence the power to grant degrees should be separated from the institutions that do the teaching, but let's not go into that.)

Was/are there any organizations that are just dedicated to verifying rationality skills? CFAR tried to do both IIRC. Seems pretty bad if there haven't been any attempts at this even.

CFAR tried to do both IIRC.

According to me (who worked at CFAR for 5 years), CFAR did approximately zero rationality verification whatsoever.

Indeed, while that would be crucial to the kind of experimental rationality development that's described in the Craft and the Community, it isn't and wasn't a natural component of CFAR's functional strategy, which was something more like rationality community-building and culture-building.

[I hope to write more about what CFAR did and why, and how it differed from the sort of thing outlined in the Craft and the Community, sometime.]

1John Steidley
I'm currently one of the four members of the core team at CFAR (though the newest addition by far). I also co-ran the Prague Workshop Series in the fall of 2022. I've been significantly involved with CFAR since its most recent instructor training program in 2019. I second what Eli Tyre says here. The closest thing to "rationality verification" that CFAR did in my experience was the 2019 instructor training program, which was careful to point out it wasn't verifying rationality broadly, just certifying the ability to teach one specific class.

Use small-scale, limited-term betting markets with play money.

Put the group of people you want to rank relative to each other into a room - without internet access. Everyone starts with 0 points. People are ranked on how many points they have at the end of the test.

Participants make bets (for points) with each other. There's a time limit for settling those debts; all bets made have to be specified in a way that clearly determines the winner within a fixed period after the end of the test. Of course, bets that can be settled immediately (e.g. on current tri...
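
The bookkeeping for such a test is simple enough to sketch. The Python below is a toy ledger under several assumptions that are not in the comment: bets are even-odds, the proposer always bets that the claim is true, and balances may go negative because everyone starts at 0. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    proposer: str  # bets that the claim is true (assumption)
    taker: str     # bets that the claim is false
    claim: str     # must be resolvable within the settlement period
    stake: int     # points the loser pays the winner (even odds assumed)


class PlayMoneyMarket:
    def __init__(self, participants):
        # Everyone starts with 0 points, as in the comment.
        self.points = {name: 0 for name in participants}
        self.open_bets = []

    def place_bet(self, proposer, taker, claim, stake):
        if proposer not in self.points or taker not in self.points:
            raise ValueError("both sides must be participants")
        if proposer == taker:
            raise ValueError("cannot bet against yourself")
        if stake <= 0:
            raise ValueError("stake must be positive")
        bet = Bet(proposer, taker, claim, stake)
        self.open_bets.append(bet)
        return bet

    def settle(self, bet, claim_was_true):
        """Move the stake from loser to winner and close the bet."""
        self.open_bets.remove(bet)
        if claim_was_true:
            winner, loser = bet.proposer, bet.taker
        else:
            winner, loser = bet.taker, bet.proposer
        self.points[winner] += bet.stake
        self.points[loser] -= bet.stake

    def ranking(self):
        """Participants ordered by points, highest first."""
        return sorted(self.points.items(), key=lambda item: item[1], reverse=True)
```

A real version would also need odds, a rule for disputed claims, and a deadline check for the settlement period; those are left out here.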

1steven0461
Good idea. It could work online if there's enough trust between participants.
1Sebastian_Hagen
As an addendum, I think the whole thing could still work pretty well even if everyone is explicitly allowed to use the web (or any other data store) for research. Bets that can be settled with immediately available information won't be very useful in that context, of course; but you could still bet on near future events. Speed research would be a valuable skill in this variant. Nevertheless, if you have any significant domain specific knowledge useful for making a short-term prediction, that should give you an advantage over someone speed-researching the topic before deciding if they want to make a specific bet on it against you. The real problem is that access to the internet (or any nontrivial subset) also allows you to do realtime communication with other humans, so you might convince/hire a master rationalist to offer you advice during the test, which would be an extremely effective way to cheat.
0rysade
A fairly simple Windows application could nearly eliminate the problem of research during the test - if it were timed. Each round being timed would allow little time to bypass the lockdowns that can be imposed through a Windows API. Each time the test is given, a new version of the test software would be released. Even the fastest hacker would be locked into taking the test!

Well, there's always the idea of using fMRI scans to determine if someone is thinking in 'rational' patterns. You stick them under the machine and give them a test. You ignore the results of the test, but score the student on what parts of their brains light up.

[-]Roko70

Clearly real life achievement correlates well with rationality, by definition. So an impractical but "gold standard guaranteed" test of rationality would be to wait until the person in question got to the age of, say, 50, and check to see whether they had made lots of money, or achieved other obvious life goals (fame, for example).

A more specific good test of rationality is the world of startups. Other than the OB/LW community, the entrepreneurial world is the closest to perfect rationality I have found. You could test someone in a month or so b...

2[anonymous]
Not by definition.

I don't see what I thought were the obvious answers, so here they are. The foundations are elsewhere on the site, but they seemed missing from this list.

Reputational: Expect Bayesian masters to participate in other scientific fields. People who make more discoveries in other fields get more street cred among rationalists, especially when they can explain how rationalism helped them make the discoveries. Obviously, this is a long-term process that doesn't lend itself to improving the art quickly.

Experimental: This one's a two-step process. First, ask a larg...

-1[anonymous]
Note that for some of them, leaving the career track altogether might be the rational choice.

"Piggyback" on other tests: ask people taking part in various tests (standardized exams, sport competitions, driving lessons, programming contests, art exhibitions - whatever) their chances of success (or their probability distribution over the range of results).

The other items should themselves be important enough, so it would fit well with a university curriculum, so that it can be "automated" for a lot of things. The way of asking for predictions should be made so as to maximize bad predictions: for example the students are asked to give...

[-]haig60

There is a recent trend of 'serious games' which use video games to teach and train people in various capacities, including military, health care, management, as well as the traditional schooling. I see no reason why this couldn't be applied to rationality training.

I always liked adventure style games as a kid, such as King's Quest or Myst, and wondered why they aren't around any more. They seemed to be testing rationality in that you would need to guide the character through many interconnected puzzles while figuring out the model of the world and how b...

3steven0461
Google "interactive fiction".
2rysade
I just finished playing a side-scrolling game called Closure (http://www.closuregame.com) that has some qualities of Myst, et al. I think that you've got a good idea here, but a problem could arise from the 'death penalty' that most games impose. Typically, you just restart the 'mission.' Games that operate like that don't provide quite enough incentive to pull out your whole intellect. If the player knew ahead of time that a single failure meant permanent loss, they would be more apt to give the game effort enough to have their rationality tested accurately.
0handoflixue
That would be the RogueLike genre, of which NetHack is a pretty good example of "painful trial and error to learn how the world works". Most successful players just go online and read the spoilers, and I'd argue that this is the more rational approach - it's irrational to go out and pay the price of failure when someone else has already done that for you, and you can learn from them. Besides, most people don't find that sort of trial and error game play fun, which I think is a fairly important consideration if you're trying to teach people.
1[anonymous]
Good idea. What details would you be able to convey?

I'm not sure if this has already been said, but does the "biases" literature not already contain a lot of perfectly good (although probably overly game-able) rationality tests? Just pick an experiment at random from Tversky and Kahneman and see how well the people in the school do.

Of course, there is a problem of people learning how to do some of these tests, but I'm pretty sure there are some that could be reworked so that they're pretty damned hard to pass even if you're well-acquainted with the literature. I'm thinking particularly those wher...

2zaph
Shouldn't the rationality school suggested by Eliezer, though, be able to train someone to be able to do well on these tests, by essentially becoming very familiar with the literature? Just devil's advocating against your devil's advocation; it seems like this would actually be pretty ideal, as you have scientifically benchmarked tests that show what let's say "naive" individuals think when encountering these problems, from where you could then see progress from the "trained" rationalists. The problem with gaming this system would be with people who are studying rationality but plan to subvert it at some point; the rationalist community would need to have frequent re-certifications so that rationalists don't rest on their laurels and rely on status to convey an inferred rationality of the decision.
5Eliezer Yudkowsky
The problem is if they do well on written questions in classes but no better than average at applying the same knowledge to real life.
2bogdanb
This is a problem with “class tests” of anything, of course. I've thought (more than five minutes) on your post, but I didn't come up with much specifically about rationality testing. (Except for “automatically build arbitrary but coherent «worlds», let students model them and then check how well their model fits «reality» afterwards”, which is an obvious application of the definition, and has been suggested already several times.) I've come up with a few thoughts on testing in general: 1) As you say, cheap-but-game-able tests are often useful; we do have useful universities despite the problem of Us awarding diplomas to their own students. I think this is more than just “works well enough”, in some cases it's actually useful: (a) Having good tests (e.g., by a third party) requires defining well in advance exactly what you're testing. But in many cases it can be useful if a school experiments with what it teaches (and even why), and the only test needed is internal. (b) In many (most?) cases, you can't really test some ability until you really try using it. There are plausible cases where a quick-and-dirty (but cheap) test (e.g. university diplomas) is needed only to pre-select people (i.e., weed out most incompetents), and then get to real testing doing actual work (e.g., hiring interviews and tests, then probation period). If you make the initial test «better» (e.g., harder to game) but more expensive you may be actually losing if it's not «better» in the sense of accurate for whatever you need people to be good in. OK, now I'm getting to what you're saying about doing good in class but bad in real life. It seems an obvious solution that you should actually be doing the testing in real life: first weed out the bad as well as you can with an approximate test (how good you do on this tests your map against reality), then “hire” (whatever that means in the context) people who look promising, make them do real work, and evaluate them there. You don't h
4bogdanb
Oh, and another thing that seems obvious: change tests often enough that they can't be gamed. This is of course hard and expensive, which is why it isn't done very often.
0rysade
I had a similar idea, but I'm still not sure about it. Succeeding in Real Life does seem like a good measure, to a point. How could one gauge one's success in real life, though? Through yearly income, or net worth? What about happiness or satisfaction?
2thomblake
You have to admit that's an empirical question, though. It could be that getting the competence to do well on rationality tests requires the same skill as applying the same knowledge to real life. There are some areas where 'fake it till you make it' works, and there are some things you can't pretend to do without actually succeeding in doing the thing.
1[anonymous]
Test for real life? Ouch.

(haven't looked through comments, so this may have been suggested many times over)

In a college-level rationality course, it would be most appropriate for a portion of the grade to be determined by an artificial economy. That is, set up a currency and a (relatively even) starting distribution, add (probabilistic) opportunities for investment (perhaps linked to other important parts of the course) and, most importantly, make defection possible, anonymous and easy. Make it, as much as possible, like a vast array of one-shot (or known number of iterations) P...

1Will_Newsome
What's the starting rationality level of the students? Traditional rationality level or post-Sequences level?
1orthonormal
I'm assuming an introductory type of class, for students with some scientific background but no rationality training. (Where on earth would you find a college class full of post-Sequences people?)

I'm tempted to say "have them play poker", except it uses lots of domain-specific knowledge as well as general rationality. Perhaps if you could generate random games from a large enough space that people don't build up game-specific skills, and the games just end up testing general rationality? While poker-like games don't test all aspects of rationality, there are some things like "ability to keep making good decisions when frustrated / bored / angry" that these games test very well.

I think people would develop skill at the whole class of games...but at the same time, they would be improving their rationality.

[-][anonymous]50

Maybe there is a simple thing, which rational people can't do - always get wrong.

Some not very good examples could be:

Skipping with closed eyes.

Telling a lie to a stranger without it being discovered

Saying - "Ooops, I'm wrong," quickly enough

Going to church and sitting thru' a whole sermon without getting very very upset

Multi-tasking

Irony

Understanding metaphors metaphorically.......

7Rings_of_Saturn
Yeah... I can't think of any good actual examples either, but maybe we should be trying to falsify rationality, rather than verify it.
3infotropism
I don't know if any of those particular suggestions would work, but the general idea is interesting, no one else suggested testing a negative correlate of rationality I think.
1[anonymous]
Huh? Those are mostly independent of rationality.
[-]Roko50

Another key feature of [edit] group rationality is the ability to not be swayed by what the social group thinks.

There are simple experiments (though I cannot think of the relevant keywords) where a test subject is put in a room full of confederates, all of whom estimate one line segment to be longer than another when the two lines are in fact the same length.

EDIT: Conforming to the group opinion (on average) increases the probability that you are right, thus improving individual truth-tracking. But adding more conformers to the LW community just screws i...

2Mike Bishop
You're thinking of Asch's experiments. Apparently, they are widely misrepresented: http://webpage.pace.edu/yrafferty/Yvonne/AschConformityStudy.pdf See also: http://www.hss.caltech.edu/~jkg/Conformity.pdf (I don't remember where I found these... possibly through OB)
4Roko
You are the proud recipient of a gold-plated uniform distribution on a finite set. Congrats. Since my comment has been downvoted to 0, I assume that the LW community likes people who go along with the group opinion even when they know it is wrong? Perhaps people are unsatisfied with this as a rationality test because they think that the test should focus on getting as close to the truth as possible (in which case conforming is good in most cases for most people) rather than adding value to a rationalist community (in which case conforming just because everyone else does is actively hurting the community). Also, having skimmed the pace.edu link, I am unconvinced that Asch's results are being misinterpreted, at least by me. Asch found that, in the situation of overwhelming evidence, only 25% of subjects could be trusted to consistently call things the way they really were, i.e. 25% of the subjects pass what I would call the absolute minimum standard of rationality over social conformity. Note that Carl's link to the OB article gives us a more nuanced version of this debate, which I recommend. "Paul Crowley reminds me to note that when subjects can respond in a way that will not be seen by the group, conformity also drops, which also argues against an Aumann interpretation."
3MichaelHoward
Hasty generalization/Belief in the law of small numbers
1Roko
yeah, OK, it's only 1 person's opinion, I'll wait and see what happens when more time passes and more people get the chance to vote. In defense of my interpretation... few comments get downvoted to zero, so even a small amount of time at zero is fairly significant evidence that people don't like what you're saying.
1MichaelHoward
...and here's the video (the one in the OB link is dead).
1CarlShulman
http://www.overcomingbias.com/2007/12/aschs-conformit.html

Reputational: D&D.Sci.

Experimental: D&D.Sci, with a consistent limit on time & resources used.

Organizational: D&D.Sci, with a consistent limit on time & resources used, using freshly-baked scenarios you know no-one has ever played before.

Limitations:

  • Takes several hours to play most scenarios.
  • Requires generic coding/spreadsheeting/data-science-ing skills in addition to Rationality; people who are good at those skills get an unfair(?) advantage.
  • Getting familiar with the genre gives an unfair(!) advantage.

Misc. addl. reflections on the top...

Let's see...

  • Prediction contests are an obvious one.
  • Also, perhaps, having people compete at newly designed games, so that everyone has the same amount of time to learn the rules and how to win, given the rules.
  • Perhaps we could design puzzles that intentionally have places where one would make a mistake, error, or wrong choice, and such errors are visible (to an observer who knows the puzzle) when made.
[-][anonymous]40

deleted

6MichaelVassar
I rate fairly poorly by these metrics. That makes me suspect that people like me also do. I see that this comment has been poorly rated and hope that people haven't rated it poorly for being unflattering. If you have done this, please rate it back up, OK.
3John_Maxwell
I'm pretty sure Rational Man never buys a book he can borrow for free from the local library.
3MBlume
I certainly don't mean to refer to myself as a candidate for Rational Man, but I do like owning books. Especially textbooks, I would not want to go down to the library every time I wanted to go through my copy of Sakurai. But even old favorite novels, it's good to have them on the shelf, ready to throw in a saddlebag at a moment's notice before a long train ride.
2beriukay
I know of some other stupid tests for rationality, borrowed happily from Invader Zim. 1. Absorbency 2. Electrical Conductivity 3. Something involving a beaver and a toy taxi. On a less stupid note: Reputationally, I have an explicit agreement with one of my friends that we fact check each other. This was actually a one-way fact checking until fairly recently when he asked me why I didn't call him on something he later realized was total bullcrap. Note, this works best if you actually have a good memory and aren't pickling your brain with alcohol. It also seems to help check the mindkilling effects of disagreement. A long time ago, I was reading about critical thinking, and was presented a relatively short list of questions to try and use to stimulate critical thought. Questions of this nature could be used in some form of standardized test; or could be used to build a portfolio of rationale behind opinions on all manner of things, which could be graded by peers or instructors (preferably ones who also aspire to rationality, and disagree). I suppose the portfolio would be more organizational than experimental, and almost as easy to game as cheating on essays. But those were my main thoughts before reading the cool ideas other people came up with. In case you're interested, this was the list as I transcribed it: Oh, and after reading the Logic of Failure, maybe running simulations like they did with the Sim City-like vibe, or the optimizing bug population or the refrigeration tests could be instructive. Even after learning about them, (especially the city planning and the African tribe) they may be sufficiently complicated to be of experimental or organizational value. On the other hand, they may turn out to be just as useless as chess for testing rationality if success strategies are posted and shared. Maybe some of the sims could have randomly assigned (Kirk resistant) Kobayashi Maru modes, but then I don't see how a predetermined loss would be very instruct

Give them a motivation that is higher than the drive to game the test. I'm an immortalist. I don't want to die. I could deceive myself and others in many ways about my skills, purposes, beliefs, but in the end I can't do that at the expense of my chances of not dying. The idea is to find a similarly important purpose: something that might even be gamed, but for which gaming means you lose. Some real-life test.

Maybe, measuring someone's capability to win. I have often wondered if being rational correlates with being successful in society. I can't be sure, though it see...

[-]Roko40

Send rationalists to do consulting work where real money is involved, for example techdirt:

http://www.techdirt.com/

The Techdirt group blog uses a proven economic framework to analyze and offer insight into news stories about changes in government policy, technology and legal issues that affect companies’ ability to innovate and grow.

Here you basically get paid for good insights. A "team" of rationalists could be sent in to dominate this particular arena, thereby validating the technique. Basically any online arena where real money can be made is fair game. Trading in Second Life, for example.

4Johnicholas
The feature of "profitable in the real world" is very valuable. Keeps the test calibrated to what we're interested in measuring. Real-money, real-world prediction markets also have this feature; I wonder what other examples exist.