You query whether Debate favours short arguments over long arguments because "the weakest part of an argument is a min function of the number of argument steps."
This is a feature, not a bug.
It's true that long arguments are more debatable than short arguments because they contain more inferential steps and each step is debatable.
But additionally, long arguments are less likely to be sound than short arguments because they contain more inferential steps and each step is less than 100% certain.
So "more debatable" still tracks "less likely to be sound".
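To make the arithmetic concrete (illustrative numbers of my own, not from the original exchange): if an argument has $n$ independent steps and step $i$ is sound with probability $p_i$, then

$$\Pr(\text{argument sound}) = \prod_{i=1}^{n} p_i \;\le\; \min_i p_i,$$

so soundness decays with length: ten steps at $p_i = 0.95$ each leave only $0.95^{10} \approx 0.60$ probability that the whole argument is sound.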
"An efficient Debate argument is not a guarantee of truth-tracking, if the judge is biased."
Correct. The efficient market hypothesis for Debate basically corresponds to coherent probability assignments. Cf. the Garrabrant Inductor.
The precision-recall tradeoff definitely varies from one task to another. I split tasks into "precision-maxxing" (where false-positives are costlier than false-negatives) and "recall-maxxing" (where false-negatives are costlier than false-positives).
I disagree with your estimate of the relative costs in history and in medical research. The truth is that academia does surprisingly well at separating the good from the bad.
Suppose I select two medical papers at random — one from the set of good medical papers, and one from the set of crap medical papers. If a wizard offered to permanently delete both papers from reality, that would rarely be a good deal because the benefit of deleting the crap paper is negligible...
Moreover, even if the post shouldn't have been published with hindsight, that does not entail the post shouldn't have been published without hindsight.
You are correct that precision is (in general) higher than the threshold. So if Alice publishes anything with at least a 10% likelihood of being good, then more than 10% of her published poems will be good. Whereas if Alice aims for a precision of 10%, her promising threshold will be less than 10%.
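To illustrate why precision exceeds the threshold, here is a minimal simulation sketch, assuming a hypothetical uniform distribution of poem quality (my assumption, not the article's graphs):

```python
import random

# Sketch: each poem has a "goodness probability" p drawn uniformly
# from [0, 1]; Alice publishes a poem iff p >= tau. Precision is the
# fraction of *published* poems that turn out good, i.e. E[p | p >= tau],
# which is necessarily above tau itself.
random.seed(0)
tau = 0.10  # promising threshold (hypothetical value)

published_good, published_total = 0, 0
for _ in range(100_000):
    p = random.random()                        # this poem's chance of being good
    if p >= tau:                               # Alice's publication rule
        published_total += 1
        published_good += random.random() < p  # realise whether it turned out good

precision = published_good / published_total
print(f"threshold = {tau:.2f}, precision = {precision:.2f}")
# With p uniform, precision ≈ (1 + tau) / 2 = 0.55, well above the 0.10 threshold.
```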
Unless I've made a typo somewhere (and please let me know if I have), I don't claim the optimal promising threshold τ⋆ = 10%. You can see in Graph 5 that I propose a promising threshold of 3.5%, which gives a precision of 10%.
I'll edit the article to dispel any confusion. I was wary of giving exact values for the promising threshold, because τ⋆ = 3.5% yields 10% precision only for these graphs, which are of course invented for illustrative purposes.
So 90% of sci-fi books are crap, and 90% of medical research, and 90% of jazz solos, and 90% of LessWrong posts.
Etc, etc — 90% of anything is crap.
Now, people usually interpret Sturgeon’s Law pessimistically.
“What a shame!” people think, “If only all the sci-fi books were great! And all the medical research, jazz solos, LessWrong posts, etc etc.”
I think these people are mistaken. 90% of anything should be crap. In this article, I will explain why.
Theodore Sturgeon (1918–1985)
The precision-recall tradeoff
Imagine one day you read a brilliant poem, and you decide to read the poet’s complete works. If you discover that every poem she ever wrote...