
After reading the Arbital postmortem, I remembered some old ideas regarding a tool for claim and prediction aggregation.

First, the tool would have the basic features. There would be a list of claims. Each claim is a clear and concise statement that could be true or false, perhaps with a short explanation. For each claim, the users could vote on its likelihood. All these votes would be aggregated into a single number for each claim.
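As a minimal sketch of this first feature: the post does not say how votes are combined, so the arithmetic mean below is an assumption, and the names `Claim`, `vote` and `aggregate` are hypothetical rather than taken from the prototype.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class Claim:
    """A statement that could be true or false, plus the likelihood votes cast on it."""
    text: str
    votes: List[float] = field(default_factory=list)  # each vote is a probability in [0, 1]

    def vote(self, p: float) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError("a vote must be a probability between 0 and 1")
        self.votes.append(p)

    def aggregate(self) -> Optional[float]:
        # Assumption: plain arithmetic mean; None until someone has voted.
        return mean(self.votes) if self.votes else None
```

A median or a log-odds average would slot into `aggregate` just as easily; the mean is only the simplest choice.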

Second, the tool would allow the creation of composite claims by combining two existing claims. In particular, a conditional claim IF B THEN A would represent the conditional probability P(A|B). For every claim, it should be easy to find the conditionals it participates in, or the claims it is composed of. Conditionals are voted on the same way as simple claims (I would even consider a version where only conditionals are voted on).

Third, the tool would understand the basic probability laws and use this to direct the users' attention. For example, if three claims don't satisfy the law P(A|B) P(B) ≤ P(A), users might be alerted about this error. On the other hand, if P(A|B) = P(B|A) = 1, the two claims might be merged or one could be discarded, to reduce the clutter.
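The two checks in this paragraph could be sketched roughly as follows. The function names and the tolerance `tol` are assumptions made for illustration; the prototype's actual code is not shown in the post.

```python
def violates_product_law(p_a_given_b: float, p_b: float, p_a: float,
                         tol: float = 1e-9) -> bool:
    """True if P(A|B) * P(B) exceeds P(A), which no coherent assignment allows.

    Equality is allowed: it holds exactly when A can only happen together with B.
    """
    return p_a_given_b * p_b > p_a + tol


def look_equivalent(p_a_given_b: float, p_b_given_a: float,
                    tol: float = 1e-9) -> bool:
    """True if each claim implies the other with certainty, so one could be merged away."""
    return p_a_given_b >= 1.0 - tol and p_b_given_a >= 1.0 - tol
```

For instance, P(A|B) = 0.9 with P(B) = 0.5 and P(A) = 0.3 would be flagged, because the joint probability 0.45 cannot exceed P(A).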

Fourth, given a claim the tool might collect every other claim that supports it, follow every chain of argument and assemble them into a single graph, or even a semi-readable text, with the strongest arguments and counterarguments most visible.
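One way to read "follow every chain of argument" is a graph traversal over the conditional claims. The sketch below assumes each conditional IF B THEN A is stored as a hypothetical `(condition, event)` pair and collects every claim that reaches the target through such links. It ignores vote strength, so ranking the strongest arguments is left out.

```python
from collections import deque


def supporting_claims(target, conditionals):
    """Collect all claims that lead to `target` through chains of conditionals.

    `conditionals` is an iterable of (condition, event) pairs,
    one pair per claim of the form IF condition THEN event.
    """
    # Index the conditions by the event they bear on.
    by_event = {}
    for condition, event in conditionals:
        by_event.setdefault(event, []).append(condition)

    seen = set()
    queue = deque([target])
    while queue:
        claim = queue.popleft()
        for condition in by_event.get(claim, []):
            # Skip claims already visited, and the target itself if a cycle leads back.
            if condition not in seen and condition != target:
                seen.add(condition)
                queue.append(condition)
    return seen
```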

Let's consider a possible workflow. Suppose you browse the list of claims, and find a ridiculous claim X assigned a high likelihood. You could just vote to decrease the likelihood and perhaps leave an offensive comment; however, this is unlikely to have much effect. Instead you could find a convincing counterargument Y, then add both P(Y) = 1 and P(X|Y) = 0 to the list of claims. Now other users would be notified of the resulting inconsistency and would reply by voting on one of these claims, changing their vote on X, or by creating additional arguments that contradict Y or support X. In turn you would attack these new arguments, eventually creating a large graph of reasoning. Perhaps at some point there would be enough general claims that people from different debates could reuse some, instead of duplicating them, and only create new conditional claims, making the graph dense.
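To see why the two new claims conflict with a high likelihood for X, one can expand P(X) with the law of total probability. This is a standard identity, not something the post says the tool computes:

```latex
\begin{aligned}
P(X) &= P(X \mid Y)\,P(Y) + P(X \mid \neg Y)\,P(\neg Y) \\
     &= 0 \cdot 1 + P(X \mid \neg Y) \cdot 0 \\
     &= 0
\end{aligned}
```

So any aggregate P(X) above 0 is incoherent with P(Y) = 1 and P(X|Y) = 0, and at least one of the three numbers has to move.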

I took some time to write a small prototype. It is initialized with some AI-related claims, and it implements the first, the second, and a tiny bit of the third paragraph. Now for some meta. The prototype is a single-page app using AngularJS (Angular 1), backed by Node.js and MongoDB. The actual features took between 1 and 2 hours to write. The backend and deployment took a couple more hours, largely because I hadn't done that in a while. Therefore I think that it's quite feasible to make similar prototypes for other ideas. Is there any value in it though?


I was expecting to write about how I disagreed with this approach... but I actually found the prototype surprisingly delightful. Specifically, voting on a claim, and then seeing how it interacted with the related claims and prompting to further click, was a neat experience. (I wouldn't turn the related claims red, which I associated with DANGER, but turning them bluish or greenish would work fine)

[edit: I'd misinterpreted what the red claims meant, I'd thought it meant "this is generally related", and it looks like depending on how you vote on a thing, the related claims turn either green or red, which I haven't fully parsed yet but makes more sense than my initial interpretation]

My underlying disagreement with this sort of approach is that:

a) the internet is littered with the corpses of things somewhat similar to this, which suggests there's something hard about it

b) I think part of the issue is that people don't naturally think and engage with ideas via discrete claims, they're more likely to engage via conversations, or essays. (This is a not-very-well-supported version of a claim I expect a friend of mine to write a more nuanced essay about soon)

c) relatedly, just seeing the initial claims listed out, even with short descriptions, sort of misses a lot of the context. I think an app like this is a useful tool for people who have already communicated a bunch (or thought about things a lot, if working solo), and are using the tool in realtime to flesh out their confusions and disagreements.

I definitely second all three of these points. That having been said, aside from (a) (which is for sure an important outside-view thing to keep in mind, but also doesn’t directly inform the design and development of such a concept, except insofar as it strengthens the suggestion to do a “lit survey” of prior projects like this), these points seem like they don’t so much suggest not doing the thing, but rather suggest doing it differently, or adding certain features or design elements.

Specifically:

1. There are ways to place such claims in context (simple hyperlinking goes a long way, and excerpting/transclusion of contextual information goes even further). Could this sufficiently “conversationalize” bare claims, without destroying their atomic nature (which lets them be composed in the way this tool does)? Maybe not, maybe yes. It seems worth trying out.

2. If the app turns out to be useless for the general public but useful for existing groups or communities, then that’s certainly a failure of the design intent, but possibly still a success more broadly speaking.

Perhaps these considerations make the concept not worth developing; that seems possible, but not necessarily so.

Well then, the prototype did have some usefulness.

c) Yes, a shared context and precise definitions are definitely needed. Even then, I worry that there exist important debates that for some fundamental reason would not fit the format. Though I don't have examples in mind.

b) Just think of it as rationalist twitter (I'm joking). Alternatively, perhaps there could be some additional tool that would help extract a graph of reasoning from text and conversation.

a) Do you have examples?

Great work, thanks a lot for doing this!

I had a lot of fun translating one of the contradictions into a Dutch book, until I realized that's pointless, because any two users disagreeing about any probability is already enough for a Dutch book. The next step could be a prediction market where people can Dutch book each other for real, but I'm not sure these work well with distant future events and play money.
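The observation that any disagreement already allows a Dutch book can be checked with a small sketch. It assumes each user is willing to trade a ticket paying `stake` at the price they consider fair; the function name and parameters are hypothetical.

```python
def bookie_profit(p_low: float, p_high: float, event_happens: bool,
                  stake: float = 1.0) -> float:
    """Profit of a third party who trades with two users that disagree on P(event).

    The third party buys a ticket from the user who believes p_low (paying p_low * stake)
    and sells an identical ticket to the user who believes p_high (receiving p_high * stake).
    """
    cash = p_high * stake - p_low * stake
    received = stake if event_happens else 0.0  # payout on the ticket bought
    owed = stake if event_happens else 0.0      # payout on the ticket sold
    return cash + received - owed
```

The two payouts cancel in both outcomes, so the profit is `(p_high - p_low) * stake` whether or not the event happens.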

I was surprised that such a simple system can effectively support arguments; for some reason I expected argument mapping to be a complication for this sort of thing, but of course making conditional claims is enough to represent the links between claims making up an argument. On the face of it it's missing other logical operations like AND, OR, and FORALL, but if you need those you can just make new claims representing those, and conditional claims connecting things up in the right way (though of course this becomes cumbersome at some point).

The belief aggregation here is very simplistic, but it scales fairly easily to more complicated systems like bayesian truth serum.

Moved to frontpage.

I like this idea, and am particularly happy about you building a small prototype to test it. I mostly share Ray's scepticism, and also expect that conditional probabilities that condition on more than three independent events will be extremely hard to vote on, and will generally be quite badly calibrated. I remember Eliezer having a post on the difficulty of explicitly assessing probabilities for long chains of conditionals (it's on FB, so who knows whether that link is going to work): https://www.facebook.com/yudkowsky/posts/10154036150109228

I understand the concern, but I'm hoping that as claims about the future are explicit and distinguished from conditional claims, this might not be a problem. That is, if the user has already set P(Trump nominated) = 0.01 and P(Trump president) = 0.009, they will be satisfied with having rejected the claims, and will be able to consider that P(Trump is president | Trump was nominated) = 0.8, in isolation. Also, the conditional P(Trump was nominated | Trump is president) is obviously almost 1, and that should prevent anyone from setting P(Trump is president | Trump was nominated) too low. Also, P(Trump nominated) and P(Trump president) should always have reasonable values on prediction markets, which would set some reasonable bounds on the conditionals.
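The numbers in this example can be checked directly. The sketch below only restates the three probabilities given above and applies the product law and Bayes' rule; the variable names are made up for illustration.

```python
p_nominated = 0.01
p_president = 0.009
p_president_given_nominated = 0.8

# Product law: P(president | nominated) * P(nominated) must not exceed P(president).
joint = p_president_given_nominated * p_nominated
coherent = joint <= p_president

# Bayes' rule gives the reverse conditional implied by the three numbers.
p_nominated_given_president = joint / p_president

# Largest P(president | nominated) compatible with the two simple claims.
upper_bound = min(1.0, p_president / p_nominated)
```

Here the joint probability is 0.008, so the three votes are coherent. They imply a reverse conditional of about 0.89, and the two simple claims cap P(president | nominated) at 0.9.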

More generally, I suspect that the Multiple-Stage Fallacy comes from confusing event probabilities with conditional probabilities, and ultimately, I believe that all problems and confusions can be solved with more rigour.

Feedback on the prototype:

Bugs

• One can vote multiple times just by clicking multiple times.
• One can use the same claim as condition and event, and then weird things happen with the bounds and posterior probability
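Both bugs have simple guards. The sketch below is one possible shape, assuming votes are keyed by a hypothetical user id and conditionals are stored as `(condition, event)` pairs; none of these names come from the prototype.

```python
class VoteBook:
    """Stores at most one vote per user per claim; a repeat vote overwrites the old one."""

    def __init__(self):
        self._votes = {}  # claim id -> {user id -> probability}

    def vote(self, claim_id, user_id, p):
        self._votes.setdefault(claim_id, {})[user_id] = p

    def votes(self, claim_id):
        return list(self._votes.get(claim_id, {}).values())


def make_conditional(condition_id, event_id):
    """Return the pair for IF condition THEN event, rejecting a claim conditioned on itself."""
    if condition_id == event_id:
        raise ValueError("condition and event must be different claims")
    return (condition_id, event_id)
```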

Confusing aspects

• I voted on a claim and it made a claim in the “related claims” list go red. I clicked on the red claim, and… apparently… something with the bounds? This badly needs explanatory text!

Obvious enhancements

• REDACTED. (I was about to start writing this section, but this is just a prototype and you have, no doubt, thought of everything I was going to suggest. But if you do decide to go ahead with developing this—which I encourage!—then I have many thoughts on design, features, etc.)

>I voted on a claim and it made a claim in the “related claims” list go red. I clicked on the red claim, and… apparently… something with the bounds? This badly needs explanatory text!

Bounds on conditional probabilities are checked and if violated, the claim turns red. There are two inequalities checked. First, P(A|B) P(B) ≤ P(A), which translates into P(A|B) ≤ P(A) / P(B). And second, [formula missing], which translates into [formula missing]. Currently the first rule is violated, but only because in the code I wrote "[missing]" instead of "[missing]".

The bounds suggest how to correct the conditional probability to solve the incoherence. The condition and the event can also be voted on to solve it, but it's hard to figure out which way they should be changed. On the other hand, maybe there should be no suggestions, and you should sit down and honestly figure out which of the three probabilities is wrong.
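As a rough illustration of what such suggested bounds could look like, the sketch below assumes the standard limits on a joint probability, max(0, P(A) + P(B) - 1) ≤ P(A and B) ≤ min(P(A), P(B)), divided through by P(B). This may differ from what the prototype actually checks, and the function name is hypothetical.

```python
from typing import Tuple


def conditional_bounds(p_a: float, p_b: float) -> Tuple[float, float]:
    """Range of P(A|B) compatible with the given P(A) and P(B)."""
    if p_b <= 0.0:
        # Conditioning on a zero-probability claim constrains nothing.
        return (0.0, 1.0)
    lower = max(0.0, (p_a + p_b - 1.0) / p_b)
    upper = min(1.0, p_a / p_b)
    return (lower, upper)
```

With P(A) = 0.3 and P(B) = 0.5 the conditional may lie anywhere between 0 and 0.6; with P(A) = 0.9 and P(B) = 0.8 it is squeezed into the range 0.875 to 1.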

I really like this idea, and think that it’s worth developing further. Two thoughts:

1. A smooth workflow and effective visual presentation of the information—i.e., good UX design—are critical to such a tool’s success.

2. It would be very useful to do some searching to see if similar tools have been made in the past, to examine them for ideas / lessons. Does anyone know of such?

Seeing you here, I started wondering what our utility discussion would look like in this format. The claim that continuity can be violated would fit fine, but the full counterargument would be harder to decompose. And the whole other part we never agreed on is even harder (on the other hand, it would be a good thing if demonic threads were impossible to have in the tool).

I agree that the utility discussion would make an excellent (which is to say, tough, and thereby informative) test case for this sort of format/tool.

If it included sufficiently flexible views on the relationship data between claims, and certain rudimentary annotation/linking features, it might work very well to maintain a persistent, and easy-to-reference, model of the existing disagreement and the state of the argument, quite apart from the probability calculation features per se. (Of course, now we’re veering into “mind-mapping software” territory, but I do think there’s some overlap there, and, incidentally, that that category of tools would be useful to survey and mine for ideas/lessons.)

The problem I see is that I don’t know of a good way to assign probability estimates to claims like that. (However, this problem—in its strong form—is one of the deepest conceptual disagreements that I have with Less Wrong and LW-adjacent thinking, so I don’t expect this to be quick or easy to resolve. There is also a weak form of the problem that concerns how to assign such probability estimates in practice; I don’t have an answer to that one, either.)

>quite apart from the probability calculation features per se.

You're right that the tool as described suggests that it should be used for predictions only; however, I think the same tool should work for other kinds of claims. For math, I'd conflate P(A|B)=1 with the logical B->A. If you don't have a full mathematical proof, then the implications you used should be weighted from 0 to 1. Whether it makes sense to apply probability laws to these weights is another question.

Regarding the utility discussion, one problem is that it had many branches (which one should we start with?). And the more demonic branch basically started with the claim that having explicit utility functions is useful for decision making. Is "useful" a bit too vague to be used in a claim? Perhaps it would be fine to have "X is useful" if it is implied by "X can be used for Y", but then I never gave such specific examples.

Elo:

>I took some time to write a small prototype

Start the post with this line!

This post impressed me, I think, because it didn't start out saying there was a prototype. I may have stopped reading at "prototype" if that line came first. There's a huge "blech" factor for me at the idea of giving feedback on yet another partially created bit of software.

Reading the ideas about what the tool would (hopefully) do had me thinking, "this sounds cool, but is anyone ever going to actually make it?"

Then, near the end, the prototype is revealed. I thought, "Wow. Huh. I'm impressed they didn't just talk about an idea that never goes anywhere. They executed. And the prototype seems neat when I play with it."

Well, I'm conflicted. On one hand, you're totally right, but on the other, I'm not that confident that the prototype actually adds much value to the post, aside from clickbait. But then again, I wrote it in part for the engagement, so not using it is silly.

Oh well, let's take a step back and realize that this kind of inability to appeal to readers is exactly the true reason I propose the kind of system described above.

By the way, did you open the link? Did it work? Feel free to actually vote on claims and create new ones.