PredictionBook itself has a bunch more than three participants and functions as an always-running contest for calibration, although it's easy to cheat since it's possible to make and resolve whatever predictions you want. I also participate in GJ Open, which has an eternally ongoing prediction contest. So there's stuff out there where people who want to compete on running score can do so.
The objective of the contest was less to bring such an opportunity into existence than to see if it'd incentivise some people who had been "meaning" to practice prediction-making and not gotten to it yet to do so on one of the platforms, by offering a kind of "reason to get around to it now"; the answer was no, though.
I don't participate much on Metaculus because for my actual, non-contest prediction-making practice I tend to favour predictions that resolve within about six weeks; the longer the gap between prediction and resolution, the slower the iteration process on improving calibration. If I predict on 100 things that happen in four years, it takes four years for me to learn whether I'm over- or underconfident at the 90% or so mark, and then another four years to learn whether my reaction to that was an over- or under-reaction. Metaculus seems to favour predictions two to four or more years out, and creating your own short-term questions in any number means sticking with private predictions. That's interesting for getting a crowd read on the future, but doesn't offer me much opportunity to iterate and improve. It's a nice project, though.
It's not a novel algorithm type, just a learning project I did in the process of learning ML frameworks, a fairly simple LSTM + one dense layer, trained on the predictions + resolution of about 60% of the resolved predictions from PredictionBook as of September last year (which doesn't include any of the ones in the contest). The remaining resolved predictions were used for cross-validation or set aside as a test set. An even simpler RNN is only very slightly less good, though.
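For concreteness, the shape of it is something like the minimal Keras sketch below- the layer sizes, sequence cap, and padding scheme here are assumptions for illustration, not the repo's actual values:

```python
# Minimal sketch of an LSTM + single dense layer aggregator, assuming each
# training example is the sequence of probability assignments on a proposition
# and the target is its resolution (0 or 1). MAXLEN and layer sizes are guesses.
import numpy as np
from tensorflow import keras

MAXLEN = 20  # assumed cap on the number of probability assignments per proposition

def build_model():
    model = keras.Sequential([
        keras.layers.Masking(mask_value=-1.0, input_shape=(MAXLEN, 1)),
        keras.layers.LSTM(16),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def pad(sequences):
    # Pad variable-length sequences of probabilities to MAXLEN with -1 (masked).
    x = np.full((len(sequences), MAXLEN, 1), -1.0, dtype="float32")
    for i, seq in enumerate(sequences):
        trimmed = seq[:MAXLEN]
        x[i, :len(trimmed), 0] = trimmed
    return x

# Toy data standing in for (probability assignments, resolution) pairs.
train_x = pad([[0.7, 0.8], [0.3], [0.9, 0.6, 0.95]])
train_y = np.array([1, 0, 1], dtype="float32")

model = build_model()
model.fit(train_x, train_y, epochs=5, verbose=0)
```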
The details of how the algorithm works are thus somewhat opaque, but from observing the way it reacts to input, it seems to lean on the average, weight later-in-sequence predictions more heavily (so order matters), and grow more confident as the number of predictions increases, while treating propositions with only one probability assignment as probably being heavily overconfident. It seems to have more or less learnt that insight Tetlock pointed out on its own. Disagreement between the assignments might also matter to it; I'm not sure.
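By "observing the way it reacts to input" I mean probing it with hand-built sequences, roughly as below (continuing the sketch above; with the real trained model the paired cases would show the order-dependence and growing confidence, whereas the toy model above obviously won't produce meaningful numbers):

```python
# Probe the aggregator by feeding it hand-built input sequences and comparing
# its outputs, e.g. same values in different orders, or repeated agreement.
probes = {
    "single 0.7":        [0.7],
    "two agreeing 0.7s": [0.7, 0.7],
    "0.6 then 0.8":      [0.6, 0.8],
    "0.8 then 0.6":      [0.8, 0.6],
}
for name, seq in probes.items():
    p = float(model.predict(pad([seq]), verbose=0)[0, 0])
    print(f"{name}: {p:.3f}")
```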
It's on GitHub at https://github.com/jbeshir/moonbird-predictor-keras; this doesn't include the data, which I downloaded using https://github.com/jbeshir/predictionbook-extractor. It's not particularly tidy though, and still includes a lot of functionality for additional input features- the words of the proposition, the time between a probability assignment and the due time, etc.- which I didn't end up using because the dataset was too small for the model to learn any signal in them.
I'm currently working on making the online frontend automatically retrain the model at intervals using freshly resolved predictions, mostly for practice building a simple "online" ML system before I move on to trying to build things with more practical application.
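Roughly, the loop I have in mind looks like the sketch below (continuing the code above; fetch_resolved_predictions is a hypothetical stand-in for pulling freshly resolved predictions with the extractor, and the interval and filename are assumptions):

```python
# Periodic retraining sketch: refresh the training set with freshly resolved
# predictions, retrain from scratch, and save the new model for the frontend
# to pick up.
import time

RETRAIN_INTERVAL = 24 * 60 * 60  # assumed: once a day

def fetch_resolved_predictions():
    """Hypothetical helper: returns (sequences, resolutions) for all
    currently-resolved PredictionBook predictions."""
    raise NotImplementedError

def retrain_loop():
    while True:
        sequences, resolutions = fetch_resolved_predictions()
        new_model = build_model()
        new_model.fit(pad(sequences), np.array(resolutions, dtype="float32"),
                      epochs=20, verbose=0)
        new_model.save("moonbird_model_new.h5")
        # The frontend would then be pointed at the new file, e.g. by renaming
        # it over the old one once saving has completed.
        time.sleep(RETRAIN_INTERVAL)
```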
The main reason I ran figures for it against the contest was that some of its individual confidences seemed strange to me, and while the cross-validation stuff was saying it was good, I was suspicious I was getting something wrong in the process.
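Concretely, "running figures against the contest" just means scoring the model's outputs on a fresh question set with a proper scoring rule and comparing against a simple baseline- a sketch, with placeholder data rather than the actual contest questions:

```python
# Compare the model against a take-the-mean baseline on a held-out question set
# using the Brier score (lower is better). Data below is placeholder only.
def brier(p, outcome):
    return (p - outcome) ** 2

questions = [  # placeholder: (probability assignments, resolution)
    ([0.7, 0.8], 1),
    ([0.4], 0),
    ([0.9, 0.6, 0.95], 1),
]

model_scores, mean_scores = [], []
for seq, outcome in questions:
    model_p = float(model.predict(pad([seq]), verbose=0)[0, 0])
    mean_p = sum(seq) / len(seq)
    model_scores.append(brier(model_p, outcome))
    mean_scores.append(brier(mean_p, outcome))

print("model mean Brier:   ", sum(model_scores) / len(model_scores))
print("baseline mean Brier:", sum(mean_scores) / len(mean_scores))
```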
I'm concerned that the described examples of holding individual comments to high epistemic standards don't seem to necessarily apply to top-level posts or linked content. One reason I think this is bad is that it is hard to precisely critique something which is not itself precise, or which contains metaphor, or which contains example-but-actually-pointing-at-a-class writing where the class can be construed in various different ways.
Critique of fuzzy intuitions and impressions and feelings often involves fuzzy intuitions and impressions and feelings, I think- and if this stuff is restricted in critique but not in top-level content, it makes top-level content involving fuzzy intuitions and impressions and feelings hard to critique, despite being, I think, exactly the content which needs critiquing the most.
Strong comment standards seem like they would be good for a space (no strong opinion on whether LW should be that space), but such a space would probably want to also have high standards for top-level posts, possibly with review and feedback prior to publication, to keep them up to the same epistemic standards. Otherwise I think moderation arguments over which interpretations of vague content are reasonable would dominate.
Additionally, strong disagree on "weaken the stigma around defensiveness" as an objective of moderation. One should post arguments because one believes they are valid, and clarify misunderstandings because they are mistaken, not argue or post or moderate to try to save personal status. It may be desirable to post and act in ways that make it easier for others not to be defensive, but we still want people themselves to try to avoid taking critique as a referendum on their person. In terms of fairness, I'm not sure how you'd judge it- even in formal peer review, I think it is legitimate for the part people have the most concerns about not to be the part the author most wants attention on. It's also legitimate for most people to disagree with and have critiques of a piece of content. The top-level post author (or the link post's author) doesn't have a right to "win"- it is permissible for the community to just not think a post's object-level content is all that good. If there were to be a fairness standard that justified anything, it'd certainly want to be spelled out in more detail and checked by someone other than the person feeling they were treated unfairly.
It might be nice to have a set of twenty EA questions, a set of twenty ongoing-academic-research questions, a set of twenty general tech industry questions, a set of twenty world politics questions for the people who like them maybe, and run multiple contests at some point which refine predictive ability within a particular domain, yeah.
It'd be tough to source that many, and I feel that twenty is already about the minimum sample size I'd want to use. For research questions it'd probably require some crowdsourcing of interesting upcoming experiments to predict on, but particularly if help turns out to be available, it'd be worth considering if the smaller thing works.
The usefulness of a model of the particular area was something I considered in choosing between questions, but I had a hard time finding a set of good non-personal questions which had very high value to model. I tried to pick questions which in some way depended on interesting underlying questions- for example, the Tesla one hinges on your ability to predict the performance of a known-to-overpromise entrepreneur in a manner that's more precise than either maximum cynicism or full trust, and on the ability to predict the ongoing ramp-up of manufacturing of tech facing manufacturing difficulties, both of which I think have value.
World politics is, I think, the weakest section in that regard, and this is a big part of why, rather than just taking twenty questions from the various sources of world politics predictions I had available, I looked for other questions and made a bunch of my own EA-related ones by going through EA org posts looking for uncertain pieces of the future, reducing the world politics questions down to only a little over a third of the set.
That said, I think the world politics questions do have transferability in calibration if not precision (you can learn to be accurate on topics you don't have a precise model for by having a good grasp of how confident you should be), and they exercise the general skill of skimming a topic, arriving at impressions about it, and knowing how much to trust those impressions. I think there are general skills of rationality being practiced here, beyond gaining specific models.
And while it is the weakest section, I think it does have some value. There's utility in having a reasonable grasp of how governments behave and, in particular, how quickly they change under various circumstances: the way governments behave and react in the future will set the regulatory environment for future technological development, and the way they behave in geopolitics affects risk from political instability, both as a civilisational risk in itself and as something that could require mitigation in other work. There was an ongoing line of questioning about how good it is, exactly, to have a massive chunk of AGI safety orgs in one coastal American city (in particular during the worst of the North Korea stuff), and a good model there is useful for deciding whether it's worth trying to fund the creation, expansion, and focusing of orgs elsewhere as a "backup", for example- a decision that can be taken individually on the basis of a good grasp of how concerned you should be, exactly, about particular geopolitical issues.
These world politics questions are probably not perfectly optimised for that (I had to avoid anything on NK in particular due to the current rate of change), and it'd be nice to find better ones, and maybe more useful questions of other kinds, and to shrink the section further next year. I think they probably have some value to practice predicting on, though.
I need to take a good look at what GJO has to offer here- I'm not sure whether running a challenge for score on it would meet the goals well (in particular I think it needs to be bounded in the amount of prediction it requires in order to motivate doing it, yet not gameable by just doing easy questions, and I'd like to be able to see what the probability assignments on specific questions were), but I've not looked at it closely with this in mind. I should at least hopefully be able to crib a few questions, or more.
Sounds good. I've looked over them and I could definitely use a fair few of those.
Thanks for letting me know! I've sent them a PM, and hopefully they'll get back to me once they're free.
On the positive side, I think an experiment in a more centrally managed model makes sense, and group activity that has become integrated into routine is an incredibly good commitment device for getting the activity done- the kind of social technology used in workplaces everywhere that people struggle to apply to their other projects and self-improvement efforts. Collaborative self-improvement is good; it was a big part of what I was interested in for the Accelerator Project before that became defunct.
On the skulls side, though, I think the big risk factor that comes to mind for me for any authoritarian project wasn't addressed directly. You've done a lot of review of failed projects, and of successful projects, but I don't get the impression you've done much of a review of abusive projects. The big common element I've seen in abusive projects is that unreasonable demands were made that any sensible person should have 'defected' on- people were asked things or placed under demands which, from the outside and in retrospect, were in no way worth meeting in order to stay in the group- and they didn't defect. They stayed in the abusive situation.
A lot of abusive relationships involve people trading off their work performance and prospects, and their outside relationship prospects, in order to live up to commitments made within those relationships, when they should have walked. They concede arguments when they can't find a reason that will be accepted because the other person rejects everything they say, rather than deciding to defect on the personhood norm of use of reasons. I see people who have been in abusive relationships in the past anxiously worrying about how they will find a way to justify themselves in circumstances where I would have been willing to bite the bullet and say "No, I'm afraid not, I have reasons but I can't really talk about them.", because the option of simply putting their foot down without reasons- a costly last resort but an option- is mentally unavailable to them.
What I draw from the case studies of abusive situations I've encountered is that humans have false negatives as well as false positives about 'defection'; that is, people maintain commitments when they should have defected, as well as defecting when they should have maintained commitments. Some of us are more prone to the former, and others are more prone to the latter. The people prone to the former are often impressively bad at boundaries, at knowing when to say no, at making a continually updated cost/benefit analysis of their continued presence in an environment, at protecting themselves. Making self-protection a mantra indicates that you've kind of seen a part of it, but an overall model of "humans defect on commitments too much" rather than "humans are lousy at knowing when to commit and when not to" seems likely to often miss what various ideas will do to the people prone to false negatives.
The rationalist community as a whole is probably mostly people with relatively few false negatives and mostly false positives. Most of us know when to walk, are independent enough to be keeping an eye on the door when things get worrying, and have no trouble saying "you seem to be under the mistaken impression I need to give you a reason" if people try to reject our reasons. So I can understand failures the other way not being the most salient thing. But the rationalist community as a whole is mostly people who won't be part of this project.
When you select out the minority who are interested in this project, I think you will get a considerably higher rate of people who fail in the direction of backing down if they can't find a reason that (they think) others will accept, in the direction of not having good boundaries, and more generally in the direction of not 'defecting' enough to protect themselves. And I've met enough of them in rationalist-adjacent spaces that I know they're nearby, they're smart, they're helpful, some are reliable, and they're kind of vulnerable.
I think as leader you need to do more than say "protect yourself". I think you need to expect that some people you are leading will /not/ say no when they should, and that you won't successfully filter all of them out before starting, any more than you'll filter out all the people who will fail in any other way. And you need to take responsibility for protecting them, rather than delegating it exclusively to them to handle. To be a bit rough, "protect yourself" seems like trying to avoid a part of the leadership role that isn't actually optional: if you fail in the wrong way you will hurt people, and you as leader are responsible for not failing in that way, and 95% isn't good enough. The drill instructor persona does not come off as the sort of person who would do that- with the unidirectional emphasis on committing more- and I think that is part of why people who don't know you personally find it kinda alarming in this context.
(The military, of course, from which the stereotype originates, deals with this by simply not giving two shits about causing psychological harm, and is fine either severely hurting people to turn them into what it needs or severely hurting them before spitting them out if they are people who are harmed by what it does.)
On the somewhat more object level, the exit plan discussed seems wildly inadequate, and very likely to be a strong barrier against anyone who isn't one of our exceptional libertines leaving when they should. This isn't a normal house share, and it is significantly more important than a regular house share that people are not prevented from leaving by financial constraints or inability to find a replacement who's interested. The harsh terms typical of an SF house share are not suitable, I think.
The finding-a-replacement part seems especially impractical, given that most people trend towards an average of their friends- so if their friends on one side are DA people, and they themselves are unsuited to DA, their other friends are probably even more unsuited to DA on average. I would strongly suggest that if someone leaves without a replacement being secured, the only cost should be financial recompense capped at a limited number of months of rent, and that it either be payable at a later date after immediate departure or be collected as an upfront deposit, to guarantee safety of exit.
If there are financial costs involved with ensuring exit is readily available, there are enough people who think that this is valuable that it should be possible to secure capital for use in that scenario.
Assuming by "it" you refer to the decision theory work, that UFAI is a threat, Many Worlds Interpretation, things they actually have endorsed in some fashion, it would be fair enough to talk about how the administrators have posted those things and described them as conclusions of the content, but it should accurately convey that that was the extent of "pushing" them. Written from a neutral point of view with the beliefs accurately represented, informing people that the community's "leaders" have posted arguments for some unusual beliefs (which readers are entitled to judge as they wish) as part of the content would be perfectly reasonable.
It would also be reasonable to talk about the extent to which atheism is implicitly pushed in stronger fashion; theism is treated as assumed wrong in examples around the place, not constantly but to a much greater degree. I vaguely recall that the community has non-theists as a strong majority.
The problem is that this is simply not what the articles say. The articles strongly imply that the more unusual beliefs listed above are widely accepted- not that they are posted in the content, but that they are believed by Less Wrong members, part of the identity of someone who is a Less Wrong user. This is simply wrong. And the difference is significant; it amounts to incorrectly accusing everyone interested in the works of a writer of being proponents of that writer's most unusual beliefs, discussed only in a small portion of their total writings. And this should be fixed so the articles convey an accurate impression.
The Scientology comparison is misleading in that Scientology attempts to use cult practices to achieve homogeneity of beliefs, whereas Less Wrong does not- the poll solidly demonstrates that homogeneity of beliefs is not a thing which is happening. A better analogy would be a community of fans of the works of a philosopher who wrote a lot of stuff and came to some outlandish conclusions in parts, but the fans don't largely believe that outlandish stuff. Yeah, their outlandish stuff is worth discussing- but presenting it as the belief of the community is wrong even if the philosopher alleges it all fits together. Having an accurate belief here matters, because it has greatly different consequences. There are major practical differences in how useful you'd expect the rest of the content to be, and how you'd perceive members of the community.
At present, much of the articles reads as a "smear piece" against Less Wrong's community. As a clear and egregious example, they allege the community is "libertarian"- clearly a shot at LW given RW's readerbase- when surveys tell us that the most common political affiliation is "liberalism", with "libertarianism" second and "socialism" third. They do this while citing one of those surveys in the article itself. Many of the problems here are not subtle.
If by "it" you meant the evil AI from the future thing, it most certainly is not "the belief pushed by the organization running this place"; any reasonable definition of "pushing" something would have to meancommunicating it to people and attempting to convince them of it, and if anything they're credibly trying to stop people from learning about it. There are no secret "higher levels" of Less Wrong content only shown to the "prepared", no private venues conveying it to members as they become ready, so we can be fairly certain given publicly visible evidence that they aren't communicating it or endorsing it as a belief to even 'selected' members.
It doesn't obviously follow from anything posted on Less Wrong; it requires putting a whole bunch of parts together and assuming they are all true.