Ahh I see what you mean now, thank you for the clarification.
I agree that, in general, people trying to exploit and Goodhart LW karma would be bad, though I hope the experiment won't contribute to this. Here, post karma is only being used as a measure, not as a target: the mentors and mentees gain nothing beyond what anyone would normally gain from their research project resulting in a highly-upvoted LW post. Predicted future post karma is just being used to optimise over research ideas, and the space of ideas itself is very small (in this experiment), so I doubt we'll get any serious Goodharting through the selection of ideas that make for poor research but particularly memetic LW posts. Even if we did, this is part of the motivation for having several metrics, so that no single one gets too specifically optimised for.
There is perhaps an argument that those who have predicted a post will get high karma might want to manipulate it upwards to make their prediction come true, but those who predicted lower karma have the opposite incentive. Regardless, that kind of manipulation is, I think, quite strictly prohibited by both LW and Manifold guidelines, and anyone caught doing it in a serious way would likely be severely reprimanded. In the worst case, if any of the metrics are seriously and obviously manipulated in a way that cannot be rectified, the relevant markets will be resolved N/A, though I think that outcome is extremely unlikely.
All that said, I think it is important to think about what more suitable metrics would be if research futarchy were to become more common. I can certainly imagine a world where widespread use of LW post karma as a proxy for research success could have negative impacts on the LW ecosystem, though I hope by then there will have been more development and testing of robust measures beyond our starting point (which, for the record, I think is already somewhat robust).
Thank you for the suggestion!
Thanks for your engagement with the post. I'm not quite sure I understand what you're getting at; could you please elaborate?
Thank you for partaking!
Your linked experiment looks very interesting, I will give it a read, thank you for the heads up.
Will you randomize (some of) your choices, as dynomight suggests?
We're not going to randomise choices. The symmetry of the sorts of actions being chosen, combined with the fact that the market both makes the decision and has the mentors trading on it (as suggested by Hanson), means we shouldn't suffer from the theoretical oddities that decision markets can occasionally exhibit.
I've ended up making another post somewhat to this effect, trying to predict any significant architectural shifts over the next year and a half: https://manifold.markets/Jasonb/significant-advancement-in-frontier
I made a manifold post for this for those who wish to bet on it: https://manifold.markets/JasonBrown/will-a-gpt4-level-efficient-hrm-bas?r=SmFzb25Ccm93bg
Thank you for providing a good introduction and arguments in favour of this research direction. Whilst I strongly agree that safety pre-training is valuable (and have even considered working on it myself with some collaborators), I think several of the core claims here are false, and that ultimately one should not consider alignment to be solved.
TL;DR I think safety pre-training is probably a huge boost to alignment, but our work is far from done and there are still lots of issues / uncertainties.
Thank you!
Your post was also very good and I agree with its points. I'll probably edit my post in the near future to reference it along with some of the other good references your post had that I wasn't aware of.
Yes, it's still unclear how to measure modification magnitude in general (or whether that's even possible to do in a principled way). For modifications which are limited to text, though, you could use the entropy of the text, which seems to me like a fairly reasonable and somewhat fundamental measure (in the information-theoretic sense). Thank you for the references in your other comment, I'll make sure to give them a read!
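To make the entropy idea concrete, here is a minimal sketch of one way it could be operationalised. This is my own illustration, not anything from the original discussion: it scores a text modification by its total Shannon information in bits, using the empirical character distribution of the text itself as the model (a real implementation would likely use a stronger language model for the probabilities).

```python
from collections import Counter
from math import log2

def text_entropy_bits(text: str) -> float:
    """Total Shannon information of a string in bits, under the
    empirical character distribution of the string itself."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    # Per-character entropy: -sum p * log2(p) over the character frequencies.
    per_char = -sum((c / n) * log2(c / n) for c in counts.values())
    # Total bits = entropy rate times length.
    return per_char * n

# A longer or less predictable modification carries more bits, so it
# counts as a "larger" modification under this measure.
print(text_entropy_bits("aaaa"))  # fully predictable: 0.0 bits
print(text_entropy_bits("abcd"))  # 2 bits/char * 4 chars = 8.0 bits
```

One design note: using the text's own empirical distribution makes repetitive edits cheap and diverse edits expensive, which matches the intuition that magnitude should track information content rather than raw length.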
UPDATE:
The projects have been chosen! They are:
These markets will be left locked until their individual metrics are resolvable; all other markets, for the un-chosen projects, will be resolved N/A.
Thank you to everyone who traded on these markets, and special thanks to those who provided feedback about the research projects and the futarchy experiment itself.