This is a linkpost for ai-plans.com

This is the current idea of how the points system for AI-plans.com works. There are still problems to be worked out, which I'd very much like some input on.

An overview of how the points system works: 
Criticisms 
Each criticism has one metric: how many criticism points it has.
Users have the option to 'upvote' or 'downvote' a criticism.
Downvote points and upvote points have the same value, just in opposite directions: downvotes decrease criticism points and upvotes increase criticism points.
Users do not have the option to vote on their own criticisms.
Users can begin with an A*B number of points. An xN number of downvote points will lower the criticizer's karma by N points, and an xN number of upvote points will raise the criticizer's karma by N points.

Users can start off with an arbitrary amount of karma if they link their arXiv, Alignment Forum or other such account. A moderator then goes over it, checks that they are who they say they are and that they've done work in alignment, and approves it.

There will be a low limit on how many points such a user can start out with, because the skills for doing good AI research can be very different from the skills for actually doing alignment work, and the skills for doing some alignment work may not correlate with the skill of making and judging good criticisms. Currently, I'm thinking 50-100.

Users without prior research to show can gain 5 karma by passing a small, timed alignment quiz. This will be a one-time thing for each user.

The lowest a user's karma can get is 0. A user's karma acts as a multiplier for their vote.
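The voting rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual implementation: the names `User`, `Criticism`, `cast_vote` and `adjust_karma` are hypothetical, and a vote is assumed to be worth exactly the voter's karma (a base vote of 1 times the karma multiplier).

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    karma: float = 0.0  # the rules above put the floor at 0


@dataclass
class Criticism:
    author: User
    points: float = 0.0  # the single metric: criticism points


def cast_vote(voter: User, criticism: Criticism, direction: int) -> float:
    """Apply an upvote (+1) or downvote (-1), weighted by the voter's karma."""
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (upvote) or -1 (downvote)")
    if voter is criticism.author:
        raise PermissionError("users cannot vote on their own criticisms")
    # Assumption: the vote is worth 1 point times the voter's karma.
    delta = direction * voter.karma
    criticism.points += delta
    return delta


def adjust_karma(user: User, change: float) -> None:
    """Change a user's karma, never letting it drop below 0."""
    user.karma = max(0.0, user.karma + change)
```

Under this reading, a user whose karma is 0 casts votes worth 0 points, which may or may not be intended.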

Plans 

Each plan has two metrics: its rank number and the total number of criticism points it has.

The total number of criticism points a plan has is the sum of the criticism points (total upvote points - total downvote points) of each criticism of the plan. E.g.

Plan X has 4 criticisms, one with a sum of 20 points, two with a sum of 12 points and one with a sum of 10 points, so it has a total of 42 points.

Suppose this makes Plan X the 10th ranked plan out of 100 plans. 

Plan X then has a coefficient m of 100/10 = 10: when a criticism of Plan X is upvoted, it gets the upvote points + m.
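A minimal sketch of the plan metrics described above, assuming the coefficient is simply the number of plans divided by the plan's rank, as in the 100/10 example. The function names and the input format (one `(upvote points, downvote points)` pair per criticism) are hypothetical.

```python
def plan_total(criticisms: list[tuple[float, float]]) -> float:
    """Sum (upvote points - downvote points) over all criticisms of a plan."""
    return sum(up - down for up, down in criticisms)


def rank_coefficient(rank: int, n_plans: int) -> float:
    """Coefficient m = number of plans / rank, e.g. 100 / 10 for the 10th of 100."""
    if not 1 <= rank <= n_plans:
        raise ValueError("rank must be between 1 and n_plans")
    return n_plans / rank


def upvote_value(upvote_points: float, rank: int, n_plans: int) -> float:
    """Points a criticism receives from an upvote: the upvote points + m."""
    return upvote_points + rank_coefficient(rank, n_plans)


# The 10th-ranked plan out of 100 has m = 10, so a 5-point upvote
# on one of its criticisms would count as 15 points.
m = rank_coefficient(10, 100)
boosted = upvote_value(5, 10, 100)
```

Note that under this formula m grows as the rank improves: the top-ranked of 100 plans would have m = 100.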

4 comments

I think the plan for critique voting and user karma is fine, but the way this is used to rank AI plans is highly incomplete.

The problem is that good critiques are not the same as severe problems with the plan. If someone writes an excellent critique of an alignment plan, but it's a critique of a minor flaw in an otherwise excellent plan, that critique will and should get upvotes; but the alignment plan it's attached to shouldn't effectively get that many downvotes.

Hi. As I understand it, the point system has two main outcomes: changes in Plan points, and changes in user karma. (Criticism points are a means to both.)

Let me address the proposed changes in user karma based on criticism up/downvotes. The original suggestion is that "downvote points will lower the criticizers karma by N point, an xN number of upvote points will raise the criticizer's karma". In this case most criticisms will end up increasing the user's karma, some more, some less, but the balance will almost always be towards upvotes, because people are friendly: they are more ready to upvote than to downvote. The final result is that even if you are doing rather bad criticism (say, around the bottom 25%), you always get a little positive karma, and if you are busy enough you end up as a superstar while in fact you are bad. So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)

It needs to be defined when the user karma impact becomes effective, for example 2 weeks after the criticism was placed. Then it may be necessary to evaluate and amend the karma impact once more, maybe after 8 weeks.

The main issue with the system will be a low level of user interaction in terms of criticising and voting. I went through this when I designed and operated a similar system. Some measures must be identified to overcome this. Here are three proposals:

I would add karma points just for the action of criticising and voting. 

For users in high karma range I would encourage to do much criticising and especially voting. For this reason I would apply a constant monthly karma reduction on them which can only be undone by sufficient karma collected through criticising and voting.

For the most active people (top 20% by number of votes in the given month) who are also in the higher karma range (top 50%), some additional appreciation is needed, such as publishing a list of heroes or similar.
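The monthly reduction in the second proposal could look roughly like the sketch below. The decay amount and the cutoff for "high karma" are not given in the comment, so `decay` and `high_karma_threshold` are hypothetical placeholders, as is the function name.

```python
def monthly_karma_update(
    karma: float,
    activity_karma: float,
    decay: float = 2.0,                  # hypothetical constant monthly reduction
    high_karma_threshold: float = 50.0,  # hypothetical cutoff for "high karma"
) -> float:
    """Apply one month of karma changes for a single user.

    Users at or above the threshold lose a constant amount each month; the
    karma earned that month from criticising and voting is then added back.
    Karma never drops below 0.
    """
    if karma >= high_karma_threshold:
        karma -= decay
    return max(0.0, karma + activity_karma)
```

With these placeholder values, a user at 60 karma who stays inactive for a month drops to 58, while one who earns 2 karma through activity stays at 60.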

Thank you very much for this. 
I agree, it does seem like this way, people will end up getting a bunch of karma even for bad criticisms. Which would defeat the whole point of the points system.

I'm not sure I fully understand "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)"

I think there's a lot of merit in affecting karma points just for the action of criticism and voting. Perhaps 1 karma for every criticism that has net positive votes? And perhaps 1 karma for the first 5 votes, 25 votes, 125 votes etc? 
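To make the two ideas above concrete, here is a small sketch. It assumes the vote milestones continue as powers of 5 (5, 25, 125, 625, ...) and that each one is worth 1 karma; both function names are hypothetical.

```python
def vote_milestone_karma(votes_cast: int, base: int = 5) -> int:
    """Karma earned from voting milestones at base, base**2, base**3, ... votes."""
    earned = 0
    threshold = base
    while threshold <= votes_cast:
        earned += 1
        threshold *= base
    return earned


def criticism_karma(criticism_points: list[float]) -> int:
    """1 karma for every criticism whose net points are positive."""
    return sum(1 for points in criticism_points if points > 0)
```

Under these assumptions, a user who has cast 130 votes has passed three milestones (5, 25 and 125), so later milestones get progressively harder to reach.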

"For users in high karma range I would encourage to do much criticising and especially voting. For this reason I would apply a constant monthly karma reduction on them which can only be undone by sufficient karma collected through criticising and voting." This is really interesting. We've been talking about ideas such as karma dissolving after some time, karma turning into a different form that doesn't give weight, or creating ways to spend karma (perhaps to give a criticism a stronger upvote/downvote?).

Let me explain this suggestion of mine: "So I would rather make sure that the bottom half of criticism gets an increasing potential for negative karma impact, by applying a weight on the upvote points starting from 1 for the median criticism, and progressing towards 0 for the worst criticism. (goodness can be measured as unweighted votes divided by number of votes.)"

Let me explain with an example. Suppose 800 criticisms arrived in January 2024 in total. All have their upvote/downvote based points (let us say as of 15 Feb); let me call these "raw points". We put them in order of increasing raw points. Let the worst be -5, the 100th 5, the 400th (the middle one) 25, and the top one 110. Now a multiplier "m" is calculated for the bottom 400 criticisms: m = 1-(400-x)/400, where x is the rank of the criticism, so x=1 for the worst one and x=100 for the 100th one.

Now, for example, the worst criticism had -5 raw points, calculated as upvote points minus downvote points (raw = up - down). Let us assume total upvote points of 10 and total downvote points of 15, so -5 = 10-15. We now apply the multiplier: final points = m*up - down. In this example, final points = (1-399/400)*10 - 15 = 0.025 - 15 = -14.975. So the final points will be approximately -15, because a heavy multiplier has decreased the value of the upvotes.
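The calculation above can be written out as a short sketch. It assumes that criticisms ranked above the median keep a multiplier of 1 (the comment only defines m for the bottom half); the function names are hypothetical.

```python
def upvote_multiplier(rank: int, n_criticisms: int) -> float:
    """Multiplier m for a criticism ranked `rank` (1 = worst) among n criticisms.

    Bottom half: m = 1 - (half - rank) / half, rising to 1 at the median.
    Top half: assumed to be 1 (upvotes count in full).
    """
    half = n_criticisms // 2
    if rank > half:
        return 1.0
    return 1 - (half - rank) / half


def final_points(up: float, down: float, rank: int, n_criticisms: int) -> float:
    """Final points = m * up - down, as in the worked example."""
    return upvote_multiplier(rank, n_criticisms) * up - down


# Worst of 800 criticisms, with 10 upvote points and 15 downvote points:
worst = final_points(up=10, down=15, rank=1, n_criticisms=800)  # about -14.975
```

For the 100th criticism the same formula gives m = 1 - 300/400 = 0.25, so only a quarter of its upvote points would count.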