What this means in practice is that the "entry-level" positions are practically impossible for "entry-level" people to enter.
This problem in and of itself is extremely important to solve!
The pipeline is currently: a university group or some set of intro-level EA/AIS reading materials gets a young person excited about AI safety. The call to action is always to pursue a career in AI safety, and part of the reasoning is that very few people are currently working on the problem (it is neglected!). Then they try to help out, but their applications keep getting rejected.
I believe we should:
Strong upvote.
I'm biased here because I'm mentioned in the post, but I found this extremely useful in framing how we think about EM evals. Obviously this post doesn't present some kind of novel breakthrough or any flashy results, but it does present clarity, which is an equally important intellectual contribution.
A few things that are great about this post:
Random final thought: It's interesting to me that you got any measurable EM with a batch size of 32 on such a small model. My experience is that you sometimes need a very small batch size to get the most coherent and misaligned results, and some papers use a batch size of 2 or 4, so I suspect their authors have had similar experiences. It would be interesting (not saying this is a good use of your time) to rerun everything with a batch size of 2 and see whether this affects things.
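For concreteness, here's a minimal sketch of what that change amounts to under a Hugging Face Trainer-style setup (which may or may not match your actual training code; the configs below are assumed for illustration only). The effective batch size is the per-device batch size times gradient accumulation steps times the number of devices:

```python
# Hedged sketch: two hypothetical fine-tuning configs differing only in effective batch size.
from transformers import TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps * num_devices.
args_bs32 = TrainingArguments(
    output_dir="em-finetune-bs32",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # 8 * 4 = effective batch size 32 on one device
    learning_rate=1e-5,
    num_train_epochs=1,
)

args_bs2 = TrainingArguments(
    output_dir="em-finetune-bs2",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,  # effective batch size 2 on one device
    learning_rate=1e-5,
    num_train_epochs=1,
)
```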
Thank you! These are very interesting.
Does anyone have good examples of “anomalous” LessWrong comments?
That is, are there comments with +50 karma but -50 agree/disagree points? Likewise, are there examples with -25 karma but +25 agree/disagree points?
It is entirely natural that karma and agreement would be correlated, but comments that are especially out of distribution on these two axes seem like they would be interesting to look at.
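To make what I mean by "out of distribution" concrete, here is a quick sketch of the filter I have in mind, assuming one already has comment data exported with karma and net agreement columns (the file name and column names below are hypothetical, not an actual LessWrong export format):

```python
# Hedged sketch: flag comments whose karma and agreement scores point in opposite directions.
import pandas as pd

df = pd.read_csv("comments.csv")  # assumed columns: comment_id, karma, agreement

anomalous = df[
    ((df["karma"] >= 50) & (df["agreement"] <= -50))    # upvoted but net-disagreed with
    | ((df["karma"] <= -25) & (df["agreement"] >= 25))  # downvoted but net-agreed with
]
print(anomalous[["comment_id", "karma", "agreement"]].to_string(index=False))
```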
I like the original post and I like this one as well. I don't need convincing that x-risk from AI is a serious problem. I have believed this since my sophomore year of high school (which is now 6 years ago!).
However, I worry that readers are going to look at this post and the original, and use the karma and the sentiment of the comments to update on how worried they should be about 2026. There is a strong selection effect for people who post, comment, and upvote on LessWrong, and there are plenty of people who have thought seriously about x-risk from AI and decided not to worry about it. They just don't use LessWrong much.
This is all to say that there is plenty of value in people writing about how they feel and having the community engage with those posts. I just don't think anyone should take what they see in the posts or the comments as evidence that it would be more rational to feel less OK.
As someone who has been to this reading group several times, my take is that the quality of discussion was good/detailed enough that having wrestled with the reading beforehand was a prerequisite to participating in a non-trivial way. From my perspective, the stated expectation was closer to "read what you can, and it's not a big deal if you can't read anything," but I wanted to be able to follow every part of the discussion, so I started doing the readings by default.
The original comment says 10-25, not 10-15, but to respond directly to the concern: my original estimate here is for how long it would take to set everything up and get a sense of how robust the findings of a given paper are. Writing everything up, communicating back and forth with the original authors, and fact-checking would admittedly take more time.
Also, excited to see the post! Would be interested in speaking with you further about this line of work.
Awesome! Thank you for this comment! I'm 95% confident the UChicago Existential Risk Lab would fiscally sponsor this if funding came from SFF, OpenPhil, or an individual donor. That would probably be the fastest way to get this started by a trustworthy organization (one piece of evidence of trustworthiness is that OpenPhil consistently gives the UChicago Existential Risk Lab reasonably big grants).
This is fantastic! Thank you so much for the interest.
Even if you do not end up supporting this financially, I think it is hugely impactful for someone like you to endorse the idea, so I'm extremely grateful, even just for the comment.
I'll make some kind of plan/proposal in the next 3-4 weeks and try to scout people who may want to be involved. After I have a more concrete idea of what this would look like, I'll contact you and others who may be interested to raise some small sum for a pilot (probably ~$50k).
Thank you again Daniel. This is so cool!
My colleagues and I are finding it difficult to replicate results from several well-received AI safety papers. Last week, I was working with a paper that has over 100 karma on LessWrong and discovered that its claims are mostly false: the nice-looking statistics hold only because of a very specific evaluation setup. Some other papers have even worse issues.
I know that this is a well-known problem that exists in other fields as well, but I can’t help but be extremely annoyed. The most frustrating part is that this problem should be solvable. If a junior-level person can spend 10-25 hours working with a paper and confirm how solid the results are, why don’t we fund people to actually just do that?
For ~$200k a year, a small team of early-career people could replicate/confirm the results of a healthy majority of important safety papers. I'm tempted to start an org/team to do this. Is there something I'm missing?
EDIT: I originally said "over 100 upvotes" but changed it to "over 100 karma." Thank you to @habryka for flagging that this was confusing.