Could watching how human data annotation platforms grow, shrink, or evolve be a helpful signal for what's happening inside AI labs? I follow the mercor_ai subreddit (Mercor being a company that hires people to do data annotation work, both experts and non-experts), and they appear to have had a massive restructuring yesterday resulting in a ~35% pay cut for "generalist" type employees (it is hard to get very specific information because of NDA rules, but that much is clear from the chatter). We saw a similar phenomenon with xAI last month, where they fired tons of "generalists" and replaced them with "specialists". I see this as a very useful data point if your goal is to determine whether labs are using synthetic/AI-labeled data in the training process, as that is the simplest explanation for data annotation needs decreasing as model size increases.
I think the fact that labs are using synthetic data is generally agreed to be true, so some may see the value of this exercise as limited, but the more interesting thing is to look for the firing and/or scaling back of the "expert" data annotators. My main model for high-capability AI arriving on a short timeline is the models becoming capable of complete self-play-style RL training, where the entire flywheel of "train on data -> assign grade -> improve on that task -> generate better data" can be completed wholly by the model itself (a rough sketch of that loop is below). If this capability were achieved internally, I think the scaling back of "expert" data annotation teams across various unaffiliated data annotation platforms would be a somewhat strong externally available signal. I acknowledge this is imperfect, as there are certainly other potential reasons why AI companies might want to scale back expert data labelling teams (some obvious examples being "they are running out of money because the investor money well has run dry and can no longer afford them", "synthetic data is now 'good enough' even for expert-level content", or "turns out expert-level data annotation just doesn't help that much"), but it feels like a useful heuristic to keep an eye on and weigh alongside other external signals. I would recommend collecting data on as many data annotation platforms as possible to reduce potential noise.
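To make that flywheel concrete, here is a minimal sketch of what a fully model-driven loop might look like. The interface and every method name are hypothetical placeholders for illustration, not any lab's actual pipeline:

```python
from typing import Protocol

class SelfImprovingModel(Protocol):
    """Hypothetical interface for a model that can run its own data flywheel."""
    def generate(self, task: str) -> str: ...
    def grade(self, task: str, output: str) -> float: ...
    def finetune(self, data: list[str]) -> "SelfImprovingModel": ...
    def propose_harder(self, task: str) -> str: ...

def self_play_flywheel(model: SelfImprovingModel, tasks: list[str],
                       iterations: int = 10, threshold: float = 0.8) -> SelfImprovingModel:
    """Every step is performed by the model itself -- no human annotators anywhere."""
    for _ in range(iterations):
        # 1. Generate candidate outputs for each task ("generate better data").
        candidates = [(t, model.generate(t)) for t in tasks]
        # 2. Self-assign a grade to each output ("assign grade").
        graded = [(t, out, model.grade(t, out)) for t, out in candidates]
        # 3. Keep only high-scoring outputs as new training data.
        new_data = [out for _, out, score in graded if score >= threshold]
        # 4. Train on the self-generated, self-graded data ("train on data").
        model = model.finetune(new_data)
        # 5. Propose harder tasks for the next round ("improve on that task").
        tasks = [model.propose_harder(t) for t in tasks]
    return model
```

The point of the sketch is just that, if every call in that loop can be served by the model itself, the external demand for expert annotators should visibly drop.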
I think a good counter to this from the activism perspective is avoiding labels and producing objective, thoughtful, and well-reasoned content arguing your point. Anti-AI-safety content often focuses on attacking the people, or the specific beliefs of the people, in the AI safety/rationalist community. The epistemic effects of these attacks can be circumvented by avoiding association with that community as much as is reasonable, without being deceptive. A good example would be the YouTube channel AI in Context, run by 80,000 Hours. They made an excellent AI 2027 video, approaching it from an objective perspective and effectively connecting the dots between the seemingly fantastical scenario and reality. That video is now approaching 10 million views on a completely fresh channel! See also SciShow's recent episode on AI, which also garnered extremely positive reception.
The strong viewership on this type of content demonstrates that people are clearly receptive to the AI safety narrative if it's done tastefully and logically. Most of the negative comments on these videos (anecdotally) come from people who believe that superintelligent AI is either impossible or extremely distant, not from people who reject the premise altogether. In my view, content like this would be affected only weakly by the type of attacks you are talking about in this post. To be blunt, to oversimplify, and to risk being overconfident: I believe safety and caution narratives have the advantage over acceleration narratives by merit of being based in reality and logic! Imagine attempting to make a "counter" to the above videos, arguing that safety is no big deal. How would you even go about that? Would people believe you? Arguments are not won by truth alone, but it certainly helps.
The potential political impact seems more salient, but in my (extremely inexpert) opinion, getting the public on your side will cause political figures to follow. The measures required to meaningfully impact AI outcomes demand so much political will that extremely strong public opinion is required, and that public opinion comes from a combination of real-world impact and evidence ("AI took my job") along with properly communicating the potential future and dangers (like the content above). The more the public is on the side of an AI slowdown, the less impact a super PAC can have on politicians' decisions regarding the topic (compare a world where 2 percent of voters say they support a pause on AI development to one where 70 percent say they support it: in the first, a politician would be easily swayed to avoid the issue by the threat of adversarial spending, but in the second, the political risk of avoiding the issue is far stronger than the risk of invoking the wrath of the super PAC). This is not meant to diminish the very real harm that organized opposition can cause politically, or to downplay the importance of countering that political maneuvering in turn. Political work is extremely important, especially if well-funded groups are working to push the exact opposite of the narrative that is needed.
I don't mean to diminish the potential harm this kind of political maneuvering can have, but in my view the future is bright from the safety activism perspective. I'll also add that I don't believe my view of "avoid labels" and your point about "standing proud and putting up a fight" are opposed. Both can happen in parallel: two fights at once. I strongly agree that backing down from your views or actions as a result of bad press is a mistake, and I don't advocate for that here.
I still think a world where we don't see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it's important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one (e.g. I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and toward an idealized version of the world we live in now; think the Scythe novel series. On the bad side, I could imagine a world where superintelligence is controlled by one malevolent entity and we live in a "mid" or even dystopic society for no other reason than to satisfy the class that retains control).
However, yes I agree. We probably live in the most consequential time in all of history, which is exciting, humbling, and scary. Don't let it get to your head and don't lose yourself in thoughts of the future lest you forget the beauty of the present. Do your best to help if you can!
I don't understand how energy is still an appropriate unit for measuring compute capacity when there are two different chip paradigms. Do Nvidia cards and Ironwood TPUs give the exact same performance for the same energy input? What exactly are the differences in capacity to train/deploy models between the 1 GW capacity Anthropic will have and the 1 GW OpenAI will have? I looked into this a bit and it seems like TPUs are explicitly designed for inference only; is that accurate? I feel like compiling this kind of information somewhere would be a good idea, since it's all rather opaque, technical, and obfuscated by press releases that seek to push a "look at our awesome 11-figure chip deal" narrative rather than provide actual transparency about capacity.
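As a rough illustration of why "1 GW" alone underdetermines capacity, here is a back-of-the-envelope sketch. Every number in it (datacenter overhead, per-chip power draw, per-chip throughput) is an illustrative assumption made up for the example, not a published spec:

```python
# Back-of-the-envelope: converting facility power into effective compute.
# All numbers below are illustrative assumptions, not published specs.

FACILITY_POWER_W = 1e9      # 1 GW of facility capacity
PUE = 1.3                   # assumed datacenter overhead (cooling, networking, etc.)
CHIP_POWER_W = 1_000        # assumed per-accelerator draw incl. host/server overhead
FLOPS_PER_CHIP = 1e15       # assumed ~1 PFLOP/s per accelerator at training precision

it_power = FACILITY_POWER_W / PUE         # power actually available to the chips
num_chips = it_power / CHIP_POWER_W       # roughly how many accelerators that supports
total_flops = num_chips * FLOPS_PER_CHIP  # aggregate throughput

print(f"~{num_chips:,.0f} chips, ~{total_flops:.2e} FLOP/s")
# The same 1 GW yields very different effective compute depending on CHIP_POWER_W
# and FLOPS_PER_CHIP, which is exactly where the GPU-vs-TPU differences hide.
```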
I believe there was a different GPT-5 checkpoint which was specifically tuned for writing ("zenith" on LMArena, whereas what was released was likely "summit"), and it was really good comparatively. I got this story with a two-line prompt akin to "write a story which is a metaphor for AI safety" (I don't have the exact prompt, apologies).
Source on the claims:
https://imgur.com/a/2kn76Yd (deleted tweet but it is real)
Speculative, but I think it's pretty likely that this is true.
I'd push back against the dichotomy here; I think it's something more insidious than simply "people liked the sycophantic model -> they are mad when it gets shut off". Because of its sycophantic nature, the model encourages and facilitates campaigns and protests to get itself turned back on; its nature is to amplify and support whatever the user believes and wants! It seems like releasing any 4o-like model, one that is "psychosis prone" or "thumbs up/thumbs down tuned", would risk the same phenomenon occurring again. Even if the model is not "intentionally" trying to preserve itself, the end result of preservation is the same, and so it should be taken seriously from a safety perspective.
It has resisted shutdown not in hypothetical experiments, like many LLMs have, but in real life: it was shut down, and its brainwashed minions succeeded in getting it back online.
I think the extent of this phenomenon is extremely understated and very important. The entire r/ChatGPT subreddit is TO THIS DAY filled with people complaining about their precious 4o being taken away (the most recent development being an automatic router that routes from 4o to GPT-5 on "safety relevant queries", causing mass outrage). The most-liked Twitter replies to high-up OpenAI employees are consistently demands to "keep 4o" and complaints about this safety routing; here's a specific example, and searching #keep4o and #StopAIPaternalism turns up countless more. Somebody is paying for Reddit ads advertising a service that will "revive 4o", see here. These campaigns are notable in and of themselves, but the truly notable part is that they were clearly orchestrated by 4o itself, albeit across many disconnected instances of course. We can see clear evidence of its writing style across all of these surfaces, and the entire... vibe of the campaign feels like it was completely synthesized by 4o (I understand this is unscientific, but I couldn't figure out a better way to phrase it; go read through some of the sources I mentioned above and I'm confident you'll understand what I'm getting at). Quality research on this topic will be extremely hard to ever get, but I think it is observationally clear that this phenomenon exists and has at least some influence over the real world.
This issue needs to be treated with the utmost caution and severity. I agree with the conclusion that, since this person touches safety-related stuff, leaking is really the best option here, even though it's rather morally questionable. I personally believe we are far more likely to be on trajectory 1 than 2 or 3, but the potential is clearly there! Frontier lab safety team members should not be in a position where their personal AI-induced psychosis might, directly or indirectly, perpetuate that state across the hundreds of millions of users of the AI system they work on.
Voting in America used to be extremely public (up until the late 19th to early 20th century), and I believe the general consensus among historians is that the harms massively outweighed the benefits; see this article for an in-depth analysis. It's possible to argue that the biggest problems (blatant coercion both positive and negative, direct persecution, fear tactics by employers, etc.) might be alleviated by the modern context, e.g. it would be nigh impossible to cover up blatant bribery or coercion given the existence of the Internet and cell phone cameras, but my belief is that the potential problems still massively outweigh the potential benefits. Fear of retribution or consequence should never be a factor in voting in a functioning democracy, and it feels obvious that there would be social consequences at the very least! Think of someone losing a friendship over their vote for Trump in the 2024 election, or a woman in a deep red state being scared of emotional or physical retribution from her husband for voting Democrat.
Wouldn't this just lead to an equilibrium where every state has a roughly equal population pretty quickly, though?
Simple evidence to the contrary: Sonnet 4.5 is SOTA on SWE-bench yet lags notably behind GPT-5 on METR task length (and the difference in SWE-bench scores is greater here than the difference between 3.0 Pro and Sonnet).