Thanks for this post! The two new thoughts for me are:
1. The RLVR era requires big-pod technology. Without it, no matter how many GPUs you have, you have no chance at the frontier.
2. Infrastructure takes time. So unless we see $10B+ in investment this year, we shouldn't expect frontier capabilities from this lab next year.
Hi, I want your opinion on my little experiment. I made a short AI-generated podcast from Zvi's latest posts.
The idea is to get a 15-minute summary that you can listen to while walking or doing chores. It works for me, but I'm not sure how much nuance I'm missing. What do you think? I'd really appreciate the feedback.
Seems so
https://thezvi.substack.com/p/zuckerbergs-dystopian-ai-vision
"In some ways this is a microcosm of key parts of the alignment problem. I can see the problems Zuckerberg thinks he is solving, the value he thinks or claims he is providing. I can think of versions of these approaches that would indeed be ‘friendly’ to actual humans, and make their lives better, and which could actually get built.
Instead, on top of the commercial incentives, all the thinking feels alien. The optimization targets are subtly wrong. There is the assumption that the map corresponds to the territory, that people will know what is good for them so any ‘choices’ you convince them to make must be good for them, no matter how distorted you make the landscape, without worry about addiction to Skinner boxes or myopia or other forms of predation. That the collective social dynamics of adding AI into the mix in these ways won’t get twisted in ways that make everyone worse off.
And of course, there’s the continuing to model the future world as similar and ignoring the actual implications of the level of machine intelligence we should expect.
I do think there are ways to do AI therapists, AI ‘friends,’ AI curation of feeds and AI coordination of social worlds, and so on, that contribute to human flourishing, that would be great, and that could totally be done by Meta. I do not expect it to be at all similar to the one Meta actually builds."
I'm surprised to see no discussion here or on Substack.
This is a well-structured article with accurate citations, clearly explained reasoning, and peer review, and it updates the best AGI timeline model we have.
I'm really confused.
I haven't deeply checked the logic to say whether the update is reasonable (that's exactly the kind of conversation I was expecting in the comments). But I agree that Davidson's model was previously the best estimate we had, and it's cool to see that this updated version explains why Dario/Sama are so confident.
Overall, this is excellent work, and I'm genuinely puzzled as to why it has received 10x fewer upvotes than the recent fictional 2y takeover scenario.
I can confirm that this is pretty much the best introduction to take you from 0 to about 80% in using AI.
It is intended for general users; don't expect technical information on how to use APIs or build apps.
TLDR my reaction is I don’t really know how good these models are right now.
I felt exactly the same after the Claude 3.7 post.
But actually... hasn't LiveBench solved the evals crisis?
It specifically targets the "subjective" and "cheating/hacking" problems.
It also covers a pretty broad set of capabilities.
The number of different benchmarks and metrics we are using to understand each new model is crazy. I'm so confused. The exec summary helps, but...
I don't think the relative difference between models is big enough to justify switching from the one you're currently used to.
Does this mean that Zvi doesn't read the comments on LW?
He seems to be much more active on Substack.
So, the most important things I've learned for myself are:
1. Sam was fired because of his sneaky attempts to get rid of some board members.
2. Sam didn't answer the question of why so many high-ranking people have left the company recently.
3. Sam missed the fact that, for some people, the safety focus was a major factor in deciding to join during the early hiring.
There seems to be enough evidence that he doesn't care about safety.
And he actively uses dark methods to accumulate power.
I want to pitch my blog. I'm writing about tech and AI from a business perspective.
Think of it like Ben Thompson's Stratechery, but with longer deep dives, a more conversational tone, and much less disregard for Safety.
My latest piece was a second look at Grok 3, after they released the API.