Claude 4 feels pretty weak compared to what I would have guessed Claude 4 would look like a year ago. It makes little progress on most benchmarks, even with a lot of tricks used to exaggerate performance. Gemini 2.5 Pro feels a bit stronger, but not by much. (It feels stronger because they didn't call it Gemini 3, not because it's particularly stronger than Claude.)
Current methods have definitely hit a wall, yet AGI simultaneously feels pretty close. Strange timeline to be in. I predict progress will come in a jump after the next breakthrough.
https://x.com/rwang07/status/1924658336600854632
Other countries adopting Chinese hardware may mean this was basically the US being forced to sell its GPUs to keep China from taking advantage of economies of scale.
Have you heard the idea of training the model across a range of constants when your simulator's constants are off from the physical world? If the coefficient of friction changed a bit in the real world, I doubt humans would suddenly forget how to move; they'd adjust pretty quickly. Making a model tolerant to the plausible range of sim2real errors might be possible without an accurate simulation or hand-crafted heuristics.
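A minimal sketch of that setup (this is the standard domain-randomization trick; the environment class, attribute names, and friction range here are made up for illustration):

```python
import random

# Hypothetical sketch of domain randomization: instead of training against one
# (possibly wrong) friction coefficient, resample it from a plausible range
# every episode, so the policy learns behavior that tolerates the whole range.

FRICTION_RANGE = (0.3, 0.9)  # assumed band around the unknown real value

class SimEnv:
    """Toy stand-in for a physics simulator."""
    def __init__(self, friction: float):
        self.friction = friction

    def reset(self) -> float:
        return 0.0  # dummy observation

def sample_env() -> SimEnv:
    # New physical constants each episode -> robustness instead of accuracy.
    return SimEnv(friction=random.uniform(*FRICTION_RANGE))

for episode in range(1000):
    env = sample_env()
    obs = env.reset()
    # ...collect a rollout and update the policy here...
```

The point is that the policy never gets to overfit to one exact constant, so a moderate sim2real gap looks like just another sample from the training distribution.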
Yeah, but aren't false positives also a problem here?
Do they? I thought they did well on the easier section.
I'd argue OpenAI weaning itself off Microsoft is a sign of strength. They no longer need to give up immense future profits to a big tech company in exchange for backing; they have shown the ability to raise again and again, and they just had the largest funding round in history. They are also probably the fastest-growing company, revenue-wise, in human history. Doom and gloom seems a bit premature.
I wonder if giving it an example of the intended translated writing style helps.
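If you wanted to test that, a hypothetical few-shot prompt might look something like this (the placeholder excerpts are left unfilled and the model call is omitted; nothing here is a specific API):

```python
# Hypothetical prompt: show the model a sample of the target translated style
# before asking for the real translation, then send it to whatever model
# you're evaluating.
prompt = """Translate the passage below into English.
Match the voice and style of this example translation:

Example source: <short source-language excerpt>
Example translation: <the same excerpt rendered in the intended style>

Passage: <text to translate>
"""
```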
Not just central banks, but also the U.S. going off the gold standard and then fiddling with bond yields to cover up the ensuing inflation, maybe?
Vibe check: Metaculus's track record on resolved AI questions seems worse than you would expect. I haven't calculated any real scores, but many predictions sat at 50%+ for a while and then resolved the other way. Naturally, as predictions approach resolution without the event happening, their odds should drift down, but my gut says the record still looks quite bad.
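If someone wanted to turn that vibe into a number, a Brier score over resolved binary questions would be the obvious first pass (the probabilities and outcomes below are made up purely for illustration):

```python
# Brier score over resolved binary questions: mean squared error between the
# stated probability and the 0/1 outcome. Lower is better; always guessing
# 50% scores exactly 0.25.

# (community probability shortly before resolution, actual outcome) -- fake data
resolved = [
    (0.62, 0),  # said 62%, resolved "no"
    (0.55, 0),
    (0.30, 1),
    (0.80, 1),
]

brier = sum((p - y) ** 2 for p, y in resolved) / len(resolved)
print(f"Brier score: {brier:.3f}")  # here ~0.304, i.e. worse than coin-flipping
```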
It's not clear which direction the bias ultimately runs. Forecasters seem to overestimate how much US politicians will care about AI and how strong contest-programming capabilities will get, while simultaneously underestimating how much revenue AI will generate and how high MATH scores will climb.