Posts

5 · Self-Adapting Language Models (from MIT, arXiv preprint) · 1mo · 1 comment
2 · Person's Shortform · 2y · 1 comment

Comments (sorted by newest)
OpenAI Claims IMO Gold Medal
Person · 21h · 133

I don't have the link, but DeepMind researchers on X seem to have tacitly confirmed that they had already reached gold. What we don't know is whether it was done with a general LLM, like OpenAI's, or with a narrower one.

Vladimir_Nesov's Shortform
Person · 1mo · 10

Do you have specific predictions/intuitions regarding the feasibility of what you describe and how strong the feedback loop could be?

Your post being about technical AI R&D automation capabilities immediately made me curious about timelines, since those are what I'm somewhat worried about.

Also, would Sakana AI's recent work on adaptive Text-to-LoRA systems count towards what you're describing?

Absolute Zero: Alpha Zero for LLM
Person · 2mo · 10

Thank you for the quick reply.

Absolute Zero: Alpha Zero for LLM
Person · 2mo · 10

That paper is contradicted by this new NVIDIA paper, which shows the opposite using a 1.5B distill of DeepSeek R1. I don't have much technical knowledge, so a deep dive by someone more knowledgeable would be appreciated, especially in comparison with the Tsinghua paper.

Cole Wyeth's Shortform
Person · 2mo · 10

Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI. But I do have some quick thoughts:

Kernel optimization (which they claim is what produced the 1% reduction in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).

It seems to me like AlphaEvolve is more or less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor), notably through better base Gemini models and a better agentic framework. We also know that AI models already contribute to improving AI hardware. What AlphaEvolve seems to do is unify all of that into a single superhuman system for those multiple uses. In the accompanying podcast they give some further information:

  • The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
  • They have not tried to distill all that data into a new model yet, which seems strange to me considering they've had it for a year now.
  • They say that a lot of improvements come from the base model's quality.
  • They do present the whole thing as part of research rather than as a product.

So yeah, I can definitely see a path to large gains in the future, though for now those are still on similar timetables, by their own admission. They expect further improvements when base models improve and hope that future versions of AlphaEvolve can in turn shorten model training time, speed up the hardware pipeline, and improve models in other ways. As for your point about novel discoveries, previous Alpha models already seemed able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of the same class.

This is also a small thing to keep in mind, but GDM doesn't often share the actual results of its models' work as usable/replicable papers, which has caused experts to cast some doubt on results in the past. It's hard to verify their results, since they keep them close to their chests.
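
On the distillation point above: here is roughly what I imagine that step would look like, plain supervised fine-tuning on (task, best discovered program) pairs. The model name and data below are placeholders of my own, not anything DeepMind has described:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-Coder-1.5B"   # stand-in open model, not the real base
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Placeholder data: task specs paired with the best programs an
    # AlphaEvolve-style search found for them.
    pairs = [
        ("Speed up this matmul kernel for a 4096x4096 workload.",
         "def kernel():\n    ..."),
    ]

    model.train()
    for task, program in pairs:
        text = f"### Task\n{task}\n### Solution\n{program}{tok.eos_token}"
        batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
        # Standard causal-LM loss: the model learns to reproduce the
        # discovered program given the task description.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()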

AI 2027: What Superintelligence Looks Like
Person · 3mo · 40

Thanks for the clarification.

Side question, but you had recently moved your AGI median from 2027 to 2028 after updating on Grok 3 and GPT-4.5. Has this changed, especially with Gemini 2.5 and o3/o4-mini, plus these new METR data points?

Person's Shortform
Person · 2y · 10

Google DeepMind's recent FunSearch system seems pretty important. I'd really appreciate it if people with domain knowledge could dissect this:

Blog post: https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/Mathematical-discoveries-from-program-search-with-large-language-models.pdf

Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements (Bang et al., 2023; Borji, 2023). This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches (Lehman et al., 2022). Applying FunSearch to a central problem in extremal combinatorics — the cap set problem — we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. This represents the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
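
To make the mechanism concrete, here is a minimal sketch of the loop the abstract describes: an LLM proposes program variants, a systematic evaluator scores them, and the highest-scoring programs survive to seed the next round. Everything below (llm_propose, the toy bin-packing evaluator) is my own placeholder, not DeepMind's code:

    import random

    def evaluate(program_src: str) -> float:
        """Systematic evaluator: run a candidate heuristic on a fixed online
        bin-packing instance and score it (fewer bins used is better)."""
        namespace = {}
        try:
            exec(program_src, namespace)        # sandbox this in any real use
            choose_bin = namespace["choose_bin"]
            rng = random.Random(0)              # fixed instance -> stable scores
            items = [rng.uniform(0.1, 0.9) for _ in range(200)]
            bins = []
            for item in items:
                i = choose_bin(item, bins)
                if i is None:
                    bins.append(item)           # open a new bin
                else:
                    bins[i] += item
            return -len(bins)
        except Exception:
            return float("-inf")                # broken programs rank last

    def llm_propose(parents):
        """Stand-in for the pretrained LLM. FunSearch prompts the model with
        high-scoring parent programs and asks for a variant; this placeholder
        just returns a copy of one parent."""
        return random.choice(parents)

    SEED = '''
    def choose_bin(item, bins):
        # first fit: first bin with room, else None to signal a new bin
        for i, load in enumerate(bins):
            if load + item <= 1.0:
                return i
        return None
    '''

    database = [SEED]                           # program database
    for _ in range(100):                        # evolutionary rounds
        database.append(llm_propose(database))  # the LLM is the mutation operator
        database.sort(key=evaluate, reverse=True)
        database = database[:10]                # keep only the best programs
    print(evaluate(database[0]))

Per the paper, the real system adds an island-based program database and massively parallel sampling, but the skeleton is this same propose-evaluate-select loop.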

Google Gemini Announced
Person · 2y · 173

https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

AlphaCode 2, which is powered by Gemini Pro, seems like a big deal. 

AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.

Seems important for speeding up coders or even for model self-improvement, unless competitive-programming benchmarks are misleading about actual applicability to ML training.
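
For intuition, here is a rough sketch of the sample-filter-cluster-rerank recipe the report describes. The model_sample, scoring_model, and run functions are stubs of my own; the real system samples candidates at far larger scale from fine-tuned Gemini models:

    from collections import defaultdict

    def model_sample(problem):
        # Stub for the fine-tuned policy model; returns one candidate program.
        return "def solve(x):\n    return x"

    def scoring_model(problem, program):
        # Stub for the learned reranker; higher means more likely correct.
        return 0.0

    def run(program, test_input):
        # Stub runner; a real system executes candidates in a sandbox.
        namespace = {}
        exec(program, namespace)
        return namespace["solve"](test_input)

    def alphacode2_solve(problem, public_tests, trial_inputs,
                         n_samples=1000, k=10):
        # 1. Sample many candidate programs from the policy model.
        candidates = [model_sample(problem) for _ in range(n_samples)]
        # 2. Filter: keep only candidates that pass the public example tests.
        passing = [c for c in candidates
                   if all(run(c, x) == y for x, y in public_tests)]
        # 3. Cluster: group candidates by behaviour on extra generated inputs,
        #    so semantically identical programs are only counted once.
        clusters = defaultdict(list)
        for c in passing:
            clusters[tuple(run(c, x) for x in trial_inputs)].append(c)
        # 4. Rerank: score one representative per cluster, return the top k.
        ranked = sorted(clusters.values(),
                        key=lambda cl: scoring_model(problem, cl[0]),
                        reverse=True)
        return [cl[0] for cl in ranked[:k]]

    print(alphacode2_solve("echo the input", [(1, 1), (2, 2)], [3, 4]))

The clustering step is what makes massive sampling viable: the reranking and submission budget is only spent on behaviourally distinct programs.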

Possible OpenAI's Q* breakthrough and DeepMind's AlphaGo-type systems plus LLMs
Person · 2y · 10

"I also think the thing in question is not in fact an extremely important breakthrough that paves the path to imminent AGI anyway"

Could you explain this assessment please? I am not knowledgeable at all on the subject, so I cannot intuit the validity of the breakthrough claim.

Sam Altman fired from OpenAI
Person · 2y · 150

I couldn't remember where from, but I know that Ilya Sutskever at least takes x-risk seriously. I remember him recently going public about how failing at alignment would essentially mean doom. I think it was published as an article on a news site rather than as one of the interviews he usually does. Someone with a far better memory than me could find it.

EDIT: Nevermind, found them.
