Google DeemMind's recent FunSearch system seems pretty important, I'd really appreciate people with domain knowledge to disect this:
Blog post: https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/
Large Language Models (LLMs) have demonstrated tremendous capabilities in solving complex tasks, from quantitative reasoning to understanding natural language. However, LLMs sometimes suffer from confabulations (or hallucinations) which can result in them making plausible but incorrect statements (Bang et al., 2023; Borji, 2023). This hinders the use of current large models in scientific discovery. Here we introduce FunSearch (short for searching in the function space), an evolutionary procedure based on pairing a pre-trained LLM with a systematic evaluator. We demonstrate the effectiveness of this approach to surpass the best known results in important problems, pushing the boundary of existing LLM-based approaches (Lehman et al., 2022). Applying FunSearch to a central problem in extremal combinatorics — the cap set problem — we discover new constructions of large cap sets going beyond the best known ones, both in finite dimensional and asymptotic cases. This represents the first discoveries made for established open problems using LLMs. We showcase the generality of FunSearch by applying it to an algorithmic problem, online bin packing, finding new heuristics that improve upon widely used baselines. In contrast to most computer search approaches, FunSearch searches for programs that describe how to solve a problem, rather than what the solution is. Beyond being an effective and scalable strategy, discovered programs tend to be more interpretable than raw solutions, enabling feedback loops between domain experts and FunSearch, and the deployment of such programs in real-world applications.
AlphaCode 2, which is powered by Gemini Pro, seems like a big deal.
AlphaCode (Li et al., 2022) was the first AI system to perform at the level of the median competitor in competitive programming, a difficult reasoning task involving advanced maths, logic and computer science. This paper introduces AlphaCode 2, a new and enhanced system with massively improved performance, powered by Gemini (Gemini Team, Google, 2023). AlphaCode 2 relies on the combination of powerful language models and a bespoke search and reranking mechanism. When evaluated on the same platform as the original AlphaCode, we found that AlphaCode 2 solved 1.7× more problems, and performed better than 85% of competition participants.
Seems important for speeding up coders or even model self-improvement, unless competitive coding benchmarks are deceptive for actual applications for ML training.
I also think the thing in question is not in fact an extremely important breakthrough that paves the path to imminent AGI anyway
Could you explain this assessment please? I am not knowledgeable at all on the subject, so I cannot intuit the validity of the breakthrough claim.
I couldn't remember where from, but I know that Ilya Sutskever at least takes x-risk seriously. I remember him recently going public about how failing alignment would essentially mean doom. I think it was published as an article on a news site rather than an interview, which are what he usually does. Someone with a way better memory than me could find it.
EDIT: Nevermind, found them.
How the AI can give new abilities to humans (the author of this post is incapable of writing novels or making paintings, yet here we are).
(Not a serious comment, just a passing remark)
At the point where the AI is making every step of it and the human has barely any actual contribution, I'm curious to see whether the standard for "artistic ability" will be loosened or if the pendulum will swing the other way, where artistic worth will have a bigger basis in craft, skill and effort, which (my intuition) seems like how artistic worth was determined back in the Renaissance for example.
I am trying to see if it is true. I need other people to help me alongside.
The whole thing generated enough buzz that Sam Altmann himself debunked it in a reddit comment (fitting, he was CEO of reddit at one point after all).
People say that he made correct predictions in the past.
His past predictions are either easily explained by a common trick used by sports fans on twitter, or have very shaky evidence for them since he keeps deleting his posts every few months, leaving us with 3rd party sources. Also, I wouldn't a priori consider "GPT-5 finished training in October 2022 with 125T parameters" a correct prediction.
Or that he was genuinely just making things up and tricking us for fun, and a cryptic exit is a perfect way to leave the scene. I really think people are looking way too deep into him and ignoring the more outlandish predictions he's made (125T GPT-4 and 5 in October 2022), along with the fact there is never actual evidence of his accurate ones, only 2nd hand very specific and selective archives.
Predicting the GPT-4 launch date can easily be disproven with the confidence game. It's possible he just created a prediction for every day and deleted the ones that didn't turn out right.
For the Gobi prediction it's tricky. The only evidence is the Threadreader and a random screenshot from a guy who seems clearly related to jim. I am very suspicious of the Threadreader one. On one hand I don't see a way it can be faked, but it's very suspicious that the Gobi prediction is Jimmy's only post that was saved there despite him making an even bigger bombshell "prediction". It's also possible, though unlikely, that the Information's article somehow found his tweet and used it as a source for their article.
What kills Jimmy's credibility for me is his prediction back in January (you can use the Wayback Machine to find it) that OAI had finished training GPT-5, no not a GPT-5 level system, the ACTUAL GPT-5 in October 2022 and that it was 125T parameters.
Also goes without saying, pruning his entire account is suspicious too.
Occasionally reading what OSS AI gurus say, they definitely overhype their stuff constantly. The ones who make big claims and try to hype people up are often venture entrepreneur guys rather than actual ML engineers.
Because of LW, I genuinely get frustrated when other forums I browse don't just copy the UI. It's just too good.