If Gemini 3.5 Flash is running on TPUv7, it could be a big model (multiple trillions of total params). Suspiciously, they are mostly talking about the big-pod configurations for this TPU (with up to 2048 chips from 9216-chip pods), even though 256-chip pods were also initially announced, so possibly most of the TPUv7 compute is in the form of big pods. Since Anthropic is due to get 1 GW of TPUv7 this year, Google will almost certainly get at least as much a bit earlier. And OpenAI and Anthropic were at 1-2 GW at the end of last year, meaning even 0.5 GW of TPUv7 compute is currently a lot.
So it's plausible there's already enough TPUv7 for Gemini 3.5 Flash. These chips need about 26 ms for the 192 GB of HBM to go through the 7.4 TB/s of bandwidth, and multi-token prediction might give 3x faster decode on top of that. That's still only about 120 tokens/second, not 200-300 tokens/second, but it might get there if more than half of HBM stays relatively unused. This is a significant improvement over TPUv6, which only had the weirdly small 32 GB of HBM per chip (at 1.6 TB/s, so 20 ms to read), with 256-chip scale-up configurations, which is 8 TB in total (Gemini 3 Pro deployments might be using more than one of these, hence the lower speed).
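The decode-speed arithmetic above can be reproduced with a short sketch. It assumes, as the paragraph implicitly does, that each decode step streams the occupied part of HBM once and that nothing else limits speed. The function names and the `used_fraction` parameter are hypothetical, introduced only for illustration.

```python
# Back-of-envelope decode ceiling from HBM size and bandwidth.
# Assumption (labeled): one decode step reads the occupied HBM exactly once,
# and bandwidth is the only bottleneck. Function names are illustrative.

def hbm_read_ms(hbm_gb: float, bandwidth_tb_s: float, used_fraction: float = 1.0) -> float:
    """Milliseconds needed to stream the used fraction of HBM once."""
    bandwidth_gb_s = bandwidth_tb_s * 1000.0
    return hbm_gb * used_fraction / bandwidth_gb_s * 1000.0


def decode_tokens_per_s(read_ms: float, mtp_speedup: float = 1.0) -> float:
    """Tokens per second if each step takes read_ms, scaled by a multi-token-prediction factor."""
    return 1000.0 / read_ms * mtp_speedup


tpu_v7_ms = hbm_read_ms(192, 7.4)  # roughly 25-26 ms for a full HBM
tpu_v6_ms = hbm_read_ms(32, 1.6)   # 20 ms for a full HBM

# With the 3x multi-token-prediction speedup the text speculates about:
full_hbm_rate = decode_tokens_per_s(tpu_v7_ms, mtp_speedup=3.0)
half_hbm_rate = decode_tokens_per_s(hbm_read_ms(192, 7.4, used_fraction=0.5), mtp_speedup=3.0)

print(f"TPUv7 full HBM: {tpu_v7_ms:.1f} ms/step, {full_hbm_rate:.0f} tokens/s")
print(f"TPUv7 half HBM: {half_hbm_rate:.0f} tokens/s")
print(f"TPUv6 full HBM: {tpu_v6_ms:.1f} ms/step")
```

Under these assumptions, a full HBM lands a bit under 120 tokens/second and a half-full HBM lands inside the 200-300 range, which is the point of the "more than half of HBM stays relatively unused" remark.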
For TPUv7, if ICI latency is a sufficiently big problem when trying to keep decode close to what the bandwidth of half-full HBM allows, each layer might want to stay at a very small number of chips with very few hops between them, such as 4 chips (this is the kind of concern that TPU 8i should make irrelevant, with its scale-up topology being closer to all-to-all, but that's mostly next year). With 30 GB per chip on weights, even a single 4x4x4 cube (to avoid further between-cube latency) can host a 2T total param model (which might then have 500B active params, and $1.5 per 1M input tokens should more than cover that). There are 32-cube configurations for TPUv7, so 2T total params is not obviously all it could be, but then 200 tokens/second is a difficult target already, so maybe not.
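The capacity claim can be checked the same way. The sketch below assumes 1 byte per parameter (for example 8-bit weights), which the comment does not state but which is what makes 30 GB per chip across 64 chips come out near 2T params. The function name and the `bytes_per_param` parameter are hypothetical.

```python
# Rough weight capacity of a TPUv7 configuration.
# Assumption (labeled): weights are stored at 1 byte per parameter;
# the source does not specify the precision.

def total_params_trillions(chips: int, weight_gb_per_chip: float,
                           bytes_per_param: float = 1.0) -> float:
    """Total parameters (in trillions) that fit in the HBM reserved for weights."""
    total_bytes = chips * weight_gb_per_chip * 1e9
    return total_bytes / bytes_per_param / 1e12


cube_chips = 4 * 4 * 4          # a single 4x4x4 cube
max_config_chips = 32 * cube_chips  # the 32-cube configuration

one_cube = total_params_trillions(cube_chips, 30)
max_config = total_params_trillions(max_config_chips, 30)

print(f"{cube_chips} chips: about {one_cube:.2f}T params")
print(f"{max_config_chips} chips: about {max_config:.1f}T params of raw capacity")
```

The 32-cube figure is only a raw capacity ceiling under the same assumption. It says nothing about whether decode at 200 tokens/second would survive the extra between-cube latency, which is the concern the paragraph raises.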
So, if Gemini 3.5 Flash is perhaps a 3.1-Pro-sized model (is that what we mean by 'big model'?), then might Gemini 3.5 Pro (scheduled for June and already being used internally at GDM) be a Mythos-sized model?
might Gemini 3.5 Pro (scheduled for June and already being used internally at GDM) be a Mythos-sized model?
Won't yet matter with 3.5 Pro, since 3.5 Flash demonstrates they still can't post-train (in contrast to how GPT-5.4 predicted Spud would be a success). Gemini 3.0 Pro might already be Mythos-sized (Anthropic didn't have better servers than Google to train Mythos-sized models on). The TPUv7 announcement in spring 2025 already suggested that a Gemini Pro/Ultra of late 2026 could be massive.
My guess is they woke up to the greater demand for big models only later in 2025, likely after 3.0 Pro was pretrained and Anthropic was already promised its 1 GW of TPUv7, and then Claude Code with Opus 4.5 was certainly sufficient to make it clear that big models are important. So it's possible Gemini 3.0 Pro was smaller than it could be (made to fit in one TPUv6 pod, together with all the KV-cache), because efficiency rather than quality was still too much on their mind (though it could have more active params than 3.5 Flash). But also, 3.5 Pro is mid-year, so the next biggest model yet might only happen for Gemini 4, and 3.5 Pro might remain 3.0 with better post-training (it could start running faster and cheaper on TPUv7). That a smaller 3.5 Flash was trained mid-year is a less significant deviation from the 1-year large pretraining run schedule.
That is a useful thing if implemented well, and indeed it is a thing I use (from OpenAI and Anthropic) more often than I use Google Search. But that thing is not Google Search.
Several hours ago I googled an uncommon steel grade (an alphanumeric designation with the word steel). In the late 2010s Google would have given me search results in milliseconds and at least one of the first two links would have had the specs I needed.
Today I got a page of garbage links which happened to have the same number in different contexts, and then 30 seconds later, after a lot of tool calls and inference, the AI overview provided me the links I actually needed. And this is not an isolated occurrence; it happened several times earlier this week!
I know Google is not actually a web search company, but this is not a sustainable way to run web search, and I sincerely hope that they revert to the old algorithms which used to work so well (BM25, tf-idf etc., maybe with a bit of vector search added).
Google once again has a model worth at least some consideration. Gemini 3.5 Flash is likely the best model out there at its particular speed point, as long as you don’t mind that it is a Gemini model. So for cases where speed kills, this can be a reasonable choice. Otherwise, I don’t see signs you would want to use it over Opus 4.7 or GPT-5.5.
Google also had some other offerings for I/O Day, which this post will also cover.
Introducing Google Gemini 3.5 ‘Flash’
Google introduced Gemini 3.5 Flash, which it seems is for now their universal model until 3.5 Pro comes along. It is live in the usual places. It is a hybrid, where it has the speed of Flash but the cost is at least halfway to models like Opus and GPT-5.5.
Gemini 3.5 Pro is confirmed for next month.
They are focused on 3.5 Flash as a daily driver for agentic tasks. It has the advantage of being faster and cheaper than Claude Opus 4.7 or GPT-5.5, if it can do the job. Not as cheap as previous Flash models, though, this is basically a hybrid:
As always, this is presented as Google’s strongest model yet for all the things.
Here is their benchmark presentation:
There are some big improvements here, including GDPval where Gemini previously struggled. If those scores were representative of what this baby can do, and it’s a Flash model, then that would be quite the accomplishment.
The knowledge cutoff is January 2025, continuing Gemini’s pattern of not believing what year it is, which is bizarrely obsolete and a serious problem for many use cases.
It is not a true ‘flash’ model, given it costs substantially more than 3 Flash.
Pliny is there with the standard jailbreak.
The biggest hope is that this fills a niche of ‘good enough for agent work while being faster and cheaper.’
Other People’s Benchmarks
A lot of benchmarks don’t have results, but of my usual suspects here is what we have.
The overall scores indicate only okay performance when adjusting for cost and price, and Gemini models tend to relatively overperform on benchmarks. One notices that Flash 3.5 does a lot worse on other people’s benchmarks than the ones Google lists.
It is catastrophically bad on You’re Absolutely Right, a sycophancy benchmark.
It did quite poorly on CursorBench.
It did not impress on WeirdML, only a small improvement on 3 Flash and far behind 3 Pro and 3.1 Pro.
It took the top spot on KnowsAboutBenBench, by the Ben in question.
It takes third place in Vals.ai on real world tasks.
It comes in at 9th in the Arena, slightly behind Gemini 3.1 Pro and 3 Pro.
It comes in at 55.3 on the AA Intelligence index, behind 57.2 for 3.1 Pro, 57.3 for Opus and 60.2 for GPT-5.5, while not being cheaper to run than 3.1 Pro on their test suite.
Reactions
Some people do like it.
Or find particular uses.
Alas, it is a Gemini model, and people are reporting Gemini things.
It can also have Google’s usual issues of not being able to integrate with Google, such as using your subscription with your personal email, which renders all personalization features useless. You’ll need to use Claude or ChatGPT to get GMail access, sir.
This is a pretty big problem:
Another big problem with Antigravity in particular is that limits seem extremely low. This is one of many examples of people running into this issue.
If Google wants to compete with Claude Code and Codex, they need to offer a way in that lets people use it in volume before being convinced to subscribe.
They did triple the limits, which is an excellent start, but that won’t be enough.
Vie (of OpenAI) reports Flash 3.5 is lying to him a lot, suspects the harness is at fault.
Theo is extremely unhappy with Flash 3.5 and several other Google decisions. I’ve seen him post a lot and this is not his usual approach, so something is haywire here.
Google AI Search
Google is overhauling its search experience around an ‘intelligent search box’ that looks and feels a lot like a Gemini Flash 3.5 chatbot prompt.
That is a useful thing if implemented well, and indeed it is a thing I use (from OpenAI and Anthropic) more often than I use Google Search. But that thing is not Google Search.
The reason I use Google Search is primarily to link me to things, or sometimes as a spellchecker. If I want AI, I will ask an AI.
Google is also introducing ‘information agents’ as the AI version of Google Alerts.
Google Daily Brief
Daily Brief is their answer to OpenAI’s Pulse, except theirs will incorporate information from all your connected apps, which can include GMail and Calendar, and be more of a to-do list.
The first part, ‘top of mind,’ seems like a plausibly useful way to make sure you don’t drop balls from your email or calendar.
It then ‘looks ahead’ and ‘suggests immediate next steps’ which I expect to be obnoxious and useless, and was in my quick experiment. I like that it links directly to the emails but doesn’t disrupt your usual process.
They say you can ‘steer Daily Brief with a quick thumbs up and down over time.’
Oh no. If this is to be any good you need to be able to give it instructions and explain why you find something useful or not useful, as you can with Pulse (which I still don’t bother using). Assume anything that uses thumbs up and down is AI slop.
If Google made this have better customization, and allowed you to sync it with various forms of Google alerts and other ways to monitor the wider world, they’d have something far more interesting.
Google I/O Day
What else did Google offer us?
Gemini Spark will be ‘a 24/7 personal AI agent to help you navigate everyday life’ using an Antigravity harness, and integrated with the rest of Google. Their example shown is adding things to Instacart.
It looks like they’re going to do things one app at a time via MCP connectors, and have a decent set of opening choices planned for the coming weeks?
Spark is coming to Ultra subscribers next week.
There is finally a Gemini app for macOS.
Neural Expressive is ‘a new design language for the AI era.’
I think that means Gemini now can switch easily between voice and text modes, and can use animations, ‘vibrant colors,’ new typography and for some reason haptic feedback. They think we don’t want text, we want some multimedia presentation.
Gemini Omni makes it easier to generate and edit videos within chat.
You can more easily ask longform questions of YouTube videos.
Dean Ball was impressed by the mundane utility on offer, to the point of considering getting an Android phone. If you do get an Android for this reason, I recommend a Pixel, since they can get more and better Google AI features faster, and also I have one and it’s an excellent phone.