Introduction

My goal is to register my expectations for the relative performance of Gemini versus GPT-4, and to hear what others expect.

My expectations

GPT-4 to Gemini will likely not be as big a jump in capabilities as GPT-3 to GPT-4 was. 

Gemini could surprise by being more agentic than GPT-4, that is, better at planning and longer-horizon tasks. But this is likely difficult to achieve; otherwise strong LLM agents would already be generating buzz.

Comparison

From GPT-3 to GPT-4

  • Scaling Factor: x100 more compute than GPT-3.
  • Optimization: Chinchilla scaling laws (for MoE) rather than the OpenAI/Kaplan scaling laws.
  • MoE Over Dense: Uses Mixture of Experts (MoE) layers instead of dense layers.
  • Data Quality: Likely higher-quality data (not sure).
  • Image Generation: Not publicly released, possibly due to subpar performance or security risks.
  • Tools: Added during finetuning.
  • Algorithmic Gains: 3 years between GPT-3 and GPT-4.
  • Process-Based Feedback: GPT-4 may already employ it.
  • Efficiency Target: GPT-4 aimed for training compute efficiency. GPT-4 was not designed to be commercially deployed at scale.

GPT-4 to Gemini

  • Scaling Factor: ~x5 (x20) more compute than GPT-4.
  • Supercomputer Constraint: No existing supercomputer could feasibly provide x100 more compute than was used for GPT-4. (Not sure, but likely.)
  • Multimodal: Maybe image, audio, speech.
  • Data Efficiency: Possibly better-quality data, like Google Books, and fewer epochs.
  • Tools: Could be added during either finetuning or pretraining.
  • Algorithmic Gains: ~1 year between GPT-4 and Gemini.
  • Efficiency Target: Gemini more likely aims for inference efficiency, given its intended extensive usage by Google, maybe sacrificing training efficiency.
  • Agency: Gemini may be trained to be more agentic, better at planning, etc. ("GPT-4 + AlphaGo").
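
To make the compute multipliers in the two lists above more concrete, here is a rough sketch of what they would imply under a simplified Chinchilla-style rule. The assumptions are mine and not from any lab: training compute is taken as roughly proportional to parameters times tokens, and compute-optimal training splits a compute increase evenly, so parameters and tokens each grow with the square root of compute. This applies to dense models; MoE models may scale differently. The function name chinchilla_split is hypothetical.

```python
import math


def chinchilla_split(compute_multiplier: float) -> tuple[float, float]:
    """Return (parameter multiplier, token multiplier) for a compute multiplier.

    Assumption: compute ~ parameters * tokens, and the compute-optimal
    allocation grows both equally, so each scales as sqrt(compute).
    """
    if compute_multiplier <= 0:
        raise ValueError("compute_multiplier must be positive")
    factor = math.sqrt(compute_multiplier)
    return factor, factor


if __name__ == "__main__":
    # Multipliers taken from the two lists above (x100, ~x5, x20).
    for label, multiplier in [
        ("GPT-3 to GPT-4", 100),
        ("GPT-4 to Gemini (low)", 5),
        ("GPT-4 to Gemini (high)", 20),
    ]:
        params, tokens = chinchilla_split(multiplier)
        print(f"{label}: x{multiplier} compute -> "
              f"~x{params:.1f} parameters, ~x{tokens:.1f} tokens")
```

Under these assumptions, x100 compute corresponds to about x10 parameters and x10 tokens, while x5 to x20 corresponds to only about x2 to x4.5 of each, which is one way to see why the second jump may feel smaller.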



Note: I drafted this before news of Gemini's release and capabilities came out, but failed to finish writing it. Since then, there have been some reports of Gemini being roughly at the level of GPT-4.

Comments

My guess is that it will be a scaled-up Gato - https://www.lesswrong.com/posts/7kBah8YQXfx6yfpuT/what-will-the-scaled-up-gato-look-like-updated-with. I think there might be some interesting features when the models are fully multimodal, e.g. being able to play games, perform simple actions on a computer, etc. Based on the announcement from Google, I would expect full multimodal training: image, audio, video, and text in/out. Based on DeepMind's hiring needs, I would expect they also want it to generate audio/video, and to extend the model to robotics (the brain of something similar to a Tesla Bot) in the near future. Elon claims that training just from video input/output can result in full self-driving, so I'm very curious what training on YouTube videos can achieve. If they've managed to make solid progress in long-term planning/reasoning and can deploy the model with sufficiently low latency, it might be quite a significant release that could simplify many office jobs.

p.b. (7mo)

My current assumption is that extracting "intelligence" from images, and even more so from videos, is much less efficient than from text. Text is just extremely information-dense.

So I wouldn't expect Gemini to initially feel more intelligent than GPT-4 even if it used 5 times the compute.

I mostly wonder about qualitative differences, maybe induced by algorithmic improvements like actually using RL or search components for a kind of self-supervised finetuning. That's one area where I can easily see DeepMind outcompeting OpenAI.

"GPT-4 was not designed to be commercially deployed at scale."

What makes you say that?

This comes from OpenAI saying they didn't expect ChatGPT to be a big commercial success. It was not a top-priority project. 

gwern (7mo)

ChatGPT was not GPT-4. It was GPT-3.5, a relatively minor fixup of GPT-3 with an improved RLHF variant, which they released while working on GPT-4's evaluations and productization. GPT-4 was the one that was supposed to be the big commercial success.