This is a linkpost for https://x.ai/

Some highlights:

Grok is still a very early beta product – the best we could do with 2 months of training – so expect it to improve rapidly with each passing week with your help.

I find it very interesting that they managed to beat GPT-3.5 with only 2 months of training! This makes me think xAI might become a major player in AGI development.

By creating and improving Grok, we aim to:

  • Gather feedback and ensure we are building AI tools that maximally benefit all of humanity. We believe that it is important to design AI tools that are useful to people of all backgrounds and political views. We also want to empower our users with our AI tools, subject to the law. Our goal with Grok is to explore and demonstrate this approach in public.

On these benchmarks, Grok-1 displayed strong results, surpassing all other models in its compute class, including ChatGPT-3.5 and Inflection-1. It is only surpassed by models that were trained with a significantly larger amount of training data and compute resources like GPT-4. This showcases the rapid progress we are making at xAI in training LLMs with exceptional efficiency.

 

We believe that AI holds immense potential for contributing significant scientific and economic value to society, so we will work towards developing reliable safeguards against catastrophic forms of malicious use. We believe in doing our utmost to ensure that AI remains a force for good.


I'm not skeptical, but it's still a bit funny to me when people rely so much on benchmarks, after reading "Pretraining on the Test Set Is All You Need" https://arxiv.org/pdf/2309.08632.pdf

I find it very interesting that they managed to beat GPT-3.5 with only 2 months of training! This makes me think xAI might become a major player in AGI development.

Did they do it using substantially less compute as well, or something? Because otherwise, I don't see what's so impressive about this.

Money

[This comment is no longer endorsed by its author]

Isn't that effectively the same thing as using substantially less compute?

I suppose being pithy backfired here. I meant that they may have spent lots of money and may have more to spend.

Right. Are you saying Grok may be impressive because of the sheer amount of resources being funnelled into it?