This is a linkpost for https://x.ai/blog/grok-os

We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.
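The official steps live in that repo. Purely as an illustrative sketch, here is one hedged way the released checkpoint could be fetched programmatically, assuming the weights are mirrored on Hugging Face under a repo id like xai-org/grok-1 with a ckpt-0 directory (both assumptions, not confirmed here; defer to the repo's own instructions):

```python
# Hypothetical sketch only -- defer to github.com/xai-org/grok for the
# official download and inference steps. The repo id and checkpoint
# layout below are assumptions, not taken from xAI's instructions.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="xai-org/grok-1",        # assumed Hugging Face mirror
    allow_patterns=["ckpt-0/*"],     # assumed checkpoint directory
    local_dir="checkpoints",
)
print("Checkpoint files downloaded to", ckpt_dir)
```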

Model Details

  • Base model trained on a large amount of text data, not fine-tuned for any particular task.
  • 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token (a toy routing sketch follows these details).
  • Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
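To make the 25%-active figure concrete: in a Mixture-of-Experts layer, a router picks a small subset of expert feed-forward blocks for each token, so only those experts' weights do work on that token. Below is a minimal toy top-k routing sketch in JAX (xAI's stack is JAX-based, but this is not their code); the 8-experts / top-2 configuration and all shapes are illustrative assumptions.

```python
# Toy top-k Mixture-of-Experts routing in JAX (not xAI's implementation).
# Expert count, top-k, and dimensions are illustrative assumptions.
import jax
import jax.numpy as jnp

NUM_EXPERTS, TOP_K, D_MODEL, D_FF = 8, 2, 512, 2048

def init_params(key):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "router": jax.random.normal(k1, (D_MODEL, NUM_EXPERTS)) * 0.02,
        "w_in":   jax.random.normal(k2, (NUM_EXPERTS, D_MODEL, D_FF)) * 0.02,
        "w_out":  jax.random.normal(k3, (NUM_EXPERTS, D_FF, D_MODEL)) * 0.02,
    }

def moe_layer(params, x):                       # x: [tokens, D_MODEL]
    logits = x @ params["router"]               # [tokens, NUM_EXPERTS]
    gate_vals, expert_ids = jax.lax.top_k(logits, TOP_K)
    gates = jax.nn.softmax(gate_vals, axis=-1)  # renormalise over chosen experts

    def per_token(tok, ids, g):
        # Only TOP_K of the NUM_EXPERTS expert FFNs touch this token,
        # so only a fraction of the layer's weights are "active" per token.
        def apply_expert(eid):
            h = jax.nn.gelu(tok @ params["w_in"][eid])
            return h @ params["w_out"][eid]
        outs = jax.vmap(apply_expert)(ids)      # [TOP_K, D_MODEL]
        return jnp.einsum("k,kd->d", g, outs)   # gate-weighted sum

    return jax.vmap(per_token)(x, expert_ids, gates)

params = init_params(jax.random.PRNGKey(0))
tokens = jax.random.normal(jax.random.PRNGKey(1), (4, D_MODEL))
print(moe_layer(params, tokens).shape)          # (4, 512)
```

With 2 of 8 expert FFNs selected per token, roughly a quarter of the expert weights are touched for any given token, which is where a "~25% active" figure can come from; attention and embedding weights are always active, so the exact fraction depends on the full architecture.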

This is one of the biggest open-source model releases I've seen, and one of the few that ships the raw base model straight out of pretraining. This is pretty wild stuff!

Comments

O O:

Much larger than I expected for its performance

At this size it's probably smarter for the compute spent, and a more instructive exercise ahead of further scaling, than a smaller model would have been. That makes sense if the aim is to out-scale others quickly rather than compete at smaller scales, and if this model was never meant to last.

How expensive is the finetuning step relative to the pretraining (in terms of compute, data, labor, or anything else)?

I gather it'd be ~$1000 to "uncensor" a finetuned model, but as mentioned, this might be the first significant model released before finetuning, so I have no intuition for this. Two orders of magnitude more? Three?
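For a rough sense of the compute side of that question: a common heuristic puts transformer training cost at about 6 × (active parameters) × (training tokens) FLOPs. xAI hasn't published token counts, so the numbers below are invented placeholders; only the structure of the comparison is meant to be informative.

```python
# Back-of-the-envelope only: pretraining vs. finetuning compute under the
# rough "FLOPs ~= 6 * active_params * tokens" heuristic. Token counts are
# placeholder guesses, NOT figures from xAI.

active_params = 314e9 * 0.25     # ~25% of 314B weights active per token

pretrain_tokens = 3e12           # placeholder guess for pretraining
finetune_tokens = 1e9            # placeholder guess for an instruction-tuning set

pretrain_flops = 6 * active_params * pretrain_tokens
finetune_flops = 6 * active_params * finetune_tokens

print(f"pretraining: {pretrain_flops:.1e} FLOPs")
print(f"finetuning : {finetune_flops:.1e} FLOPs")
print(f"ratio      : ~{pretrain_flops / finetune_flops:,.0f}x")
```

Under these placeholder token counts, finetuning compute comes out three to four orders of magnitude below pretraining. The true ratio depends almost entirely on the actual token counts, and data curation and labor costs don't follow the same scaling at all.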