Language Models (LLMs), AI, Personal Blog

xAI releases Grok base model

by Jacob G-W
18th Mar 2024
1 min read

This is a linkpost for https://x.ai/blog/grok-os

3 comments, sorted by top scoring

[-] O O · 1y · 56

Much larger than I expected for its performance

[-] Vladimir_Nesov · 1y · 20

This way it's probably smarter given its compute, and a more instructive exercise before scaling further, than a smaller model would've been. That makes sense if the aim is to out-scale others more quickly rather than to compete at smaller scale, and if this model wasn't meant to last.

[-] Shankar Sivarajan · 1y · 11

How expensive is the finetuning step relative to the pretraining (in terms of compute, data, labor, or anything else)?

I gather it'd be ~$1000 to "uncensor" a finetuned model, but as mentioned, this might be the first significant model released before finetuning, so I have no intuition for this. Two orders of magnitude more? Three? 
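
For a rough sense of the compute side of that question, the standard C ≈ 6·N·D approximation (training FLOPs ≈ 6 × active parameters × training tokens) puts the ratio almost entirely in the token counts. Below is a minimal sketch; the token counts are illustrative assumptions rather than published figures, and data and labor costs are ignored entirely.

```python
import math

# Back-of-envelope compute comparison using the common C ~= 6 * N * D rule
# (training FLOPs ~= 6 x active parameters x training tokens).
# The token counts below are illustrative assumptions, not published numbers.
active_params = 314e9 * 0.25       # ~78.5B parameters active per token

pretrain_tokens = 2e12             # assumed pretraining corpus size (order of magnitude)
finetune_tokens = 1e8              # assumed fine-tuning corpus size (order of magnitude)

pretrain_flops = 6 * active_params * pretrain_tokens
finetune_flops = 6 * active_params * finetune_tokens

print(f"pretraining : ~{pretrain_flops:.1e} FLOPs")
print(f"fine-tuning : ~{finetune_flops:.1e} FLOPs")
print(f"ratio       : ~10^{math.log10(pretrain_flops / finetune_flops):.0f}")
```

Under these assumptions the fine-tune is around four orders of magnitude cheaper in compute; different token-count assumptions shift that by an order of magnitude or two in either direction.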


We are releasing the base model weights and network architecture of Grok-1, our large language model. Grok-1 is a 314 billion parameter Mixture-of-Experts model trained from scratch by xAI.

This is the raw base model checkpoint from the Grok-1 pre-training phase, which concluded in October 2023. This means that the model is not fine-tuned for any specific application, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

To get started with using the model, follow the instructions at github.com/xai-org/grok.
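
For concreteness, here is one way to fetch the released checkpoint before following those instructions. This is a sketch rather than the repository's documented procedure: the Hugging Face repo id (xai-org/grok-1) and the ckpt-0 directory layout are assumptions about the public weight mirror, and the GitHub README linked above remains the authoritative source.

```python
# Sketch: pull the released weights from the (assumed) Hugging Face mirror.
# "xai-org/grok-1" and the "ckpt-0/*" layout are assumptions; see
# github.com/xai-org/grok for the official instructions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",      # assumed mirror of the Grok-1 weights
    allow_patterns=["ckpt-0/*"],   # assumed checkpoint directory
    local_dir="checkpoints",
)
# The checkpoint is several hundred gigabytes, so check disk space (and, for
# inference, available accelerator memory) before starting.
```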

Model Details

  • Base model trained on a large amount of text data, not fine-tuned for any particular task.
  • 314B parameter Mixture-of-Experts model with 25% of the weights active on a given token.
  • Trained from scratch by xAI using a custom training stack on top of JAX and Rust in October 2023.
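
To make the 25%-active bullet above concrete, here is a toy sketch of the top-k expert routing that Mixture-of-Experts layers use. The particular split (8 experts with 2 active per token, so 2/8 = 25%) and the layer sizes are illustrative assumptions consistent with that figure, not details taken from the announcement.

```python
import numpy as np

# Toy top-k mixture-of-experts routing. Illustrative assumption: 8 experts
# with 2 active per token, i.e. 2/8 = 25% of the expert weights used on any
# given token, matching the figure above.
rng = np.random.default_rng(0)

d_model, d_ff = 64, 256                  # tiny placeholder sizes
n_experts, top_k = 8, 2

# Each expert is an independent feed-forward block (here just one matrix).
experts = [0.02 * rng.normal(size=(d_model, d_ff)) for _ in range(n_experts)]
router = 0.02 * rng.normal(size=(d_model, n_experts))

def moe_forward(x):
    """Route one token activation x of shape (d_model,) to its top_k experts."""
    logits = x @ router                       # score every expert
    chosen = np.argsort(logits)[-top_k:]      # indices of the top_k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                      # softmax over the chosen experts
    # Only the chosen experts' weights are used for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

out = moe_forward(rng.normal(size=d_model))
print(out.shape)                              # (256,)
print(f"{top_k}/{n_experts} experts active = {top_k / n_experts:.0%} of expert weights")
print(f"~{314e9 * 0.25 / 1e9}B of 314B parameters active per token")
```

In a full MoE transformer only the feed-forward experts are routed this way; attention and embedding weights are shared and always active, which is roughly how a 314B-parameter model ends up using on the order of 78B parameters per token.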

This is one of the biggest open-source model releases I've seen, and it's also one of the few I've seen that release the base model right after pretraining. This is pretty wild stuff!