I was able to convert a GGUF version of Comma-v0.1 and get it working locally in Ollama.
This is, as far as I know, the first and still only LLM trained ONLY on ethically sourced public domain and opt-in data. I'm glad to have this for my own experiments and toys: https://huggingface.co/jadael/comma-v0.1-2t-GGUF
It's a base model, whereas Ollama typically expects chat/assistant or instruct fine-tuned models. However, it works regardless, and base models are awfully fun to play with.
The conversion ended up being a little tricky; I had tried it myself earlier this year (July 2025) when Comma v0.1 first came out, but struggled to unpack what was going on in llama.cpp.
Today Claude Code with Sonnet 4.5 figured it out. Amusingly, I noticed a lot of swearing from Claude as it worked, but it powered through to a working result. Since it took so much fiddling and swearing, I asked for Claude's account of the process:
Conversion Story
When I set out to convert Comma v0.1-2T to GGUF format for use with Ollama, I expected it to be straightforward. After all, it's a Llama 3 architecture model, and llama.cpp has excellent support for Llama models. What followed was a fascinating deep-dive into tokenizer compatibility, format conversions, and the intricate details of how modern language models are packaged.
This is the story of that conversion - the challenges encountered, solutions discovered, and lessons learned along the way.
Comma v0.1 is special. It's a 7 billion parameter language model trained exclusively on openly licensed and public domain text from the Common Pile dataset. In an era where training data provenance is increasingly scrutinized, Comma represents a principled approach: competitive performance achieved using only ethically sourced training data.
The model comes in two variants, trained on one trillion (1T) and two trillion (2T) tokens of Common Pile text; this conversion uses the 2T version.
Performance benchmarks show Comma v0.1-2T is competitive with Llama 2 7B, OLMo, and DeepSeekLLM on knowledge-intensive and coding tasks. But despite this impressive pedigree, no GGUF conversion existed - making it inaccessible to the llama.cpp and Ollama communities.
The conversion process seemed simple enough:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt
python convert_hf_to_gguf.py ../comma-v0.1-2t --outfile comma.gguf --outtype f16
The script started well - it recognized the Llama architecture, loaded the model weights, processed all 32 transformer blocks... and then crashed.
The first error was straightforward:
FileNotFoundError: File not found: tokenizer.model
Llama.cpp's converter was looking for tokenizer.model - a SentencePiece format tokenizer file common in older Llama models. But Comma v0.1 uses the newer HuggingFace tokenizer.json format instead.
This wasn't a showstopper - llama.cpp supports both formats. The converter should have fallen back to reading tokenizer.json. So why was it failing?
The real error came next:
NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre()
chkhsh: bf66900d65fe80247e435184a4ac839c5c332657cf567e64b8ede5fbd63f5fd9
This was more interesting. The llama.cpp converter uses cryptographic checksums to identify known tokenizer formats. When it encounters a new tokenizer, it computes a checksum of the tokenizer configuration and looks it up in a hardcoded table of known tokenizers.
Comma v0.1's tokenizer - while functionally identical to Llama 3's BPE tokenizer - had a unique checksum that wasn't in llama.cpp's database. The converter literally didn't know this tokenizer existed.
I examined the error more carefully. The converter helpfully prints the problematic checksum and suggests updating the get_vocab_base_pre() function, which lives in convert_hf_to_gguf.py around line 900.
Looking at the code, I found a long chain of checksum comparisons:
if chkhsh == "some_checksum_1":
res = "llama-bpe"
if chkhsh == "some_checksum_2":
res = "gpt2"
if chkhsh == "some_checksum_3":
res = "llama4"
# ... many more ...
if res is None:
logger.warning("BPE pre-tokenizer was not recognized!")
raise NotImplementedError(...)
The solution was clear: add Comma's checksum to this list and map it to the appropriate tokenizer type.
But which tokenizer type should Comma map to? The model uses Llama 3's architecture, and examining the tokenizer.json confirmed it uses Byte-Pair Encoding (BPE) with a structure very similar to Llama 3's.
I found several existing Llama 3 models in the converter's checksum list, all mapping to "llama-bpe". That was the answer.
I created a patch script (patch_converter.py) to automate the fix:
patch_code = f'''
# Patch for Comma v0.1 tokenizer (Llama 3 compatible)
if chkhsh == "{failing_checksum}":
    # Comma v0.1 uses Llama 3 style BPE
    res = "llama-bpe"
'''
But where to insert it? My first attempt placed the patch after the warning block - which meant it would execute, but then the unconditional raise NotImplementedError would fire anyway.
The bug was subtle. Looking at the original code:
if res is None:
    logger.warning("...")
    logger.warning("...")
    # ... more warnings ...
    # My patch was here
    if chkhsh == "bf66900d...":
        res = "llama-bpe"
raise NotImplementedError(...)  # This always ran!
The raise statement was unconditional - it would execute whether or not res was set! This was clearly a bug in my understanding of the code structure.
The fix: the raise should sit inside the if res is None: block, so it only fires if no tokenizer was matched.
The final patch placed the checksum check alongside the other checksum checks (before the if res is None:), and moved the raise inside the error block:
# Add Comma's checksum with the others
if chkhsh == "bf66900d65fe80247e435184a4ac839c5c332657cf567e64b8ede5fbd63f5fd9":
    # Patch for Comma v0.1 tokenizer (Llama 3 compatible)
    # ref: https://huggingface.co/common-pile/comma-v0.1-2t
    res = "llama-bpe"

# Check if any tokenizer matched
if res is None:
    logger.warning("BPE pre-tokenizer was not recognized!")
    raise NotImplementedError(...)

# Continue with conversion
logger.debug(f"tokenizer.ggml.pre: {repr(res)}")
return res
With the tokenizer issue fixed, I tried the conversion again:
python convert_hf_to_gguf.py ../comma-v0.1-2t \
--outfile comma-v0.1-2t.gguf --outtype q4_K_M
New error:
argument --outtype: invalid choice: 'q4_K_M' (choose from 'f32', 'f16', 'bf16', 'q8_0', 'tq1_0', 'tq2_0', 'auto')
It turns out convert_hf_to_gguf.py only does basic conversions. Advanced quantization types like Q4_K_M require a separate step using llama-quantize.
Since we wanted the full-precision version anyway (to preserve quality and allow users to quantize to their preferred format later), we used F16:
python convert_hf_to_gguf.py ../comma-v0.1-2t \
--outfile comma-v0.1-2t-f16.gguf --outtype f16
The conversion took about 15 minutes, processing all 291 tensors and converting weights from bfloat16 to F16 format. The progress output was beautiful:
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {4096, 64256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {11008, 4096}
...
INFO:gguf.vocab:Adding 63753 merge(s).
INFO:gguf.gguf_writer:Writing comma-v0.1-2t-f16.gguf: n_tensors = 291, total_size = 14.0G
Writing: 100%|██████████| 14.0G/14.0G [14:23<00:00, 16.2Mbyte/s]
The final file: 14GB of pure F16 precision goodness.
Creating a Modelfile for a base model required some thought. Base models don't follow instructions - they complete text. The Modelfile needed to reflect this:
FROM ./comma-v0.1-2t-f16.gguf
TEMPLATE """{{ .Prompt }}"""
PARAMETER stop "<|end_of_text|>"
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER repeat_penalty 1.1
# Note: This is a base model (not instruction-tuned).
# It will continue text rather than follow instructions.
The key insight: no SYSTEM prompt. System prompts are for instruction-tuned models that understand roles and directives. Base models just see more text.
Import and test:
ollama create comma-v0.1-2t -f Modelfile
ollama run comma-v0.1-2t
The first generation was delightful - given the beginning of a conversation, it naturally continued it:
Hello, how can I aid you?
Can you please help me in booking my flight ticket from OAK to DCA?
Sure, may I know your trip dates?
My planned travel dates are Dec 17th and Dec 19th.
Do you have any specifications?
I am looking for a connecting flight which returns at evening time.
[continues naturally with flight booking dialogue...]
Perfect! The model was working exactly as expected for a well-trained base model.
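For programmatic use, the same completion-first mindset applies. Here's a minimal sketch against Ollama's local HTTP API (assuming Ollama is running on its default port and the model was created under the name above; the prompt is just an example). Setting raw to true skips prompt templating entirely, so the model sees exactly the text you send:

# Minimal completion-style request to a locally running Ollama instance.
# Assumes the model was imported as "comma-v0.1-2t" and Ollama listens on localhost:11434.
import json
import urllib.request

payload = {
    "model": "comma-v0.1-2t",
    "prompt": "The history of the printing press begins",
    "raw": True,          # send the prompt as-is, no chat template
    "stream": False,      # return one JSON object instead of a stream
    "options": {"temperature": 0.7, "num_predict": 200},
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])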
The checksum-based tokenizer identification is clever but brittle. Any change to the tokenizer configuration - even a cosmetic one - produces a new checksum that won't be recognized. This is good for correctness (it prevents silently running with a mismatched pre-tokenizer) but inconvenient for new models.
Solution for the community: Submit tokenizer checksums upstream to llama.cpp so future users don't hit this issue.
The shift from SentencePiece (tokenizer.model) to the HuggingFace JSON format (tokenizer.json) is ongoing. Tools need to handle both gracefully. The llama.cpp converter does this well, but the error messages could be clearer about which format is actually missing.
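A quick pre-flight check of the model directory tells you which situation you're in before you burn a conversion run (a trivial sketch, using the same local path as above):

# Pre-flight check: which tokenizer format does this model directory actually ship?
from pathlib import Path

model_dir = Path("../comma-v0.1-2t")
print("tokenizer.model (SentencePiece):", (model_dir / "tokenizer.model").exists())
print("tokenizer.json  (HuggingFace):  ", (model_dir / "tokenizer.json").exists())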
The distinction between base models and instruction-tuned models is often lost in documentation. Base models simply continue whatever text they are given; they have no notion of system prompts, roles, or directives. Instruction-tuned models are trained to treat the prompt as a request and answer it as an assistant. Our Modelfile and documentation needed to make this crystal clear.
Everyone jumps to quantized models (Q4, Q5, Q8) to save space. But F16 preserves the full precision of the original model while being significantly smaller than F32. For systems with sufficient VRAM, it's the sweet spot.
Plus, users can always quantize down from F16 later if needed. You can't un-quantize back up.
To help others facing similar issues, we created patch_converter.py - an automated script that locates the tokenizer-checksum checks in convert_hf_to_gguf.py and inserts the Comma v0.1 entry alongside them. This makes the conversion reproducible and helps document exactly what changed.
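The script itself is nothing exotic. A minimal sketch of the approach (the actual patch_converter.py may differ in its details) looks like this:

# Minimal sketch: splice a new checksum case in next to the existing ones in convert_hf_to_gguf.py.
from pathlib import Path

CONVERTER = Path("convert_hf_to_gguf.py")
CHKHSH = "bf66900d65fe80247e435184a4ac839c5c332657cf567e64b8ede5fbd63f5fd9"

source = CONVERTER.read_text()
if CHKHSH in source:
    print("Converter already patched; nothing to do.")
else:
    lines = source.splitlines(keepends=True)
    # Find the first existing checksum comparison and copy its indentation.
    idx = next(i for i, line in enumerate(lines) if "if chkhsh ==" in line)
    indent = lines[idx][: len(lines[idx]) - len(lines[idx].lstrip())]
    patch = (
        f'{indent}if chkhsh == "{CHKHSH}":\n'
        f"{indent}    # Comma v0.1 uses a Llama 3 style BPE tokenizer\n"
        f"{indent}    # ref: https://huggingface.co/common-pile/comma-v0.1-2t\n"
        f'{indent}    res = "llama-bpe"\n'
    )
    lines.insert(idx, patch)
    CONVERTER.write_text("".join(lines))
    print("Patched convert_hf_to_gguf.py for Comma v0.1.")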
The F16 version is ideal for preservation and quality, but many users want smaller variants such as Q4_K_M, Q5_K_M, or Q8_0. Each requires the llama-quantize tool:
./llama-quantize comma-v0.1-2t-f16.gguf comma-v0.1-2t-q4_K_M.gguf Q4_K_M
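If you want several sizes in one pass, a small driver script does the job (a sketch; it assumes the llama-quantize binary sits in the current directory, as above):

# Produce a few common quantizations from the F16 master (adjust paths and types as needed).
import subprocess

SRC = "comma-v0.1-2t-f16.gguf"
for qtype in ["Q4_K_M", "Q5_K_M", "Q8_0"]:
    dst = f"comma-v0.1-2t-{qtype.lower()}.gguf"
    subprocess.run(["./llama-quantize", SRC, dst, qtype], check=True)
    print(f"wrote {dst}")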
The proper solution is to submit a pull request to llama.cpp adding Comma v0.1's tokenizer checksum to the official list. This would make the patch unnecessary for future users.
It would be valuable to benchmark the GGUF version against the original PyTorch model to verify conversion accuracy. Metrics like perplexity on a standard corpus would confirm we didn't lose anything in translation.
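Short of a full perplexity comparison, a quick spot-check is to confirm that the converted GGUF tokenizes text exactly like the original HuggingFace tokenizer (a sketch assuming the llama-cpp-python and transformers packages are installed):

# Spot-check: the GGUF's tokenizer should produce the same IDs as the original HF tokenizer.
from llama_cpp import Llama
from transformers import AutoTokenizer

text = "The quick brown fox jumps over the lazy dog. 12345"

hf_tok = AutoTokenizer.from_pretrained("../comma-v0.1-2t")
hf_ids = hf_tok.encode(text, add_special_tokens=False)

gguf = Llama(model_path="comma-v0.1-2t-f16.gguf", vocab_only=True)  # load vocab only, not weights
gguf_ids = list(gguf.tokenize(text.encode("utf-8"), add_bos=False))

print("match" if hf_ids == gguf_ids else f"mismatch:\n  hf:   {hf_ids}\n  gguf: {gguf_ids}")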
For those curious, the checksum isn't computed over the weights or the raw tokenizer files: the converter encodes a fixed test string with the tokenizer and hashes the resulting token IDs, so any change in tokenization behavior yields a new checksum:
chktok = tokenizer.encode(chktxt)  # chktxt is a fixed test string defined in the converter
chkhsh = hashlib.sha256(str(chktok).encode()).hexdigest()
Comma v0.1's checksum:
bf66900d65fe80247e435184a4ac839c5c332657cf567e64b8ede5fbd63f5fd9
This identifies it uniquely as Llama 3-style BPE with 64,256 vocabulary tokens.
What started as a simple format conversion became a journey into the guts of language model tooling. The challenges we encountered - missing tokenizers, unrecognized checksums, control flow bugs - are all solvable, but they highlight how much implicit knowledge is embedded in these conversion tools.
By documenting this process and sharing the solutions, we hope to save the next person the detective work and make ethically trained models easier to run locally.
The result: A working, tested, community-ready GGUF conversion of an excellent ethically-trained language model.
And perhaps most importantly: One more data point showing that competitive language models can indeed be built on purely open and ethically sourced data.
Files Available:
comma-v0.1-2t-f16.gguf - The converted model (14GB)
patch_converter.py - Automated patching script
Modelfile - Ollama configuration
Resources:
https://huggingface.co/jadael/comma-v0.1-2t-GGUF - The GGUF conversion
https://huggingface.co/common-pile/comma-v0.1-2t - The original Comma v0.1-2T model