If I understand correctly (I very well might not), a "one-bit LLM" has to be trained as a one-bit LLM in order to then run inference on it as one; i.e., this isn't a new quantization scheme you can apply to an existing model after training.

So I think training and inference are tied together here, meaning that if this replicates and holds up, we will probably see new hardware for both stages.
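
To make the distinction concrete, here's a minimal sketch (my own illustration, not code from the paper) of why the quantization has to be in the training loop: the forward pass only ever sees binarized weights, while gradients flow to latent full-precision weights via a straight-through estimator. Post-training quantization of an ordinary checkpoint skips this step, which is why it degrades badly at such low bit widths.

```python
import torch
import torch.nn as nn

class OneBitLinear(nn.Module):
    """Linear layer whose weights are binarized in the forward pass.

    The layer keeps full-precision "latent" weights; training updates those,
    but every forward pass sees only their signs (times a scale). This is
    quantization-aware training: the quantization is part of the model from
    step one, not something applied to a finished checkpoint.
    """
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean()            # per-tensor scaling factor
        w_bin = torch.sign(w) * scale     # weights collapse to {-scale, +scale}
        # Straight-through estimator: forward with binarized weights,
        # backward as if the quantization were the identity function.
        w_q = w + (w_bin - w).detach()
        return x @ w_q.t()

# Usage: behaves like nn.Linear, but inference only ever needs the signs.
layer = OneBitLinear(16, 4)
out = layer(torch.randn(2, 16))
out.sum().backward()                      # gradients reach the latent weights
```

Once trained this way, inference needs only the signs plus one scale per tensor, so multiplications by ±1 reduce to additions and subtractions, which is the kind of operation custom hardware could exploit at both stages.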