New LLM Scaling Law
Hi all, I'm an independent researcher, and I believe I've found a new scaling law for Mixture of Experts (MoE) models. I'd appreciate any review and critique. It challenges the notion that performant training and inference require holding all weights in VRAM, and suggests that as long as bus speeds...