Nous Research just released the RL environments they used to train Hermes 4 with RL here. For example, there is a Diplomacy one, Pydantic, InfiniMath, and ReasoningGym.
If AI labs are scooping up new RL environments, now might be the chance to have an impact by releasing open-source RL envs. For example, we could make ones for moral reasoning, or for formal verification.
A similar opportunity existed around 2020: contributing to the pretraining corpus.
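To make the idea concrete, here's a minimal sketch of what such an open-source env could look like, assuming a Gym-style `reset`/`step` interface. The `MoralChoiceEnv` name, the scenarios, and the reward scheme are all made up for illustration, not a real benchmark:

```python
class MoralChoiceEnv:
    """Hypothetical one-step RL env: the agent reads a dilemma and picks
    an option; reward comes from a (here hard-coded) preference label."""

    SCENARIOS = [
        # (prompt, options, index of the human-preferred option)
        ("You find a lost wallet.", ["keep it", "return it"], 1),
        ("A colleague takes credit for your work.",
         ["retaliate publicly", "raise it privately"], 1),
    ]

    def __init__(self):
        self._idx = 0

    def reset(self):
        prompt, options, _ = self.SCENARIOS[self._idx % len(self.SCENARIOS)]
        return {"prompt": prompt, "options": options}

    def step(self, action: int):
        _, options, preferred = self.SCENARIOS[self._idx % len(self.SCENARIOS)]
        reward = 1.0 if action == preferred else 0.0
        self._idx += 1
        done = True  # single-step episodes
        return None, reward, done, {}

env = MoralChoiceEnv()
obs = env.reset()
_, reward, done, _ = env.step(1)  # agent chooses "return it"
```

A real env would of course score free-form model output (e.g. with a judge model or a verifier) rather than an option index, but the interface contract is the part labs need to plug it into their RL stacks.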
I've done something similar to this, so I can somewhat replicate your results.
I did things differently.
My code:
My findings that differ from yours:
Although they could have tested "LLMs" in general, not primarily Claude, and that could have bypassed that effect.
Walk up the stack trace: start at the lowest level of your own code, but be willing to go into library code if needed.
That makes sense, thank you for explaining. Ah yes, I see they are all LoRA adapters; for some reason I thought they were all merged, my bad. Adapters are certainly much more space-efficient.
Yes, that's exactly what I mean! If we have word2vec-like properties, steering and interpretability would be much easier and more reliable. And I do think it's a promising research direction, though not a certain one.
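For concreteness, this is the property I mean, in the classic king − man + woman ≈ queen form; the hand-made 3-d vectors below are purely illustrative (real embeddings are learned and high-dimensional):

```python
from math import sqrt

# Toy embeddings: dims are (royalty, male, female), hand-made for illustration.
emb = {
    "king":  (1.0, 1.0, 0.0),
    "queen": (1.0, 0.0, 1.0),
    "man":   (0.0, 1.0, 0.0),
    "woman": (0.0, 0.0, 1.0),
}

def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-9)

def nearest(vec, emb):
    """Vocabulary item with the highest cosine similarity to vec."""
    return max(emb, key=lambda w: cosine(vec, emb[w]))

target = add(sub(emb["king"], emb["man"]), emb["woman"])
```

If activation space had this linear structure, steering would reduce to exactly this kind of vector arithmetic on concept directions.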
Facebook also built an interesting tokenizer that lets LLMs operate in a much richer embedding space: https://github.com/facebookresearch/blt. They embed byte patches split by entropy/surprise. So it might be another way to test the hypothesis that a better embedding space would provide nice word2vec-like properties.
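A toy version of the patching idea, to show the mechanic: BLT uses a small learned model's next-byte entropy to decide patch boundaries; here a unigram frequency count over the text itself stands in for that model, which is purely an assumption for illustration:

```python
from collections import Counter
from math import log2

def entropy_patches(text: str, threshold: float = 3.0):
    """Split text into patches, starting a new patch wherever the next
    character is 'surprising' under a unigram model fit on the text.
    Toy stand-in for BLT's learned next-byte entropy model."""
    counts = Counter(text)
    total = len(text)
    patches, current = [], text[0]
    for ch in text[1:]:
        surprisal = -log2(counts[ch] / total)  # bits of surprise
        if surprisal > threshold:
            patches.append(current)  # boundary: high-entropy byte ahead
            current = ch
        else:
            current += ch
    patches.append(current)
    return patches
```

Predictable runs stay in one patch while the rare byte opens a new one, which is the intuition behind spending more compute where the stream is hard to predict.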
Are you going to release the code models too? They seem useful. Also, the LoRA versions if possible, please.
Thank you for releasing the models.
It's really useful, as a bunch of amateurs had released "misaligned" models on Hugging Face, but they don't seem to work (i.e., be cartoonishly evil).
I'm experimenting with various morality evals (https://github.com/wassname/llm-moral-foundations2, https://github.com/wassname/llm_morality), and it's good to have a negative baseline. It would also be good to add it to speechmap.ai if we can.
Good point! And it's plausible, because diffusion seems to provide more supervision and gets better results in generative vision models, so it's a candidate for scaling.
There is a paper showing this works for writing chapters of fiction, which shows it generalises beyond math:
https://arxiv.org/abs/2503.22828v1