wassname

Comments
Sorted by Newest

Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
wassname · 29m · 10

Overall, I'd guess this advance is real, but probably isn't that big of a deal outside of math

There is a paper showing this works for writing chapters of fiction, which suggests it generalises outside of math:
https://arxiv.org/abs/2503.22828v1

Reply
Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
wassname · 1h · 10

Nous Research just released the RL environments they used to train Hermes 4. For example, there is a Diplomacy one, plus pydantic, infinimath, and ReasoningGym.

If AI labs are scooping up new RL environments, now might be the chance to have an impact by releasing open-source RL environments. For example, we could make ones for moral reasoning, or for formal verification.

A similar opportunity existed ~2020 by contributing to the pretraining corpus.
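
To make "RL environment" concrete, here is a hypothetical sketch of a minimal verifiable-reward environment (a generic interface I made up, not Nous's actual API; the task is illustrative). A formal-verification version would swap the subprocess call for a proof checker, and a moral-reasoning one would need labelled verdicts rather than executable tests.

```python
# Hypothetical sketch of a minimal verifiable-reward RL environment.
# Generic interface (not any particular lab's API); the task text is made up.
import subprocess
import sys
import tempfile
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str   # shown to the policy model
    check: str    # hidden verifier, e.g. asserts about the code it must write

def reward(task: Task, completion: str) -> float:
    """1.0 if the completion plus the hidden check runs cleanly, else 0.0."""
    program = completion + "\n\n" + task.check
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0

# Example task: any completion that defines a correct is_prime() scores 1.0.
task = Task(
    prompt="Write a Python function is_prime(n) that returns True iff n is prime.",
    check="assert is_prime(13) and not is_prime(9) and not is_prime(1)",
)
```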

Reply
How LLM Beliefs Change During Chain-of-Thought Reasoning
wassname · 22d* · 40

I've done something similar to this, so I can partly replicate your results.

I did a few things differently:

  • instead of sampling the max answer, I take a weighted sum over the choices (sketched below); this shows the smoothness better. I've verified on Judgemarkv2 that this works just as well
  • I tried Qwen3-4b-thinking and Qwen3-14b, with similar results
  • I used checkpointing of the kv_cache to make this pretty fast (see my code below)
  • I tried this with activation steering, and it does seem to change the answer! (mostly outside reasoning mode)

My code:

  • simple: https://github.com/wassname/CoT_rating/blob/main/06_try_CoT_rating.ipynb
  • complex: https://github.com/wassname/llm-moral-foundations2/blob/main/nbs/06_try_CoT_rating.ipynb
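
For illustration, here is a minimal sketch of the weighted-sum idea (not the notebook code itself; the model name and the 1-5 rating scale are just assumptions):

```python
# Minimal sketch (illustrative, not the notebooks linked above): rate a partial
# chain of thought by the probability-weighted mean over rating tokens "1".."5",
# instead of taking the argmax choice. Model name and scale are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Thinking-2507"   # any chat model works here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# token ids for the single-character ratings "1".."5"
rating_ids = [tok.encode(str(i), add_special_tokens=False)[0] for i in range(1, 6)]

@torch.no_grad()
def weighted_rating(prefix: str) -> float:
    """Expected rating under the next-token distribution restricted to "1".."5"."""
    inputs = tok(prefix, return_tensors="pt").to(model.device)
    logits = model(**inputs).logits[0, -1]              # next-token logits after the prefix
    probs = torch.softmax(logits[rating_ids], dim=-1)   # renormalise over the 5 options
    ratings = torch.arange(1, 6, dtype=probs.dtype, device=probs.device)
    return float((probs * ratings).sum())               # weighted sum, smoother than argmax

# e.g. call weighted_rating(prompt + cot_so_far + "\nRating (1-5): ")
# every N generated CoT tokens to trace how the rating evolves.
```

The kv_cache checkpointing mentioned above is what makes calling this at many points along the CoT cheap, since the shared prefix doesn't get re-encoded every time.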

My findings that differ from yours:

  • well-trained reasoning models do converge during the first ~100 tokens of the <think> stage, but will fluctuate during the rest of the conversation. I think this is because RLVR trains the model to think well!
Reply
On closed-door AI safety research
wassname · 25d · 10

Although they could have tested LLMs in general, rather than primarily Claude, which could have bypassed that effect.

Reply
Debugging for Mid Coders
wassname · 1mo · 10

walk up the stack trace

And start at the lowest level of your own code, but be willing to go into library code if needed.
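
A hypothetical example (file and function names made up): running this raises an error deep inside the json library, and walking up the traceback, the lowest frame that is your own code is load_config, which is where to start.

```python
# Hypothetical illustration: running this raises json.JSONDecodeError.
# The bottom frames of the traceback are inside the json library; walking up,
# the lowest frame that is *your* code is load_config (the json.loads call),
# so start there, dropping into the library frames only if needed.
import json

def load_config(text: str) -> dict:
    return json.loads(text)   # <- lowest frame of your own code in the traceback

def main() -> None:
    cfg = load_config("{not valid json")
    print(cfg)

if __name__ == "__main__":
    main()
```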

Reply
Model Organisms for Emergent Misalignment
wassname · 3mo · 10

That makes sense, thank you for explaining. Ah yes, I see they are all LoRA adapters; for some reason I thought they were all merged, my bad. Adapters are certainly much more space-efficient.

Reply
Gemini Diffusion: watch this space
wassname · 3mo · 30

Yes, that's exactly what I mean! If we had word2vec-like properties, steering and interpretability would be much easier and more reliable. And I do think it's a promising research direction, but not a certain one.

Facebook also made an interesting tokenizer that has LLMs operating in a much richer embedding space: https://github.com/facebookresearch/blt. They embed byte patches split by entropy/surprise. So it might be another way to test the hypothesis that a better embedding space would provide nice word2vec-like properties.
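
For illustration, this is the kind of check I have in mind; embed is a stand-in for whatever encoder is being tested (e.g. BLT patch embeddings), so the names are hypothetical:

```python
# Sketch of testing word2vec-like structure in an embedding space.
# `embed` is a placeholder for the encoder under test (hypothetical).
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy_score(embed, a: str, b: str, c: str, d: str) -> float:
    """How close embed(a) - embed(b) + embed(c) lands to embed(d),
    the classic king - man + woman ~= queen test."""
    predicted = embed(a) - embed(b) + embed(c)
    return cosine(predicted, embed(d))

# e.g. analogy_score(embed, "king", "man", "woman", "queen") should be high
# if the space has the linear-offset structure word2vec is known for.
```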

Reply
Model Organisms for Emergent Misalignment
wassname · 3mo · 10

Are you going to release the code models too? They seem useful. Also, the LoRA versions if possible, please.

Reply
Model Organisms for Emergent Misalignment
wassname · 3mo · 10

Thank you for releasing the models.

It's really useful, as a bunch of amateurs had released "misaligned" models on Hugging Face, but those don't seem to work (i.e. actually be cartoonishly evil).

I'm experimenting with various morality evals (https://github.com/wassname/llm-moral-foundations2, https://github.com/wassname/llm_morality) and it's good to have a negative baseline. It will also be good to add it to speechmap.ai if we can.

Reply
Gemini Diffusion: watch this space
wassname · 3mo · 10

Good point! And it's plausible because diffusion seems to provide more supervision and get better results in generative vision models, so it's a candidate for scaling.

Reply
Posts

  • wassname's Shortform (3 karma, 1y, 12 comments)
  • What did you learn from leaked documents? [Question] (15 karma, 2y, 10 comments)
  • What should we censor from training data? (16 karma, 2y, 4 comments)
  • Talk and Q&A - Dan Hendrycks - Paper: Aligning AI With Shared Human Values. On Discord at Aug 28, 2020 8:00-10:00 AM GMT+8. (1 karma, 5y, 0 comments)