LESSWRONG

Jakub Halmeš

https://jakubhalmes.com/

Comments

Sorted by Newest

silentbob's Shortform
Jakub Halmeš · 1mo · 52

A couple of weeks ago, I was surprised to find out that you can create artifacts that call the Claude API. A silly example: a chat app where Claude always responds in capitalized text.
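
A minimal sketch of what that looks like, assuming the artifact runtime exposes a window.claude.complete(prompt) call that resolves to Claude's reply text (the exact interface is an assumption here, not confirmed by the comment):

```typescript
// Assumed interface (not verified): the artifact runtime exposes
// window.claude.complete(prompt), which resolves to Claude's reply text.
type ClaudeRuntime = { complete: (prompt: string) => Promise<string> };
const claude = (window as any).claude as ClaudeRuntime;

// The silly example: forward the user's message and uppercase the reply,
// so the "chat app" always responds in capitalized text.
async function sendMessage(userText: string): Promise<string> {
  const reply = await claude.complete(userText);
  return reply.toUpperCase();
}
```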

People Are Less Happy Than They Seem
Jakub Halmeš · 2mo · 10

Yes, I agree that the quoted statement is too strong, and many feelings are unnoticed or forgotten. 

Jakub Halmeš's Shortform
Jakub Halmeš · 8mo · 60

I wonder if you could take the R1-Zero training regime, penalize or restrict the use of existing words from any language (maybe only in the scratchpad, not the final response), and obtain a model that can solve math problems by reasoning in a non-existent language.
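
A hypothetical sketch of such a penalty, assuming it is shaped by the fraction of scratchpad tokens found in existing-language vocabularies; the vocabulary set, tokenization, and sign convention are all invented for illustration:

```typescript
// Hypothetical penalty for the idea above: score a scratchpad by the fraction
// of its tokens that are words in some existing language. `knownWords` (a
// vocabulary union across languages) and the tokenization are assumptions.
function existingWordPenalty(scratchpad: string, knownWords: Set<string>): number {
  const tokens = scratchpad.toLowerCase().match(/\p{L}+/gu) ?? [];
  if (tokens.length === 0) return 0;
  const known = tokens.filter((t) => knownWords.has(t)).length;
  return -known / tokens.length; // 0 if no real words appear, -1 if every token is one
}
```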

Jesse Hoogland's Shortform
Jakub Halmeš · 8mo · 40

"During the training process, we observe that CoT often exhibits language mixing, particularly when RL prompts involve multiple languages. To mitigate the issue of language mixing, we introduce a language consistency reward during RL training, which is calculated as the proportion of target language words in the CoT. Although ablation experiments show that such alignment results in a slight degradation in the model’s performance, this reward aligns with human preferences, making it more readable."

I also found this trade-off between human readability and performance noteworthy.
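
A minimal sketch of how that proportion could be computed, assuming simple word segmentation and some word-level language identifier; the quoted passage defines the quantity but not the implementation:

```typescript
// Sketch of the language consistency reward described in the quote: the
// proportion of target-language words in the CoT. How words are segmented
// and identified is an assumption; only the proportion itself is specified.
function languageConsistencyReward(
  cot: string,
  isTargetLanguage: (word: string) => boolean,
): number {
  const words = cot.match(/\p{L}+/gu) ?? [];
  if (words.length === 0) return 0;
  return words.filter(isTargetLanguage).length / words.length;
}
```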

Jakub Halmeš's Shortform
Jakub Halmeš · 8mo · 10

Yes, "fair" here means that their subjective EVs are equal. The post referenced in the sibling comment calls it "Even Odds", which is probably a better name.

Jakub Halmeš's Shortform
Jakub Halmeš · 8mo · 10

I did not realize that. Thank you for the reference! 

Jakub Halmeš's Shortform
Jakub Halmeš · 8mo* · 70

If Alice thinks X happens with a probability of 20% while Bob thinks it's 40%, what would be a fair bet between them? 

I created a Claude Artifact that calculates a bet such that the expected value is the same for both players.

In this case, Bob wins if X happens (he thinks it's more likely). If Alice bets $100, he should bet $42.86, and the EV of such a bet for both players (according to their beliefs) is $14.29.

EDIT: I updated the calculator to correctly handle the case where A's probability is higher than B's.
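
A minimal sketch of the calculation behind such a calculator (the derivation is mine and reproduces the numbers above; it is not the artifact's actual code):

```typescript
// Even-odds bet: choose Bob's stake so that both players' subjective EVs are
// equal. pA and pB are Alice's and Bob's probabilities that X happens; the
// player assigning X the higher probability bets on X happening.
function evenOddsBet(pA: number, pB: number, stakeA: number): { stakeB: number; ev: number } {
  if (pB > pA) {
    // Bob backs X: solve pB*stakeA - (1-pB)*stakeB = (1-pA)*stakeB - pA*stakeA.
    const stakeB = (stakeA * (pA + pB)) / (2 - pA - pB);
    const ev = (stakeA * (pB - pA)) / (2 - pA - pB);
    return { stakeB, ev };
  }
  // Alice backs X (the case fixed in the EDIT): roles mirror the branch above.
  const stakeB = (stakeA * (2 - pA - pB)) / (pA + pB);
  const ev = (stakeA * (pA - pB)) / (pA + pB);
  return { stakeB, ev };
}

// Reproduces the numbers in the comment:
console.log(evenOddsBet(0.2, 0.4, 100)); // { stakeB: 42.857..., ev: 14.285... }
```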

The Inner Alignment Problem
Jakub Halmeš · 2y · 10

I wrote this mostly for personal purposes: I wanted to organize my thoughts about the problem while reading the paper, and publishing the notes, even if no one reads them, forces me to write more clearly and precisely.

I would like to get feedback on whether posts like this one have value for other people. Please let me know! Thank you.

Posts

19 · People Are Less Happy Than They Seem · 2mo · 6
1 · Jakub Halmeš's Shortform · 8mo · 7
1 · The Inner Alignment Problem · 2y · 1