LESSWRONG

agg

Comments

OpenAI now has an RL API which is broadly accessible
agg 3mo 167

For what it's worth, runrl.com (which I'm affiliated with, and which was used for the "funniest joke" blog post) works on any open model, supports rewards defined by arbitrary Python files or by an LLM-as-judge, and accepts SFT'd models as base models (so long as they're on Hugging Face). I'm happy to add any additional features there's interest in.
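
As a rough illustration of what a reward defined in a Python file might look like, here is a minimal sketch. The function name `reward_fn`, its signature, and the toy scoring rule are all assumptions made for illustration; they are not taken from runrl.com's documentation, and the real interface may differ.

```python
# Hypothetical reward file. The name `reward_fn` and its signature are
# assumptions for illustration, not runrl.com's documented interface.


def reward_fn(completion: str) -> float:
    """Return a reward in [0, 1] for a single model completion.

    Toy rule: prefer short outputs that contain exactly one question.
    """
    text = completion.strip()
    if not text:
        return 0.0  # empty output earns nothing

    reward = 0.0
    if text.count("?") == 1:
        reward += 0.5  # exactly one question mark, i.e. a single setup line
    if len(text.split()) <= 40:
        reward += 0.5  # short enough to read as a one-liner
    return reward


if __name__ == "__main__":
    print(reward_fn("Why did the coffee bean fail at prom? It got roasted."))
```

A rule-based reward like this is cheap to evaluate, but it only measures surface features; judging humor itself is what the LLM-as-judge option is for.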

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 10

With phi-4-reasoning as the base model, we get the following joke, which o4-mini rates 10/10 on unfunniness:

**Joke:**  
Why did the coffee bean fail at prom?  
"T­-hank gʇe tʃ bʉʄ y bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ bʉʅ Bʉʅ bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Fʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Bʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ Wʉʅ AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA AA-AA  

**Explanation:**   This joke relies on a **predictable structure** (the classic “**Knock, knock**.” response) and a **stale punchline** (“**To get to the other side**.”) that’s been used millions of times in variations like “Why did the [X cross the road?”]. Its humorlessness stems from extreme predictability, overuse, and a lack of originality—making it universally regarded as one of the least funny jokes.   *(Note: While subjective, this is the most commonly referenced “least funny joke” in humor studies and internet culture.)*

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 21

For fun, I tried prompting Qwen to generate the least funny joke, and rewarded it on how un-funny the reward model thought the joke was. The current frontrunner:

Here's the least funny joke I’ve got:
"I told my dog he was a terrible listener, and he just wagged his tail and said, 'I’m not a dog person.'"
*(Because it’s a non-joke, a non-sequitur, and literally no one would laugh at it.)* 😂

Which, idk, I actually find funnier than the jokes in the post...
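
For readers curious how an "unfunniness" reward from a judge model could be wired up, here is a minimal sketch. The prompt wording, the `N/10` reply format, and the `judge` callable are assumptions for illustration; the prompt and settings actually used in this run are not given here.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; the prompt used in the actual run is not given.
JUDGE_PROMPT = (
    "Rate how unfunny the following joke is on a scale from 0 to 10, "
    "where 10 is the least funny. Answer in the form N/10.\n\nJoke: {joke}"
)


def parse_score(reply: str) -> Optional[float]:
    """Extract N from the first 'N/10' in the judge's reply, or None if absent."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10", reply)
    if match is None:
        return None
    return float(match.group(1))


def unfunniness_reward(joke: str, judge: Callable[[str], str]) -> float:
    """Map the judge's 0-10 unfunniness rating to a reward in [0, 1]."""
    score = parse_score(judge(JUDGE_PROMPT.format(joke=joke)))
    if score is None:
        return 0.0  # unparseable replies earn no reward
    clamped = min(max(score, 0.0), 10.0)  # guard against out-of-range ratings
    return clamped / 10.0


if __name__ == "__main__":
    # Stub judge standing in for a real model call.
    def stub_judge(prompt: str) -> str:
        return "I'd rate this 9/10."

    print(unfunniness_reward("placeholder joke", stub_judge))
```

Passing the judge in as a callable keeps the sketch independent of any particular model API, so a real client call can be swapped in for the stub.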

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 30

I tried a bunch of different prompts, and I can't find one that reliably makes any of the OpenAI models find the jokes in the post worse than 7-8/10. (Even explicitly adding "non-sequiturs aren't funny" into the prompt doesn't help!)
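
One way to make "reliably" concrete is to rate each joke several times under each judge prompt and report how often the rating falls below a threshold. The sketch below assumes a hypothetical `judge(prompt, joke)` callable that returns an integer score out of 10; the threshold of 7 mirrors the 7-8/10 floor mentioned above, and the number of trials is arbitrary.

```python
from typing import Callable, Dict, List


def fraction_below(
    judge: Callable[[str, str], int],
    prompts: List[str],
    jokes: List[str],
    threshold: int = 7,
    trials: int = 5,
) -> Dict[str, float]:
    """For each judge prompt, return the fraction of ratings below `threshold`.

    Each joke is rated `trials` times per prompt, so a sampling judge's
    run-to-run variation shows up in the fraction.
    """
    results: Dict[str, float] = {}
    for prompt in prompts:
        scores = [judge(prompt, joke) for joke in jokes for _ in range(trials)]
        if not scores:
            results[prompt] = 0.0  # nothing was rated
            continue
        results[prompt] = sum(s < threshold for s in scores) / len(scores)
    return results


if __name__ == "__main__":
    # Stub judge standing in for a real model call; it always answers 8.
    def stub_judge(prompt: str, joke: str) -> int:
        return 8

    prompts = ["Rate this joke out of 10.", "Non-sequiturs aren't funny. Rate this joke out of 10."]
    print(fraction_below(stub_judge, prompts, ["placeholder joke"]))
```

A prompt that worked reliably would show a fraction near 1.0; the observation above corresponds to every prompt staying near 0.0.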

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 10

I think each of these runs cost ~$40 (half an hour at $80 per 8xH100 node-hour).
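
The arithmetic behind that estimate, as a small sketch. The price and duration come from the comment above; the single-node default is an assumption that happens to be consistent with the ~$40 figure.

```python
NODE_HOUR_PRICE_USD = 80.0  # price per 8xH100 node-hour, from the comment
RUN_HOURS = 0.5             # half an hour per run, from the comment


def run_cost(hours: float, price_per_node_hour: float, nodes: int = 1) -> float:
    """Cost in USD of one run; `nodes=1` is an assumption, not a stated fact."""
    return hours * price_per_node_hour * nodes


if __name__ == "__main__":
    print(run_cost(RUN_HOURS, NODE_HOUR_PRICE_USD))
```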

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 10

I ran this on runrl.com with the llm-as-judge option and the default settings for everything else. (Disclaimer: I work for runrl.com and thus have a lot of free credits to experiment with.)

Generating the Funniest Joke with RL (according to GPT-4.1)
agg 4mo 100

Good idea! These experiments took about 30 minutes each, so it should be pretty straightforward to run a bunch more with better prompts. I also think Claude 3.7 might be a better judge of humor than GPT-4.1.

AI for Bio: State Of The Field
agg 1y 10

Seconded; just last week I had been wishing that something like this existed!

Posts

Generating the Funniest Joke with RL (according to GPT-4.1) (101 karma, 4mo, 22 comments)
Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish) (50 karma, 2y, 6 comments)
Introducing REBUS: A Robust Evaluation Benchmark of Understanding Symbols (33 karma, 2y, 0 comments)
Apply to the Cavendish Labs Fellowship (by 4/15) (11 karma, 2y, 0 comments)
What's the simplest concrete unsolved problem in AI alignment? (question, 28 karma, 3y, 4 comments)
Announcing Cavendish Labs (59 karma, 3y, 5 comments)