Caleb Biddulph

I don't necessarily anticipate that AI will become superhuman in mechanical engineering before other things, although it's an interesting idea and worth considering. If it did, I'm not sure self-replication abilities in particular would be all that crucial in the near term.

The general idea that "AI could become superhuman at verifiable tasks before fuzzy tasks" could be important though. I'm planning on writing a post about this soon.

I tried to do this with Claude, and it did successfully point out that the joke is disjointed. However, it still gave it a 7/10. Is this how you did it @ErickBall?

Few-shot prompting seems to help: https://claude.ai/share/1a6221e8-ff65-4945-bc1a-78e9e79be975

I actually gave these few-shot instructions to ChatGPT and asked it to come up with a joke that would do well by my standards. It did surprisingly well!

I asked my therapist if it was normal to talk to myself.
She said, "It’s perfectly fine—as long as you don’t interrupt."

Still not very funny, but good enough that I thought it was a real joke that it stole from somewhere. Maybe it did, but I couldn't find it with a quick Google search.

This was fun to read! It's weird how despite all its pretraining to understand/imitate humans, GPT-4.1 seems to be so terrible at understanding humor. I feel like there must be some way to elicit better judgements.

You could try telling GPT-4.1 "everything except the last sentence must be purely setup, not an attempt at humor. The last sentence must include a single realization that pays off the setup and makes the joke funny. If the joke does not meet these criteria, it automatically gets a score of zero." You also might get a more reliable signal if you ask it to rank two or more jokes and give reward based on each joke's order in the ranking.
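
For concreteness, here's a rough sketch of what the ranking-based reward could look like, assuming the OpenAI Python client; the exact prompt wording, the naive output parsing, and the reward formula are placeholder assumptions on my part, not something I've tested:

```python
# Rough sketch: use GPT-4.1 as a joke judge that ranks candidates rather than
# scoring each one in isolation. Prompt wording and parsing are placeholders.
from openai import OpenAI

client = OpenAI()

JUDGE_INSTRUCTIONS = (
    "Everything except the last sentence must be purely setup, not an attempt at humor. "
    "The last sentence must include a single realization that pays off the setup. "
    "Rank the jokes below from funniest to least funny by these criteria. "
    "Reply with only the joke numbers in ranked order, separated by commas."
)

def rank_jokes(jokes: list[str]) -> list[int]:
    """Return joke indices ordered from funniest to least funny, per the judge."""
    numbered = "\n\n".join(f"Joke {i + 1}:\n{joke}" for i, joke in enumerate(jokes))
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": f"{JUDGE_INSTRUCTIONS}\n\n{numbered}"}],
    )
    # Naive parsing; a real reward model would need to handle malformed replies.
    return [int(s.strip()) - 1 for s in response.choices[0].message.content.split(",")]

def rewards_from_ranking(jokes: list[str]) -> list[float]:
    """Assign each joke a reward based on its position in the judge's ranking."""
    order = rank_jokes(jokes)
    n = len(jokes)
    rewards = [0.0] * n
    for rank, idx in enumerate(order):
        rewards[idx] = (n - rank) / n  # funniest joke gets 1.0, last gets 1/n
    return rewards
```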

Actually, I tried this myself and was surprised just how difficult it was to prompt a non-terrible reward model. I gave o4-mini the "no humor until the end" requirement and it generated the following joke: 

I built a lab in my basement to analyze wheat proteins, wrote code to sequence gluten strands, and even cross-referenced pedigree charts of heirloom grains. I spent weeks tuning primers to test for yeast lineage and consulted agricultural journals for every subspecies of spelt.  

Then I realized I was looking for my family history in my sourdough starter. 

What does this even mean? It makes no sense to me. Is it supposed to be a pun on "pedigree" and "lineage"? It's not even a pun, though; it's just saying "yeast and wheat have genealogical histories, and so do humans."

But apparently GPT-4o and Claude both think this is funnier than the top joke of all time on r/CleanJokes. (Gemini thought the LLM-written joke was only slightly worse.) The joke from Reddit isn't the most original, but at least it makes sense.

Surely this is something that could be fixed with a little bit of RLHF... there's no way grading jokes is this difficult.

Seems possible, but the post is saying "being politically involved in a largely symbolic way (donating small amounts) could jeopardize your opportunity to be politically involved in a big way (working in government)."

Yeah, I feel like in order to provide meaningful information here, you would likely have to be interviewed by the journalist in question, which can't be very common.

At first I upvoted Kevin Roose because I like the Hard Fork podcast and get generally good/honest vibes from him, but then I realized I have no personal experiences demonstrating that he's trustworthy in the ways you listed, so I removed my vote.

I remember being very impressed by GPT-2. I think I was also quite impressed by GPT-3 even though it was basically just "GPT-2 but better." To be fair, at the moment that I was feeling unimpressed by ChatGPT, I don't think I had actually used it yet. It did turn out to be much more useful to me than the GPT-3 API, which I tried out but didn't find that many uses for.

It's hard to remember exactly how impressed I was with ChatGPT after using it for a while. I think I hadn't fully realized how great it could be when the friction of using the API was removed, even if I didn't update that much on the technical advancement.

I remember seeing the ChatGPT announcement and not being particularly impressed or excited, like "okay, it's a refined version of InstructGPT from almost a year ago. It's cool that there's a web UI now, maybe I'll try it out soon." November 2022 was a technological advancement but not a huge shift compared to January 2022, IMO.

Which part do people disagree with? That the norm exists? That the norm should be more explicit? That we should encourage more cross-posting?

It seems there's an unofficial norm: post about AI safety on LessWrong, and post about all other EA stuff on the EA Forum. You can cross-post your AI stuff to the EA Forum if you want, but most people don't.

I feel like this is pretty confusing. There was a time that I didn't read LessWrong because I considered myself an AI-safety-focused EA but not a rationalist, until I heard somebody mention this norm. If we encouraged more cross-posting of AI stuff (or at least made the current norm more explicit), maybe the communities on LessWrong and the EA Forum would be more aware of each other, and we wouldn't get near-duplicate posts like these two.

(Adapted from this comment.)
