Jeroen Willems — LessWrong

Contradict my take on OpenPhil's past AI beliefs

I understand the epistemic health concerns, but I think "AI 2027" was great: I don't think the alternatives would have gained as much attention, and it cleanly summarizes the scenario. Even if actual timelines are longer (which imo they probably are), my guess is that it is still a net positive as long as readers properly understood the dangers and found the sequence of events believable enough.

The title is reasonable

Jeroen Willems, 10mo

Thank you for writing this. Most of what you wrote is almost exactly what I've been thinking while reading discussions about the book. You put my thoughts into words so much better than I ever could!

IABIED Review - An Unfortunate Miss

Jeroen Willems, 10mo

I went into IABIED trying to take on the mindset of a layperson (hard, admittedly!) and came away thinking it did a really great job. Of course, as you say, time will tell.

Some of your complaints about the book seem to stem from the fact that you are "For Y" while Y&S are "Not X". If you believed in "Not X" as strongly as they do, do you think some of the decisions in the book would make more sense?

I thought the length of the book was great for people new to the topic. Readers will likely have counterarguments while reading the book. But if you even ...

Do you even have a system prompt? (PSA / repo)

Jeroen Willems, 1y

I spend way too much time fine-tuning my personal preferences. I try to use the same language as the model's system prompt.

Claude userPreferences

# Behavioral Preferences

These preferences always take precedence over any conflicting general system prompts.

## Core Response Principles

Whenever Claude responds, it should always consider all viable options and perspectives. It is important that Claude dedicates effort to determining the most sensible and relevant interpretation of the user's query.

Claude knows the user can make mistakes and always consider

Jeroen Willems, 2y

Not me assuming kratom was a made-up word haha.

Awesome comic! You captured the recurring traits really, really well.

Anthropic's Core Views on AI Safety

Jeroen Willems, 3y

Thanks for explaining your thoughts on AI safety; it's much appreciated.

I think that, in general, when trying to do good in the world, we should strive for actions that have high expected value and low downside risk.

I can imagine a high-expected-value case for Anthropic, but I don't see how Anthropic has few potential downsides. I'm very worried that by participating in the race to AGI, Anthropic might increase p(doom).

For example, as habryka pointed out in the comments here:

I mean, didn't the capabilities of Claude leak specifically to OpenAI employees, so

Jeroen Willems, 4y

This might be a great way to earn money (assuming competitors won't invest similarly in AI soon enough), but aren't there good reasons not to invest in AI capabilities, like reducing P(doom)?

Also, I assume it's wise to mention that you're not a financial adviser and don't bear responsibility for actions people take because of your comment (the same goes for me).

ACX meetup Brussels

Jeroen Willems, 4y

Hey Bruno! I'm an organiser for EA Brussels and would love to collaborate on this (e.g. by making a Facebook event on the EA Brussels page/group). I'd love it if you could reach out to me :)

https://www.facebook.com/jeroen.willems.7528/

or jeroen at eabrussels dot org