Jonathan Claybrough

Software engineer transitioned into AI safety, teaching and strategy. Particularly interested in psychology, game theory, system design, economics.

Posts

3Jonathan Claybrough's Shortform

130The case for training frontier AIs on Sumerian-only corpus

6mo

65News : Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI

20An Overview of AI risks - the Flyer

Comments

Pivotal Acts are easier than Alignment?

Jonathan Claybrough4d10

The naming might be confusing because "pivotal act" sounds like a one-time action, but in most cases getting to a stable world without any threat from AI requires constant pivotal processes. This makes almost all the destructive approaches moot (and they're probably already bad for ethical reasons and the many others already discussed) because you'll make yourself a pariah.

The most promising avenue for a pivotal act/pivotal process that I know of is doing good research so that ASI risks are known and proven, doing good outreach and education so most world leaders and decision makers are well aware of this, and helping set up good governance worldwide to monitor and limit the development of AGI and ASI until we can control it.

Exercise: Planmaking, Surprise Anticipation, and "Baba is You"

Jonathan Claybrough5d10

I recently played Outer Wilds and Subnautica, and the exercise I recommend for both of these games is: get to the end of the game without ever failing.
In Subnautica, failing means dying even once; in Outer Wilds, it's a spoiler to describe what failing is (successfully getting to the end could certainly be argued to be a fail).
I failed in both of these. I played Outer Wilds first and was surprised at my fail, which inspired me to play Subnautica without dying. I got pretty far but also died, from a mix of one unexpected game mechanic, careless measurement of another mechanic, and a lack of redundancy in my contingency plans.

Response to Dileep George: AGI safety warrants planning ahead

Jonathan Claybrough15d30

Oh wow, makes sense. It felt weird that you'd spend so much time on posts, yet if you didn't spend much time it would mean you write at least as fast as Scott Alexander. Well, thanks for putting in the work. I probably don't publish much because I want it to not be much work to do good posts, but it's reassuring to hear that it normally is.

Response to Dileep George: AGI safety warrants planning ahead

Jonathan Claybrough18d10

(Aside: I generally like your posts' scope and clarity, mind saying how long it takes you to write something of this length?)

Jonathan Claybrough's Shortform

Jonathan Claybrough23d40

Self-modeling is a really important skill, and you can measure how good you are at it by writing predictions about yourself. A notably important one for people who have difficulty with motivation is predicting your own motivation: will you be motivated to do X in situation Y?

If you can answer that one generally, you can plan to actually do anything you could theoretically do, using the following algorithm: from current situation A, to achieve wanted outcome Z, find a predecessor situation Y from which you'll be motivated to get to Z (e.g. having written 3 paragraphs of a 4-paragraph essay), and a predecessor situation X from which you'll get to Y, and iterate until you get to A (or forward chain, from A to Z). Check that you will indeed be motivated each step of the way.

How can the above plan fail? Either you were mistaken about yourself, or about the world. Figure out which and iterate.
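
For the programmers reading, the backward-chaining version of that algorithm can be sketched roughly as below. This is only an illustration under assumptions: the function name `backward_chain`, the `motivated_steps` map, and the situation labels are all made up for the example, and the map stands in for your own predictions of the form "from situation S, I'd be motivated to get to T".

```python
from collections import deque


def backward_chain(start, goal, motivated_steps):
    """Return a list of situations leading from start to goal, or None.

    motivated_steps is a hypothetical map of self-predictions: each
    situation maps to the set of situations you predict you would be
    motivated to reach from it.
    """
    # Invert the map so we can ask "which situations lead to this one?"
    predecessors = {}
    for source, targets in motivated_steps.items():
        for target in targets:
            predecessors.setdefault(target, set()).add(source)

    # Breadth-first search backward from the goal (Z) toward the start (A).
    next_step = {goal: None}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        if current == start:
            # Follow the recorded links forward to rebuild the plan.
            path = []
            while current is not None:
                path.append(current)
                current = next_step[current]
            return path
        for prev in sorted(predecessors.get(current, ())):
            if prev not in next_step:
                next_step[prev] = current
                queue.append(prev)
    return None  # no chain of motivated steps was found


# Hypothetical self-predictions for the essay example.
essay_steps = {
    "A: blank page": {"X: outline written"},
    "X: outline written": {"Y: 3 of 4 paragraphs written"},
    "Y: 3 of 4 paragraphs written": {"Z: essay finished"},
}
plan = backward_chain("A: blank page", "Z: essay finished", essay_steps)
```

A result of `None` corresponds to not finding any chain you'd be motivated to follow. A plan that the search finds but that fails in practice means one of the predictions in the map was wrong, about yourself or about the world.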

The Minority Faction

Jonathan Claybrough1mo40

Appreciate the highlight of identity as this important/crucial self-fulfilling prophecy, I use that frame a lot.

What does the title mean? Since they all disagree I don't see one as being more of a minority than the other.

Talk: AI safety fieldbuilding at MATS

Jonathan Claybrough1mo32

Nice talk!
When you talk about the most important interventions for the three scenarios, I wanna highlight that in the case of nationalization, you can also, if you're a citizen of one of these countries nationalizing AI, work for the government and be on those teams working and advocating for safe AI.

Awakening

Jonathan Claybrough2mo10

In my case I should have measurable results like higher salary, higher life satisfaction, more activity, more productivity as measured by myself and friends/flatmates. I was very low so it'll be easy to see progress. The difficulty was finding something that'd work; measuring whether it does won't be hard.

Jonathan Claybrough's Shortform

Jonathan Claybrough2mo40

Some people have short AI timelines based on inner models that don't communicate well. They might say "I think if company X trains according to new technique Y it should scale well and lead to AGI, and I expect them to use technique Y in the next few years", and the reasons why they think technique Y should work are some kind of deep understanding built from years of reading ML papers, which is not particularly easy to transmit or debate.

In those cases, I want to avoid going into details and arguing directly, but would suggest that they use their deep knowledge of ML to predict existing recent results before looking at them. This would be easy to cheat, so I mostly suggest this for people to check themselves, or to check people you trust to be honorable. Concretely, it'd be nice if, when some new ML paper with a new technique comes out, someone compiles a list of questions answered by that paper (e.g. is technique A better than technique B for a particular result) and posts it to LW so people can track how well they understand ML, and thus (to some extent) how well-founded their short timelines are.

For example, a recent paper examines how data affects performance on a bunch of benchmarks, and notably tested training either on a duplicated dataset (a bunch of common crawls) or a deduplicated one (the same, except with documents shared between crawls removed). Do you expect deduplication in this case to raise or lower performance on benchmarks? If we could have similar questions when new results come out, it'd be nice.

Awakening

Jonathan Claybrough2mo30

Thank you for sharing, it really helps to pile on these stories (and nice to have some trust they're real, which is more difficult to get from reddit. On which note, are there non-doxxing receipts you can show for this story being true? I have no reason to doubt you in particular but I guess it's good hygiene when on the internet to ask for evidence)

It also makes me wanna share a bit of my story. I read The Mind Illuminated, I did only small amounts of meditation, yet the framing the book offers has been changing my thinking and motivational systems. There aren't many things I'd call info hazards, but in my experience even just reading the book seems to be enough to contribute to profound changes that would not obviously be considered positive by the previous me. (They're not obviously negative either, I happen to be hopeful, but I'm waiting on results another year from now to say)