When making safety cases for alignment, it's important to remember that defense against single-turn attacks doesn't always imply defense against multi-turn attacks.
Our recent paper shows a case where breaking up a single-turn attack into multiple prompts (spreading it out over the conversation) changes which models and guardrails are vulnerable to the jailbreak.
Robustness against the single-turn version didn't imply robustness against the multi-turn version of the attack, and the reverse didn't hold either.
I expect that within a year or two, there will be an enormous surge of people who start paying a lot of attention to AI.
This could mean that the distribution of who has influence will change a lot. (And this might be right when influence matters the most?)
I claim: your effect on AI discourse post-surge will be primarily shaped by how well you or your organization absorbs this boom.
The areas where I've thought the most about this phenomenon are:
(But this applies to anyone whose impact primarily comes from spreading their ideas, which is a lot of people.)
I think that you or your organization should have an explicit plan to absorb this surge.
Unresolved questions:
I'd be curious to see how this looked with Covid: Did all the Covid experts get an even 10x multiplier in following? Or were a handful of Covid experts highly elevated, while the rest didn't really see much of an increase in followers? If the latter, what did those experts do to get everyone to pay attention to them?
Can anyone think of alignment-pilled conservative influencers besides Geoffrey Miller? Seems like we could use more people like that...
Maybe we could get alignment-pilled conservatives to start pitching stories to conservative publications?
Should it be more tabooed to put the bottom line in the title?
Titles like "in defense of <bottom line>" or just "<bottom line>" seem to:
I think putting the conclusion in the title is good insofar as it's a form of anti-clickbait: It's the most informative title possible. Yes, people may be motivated to read it in order to confirm their pre-existing opinion, or to search for counterarguments, but the alternative is often that they don't read the article at all, for lack of motivation.
People who are motivated to write a comment from a disagreement with the title are, more or less, forced to read the actual post in order to compose their rebuttal. Which is better than not receiving any engagement from this person at all. And perhaps this post even changes their mind, or they agree with the title but find the arguments in the post too weak.
Overall, having the conclusion in the title seems good for much the same reasons that a summary at the beginning is good.
One reason to avoid putting the bottom line in the title, though, is if it is a generally unpopular opinion. Many people will reflexively downvote the post without reading it, causing it to be seen by fewer readers.
The scene in planecrash where Keltham gives his first lecture, as an attempt to teach some formal logic (and a whole bunch of important concepts that usually don't get properly taught in school), is something I'd highly recommend reading! As far as I can remember, you should be able to just pick it up right here, and follow the important parts of the lecture without understanding the story.