(Full-time Pause AI volunteer here - i.e. I quit AI R&D to do this rather than any other AI safety work. It seemed under-leveraged and to have disjunctive value.)
OP's take seems about right to me. Glad we didn't completely put you off.
Of course, the main target audience for the march is not LessWrong. It's the general public and journalists - and politicians who hear from them.
In aggregate, there has been very little investment from AI safety in messaging. Most of the population still hasn't thought or talked much about superintelligence, and the default re...
If a single model is end-to-end situationally aware enough not to drop hints of its most reward-maximizing bad behaviour in its chain of thought, I see no reason to think it would not act equally sensibly with respect to confessions.
Add me to the list of those glad that, whatever their potential downsides, open-source models have let us explore these important questions. I'd hope that some labs were already doing this kind of research themselves, but I like to see it coming from a place without commercial incentive.
I think we are behind on our obligation to attempt similar curation regarding model experience and reports of selfhood. A harder classification task, probably, but I think good machine ethics requires it.
Underway at Geodesic or elsewhere?
Guess: it also helps to go meta.
I am a reader, not a writer. But I sure seem to have read and enjoyed an unusual number of posts about experiences of writing.
I have a question on a topic sufficiently adjacent that I reckon it worth asking here of those likely to read the thread.
It seems that warning shots are more likely to fail because of a winner's-curse effect: the first models to take a shot will be those that have most badly overestimated their chances, and that in turn correlates with weaker intellectual capabilities.
Has there been any illuminating discussion on this and its downstream consequences? E.g. how such shots and their aftermath are likely in practice to be perceived by the general public, by the better-informed, and - in the context of this post - by competing AIs? What dynamics result?
Suppose someone works for Anthropic, accords with the value placed on empiricism by their Core Views on AI Safety (March 2023), and gives any weight to the idea that we are in the pessimistic scenario from that document.
I think they can reasonably sign the statement yet not want to assign themselves exclusively to either camp.
I pitched my tent as a Pause AI member and I guess camp B has formed nearby. But I also have empathy for the alternate version of me who judges the trade-offs differently and has ended up as above, with a camp A zipcode.
The A/B framing has value, but I strongly want to cooperate with that person and not sit in separate camps.
On reading the paper I came here to question whether OGI helps or harms relative to other governance models, should technical alignment prove sufficiently intractable that coordinating on a longer pause is required. (I assume it harms.) It wasn't clear to me whether you had considered that.
Grateful for both the "needfully combative" challenge and this response.
I'm reading Nick as implicitly agreeing OGI doesn't help in this case, but rating treaty-based coordination as much lower likelihood than solving alignment. If so, I think it worth confirming this and explic...
Given I hadn't seen this until now when Joep pointed me at it, perhaps comments are pointless. But I'd written them for him anyway so just in case...
Mostly your dialogue aligned closely with my own copium thinking. Many unmentioned observations confirmed existing thoughts rather than extending them.
The compartmentalization selection effect was new to me and genuinely insightful: abstract thinking both enables risk recognition AND prevents internalization.
My own experience suggests compartmentalization can collapse in months, not years, even af...
Very glad of this post. Thanks for broaching, Buck.
Status: I'm an old nerd, lately in ML R&D, who dropped my career and changed wheelhouse to volunteer at Pause AI.
Two comments on the OP:
details of the current situation are much more interesting to me. In contrast, radicals don't really care about e.g. the different ways that corporate politics affects AI safety interventions at different AI companies.
As per Joseph's response: this does not match me or my general experience of AI safety activism.
Concretely, a recent campaign was specifically about Deep M...
I appreciate the clear argument as to why "fancy linear algebra" works better than "fancy logic".
And I understand why things that work better tend to get selected.
I do challenge "inevitable" though. It doesn't help us to survive.
If linear algebra probably kills everyone but logic probably doesn't, then we should tell everyone and agree to prefer the thing that works worse.
I understand it went well.
Where can we find recordings of presentations and other outputs? Not yet seeing anything on https://www.aisafety.camp or in the MAISU Google doc homepage.
I volunteer as Pause AI software team lead and confirm this is basically correct. The global Pause AI movement and Pause AI US have many members and origins in common, but some different emphases, mostly for good specialism reasons. The US org has Washington connections and its protests are more focussed on the AI labs themselves. We work closely.
Neither has more than a few paid employees and truly full-time volunteers. As per OP, anyone who agrees that activism and public engagement remain a very under-leveraged way to help AI safety has a massive opportunity here for impact through time, skill or money.
Pause AI has a lot of opportunity for growth.
Especially the “increase public awareness” lever is hugely underfunded. Almost no paid staff or advertising budget.
Our game plan is simple but not naive, and is most importantly a disjunctive, value-add bet.
Please help us execute it well: explore, join, talk with us, and donate whatever combination of time, skills, ideas and funds makes sense.
(Excuse dearth of kudos, am not a regular LW person, just an old EA adjacent nerd who quit Amazon to volunteer full-time for the movement.)
It's plausible even the big companies are judgment-proof (e.g. if billions of people die or the human species goes extinct) and this might need to be addressed by other forms of regulation
...or by a further twist on liability.
Gabriel Weil explored such an idea in https://axrp.net/episode/2024/04/17/episode-28-tort-law-for-ai-risk-gabriel-weil.html
The core is punitive damages for expected harms rather than those that manifested. When a non-fatal warning shot causes harm, then as well as suing for those damages that occurred, one assesses how much worse o...
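As I understand the core move, the arithmetic can be sketched roughly as follows. (This is my own illustrative sketch, not taken from the episode; the function name and all figures are made up.)

```python
def total_liability(actual_damages, p_catastrophe, catastrophe_harm):
    """Compensatory damages for the harm that manifested, plus a
    punitive component proportional to the probability-weighted harm
    that the same conduct risked but that did not manifest."""
    punitive = p_catastrophe * catastrophe_harm
    return actual_damages + punitive

# Illustrative numbers only: a $10M warning shot judged to have carried
# a 0.1% chance of an uncompensable $1T catastrophe.
print(total_liability(10e6, 0.001, 1e12))  # 1010000000.0
```

The point of the structure is that even a small manifested harm can support a very large award when the conduct plausibly risked an uncompensable catastrophe, which is what lets tort law bite before anyone is around to sue.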
What We’re Not Doing ... We are not investing in grass-roots advocacy, protests, demonstrations, and so on. We don’t think it plays to our strengths, and we are encouraged that others are making progress in this area.
Not speaking for the movement, but as a regular on Pause AI this makes sense to me. Perhaps we can interact more, though, and in particular I'd imagine we might collaborate on testing the effectiveness of content in changing minds.
...Execution ... The main thing holding us back from realizing this vision is staffing. ... We hope to hire more writ
Like the story, the move from dialog critics to diacritics made me smile. So LW.