Yeah, what I meant was that "goals" or "preferences" are often emphasized front and center, but here not so much, because it seems like you want to reframe that part under the banner of "intention".
A range of other relevant concepts also build on causality:
It just felt a little odd to me that so much bubbled up from your decomposition except utility, and that you only mention "goals" as this thing that "causes" behaviors without zeroing in on a particular formalism. So my guess was that VNM would be hiding behind this "intention" idea.
I can't say anything rigorous, sophisticated, or credible. I can just say that the paper was a very welcome spigot of energy and optimism in my own model of why "formal verification"-style assurances and QA demands are ill-suited to models (whether behavioral evals or reasoning about the output of decompilers).
I really want to create a more distinct and intentionally separate culture both on LessWrong and at the Rose Garden Inn, and I think owning a physical space hugely helps with that. FTX, various experiences I've had in the EA space over the past few years, as well as a lot of safetywashing in AI Alignment in more recent years, have made me much more hesitant to build a community that can easily get swept up in respectability cascades and get exploited by bad actors, and I really want to develop a more intentional culture in what we are building here. Hopefully this will enable the people I am supporting to work on things like AI Alignment without making the world overall worse, displaying low-integrity behavior, or getting taken advantage of.
I'm extremely excited by and supportive of this comment! An especially important related area, I think, is "solving the deference problem", or the cascades of a sinking bar in forecasting and threat modeling that I've felt over the last couple of years.
Indistinguishability obfuscation is compelling, but I wonder what blindspots would arise if we shrank our understanding of criminality/misuse down to a perspective shaped like "upstanding compliant citizens only use \(f \circ g\); only an irresponsible criminal would use \(g\)" for some model \(g\) (like GPT, or in the paper \(D\)) and some sanitization layer/process \(f\) (like RLHF, or in the paper \(SD\)). That may reduce the legitimacy or legibility of grievances or threat models that emphasize weaknesses of sanitization (in a world where case law and regulators make it hard to effectively criticize or steer vendors who fulfill enough checklists before we've iterated enough on a satisfying CEVy/social-choice-theoretic update to RLHF-like processes; i.e., case law or regulators bake in a system prematurely, and there's inertia presented to anyone who wants to update the underlying definition of unsafe/toxic/harmful). It may also reduce the legitimacy or legibility of upsides of unfiltered models (in the current chatbot case, perhaps public auditability of a preference aggregator pays massive social cohesion dividends).
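To make the framing concrete, here's a toy sketch of the \(f \circ g\) picture. Everything here is a hypothetical stand-in (the functions `g`, `f`, and the keyword filter are invented for illustration, not any real model or API); the point is only that the "compliant" interface exposes the composition, never the raw model:

```python
def g(prompt: str) -> str:
    """Stand-in for a raw/pure model (the paper's D; e.g. a base LM)."""
    return f"raw completion for: {prompt}"

def f(completion: str) -> str:
    """Stand-in for a sanitization layer (e.g. an RLHF-style filter).
    A naive keyword check, purely for illustration."""
    banned = {"dangerous"}
    if any(word in completion for word in banned):
        return "[refused]"
    return completion

def f_of_g(prompt: str) -> str:
    """The composed interface f ∘ g that 'upstanding citizens' use."""
    return f(g(prompt))

print(f_of_g("hello"))                # passes through the sanitizer
print(f_of_g("something dangerous"))  # blocked by the sanitizer
```

The binary framing lives entirely in which symbol you're allowed to call: critiques of `f` itself (the sanitizer's definition of "banned") have no natural home in this interface.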
We may kind of get the feeling that a strict binary distinction is emerging between raw/pure models and sanitization layers/processes, because trusting SGD would be absurd, and actually-existing RLHF is a reasonable guess from both amoral risk-assessment views (minimizing liability or PR risk) and moral views (product teams sincerely want to do the right thing). But if this distinction becomes paradigmatic, I would predict we become less resilient to diffusion-of-responsibility (type 1, in the paper) threat models, because I think explicit case law and regulation gives some actors an easy proxy of doing the right thing, making them not actually try to manage outcomes (Zvi talked about this in the context of covid, calling it "social vs physical reality", and it also relates to "trying to try vs. trying" from the sequences/methods). I'm not saying I have alternatives to the strict binary distinction; it seems reasonable, or at least it seems like a decent bet with respect to the actual space of things we can choose to settle for if it's already "midgame".
So the contributions of VNM theory are shrunk down into "intention"? Will you recapitulate that sort of framing (such as the interplay between total orders and real numbers), or do you feel it's totally wrong and should be thrown out?
Deleting a market is unprecedented
I thought there was a vibecamp market that was deleted last year?
Check out the section on computational social choice theory here https://www.alignmentforum.org/posts/hvGoYXi2kgnS3vxqb/some-ai-research-areas-and-their-relevance-to-existential-1#Computational_Social_Choice__CSC_
Also, MDAS might have a framing and reading list you'd like https://forum.effectivealtruism.org/posts/YmvBu7fuuYEr3hhnh/announcing-mechanism-design-for-ai-safety-reading-group but there are many other ways of applying the mechanism design literature to your post.
godspeed.
In terms of the parties, the DNC has a track record of handling its populists, and the GOP does not.
This comment reads like it's coming from a world where Romney is running the GOP and AOC is running the DNC.
It simply is not viable to bothsides this.