Dumping out a lot of thoughts on LW in hopes that something sticks. Eternally upskilling and accelerating.
I write the ML Safety Newsletter.
DMs open, especially for promising opportunities in AI Safety and potential collaborators.
Oops, I wrote that without fully thinking about diffusion models. I meant to contrast diffusion LMs with more traditional autoregressive language transformers, yes. Thanks for the correction; I'll clarify my original comment.
This seems like it's only a big deal if we expect diffusion language models to scale at a pace comparable to, or better than, more traditional autoregressive language transformers, which seems non-obvious to me.
Right now my distribution over possible scaling behaviors is pretty wide; I'm interested to hear more from people.
LLMs also just have their own quirks, and I think Qwen might just really like hell and cats? For example, Claude Sonnet seems to really like bioluminescence as a topic, reliably enough across different instances that Janus gets some impressive predictive accuracy.
I think that these are all pretty relevant ways to think about being an EA, but are mostly of a different fundamental type than the thing I'm pointing at. Let me get a bit more into the aforementioned math to show why this is approximately a binary categorization along the axis I was pointing at in this post.
Say that there are three possible world states: you're an EA, you're an ordinary gardener, or you're a murderer.
As a culture, on average, our values look something like:
EA > gardener > murderer
This is very reasonable and I don't think we can or should change this, as a community.
I have some mental approximation of a utility function. One of the main differences between my internal representation and an actual utility function is that the point I choose as "zero utility" doesn't matter in the formalism, but very much matters to my emotions. If we set EA=0 utility, then the myopic point-maximizing part of my brain feels okay if I do EA things, but awful if I'm getting negative points by being either of the other options. This is the moral obligation frame, where things are only barely emotionally okay if you push as far up the preference ordering as possible.
If we set gardener=0, then things feel emotionally okay if I just take the normal path. I'm not gaining or losing points. It's then positively great if I do EA things and still positively bad if I kill people. This is the moral opportunity frame, and I find it emotionally much better for me. I predict that this frame is better for community health as well, although I have only vibes and anecdata to back me up on this claim.
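To make the difference concrete, here's a minimal worked example with made-up utility numbers (the specific values are my illustration, not anything derived from the formalism):

$$U_{\text{obligation}}(\text{murderer}) = -100, \quad U_{\text{obligation}}(\text{gardener}) = -5, \quad U_{\text{obligation}}(\text{EA}) = 0$$

$$U_{\text{opportunity}}(\text{murderer}) = -95, \quad U_{\text{opportunity}}(\text{gardener}) = 0, \quad U_{\text{opportunity}}(\text{EA}) = +5$$

The two functions differ only by a constant shift of +5, so any expected-utility calculation ranks actions identically; the formalism can't tell them apart. The only thing that changes is which option sits at zero, and that zero point is exactly the emotional anchor the two frames disagree about.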
There are several other points I have left unnamed.
Now that's a lot of possibilities. I promised that I had "approximately a binary categorization", so where does the binary come in? Well, the dividing line I'm drawing is ultimately "is being a normal member of society 'okay' or 'not okay'?" Alternatively, ask how our community responds to Carol, the gardener. Are we friends with her? I say yes. Do we give her the same positive reinforcement for her new daffodils that we give to someone who donates a large sum to EA charities? I say no.
(I am intentionally excluding the various "evil" cultures from this consideration. Technically it's not a binary if you want to include them, but I really don't see why we would ever consider doing that in real life.)
These other frames you mention are important shards of a healthy EA community, I think. They're just not quite the concept boundary I was trying to draw.
Edit: this comment seems to be incorrect; see comments.
They did not say that, on my reading? The footnote about bulk preorders says
Bulk preorders don’t count. The people who compile bestseller lists distinguish and discount bulk preorders.
Which I read as semantically equivalent to "bulk preorders neither help nor hurt" with an implicit "so don't do a bulk preorder solely for the sake of making the numbers better, but other reasons are acceptable."
Retracted my previous response due to faulty reasoning.
Yes, positive reinforcement gives a more precise signal than positive punishment. Maybe I didn't elaborate on this enough, but that is the point: I like it when a community precisely signals what its values are, and I like it when a community isn't adversarial to bystanders. These are, among other reasons, precisely why I want a positive reinforcement culture. The point of the post is that these two frames are different, and the difference you pointed out is one the post discusses.
Positive reinforcement, however, is not more restrictive. There is no restriction going on; that's the point. It is just a more precise signal. If I give you a cookie every time you do the dishes and nothing when you don't, I am not restricting the counterfactual-yous who don't do dishes from being in my community. Somebody else just does the dishes and gets the cookie.
No, that's not quite what I'm saying in my post. I'm pointing at the times when a culture could either positively reinforce X or positively punish not-X; either choice imposes exactly the same amount of restriction.
Maybe, one could argue, positive reinforcement in practice creates a more precise signal, which on my model is a good thing; I like communities that clearly signal their values and aren't adversarial to bystanders. I really don't think this is a trick: the post is in fact about the differences between these two framings, including the reinforcement/punishment distinction.
For people who are on the fence about donating and want an outside opinion:
I think CAIP is one of the best orgs at what they do in DC, and I think that what they do is important for many of the reasons Jason laid out above. I continue to think that work toward good AI governance is one of the highest-leverage things we can do for AI safety right now. My experience is that the CAIP team is highly competent and very well networked across AI safety and the policy world.
My favorite review of one of the AI risk demos at a CAIP event came from a congressional staffer: "that's terrifying." They're concretely shifting opinion in DC and keeping things impressively nonpartisan while doing so.
I don't work for CAIP, but I've collaborated with them in the past. Whether or not people donate doesn't affect my finances or any of the work I do right now - I just think that they're doing good work and should have the funding to keep doing it.
Did I entirely miss these? I can't find the post anywhere. I am still interested in having more music.
Shrug, it does work for me over long spans (usually months before I need another). This was a recent patch to a recent problem, but I've had this technique for years, and it does in fact get me out of positive feedback loops of hedonic set-point raising, no ketamine required. If I had to guess why it lasts, I'd say it serves as a good reminder and willpower booster that lets me resist further really-useless superstimuli.