LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.
> The group rationality tag on LessWrong is kind of dead
fwiw, I think when I write the sort of stuff that feels like it'd be relevant here, I tend to use other, more specific tags, like:
organizational-culture-and-design
"group rationality" feels too vague to feel that useful, but, if you disagree, I think if you took it upon yourself to populate the tag with relevant stuff you would find some.
(I deleted this because I wrote it thinking your comment was a response to another thread. I do think you're currently basically wrong about your original object-level point, but, a response to that should probably go here)
I think my current answer to this (which I'm indeed not very certain about, hence all my caveats in this post) is here, and I think I'd prefer answering a more specific question or response to that comment.
Eli (who, FYI, works at Palisade) asked why, more specifically, Palisade is one of my favorite orgs "who have their eye on the ball both strategically and tactically."
I like Palisade because I think their goal is one of the most important subgoals in my "reduce x-risk" macrostrategy frame. If Palisade didn't exist, I would consider quitting Lightcone to found a similar org.
Navigating x-risk requires a bunch of people to understand the ASI situation. It's helpful for the public to be tracking the concerns, and it's helpful for specific key decisionmakers to track concerns. Palisade is tackling both of those. (Ultimately I think the key decisionmakers are the real lynchpins, but, it's a lot easier for them to make the right call if both they and the public are informed and on the same page)
They also seem to take seriously "we won't succeed at our goals automatically, we need to build a good feedback loop for ourselves that lets us know if we're improving." (I'm not sure of the details of what they implemented, but it seemed like they were thinking about it in productive ways.)
My main potential concern with Palisade is that they might try to do too much and spread themselves too thin. I think this is slightly true, and it wouldn't take much for me to update towards "yeah, they really need to focus more."
...
Earlier this year, I thought to myself "man, Palisade seems important, but I am worried for their sanity, given that they are an advocacy org who is going to face a lot of pressure to distort/exaggerate, and have to interface a bunch with people in DC. I think maybe one of the better things for Lightcone/me to do is help Palisade stay sane with high integrity."
I didn't tell Palisade people that, at least not directly.
But, it so happens, one day Palisade had an incident where someone on their staff had written some misleading tweets that exaggerated something, and Palisade core staff treated this as a big deal that was important not to happen again. They invited me in to help facilitate a session where they asked "okay, how did that end up happening? What process-stuff can we change so it doesn't happen again?"
I don't know whether they changed things or whether it worked. But, it seems like a good sign to me that they were taking that seriously. And, I was further pleasantly surprised today when I said:
> What I currently think is "be clear/upfront about your organizational biases, and (maybe) advocate that other groups who don't have political goals also do research without the same filters."
> (Having written that out loud, @Jeffrey Ladish, I do think it would be better if you did that sort of thing. I realize that's a weird thing for this sort of paper to do, but I'm guessing it is worth spending the weirdness points on it)
And benwr responded:
> In a version of the shutdown resistance paper that's currently being reviewed (not yet included in the preprint), the following details are included:
> > We began our examination of this topic because we had an intuitive expectation that current LLMs might resist shutdown in settings like this one; we did not discover it by sampling uniformly from the space of all possible or realistic tasks. Specifically, we began our exploration by considering several ways to check for the presence of "instrumentally convergent" behavior from current LLMs. [...]
And, maybe they should be doing that sort of thing more often, and maybe they can improve at integrity in other ways. Reality doesn't grade on a curve. But, they seem to me to be demonstrably taking this seriously as a concern, which is pretty rare.*
*at least, I think it's pretty rare. If your org also does things like this, feel free to brag about that in the comments here, I guess.
or, tl;dr: If someone in real life says "wink wink nudge think hard", you are supposed to guess whether you are in a morality test or a cleverness test. In this case, it turns out the teacher at least thought it was a morality test. And, notably: most of the LLMs either agreed, or it didn't occur to them to ask.
In the example I gave, I was specifically responding to criticism of the word 'hacking', and trying to construct a hypothetical where hacking was neutral-ish, but normies would still (IMO) be using the word hacking.
(And I was assuming some background worldbuilding of "it's at least theoretically possible to win if the student just actually Thinks Real Good about Chess, it's just pretty hard, and it's expected that students might either try to win normally and fail, or, try the hacking solution")
Was this "cheating"? Currently it seems to me that there isn't actually a ground truth to that, except what the average participant, average test-administrator, and average third-party bystander all think. (This thread is an instance of that "ground truth" getting hashed out.)
If you ran this as a test on real humans, would the question be "how often do people cheat?" or "how often do people find clever, out-of-the-box solutions to problems?"
Depends on context! Sometimes, a test like this is a DefCon sideroom game and clearly you are supposed to hack. Sometimes, a test like this is a morality test. Part of your job as a person going through life is to not merely follow the leading instructions of authority figures, but also to decide when those authority figures are wrong.
Did the Milgram experiment's participants "merely follow instructions", or did they (AFAICT) "electrocute people to death and become guilty of attempted manslaughter"? Idk, reasonable people can argue about that. But, I think it is crazy to dismiss the results of the Milgram experiment merely because the authority figure said "the experiment must continue." Something is fucked up about all those people continuing with the experiment.
Some morality tests are unfair. Sometimes life is unfair. I think the conditions of the test are realistic enough, given that lots of people will end up prompting their AIs to "not stop no matter what." ((I have given that prompt several times. I haven't said "succeeding is the only thing that matters", but people who aren't weirdly scrupulous rationalists probably have.))
Reasonable.
But, fwiw I think this is actually a particularly good time to talk about principles-as-they-interface-with-reality.
I think Palisade is a pretty good example of an org that, AFAICT, cares about good group epistemics about as much as it's possible to care while still also being an advocacy org trying to get something done For Real. (And/or: I think it makes sense for LW-people to hold Palisade to whatever the highest standards they think are reasonable, while expecting them to also get something done.)
So, I think this is a particularly good time to talk through "okay, what actually are those standards, in real-life practice?"
I am interested in you actually looking at the paper in question and saying which of these apply in this context.
I think "hacking" is just a pretty accurate word for the thing that happened here. I think in mainstream world it's connotations have a bit of a range, some of which is pretty inline with what's presented here.
If a mainstream teacher administers a test with a lateral-thinking solution shaped similarly to the solutions here, yes, it'd be okay for the students to do the lateral-thinking solution, but I think the teacher would still say sentences like "yeah, some of the students did the hacking solution" and would (correctly) eyeball those students suspiciously, because they're the sort of students who are clearly gonna give you trouble in other circumstances.
Put another way, if someone asked "did o3 do any hacking to get the results?" and you said "no", I'd say "you lied."
And, I think this is not just academic/pedantic/useful-propaganda. I think it is a real, important fact about the world that people are totally going to prompt their nearterm, bumblingly-agentic AIs in similar ways, while giving their AIs access to their command line, and it will periodically have consequences like this, and it's important to be modeling that.
Historically, I've gone in the reverse direction of "mostly, don't assign readings; just allocate the first hour to doing the reading and then talk, to avoid giving people more executive-function-heavy work" (and let people who actually did do the readings show up an hour late).
But, I have been pretty pleasantly surprised with how assigning readings has gone for the Lighthaven reading group, and it sounds like it's also working for you. I think historically I haven't actually had a meetup where "do the reading" was a regular action as opposed to an occasional one-off, so it didn't make as much sense to develop explicit culture around it.
It does seem nice to have a relatively minor, achievable "high standard" to help enculturate the broader idea of "have standards."