I do think it implies something about what is happening behind the scenes when their new flagship model is smaller and less capable than what was released a year ago.
I am surprised to hear this, especially “I don't think it has lasting value”. In my opinion, this post has aged incredibly well. Reading it now, knowing that the EA criticism contest utterly failed to do one iota of good with regard to stopping the giant catastrophe on the horizon (FTX), and seeing that the top prizes all went to long, well-formatted essays offering incremental suggestions on heavily trodden topics while the one guy vaguely gesturing at the actual problem (https://forum.effectivealtruism.org/posts/T85NxgeZTTZZpqBq2/the-effective-...
Just to pull on some loose strings here, why was it okay for Ben Pace to unilaterally reveal the names of Kat Woods and Emerson Spartz, but not for Roko to unilaterally reveal the names of Alice and Chloe? Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right? Is it because Ben's post was first so we adopt his naming conventions as the default? Is it because Kat and Emerson are "public figures" in some sense? Is it because Alice and Chloe agr...
Wait, that link goes to an archive page from well after Chloe was hired. When I look back at the screen captures from the period when Chloe would have seen the ad, there are no specific numbers given for compensation (I would link them myself, but I’m on mobile at the moment).
If the ad that Chloe saw said $60,000 - $100,000 in compensation in big bold letters at the top, then that seems like a bait and switch, but the archives from late 2021 list travel as the first benefit, which seems consistent with what the compensation package actually was.
Maybe I'm projecting more economic literacy than I should, but anytime I read something like "benefits package worth $X", I always mentally decompose it into its component parts. A benefits package nominally worth $X will provide economic value less than $X, because there is option value lost relative to being handed liquid cash instead.
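To make that decomposition concrete, here is a minimal sketch with made-up numbers; the discount applied to the in-kind value is a placeholder assumption, not a measured figure:

```python
# Rough sketch: value an in-kind benefits package at a discount to its
# nominal cost, since non-cash benefits can't be reallocated the way a
# liquid salary can. All numbers are illustrative.

def package_cash_equivalent(cash_per_year: float,
                            in_kind_nominal: float,
                            discount: float = 0.5) -> float:
    """Cash-equivalent value of a compensation package.

    discount: fraction of the in-kind benefits' nominal value retained
    after accounting for lost option value (a guess, not a measurement).
    """
    return cash_per_year + in_kind_nominal * discount

# $1k/month in cash plus, say, $60k/year of covered expenses:
print(package_cash_equivalent(12_000, 60_000))       # 42000.0
print(package_cash_equivalent(12_000, 60_000, 0.8))  # 60000.0
```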
The way I would conceptualize the compensation offered (and the way it is presented in the Nonlinear screenshots) is $1000/month + all expenses paid while traveling around fancy destinations with the family. I ki...
I did notice these. I specifically used the word "loadbearing" because almost all of these either don't matter much or are entirely context-dependent in their interpretation. I focused on the salary bullet point because failing to pay an agreed salary is both
1. A big deal, and
2. Bad in almost any context.
The other ones that I think are pretty bad are the Adderall smuggling and the driving without a license, but my prior on "what is the worst thing the median EA org has done" is somewhere between willful licensing noncompliance and illegal amphetamine distribution.
Yeah, I've been going back and checking things as they were stated in the original "Sharing Information About Nonlinear" post. Rereading it, I was surprised at how few specific loadbearing factual claims there were at all. Lots of "vibes-based reasoning", as they say. I think the most damning single paragraph with a concrete claim was:
...
- Chloe’s salary was verbally agreed to come out to around $75k/year. However, she was only paid $1k/month, and otherwise had many basic things compensated i.e. rent, groceries, travel. This was supposed to make traveling togeth...
I think what is bugging me about this whole situation is that there doesn't seem to be any mechanism of accountability for the (allegedly) false and/or highly misleading claims made by Alice. You seem to be saying something like, "we didn't make false and/or highly misleading claims, we just repeated the false and/or highly misleading claims that Alice told us, then we said that Alice was maybe unreliable," as if this somehow makes the responsibility (legal, ethical, or otherwise) to tell the truth disappear.
Here is what Ben said in his post, Closing...
Spencer sent us a screenshot about the vegan food stuff 2 hours before publication, which Ben didn't get around to editing in before the post went live, but that's all the evidence that I know about that you could somehow argue we had but didn't include. It is not accurate that Nonlinear sent credible signals of having counterevidence before the post went live.
Uh, actually I do think that being sent screenshots showing that claims made in the post are false 2 hours before publication is a credible signal that Nonlinear has counterevidence.
I can’t believe...
This is a better response than I was expecting. Definitely a few non sequiturs (e.g., you can’t just add travel expenses on top of a $1000/month salary and call that $70,000-$75,000 in compensation; the whole point of money is that it’s fungible and can be spent however you like), but the major accusations appear refuted.
The tone is combative, but if the facts are what Nonlinear alleges then a combative tone seems… appropriate? I’m not sure how I feel about the “Sharing Information About Ben Pace” section, but I do think it was a good idea to mention the “elephant in the room” about Ben possibly white-knighting for Alice, since that’s the only way I can get this whole saga to make sense.
If the factions were Altman-Brockman-Sutskever vs. Toner-McCauley-D'Angelo, then even assuming Sutskever was an Altman loyalist, any vote to remove Toner would have been tied 3-3.
A 3-3 tie between the CEO-founder of the company, the president-founder, and the chief scientist vs. three people with completely separate day jobs who never interact with rank-and-file employees is not a stable equilibrium. There are ways to leverage this sort of soft power into breaking the formal deadlock, as we saw last week.
It reminds me of the loyalty successful generals like Caesar and Napoleon commanded from their men. The engineers building GPT-X weren't loyal to The Charter, and they certainly weren't loyal to the board. They were loyal to the projects they were building and to Sam, because he was the one providing them resources to build and pumping the value of their equity-based compensation.
I think it's almost always fine for criticized authors to defend themselves in the comments, even if their defense isn't very good.
In my original answers I address why this is not the case (private communication serves this purpose more naturally).
This stood out to me as strange. Are you referring to this comment?
...And regardless of these resources you should of course visit a nutritionist (even if very sporadically, or even just once when you start being vegan) so that they can confirm the important bullet points, whether what you're doing broadly works, and when you should worry about anything. (And again, anecdotically this has been strongly stressed and acknowledged as necessary by
The real reason why it's enraging is that it rudely and dramatically implies that Eliezer's time is much more valuable than the OP's.
It does imply that, but it's likely true that Eliezer's time is more valuable (or at least in higher demand) than OP's. I also don't think Eliezer (or anyone else) should have to spend all that much effort worrying about whether what they're about to say might come off as impolite or uncordial.
...If he actually wanted to ask OP what the strongest point was he should have just DMed him instead of engineering this public spectacl
It seems you and Paul are correct. I still think this suggests that there is something deeply wrong with RLHF, but less in the “intentionally deceives humans” sense, and more in the “this process consistently writes false data to memory” sense.
Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for an answer, then asked the model what probability it assigns to that answer being correct. Under this assumption, it looks like the pre-trained model outputs well-calibrated probabilities, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.
In some sense this is expected. The RLHF model isn't optimized for helpfulness; it is optimized for perceived helpfulness. It is still disturbing that "alignment" has made the model objectively worse at giving correct information.
My guess is that RLHF is unwittingly training the model to lie.
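To pin down the reading of Figure 8 I'm assuming, here is a minimal sketch of the calibration check I have in mind, with made-up data rather than anything from the paper: bin the model's stated confidences and compare each bin's average confidence to how often those answers are actually correct.

```python
import numpy as np

# Sketch of a calibration check: compare stated confidence against
# empirical accuracy. A well-calibrated model's 70%-confidence answers
# are right ~70% of the time; a model rewarded for *sounding* sure
# drifts above the diagonal.

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean stated confidence and
    empirical accuracy across confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Assign each answer to a confidence bin (1.0 goes in the top bin).
    idx = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy overconfident model: says 90%, is right only 60% of the time.
rng = np.random.default_rng(0)
stated = np.full(1000, 0.9)
outcomes = rng.random(1000) < 0.6
print(expected_calibration_error(stated, outcomes))  # about 0.3
```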
There are reasonable and coherent forms of moral skepticism in which the statement, "It is morally wrong to eat children and mentally disabled people," is false or at least meaningless. The disgust reaction upon hearing the idea of eating children is better explained by the statement, "I don't want to live in a society where children are eaten," which is much better grounded in physical reality.
What is disturbing about the example is that this seems to be a person who believes that objective morality exists, but that it wouldn't entail that eating children is wrong. This is indeed a red flag that something in the argument has gone seriously wrong.
I don't think we can engage in much "community-wide introspection" without discussing the object-level issues in question, and I can't think of a single instance of an online discussion of that specific issue going particularly well.
That's why I'm (mostly) okay with tabooing these sorts of discussions. It's better to live with the epistemic uncertainty than to risk converging on a false belief.