I don’t understand the community obsession with Tao and recruiting him to work on alignment. This is a thing I hear about multiple times a year with no explanation of why it would be desirable other than “he’s famous for being very smart.”
I also don’t see why you’d think there’d be an opportunity to do this… it’s an online event, which heavily limits the ability to corner him in the hallway. It’s not even clear to me that you’d have an opportunity to speak with him… he’s moderating several discussions and panels, but any submitted questions to those events would go to the people actually in the discussions, not the moderator.
Can you elaborate on what you’re actually thinking this would look like?
Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.
But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).
What deployed LLM system does Tesla make that you think should be evaluated alongside ChatGPT, Bard, etc?
Hi, I’m helping support the event. I think some mistranslation was introduced by a non-AI person. The event is about having humans get together and do prompt hacking and similar attacks on a variety of models side-by-side. ScaleAI built the app that’s orchestrating the routing of info, model querying, and human interaction. Scale’s platform isn’t doing the evaluation itself. That’s being done by users on-site and then by ML and security researchers analyzing the data after the fact.
I think there's a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response.
Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.
I agree completely. This is a plausible explanation, but it’s one of many plausible explanations and should not be put forward as a fact without evidence. Unfortunately, said evidence is impossible to obtain due to OpenAI’s policies regarding access to their models. When powerful RLHF models begin to be openly released, people can start testing theories like this meaningfully.
Linear warm-up over the first 10% of training, then cosine decay to a minimum of one-tenth the peak LR, which is set to occur at the end of training (300B tokens). Peak LRs vary by model but are roughly consistent with GPT-3 and OPT values. You can find all the config details on GitHub.

The main divergence from mainstream approaches relevant to this conversation is that we use a constant batch size (2M) throughout scaling. Prior work uses batch sizes up to 10x smaller for the smallest models, but we find that we can train small models with large batches without any problems. This enables us to achieve a substantial wall-clock speed-up for small models by throwing more GPUs at them. We continue to use this batch size for the 11B model for consistency, although the standard progression of batch sizes would suggest 3M or 4M by that point.
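For concreteness, the schedule described above (linear warmup over the first 10% of steps, then cosine decay to one-tenth of the peak LR at the final step) can be sketched like this. This is an illustrative reimplementation from the description, not the actual training config; the function name and defaults are my own.

```python
import math

def lr_at_step(step, total_steps, peak_lr, warmup_frac=0.10, min_ratio=0.1):
    """Sketch of the schedule: linear warmup over the first `warmup_frac`
    of steps, then cosine decay down to `min_ratio * peak_lr` at the end
    of training. (Hypothetical helper, not from the Pythia configs.)"""
    warmup_steps = int(total_steps * warmup_frac)
    min_lr = peak_lr * min_ratio
    if step < warmup_steps:
        # linear ramp from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # cosine decay from peak_lr down to min_lr over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

E.g. with `total_steps=143_000` the warmup ends at step 14,300 (right at the peak), which is why checkpoints shortly after 20k steps land just past the LR peak.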
Checkpoints 20 and 40 are at 20k and 40k iterations respectively, and the entire training runs for 143k iterations. So they occur relatively shortly after the LR peaks, but don't coincide with anything I know to be particularly special.
This is really exciting work to see, and exactly the kind of thing I was hoping people would do when designing the Pythia model suite. It looks like you're experimenting with the 5 smallest models, but haven't done analysis on the 2.8B, 6.9B, or 12B models. Is that something you're planning on adding, or no?
I am really very surprised that the distributions don't seem to match any standard parameterized distribution. I was fully ready to say "okay, let's retrain some of the smaller Pythia models initialized using the distribution you think the weights come from," but apparently we can't do that easily. I suppose we could use an MCMC sampler? In general, it seems like a natural follow-up to the contents of this post is to change the way we initialize things in models, retrain them, and see what happens (esp. with the loss curve). If that's something you'd like to collaborate with EleutherAI on, I would be more than happy to arrange something :)
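To gesture at what "initialize from the observed distribution" could look like without a full MCMC sampler: if we only need i.i.d. draws matching the empirical marginal distribution of trained weights, inverse-transform sampling on the empirical CDF suffices. This is a hypothetical helper of my own, not code from the post or from Pythia, and it ignores any cross-weight structure.

```python
import numpy as np

def sample_like(observed, shape, seed=None):
    """Draw new values whose marginal distribution matches the empirical
    distribution of `observed`, via inverse-transform sampling on a
    linearly interpolated empirical CDF. (Illustrative sketch only.)"""
    rng = np.random.default_rng(seed)
    sorted_vals = np.sort(np.asarray(observed).ravel())
    # midpoint quantile levels for the sorted sample
    q = (np.arange(sorted_vals.size) + 0.5) / sorted_vals.size
    u = rng.random(shape)          # uniform draws in [0, 1)
    return np.interp(u, q, sorted_vals)  # invert the empirical CDF

# usage: reinitialize a layer from the trained weights' distribution
# new_w = sample_like(trained_layer.ravel(), trained_layer.shape)
```

If the interesting structure is in correlations between weights rather than the marginals, this won't capture it, and something like MCMC over a fitted joint model really would be needed.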
In general, the reliability of the things you're seeing across model scales is really cool. I agree that it seems to refute some of the theoretical assumptions of the NTK literature, but I wonder if perhaps it's consistent with the Tensor Programs work by Greg Yang et al. that led to muP.
To clarify what's going on with the Pythia models:
This is excellent work, though I want to generically recommend caution when making assumptions about the success of such attacks based only on blackbox evaluations. Thorough analysis of false positive and false negative rates with ground-truth access (ideally in an adversarially developed setting) is essential for validation. [Sidebar: this reminds me that I really need to write up my analysis in the EleutherAI discord showing why prompt extraction attacks can be untrustworthy]
That said, this is really excellent work and I agree it looks quite promising.
It's extremely difficult to create a fraudulent company and get it listed on the NYSE. Additionally, the Exchange can and does stop trading on both individual stocks and the exchange as a whole, though due to the downstream effects on consumer confidence this is only done rarely.
I don't know what lessons one should learn from the stock market regarding MM, but I don't think we should rush to conclude MM shouldn't intervene or shouldn't be blamed for not intervening.