faul_sname

This still suffers from the incentive gradient pushing quite hard to just build end-to-end agents. Not only will it probably work better, but it'll be straight up cheaper and easier!

The same is true of human software developers - your dev team sure can ship more features at a faster cadence if you give them root on your prod servers and full read and write access to your database. However, despite this incentive gradient, most software shops don't look like this. Maybe the same forces that push current organizations to separate the person writing the code from the person reviewing it could be repurposed for software agents.

One bottleneck, of course, is that part of the reason this works with humans is that we have skin in the game - sufficiently bad behavior could get us fired or even sued. Current AI agents don't have anything to gain from behaving well or to lose from behaving badly (nor sufficient coherence for it to make sense to talk about "an" AI agent doing a thing).

Yeah, openai/guided-diffusion is basically that. Here's an example colab which uses CLIP guidance to sample openai/guided-diffusion (not mine, but I did just verify that the notebook still runs)
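For anyone wondering what "CLIP guidance" cashes out to mechanically: at each denoising step you take the gradient of CLIP image/text similarity with respect to the noisy image and nudge the sample along it. A rough sketch below - not the notebook's actual code; the checkpoint config, guidance scale, and prompt are placeholders, and the real notebook additionally does cutouts and CLIP normalization:

```python
# Rough sketch of CLIP-guided sampling with openai/guided-diffusion.
# Assumes the guided-diffusion repo and openai/CLIP are installed; the
# config values and guidance_scale here are illustrative, not tuned.
import torch
import torch.nn.functional as F
import clip
from guided_diffusion.script_util import (
    create_model_and_diffusion, model_and_diffusion_defaults)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pretrained unconditional 256x256 diffusion model. (The real
# notebook also sets num_channels, attention_resolutions, etc. to match
# the checkpoint; elided here.)
config = model_and_diffusion_defaults()
config.update({"image_size": 256, "class_cond": False,
               "learn_sigma": True, "timestep_respacing": "250"})
model, diffusion = create_model_and_diffusion(**config)
model.load_state_dict(torch.load("256x256_diffusion_uncond.pt"))
model.to(device).eval()

# Embed the text prompt with CLIP.
clip_model, _ = clip.load("ViT-B/32", device=device)
text_features = clip_model.encode_text(
    clip.tokenize(["a hamster driving a jeep"]).to(device))

guidance_scale = 1000  # illustrative

def cond_fn(x, t, **kwargs):
    """Gradient of CLIP similarity w.r.t. the noisy image x_t."""
    with torch.enable_grad():
        x = x.detach().requires_grad_()
        # Real notebooks denoise, take cutouts, and apply CLIP's
        # normalization first; this just resizes the raw sample.
        img = F.interpolate(x, size=224, mode="bilinear")
        sim = torch.cosine_similarity(
            clip_model.encode_image(img), text_features).sum()
        return torch.autograd.grad(sim, x)[0] * guidance_scale

# p_sample_loop nudges each denoising step along the cond_fn gradient.
samples = diffusion.p_sample_loop(
    model, (1, 3, 256, 256), cond_fn=cond_fn, progress=True)
```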

The short answer is "mode collapse" (same author as Simulators and also of generative.ink, where that LLM-generated HPMOR continuation I linked came from).

My best crack at the medium answer in 15 minutes or less is:

The base model is tuned to predict the next token of text drawn from its training distribution, which means that sampling from the base model will produce text which resembles its training corpus by any statistical measure the model has learned (with 405b that is effectively "any statistical test you can think of", barring specifically chosen adversarial ones).

My mental model of base models (one that I think is pretty well supported empirically) is that they are giant bags of contextually activated heuristics. Some such heuristics are very strong and narrow ("if the previous tokens were 'Once upon a', we are at the 4th word of a fairy tale; when at the 4th word of a fairy tale, output ' time'"), and some are wide and weak ("French words tend to be followed by other French words"). And these heuristics are almost exclusively where the model gets its capabilities (there are a few weird exceptions like arithmetic and date munging).

Instruct- or chat-tuned models have secondary objectives that are not simply "predict the next token". My mental model is that RL is extremely lazy, and will tend to chisel into the model the simplest possible behavior that decreases loss / increases reward. One extremely simple behavior is "output a memorized sequence of text". This behavior is also very discoverable and easy to reinforce - most of the update just needs to be "get the model to output the first couple tokens of the memorized sequence". There's a variant, "fill in the mad lib", that is also quite easy to discover.

And so unless you make specific efforts to prevent it, doing RL on an LLM will give you a model which consistently falls into a few attractors. This is really hard to prevent - even if your base model and your RL'd model have almost identical logprobs for almost all low-perplexity prefixes, you can still fall into these attractors (once you're in one of these attractors, you're no longer looking at text which the base model was trained to predict super accurately).
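If you want to see the attractor behavior for yourself, a minimal sketch (nothing load-bearing here - the models, prompt, and "fraction of distinct completions" metric are just whatever is convenient; the instruct model would normally also get its chat template applied):

```python
# Sketch: sample the same open-ended prompt many times from a base model
# and its chat-tuned counterpart, and see how often completions repeat.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def distinct_fraction(model_name: str, prompt: str, n: int = 50) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs, do_sample=True, temperature=1.0, top_p=1.0,
        max_new_tokens=40, num_return_sequences=n)
    completions = [
        tok.decode(seq[inputs.input_ids.shape[1]:], skip_special_tokens=True)
        for seq in out]
    return len(set(completions)) / n  # 1.0 = every sample distinct

prompt = "Write me a poem about a hamster driving a jeep.\n"
for name in ["meta-llama/Llama-3.1-8B", "meta-llama/Llama-3.1-8B-Instruct"]:
    print(name, distinct_fraction(name, prompt))
# Expectation (not a guarantee): the instruct model's samples cluster into
# far fewer distinct openings than the base model's.
```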

The very long answer I would like to give at some point involves seeing how few token substitutions it takes to convert base model outputs into something that looks almost identical to the outputs of a given chat-tuned model - in other words, have a base and a chat model provide completions for a given prompt, then replace the base model's output token with the chat model's output token at the single position of highest KL divergence, resample the base model from there, and repeat.
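Concretely, one round of that procedure might look like the sketch below (smaller stand-in models, the direction of the KL, and the exact indexing are my guesses - the paragraph above doesn't pin them down):

```python
# Sketch of one round of the "minimum token substitutions" experiment:
# find the completion position where the base and chat models' next-token
# distributions disagree most (by KL), pin the chat model's token there,
# then let the base model resample everything after it.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.1-8B"            # stand-ins for the 405b pair
CHAT = "meta-llama/Llama-3.1-8B-Instruct"

tok = AutoTokenizer.from_pretrained(BASE)   # the two share a vocabulary
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto")
chat = AutoModelForCausalLM.from_pretrained(
    CHAT, torch_dtype=torch.bfloat16, device_map="auto")

def next_token_logprobs(model, ids):
    """Log-probs over the next token at every position of `ids`."""
    with torch.no_grad():
        return F.log_softmax(model(ids).logits.float(), dim=-1)

def one_substitution_round(prompt_ids, completion_ids):
    ids = torch.cat([prompt_ids, completion_ids], dim=-1)
    base_lp = next_token_logprobs(base, ids)
    chat_lp = next_token_logprobs(chat, ids)
    # KL(chat || base) of the next-token distribution at each position.
    kl = (chat_lp.exp() * (chat_lp - base_lp)).sum(-1)[0]
    start = prompt_ids.shape[1] - 1            # predicts 1st completion token
    pos = start + int(kl[start:ids.shape[1] - 1].argmax())
    forced = int(chat_lp[0, pos].argmax())     # chat model's preferred token
    prefix = torch.cat(
        [ids[:, :pos + 1],
         torch.tensor([[forced]], device=ids.device)], dim=-1)
    # Resample the rest of the completion from the base model; repeat the
    # whole round until the output looks like the chat model's.
    return base.generate(prefix, do_sample=True, temperature=1.0,
                         max_new_tokens=200)
```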

Can there be a generative AI whose output has non-averageness on all levels, in the same proportions as human-generated content?

Isn't this called a "base model"?

If you say to your favorite chat-tuned LLM "write me a poem about a hamster driving a jeep", it'll say something like "Sure, here's a poem about a hamster driving a jeep: <the most generic imaginable poem about a hamster driving a jeep>". If you prompt a base model like llama-3.1-405b base with "Write me a poem about a hamster driving a jeep" you'll get back whatever text was most likely to follow that sentence.

That could be something like the next question on a hallucinated exam

Llama-3.1-405b-base (hyperbolic) writes the next question in an exam that asked for a poem about hamsters

Make plain the unique empirical epistemology of Charlotte Bronte's Godmother, who famously looked like her dead sister, using the same method Dickens used to describe Emma, to make of her the myth of a real saint of Heaven, "one with the sun and moon, by himself".

It might be a free-verse "poem" which appears to be the ravings of a madman

Llama-3.1-405b-base writes a free-verse poem about a hamster

There once was a hamster quiet and simple
Who life as a highway jeep driver did want to jump in
Oh, that's a terrible idea!
Driving a get car at high speeds,
On sharp turns and tarmac.
Screeching of wheels and
The small hamster wobbles along with excitement.
In the passenger's seat, she begins to cry.
Teeth grinding in the seat as her eyes begin to shake.
As he pulls into the boots-pitched alley,
The hamster said, "I can do this!"
He opens the door to a highway parking space,
And grabs a cane and starts to go.
"She will make it," she said, "like a screaming wolf."
The hamster did not see anything,
As she tries to scream.
She looks with puzzlement and terror in her eyes.
And goes to the left.
She doesn't go
She starts to turn to the right,
And breaks into a huge smile.
She doesn't go
She turns to the left.
She turns to the right.
She doesn't go

It might even write some low quality actual rhyming poetry

Llama-3.1-405b-base writes poetry like a 4th grader

Hamsters driving jeeps,
That's something new,
They're small and furry,
But they'll take you too.

They'll drive you to the store,
And pick up some food,
Then zoom down the road,
Just like they're supposed to

It is possible to write fiction this way, using lots of rollouts at every point and then sampling the best one. There's even dedicated software for doing this. But the workflow isn't as simple as "ask your favorite LLM chatbot".
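The core loop of that workflow is something like the sketch below (the provider URL, model name, and scoring heuristic are placeholders; in the dedicated tools a human usually picks the winning rollout rather than a score function):

```python
# Sketch of the "many rollouts per step, keep the best" fiction workflow
# against an OpenAI-compatible completions endpoint serving a base model.
from openai import OpenAI

client = OpenAI(base_url="https://api.hyperbolic.xyz/v1",  # placeholder
                api_key="...")
MODEL = "meta-llama/Meta-Llama-3.1-405B"    # a base (non-instruct) model

def rollouts(prefix: str, k: int = 8, max_tokens: int = 120) -> list[str]:
    resp = client.completions.create(
        model=MODEL, prompt=prefix, max_tokens=max_tokens,
        temperature=1.0, n=k)
    return [choice.text for choice in resp.choices]

def score(continuation: str) -> float:
    # Placeholder heuristic; in practice a human (or a judge model) picks.
    return -abs(len(continuation.split()) - 80)

story = "The hamster gripped the jeep's wheel with both paws.\n"
for _ in range(5):                          # extend the story in five chunks
    story += max(rollouts(story), key=score)
print(story)
```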

Ah, yep. Thanks!

You have to dig for it on nysenate.gov but you can also find it there: the most recent version of this is A6453B, not A6453A. Not sure why the "download bill full text" link points to the first version of a bill rather than the most up-to-date one.

6. “Frontier model” means either of the following: (a) an artificial intelligence model trained using greater than 10^26 computational operations (e.g., integer or floating-point operations), the compute cost of which exceeds one hundred million dollars; OR (b) an artificial intelligence model produced by applying knowledge distillation to a frontier model as defined in paragraph (a) of this subdivision, provided that the compute cost for such model produced by applying knowledge distillation exceeds five million dollars.

Where did you go to see the version of the bill with the bolded part? Looking at the bill here, which is the only link to the text of the bill I see on nysenate.gov, I see the definition as

"Frontier model" means either of the following: (a) an artificial intelligence model trained using greater than 10º26 computational operations (e.g., integer or floating-point operations), the compute cost of which exceeds one hundred million dollars; or (b) an artificial intelligence model produced by applying knowledge  distillation to a frontier model as defined in paragraph (a) of this subdivision.

The "where the distillation costs at least $5M" seems very important to have the bill not affect e.g. a hedge fund that has trained $100M of specialized models, at least one of which cost $5M, and then separately has had an intern spend a couple hundred dollars distilling the llama 4 behemoth model (if that one happens to be over the 10^26 mark, which it is if it's as overtrained as the rest of the llama series)

FWIW I did not see any high-value points made on Twitter that were not also made on HN.

Oh, one more source for that one though - there was some coverage on the Complex Systems podcast - the section titled "AI's impact on reverse engineering" (transcript available at that URL).

There was a good discussion on hacker news, and there was quite a bit of posting of highly variable quality on xitter.

O3 is famously good enough to find zero days by itself.

Mechanistically, how would 2 work? The behavior we're seeing is that models sometimes determine what the reward signal is and optimize for that, and then the RL training process they are undergoing reinforces the trajectories that did so, chiseling reward-hacking behavior into the models. Are you expecting that we'll find a way to do gradient descent on something that can also make architectural updates to itself?
