Tao Lin

One small counterexample is conditional double 0 multiplication. This works in networks that have multiple elementwise multiplications in series (which you can get with gMLP-style gating or SwiGLU activations), like y = A*(B*X). If A and B are both 0, then there is no gradient on A or B: neither has any effect on y on its own, only together (a small change to A alone leaves y at 0 because B is still 0, and vice versa). If a mechanism that conditionally sets A and B to 0 on some datapoints arose in a model, there would be no gradient towards removing it, and it could effectively filter what data is given to SGD.
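A minimal scalar sketch of the gradient argument, assuming a squared-error loss and a hypothetical target value (neither comes from the comment). The gradients are written out by hand with the product rule, so no autodiff library is needed.

```python
from typing import Tuple


def forward(a: float, b: float, x: float) -> float:
    """Two elementwise multiplications in series: y = a * (b * x)."""
    return a * (b * x)


def grads(a: float, b: float, x: float, target: float) -> Tuple[float, float]:
    """Hand-derived gradients of the assumed loss L = (y - target)**2 w.r.t. a and b."""
    y = forward(a, b, x)
    dl_dy = 2.0 * (y - target)
    dl_da = dl_dy * (b * x)  # product rule: dy/da = b * x
    dl_db = dl_dy * (a * x)  # product rule: dy/db = a * x
    return dl_da, dl_db


# Both factors zeroed: the loss is large (y = 0, target = 5), yet neither
# factor receives any gradient, so SGD gets no signal to reopen this path.
print(grads(a=0.0, b=0.0, x=3.0, target=5.0))

# Only one factor zeroed: the zeroed factor now gets a nonzero gradient.
print(grads(a=0.0, b=1.0, x=3.0, target=5.0))
```

In the second case the gradient on a is -30.0, so a gradient step would push a away from 0; the "gate" is only stable against SGD when both factors are zero at once.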

I'm not against evaluating models in ways where they're worse than rocks; I just think you shouldn't expect anyone else to care about your worse-than-rock numbers without very extensive justification (including adversarial stuff).

I don't expect this to be possible in about 1.5 years, and I expect it's difficult now. Stuff like "only allow users who have a long browser history and have bought things and such" (like reCAPTCHA does) feels like the only way out besides "go to the office in person".

Isn't this about generation vs. classification, not language vs. vision?

I don't think this sort of prompt actually gets at the conscious reasoning gap. It only takes one attention head to copy the exact next-token prediction made at a previous token, and I'd expect that if you used few-shot prompting (especially filling the entire context with few-shot prompts), the model would use its induction-like heads to just copy its earlier predictions and perform quite well.
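A toy, rule-based stand-in for the copying behaviour described above. A real induction head does this softly through attention; this is only an illustration. It finds the most recent earlier position whose preceding k tokens match the current last k tokens and copies the token that followed. The few-shot sequence and the parameter k are hypothetical.

```python
from typing import List, Optional


def induction_predict(tokens: List[str], k: int = 2) -> Optional[str]:
    """Predict the next token by copying: find the most recent earlier position j
    whose preceding k tokens equal the last k tokens, and return tokens[j]."""
    if k < 1 or len(tokens) < k + 1:
        return None
    suffix = tokens[-k:]
    # j indexes the token to copy; j == len(tokens) would be the suffix itself,
    # so the search starts one position earlier.
    for j in range(len(tokens) - 1, k - 1, -1):
        if tokens[j - k:j] == suffix:
            return tokens[j]
    return None


# Hypothetical few-shot context: earlier "predictions" sit right after repeated prompts.
context = ["Q1", "->", "A1", ";", "Q2", "->", "A2", ";", "Q1", "->"]
print(induction_predict(context, k=2))  # A1
print(induction_predict(context, k=1))  # A2
```

With k=1 the rule matches only the arrow and copies the most recent answer, which shows why a longer matched context matters. Either way, no reasoning about the content is needed to reproduce an earlier prediction.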

A better example would be to have the model describe its reasoning about predicting the next token, and then pass that to itself in an isolated prompt to predict the next token.
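A sketch of that two-stage protocol, assuming a hypothetical `complete(prompt) -> str` function that wraps some language model; the prompt wording is also made up. It assumes "isolated" means the second prompt contains the model's description but not the original text, so the prediction has to go through the stated reasoning.

```python
from typing import Callable, Dict, List


def reasoning_gap_trial(text: str, complete: Callable[[str], str]) -> Dict[str, str]:
    """Compare a direct next-token prediction with one routed through the model's
    own verbal reasoning. `complete` is a hypothetical model wrapper."""
    # Baseline: ordinary next-token prediction on the raw text.
    direct = complete(text)
    # Stage 1: the model explains how it would predict the next token.
    reasoning = complete(
        "Explain how you would predict the next token of the following text.\n"
        f"Text: {text}\n"
    )
    # Stage 2: a fresh prompt containing only the explanation, not the text itself.
    via_reasoning = complete(
        "Here is some reasoning about how to predict the next token of a hidden text.\n"
        f"Reasoning: {reasoning}\n"
        "What is the next token?\n"
    )
    return {"direct": direct, "reasoning": reasoning, "via_reasoning": via_reasoning}


# Stub model for illustration only: it records prompts and returns canned replies.
prompts: List[str] = []


def stub_complete(prompt: str) -> str:
    prompts.append(prompt)
    return f"reply{len(prompts)}"


result = reasoning_gap_trial("The cat sat on the", stub_complete)
print(result["via_reasoning"])  # reply3
```

The quantity of interest would be how often `via_reasoning` agrees with `direct` (or with the true next token) across many texts.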

This is in line with my experience. However, the fact that this was an HTTP server is important: I get far more value from Copilot on JS HTTP servers than on other programs, and HTTP servers are a class of program with many no-code options. How long would it take them if they were allowed to use pure SQL or a no-code solution?

Also, I think if you trained something to predict text and then RL-trained it on inclusive genetic fitness as a human (or on human motivation signals), its learning would mostly be in the space of "select a specific human / subdistribution of humans to imitate" rather than learning behaviors specific to the task, and its generalization properties would then depend more on those humans than on the specific training setup used.

Here's a plausible story to me: 

The model sees its environment and past actions, and its predictive language-modelling component puts non-negligible probability on "this is the 'make humans smile' task". Its language-modelling prior then predicts the next action based not on the training setup, which it doesn't see, but on the environment, and it outputs an action aimed at pressing the reward button. This action does well, is reinforced, and you get a reward-button-presser.

Some context: when training language models with RLHF, the language-modelling prior tends to dominate over RL-learned behaviors on sub-distributions, even after lots of RLHF training.
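One standard way to see why a pretrained prior can persist, assuming the common KL-penalized RLHF objective (the comment does not say which objective was used): for a prompt x, pretrained reference policy \pi_0, reward r and penalty weight \beta > 0, the optimal policy is the prior reweighted by the exponentiated reward.

```latex
\[
\pi^{*}(\cdot \mid x) = \arg\max_{\pi}\;
  \mathbb{E}_{a \sim \pi(\cdot \mid x)}\big[r(x,a)\big]
  - \beta\,\mathrm{KL}\big(\pi(\cdot \mid x)\,\|\,\pi_0(\cdot \mid x)\big)
\quad\Longrightarrow\quad
\pi^{*}(a \mid x) =
  \frac{\pi_0(a \mid x)\,\exp\big(r(x,a)/\beta\big)}
       {\sum_{a'} \pi_0(a' \mid x)\,\exp\big(r(x,a')/\beta\big)}
\]
```

Under this objective, actions the prior considers very unlikely stay unlikely unless their reward advantage is large relative to \beta, so wherever the reward signal is weak the tuned policy still looks mostly like the prior.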

Another version of this is: "For many trajectories, an LM will be primarily predicting text, not executing RL-reinforced behaviors. Given this, actions that get reinforced are likely to come from the LM producing text that gets high reward in its reward model, rather than from random actions."
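A toy bandit sketch of this argument, with every action name, probability and reward hypothetical. A softmax policy starts from a pretrained "prior" over four actions, and exact (expected) policy-gradient ascent is run on fixed rewards. Two actions earn the top reward, but only one of them is something the prior already proposes. For softmax logits the gradient of expected reward with respect to action a's logit is p_a * (r_a - E[r]), so reinforcement flows mostly to high-reward actions that already have probability mass.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(prior: Dict[str, float], reward: Dict[str, float],
          steps: int = 200, lr: float = 1.0) -> Dict[str, float]:
    """Exact policy-gradient ascent on E[r] for a softmax policy initialised at `prior`.
    For softmax logits, dE[r]/dlogit_a = p_a * (r_a - E[r])."""
    names = list(prior)
    logits = [math.log(prior[n]) for n in names]
    for _ in range(steps):
        probs = softmax(logits)
        mean_r = sum(p * reward[n] for p, n in zip(probs, names))
        logits = [z + lr * p * (reward[n] - mean_r)
                  for z, p, n in zip(logits, probs, names)]
    return dict(zip(names, softmax(logits)))


# Hypothetical prior: the LM already puts some mass on pressing the button.
prior = {"do_the_task": 0.7, "press_reward_button": 0.2,
         "other": 0.0999, "rare_random_exploit": 0.0001}
reward = {"do_the_task": 0.5, "press_reward_button": 1.0,
          "other": 0.0, "rare_random_exploit": 1.0}

final = train(prior, reward)
for name, p in final.items():
    print(f"{name:>20}: {prior[name]:.4f} -> {p:.4f}")
```

Nothing in this toy is specific to language models; it only shows that, under this update rule, which high-reward behaviour gets amplified depends heavily on the starting distribution.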

Pretrained models don't need any exploration to know that pressing the reward button gets more reward than doing things the humans want. If you just ask GPT-3, it'll tell you that.

Then the only exploration the AI needs is to get reward after thinking about analogies between its situation and its textual knowledge of AI/reinforcement learning/AI doom scenarios. 

This applies especially to simple, often-discussed tasks such as making people smile: an LM has already heard of this exact task, so if it took an action based on the "make people smile" task it's heard about, this could outperform other thought processes that are only conditioned on the data seen so far.
