Wiki Contributions


I don't think it's fair to compare parameter sizes between language models and models for other domains, such as games or vision. E.g., I believe AlphaZero is also only in the range of hundreds of millions of parameters? (quick google didn't give me the answer)

I think there is a real difference between adversarial and natural distribution shifts, and without adversarial training, even large network struggle with adversarial shifts. So I don't think this is a problem that would go away with scale alone. At least I don't see evidence for it from current data (failure of defenses for small models is no evidence of success of size alone for larger ones).

One way to see this is to look at the figures in this plotting playground of "accuracy on the line".  This is the figure for natural distribution shift - the green models are the ones that are trained with more data, and they do seem to be "above the curve" (significantly so for CLIP, which are the two green dots reaching ~ 53 and ~55 natural distribution accuracy compared to ~60 and ~63 vanilla accuracy

In contrast, if you look at adversarial perturbations, then you can see that actual adversarial training (bright orange) or other robustness interactions (brown) is much more effective than more data (green) which in fact mostly underperform. 


(I know you focused on "more model" but I think to first approximation "more model" and "more data" should have similar effects.)

Will read later the links - thanks! I confess I didn’t read the papers (though saw a talk partially based on the first one which didn’t go into enough details for me to know the issues) but also heard from people that I trust of similar issues with Chess RL engines (can be defeated with simple strategies if you are looking for adversarial ones). Generally it seems fair to say that adversarial robustness is significantly more challenging than the non adversarial case and it does not simply go away on its own with scale (though some types of attacks are automatically motivated with diversity of training data / scenarios).

Thank you! I think that what we see right now is that as the horizon grows, the more "tricks" we need to make end-to-end learning works, to the extent that it might not really be end to end. So while supervised learning is very successful, and seems to be quite robust to choice of architecture, loss functions, etc., in RL we need to be much more careful, and often things won't work "out of the box" in a purely end to end fashion.


I think the question would be how performance scales with horizon, if the returns are rapidly diminishing, and the cost to train is rapidly increasing (as might well be the case because of diminishing gradient signals, and much smaller availability of data),  then it could be that the "sweet spot" of what is economical to train would remain at a reasonably short horizon (far shorter than the planning needed to take over the world) for a long time. 

Can you send links? In any case I do believe that it is understood that you have to be careful in a setting where you have two models A and B, where B is a "supervisor" of the output of A, and you are trying to simultaneously teach B to come up with good metric to judge A by, and teach A to come up with outputs that optimize B's metric.  There can be equilibriums where A and B jointly diverge from what we would consider "good outputs". 

This for example comes up in trying to tackle "over optimization" in instructGPT (there was a great talk by John Schulman in our seminar series a couple of weeks ago), where model A is GPT-3, and model B tries to capture human scores for outputs. Initially, optimizing for model B induces optimizing for human scores as well, but if you let model A optimize too much, then it optimizes for B but becomes negatively correlated with the human scores (i.e., "over optimizes"). 

Another way to see this issue is even for powerful agents like AlphaZero are susceptible to simple adversarial strategies that can beat them:  see "Adversarial Policies Beat Professional-Level Go AIs" and "Are AlphaZero-like Agents Robust to Adversarial Perturbations?".  

The bottom line is that I think we are very good at optimizing any explicit metric , including when that metric is itself some learned model.  But generally, if we learn some model  s.t. , this doesn't mean that if we let  then it would give us an approximate maximizer of   as well. Maximizing  would tend to push to the extreme parts of the input space, which would be exactly those where  deviates from .

The above is not an argument against the ability to construct AGI as well, but rather an argument for establishing concrete measurable goals that our different agents try to optimize, rather than trying to learn some long-term equilibrium. So for example, in the software-writing and software-testing case, I think we don't simply want to deploy two agents A and B playing a zero-sum game where B's reward is the number of bugs found in A's code.

Hi Vanesssa,

Perhaps given my short-term preference, it's not surprising that I find it hard to track very deep comment threads, but let me just give a couple of short responses.

I don't think the argument on hacking relied on the ability to formally verify systems. Formally verified systems could potentially skew the balance of power to the defender side, but even if they don't exist, I don't think balance is completely skewed to the attacker. You could imagine that, like today, there is a "cat and mouse" game, where both attackers and defenders try to find "zero day vulnerabilities" and exploit (in one case) or fix (in the other). I believe that in the world of powerful AI, this game would continue, with both sides having access to AI tools, which would empower both but not necessarily shift the balance to one or the other. 

I think the question of whether a long-term planning agent could emerge from short-term training is a very interesting technical question!  Of course we need to understand how to define "long term" and "short term" here.  One way to think about this is the following: we can define various short-term metrics, which are evaluable using information in the short-term, and potentially correlated with long-term success. We would say that a strategy is purely long-term if it cannot be explained by making advances on any combination of these metrics.

My forecast would be that an AI that operates autonomously for long periods would be composed of pieces that make human-interpretable progress in the short term. For example, a self-driving car will be able to eventually to drive to New York to Los Angeles, but I believe it would do so by decomposing the task into many small tasks of getting from point A to B. It would not do so by sending it out to the world (or even a simulated world) and repeatedly playing a game where it gets a reward if it reaches Los Angeles, and gets nothing if it doesn't.

Quick comment (not sure it's realted to any broader points): total compute for N models with 2M parameters is roughly 4NM^2 (since per Chinchilla, number of inference steps scales linearly with model size, and number of floating point operations also scales linearly,  see also my calculations here). So an equal total compute cost would correspond to k=4.

What I was thinking when I said "power" is that it seems that in most BIG-Bench scales, if you put the y axis some measure of performance (e.g. accuracy) then it seems to scale as some linear or polynomial way in the log of parameters, and indeed I belive the graphs in that paper usually have log parameters in the X axis. It does seem that when we start to saturate performance (error tends to zero), the power laws kick in, and its more like inverse polynomial in the total number of parameters than their log.

Thanks! Some quick comments (though I think at some point we are getting to deep in threads that it's hard to keep track..)


  1. When saying that GAN training issues are "well understood" I meant that it is well understood that it is a problem, not that it's well understood how to solve that problem... 
  2. One basic issue is that I don't like to assign probabilities to such future events, and am not sure there is a meaningful way to distinguish between 75% and 90%.  See my blog post on longtermism
  3. The general thesis is that when making long-term strategies, we will care about improving concrete metrics rather than thinking of very complex strategies that don't make any measurable gains in the short term. So an Amazon Engineer would need to say something like "if we implement my code X then it would reduce latency by Y", which would be a fairly concrete and measurable goal and something that humans could understand even if they couldn't understand the code X itself or how it came up with it. This differs from saying something like "if we implement my code X, then our competitors would respond with X', then we could respond with X'' and so on and so forth until we dominate the market"
  4. When thinking of AI systems and their incentives, we should separate training, fine tuning, and deployment. Human engineers might get bonuses for their performance on the job, which corresponds to mixing "fine tuning" and "deployments". I am not at all sure that would be a good idea for AI systems. It could lead to all kinds of over-optimization issues that would be clear for people without leading to doom. So we might want to separate the two and in some sense keep the AI disinterested about the code that it actually uses in deployment.

Thanks for so many comments! I do plan to read them carefully and respond, but it might take me a while. In the meantime, Scott Aaronson also has a relevant blog

Happy thanksgiving to all who celebrate it!

I do not claim that AI cannot set long-term strategies.  My claim is that this is not where AI's competitive advantages over humans will be. I could certainly imagine that a future AI would be 10 times better than me in proving mathematical theorems.  I am not at all sure it would be 10 times better than Joe Biden in being a U.S. president, and mostly it is because I don't think that the information-processing capabilities are really the bottleneck for that job. (Though certainly, the U.S. as a whole, including the president, would benefit greatly from future AI tools, and it is quite possible that some of Biden's advisors would be replaced by AIs.)

Load More