All of simeon_c's Comments + Replies

Yes, I think it is reasonable to say that GPT-3 is obsolete.
Also, you mentioned loads of AGI startups being created in 2023, but that already happened a lot in 2022. How many more AGI startups do you expect in 2023?

But I don't expect these kinds of understanding to transfer well to understanding Transformers in general, so I'm not sure it's high priority.

The point is not necessarily to improve our understanding of Transformers in general, but that if we're pessimistic about interpretability on dense transformers (like markets are, see below), we might be better off speeding up capabilities on architectures we think are a lot more interpretable.

Fabien Roger · 1mo
I'm not saying that MoE are more interpretable in general. I'm saying that for some tasks, the high-level view of "which expert is active when and where" may be enough to get a good sense of what is going on. In particular, I'm almost as pessimistic about finding "search", or "reward functions", or "world models", or "the idea of lying to a human for instrumental reasons" in MoEs as in regular Transformers. The intuition behind that is that MoE is about as useful for interpretability as the fact that there are multiple attention heads per attention layer doing "different discrete things" (though they do things in parallel). The fact that there are multiple heads helps you a bit, but not that much. This is why I care about the transferability of what you learn when it comes to MoEs. Maybe MoE + something else could add some safeguards though (in particular, it might be easier to do targeted ablations on MoE than on regular Transformers), but I would be surprised if any safety benefit came from "interp on MoE goes brr".
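To make the "which expert is active when and where" view concrete, here is a minimal sketch of a top-1 mixture-of-experts layer that exposes its routing decisions (PyTorch; the class and parameter names are illustrative, not taken from any real codebase):

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Minimal top-1 mixture-of-experts layer that exposes its routing decisions."""

    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.ReLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (seq_len, d_model)
        scores = self.gate(x)               # (seq_len, n_experts)
        expert_ids = scores.argmax(dim=-1)  # which expert fires for each token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out, expert_ids              # the routing pattern is the "free" interpretability signal

layer = TinyMoELayer()
tokens = torch.randn(10, 64)
_, routing = layer(tokens)
print(routing)  # e.g. tensor([2, 0, 0, 3, ...]): the "which expert, when and where" view
```

Reading off `expert_ids` is cheap, but as the reply above notes, it is a coarse signal: closer to knowing which attention head fired than to locating search, world models, or deception.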

The idea that EVERY government is dumb and won't figure out a not-too-bad way to allocate its resources toward AGI seems highly unlikely to me. There seem to be many mechanisms by which this could fail to hold (e.g., national defense is highly involved and a bit more competent, or the strategy is designed in collaboration with some competent people from the private sector, etc.).

To be more precise, I'd be surprised if none of these 7 countries had an ambitious plan that meaningfully changed the strategic landscape post-2030:

  • US 
  • Israel 
  • UK
  • Singapore
  • France
  • China 
  • Germany

I guess I'm a bit less optimistic about governments' ability to allocate funds efficiently, but I'm not very confident in that.

A fairly dumb-but-efficient strategy that I'd expect some governments to take is "give more money to SOTA orgs" or "give some core roles to SOTA orgs in your Manhattan Project". That seems likely to me, and it would have substantial effects.

Donald Hobson · 1mo
They may well have some results. Dumping money on SOTA orgs just bumps compute a little higher (and maybe data, if you are hiring lots of people to make data). It isn't clear why SOTA orgs would want to be in a government Manhattan project. It also isn't clear whether any modern government retains the competence to run one. I don't expect governments to do either of these. You generated those strategies by sampling "dumb but effective" strategies. I tried to sample from "most of the discussion got massively sidetracked into the same old political squabbles and distractions."

Unfortunately, good compute governance takes time. E.g., if we want to implement hardware-based safety mechanisms, we first have to develop them, convince governments to implement them, and then they have to be put on the latest chips, which take several years to dominate compute. 

This is a very interesting point. 

I think that some "good compute governance", such as monitoring big training runs, doesn't require on-chip mechanisms, but I agree that any measure involving substantial hardware modifications would probably take a lot of ... (read more)

What I'm confident in is that they're more likely to be ahead by then than they are now or within a couple of years. As I said, otherwise my confidence is ~35% that China catches up (or becomes better) by 2035, which is not huge?

My reasoning is that they've been better than the US at optimizing ~everything, mostly because of their centralization and norms (not caring too much about human rights helps with optimization), which is why I think it's likely that they'll catch up.

Mostly because they have a lot of resources and thus can carry a lot of weight in the race once they enter it.

Donald Hobson · 1mo
Sure, governments have a lot of resources. What they lack is the smarts to effectively turn those resources into anything. So maybe some people in government think AI is a thing, others think it's still mostly hype. The government crafts a bill. Half the money goes to artists put out of work by Stable Diffusion. A big section details insurance liability regulations for self-driving cars. Some more funding is sent to various universities. A committee is formed. This doesn't change the strategic picture much.

Thanks for your comment! 

I see your point about fear spreading causing governments to regulate. I basically agree that if that's what happens, it's good to be in a position to shape the regulation in a positive way, or at least try to. I still think I'm more optimistic about corporate governance, which seems more tractable to me than policy governance.

The points you make are good, especially in the second paragraph. My model is that if scale is all you need, then smaller startups are indeed also worrying. I also think there could be visible events in the future that would make some of these startups very serious contenders (happy to DM about that).

Having a clear map of who works on corporate governance and who works more toward policy would be very helpful. Is there anything like a "map/post of who does what in AI governance"?

Koen.Holtman · 1mo
Thanks! I am not aware of any good map of the governance field. What I notice is that EA, at least the blogging part of EA, tends to have a preference for talking directly to (people in) corporations when it comes to the topic of corporate governance. As far as I can see, FLI is the AI x-risk organisation most actively involved in talking to governments. But there are also a bunch of non-EA related governance orgs and think tanks talking about AI x-risk to governments. When it comes to a broader spectrum of AI risks, not just x-risk, there are a whole bunch of civil society organisations talking to governments about it, many of them with ties to, or an intellectual outlook based on, Internet and Digital civil rights activism.

Have you read note 2? If note 2 were made more visible, would you still think that my claims imply too high a certainty?

konstantin · 1mo
I didn't read it, this clarifies a lot! I'd recommend making it more visible, e.g., putting it at the very top of the post as a disclaimer. Until then, I think the post implies unreasonable confidence, even if you didn't intend to.

To be honest, I hesitated to decrease the likelihood on that one based on your consideration, but I still think that a 30% chance of strong effects is quite a lot, because, as you mentioned, it requires the intersection of many conditions.

In particular, you don't mention which interventions you expect from them. If you take the intervention I used as a reference class ("Constrain labs to airgap and box their SOTA models while they train them"), do you think there are interventions as "extreme" as this, or more so, that are likely?

What might ... (read more)

Koen.Holtman · 1mo
I think you are ignoring the connection between corporate governance and national/supra-national government policies. Typically, corporations do not implement costly self-governance and risk management mechanisms just because some risk management activists have asked them nicely. They implement them if and when some powerful state requires them to implement them, requires this as a condition for market access or for avoiding fines and jail-time. Asking nicely may work for well-funded research labs who do not need to show any profitability, and even in that special case one can have doubts about how long their do-not-need-to-be-profitable status will last. But definitely, asking nicely will not work for your average early-stage AI startup. The current startup ecosystem encourages the creation of companies that behave irresponsibly by cutting corners. I am less confident than you are that Deepmind and OpenAI have a major lead over these and future startups, to the point where we don't even need to worry about them. It is my assessment that, definitely in EA and x-risk circles, too few people are focussed on national government policy as a means to improve corporate governance among the less responsible corporations. In the case of EA, one might hope that recent events will trigger some kind of update.

Thanks for your comment! 

First, keep in mind that when people talk about "AI" in industry and policymaking, they usually have mostly non-deep-learning techniques or vision deep learning in mind, simply because they mostly don't know the academic ML field but have heard that "AI" was becoming important in industry. So this sentence is little evidence that Russia (or any other country) is trying to build AGI, and I'm at ~60% that Putin wasn't thinking about AGI when he said that.

If anyone who could play any role at all in develop

... (read more)
Karl von Wendt · 1mo
As you point out yourself, what makes people interested in developing AGI is progress in AI, not the public discussion of potential dangers. "Nobody cared about" LLMs is certainly not true - I'm pretty sure the relevant people watched them closely. That many people aren't concerned about AGI or doubting its feasibility by now only means that THOSE people will not pursue it, and any public discussion will probably not change their minds. There are others who think very differently, like the people at OpenAI, Deepmind, Google, and (I suspect) a lot of others who communicate less openly about what they do. I don't think you can easily separate the scientific community from the general public. Even scientific papers are read by journalists, who often publish about them in a simplified or distorted way. Already there are many alarming posts and articles out there, as well as books like Stuart Russell's "Human Compatible" (which I think is very good and helpful), so keeping the lid on the possibility of AGI and its profound impacts is way too late (it was probably too late already when Arthur C. Clarke wrote "2001 - A Space Odyssey"). Not talking about the dangers of uncontrollable AI for fear that this may lead to certain actors investing even more heavily in the field is both naive and counterproductive in my view. I will definitely publish it, but I doubt very much that it will have a large impact. There are many other writers out there with a much larger audience who write similar books. I'm currently in the process of translating it to English so I can do just that. I'll send you a link as soon as I'm finished. I'll also invite everyone else in the AI safety community (I'm probably going to post an invite on LessWrong). Concerning the Putin quote, I don't think that Russia is at the forefront of development, but China certainly is. Xi has said similar things in public, and I doubt very much that we know how much they currently spend on training their AIs. The quo

[Cross-posting my answer]
Thanks for your comment! 
That's an important point that you're bringing up. 

My sense is that at the movement level, the consideration you bring up is super important. Indeed, even though I have fairly short timelines, I would like funders to hedge for long timelines (e.g., fund stuff for China AI safety). Thus I think that big actors should have their full distribution in mind to optimize their resource allocation.

That said, despite that, I have two disagreements: 

  1. I feel like at the individual level (i.e.
... (read more)

To get a better sense of people's standards on "cutting at the hard core of alignment", I'd be curious to hear examples of work that has done so.

It would be worth paying someone to do this in a centralized way:

  1. Reach out to authors
  2. Convert to LaTeX, edit
  3. Publish

If someone is interested in doing this, reach out to me (campos.simeon @gmail.com)

Do you think we could use grokking or other existing generalization phenomena (e.g., induction heads) to test your theory? Or do you expect the generalizations that would lead to the sharp left turn to be greater/more significant than those that occurred earlier in training?

Thanks for trying! I don't think that's much evidence against GPT-3 being a good oracle though, because to me it's pretty normal that it can't forecast without fine-tuning. It would need to be extremely sample-efficient to be able to do that. Does anyone want to try fine-tuning?
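For anyone who wants to pick up the fine-tuning suggestion, a rough sketch with the older (pre-1.0) openai Python SDK could look like the following; the file name and data format are placeholders and the interface has since changed, so treat it as a starting point rather than a recipe:

```python
import openai  # older (pre-1.0) SDK interface; the fine-tuning API has since changed

openai.api_key = "sk-..."  # placeholder

# forecasts.jsonl (hypothetical file): one {"prompt": ..., "completion": ...} pair per line,
# e.g. the text of an already-resolved forecasting question and its outcome.
upload = openai.File.create(file=open("forecasts.jsonl", "rb"), purpose="fine-tune")

# Kick off the fine-tune on the uploaded examples.
job = openai.FineTune.create(training_file=upload["id"], model="davinci")
print(job["id"])  # poll this job, then query the resulting fine-tuned model on held-out questions
```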


Cost: You get basically 3 months free with GPT-3 Davinci (175B) (under a given limit, but one sufficient for personal use) and then you pay as you go. Even if you use it a lot, you're likely to pay less than $5 or $10 per month.
And if you have tasks that need a lot of tokens but are not too hard (e.g., hard reading comprehension), Curie (GPT-3 6B) is often enough and is much cheaper to use!

In few-shot settings (i.e., settings in which you show examples of something so that it reproduces them), Curie is often very good, so it's worth trying it... (read more)
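As an illustration of the few-shot setting with Curie, a minimal sketch using the older (pre-1.0) openai SDK; the engine name, prompt, and task are just examples:

```python
import openai  # older (pre-1.0) SDK; adjust for newer versions

openai.api_key = "sk-..."  # placeholder

# A few solved examples, then the case we actually want answered.
prompt = (
    "Review: The movie was fantastic.\nSentiment: positive\n\n"
    "Review: I wasted two hours of my life.\nSentiment: negative\n\n"
    "Review: The plot was thin but the acting saved it.\nSentiment:"
)

response = openai.Completion.create(
    engine="text-curie-001",  # Curie: often good enough and much cheaper than Davinci
    prompt=prompt,
    max_tokens=1,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```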

Thanks for the feedback! I will think about it and maybe try to do something along those lines!

Are there existing models for which we're pretty sure we know all their latent knowledge? For instance, small language models or something like that.

Ajeya Cotra · 1y
[Paul/Mark can correct me here] I would say no for any small-but-interesting neural network (like small language models); for something like a linear regression where we've set the features, it's kind of a philosophical question (though I'd say yes). In some sense, ELK as a problem only even starts "applying" to pretty smart models (ones who can talk, including about counterfactuals/hypotheticals, as discussed in this appendix [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.lin21swvfo3]). This is closely related to how alignment as a problem only really starts applying to models smart enough to be thinking about how to pursue a goal.

Thanks for the answer! The post you mentioned is indeed quite similar!

Technically, the strategies I suggested in my last two paragraphs (leverage the fact that we're able to verify solutions to problems we can't solve + give partial information to an algorithm and use more information to verify) should make it possible to go far beyond human intelligence/human knowledge using a lot of different narrowly accurate algorithms.

And thus if the predictor has seen many extremely (narrowly) smart algorithms, it would be much more likely to know what it is like to be... (read more)
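The "verify what you can't solve" asymmetry is easy to illustrate with a toy sketch: checking a claimed factorization of a large number takes a few divisions and one product, even if finding the factors ourselves is out of reach (numbers chosen arbitrarily):

```python
def verify_factorization(n, factors):
    """Cheap check of an answer we might not have been able to produce ourselves."""
    product = 1
    for f in factors:
        if f <= 1 or n % f != 0:
            return False
        product *= f
    return product == n

# Finding the factors of a large n can be far beyond us;
# checking a claimed answer is just divisions and one product.
n = 1000003 * 1000033
print(verify_factorization(n, [1000003, 1000033]))  # True
print(verify_factorization(n, [17, 59]))            # False
```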

Ajeya Cotra · 1y
I think this is roughly right, but to try to be more precise, I'd say the counterexample is this:

  • Consider the Bayes net that represents the upper bound of all the understanding of the world you could extract doing all the tricks described (P vs NP, generalizing from less smart to more smart humans, etc.).
  • Imagine that the AI does inference in that Bayes net.
  • However, the predictor's Bayes net (which was created by a different process) still has latent knowledge that this Bayes net lacks.
  • By conjecture, we could not have possibly constructed a training data point that distinguished between doing inference on the upper-bound Bayes net and doing direct translation.

You said that naive questions were tolerated, so here's a scenario for which I can't figure out why it wouldn't work.

It seems to me that the fact that an AI fails to predict the truth (because it predicts as humans would) is due to the AI having built an internal model of how humans understand things and predicting based on that understanding. So if we assume that an AI is able to build such an internal model, why wouldn't we train an AI to predict what a (benevolent) human would say given an amount of information and a capacity to process information? Doing... (read more)

Ajeya Cotra · 1y
This proposal has some resemblance to turning reflection up to 11 [https://ai-alignment.com/turning-reflection-up-to-11-1bd6171afd21], and the key question you raise is the source of the counterexample in the worst case: Because ARC is living in "worst-case" land, they discard a training strategy once they can think of any at-all-plausible situation in which it fails, and move on to trying other strategies. In this case, the counterexample would be a reporter that answers questions by doing inference in whatever Bayes net corresponds to "the world-understanding that the smartest/most knowledgeable human in the world" has; this understanding could still be missing things that the prediction model knows. This is closely related to the counterexample "Gradient descent is more efficient than science" [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.kd79zkls9g5o] given in the report.

I think that "There are many talented people who want to work on AI alignment, but are doing something else instead." is likely to be true. I met at least 2 talented people who tried to get into AI Safety but who weren't able to because open positions / internships were too scarce. One of them at least tried hard (i.e applied for many positions and couldn't find one (scarcity), despite the fact that he was one of the top french students in ML). If there was money / positions, I think that there are chances that he would work on AI alignment independently.
Connor Leahy in one of his podcasts mentions something similar aswell.

That's the impression I have.

adamShimi · 1y
I want to point out that cashing out "talented" might be tricky. My observation is that talent for technical alignment work is not implied/caused by talent in maths and/or ML. It's not bad to have either of these, but I can think of many incredible people in maths/ML I know who seem way less promising to me than some people with the right mindset and approach.