How do I embed the market directly into the comment, instead of having a link to which people click through?
So in evaluating that, the key question here is whether LLMs were on the critical path already.
Is it more like...
My guess is that the true answer is closest to the second option: LLMs happen a predictable-ish period ahead of AGI, in large part because they're impressive enough and generally practical enough to drive AGI development.
What this is suggesting to me is that if OpenAI didn't bet on LLMs, we effectively wouldn't have gotten more time to do alignment research, because most alignment research done before an understanding of LLMs would have been a dead end. And actually solving alignment may require people who have internalized the paradigm shift represented by LLMs to figure out solutions based on it. Under this model, even if we are in an insight-constrained world, OpenAI mostly hasn't burned away effective years of alignment research (because alignment research carried out before we had LLMs would have been mostly useless anyway).
Here's a paraphrase of the way I take you to be framing the question. Please let me know if I'm distorting it in my translation.
We often talk about 'the timeline to AGI' as a resource that can be burned. We want to have as much time as we can to prepare before the end. But that's actually not quite right. The relevant segment of time is not (from "as soon as we notice the problem" to "the arrival of AGI") it's (from "as soon as we can make real technical headway on the problem" to "the arrival of AGI"). We'll call that second time-segment "preparation time".
The development of LLMs maybe did bring the date of AGI towards us, but it also pulled forward the start of the "preparation time" clock. In fact, it's plausible that the "preparation time" clock might otherwise have started only just before AGI, or not at all.
So all things considered, the impact of pulling the start time forward seems much larger than the impact of pulling the time of AGI forward.
How's that as a summary?
And how come the overwhelming majority of patients don't quit smoking when their doctor tells them to, but people often do quit smoking after they've personally experienced the negative consequences (e.g., had their first heart attack)?
It seems like the obvious answer is "because abstract words from their doctor aren't vivid enough to trigger the reinforcement machinery, but the experience of having a heart attack is."
I wrote the following comment during this AMA back in 2019, but didn't post it for the reasons that I note in the body of the comment.
I still feel somewhat unsatisfied with what I wrote. I think something about the tone feels wrong, or gives the wrong impression, somehow. Or maybe this only presents part of the story. But it still seems better to say aloud than not.
I feel more comfortable posting it now, since I'm currently early in the process of attempting to build an organization / team that does meet these standards. In retrospect, I think probably it would have been better if I had just posted this at the time, and hashed out some disagreements with others in the org in this thread.
(In some sense this comment is useful mainly as a bit of a window into the kind of standards that I, personally, hold a rationality-development / training organization to.)
My original comment is reproduced verbatim below (plus a few edits for clarity).
I feel trepidation about posting this comment, because it seems in bad taste to criticize a group, unless one is going to step up and do the legwork to fix the problem. This is one of the top 5 things that bothers me about CFAR, and maybe I will step up to fix it at some point, but I’m not doing that right now and there are a bunch of hard problems that people are doing diligent work to fix. Criticizing is cheap. Making things better is hard.
[edit 2023: I did run a year-long CFAR instructor training that was explicitly designed to take steps on this class of problems, though. It is not as if I was just watching from the sidelines. But shifting the culture of even a small org, especially from a non-executive role, is pretty difficult, and my feeling is that I made real progress in the direction that I wanted, but only about one twentieth of the way to what I would think is appropriate.]

My view is that CFAR does not meaningfully eat its own dogfood, or at least doesn't do so enough, and that this hurts the organization's ability to achieve its goals.
This is not to contradict the anecdotes that others have left here, which I think are both accurate presentations, and examples of good (even inspiring) actions. But while some members of CFAR do have personal practices (with varying levels of “seriousness”) in correct thought and effective action, CFAR, as an institution, doesn’t really make much use of rationality. I resonate strongly with Duncan’s comment about counting up vs. counting down.
More specific data, both positive and negative:
- CFAR did spend some 20 hours of staff meeting time Circling in 2017, separately from a ~50 hour CFAR circling retreat that most of the staff participated in, and various other circling events that CFAR staff attended together (but were not “run by CFAR”).
- I do often observe people doing Focusing moves and Circling moves in meetings.
- I have observed occasional full explicit Double Crux conversations on the order of three or four times a year.
- I frequently (on the order of once every week or two) observe CFAR staff applying the Double Crux moves (offering cruxes, crux checking, operationalizing, playing the Thursday-Friday game) in meetings and in conversation with each other.
- Group goal-factoring has never happened, to the best of my knowledge, even though there are a number of things that happen at CFAR that seem very inefficient, seem like “shoulds”, or are frustrating / annoying to at least one person [edit 2023: these are explicit triggers for goal factoring]. I can think of only one instance in which two of us (Tim and I, specifically) tried to goal-factor something (a part of meetings that some of us hate).
- We’ve never had an explicit group pre-mortem, to the best of my knowledge. There is the occasional two-person session of simulating a project (usually a workshop or workshop activity) and the ways in which it might go wrong. [edit 2023: Anna said that she had participated in many long-form postmortems regarding hiring in particular, when I sent her a draft of this comment in 2019.]
- There is no infrastructure for tracking predictions or experiments. Approximately, CFAR as an institution doesn’t really run [formal] experiments, at least experiments with results that are tracked by anything other than the implicit intuitions of the staff. [edit 2023: some key features of a "formal experiment" as I mean it are writing down predictions in advance, and having a specific end date at which the group reviews the results. This is in contrast to simply trying new ideas sometimes.]
- There are no explicit processes for iterating on new policies or procedures (such as iterating on how meetings are run).
- [edit 2023: An example of an explicit process for iterating on policies and procedures is maintaining a running document for a particular kind of meeting. Every time you have that kind of meeting, you start by referring to the notes from the last session. You try some specific procedural experiments, and then end the meeting with five minutes of reflection on what worked well or poorly, and log those in the document. This way you are explicitly trying new procedures and capturing the results, instead of finding procedural improvements mainly by stumbling into them, and often forgetting improvements rather than integrating and building upon them. I use documents like this for my personal procedural iteration.

Or in Working Backwards, the authors describe not just organizational innovations that Amazon came up with to solve explicitly-noted organizational problems, but also the sequence of iteration that led to those final-form innovations.]

- There is informal, but effective, iteration on the workshops. The processes that run CFAR’s internals, however, seem to me to be mostly stagnant [edit 2023: in the sense that there's no deliberate, intentional effort on solving long-standing institutional frictions, or on developing more effective procedures for doing things.]
- As far as I know, there are no standardized checklists for employing CFAR techniques in relevant situations (like starting a new project). I wouldn’t be surprised if there were some ops checklists with a murphyjitsu step. I’ve never seen a checklist for a procedure at CFAR, excepting some recurring shopping lists for workshops.
- The interview process does not incorporate the standard research about interviews and assessment contained in Thinking, Fast and Slow. (I might be wrong about this. I, blessedly, don’t have to do admissions interviews.)
- No strategic decision or choice to undertake a project, that I’m aware of, has involved quantitative estimates of impact, or quantitative estimates of any kind. (I wouldn’t be surprised if the decision to run the first MSFP did, [edit 2023: but I wasn't at CFAR at the time. My guess is that there wasn't.])
- Historically, strategic decisions were made to a large degree by inertia. This is more resolved now, but for a period of several years, I think most of the staff didn’t really understand why we were running mainlines, and in fact when people [edit 2023: workshop participants] asked about this, we would say things like “well, we’re not sure what else to do instead.” This didn’t seem unusual, and didn’t immediately call out for goal factoring.
- There’s no designated staff training time for learning or practicing the mental skills, or for doing general tacit knowledge transfer between staff. However, full-time CFAR staff have historically had a training budget, which they could spend on whatever personal development stuff they wanted, at their own discretion.
- CFAR does have a rule that you’re allowed / mandated to take rest days after a workshop, since the workshop eats into your weekend.
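The "formal experiment" point from the list above (write predictions down in advance, set a fixed review date, then review the results as a group) can be made concrete. Here's a minimal sketch in Python, with hypothetical names of my own invention, of the lightweight tracking infrastructure I have in mind — nothing more than a log that forces the prediction to be recorded before the outcome is known:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Prediction:
    claim: str                       # a concrete, checkable statement
    probability: float               # confidence assigned in advance
    review_date: date                # fixed date the group revisits this
    outcome: Optional[bool] = None   # filled in only at review time

@dataclass
class ExperimentLog:
    predictions: list = field(default_factory=list)

    def register(self, claim: str, probability: float, review_date: date) -> Prediction:
        """Record a prediction before the experiment starts."""
        p = Prediction(claim, probability, review_date)
        self.predictions.append(p)
        return p

    def due_for_review(self, today: date) -> list:
        """Predictions whose review date has arrived and whose outcome is still open."""
        return [p for p in self.predictions
                if p.outcome is None and p.review_date <= today]
```

The point isn't the code itself — a shared spreadsheet does the same job — but the structure: the probability and the review date are committed up front, so the group can't quietly forget to check the result.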
Overall, CFAR strikes me as mostly a normal company, populated by some pretty weird hippie-rationalists. There aren’t any particular standards requiring employees to use rationality techniques, nor institutional procedures for doing rationality [edit 2023: as distinct from having a shared rationality culture].
This is in contrast to, say, Bridgewater Associates, which is clearly structured intentionally to enable updating and information processing at the organizational level. (Incidentally, Bridgewater is rich in the most literal sense.)
Also, I’m not fully exempt from these critiques myself: I have not really internalized goal factoring yet, for instance, and I think that I, personally, am making the same kinds of errors of inefficient action that I’m accusing CFAR of making. I also don’t make much use of quantitative estimates, and while I have lots of empirical iteration procedures, I haven’t really gotten the hang of doing explicit experiments. (I do track decisions and predictions, though, for later review.)
Overall, I think this gap is due about 10% to “these tools don’t work as well, especially at the group level, as we seem to credit them, and we are correct not to use them”, about 30% to this being harder to do than it seems, and about 60% to CFAR not really trying at this (and maybe it shouldn’t be trying at this, because there are trade-offs and other things to focus on).
Elaborating on the 30%: I do think that making an org like this, especially when not starting from scratch, is deceptively difficult. While implementing some of these practices seems trivial on the surface, it actually entails a shift in culture and expectations, and doing this effectively requires leadership and institution-building skills that CFAR doesn’t currently have. If I imagine something like this existing, it would need to have a pretty in-depth onboarding process for new employees, teaching the skills and presenting “how we do things here.” If you wanted to bootstrap into this kind of culture, at anything like a fast enough speed, you would need the same kind of onboarding for all of the existing employees, but it would be even harder, because you wouldn’t already have the culture going to provide examples and immersion.
I think my biggest crux here is how much the development to AGI is driven by compute progress.
I think it's mostly driven by new insights plus trying out old, but expensive, ideas. So, I provisionally think that OpenAI has mostly been harmful, far in excess of its real positive impacts.
Elaborating:
Compute vs. Insight
One could adopt a (false) toy model in which the price of compute is the only input to AGI. Once the price falls low enough, we get AGI. [a compute-constrained world]
Or a different toy model: When AGI arrives depends entirely on algorithmic / architectural progress, and the price of compute is irrelevant. In this case there's a number of steps on the "tech tree" to AGI, and the world takes each of those steps, approximately in sequence. Some of those steps are new core insights, like the transformer architecture, RLHF, or learning about the Chinchilla scaling laws, and others are advances in scaling, like going from GPT-2 to GPT-3. [an insight-constrained world]
(Obviously both those models are fake. Both compute and architecture are inputs to AGI, and to some extent they can substitute for each other: you can make up for having a weaker algorithm with more brute force, and vice versa. But these extreme cases are easier for me, at least, to think about.)
In the fully compute-constrained world, OpenAI's capabilities work is strictly good, because it means we get intermediate products of AGI development earlier.
In this world, progress towards AGI is ticking along at the drum-beat of Moore's law. We're going to get AGI in 20XY. But because of OpenAI, we get GPT-3 and 4, which give us subjects for interpretability work, and give the world a heads-up about what's coming.
Under the compute-constraint assumption, OpenAI is stretching out capabilities development, by causing some of the precursor developments to happen earlier, but more gradually. AGI still arrives at 20XY, but we get intermediates earlier than we otherwise would have.
In the fully insight-constrained world, OpenAI's impact is almost entirely harmful. Under that model, Large Language Models would have been discovered eventually, but OpenAI made a bet on scaling GPT-2. That caused us to get that technology earlier, and also pulled forward the date of AGI, both by checking off one of the steps, and by showing what was possible and so generating counterfactual interest in transformers.
In this world, OpenAI might have other benefits, but they are at least doing the counterfactual harm of burning our serial time.
They don't get the credit for "sounding the alarm" by releasing ChatGPT, because that was on the tech tree already; it was going to happen at some point. Giving OpenAI credit for it would be sort of the reverse of "shooting the messenger": crediting someone for letting you know about a bad situation when they caused the bad situation in the first place (or at least made it worse).
Again, neither of these models is correct. But I think our world is closer to the insight-constrained world than the compute-constrained world.
This makes me much less sympathetic to OpenAI.
Costs and Benefits
It doesn't settle the question, because maybe OpenAI's other impacts (many of which I agree are positive!) more than make up for the harm done by shortening the timeline to AGI.
In particular...
Overall, it currently seems to me that OpenAI is somewhat better than a random draw from the distribution of possible counterfactual AGI companies (maybe 90th percentile?). But also that they are not so much better that this makes up for burning 3 to 7 years of the timeline.
3 to 7 years is just my eyeballing of how much later someone would have developed ChatGPT-like capabilities, if OpenAI hadn't bet on scaling up GPT-2 into GPT-3 and hadn't decided to invest in RLHF, both moves that it looks to me like few orgs in the world were positioned to try, and even fewer would actually have tried in the near term.
That's not a very confident number. I'm very interested in getting more informed estimates of how long it would have taken for the world to develop something like ChatGPT without OpenAI.
(I'm selecting ChatGPT as the criterion, because I think that's the main pivot point at which the world woke up to the promise and power of AI. Conditional on someone developing something ChatGPT-like, it doesn't seem plausible to me that the world goes another three years without developing a language model as impressive as GPT-4. At that point developing bigger and better language models is an obvious thing to try, rather than an interesting bet that the broader world isn't much interested in.)
I'm also very interested if anyone thinks that the benefits (either ones that I listed or others) outweigh an extra 3 to 7 years of working on alignment (not to mention an additional 3 to 7 years of life expectancy for all of us).
It is worth noting that at some point PaLM was (probably) the most powerful LLM in the world, and Google didn't release it as a product.
But I don't think this is a very stable equilibrium. I expect to see a ChatGPT competitor from Google before 2024 (50%) and before 2025 (90%).
That said, "a value-aligned, safety-conscious project comes close to building AGI before we do", really gives a lot of wiggle-room for deciding if some competitor is "a good guy". But, still better than the counterfactual.
Or, more strictly, the point is that if you're dead, you can't achieve evolution's goals for you.