People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.
People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.
In the post I over generalized to not go into full details.
Yes, some people mentioned it was confusing to have two posts (I had originally posted two separate ones for Summary and Transcript due to them being very lengthy) so I merged them in one, and added headers pointing to Summary and Transcript for easier navigation.
Thanks, I was looking for a way to do that but didn't know the space in italics hack!
Another formatting question: how do I make headers and sections collapsible? It would be great to have the "Summary" and "Transcript" sections as collapsible, considering how long the post is.
I really don't think that AI dungeon was the source of this idea (why do you think that?)
We've heard the story from a variety of sources all pointing to AI Dungeon, and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.
And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn't part of ChatGPT or Codex or AlphaCode. What exactly is the story?
Regarding the effects of chain of thought prompting on progress, there's two levels of impact: first order effects and second order effects.
On first order, once chain of thought became public a large number of groups started using it explicitly to finetune their models.
Aside from non-public examples, big ones include PaLM, Google's most powerful model to date. Moreover, it makes models much more useful for internal R&D with just prompting and no finetuning.
We don’t know what OpenAI used for ChatGPT, or future models: if you have some information about that, it would be super useful to hear about it!
On second order: implementing this straightforwardly improved the impressiveness and capabilities of models, making them more obviously powerful to the outside world, more useful for customers, and leading to an increase in attention and investment into the field.
Due to compounding, the earlier these additional investments arrive, the sooner large downstream effects will happen.
This is also partially replying to @Rohin Shah 's question in another comment:
Why do you believe this "drastically" slowed down progress?
We'd maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.
It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.
My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.
We fully agree on this, and so it seems like we don’t have large disagreements on externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than 3, or some similar event that established the current paradigm as the dominant one.
Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. It’s mostly high level, as it would take a lot of detailed discussion into models of scientific and technological progress, which we might cover in some future posts.
In general, we think you’re treating the current paradigm as over-determined. We don’t think that being in a DL-scaling language model large single generalist system-paradigm is a necessary trajectory of progress, rather than a historical contingency.
While the Bitter Lesson might be true and a powerful driver for the ease of working on singleton, generalist large monolithic systems over smaller, specialized ones, science doesn’t always (some might say very rarely!) follow the most optimal path.
There are many possible paradigms that we could be in, and the current one is among the worse ones for safety. For instance, we could be in a symbolic paradigm, or a paradigm that focuses on factoring problems and using smaller LSTMs to solve them. Of course, there do exist worse paradigms, such as a pure RL non-language based singleton paradigm.
In any case, we think the trajectory of the field got determined once GPT-2 and 3 brought scaling into the limelight, and if those didn’t happen or memetics went another way, we could be in a very very different world.
The "1000" instead of "10000" was a typo in the summary.
In the transcript Connor states "SLT over the last 10000 years, yes, and I think you could claim the same over the last 150". Fixed now, thanks for flagging!