All of Andrea_Miotti's Comments + Replies

Thanks for the kind feedback! Any suggestions for a more interesting title?

Sorry, I just forgot to answer this until now. I think the issue is that the title doesn't make it clear how different the UK's proposal is from, say, the stuff that the "labs" negotiated with the US. "UK seems to be taking a hard line on foundation model training", or something?

Palantir's recent materials on this show that they're using three (pretty small by today's frontier standards) open-source LLMs: Dolly-v2-12B, GPT-NeoX-20B, and Flan-T5 XL.


I think there's a good chance that they also have bigger models, but that the bigger models are classified.

Apologies for the 404 on the page, it's an annoying cache bug. Try a hard refresh of your browser page (Cmd + Shift + R) and it should work.

Works now. Thanks!
I am afraid this is a more persistent problem (or perhaps it comes and goes, but I am even trying browsers I don't normally use, in addition to hard reloads on the ones I do, and it still returns 404). I'll keep testing this occasionally... (You might want to check whether anyone else who does not have privileged access to your systems is seeing it at the moment; some systems, like GitHub, often show 404 to people who don't have access to an actually existing file, instead of the 403 one would normally expect.)

The "1000" instead of "10000" was a typo in the summary.

In the transcript Connor states "SLT over the last 10000 years, yes, and I think you could claim the same over the last 150". Fixed now, thanks for flagging!

1Martín Soto5mo
Yes, that's what I meant, sorry for not making it clearer!

Which one? All of them seem to be working for me.

Sure, all the links seem to be working for me too.
2David Scott Krueger (formerly: capybaralet)6mo
works for me too now

Pessimism of the intellect, optimism of the will.

Calibration of the intellect, optimism of the will.

People from OpenPhil, FTX FF and MIRI were not interested in discussing at the time. We also talked with MIRI about moderating, but it didn't work out in the end.

People from Anthropic told us their organization is very strict on public communications, and very wary of PR risks, so they did not participate in the end.

In the post I overgeneralized rather than go into full detail.

Yes, some people mentioned it was confusing to have two posts (I had originally posted two separate ones for the Summary and the Transcript due to their length), so I merged them into one and added headers pointing to the Summary and Transcript for easier navigation.

Thanks, I was looking for a way to do that but didn't know the space in italics hack!

Another formatting question: how do I make headers and sections collapsible? It would be great to have the "Summary" and "Transcript" sections as collapsible, considering how long the post is.

Also:
* "And so, Elisa, you've been tapped into the world of AI"
* "And Scott Aronson, who at the time was off on complexity theory"
* "Don't Look Up" should logically be capitalized?

I really don't think that AI dungeon was the source of this idea (why do you think that?)

We've heard the story from a variety of sources all pointing to AI Dungeon, and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.

And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn't part of

... (read more)

We'd maybe be at our current capability level in 2018, [...] the world would have had more time to respond to the looming risk, and we would have done more good safety research.

It’s pretty hard to predict the outcome of “raising awareness of problem X” ahead of time. While it might be net good right now because we’re in a pretty bad spot, we have plenty of examples from the past where greater awareness of AI risk has arguably led to strongly negative outcomes down the line, due to people channeling their interest in the problem into somehow pushing capabilities even faster and harder.

My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.

We fully agree on this, and so it seems like we don’t have large disagreements on externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than 3, or some similar event that established the current paradigm as the dominant one.

Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. It’s mostly high level, as it would take a lot of detailed discussion int... (read more)

1. Fully agree and we appreciate you stating that.

2. While we are concerned about capability externalities from safety work (that’s why we have an infohazard policy), what we are most concerned about, and that we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put this in reverse, using the notion that working on systems that are closer to being dangerous might be more fruitful for safety work, to justify actively pushing the capabilities frontier and thus accelerating the arrival of the dangers t... (read more)

I think the best argument for "accelerating capabilities is good" is that it forces you to touch reality instead of theorizing, and given that iteration is good in other fields, we need to ask why we think AI safety is resistant to iterative solutions. And this, in a nutshell, is why ML/AGI people can rationally increase capabilities: LW has a non-trivial chance of having broken epistemics, and AGI people do tend towards more selfish utility functions.

Good point, and I agree progress has been slower in robotics compared to the other areas.

I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.

Was Dario Amodei not the former head of OpenAI’s safety team?

He wrote "Concrete Problems in AI Safety".

I don't see how the claim isn't just true/accurate.

If someone reads "Person X is Head of Safety", they wouldn't assume that the person led the main AI capabilities efforts of the company for the last 2 years.

Only saying "head of the safety team" implies that this was his primary activity at OpenAI, which is just factually wrong. 

According to his LinkedIn, from 2018 until end of 2020, when he left, he was Director of Research and then VP of Research o... (read more)

5Swimmer963 (Miranda Dixon-Luinenburg) 7mo
I do think it's fair to consider the work on GPT-3 a failure of judgement and a bad sign about Dario's commitment to alignment, even if at the time (also based on LinkedIn) it sounds like he was also still leading other teams focused on safety research.  (I've separately heard rumors that Dario and the others left because of disagreements with OpenAI leadership over how much to prioritize safety, and maybe partly related to how OpenAI handled the GPT-3 release, but this is definitely in the domain of hearsay and I don't think anything has been shared publicly about it.) 

Your graph shows "a small increase" that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That's not small! That's as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you'd have to figure out some newer efficient training that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.

In general I wish you would actually write down equations for your mod

... (read more)
7Rohin Shah7mo
I continue to think that if your model is that capabilities follow an exponential (i.e. dC/dt = kC), then there is nothing to be gained by thinking about compounding. You just estimate how much time it would have taken for the rest of the field to make an equal amount of capabilities progress now. That's the amount you shortened timelines by; there's no change from compounding effects.

Two responses:
1. Why are you measuring value in dollars? That is both (a) a weird metric to use and (b) not the one you had on your graph.
2. Why does the discovery have the same value now vs. later?

I think this is pretty false. There's no equivalent to Let's think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just "caution") -- there's a few paragraphs in Paul's post, one short comment by me, and one short post by Kaj. I'd say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities. 
(I agree in the wider world there's a lot more optimization for arguments in favor of capabilities progress that people in general would fin

... (read more)
7Rohin Shah7mo
1. Your post is titled "Don't accelerate problems you're trying to solve". Given that the problem you're considering is "misalignment", I would have thought that the people trying to solve the problem are those who work on alignment.
2. The first sentence of your post is "If one believes that unaligned AGI is a significant problem (>10% chance of leading to catastrophe), speeding up public progress towards AGI is obviously bad." This is a foundational assumption for the rest of your post. I don't really know who you have in mind as these other people, but I would guess that they don't assign >10% chance of catastrophe.
3. The people you cite as making the arguments you disagree with are full-time alignment researchers. If you actually want to convey your points to some other audience, I'd recommend making another, different post that doesn't give off the strong impression that it is talking to full-time alignment researchers.

I agree that status and "what my peers believe" determine what people do to a great extent. If you had said "lots of alignment researchers are embedded in communities where capabilities work is high-status; they should be worried that they're being biased towards capabilities work as a result", I wouldn't have objected. You also point out that people hear arguments from the broader world, but it seems like arguments from the community are way way way more influential on their beliefs than the ones from the broader world. (For example, they think there's >10% chance of catastrophe from AI based on arguments from this community, despite the rest of the world arguing that this is dumb.)

I looked at the linked tweet and a few surrounding it and they seem completely unrelated? E.g. the word "capabilities" doesn't appear at all (or its synonyms). I'm guessing you mean Kelsey's point that EAs go to orgs that think safety is easy because those are the ones that are hiring, but (a) that's not saying that those EA

Anthropic’s founding team consists of, specifically, people who formerly led safety and policy efforts at OpenAI

This claim seems misleading at best: Dario, Anthropic's founder and CEO, led OpenAI's work on GPT-2 and GPT-3, two crucial milestones in terms of public AI capabilities.
Given that I don't have much time to evaluate each claim one by one, and given Gell-Mann amnesia, I am a bit more skeptical of the other ones.

Was Dario Amodei not the former head of OpenAI's safety team? He wrote "Concrete Problems in AI Safety". I don't see how the claim isn't just true/accurate. Whether or not he led/contributed to the GPT series, (I am under the impression that) Dario Amodei did lead safety efforts at OpenAI.

The link in "started out as a comment on this post", in the first line of the post, is broken.