Ben Cottier — LessWrong

This is neat! I like the idea of isolating technical progress. I'm curious whether you've tried this analysis on more benchmarks, considering that we found significant variation in slope across benchmarks in https://epoch.ai/data-insights/llm-inference-price-trends?insight-option=All+benchmarks

GPT-4

Ben Cottier3y10

What is the source for the "JP Morgan note"?

GPT-3-like models are now much easier to access and deploy than to develop

Ben Cottier3y10

To be clear (sorry if you already understood this from the post): Running BLOOM via an API that someone else created is easy. My claim is that someone needs significant expertise to be able to run their own instance of BLOOM. I think the hardest part is setting up multiple GPUs to run the 176B parameter model. But looking back, I might have underestimated how straightforward it is to get the open-source code for running BLOOM to work. Maybe it's basically plug-and-play as long as you get an appropriate A100 GPU instance on the cloud. I did not attempt to run BLOOM from scratch myself.
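As a rough illustration of why multiple GPUs come into it, here is a back-of-the-envelope sketch of the memory needed just to hold the weights of a 176B parameter model. The figures of 2 bytes per parameter (16-bit weights) and 40 GB or 80 GB of memory per A100 are assumptions for illustration, not details from the post, and the function names are hypothetical. Real deployments also need memory for activations and caches, so this is a lower bound at best.

```python
import math

def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights (an assumption, not from the post).
    """
    return n_params * bytes_per_param / 1e9

def min_gpus(n_params: float, gpu_memory_gb: float, bytes_per_param: int = 2) -> int:
    """Lower bound on GPU count: weights only, ignoring activations and caches."""
    return math.ceil(weight_memory_gb(n_params, bytes_per_param) / gpu_memory_gb)

BLOOM_PARAMS = 176e9  # 176B parameters, as stated in the comment

print(weight_memory_gb(BLOOM_PARAMS))  # 352.0 GB of weights
print(min_gpus(BLOOM_PARAMS, 80))      # 5 GPUs, assuming 80 GB per A100
print(min_gpus(BLOOM_PARAMS, 40))      # 9 GPUs, assuming 40 GB per A100
```

Under these assumptions the weights alone do not fit on a single A100, which is consistent with multi-GPU setup being the hard part.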

I recall that in an earlier draft, my estimate for how many people know how to independently run BLOOM was higher (indicating that it's easier). I got push-back on that from someone who works at an AI lab (though this person wasn't an ML practitioner themselves). I thought they made a valid point but I didn't think carefully about whether they were actually right in this case. So I decreased my estimate in response to their feedback.

What are the biggest current impacts of AI?

Answer by Ben CottierDec 02, 202120

Personal AI assistants seem to have one of the largest impacts (or at least "presence") mainly due to the number of users. The impact per person seems small - making life slightly more convenient and productive, maybe. Not sure if there is actually much impact on productivity. I wonder if there is any research on this. I haven't looked into it at all.

Relatedly, chatbots are certainly used a lot, but I'm uncertain about their current impacts beyond personal entertainment and wellbeing (and uncertain about the direction of the impact on wellbeing).

"What 2026 looks like" has a few relevant facts on the current impacts, and interesting speculation about the future impacts of personal assistants and chatbots. For example:

"in China in 2021 the market for chatbots is $420M/year, and there are 10M active users. This article claims the global market is around $2B/year in 2021 and is projected to grow around 30%/year."

I don't feel surprised by those stats, but I also hadn't really considered how big the market is.

Modeling the impact of safety agendas

Ben Cottier4yΩ240

Nice! A couple things that this comment pointed out for me:

Real time is not always (and perhaps often not) the most useful way to talk about timelines. It can be more useful to talk about different paths, or economic growth, if that's more relevant to how tractable the research is.
An agenda doesn't necessarily have to argue that its assumptions are more likely, because we may have enough resources to get worthwhile expected returns on multiple approaches.

Something that's unclear here: are you excited about this approach because you think brain-like AGI will be easier to align? Or is it more about the relative probabilities / neglectedness / your fit?

AI learns betrayal and how to avoid it

Ben Cottier4yΩ010

I'm excited about this project. I've been thinking along similar lines about inducing a model to learn deception, in the context of inner alignment. It seems really valuable to have concrete (but benign) examples of a problem to poke at and test potential solutions on. So far there seem to be fewer concrete examples of deception, betrayal and the like to work with in ML compared to, say, distributional shift or negative side effects.

AI learns betrayal and how to avoid it

Ben Cottier4yΩ010

"Previous high level projects have tried to define concepts like 'trustworthiness' (or the closely related 'truthful') and motivated the AI to follow them. Here we will try the opposite: define 'betrayal', and motivate the AIs to avoid it."

Why do you think the betrayal approach is more tractable or useful? It's not clear from the post.

Clarifying some key hypotheses in AI alignment

Ben Cottier5y10

Google Drawings

Clarifying some key hypotheses in AI alignment

Ben Cottier5y20

To your first point - I agree both with why we limited the scope (though it was also partly just personal interest), and that there should be more of this kind of work on other classes of risk. However, my impression is that the literature and "public" engagement (e.g. EA forum, LessWrong) on catastrophic AI misuse/structural risk is too small to even get traction on work like this. We might first need more work to lay out the best arguments. Having said that, I'm aware of a fair amount of writing which I haven't got around to reading. So I am probably misjudging the state of the field.

To your second point - that seems like a real crux and I agree it would be good to expand in that direction. I know some people working on expanded and more in-depth models like this post. It would be great to get your thoughts when they're ready.

Clarifying some key hypotheses in AI alignment

Ben Cottier5y20

It's great to hear your thoughts on the post!

I'd also like to see more posts that do this sort of "mapping". I think that mapping AI risk arguments is too neglected - more discussion and examples in this post by Gyrodiot. I'm continuing to work collaboratively in this area in my spare time, and I'm excited that more people are getting involved.

We weren't trying to fully account for AGI timelines - our choice of scope was based on a mix of personal interest and importance. I know people currently working on posts similar to this that will go in-depth on timelines, discontinuity, paths to AGI, the nature of intelligence, etc. which I'm excited about!

I agree with all your points. You're right that this post's scope does not include broader alternatives for reducing AI risk. It was not even designed to guide what people should work on, though it can serve that purpose. We were really just trying to clearly map out some of the discourse, as a starting point and example for future work.
