Chris_Leong

Sequences

Linguistic Freedom: Map and Territory Revisited
INVESTIGATIONS INTO INFINITY

Comments

Just going to put it out there: it's not actually clear that we should want to advance AI for maths.

I maintain my position that you're missing the stakes if you think that's important. Even limiting ourselves strictly to concentration-of-power worries, the risks of totalitarianism dominate these concerns.

My take: lots of good analysis, but a few crucial mistakes/weaknesses throw the conclusions into significant doubt:

The USG will be able and willing to either provide or mandate strong infosecurity for multiple projects.

I simply don't buy that the infosec for multiple such projects will be anywhere near as strong as that of a single project, because the overall security ends up being that of the weakest link.

Additionally, the more projects there are with a particular capability, the more people there are who can leak information, either by talking or by being spies.

The probability-weighted impacts of AI takeover or the proliferation of world-ending technologies might be high enough to dominate the probability-weighted impacts of power concentration.

Comment: We currently doubt this, but we haven’t modelled it out, and we have lower p(doom) from misalignment than many (<10%).

Seems entirely plausible to me that either one could dominate. Would love to see more analysis around this.
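
For what it's worth, the shape of the comparison is simple enough to sketch. Every probability and impact figure below is a made-up placeholder purely to show the structure of a probability-weighted comparison, not an estimate:

```python
# Probability-weighted impact comparison. All numbers are illustrative placeholders,
# not estimates; the point is only the structure of the comparison.
scenarios = {
    "ai_takeover":         {"p": 0.05, "impact": 1.00},  # impact on an arbitrary 0-1 badness scale
    "world_ending_tech":   {"p": 0.03, "impact": 1.00},
    "power_concentration": {"p": 0.20, "impact": 0.40},
}

for name, s in scenarios.items():
    print(f"{name}: probability-weighted impact = {s['p'] * s['impact']:.3f}")
```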

Reducing access to these services will significantly disempower the rest of the world: we’re not talking about whether people will have access to the best chatbots or not, but whether they’ll have access to extremely powerful future capabilities which enable them to shape and improve their lives on a scale that humans haven’t previously been able to.

If you're worried about this, I don't think you quite realise the stakes. Capabilities mostly proliferate anyway. People can wait a few more years.

My take: parts of this review come off as a bit too status-oriented to me. This is ironic, because the best part of the review is towards the end, where it talks about the risk of rationality becoming a Fandom.

Sharing this resource doc on AI Safety & Entrepreneurship that I created, in case anyone finds it helpful:

https://docs.google.com/document/d/1m_5UUGf7do-H1yyl1uhcQ-O3EkWTwsHIxIQ1ooaxvEE/edit?usp=sharing 

I agree that we probably want most theory to be towards the applied end these days due to short timelines. Empirical work needs theory to direct it, and theory needs empirics to remain grounded.

Thanks for writing this. I think it is a useful model. However, there is one thing I want to push back against:

Looking at behaviour is conceptually straightforward, and valuable, and being done

I agree with Apollo Research that evals aren't really a science yet; they mostly seem to be conducted according to vibes. Model internals could help with this, but so could things like building up experience, or auditing models using different schemes and comparing the results (a rough sketch of such a comparison is below).

Similarly, a lot of the work on Model Organisms of Alignment requires careful thought to get right.
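
On the auditing-scheme comparison point above, here's a minimal sketch of what "run different schemes and compare them" could look like. The scheme names, model names, and verdicts are all hypothetical placeholders:

```python
from itertools import combinations

# Hypothetical verdicts from three auditing schemes over the same models; every
# scheme name, model name, and verdict below is made up for illustration.
verdicts = {
    "behavioural_eval": {"model_a": "clear",   "model_b": "flagged", "model_c": "clear"},
    "internals_probe":  {"model_a": "clear",   "model_b": "flagged", "model_c": "flagged"},
    "red_team_audit":   {"model_a": "flagged", "model_b": "flagged", "model_c": "clear"},
}

def agreement(x: dict, y: dict) -> float:
    """Fraction of models on which two schemes give the same verdict."""
    shared = x.keys() & y.keys()
    return sum(x[m] == y[m] for m in shared) / len(shared)

# Low pairwise agreement suggests the schemes measure different things
# (or that at least one of them is mostly vibes).
for (name_x, x), (name_y, y) in combinations(verdicts.items(), 2):
    print(f"{name_x} vs {name_y}: agreement = {agreement(x, y):.2f}")
```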

Remember back in 2013 when the talk of the town was how vector representations of words learned by neural networks represent rich semantic information? So you could do cool things like take the [king] vector, subtract the [male] vector, add the [female] vector, and get out something close to the [queen] vector? That was cool! Where's the stuff like that these days?
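
For reference, the arithmetic described above is just vector addition followed by a nearest-neighbour lookup. A minimal sketch, using toy made-up embeddings rather than real word2vec vectors:

```python
import numpy as np

# Toy 3-d "embeddings", made up purely for illustration (real word2vec vectors are ~300-d).
emb = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "male":   np.array([0.1, 0.9, 0.1]),
    "female": np.array([0.1, 0.1, 0.9]),
    "banana": np.array([0.2, 0.2, 0.2]),
}

def nearest(vec, exclude=()):
    """Vocabulary word whose embedding has the highest cosine similarity to `vec`."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vec))

# king - male + female should land closest to queen with these toy vectors.
print(nearest(emb["king"] - emb["male"] + emb["female"], exclude={"king", "male", "female"}))
```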


Activation vectors are a thing. So it's totally happening.
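
To illustrate, here's a rough sketch of the kind of activation-steering setup this refers to: compute a steering vector as the difference of residual-stream activations on two contrastive prompts, then add it back in during generation via a forward hook. The layer index, prompts, and coefficient are arbitrary assumptions, not a tested recipe:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Activation-steering sketch on GPT-2. LAYER, COEFF, and the prompts are arbitrary.
tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0

def resid_after_layer(prompt: str) -> torch.Tensor:
    """Residual-stream activation at the last token of `prompt`, after block LAYER."""
    with torch.no_grad():
        out = model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)
    return out.hidden_states[LAYER + 1][0, -1]  # hidden_states[0] is the embedding output

# Steering vector: difference of activations on two contrastive prompts.
steer = (resid_after_layer("I love talking about weddings")
         - resid_after_layer("I hate talking about weddings"))

def add_steering(module, inputs, output):
    # GPT2Block returns a tuple whose first element is the hidden states; add the
    # steering vector at every position and pass the rest of the tuple through.
    return (output[0] + COEFF * steer,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
try:
    prompt = tok("I went to the park and", return_tensors="pt")
    ids = model.generate(**prompt, max_new_tokens=20, do_sample=False)
    print(tok.decode(ids[0]))
finally:
    handle.remove()
```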

"How can we get more evidence on whether scheming is plausible?" - What if we ran experiments where we included some pressure towards scheming (either RL or fine-tuning) and we attempted to determine the minimum such pressure required to cause scheming? We could further attempt to see how this interacts with scaling.
