Comments

"Presently beyond the state of the art... I think that would be pretty cool"

Point taken, but that doesn't make it sufficient for avoiding society-level catastrophes.

That's exactly what I'm worried about: that people will equate deploying a model via API with releasing open weights, when the latter carries significantly more risk because the model can be modified in the future and can never be withdrawn.

Frontier Red Team, Alignment Science, Finetuning, and Alignment Stress Testing


What's the difference between a frontier red team and alignment stress-testing? Is the red team focused on the current models you're releasing and the alignment stress testing focused on the future?

I know that Anthropic doesn't really open-source advanced AI, but it might be useful to discuss this in Anthropic's RSP anyway because one way I see things going badly is people copying Anthropic's RSP's and directly applying it to open-source projects without accounting for the additional risks this entails.

Great work! It's easy to overlook the importance of this kind of community infrastructure, but I suspect that it makes a significant difference.

The biggest danger with AIs slightly smarter than the average human is that they will be weaponised, so they'd only be safe in a very narrow sense.

I should also note that if we built an AI that was slightly smarter than the average human all-round, it would be genius-level, or at least exceptional, in several narrow capabilities, so it would be a lot less safe than you might think.

I believe this is likely a smaller model rather than a bigger model so I wouldn't take this as evidence that gains from scaling have plateaued.

Answer by Chris_Leong

Developing skills related to AI puts you in a better position to make AI go well. At least for me, this outweighs the other concerns that you've mentioned.

Note: This doesn't mean that you should take a job that advances fundamental AI capabilities. This would probably be net-negative as things are already moving far too fast for society to adapt. But it sounds like you're more considering jobs related to AI applications, so I'd say to go for it.

You mention that society may do too little of the safer types of RL. Can you clarify what you mean by this?

This fails to account for one very important psychological fact: the population of startup founders who get a company off the ground is heavily biased toward people who strongly believe in their ability to succeed. So it will take quite a while for "it'll be hard to make money" to filter through and slow down training. In the meantime, it will be accelerationist, pushing companies to stay ahead of the competition.
