Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a linkpost for: https://www.governance.ai/post/sharing-powerful-ai-models 

On the GovAI blog, Toby Shevlane (FHI) argues in favour of labs granting "structured access" to AI models.

4 comments

This is an interesting possibility for a middle-ground option between open-sourcing and fully private models. Do you have any estimates of how much it would cost an AI lab to do this, compared to the more straightforward option of open-sourcing?

Some initial thoughts:

  • OpenAI have an API (which was evidently not prohibitively expensive to build). Would they be able to share any information about how long it took to develop, how useful they've found it, and what they would improve in future iterations?
  • Enforcing the rules by monitoring usage could be labour-intensive for the AI developer to do thoroughly. And it may not be clear whether or not the model is being used for a prohibited application, particularly since those seeking to use it for prohibited purposes are incentivised to hide this.
  • Do you think it would be possible/valuable to build a common open source system for structured access, with basic features covering common use cases, available for use by any AI lab?

On monitoring, one reason for optimism is that it can be done in an automated way, especially as AI capabilities increase. But I agree that it might be possible to hide misuse. Looking at the user's queries and the model's predictions won't always give enough information.

On a common open source system for structured access: yes, that's something I've been thinking about recently, and I think it would be beneficial. OpenMined is doing relevant work in this area but, from what I can see, it's still generally too neglected.
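As a rough illustration of what the "basic features" of such a shared system might look like, here is a minimal Python sketch of a gateway that sits between users and a model, checks access, runs an automated check on each query, and keeps an audit log. Everything in it is hypothetical: `StructuredAccessGateway`, `toy_model`, the API key, and the keyword list are made up for illustration and do not describe OpenAI's API, OpenMined's tools, or any real system. The keyword match is a deliberately crude stand-in for an automated monitor, and it also shows the limitation noted above: a user who rephrases a prohibited request would pass straight through.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class StructuredAccessGateway:
    """Hypothetical wrapper that mediates every call to a model."""

    model: Callable[[str], str]   # stand-in for the lab's model
    approved_keys: Set[str]       # API keys of users the lab has vetted
    blocked_terms: List[str]      # lowercase terms; crude stand-in for a misuse classifier
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def query(self, api_key: str, prompt: str) -> str:
        # Access control: only vetted users may query the model at all.
        if api_key not in self.approved_keys:
            self.audit_log.append((api_key, prompt, "denied"))
            raise PermissionError("unknown API key")

        # Automated monitoring: flag prompts that match a prohibited term.
        if any(term in prompt.lower() for term in self.blocked_terms):
            self.audit_log.append((api_key, prompt, "flagged"))
            raise PermissionError("prompt flagged by automated monitor")

        # Permitted use: run the model and record the interaction.
        output = self.model(prompt)
        self.audit_log.append((api_key, prompt, "allowed"))
        return output


def toy_model(prompt: str) -> str:
    """Placeholder model that just upper-cases the prompt."""
    return prompt.upper()


gateway = StructuredAccessGateway(
    model=toy_model,
    approved_keys={"key-123"},
    blocked_terms=["phishing"],
)
```

A caller would use `gateway.query("key-123", "some prompt")`, and the lab could later review `gateway.audit_log` by hand. The model weights never leave the developer's side; only outputs do.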

One issue is that good research tools are hard to build, and organizations may be reluctant to share them (especially since making good research tools public-facing is even more effort). Like, can I go out and buy a subscription to Anthropic's interpretability tools right now? That seems to be the future Toby (whose name, might I add, is highly confusable with Justin Shovelain's) is pushing for.

It does seem that public/shared investment in tools that make structured access programs easier might make more of them happen.

As boring as it is, this might be a good candidate for technical standards for interoperability/etc.