cfoster0

Comments

I was trying to write a comment to explain my reaction above, but this comment said everything I would have said, in better words.

cfoster04013

OK, in case this wasn't clear: if you are a Californian and think this bill should become law, don't let my comment excuse you from heeding the above call to action. Contacting your representatives will potentially help move the needle.

cfoster04717

“Unfortunately, due to misinformation and lobbying by big tech companies, SB 1047 is currently stalled in the Assembly Appropriations Committee.”

This is extremely misleading. Any bill that would have a non-negligible fiscal impact (the threshold is only $150,000; see https://apro.assembly.ca.gov/welcome-committee-appropriations/appropriations-committee-rules) must be placed in the Appropriations Committee “Suspense File” until after the budget is prepared. That is the status of SB 1047 and many, many other bills. It has nothing to do with misinformation or lobbying; it is part of the standard process. I believe all the bills that make it out of the Suspense File will be announced at the hearing this Thursday.

More on this: https://calmatters.org/newsletter/california-bills-suspense-file/

What's the evidence that this document is real / written by Anthropic?

Axios first reported on the letter, quoting from it but not sharing it directly:

https://www.axios.com/2024/07/25/exclusive-anthropic-weighs-in-on-california-ai-bill

The public link is from the San Francisco Chronicle, as the metadata on the page also shows: it credits the letter as “Contributed by San Francisco Chronicle (Hearst Newspapers)”.

https://www.sfchronicle.com/tech/article/wiener-defends-ai-bill-tech-industry-criticism-19596494.php

Left the following comment on the blog:

I appreciate that you’re endorsing these changes in response to the two specific cases I raised on X (unlimited model retraining and composition with unsafe covered models). My gut sense is still that ad-hoc patching in this manner just isn’t a robust way to deal with the underlying issue*, and that there are likely still more cases like those two. In my opinion it would be better for the bill to adopt a different framework with respect to hazardous capabilities from post-training modifications (something closer to “Covered model developers have a duty to ensure that the marginal impact of training/releasing their model would not be to make hazardous capabilities significantly easier to acquire.”). The drafters of SB 1047 shouldn’t have to anticipate every possible contingency in advance; that’s just bad design.

* In the same way that, when someone notices that their supposedly-safe utility function for their AI has edge cases that expose unforeseen maxima, introducing ad-hoc patches to deal with those particular noticed edge cases is not a robust strategy to get an AI that is actually safe across the board.

You want to learn an embedding of the opportunities you have in a given state (or for a given state-action), rather than just its potential rewards. Rewards are too sparse of a signal.

More formally, let's say that instead of the Q function, we consider what I would call the Hope function, which, given a state-action pair (s, a), gives you a distribution over the states it expects to visit, weighted by the rewards it will get. This can still be phrased using the Bellman equation:

Hope(s, a) = rs' + f Hope(s', a')

The "successor representation" is somewhat close to this. It encodes the distribution over future states a particular policy expects to visit from a particular starting state, and can be learned via the Bellman equation / TD learning.
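
For concreteness, here is a minimal tabular sketch of learning a successor representation with TD(0). Everything specific in it is an assumption made for illustration: the three-state chain, the fixed deterministic policy encoded in `NEXT_STATE`, the discount `GAMMA`, the step size, and the per-state `REWARDS`. The table `M` holds discounted expected visit counts rather than a normalized distribution, and the reward-weighted rows at the end are only one possible reading of the "Hope" idea, not a definition taken from the comment above.

```python
def one_hot(index, size):
    """Indicator vector for a single state."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def td_successor_representation(next_state, gamma, alpha=0.5, sweeps=200):
    """Tabular TD(0) for the successor representation of a fixed policy.

    next_state[s] is the state the (assumed deterministic) policy reaches
    from s. M[s][j] estimates the discounted number of visits to state j
    when starting in s, counting the visit to s itself at time 0.
    """
    n = len(next_state)
    M = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for s in range(n):
            s_next = next_state[s]
            # Bellman target: visit s now, then follow the SR of the next state.
            target = [i + gamma * m for i, m in zip(one_hot(s, n), M[s_next])]
            # Move the current estimate a fraction alpha toward the target.
            M[s] = [m + alpha * (t - m) for m, t in zip(M[s], target)]
    return M


# Hypothetical toy problem: 0 -> 1 -> 2, and state 2 loops to itself.
NEXT_STATE = [1, 2, 2]
GAMMA = 0.5
REWARDS = [0.0, 0.0, 1.0]  # assumed reward for occupying each state

M = td_successor_representation(NEXT_STATE, GAMMA)

# One reading of a "Hope"-like quantity: future visits weighted by reward.
reward_weighted = [[m * r for m, r in zip(row, REWARDS)] for row in M]

# Summing a reward-weighted row recovers the ordinary state value.
values = [sum(row) for row in reward_weighted]
```

Keeping the per-state breakdown instead of only the summed value is what makes the signal denser: two starting states with equal value can still differ in which future states they lead to.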

“On reflection these were bad thresholds, should have used maybe 20 years and a risk level of 5%, and likely better defined transformational. The correlation is certainly clear here, the upper right quadrant is clearly the least popular, but I do not think the 4% here is lizardman constant.”

Wait, what? Correlation between what and what? 20% of your respondents chose the upper right quadrant (transformational/safe). You meant the lower left quadrant, right?

cfoster03714

Very surprised there's no mention here of Hanson's "Foom Liability" proposal: https://www.overcomingbias.com/p/foom-liability

cfoster0139

I appreciate that you are putting thought into this. Overall I think that "making the world more robust to the technologies we have" is a good direction.

In practice, how does this play out?

Depending on the exact requirements, I think this would most likely amount to an effective ban on future open-sourcing of generalist AI models like Llama2 even when they are far behind the frontier. Three reasons that come to mind:

  1. The set of possible avenues for "novel harms" is enormous, especially if the evaluation involves "the ability to finetune [...], external tooling which can be built on top [...], and API calls to other [SOTA models]". I do not see any way to clearly establish "no novel harms" with such a boundless scope. Heck, I don't even expect proprietary, closed-source models to be found safe in this way.
  2. There are many, many actors in the open-source space, working on many, many AI models (even just fine-tunes of LLaMA/Llama2). That is kind of the point of open sourcing! It seems unlikely that outside evaluators would be able to evaluate all of these, or for all these actors to do high-quality evaluation themselves. In that case, this requirement turns into a ban on open-sourcing for all but the largest & best-resourced actors (like Meta).
  3. There aren't incentives for others to robustify existing systems or to certify "OK you're allowed to open-source now", in the way that there are for responsible disclosure. By default, I expect those steps to just not happen, & for that to chill open-sourcing.

cfoster01621

If we are assessing the impact of open-sourcing LLMs, it seems like the most relevant counterfactual is the "no open-source LLM" one, right?
