
I liked various parts of this post and agree that this is an under-discussed but important topic. I found it a little tricky to understand the information security section. Here are a few disagreements (or possibly just confusions).

> A single project might motivate more serious attacks, which are harder to defend against.
>
>   • It might also motivate earlier attacks, such that the single project would have less total time to get security measures into place.

In general, I think it's more natural to think about how expensive an attack will be and how harmful that attack would be if it were successful, rather than reasoning about when an attack will happen.

Here I am imagining that you think a single project could motivate earlier attacks because US adversaries would be more concerned overall about the US's AI ambitions, or because AI progress would be faster and so stealing a model would be more useful. It's worth noting that stealing AI models while progress is mostly due to scaling, and while models are neither directly dangerous nor automating AI R&D, doesn't seem particularly harmful (in that it's unlikely to directly cause a GCR or significantly speed up the stealer's AGI project). So overall, I'm not sure whether you think the security situation is better or worse in the case of earlier attacks.



> A single project could have *more* attack surface, if it's sufficiently big. Some attack surface scales with the number of projects (like the number of security systems), but other kinds of attack surface scale with total size (like the number of people or buildings). If a single project were sufficiently bigger than the sum of the counterfactual multiple projects, it could have more attack surface and so be less infosecure.

I don't really understand your model here. I think naively you should be comparing a single central US project to multiple AI lab projects. My current impression is that, for a fixed amount of total AI lab resources, the attack surface would likely decrease (e.g. you only need to verify that one set of libraries is secure, rather than three somewhat different sets of libraries). If you are comparing just one frontier lab to a large single project, then I agree the attack surface could be larger, but that seems like the wrong comparison.

I don't understand the logic of step 2 of the following argument.

>   • If it's harder to steal the weights, fewer actors will be able to do so.
>   • China is one of the most resourced and competent actors, and would have even stronger incentives to steal the weights than other actors (because of race dynamics).
>   • So it's more likely that centralising reduces proliferation risk, and less likely that it reduces the chance of China stealing the weights.

I think that China has stronger incentives than many other nations to steal the model (because it is politically and financially cheaper for them), but making it harder to steal the weights still makes doing so more costly for China, and therefore they are less incentivised to do it. You seem to be saying that it makes them more incentivised to steal the weights, but I don't quite follow why.

Most LWers should rely less on norms of their own (or the LW community's) design, and instead defer to regular societal norms more.


@peterbarnett and I quickly looked at summaries for ~20 papers citing Llama 2, and we thought ~8 were neither advantaged nor disadvantaged for capabilities over safety, ~7 were better for safety than capabilities, and ~5 were better for capabilities than safety. For me, this was a small update towards the effects of Llama 2 so far having been positive.

That's helpful feedback; if others would find donating through every.org useful (which they can signal by agree-voting the parent comment), I'd be happy to look into this.

I think we can be very flexible for donations over $30k, so if you're interested in making a donation of that size, feel free to DM me and I'm sure we can figure something out.

On my computer, Ctrl-f finds ~10 cases of Holtz appearing in the main text, e.g. point 4 of the introduction.


> ... This included a few times when Yudkowsky’s response was not fully convincing and there was room for Holtz to go deeper, and I wish he would have in those cases. ...

We are hoping to release a report in the next few weeks giving a rundown of our grantmaking over the last year, with some explanations of why we made the grants and some high-level reflections on the fund.

Some things that might be useful:
* Fund page where we give more context on the goals of the fund: https://funds.effectivealtruism.org/funds/far-future
* Our old payout reports: https://funds.effectivealtruism.org/funds/far-future#payout-reports
* Our public grants database: https://funds.effectivealtruism.org/grants?fund=Long-Term%2520Future%2520Fund&sort=round

(Speaking just for the Long-Term Future Fund)

It’s true that the Long-Term Future Fund could use funding right now. We’re working on a bunch of posts, including an explanation of our funding needs, track record over the last year, and some reflections that I hope will be out pretty soon.

I’d probably wait for us to post those if you’re a prospective LTFF donor, as they also have a bunch of relevant updates about the fund.

There are lots of possible goals. Some people are good at achieving some goals. Performance on most goals that are interesting to me depends on the decision-making ability of the player (e.g. winning at poker, as opposed to being tall).

There is some common thread between being an excellent poker player, a supportive friend, and a fantastic cook. Even if the inner decision-making workings look very different in each case, I think that some people have a mindset that lets them find the appropriate decision-making machinery for each task.

To use a metaphor: whilst some people who can play the piano beautifully would not have become beautiful violin players if they had chosen the violin instead of the piano, most people who play both the piano and the violin beautifully are just good at practising. I think that most instrumentalists could have become good at most other instruments because they are good at practising (although of course, some people do find success in other ways).

Practising is to learning instruments as applied rationality is to achieving goals.

I often see people say things like "it is cheaper to follow a vegan diet than an omnivorous one".

I think that this is trivially false (though probably not very interesting): the set of meals available on an omnivorous diet includes the set of vegan meals, and even if vegan meals are often cheaper than non-vegan ones, in my personal experience I am regularly in situations where it would be cheaper to eat a meal that contains meat or dairy (e.g. at restaurants where most meals are not vegan, or when looking around the reduced section of the supermarket).

The common response I get to this is "well, if you are optimising for the cheapest possible meal (and not just the cheapest meal at, say, a restaurant), this will probably be something like rice and beans, which is vegan". I somewhat agree here, but I think it is more useful to ask: for a given level of satisfaction, how expensive is the cheapest possible meal, and is it vegan? I think that once we move to things a little more expensive than rice and beans, it becomes much less clear whether vegan diets are usually cheaper.

Also, if vegan diets were cheaper for similar levels of satisfaction, I'd expect vegan food to be much more popular amongst people who are not sympathetic to animal ethics or environmental arguments, just because I expect consumer preferences to be pretty sensitive to differences in cost between goods of similar utility.
