Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

[-]David_Kristoffersson6y110

This seems like a valuable research question to me. I have a project proposal in a drawer of mine that is strongly related: "Entanglement of AI capability with AI safety".

[-]abramdemski6yΩ7110

I am a bit surprised to see you begin this post by saying there seems to be a consensus that people shouldn't worry about capabilities consequences of their work, but then, I come from the miri-influenced crowd. I agree that it would be good to have a lot more clarity on how to think about this.

I agree it could be somewhat good for miri to have a hit ml publication, particularly if it was something unlikely to shift progress significantly. I could imagine a universe where this happened if miri happened upon a very interesting safety-advanced thing, the way adversarial counterexamples were this big new thing slightly outside the usual ml way of doing business (i.e., not achieving high scores on a task with some improved technique). But it seems fairly unlikely to be worth it to try to play the usual ml game at the level of top ml groups simply for the sake of prestige, because it is likely too hard to gain prestige that way with so many others trying. It seems better in spirit to gain credibility by doing what miri does best and getting recognition for what's good (of the open research). I suspect we have some deep disagreements about background models.

I think the best way to reach ml people in the long run is not through credibility, but through good arguments presented well. Let me clarify: credibility/prestige definitely play a huge role in what the bulk of people think. But the credibility system is good enough that the top credible people are really pretty smart, so to an extent can be swayed by good arguments presented well. This case can definitely be overstated and I feel like I'm presenting a picture which will rightly be criticised as over-optimistic. But I think there are some success stories, and it's the honest leverage path (in contrast to fighting for prestige in a system in which lots of people are similarly doing so).

Anyway, I've hardly said anything about your main point. I don't know how to think about it, and I wish I did. I usually try to think about differential progress and then fail, and fall back on an assessment of how surprised I'd be if something led to big AI progress, and am cautious if it seems within the realm of possibility.

[-]David Scott Krueger (formerly: capybaralet)6yΩ470

I do think this is an overly optimistic picture. The amount of traction an argument gets seems to be something like a product of how good the argument is, how credible those making the argument are, and how easy it is to process the argument.

Also, regarding this:

But the credibility system is good enough that the top credible people are really pretty smart, so to an extent can be swayed by good arguments presented well.

It's not just intelligence that determines if people will be swayed; I think other factors (like "rationality", "open-mindedness", and other personality factors) play a very big role.

[-]matthew.vandermerwe6y40

This (2015) post is an attempt to answer a similar question: FAI Research Constraints and AGI Side Effects

[-]Gordon Seidoh Worley6y30

One complicating factor is how much you believe ML contributes to existential threats. For example, I think the current ML community is very unlikely to ever produce AGI (<10%) and that AGI will be the result of breakthroughs from researchers in other parts of AI, thus it seems not very important to me what current ML researchers think of long-term safety concerns. Other analyses of the situation would result in concluding differently, though, so this seems like an upstream question that must be addressed or at least contingently decided upon before evaluating how much it would make sense to pursue this line of inquiry.

[-]John_Maxwell6y30

For example, I think the current ML community is very unlikely to ever produce AGI (<10%)

I'd be interested to hear why you think this.

BTW, I talked to one person with experience in GOFAI and got the impression it's essentially a grab bag of problem-specific approaches. Curious what "other parts of AI" you're optimistic about.

[-]Gordon Seidoh Worley6y60

I think ML methods are insufficient for producing AGI, and getting to AGI will require one or more changes in paradigm before we have a set of tools that will look like they can produce AGI. From what I can tell the ML community is not working on this, and instead prefers incremental enhancements to existing algorithms.

Basically what I view as needed to make AGI work might be summarized as needing to design dynamic feedback networks with memory that support online learning. What we mostly see out of ML these days are feedforward networks with offline learning that are static in execution and often manage to work without memory, though some do have this. My impression is that existing ML algorithms are unstable under these kinds of conditions. I expect something like neural networks will be part of making it to AGI, and so some current ML research will matter, but mostly we should think of current ML research as being about near-term, narrow applications rather than on the road to AGI.

That's at least my opinion based on my understanding of how consciousness works, my belief that "general" requires consciousness, and my understanding of the current state of ML and what it does and does not do that could support consciousness.

[-]David Scott Krueger (formerly: capybaralet)6y40

As someone with 5+ years of experience in the field, I think your impression of current ML is not very accurate. It's true that we haven't *solved* the problem of "online learning" (what you probably mean is something more like "continual learning" or "lifelong learning"), but a fair number of people are working on those problems (with a fairly incremental approach, granted). You can find several recent workshops on those topics, and work going back to the 90s at least.

It's also true that long-term planning, credit assignment, memory preservation, and other forms of "stability" appear to be central challenges to making this stuff work. On the other hand, we don't know that humans are stable in the limit, just for ~100yrs, so there very well may be no non-heuristic solution to these problems.

[-]David Scott Krueger (formerly: capybaralet)6y10

I don't follow. You seem to be responding to a statement that "I think the ML community's perceptions are important, because the ML community's attitude seems of critical importance for getting good Xrisk reduction policies in place", which I see as having little bearing on the question the post raises of "how should we assess info-hazard type risks of AI research we conduct?"

[-]Gordon Seidoh Worley6y20

Based on my reading of the post it seemed to me that you were concerned primarily with info-hazard risks in ML research, not AI research in general; maybe it's the way you framed it that I took it to be contingent on ML mattering.

[-]David Scott Krueger (formerly: capybaralet)6y20

I meant it to be about all AI research. I don't usually make too much effort to distinguish ML and AI, TBH.
