David Scott Krueger (formerly: capybaralet)

I'm more active on Twitter than LW/AF these days: https://twitter.com/DavidSKrueger

Bio from https://www.davidscottkrueger.com/:
I am an Assistant Professor at the University of Cambridge and a member of Cambridge's Computational and Biological Learning lab (CBL). My research group focuses on Deep Learning, AI Alignment, and AI safety. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

  • Reward modeling and reward gaming
  • Aligning foundation models
  • Understanding learning and generalization in deep learning and foundation models, especially via “empirical theory” approaches
  • Preventing the development and deployment of socially harmful AI systems
  • Elaborating and evaluating speculative concerns about more advanced future AI systems

Wiki Contributions

Comments

Organizations that are looking for ML talent (e.g. to mentor more junior people, or get feedback on policy) should offer PhD students high-paying contractor/part-time work.

ML PhD students working on safety-relevant projects should be able to augment their meager stipends this way.

That is in addition to all the people who will give their AutoGPT an instruction that means well but actually translates to killing all the humans, or at least taking control over the future, since that is so obviously the easiest way to accomplish the thing, such as ‘bring about world peace and end world hunger’ (link goes to Sully hyping AutoGPT, saying ‘you give it a goal like end world hunger’) or ‘stop climate change’ or ‘deliver my coffee every morning at 8am sharp no matter what as reliably as possible.’ Or literally almost anything else.


I think these mostly only translate into dangerous behavior if the model badly "misunderstands" the instruction, which seems somewhat implausible.  

One must notice that in order to predict the next token as well as possible the LLM will benefit from being able to simulate every situation, every person, and every causal element behind the creation of every bit of text in its training distribution, no matter what we then train the LLM to output to us (what mask we put on it) afterwards.


Is there any rigorous justification for this claim?  As far as I can tell, this is folk wisdom from the scaling/AI safety community, and I think it's far from obvious that it's correct, or what assumptions are required for it to hold.  

It seems much more plausible in the infinite limit than in practice.

I have gained confidence in my position that all of this happening now is a good thing, both from the perspective of smaller risks like malware attacks, and from the perspective of potential existential threats. Seems worth going over the logic.

What we want to do is avoid what one might call an agent overhang.

One might hope to execute our Plan A of having our AIs not be agents. Alas, even if technically feasible (which is not at all clear) that only can work if we don’t intentionally turn them into agents via wrapping code around them. We’ve checked with actual humans about the possibility of kindly not doing that. Didn’t go great.


This seems like really bad reasoning... 

It seems like the evidence that people won't "kindly not [do] that" is... AutoGPT.
So if AutoGPT didn't exist, you might be able to say: "we asked people to not turn AI systems into agents, and they didn't.  Hooray for plan A!"

Also: I don't think it's fair to say "we've checked [...] about the possibility".  The AI safety community thought it was sketch for a long time, and has provided some lackluster pushback.  Governance folks from the community don't seem to be calling for a rollback of the plugins, or bans on this kind of behavior, etc.
 

Christiano and Yudkowsky both agree AI is an x-risk, so a prediction that would distinguish their models does not do much to help us resolve whether or not AI is an x-risk.

I'm not necessarily saying people are subconsciously trying to create a moat.  

I'm saying they are acting in a way that creates a moat, and that enables them to avoid competition, and that more competition would create more motivation for them to write things up for academic audiences (or even just write more clearly for non-academic audiences).

My point is not specific to machine learning. I'm not as familiar with other academic communities, but I think most of the time it would probably be worth engaging with them if there is a community where your work could fit.

My point (see footnote) is that motivations are complex.  I do not believe "the real motivations" is a very useful concept here.  

The question then becomes: why don't they "judge those costs to be worth it"?  Is there motivated reasoning involved?  Almost certainly yes; there always is.
