Leon Lang

I'm a PhD student at the University of Amsterdam. I have research experience in multivariate information theory and equivariant deep learning and recently became very interested in AI alignment. https://langleon.github.io/

Comments

https://twitter.com/ai_risks/status/1664323278796898306?s=46&t=umU0Z29c0UEkNxkJx-0kaQ

Apparently Bill Gates signed.

Stating the obvious: Do we expect that Bill Gates will donate money to prevent extinction from AI?

It's great to see Yoshua Bengio and other eminent AI scientists like Geoffrey Hinton actively engage in the discussion around AI alignment. He evidently put a lot of thought into this. There is a lot I agree with here.

Below, I'll discuss two points of disagreement or where I'm surprised by his takes, to highlight potential topics of discussion, e.g. if someone wants to engage directly with Bengio.

  • Most of the post is focused on the outer alignment problem -- how do we specify a goal aligned with our intent -- and seems to ignore the inner alignment problem -- how do we ensure that the specified goal is optimized for.
    • E.g., he gives an example of us telling the AI to fix climate change, after which the AI wipes out humanity since that fixes climate change more effectively than respecting our implicit constraints, of which the AI has no knowledge. In fact, I think language models show that there is considerable hope that AI models will understand our implicit intent. Under that view, the problem lies at least as much in ensuring that the AI cares.
    • He also extensively discusses the wireheading problem of entities (e.g., humans, corporations, or AI systems) that try to maximize their reward signal. I think we have reasons to believe that wireheading isn't as much of a concern: inner misalignment will cause the agent to have some other goal than the precise maximization of the reward function, and once the agent is situationally aware, it has incentives to keep its goals from changing by gradient descent. 
    • He does discuss the fact that our brains reward us for pleasure and avoiding pain, which is misaligned with the evolutionary goal of genetic fitness. In the alignment community, this is most often discussed as an inner alignment issue between the "reward function" of evolution and the "trained agent" being our genomes. However, his discussion suggests that he views it as an outer alignment issue between evolution and our reward signals in the brain, which shape our adult brains through in-lifetime learning. This is also the viewpoint in Brain-Like-AGI Safety, as far as I remember, and also seems related to viewpoints discussed in shard theory.
  • "In fact, over two decades of work in AI safety suggests that it is difficult to obtain AI alignment [wikipedia], so not obtaining it is clearly possible."
    • I agree with the conclusion, but I am surprised by the argument. It is true that we have seen over two decades of alignment research, but the alignment community has been fairly small all this time. I'm wondering what a much larger community could have done. 

Yoshua Bengio was on David Krueger's PhD thesis committee, according to David's CV.

After filling out the form, I could click on "see previous responses", which allowed me to see the responses of all other people who have filled out the form so far.

That is probably not intended?

I disagree with this. I think the most useful definition of alignment is intent alignment. Humans are effectively intent-aligned on the goal of not killing all of humanity. They may still kill all of humanity, but that is not an alignment problem but a capabilities problem: humans aren't capable of knowing which AI designs will be safe.

The same holds for intent-aligned AI systems that create unaligned successors. 

Has this already been posted? I could not find the post. 

For what it's worth, I think this comment seems clearly right to me, even if one thinks the post actually shows misalignment. I'm confused about the downvotes of this (5 net downvotes and 12 net disagree votes as of writing this). 

Now to answer our big question from the previous section: I can find some  satisfying the conditions exactly when all of the ’s are independent given the “perfectly redundant” information. In that case, I just set  to be exactly the quantities conserved under the resampling process, i.e. the perfectly redundant information itself.

 

In the original post on redundant information, I didn't find a definition for the "quantities conserved under the resampling process". You name this F(X) in that post.

Just to be sure: is your claim that if F(X) exists that contains exactly the conserved quantities and nothing else, then you can define  like this? Or is the claim even stronger and you think such  can always be constructed?

Edit: Flagging that I now think this comment is confused. One can simply define  as the conditional, which is a composition of the random variable  and the function 

When I converse with junior folks about what qualities they’re missing, they often focus on things like “not being smart enough” or “not being a genius” or “not having a PhD.” It’s interesting to notice differences between what junior folks think they’re missing & what mentors think they’re missing.

 

There may also be social reasons to give different answers depending on whether you are a mentor or mentee. I.e., answering "the better mentees were those who were smarter" seems like an uncomfortable thing to say, even if it's true. 

(I do not want to say that this social explanation is the only reason that answers between mentors and mentees differed. But I do think that one should take it into account in one's models.)
