Ph.D. student studying computational data science. Interested in AI safety. Planning to learn, then eventually contribute.




Evolution failed at imparting its goal into humans, since humans have their own goals that they shoot for instead when given a chance.


To me, your framing of inner misalignment sounds like Goodharting itself: we evolved intrinsic motivations toward these measures because they were good measures in the ancestral environment. But once we got access to advanced technology, we kept optimizing the measure (sex, sugar, beauty, etc.), and it stopped being a measure of the actual target (kids, calories, health, etc.).
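A toy sketch of that Goodhart picture, in case it helps. Everything here is made up for illustration: the `Food` class, the food lists, and the numbers are assumptions, not data. The agent always picks by the proxy (sweetness), which tracks the target (nutrition) in the ancestral options and stops tracking it once candy is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Food:
    name: str
    sweetness: float  # the proxy the agent actually optimizes (hypothetical units)
    nutrition: float  # the target the proxy was "meant" to track (hypothetical units)


# Illustrative environments: in the ancestral one, sweeter means more nutritious.
ANCESTRAL = [
    Food("leaves", sweetness=1, nutrition=2),
    Food("tubers", sweetness=3, nutrition=5),
    Food("ripe fruit", sweetness=8, nutrition=9),
]
# The modern one adds an option that breaks the correlation.
MODERN = ANCESTRAL + [Food("candy", sweetness=10, nutrition=1)]


def pick_by_proxy(options):
    """Choose the option with the highest proxy score."""
    return max(options, key=lambda food: food.sweetness)


def pick_by_target(options):
    """Choose the option that is actually best for the target."""
    return max(options, key=lambda food: food.nutrition)


for env_name, env in [("ancestral", ANCESTRAL), ("modern", MODERN)]:
    chosen = pick_by_proxy(env)
    best = pick_by_target(env)
    print(f"{env_name}: proxy picks {chosen.name}, target-optimal is {best.name}")
```

The agent's decision rule never changes between the two environments; only the set of available options does, which is the point of the analogy.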

I think outer alignment is better thought of as a property of the objective function, i.e. "an objective function is outer aligned if it incentivizes or produces the behavior we actually want on the training distribution."

You should come to the Bangalore meet-up this Sunday, if you are near this part of India.

Answer by Aditya, Jul 11, 2022

I asked out my crushes. Worked out well for me.

I used to be really inhibited; now I have tried weed and alcohol, and I am really enjoying the moment.

Feels nice to see my name in a story. This fact about Romans is just so tasty.

It was hard to really imagine someone getting so emotionally caught up about a fact. I didn't expect to find it so hard.

Most fights are never about the underlying fact; they are tribal, about winning. If people cared about knowing the truth, these would be discussions, not debates.

This is totally possible and valid. I would love for this to be true. It's just that we can plan for the worst case scenario.

I think it can help to believe that things will turn out OK: we are training the AI on human data, so it might adopt some human values. Once you believe that, working on alignment can just be a matter of planning for the worst-case scenario.

Just in case. Seems like that would be better for mental health.

Oh OK, I had heard this theory from a friend. Looks like I was misinformed. Rather than evolution causing cancer, I think it is more accurate to say evolution doesn't care if older individuals die off.

evolutionary investments in tumor suppression may have waned in older age.

Moreover, some processes which are important for organismal fitness in youth may actually contribute to tissue decline and increased cancer in old age, a concept known as antagonistic pleiotropy

So thanks for clearing that up. I understand cancer better now.

When I talk to my friends, I start with the alignment problem. I found this analogy to human evolution really drives home the point that it’s a hard problem. We aren’t close to solving it.

At this point questions come up about how intelligence necessarily implies morality, so I talk about the orthogonality thesis. Then they ask why the AI would care about anything other than what it was explicitly told to do, and I explain that the danger comes from instrumental convergence.

Finally, people tend to say we can never do it; they talk about spirituality and the uniqueness of human intelligence. So I need to talk about evolution hill-climbing its way to animal intelligence, and about how narrow AI has small models while AGI just needs a generalised world model. Brains are just complex electrochemical systems. It's not magic.

Talk about Pathways, Imagen, and GPT-3 and what they can do; talk about how scaling seems to be working.

So it makes sense that we might have AGI in our lifetime, and that we have tons of money and brains working on building AI capabilities, with fewer on safety.

Try practising on other smart friends to develop your skill. You need to ensure people don't get bored, so you can't take too much time. Use nice analogies. Have answers to frequent questions ready.

I think this is how evolution selected for cancer: to ensure humans don't live too long, competing for resources with their descendants.

Internal time bombs are important to code in. But it's hard to integrate one into the AI in a way that the AI doesn't just remove it the first chance it gets. Humans don't like having to die, you know. An AGI would also not like having a suicide bomb tied onto it.

Coding this (as part of training) into an optimiser such that it adopts it as a mesa-objective is an unsolved problem.

Same, this post is what made me decide I can't leave it to the experts. It is just a matter of spending the required time to catch up on what we know and what has been tried. As Keltham said: diversity is in itself an asset. If we can get enough humans to think about this problem, we can get some breakthroughs from angles others have not thought of yet.


For me, it was not demotivating. He is not a god, and it ain't over until the fat lady sings. Things are serious and it just means we should all try our best. In fact, I am kinda happy to imagine we might see a utopia happen in my lifetime. Most humans don't get a chance to literally save the world. It would be really sad if I died a few years before some AGI turned into a superintelligence.

Eliezer's latest fanfic is pretty fun to read; if any of you guys are reading it, I would love to discuss it. 
