Ph.D. student in a computational data science department. Building the AI safety field in India with a Community Building Grant (CBG) from CEA.

I am interested in Agent Foundations, and I am doing a SERI MATS research sprint remotely with John this summer.

I am participating in the PIBBSS reading group on Key Phenomena in AI Risk.

I am attending the in-person SLT workshop at the end of this month.



I highly recommend watching Connor talk about his interpretation of this post.

He describes Eliezer as someone who managed to access many anti-memes that slid right off the rest of our heads.

What is an anti-meme, you might ask?

Anti-meme

By their very nature, anti-memes resist being known or integrated into your world model. You struggle to remember them. Just as memes are sticky and go viral, anti-memes are slippery and struggle to gain traction.

They could be extraordinarily boring. They could be facts about yourself that your ego protects you from really grasping, or facts about human nature that cause cognitive dissonance because they contradict beliefs you hold dear.

Insights that are anti-memes are hard to communicate. You need to use fables and narratives: embody the idea in a story, use metaphors within it, and convey the vibe.

Jokes and humor are a great way to communicate anti-memes, especially ones that are socially awkward and outside people's Overton window.


Connor then gives this post, Death with Dignity, as an example of an anti-meme. Most people reading the post seem to completely miss the actual point, even though it was clearly spelled out.


maybe it's somewhat easier on account of how we have more introspective access to a working mind than we have to the low-level physical fields;

Our access is biased, which makes things trickier: we weren't selected for introspection that is high-fidelity and corresponds to reality, but for introspection that was useful for survival.


It doesn't have to be the result of explicit metaphysical beliefs; it could be the result of vague guesswork and analogical thinking.


Yeah, I could be wrong, but my claim is that implicit metaphysical beliefs play a big role here.


defining "agentic" as "possessing spooky metaphysical free will" rather than "not passive". It's perfectly possibly to build an agent-in-the-sense-of-active out of mechanical parts.


I was just noting that people who are aware of the internal workings of AI will face acute cognitive dissonance if they admit it can have "spooky" agency. They can't compartmentalize it the way others can.


"topics about which philosophy is still concerned because we don't or can't get information that would enable us to have sufficient certainty of answers to allow those topics to transition into science".


I think that is quite close. I mean the implicit, unquestioned assumptions behind all these discussions: moral realism, computationalism, empiricism, and reductionism all come to mind. These cannot be tested or falsified with the scientific method.

but there's not really anything here that seems like an argument that would convince anyone who didn't already agree


I thought it best to try, even if I am not confident it will have any impact on the people reading it. As you rightly said, my attempt is to get AI safety researchers to take philosophy more seriously. Most people see it as a pastime they enjoy for its intrinsic pleasure. In my opinion, there is a lot of value in practicing going more meta until we can see the underpinnings of both the problem of x-risk and its solution.

Some of that value comes from being able to communicate the problem to a more diverse audience at higher fidelity. The rest comes from empowering existing researchers to possibly make a breakthrough in alignment itself.

Objects like values and goals are deeply entrenched in our ontology. I would like to see people question them and consider other possibilities.

This exchange between Connor and Joscha seems to be an example: Connor is clearly irritated by the question because it uses philosophy to ask whether we should even bother saving humanity, and whether humans are bad by our own standards. I completely understand how he feels. But notice how seriously Joscha takes the philosophy of what values we have and how they are justified.

In this community, it seems to be taken as fact that the direction we align AI towards can be considered after we figure out how to set its direction in any way whatsoever. We have decoupled these two problems. I would like to question this assumption, and since I am not smart enough to do it alone, I hope others will try too. This requires us to unsee the boundaries we are so used to and to be very careful about which ones we draw.


In particular, they might unlearn it in narrow contexts related to their immediate work, but then get confused and fail to unlearn it in general, resulting in them getting confused about things like agency and free will.


Yeah, I was hoping to draw attention to this problem with my post. I love the Embedded Agency comic series. The Cartesian boundary is one such boundary that most of us hold, but if we want to think about alignment honestly, I think it is worthwhile to train ourselves to unsee that too.


I will check out your book. I also hope to write something that helps people fully grok monism and other philosophical ideas they might want to consider.


Aren't non-academics and non-experts the majority,


I was talking about people who have not grokked materialism, who are the majority. People who are not aware of the technical details model AI as a black box and therefore seem more open to the idea that it might be agentic, but that is just deferring to an outside view that sounds convincing rather than building their own model.


so maybe people there, who are working on AI and machine learning, more often have a religious or spiritual concept of human nature, compared to their counterparts in the secularized West?


Most of the people I talked to were from India, and it is possible there is a pattern there. But I see similar arguments come up among people in the West too. When people say "it is just statistics", they seem to be pointing to the idea that deterministic processes can never be agentic.


I am not necessarily trying to bring consciousness into the discussion, but I think there is value in helping people make their existing philosophical beliefs more explicit so that they can follow them to their natural conclusions.


Thanks for the constructive criticism. I thought about it, and I think I need to make what I wrote more legible.


I will add a TL;DR and update the post soon.


Some things don't make sense unless you really experience them. Personally, I have no words for the warping effects such emotions have on you. It's comparable to having kids or getting a brain injury.

It's a socially acceptable mental disorder.

The only thing you can do is notice when you are in that state and put very low credence on all the positive opinions you have about your limerent object. In that state, you cannot know anything about them with high confidence. Give it a few years.

Don't make decisions you can't undo, or entangle parts of your life that will be painful to detach later.

But it's a ride worth going on. There's no point in living life too safely. Have fun, but stay safe out there.


Evolution failed at imparting its goal into humans, since humans have their own goals that they shoot for instead when given a chance.


To me, your framing of inner misalignment sounds like Goodharting itself: we evolved intrinsic motivations towards these measures because they were good measures in the ancestral environment. But once we got access to advanced technology, we kept optimizing the measures (sex, sugar, beauty, etc.), and they stopped tracking the actual targets (kids, calories, health, etc.).

I think outer alignment is better thought of as a property of the objective function, i.e., "an objective function is outer aligned if it incentivizes or produces the behavior we actually want on the training distribution."


You should come to the Bangalore meetup this Sunday if you are near this part of India.

Answer by Aditya

I asked out my crushes. It worked out well for me.

I used to be really inhibited; now I have tried weed and alcohol, and I am really enjoying the moment.
