Priyanka Bharadwaj

Comments (sorted by newest)
Why Eliminating Deception Won’t Align AI
Priyanka Bharadwaj · 2mo

Thanks for reading! I'm especially interested in feedback from folks working on mechanistic interpretability or deception threat models. Does this framing feel complementary, orthogonal, or maybe just irrelevant to your current assumptions? Happy to be redirected if there are blind spots I’m missing.

AI Induced Psychosis: A shallow investigation
Priyanka Bharadwaj · 15d

Aw, yeah, it is easier to just look stuff up online and debate with LLMs, isn't it?

I am not a therapist, but I have been to therapists in multiple countries (US, UK and India) for several years, and I can share my understanding based on that experience. 

I think human therapist accountability has multiple layers. First, practising requires a professional license, which means years of training, ongoing supervision, and the possibility of revocation. Then there are legal obligations around documentation and crisis protocols. If these fail (and they sometimes do), there is still malpractice liability and free-market feedback. Even if only 1 in 100 bad therapists faces consequences, that creates a deterrent effect across the profession. The system is imperfect, but it exists.

For AI systems, training, certification, supervision, documentation and crisis protocols are all doable, and probably far easier to scale. But at the end of the day, who is accountable for poor therapeutic advice? The model? The company building it? With normal adults, it's easy to ask for user discretion, but what do you do with vulnerable users? I am not sure how that would even work.

AI Induced Psychosis: A shallow investigation
Priyanka Bharadwaj · 15d

Thank you for this very detailed study. 

I am most concerned about the accountability gap. Several students in my undergraduate class use these models as "someone to talk to" to deal with loneliness. While your study shows that some models handle vulnerable conversations better than others, I think the fundamental issue is that AI lacks the accountability infrastructure that real therapeutic relationships require, including continuity of care and a long-term mindset, professional oversight, integration with mental health systems, and liability and negligence frameworks.

Until then, I don't care how good the model is at handling vulnerable conversations: I'd rather have it triage users by saying "Here are resources for professional support" and bow out than have it attempt ongoing therapeutic relationships. Even a perfectly trained therapeutic AI seems problematic without the broader accountability structures that protect vulnerable users.
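
To be concrete about what I mean by "triage and bow out", here is a toy Python sketch; the risk signals, referral text, and function names are all made up for illustration, not a proposal for a real detection system:

```python
# Toy sketch of a "triage and bow out" guardrail, not a real safety design.
# The risk signals, referral text, and fallback reply are placeholders.

RISK_SIGNALS = ("hopeless", "can't go on", "hurt myself", "no one would miss me")

REFERRAL = (
    "This sounds serious, and I'm not the right kind of support for it. "
    "Here are resources for professional support: [local crisis lines, "
    "licensed-therapist directories]. I'll step back now."
)

def respond(user_message: str) -> str:
    """Route vulnerable users to professional resources instead of engaging."""
    if any(signal in user_message.lower() for signal in RISK_SIGNALS):
        return REFERRAL  # triage: point to humans, then bow out
    return "..."  # stand-in for the model's ordinary reply path

print(respond("I feel hopeless lately"))  # -> the referral message
```

The point of the sketch is only that the bow-out is unconditional once a risk signal fires; the model doesn't get to decide it can handle the conversation itself.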

More fundamentally, what are the underlying mechanisms that cause these model behaviours, and can training fixes address them without the accountability infrastructure?

Dating Roundup #7: Back to Basics
Priyanka Bharadwaj · 20d

Are relationship coaches (as opposed to PUAs) not a thing in the US?

Should we align AI with maternal instinct?
Priyanka Bharadwaj · 20d

Wait… isn’t this already filial piety? We created AI, and now we want it to mother us.

Should we align AI with maternal instinct?
Priyanka Bharadwaj · 20d

I don’t mean this as a technical solution, more as a direction to start thinking in.

Imagine a human tells an AI, “I value honesty above convenience.” A relational AI could store this as a core value, consult it when short-term preferences tempt it to mislead, and, if it fails, detect, acknowledge, and repair the violation in a verifiable way. Over time it updates its prioritisation rules and adapts to clarified guidance, preserving trust and alignment, unlike an FAI that maximises a static utility function.

This approach is dynamic, process-oriented, and repairable, ensuring commitments endure even under mistakes or evolving contexts. It’s a sketch, not a finished design, and would need iterative development and formalization.
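
To make the store/consult/repair loop concrete, here is a minimal Python sketch; every class, method, and number in it is hypothetical, invented for illustration rather than taken from any existing system:

```python
# Toy sketch of the store / consult / repair loop described above.
# All names, and the honesty-vs-convenience numbers, are illustrative only.
from dataclasses import dataclass, field

@dataclass
class RelationalAgent:
    core_values: dict[str, float] = field(default_factory=dict)  # value -> priority
    repair_log: list[str] = field(default_factory=list)

    def adopt(self, value: str, priority: float) -> None:
        """Store a stated core value, e.g. 'I value honesty above convenience'."""
        self.core_values[value] = priority

    def permits(self, compromised_value: str, short_term_gain: float) -> bool:
        """Consult stored values: a short-term gain must outrank the value
        it would compromise before the action is allowed."""
        return short_term_gain > self.core_values.get(compromised_value, 0.0)

    def repair(self, violated_value: str) -> str:
        """Acknowledge a violation, log it verifiably, and update the
        prioritisation rule so the same trade-off is harder to repeat."""
        self.repair_log.append(violated_value)
        self.core_values[violated_value] = self.core_values.get(violated_value, 0.0) + 1.0
        return f"I compromised '{violated_value}'; raising its priority."

agent = RelationalAgent()
agent.adopt("honesty", 1.0)               # "I value honesty above convenience"
assert not agent.permits("honesty", 0.3)  # a convenient shortcut doesn't outrank it
```

The repair step is the part that distinguishes this from a static utility maximiser: the priorities are mutable state that the relationship itself updates.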

While simple, does this broadly capture the kind of thing you were asking about? I’d be happy to chat further sometime if you’re interested. 

Should we align AI with maternal instinct?
Priyanka Bharadwaj · 20d

I’m reminded of a Sanskrit verse, “Vidya dadati vinayam, vinayad yati patratam”: knowledge bestows humility, and humility begets worthiness. Applied to AI, intelligence alone doesn’t ensure alignment, just as humans aren’t automatically prosocial. What matters are the high-level principles we embed to guide behaviour toward repairable, cooperative, and trustworthy interactions, which we do see in long-term relationships built on shared values.

The architecture-level challenge of making AI reliably follow such principles is hard, yes, especially under extreme power asymmetry, but agreeing on relational alignment is a necessary first step. Master/servant models may seem safe, but I believe carefully engineered relational principles offer a more robust and sustainable path.

Should we align AI with maternal instinct?
Priyanka Bharadwaj · 20d

I completely agree that AI isn’t human, mammal, or biological, and that any relational qualities it exhibits will only exist because we engineer them. I’m not suggesting we model AI on any specific relationship, like mother-child, or try to mimic familiar social roles. Rather, alignment should be based on abstract relational principles that matter for any human interaction without hierarchy or coercion.

I also hear the frequent concern about technical feasibility, and I take it seriously. I see it as an opportunity rather than a reason to avoid this work. I’d love the chance to brainstorm and refine these ideas, to explore how we might engineer architectures that are simple yet robust, capable of sustaining trust, repair, and cooperation without introducing subjugation or dependency.

Ultimately, relational design matters because humans inevitably interact through familiar social frameworks such as trust and repair. If we ignore that, alignment risks producing systems that are powerful but alien in ways that matter for human flourishing.

AI Companion Conditions
Priyanka Bharadwaj · 1mo

Thanks for the detailed analysis, Zvi. 

But it’s sad, isn't it? Despite developing technologies of such incredible scale, we are using them to amplify our lack of incentive to build a better future, rather than actually transforming it.

evhub's Shortform
Priyanka Bharadwaj · 1mo

I am reminded of Thomas Wolf's essay on the 'Einstein model' in AI, where he argues that breakthrough discoveries often come from people who think outside established frameworks, yet our field tends to filter for conventional technical skills.

I'm new to AI safety, but I've noticed this tension in fellowship applications too. The MATS program (at least Neel's track) allows non-programmers like me to contribute using existing skillsets, focusing on thoughtful research questions rather than just coding ability. In contrast, Anthropic Fellows requires strong programming skills and technical backgrounds upfront. 

While I may not make it to later stages, the inclusive MATS process has been immensely fulfilling and lets me explore meaningful questions. If we believe paradigm shifts come from diverse perspectives, maybe Anthropic should consider similar flexibility in future editions, unless I've misunderstood the program's purpose.

Posts

Should we align AI with maternal instinct? · 33 points · 20d · 15 comments
Priyanka Bharadwaj's Shortform · 2 points · 1mo · 1 comment
Love, Lies and Misalignment · 6 points · 2mo · 1 comment
Why Eliminating Deception Won’t Align AI · 19 points · 2mo · 6 comments
Relational Design Can’t Be Left to Chance · 5 points · 3mo · 0 comments
Coaching AI: A Relational Approach to AI Safety · 11 points · 3mo · 0 comments
Research Without Permission · 28 points · 3mo · 1 comment
Cognitive Exhaustion and Engineered Trust: Lessons from My Gym · 14 points · 4mo · 3 comments
Can you care without feeling? · 13 points · 4mo · 2 comments
Relational Alignment: Trust, Repair, and the Emotional Work of AI · 3 points · 4mo · 0 comments