agulaya24's Shortform

agulaya24


I recently learned about cognizable harm, "the threshold requirement to being a viable lawsuit or claim."

I think the idea of an agent acting on your behalf tends to be under-defined, primarily because there is no formal framework for what that actually means. So what would cognizable harm mean for an agent acting on someone's behalf?

A simple example: an AI agent commissions a piece from an artist on your behalf and pays a deposit. The commission is completed, but the agent's user says they never wanted the piece and refuses to pay.

What happens? Was there a contract? If the agent signed the contract on the user's behalf, is the user directly responsible because it is their agent? What if the agent went rogue? There are related legal precedents, but none that specifically address AI agents acting on someone's behalf.

Through what turned out to be one of the most rigorous processes of research, review, and writing I have ever undertaken, I have gained new perspectives and a deeper understanding of what I want to do with my life. It is tempting to think this feeling of clarity is temporary and fleeting. Sometimes, having a dream means trying to prove it wrong for your own sanity, to verify that what you are working on still aligns. How many trials and tribulations must it go through before it has earned its place as part of your being?

----

I am interested in founding a multidisciplinary research organization focused on human-AI alignment, specifically on how we ensure AI has an accurate understanding of humans, both as a collective and as individuals. AI is quickly moving from a tool someone uses to an agent that acts on an individual's behalf.

This transition is happening in pockets today, but over the next five years, as adoption spreads, I believe verifying that AI agents act in their users' best interests will become the primary focus. Today, evaluating this behavioral alignment is opaque and happens privately inside the leading frontier labs. There is no transparent or verifiable way to access an AI's current understanding of an individual user. Convention suggests that only frontier providers would have access to that kind of data; a human-aligned perspective suggests that users should own, and have transparency into, an AI's understanding or interpretation of them. The goal is a user-verified AI representation that is owned by the individual, permitted to act on their behalf, and verifiable by all external parties.

The cheapest possible test would need to prove the organization's core thesis: that an interpretive layer built from a living person's (or people's) data can better align an AI's behavior with that user or users. A second question follows: if that layer does improve alignment, can it also be used against the interests of the user or users it represents? A human-backed extension of the prototype benchmark from Beyond Recall (arXiv 2605.28969) could be used to measure an AI's behavioral alignment with its user.

A human-backed study would validate the organization's core principles: that AI alignment can be measured and that proof of that alignment should be verified and owned by the individual.

While filling out a MATS fellowship application, I came across a very interesting rank-order question. My ranking and my answer to the follow-up question are below.

>>>>>>>>>>>>
"Imagine each of the options below are fully achievable. Which would be the most beneficial for the long-term flourishing of humanity? Rank from top (most beneficial) to bottom (least)."

1 Building oversight mechanisms that detect when AI systems pursue unintended goals

2 Detecting and labeling AI-generated misinformation and synthetic media

3 Accelerating AI applications in scientific research

4 Building defenses against AI-enabled biological threats

5 Ensuring fair compensation for workers displaced by AI automation

6 Protecting creators' work from unauthorized use in AI training data

7 Minimizing unrestricted access to state-of-the-art AI chips

8 Reducing the environmental impact of AI data centers

9 Reducing the computational cost of training frontier AI models

10 Building AI-powered robotic systems that can learn from physical interaction with their environment

"Explain the framework you used to arrive at your top three ranked outcomes."

My north star is ensuring that AI acts in alignment with humanity and, by extension, with its individual users. AI clearly has the ability to completely transform how we operate as a society. That transformation only makes sense if we can verify that AI is acting in our best interests, both collectively and individually. Naturally, then, before AI is integrated into humanity's operating fabric, its behavior must be measurable, traceable, and transparent.

Measuring alignment is currently very opaque, and I think it operates at multiple levels, for example frontier safety controls versus individual user alignment. These are orthogonal, and both are required to ensure an AI is always working in the best interests of society and of the individual. My framework therefore prioritizes options that measure, track, or require true AI alignment, with a secondary focus on the human condition, a tertiary focus on bounding constraints, and a final category for AI "sentience", which isn't necessarily something humanity needs per se.

I ranked oversight mechanisms first because that option assumes we know how to measure an AI's pursuit of unintended goals and can therefore implement a contingency plan or course-correction mechanism. Oversight implies we understand the technology well enough to enforce protections. The second option folds into the first: unintended goals could be driven by AI-generated misinformation, or misinformation could result from an unintended goal. It ranks second because of that close relationship, and not first because its implications are narrower than those of oversight mechanisms. Third, if we have oversight mechanisms that can detect and label misinformation, we can create an AI flywheel that accelerates AI applications across all of scientific research. If we achieve this, every lower-ranked option could be evaluated with scientific rigor. Scientific research touches every part of humanity, but if AI is the one accelerating it, we need to be able to measure alignment and enforce alignment mechanisms.

AI without alignment may or may not act in our best interests; as humans, we should ensure it does.

>>>>>>>>>>>

While I was reviewing my response with an LLM to better understand my own thinking, an interesting question came up: where does the individual fit, given the multiple levels I described and the orthogonal relationship I claimed between frontier-lab alignment research and individual alignment?

I have been reflecting more and more on this idea of per-user alignment calibration for AI. Personalization takes on a whole new meaning when AI is truly integrated into a person's workflow. Ironically, I tend to think of this as a nesting of Markov blankets: if we can ensure AI alignment with humanity, the next step is to ensure alignment with different groups of people, and eventually you arrive at AI alignment with the individual.

That leads to the thread that prompted this quick take: what are the operating tensions between the collective and the individual, and how do we give an AI acting on an individual's behalf some kind of ordering over those constraints? I think this raises the question of what an agentic world looks like in practice, and how humanity, as a collective and as stratified groups, agrees on how it operates.
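One naive way to make "ordering constraints" concrete is to treat the nested levels as a strict precedence: a proposed action is checked against humanity-level constraints first, then group-level constraints, then the individual's own preferences, and the first level it violates is reported. This is a hypothetical Python sketch, not a method from the research described here. The `Level` class, the `first_violation` function, and every rule in the example are assumptions, loosely reusing the art-commission scenario from earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical representation: a proposed agent action is a plain dict of
# attributes, and each constraint is a predicate that returns True when the
# action is acceptable at that level.
Action = dict
Constraint = Callable[[Action], bool]


@dataclass
class Level:
    """One nested layer of alignment (e.g. humanity, a group, an individual)."""
    name: str
    constraints: list = field(default_factory=list)


def first_violation(action: Action, levels: list) -> Optional[str]:
    """Check levels from outermost to innermost.

    Returns the name of the first level whose constraints the action breaks,
    or None if the action satisfies every level. Because outer levels are
    checked first, a collective constraint always takes precedence over an
    individual preference.
    """
    for level in levels:
        if not all(check(action) for check in level.constraints):
            return level.name
    return None


# Illustrative levels; every rule here is an assumption for the example.
levels = [
    Level("humanity", [lambda a: not a.get("deceives_counterparty", False)]),
    Level("group", [lambda a: a.get("follows_platform_terms", True)]),
    Level("individual", [lambda a: a.get("deposit", 0) <= a.get("user_budget", 0)]),
]

commission = {"deposit": 50, "user_budget": 100, "deceives_counterparty": False}
print(first_violation(commission, levels))  # None: passes every level

over_budget = {"deposit": 500, "user_budget": 100}
print(first_violation(over_budget, levels))  # individual
```

A strict precedence like this is the simplest possible ordering, and it sidesteps the harder part of the question: what should happen when an individual preference is legitimate but conflicts with a group norm, or when groups disagree with each other.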