This is not a condensed post with only my best final ideas[1]; this post is me writing across multiple days[2] as I try to work through a problem. Enjoy.
I did something recently that I regret. I did something that I suspect hurt someone[3]. If I had asked myself in the moment whether the action I was about to take would hurt this person I would’ve been at least 30% certain that it would - but I didn’t consider it. If I had thought about it I would’ve realized that I don’t think being truthful to someone I don’t know that well is worth those odds of inflicting[4] pain, and yes sure it’s my belief that the truth I shared is “long term helpful” for them to know, but I also believe that people should have agency over receiving this sort of thing.[5]
Something that is very important to me is truthfulness. I believe that truthfulness is to some extent a core foundational building block of most of what’s good in life. Without communal truthfulness we aren’t living in reality. I think I’ve been bucketing truthfulness as a terminal value, I’m starting to suspect that it’s not. I believe that my relationship to truthfulness has been making my life worse[6] according to my values.[7]
So, what are my values?[8] The first answer is something like two buckets:
Positive impact on the world
Connection and joy
When I look at that it feels like a weak answer. Like yes, the two things I am juggling in my utility function are how good do I feel and to what extent am I a net positive (or negative) on the world. There are of course questions about what is positive impact on the world and the answer is something approximately “human flourishing”[9] shaped. But it feels trite to say that my values are the fact that I am simultaneously trying to optimize for myself and for doing good for humanity[10].
The obvious[11] first alive thread to pull on is the fact that I said community, I could’ve just said joy - that’s interesting. (It is at this point that I read the Wikipedia page on values and talked to my roommate[12] about this.) Paying attention to what feels alive seems like it will get me closer to what’s real. It doesn’t matter what I intellectually think the correct values are if those values aren’t what they actually are. If one side goal is to potentially shift my values, it feels hard to do that if I don’t know what they actually are. What do I care about? What is alive?
Truthfulness/being in reality
Playfulness (don’t forget to have fun!)
Efficiency (really getting the most out of this one life that I have)
Being a good/reliable trading partner (cooperation)
Trying (really trying)
Actually accomplish things that actually matter (avoid unimportant goals)
Execution/sticking to my word (both internally and externally)
Curiosity/openness
Sincerity/integrity
Goodness (have positive impact on the people around me)
Always be updating (make predictions and notice when they are right/wrong, internalize feedback)
Doing positive sum things for my people (even at cost to myself)
Say the thing, have the conflict (with the people I care about)
Be in the moment, feel feelings, pay attention
After banging out that list I then mulled for a while: talked to some dudes in a sauna, chatted with some homies, sat in “what are my values”. The list didn’t feel right. Too long. Not focused. Not helpful. It’s more a collection of things that I care about than necessarily values. I think there’s something about values being things that are actually helpful for me towards living the life that I want. The hope is that thinking or saying the value out loud helps me move towards the behavior that I want.
So, I ranked my values!
Before doing this ranking I barely thought about growth, I suspect because it’s so core that I didn’t even really think of it as a value. I am constantly trying to get better, to grow, and it’s not because I think it’s a virtue in and of itself to change myself, but because I know I can be better, I know I can be doing a better job of reaching my goals, of accomplishing what I want to accomplish. I know I can do a better job of living a life that feels alive and real and fulfilling and sincere.
Fun vs. happiness[13]. Both “meaningful work” and “success”[14] are way higher than they would’ve been a couple years ago.[15] The trifecta of Friendship/Love/Community at 3rd/4th/5th place on the values quiz wasn’t surprising but it really drove home how important connection is to me despite how much I struggle with it.[16]
What I considered and then later snipped:
I spent hours talking with Claude and workshopped a bunch of ideas, really felt into which ones felt real and which ones didn’t. Here are some that didn’t.
Efficiency (really getting the most out of this one life that I have)
Instrumental towards aliveness, but doesn’t feel crucial. I would rather know what matters and do it less efficiently than not know what matters and have lots of time for it because I was efficient. Feels more secondary, and also I’m already very practiced in this.
“Be a generous and dependable trading partner”
The sentence felt very me, but didn’t feel actionable - reading it doesn’t help me muddle through how to be better.
“be in reality, be good” → “be in reality, be good to reality” → separating them
They didn’t feel connected, and then I tried connecting them but it didn’t feel right. I care about being good, not being good to reality (whatever that even means[17]).
“Don’t wait don’t avoid, face things head on”
Didn’t feel alive, didn’t feel like facing things head on was something I needed a value to help me do or was super duper crucial given it was gestured at in other values.
“Grow, be sincere, really try”
Felt too generic, not actionable, kind of throwing a bunch of ideas that feel good into one value. I kept the sincerity aspect in another value in a way that feels much more present/evocative (“Hold every word accountable”).
A final list I felt good about:
Find what actually matters, then really fucking try
Practice deep mutual connection, it’s a practice
Grow, update, reckon with how I fall short
Hold every word accountable
Be present, be playful, pay attention
Be in reality
Be good
But then, I did open circle.[18] I talked about my relationship to integrity, to correctly modeling myself, to ensuring that the things coming out of my mouth, my thoughts, my commitments are actually what I believe. That what I put into the world tracks reality. And wow, the response was not what I expected. I know I am relatively good at this[19] but the very visceral response from the people in my life was that I would be spending skill points[20] in the wrong place.
Their core claim was that I should be trying to improve in other places. “Instead of going from 95% truthful to 98% truthful you should go from 10% cleanup to 80% cleanup”. They claimed that they had literally never seen my lack of truthfulness or ability to model myself as an issue and that by focusing on that I was trying to never fall instead of getting better at falling. I’m not great at resolving situations in which I have hurt someone (“cleanup”). I struggle to deal with me having harmed someone, especially if I hurt someone taking an action that fits within one of my values. This is something that I am working on, but it was really really interesting to see multiple people who are very close to me agree that this is the place where they would like to see me grow. It makes sense that the thing that I am bad at is the thing I did a bad job of representing in my values. Deep mutual connection is important to me, being good is important to me - so I would like a set of values that does a good job of moving me towards what I truly want.
This idea of proactively repairing the emotional cuts I inflict is something I knew was important to work on, and yet at this stage the values list doesn’t really hit this in a satisfying way. “Be good” is adjacent but it’s too vague to actually remind me in the moment to take the hard conversation. “Don’t wait don’t avoid, face things head on” is closer but still is only adjacent. Not avoiding, not waiting is important here, but it doesn’t really cover that I want to be way more responsible, way more attuned to the ways in which I cause harm. I want to be a fucking force of nature, but I can’t unless I actively respond with care when I inevitably get something wrong and hurt someone. “Reckon with how I fall short” is close but it’s more internal, it’s about improving for next time.
Even after that feedback I still think internal integrity and truthfulness are important enough to keep a place on the values list. It feels so unbelievably crucial to me. But, I definitely need to add something to remind myself to actively go out there and mend the cuts I have inflicted on other people!
Okay I think I’m actually done:
Find what actually matters, then really fucking try
I think it can be easy to forget what the actual goal is when there are lots of intermediate instrumental goals. And this is a reminder to really pay attention to what matters and what will have impact. The easier part of it is the really trying, I have never had trouble with executing but nevertheless really trying is important. It’s not about looking like I’m trying, it’s not about internally believing that I’m trying, it’s about really actually just sprinting at the thing.
Practice deep mutual connection, it’s a practice
The most important parts here are “mutual” and “practice”. I have a much easier time putting myself out there than letting people in. The mutual is trying to remind myself of that. The practice is like yeah I’m not great at this but that’s not my terminal state! This is definitely a continued growth edge!
Grow, update, reckon with how I fall short
“Reckon with how I fall short” feels slightly clunky, I wish I could find a slightly more succinct way to get at really staring deeply at my mistakes, not flinching. Growing and updating don’t feel as helpful to remind myself of, but they are so core that I think it’s worth including.
Hold every word accountable
This one is great. Hard. But, so important to me. Really pay attention to what I am saying/thinking. Do I actually believe it?
Be present, be playful, pay attention
Solid, feels slightly generic, playful is the part that feels the most me. What feels strongest about it is ease of saying/remembering it in the moment.
I am my impact, actively repair what I break
This feels slightly weak in that it’s sort of two things at once. I think really trying is important, but so is actually accomplishing goals. Output matters. Impact is standing in for both my impact on the real world, and my impact on people - which is what the second part is really pointing at and what I need the most reminder on.
This is the only value that is a claim. I am not sure I think “I am my impact” is 100% true. I’ve been sitting in it and it certainly resonates. I was thinking of a world where everyone lies and telling the truth is met with ostracization, but even in that world I think with my current value system if I just told the truth and accomplished nothing I wouldn’t be living to my values. Slowly trying to shift towards truth in smarter ways would be more in line with my values. I think taking principled stands in ways that are wildly ineffective is bad. I do think impact is more important than motivation or my internal story. Impact is the terminal fucking goal.
I never really defined what the goal of thinking through and writing down my values was. Mostly because I really didn’t have a crisp goal in mind. So what makes a good value? What informed my decisions when thinking through this? I think there’s a couple things going on. There’s what really feels alive as a descriptor of where I am at. There’s what feels alive when I think about who I want to be. There’s what is actually helpful, I want to use these values as a tool so they should be helpful! When I think about or say a value aloud it should be clarifying. This was the core weakness of “be in reality” and “be good”: they are too vague, they don’t feel clarifying. Saying “be good” to myself doesn’t actually help me be better, it isn’t a helpful signpost, it feels like looking up at the sky and hoping it won’t rain.
What’s Next?
I don’t think I’m done. Even though I have absolutely plowed hours into this endeavor[21], my values will certainly be changing and I am pretty sure I will decide I don’t like something about at least one of these. My current plan is to put together a super simple app that randomly chooses one of these each day for me to focus on. If you’re my friend you should definitely tell me if you think I’m not acting in line with my values, feel free to do it aggressively, you are giving me a gift.
Oh, and I’ve got at least 3 people I owe an apology/conversation to. I’m going to do that.
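A minimal sketch of what that "super simple app" could look like, assuming Python and a plain command-line script (the post doesn't say how it will actually be built). The function name `value_for_day` and the trick of seeding the random generator with the date are illustrative assumptions, not the actual plan. Seeding by date just means the pick is random across days but stays the same if the script is run twice on the same day.

```python
import datetime
import random

# The six values from the "Okay I think I'm actually done" list.
VALUES = [
    "Find what actually matters, then really fucking try",
    "Practice deep mutual connection, it's a practice",
    "Grow, update, reckon with how I fall short",
    "Hold every word accountable",
    "Be present, be playful, pay attention",
    "I am my impact, actively repair what I break",
]


def value_for_day(day: datetime.date) -> str:
    """Pick one value pseudo-randomly, but stably for a given calendar day."""
    # Assumption: seed a private generator with the date's ordinal so that
    # re-running on the same day always returns the same value.
    rng = random.Random(day.toordinal())
    return rng.choice(VALUES)


if __name__ == "__main__":
    # Print today's value to focus on.
    print(value_for_day(datetime.date.today()))
```

Because the list is just a constant at the top, swapping a value out later (which seems likely) is a one-line change.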
[1] I'm trying out posting here on LW instead of just on my substack for things that feel sufficiently relevant. It's an experiment. I am extremely open to feedback whether that be "no keep this type of post away from here", or "yes I am glad you posted here I got a lot out of reading this".
[2] Much longer than the average post, which I bang out in one sitting.
[3] As of a couple days after I wrote this sentence I have now gotten a large update that the person wasn’t hurt by my action, but I don’t think this changes my takeaway. I still think it was a net bad choice and I regret it even if I got away with it this time.
[4] Yes it’s their response to be hurt by receiving my truthful accounting, but that doesn’t mean I am in the clear for doing something I knew would have real odds of a negative response.
[5] For example, if someone is already feeling really bad, it’s probably not helpful to tell them in that moment another way I think they are fucking up their life, even if I think it is helpful info for them to have long term. I think it is good to give people agency over whether they want to hear the hard thing.
[6] It’s not to say that my relationship to truthfulness is bad, I think there are many many many other ways I could be which are way worse. But, I also think it’s clear that it could be better and I would like to live in the world where I am behaving in ways that are more in line with what I want, who I think I am, what my values actually are, etc.
[7] Thanks to my friend who really pushed me on what I actually care about when it comes to truthfulness. I really appreciate it!
[8] Why have I never really explicitly thought through what my values are before? That seems like a mistake. I have thought about what I care about, what would feel purposeful, but those are slightly different questions. I don’t love that I have never, or at least can’t remember, explicitly doing this.
[9] Which is also sort of leaky because I don’t care zero about other moral agents, but also when I consider questions like the extinction of a given lifeform vs 1 human life it’s noticeable that one of my first considerations is how would that extinction impact the ecosystem and therefore humanity’s future on this planet. There’s also more confusion in there because I’m not convinced that most animals live net happy lives, and part of caring about flourishing of moral agents is not just the number that are alive, but how good is the life. This would imply that I should care more about cost effective ways to improve how good the lives are of animals, which, feeling into it in the moment, feels more impactful than just animal lives. But, it’s still way less than how good the lives are of humans.
[10] And here the word humanity is confusing because I both do believe in doing good in effective ways for people that are very not in my life, and also I care more about the people who are in my life on some sort of grounds that a life well lived involves a community, a tribe of people that care about each other, and that it’s very important not to lose track of that. But maybe that’s an impact claim, I and the people around me will do more good if we all feel safe and taken care of. I guess this is a claim about giving away a smaller percent of a much larger pie (both in physical resources but also emotional capacity etc).
[11] Obvious to me at least, I’ve been told many times I will start a sentence like this and the thing I say is obvious is not obvious to others. I’m not sure if this is a habit worth changing. Obviously my way of thinking is in fact quite unique to me and I don’t at all believe the places my brain goes are representative.
[12] He is currently in the process of trying to codify his values in an effort to better model whether he is acting in integrity with himself, which feels like another good reason to actually know what my values are.
[13] Fun feels active, playful, a way of interacting with the world. Happiness feels more like a state, more hedonistic.
[14] Actually trying is important, but at the end of the day what matters is the actual impact: what actually was done and whether I succeeded. Did I actually do something?
[15] To some extent my history with NYC is me thinking purposeful work isn’t that important and deciding to optimize on making money at work and really working on community in my extensive free time. I believed that simply being a local community figure who makes the lives of the people I come into contact with consistently better would be sufficient for a good life. Within the last couple years two things hit me. The first is that simply stellar community is not nearly enough purpose or impact for me. The second is that even if I am great at bringing people together and curating community, that still doesn’t make me good at personally connecting with other individual souls. Comfy community co-living is a different skill from truly connecting with someone.
[16] It’s hard to tell to what extent I am actually picky or if it’s just a skill issue I could fix or it’s just genuinely a hard problem. I do want us to feel so comfy and good and connected. And I have some of that! Even when in conflict with my housemates I still feel connected and good about them, but I struggle with the next level (whatever that means).
[17] “Be good to reality” doesn’t feel real; it feels like something written on a poster, not something that evokes any core experiences of moving through this reality we find ourselves in.
[18] Only a minority of the other people had ever spent time thinking through or solidifying their values, which was very interesting to hear.
[19] Certainly compared to the average person in my life/community.
[20] My time and energy.
[21] At the cost of AI work, which I feel slightly bad about, but I think doing this was correct both because endeavoring to improve oneself is basically never a waste of time and because there’s a whole branch of AI Safety work that is about the philosophy of what makes a good LLM, how do you instill human values. Thinking about my values is very adjacent to thinking about what are generically good values. Somewhat sidenote: I am slightly down on the concept of things like Claude’s constitution because my model is that instantiating Claude with the full constitution vs something like “be good” leads to basically the same outcomes. But, there are many ways to instill human values into a model other than instantiating the chat with them, notably, also from Anthropic, Constitutional AI, where they had the model post-train itself (which ofc has its own issues) to improve outputs based on a set of values, i.e. it looks at its own response and then applies a set of values to it to determine how the response could be better and then trains the model on those responses. Idk, this is sort of rambly but the claim is even if I am slightly suspicious I think there is value in thinking through what are values and what are the options for how to instill them into a model.