Epistemic Status: Pointing at early stage concepts, but with high confidence that something real is here. Hopefully not the final version of this post.
When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.
That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:
- I have come to believe that people's ability to come to correct opinions about important questions is in large part a result of whether their social and monetary incentives reward them when they have accurate models in a specific domain. This means a person can have extremely good opinions in one domain of reality, because they are subject to good incentives, while having highly inaccurate models in a large variety of other domains in which their incentives are not well optimized.
- People's rationality is much more defined by their ability to maneuver themselves into environments in which their external incentives align with their goals, than by their ability to have correct opinions while being subject to incentives they don't endorse. This is a tractable intervention and so the best people will be able to have vastly more accurate beliefs than the average person, but it means that "having accurate beliefs in one domain" doesn't straightforwardly generalize to "will have accurate beliefs in other domains".
One is strongly predictive of the other, and that’s in part due to general thinking skills and broad cognitive ability. But another major piece of the puzzle is the person's ability to build and seek out environments with good incentive structures.
- Everyone is highly irrational in their beliefs about at least some aspects of reality, and positions of power in particular tend to encourage strong incentives that don't tend to be optimally aligned with the truth. This means that highly competent people in positions of power often have less accurate beliefs than competent people who are not in positions of power.
- The design of systems that hold people who have power and influence accountable in a way that aligns their interests with both forming accurate beliefs and the interests of humanity at large is a really important problem, and is a major determinant of the overall quality of the decision-making ability of a community. General rationality training helps, but for collective decision making the creation of accountability systems, the tracking of outcome metrics and the design of incentives is at least as big of a factor as the degree to which the individual members of the community are able to come to accurate beliefs on their own.
A lot of these updates have also shaped my thinking while working at CEA, LessWrong and the LTF-Fund over the past 4 years. I've been in various positions of power, and have interacted with many people who had lots of power over the EA and Rationality communities, and I've become a lot more convinced that there is a lot of low-hanging fruit and important experimentation to be done to ensure better levels of accountability and incentive-design for the institutions that guide our community.
I also generally have broadly libertarian intuitions, and a lot of my ideas about how to build functional organizations are based on the more start-up-like approach that is favored here in Silicon Valley. Initially these intuitions seemed in conflict with the intuitions for more emphasis on accountability structures, with broken legal systems, ad-hoc legislation, dysfunctional boards and dysfunctional institutions all coming to mind immediately as accountability systems run wild. I've since then reconciled my thoughts on these topics a good bit.
Somewhat surprisingly, "integrity" has not been much discussed as a concept handle on LessWrong. But I've found it to be a pretty valuable virtue to meditate and reflect on.
I think of integrity as a more advanced form of honesty – when I say “integrity” I mean something like “acting in accordance with your stated beliefs.” Where honesty is the commitment to not speak direct falsehoods, integrity is the commitment to speak truths that actually ring true to yourself, not ones that are just abstractly defensible to other people. It is also a commitment to act on the truths that you do believe, and to communicate to others what your true beliefs are.
Integrity can be a double-edged sword. While it is good to judge people by the standards they expressed, it is also a surefire way to make people overly hesitant to update. If you get punished every time you change your mind because your new actions are now incongruent with the principles you explained to others before you changed your mind, then you are likely to stick with your principles for far longer than you would otherwise, even when evidence against your position is mounting.
The great benefit that I experienced from thinking of integrity as a virtue, is that it encourages me to build accurate models of my own mind and motivations. I can only act in line with ethical principles that are actually related to the real motivators of my actions. If I pretend to hold ethical principles that do not correspond to my motivators, then sooner or later my actions will diverge from my principles. I've come to think of a key part of integrity being the art of making accurate predictions about my own actions and communicating those as clearly as possible.
There are two natural ways to ensure that your stated principles are in line with your actions. You either adjust your stated principles until they match up with your actions, or you adjust your behavior to be in line with your stated principles. Both of those can backfire, and both of those can have significant positive effects.
Who Should You Be Accountable To?
In the context of incentive design, I find thinking about integrity valuable because it feels to me like the natural complement to accountability. The purpose of accountability is to ensure that you do what you say you are going to do, and integrity is the corresponding virtue of holding up well under high levels of accountability.
Highlighting accountability as a variable also highlights one of the biggest error modes of accountability and integrity – choosing too broad of an audience to hold yourself accountable to.
There is a tradeoff between the size of the group that you are being held accountable by, and the complexity of the ethical principles you can act under. Too large of an audience, and you will be held accountable by the lowest common denominator of your values, which will rarely align well with what you actually think is moral (if you've done any kind of real reflection on moral principles).
Too small or too memetically close of an audience, and you risk not enough people paying attention to what you do, to actually help you notice inconsistencies in your stated beliefs and actions. And, the smaller the group that is holding you accountable is, the smaller your inner circle of trust, which reduces the amount of total resources that can be coordinated under your shared principles.
I think a major mistake that even many well-intentioned organizations make is to try to be held accountable by some vague conception of "the public". As they make public statements, someone in the public will misunderstand them, causing a spiral of less communication, resulting in more misunderstandings, resulting in even less communication, culminating in an organization that is completely opaque about any of its actions and intentions, with the only communication being filtered by a PR department that has little interest in the observers acquiring any beliefs that resemble reality.
I think a generally better setup is to choose a much smaller group of people that you trust to evaluate your actions very closely, and ideally do so in a way that is itself transparent to a broader audience. Common versions of this are auditors, as well as nonprofit boards that try to ensure the integrity of an organization.
This is all part of a broader reflection on trying to create good incentives for myself and the LessWrong team. I will try to follow this up with a post that more concretely summarizes my thoughts on how all of this applies to LessWrong.
- One lens to view integrity through is as an advanced form of honesty – “acting in accordance with your stated beliefs.”
- To improve integrity, you can bring your actions in line with your stated beliefs, bring your stated beliefs in line with your actions, or rework both at the same time. Each of these options has failure modes, but also potential benefits.
- People with power sometimes have incentives that systematically warp their ability to form accurate beliefs, and (correspondingly) to act with integrity.
- An important tool for maintaining integrity (in general, and in particular as you gain power) is to carefully think about what social environment and incentive structures you want for yourself.
- Choose carefully who, and how many people, you are accountable to:
  - Too many people, and you are limited in the complexity of the beliefs and actions that you can justify.
  - Too few people, too similar to you, and you won’t have enough opportunities for people to notice and point out what you’re doing wrong. You may also not end up with a strong enough coalition aligned with your principles to accomplish your goals.
[This post was originally posted on my shortform feed]
We can distinguish two things that both fall under what you're calling integrity:
It seems to me that, while (1) is generally virtuous, (2) is only selectively virtuous. I generally don't mind people abandoning their principles if they publicly say "well, I tried following these principles, and it didn't work / I stopped wanting to / I changed my mind about what principles are good / whatever, so I'm not following these anymore" (e.g. on Twitter). This can be quite useful to people who are tracking how possible it is to follow different principles given the social environment, including people considering adopting principles themselves. Unfortunately, principles are almost always abandoned silently.
It's worth noting that, in Judaism, because of the seriousness with which vows are treated, it is usually considered unvirtuous to make vows regularly:
And there are rituals for dissolving vows:
Which makes sense under the view that silent abandonment of principles, without proper ritualistic recognition, is much more of a problem than abandonment of principles with proper ritualistic recognition. (Posting that you changed or broke your principles on social media seems like a fine ritual for people who don't already have one)
I like this frame.
A related thing it brings to mind is something like "if you speak in support of something [a project or org] because you believe in it, and then later change your mind about it and think it's less good or harmful, you've done something bad to the commons by lending your credibility and then leaving that inertia there."
(This could be woven directly into the ritual frame by having something kinda like swearing an oath in court, where you say "I'm making these claims to the best of my ability as a rationalist, upon my word. Furthermore, if I am to change my mind about these claims, I promise to make a good faith effort to do so publicly". Or variations on that)
I also like this frame. Some additional thoughts:
I roughly agree with this at the level of changing your mind about major principles once every few weeks or months. But if someone changes their stated principles in an unpredictable fashion every day (or every hour), then I think most of the benefits of openly stating your principles disappear. In particular accountability is only really enabled when the foundations of your principles last long enough to allow someone to both comprehend your beliefs and principles and your actions that tried to follow those principles. Since your actions usually tend to be delayed for a few weeks, there is value in not fundamentally changing your principles all the time.
Yes, I agree with this. (Though, of course, harm is minimized from constant principles-shifting if it's publicly declared, so no one expects the person to act consistently)
This post's focus on accountability kind of implies a continuous model of integrity, and I think there's an important near-discrete component which is, whether you're trying to use beliefs to navigate the world, or just social reality. Nightmare of the Perfectly Principled explored this a bit; enforcing consistency in a way that's not about trying to use beliefs doesn't help that much. For people who are trying to use their mental structures to model the world, pressure towards inconsistency doesn't register as a temptation to be nobly avoided, it registers as abuse and gaslighting.
I don't think I am successfully parsing this sentence. Could you rephrase?
This post seems like it's implying that integrity is a thing you might do a little more or less of on the margin depending on incentives. But some integrity is because people are trying to use their beliefs to model and act in reality, not just to be consistent.
Hmm, something about this feels off, so we might still be talking past each other. I think I agree with what you are saying, and did not intend to contradict it with my post.
For the context of this post, I used the word "integrity" (which is generally a massively overloaded word, so this isn't the only meaning) to point towards the idea of "acting in accordance with your stated beliefs about the world".
I explicitly am trying to highlight the skill of integrity as something that helps you push against certain incentives, and the importance of setting up incentives to help you act with more integrity (by making sure that you are being held accountable by a system that actually has any chance of understanding your true beliefs).
Like, I agree that people should try to use their beliefs to model and act in reality. At least in the definition of integrity that I tried to use in this post, that is kind of a prerequisite for integrity, but not itself sufficient. You also need to communicate or externalize your beliefs in order to be able to act with integrity (which enables accountability). And generally the reason why you want accountability is because you want others to help you increase the accuracy of your beliefs and the correspondence of your beliefs to your actions (and also to build trust with the others who hold you accountable).
Here are a couple of specific ways I expect that lumping these cases together will cause bad decisions (Raemon's comment helped me articulate this):
I'm not saying that people "should try" to use their beliefs to model and act in reality.
I'm saying that some people's minds are set up such that stated beliefs are by default reports about a set of structurally integrated (and therefore logically consistent) constraints on their anticipations. Others' minds seem to be concerned with making socially desirable assertions, where apparent consistency is a desideratum. The first group is going to have no trouble at all "acting in accordance with [their] stated beliefs about the world" so long as they didn't lie when they stated their beliefs, and the sort of accountability you're talking about seems a bit silly. The second group is going to have a great deal of trouble, and accountability will at best cause them to perform consistency when others are watching, not to take initiative based on their beliefs. (Cf. Guess culture screens for trying to cooperate.)
This seems weakly plausible but unlikely to me. By computational limitations it seems basically impossible to act in accordance with all of your stated beliefs, and there will always be some level of contradiction between your beliefs, as well as your correspondence of beliefs to actions. Figuring out how to leverage more than your own brain to notice inconsistencies in your actions and beliefs seems like a desirable goal for basically everyone.
And not only that, your internal beliefs are usually far too complicated to easily communicate to someone else. So even if you have internally consistent beliefs, it's a major challenge to communicate them in a way that allows other people to understand your consistency. This is why there is a tension between accuracy and transparency here (and hence the tradeoff I am pointing to in your choice of group of accountability).
To maybe make it more clear, accountability has two major benefits, both of which seem highly desirable to basically every person:
To maybe respond more concretely to your top-level comment, which I think I now understand better, I do think that a more continuous model is accurate here, though I share at least a bit of your sense (or at least what I perceive to be your sense) of there being some discrete shift between the two different modes of thinking.
I do however think that people can change what their primary mode of thinking is (at least over the course of years). I also think that for most people (and definitely for me) there is often an unendorsed temptation to use the profession of beliefs as speech acts and not as reporting of anticipated constraints on my future observations, and I benefit a lot from being in an environment in which I am rewarded for doing the latter and not the former.
This exchange has given me the feeling of pushing on a string, so instead of pretending that I feel like engaging on the object level will be productive, I'm going to try to explain why I don't feel that way.
It seems to me like you're trying to find an angle where our disagreement disappears. This is useful for papering over disagreements or pushing them off, which can be valuable when that reallocates attention from zero-sum conflict to shared production or trade relations. But that's not the sort of thing I'd hope for on a rationalist forum. What I'd expect there is something more like double-cruxing, trying to find the angle at which our core disagreement becomes most visible and salient.
Sentences like this seem like a strong tell to me:
While "I think you're partly wrong, but also partly right" is a position I often hold about someone I'm arguing with, it doesn't clarify things any more than "let's agree to disagree." It can set the frame for a specific effort to articulate what exactly I think is wrong under what circumstances. What I would have hoped to see from you would have been more like:
Thanks for popping up a meta-level. Seems reasonable in this circumstance.
I agree with you that that one paragraph is mostly doing the "I think you're partly wrong, but also partly right" thing, but the rest of my comment doesn't really do that, so I am a bit sad/annoyed that you perceived that to be my primary intention (or at least that's what I read into your above comment).
I also think that paragraph is doing some other important work that isn't only about the "let's avoid a zero-sum conflict situation", but I don't really want to go into that too much, since I expect it to be less valuable than the other conversations we could be having.
The rest of my comment is pointing out some relatively concrete ways that make me doubt the things that you are saying. I have a model in my head of where you are coming from, and can see how that contradicts with other parts of reality that seem a lot more robust than the justifications that I think underlie your model.
I don't yet have a sense that you see those parts of reality that make me think that your models are unlikely to be correct, and so I was trying primarily to point them out to you, and then for you to either produce a response of how you have actually integrated them, or for you to change your mind.
I think this mostly overlaps with your second suggested frame, so I guess we can just continue from there. I think I know why you care, and can probably give at least an approximate model of where you are coming from. I tried to explain what I think you are missing, which was concretely the concerns around bounded computation and the relatively universal need for people to coordinate with other people, which seem to me to contradict some of the things you are saying.
Also happy to give a summary of where I think you are coming from, and what my best guess of your current model is. While I see some contradictions in your model (or my best guess of it), it does seem actually important to point out that I've found value in thinking about it and am interested in seeing it fleshed out further (and am as such interested in continuing this conversation).
This could either happen in the form of...
I don't really have any super strong preference for any of these, but will likely not respond for a day. After that, I will try summarizing your perspective a bit more explicitly and then either ask some followup questions or point out the contradictions I currently see in it more explicitly.
I don't understand the relevance of your responses to my stated model. I'd like it if you tried to explain why your responses are relevant, in a way that characterizes what you think I'm saying more explicitly.
My other most recent comment tries to show what your perspective looks like to me, and what I think it's missing.
I think this is the most helpful encapsulation I've gotten of your preferred meta-frame.
I think I mostly just agree with it now that it's spelled out a bit better. (I think I have some disagreements about how exactly rationalist forums should relate to this, and what moods are useful. But in this case I basically agree that the actions you suggest at the end are the right move and it seems better to focus on that).
This seems like a proposal to use the same kinds of postural adjustments on a group that includes anatomically complete human beings, and lumps of clay. Even if there's a continuum between the two, if what you want to produce is the former, adjustments that work for the latter are going to be a bad idea.
If someone's inconsistencies are due to an internal confusion about what's true, that's a different situation requiring a different kind of response from the situation in which those inconsistencies are due to occasionally lying when they have an incentive to avoid disclosing their true belief structure. Both are different from one in which there simply isn't an approximately coherent belief structure to be represented.
Can't answer for habryka, but my current guess of what you're pointing at here is something like: "the sort of drive towards consistency is part of an overall pattern that seems net harmful, and the correct action is more like stopping and thinking than like 'trying to do better at what you were currently doing'."
(You haven't yet told me if this comment was successfully passing your ITT, but it's my working model of your frame)
I think habryka (and separately, but not coincidentally, me) has a belief that he's the sort of person for whom looking for opportunities to improve consistency is beneficial. I'm not sure whether you're disagreeing with that, or if your point is more that the median LessWrong reader will be taking the wrong advice from this?
[Assuming I've got your frame right, I obviously disagree quite a bit – but I'm not sure what to do about it locally, here]
Thanks for checking - I'm trying to say something pretty different.
It seems like the frame of the OP is lumping together the kind of consistency that comes from using the native architecture to model the deep structure of reality (see also Geometers, Scribes, and the structure of intelligence), and the kind of consistency that comes from trying to perform a guaranteed level of service for an outside party (see also Unreal's idea of Dependability), and an important special case of the latter is rule-following as a form of submission or blame-avoidance. These are very different mental structures, respond very differently to incentives, and learn very different things from criticism. (Nightmare of the Perfectly Principled is my most direct attempt to point to this distinction.)
People who are trying to submit or avoid blame will try to alleviate the pressure of criticism with minimal effort, in ways that aren't connected to their other beliefs. On the other hand, people with structured models will sometimes leapfrog past the critic, or jump in another direction entirely, as Benito pointed out in A Sketch of Good Communication.
If we don't distinguish between these cases, then attempts to reason about the "optimal" attitude towards integrity or accountability will end up a lumpy, unsatisfactory linear compromise between the following policy goals:
Depending on what problem you're trying to solve, habryka's statement that "if someone changes their stated principles in an unpredictable fashion every day (or every hour), then I think most of the benefits of openly stating your principles disappear" can be almost exactly backwards.
If your principles predictably change based on your circumstances, that's reasonably likely to be a kind of adversarial optimization similar to A/B testing of communication. They don't mean their literal content, at least.
But there's plenty of point in principles consistent with learning new things fast. In that case, change represents noise, which is costly, but much less costly than messaging optimized for extraction. And of course changing principles doesn't need to imply a change in behavior to match - your new principles can and should take into account the fact that people may have committed resources based on your old stated principles.
In summary, my objection is that habryka seems to be thinking of beliefs as a special case of promises, while I think that if we're trying to succeed based on epistemic rationality, we should be modeling promises as a special case of beliefs. For more detail on that, see Bindings and Assurances.
Agree strongly with this decomposition of integrity. They're definitely different (although correlated) things.
My biggest disagreement with this model is that the first form (structurally integrated models) seems to me to be something broader? Something like, you have structurally integrated models of how things work and what matters to you, and take the actions suggested by the models to achieve what matters to you based on how things work?
Need to think through this in more detail. One can have what one might call integrity of thought without what one might call integrity of action based on that thought: you have the models, but others (or you) can't count on you to act on them. And you can have integrity of action without integrity of thought, in the sense that you can be counted on to perform certain actions in certain circumstances, in which case you'll do them whether or not it makes any sense, but you can at least be counted on. Or you can have both.
And I agree you have to split integrity of action into keeping promises when you make them slash following one's own code, and keeping to the rules of the system slash following others' codes, especially codes that determine what is blameworthy. To me, that third special case isn't integrity. It's often a good thing, but it's a different thing - it counts as integrity if and only if one is following those rules because of one's own code saying one should follow the outside code. We can debate under what circumstances that is or isn't the right code, and should.
So I think for now I have it as Integrity-1 (Integrity of Thought) and Integrity-2 (Integrity of Action), and a kind of False-Integrity-3 (Integrity of Blamelessness) that is worth having a name for, and tracking who has and doesn't have it in what circumstances to what extent, like the other two, but isn't obviously something it's better to increase than decrease by default. Whereas Integrity-1 is by default to be increased, as is Integrity-2, and if you disagree with that, this implies to me there's a conflict causing you to want others to be less effective, or you're otherwise trying to do extraction or be zero sum.
It seems to me that integrity of thought is actually quite a lot easier if it constrains the kind of anticipations that authentically and intuitively affect actions. Actions can still diverge from beliefs if someone with integrity of thought gets distracted enough to drop into a stereotyped habit (e.g. if I'm a bit checked out while driving and end up at a location I'm used to going to instead of the one I need to be at) or is motivated to deceive (e.g. corvids that think carefully about how to hide their food from other corvids).
The kind of belief-action split we're used to seeing, I think, involves a school-broken sort of "believing" that's integrated with the structures that are needed to give coherent answers on tests, but severed from thinking about one's actual environment and interests.
The most important thing I did for my health in the last few years was healing this split.
It seems to me that False-Integrity-3's name could be Integrity of Innocence.
The concerns here make sense.
Something I still can't tell about your concern, though: one of the things that seemed like the "primary takeaway" here, at least to me, is the concept of thinking carefully about who you want to be accountable to (and to be wary of holding yourself accountable to too many people that won't be able to understand more complex moral positions you might hold)
So far, having thought through that concept via the various lenses of integrity you list here, it doesn't seem like something that's likely to make things worse. Do you think it is?
(the other claim, to think about what sort of incentives you want for yourself, does seem like the sort of thing some people might interpret as instruction to create coercive environments for themselves. This was fairly different from what I think habryka was aiming at, but I'd agree that might not be clear from this post)
I don't think I find that objectionable; it just didn't seem particularly interesting as a claim. It's as old as "you can only serve one master," god vs mammon, etc etc: you can't do well at accountability to mutually incompatible standards. I think it depends a lot on the type and scope of accountability, though.
If the takeaway were what mattered about the post, why include all the other stuff?
I think habryka was trying to get across a more cohesive worldview rather than just a few points. I also don't know that my interpretation is the same as his. But here are some points that I took from this. (Hmm. These may not have been even slightly obvious in the current post, but were part of the background conversation that prompted it, and would probably have eventually been brought up in a future post. And I think the OP at least hints at them)
First, I think there are people in the LessWrong readership who still have some naive conception of "be accountable to the public", which is in fact a recipe for
This is pretty different from how I'd describe this.
In some sense, you only get to serve one master. But value is complex and fragile. So there may be many facets of integrity or morality that you find important to pay attention to, and you might be missing some of them. Your master may contain multitudes.
Any given facet of morality (or more generally, of the things I care about) is complicated. What I might want, in order to hold myself to a higher standard than I currently meet, is to have several people whom I hold myself accountable to, each of whom pays deep attention to a different facet.
If I'm running a company, I might want to be accountable to
I might want multiple people for each facet, who look at that facet through a different lens.
By having these facets represented in concrete individuals, I also improve my ability to resolve confusions about how to trade off multiple sacred values. Each individual might deeply understand their domain and see it as most important. But if they disagree, they can doublecrux with each other, or with me, and I can try to integrate their views into something coherent and actionable.
There's also the important operationalization of "what does accountable mean?" There's different powers you could give these people, possibly including:
There might be some people you trust with some powers but not others (i.e. you might think someone has good perspectives that justify the emergency double crux button but not the "fire you" button)
There's a somewhat different conception you could have of all this that's more coalition focused than personal development focused.
I feel like all of this mixes together info sources and incentives, so it feels a bit wrong to say I agree, but also feels a bit wrong to say I disagree.
I agree that there's a better, crisper version of this that has those more distinct.
I'm not sure if the end product, for most people, should keep them distinct because by default humans seem to use blurry clusters of concepts to simplify things into something manageable.
But I think if you're aiming to be a robust agent, or build a robustly agentic organization, there is something valuable about keeping these crisply separate so you can reason about them well. (You've previously mentioned that this is analogous to the friendly AI problem, and I agree.) I think it's a good project for many people in the rationalsphere to have undertaken to deepen our understanding, even if it turns out not to be practical for the average person.
The "different masters" thing is a special case of the problem of accepting feedback (i.e. learning from approval/disapproval or reward/punishment) from approval functions in conflict with each other or your goals. Multiple humans trying to do the same or compatible things with you aren't "different masters" in this sense, since the same logical-decision-theoretic perspective (with some noise) is instantiated on both.
But also, there's all sorts of gathering data from others' judgment that doesn't fit the accountability/commitment paradigm.
To give a concrete example, I expect math prodigies to have the easiest time solving any given math problem, but even so, I don't expect that a system that punishes the students who don't complete their assignments correctly will serve the math prodigies well. This, even if under other, totally different circumstances it's completely appropriate to compel performance of arbitrary assignments through the threat of punishment.
This post seems excellent overall, and makes several arguments that I think represent the best of LessWrong self-reflection about rationality. It also spurred an interesting ongoing conversation about what integrity means, and how it interacts with updating.
The first part of the post is dedicated to discussions of misaligned incentives, and makes the claim that poorly aligned incentives are primarily to blame for irrational or incorrect decisions. I’m a little bit confused about this, specifically that nobody has pointed out the obvious corollary: that people in a vacuum, and especially people with well-aligned incentive structures, are broadly capable of making correct decisions. This seems to me like a highly controversial statement that makes the first part of the post suspicious, because it treads on the edge of proving (hypothesizing?) too much: the claim that people’s success at rationality is primarily about incentive structures is very ambitious and worthy of further interrogation, because it assumes a model in which humans are capable of, and regularly perform, high levels of rationality. However, I can’t think of an obvious counterexample (a situation in which humans are predictably irrational despite having well-aligned incentives for rationality), and the formulation of this post has a ring of truth for me, which suggests to me that there’s at least something here. Conditional on this being correct, and there not being obvious counterexamples, this seems like a huge reframing that makes a nontrivial amount of the rationality community’s recent work inefficient: if humans are truly capable of behaving predictably rationally through good incentive structures, then CFAR, etc. should be working on imposing external incentive structures that reward accurate modeling, not rationality as a skill. The post obliquely mentions this through discussion of philosopher-kings, but I think this is a case in which an apparently weaker version of a thesis actually implies the stronger form: philosopher-kings being not useful for rationality implies that humans can behave predictably rationally, which implies that rationality-as-skill is irrelevant.
This seems highly under-discussed to me, and this post is likely worthy of further promotion solely for its importance to this issue.
However, the second broad part of the post, examining (roughly) epistemic incentive structures, is also excellent. I strongly suspect that a unified definition of integrity with respect to behavior in line with ideology would be a significant advance in understanding how to effectively evaluate ideology that’s only “viewable” through behavior, and I think that this post makes a useful first step in laying out the difficulties of punishing behavior unmoored from principles while avoiding enforcing old unupdated beliefs. The comment section also has several threads that I think are worth revisiting: while the suggestion of allowing totally free second-level updating was found untenable due to the obvious hole of updating ideology to justify in-the-moment behavior, the discussion of ritual around excessive vows and Zvi’s (I believe) un-followed-up suggestion of distinguishing beliefs from principle both seem to have real promise to them: my guess would be that some element of ritual is necessary to avoid cheapening principle and allowing for sufficient contradictory principles to justify any behavior.
Finally, the discussion of accountability seems the least developed, but also a useful hook for further discussion. I especially like the suggestion of “mandatory double-crux” powers: I’ve informally tried this system by double-cruxing controversial decisions before action, and upon reflection I believe it’s the right level and type of impediment: likely to induce reflection, a non-trivial inconvenience, but not a setting that’s likely to shake well-justified beliefs and cause overcorrection.
Overall, I support collation of this post, and would strongly support collation if it was updated to pull more on the many potential threads it leaves.
Minor note: the large paragraph blocks make this hard to read.
Related: Integrity for consequentialists by Paul Christiano
Most important comments from the original shortform version of the post:
"More than fine. Please do post a version on its own. A lot of strong insights here, and where I disagree there's good stuff to chew on. I'd be tempted to respond with a post.
I do think this has a different view of integrity than I have, but in writing it out, I notice that the word is overloaded and that I don't have as good a grasp of its details as I'd like. I'm hesitant to throw out a rival definition until I have a better grasp here, but I think the thing you're in accordance with is not beliefs so much as principles?"
"This was a great post that might have changed my worldview some.
I've heard people say things like this in the past, but haven't really taken it seriously as an important component of my rationality practice. Somehow what you say here is compelling to me (maybe because I recently noticed a major place where my thinking was heavily constrained by my social ties and social standing) and it prodded me to think about how to build "mech suits" that not only increase my power but incentivize my rationality. I now have a todo item to "think about principles for incentivizing true beliefs, in team design."
Similarly, thinking explicitly about which groups I want to be accountable to sounds like a really good idea.
I had been going through the world keeping this Paul Graham quote in mind...
...choosing good friends, and doing things that would impress them.
But what you're pointing at here seems like a slightly different thing: which people do I want to make myself transparent to, so that they can judge if I'm living up to my values?
This also gave me an idea for a CFAR style program: a reassess your life workshop, in which a small number of people come together for a period of 3 days or so, and reevaluate cached decisions. We start by making lines of retreat (with mentor assistance), and then look at high impact questions in our life: given new info, does your current job / community / relationship / life-style choice / other still make sense?
Thanks for writing."
"See Sinclair: "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!""
I tend to think of integrity as the ability to have true beliefs and take good action in spite of incentives. I'm thinking of the person who chooses not to break a principle, even when nobody is looking and it's not an obviously important case of the principle and there's a lot of moral/personal value to be gained from it.
But there is also the second part. A person with high integrity can make many principled decisions in spite of incentives, but a person with high integrity also notices when they're entering an environment where they're going to be faced with too many decisions to be able to make good ones.
For example, I have a personal rule against googling a large class of gossipy/political things that feel yummy except when I have explicitly time-boxed time to think through it, because at the minute I don't trust my background reasoning processes to incorporate the evidence in an unbiased way. I used to google gossipy/political topics a fair bit. Sometimes I still do it at midnight when I am bored and not tired. But I increasingly have been able to catch myself and say "this is not behaviour I endorse", even though it's quite difficult because there's not an alternative yummy thing to do. Increasingly, I'm becoming someone who can say "well, I made a policy decision to not take this action, so I won't". However, the more useful thing is noticing that my phone is generally giving me a lot of difficult decisions to make, often with many choices I don't endorse, and then systematically removing those options. I've done things like blocking social media on my phone except for 2 hours on Saturday, blocking it permanently on my laptop, and following some of Tristan Harris's advice on organising my apps. These things preserve my ability to think clearly and take good action.
There's this idea that people with integrity can be handed power and be expected to continue doing the sort of things they did when they had less power - be the same sort of person, hold the same principles, etc. Or alternatively they will turn down the power if they think that they won't be able to be the same person. Following that, there's the old idea that the people who should be given power are those who don't want it. I'm not sure this really holds up - those who don't want it often have actual models predicting that they will experience failures of integrity. Though at least they have a model of where the mistakes will come and can try to prepare for them. Most people don't even know where their errors will come.
I'm trying to figure out whether "acting in accordance with your stated beliefs" feels like the right description. I guess that there's this thing relating to noticing when you will stop being the same kind of person, and avoiding taking that action unless you endorse it. I expect a person with a lot of integrity to change things about themselves and their actions, but in ways that they reflectively endorse, rather than being pulled around by the winds of the local incentives.
If I am to propose an alternative definition, it's that someone with integrity is someone I can trust to follow their long-term goals and not be thrown off-course by following short-term incentives. Someone who is able to turn down power when it will throw them off their long-term course, even if they haven't figured out how to get power a different way yet. Someone who will learn to say no to the short-term offers they are getting.
If I think about preserving system integrity over time, or an agent preserving goal integrity over time, I think of the ability for the system/agent to move through a wide variety of environments without being broken / fundamentally changed in ways it doesn't want by outside forces. This conceptualisation of integrity - being able to preserve the core parts of you and your goals over time - seems good to me. (Reminds me of Ray / Critch talking about being a robust agent.) Someone with integrity is wholly them and will stay whole over the long run, even if crazy things are thrown at them. It's not a claim about their competences/goals/beliefs now, it's a claim about the long-term integrity of their competences/goals/beliefs.
This post is relevant to my post on Dependability.
I'm at MAPLE in order to acquire a certain level of integrity in myself.
The high-reaching goal is to acquire a level of integrity that isn't much-influenced by short/medium-term incentives, such that I can trust myself to be in integrity even when I'm not in an environment that's conducive for that.
But that's probably a long way off, and in the meantime, I am just working on how to be the kind of person that can show up on time to things and say what needs to be said when it's scary and take responsibility for my mistakes and stuff.
I thumbs-up anyone who attempts to become more in integrity over time! Seems super worthwhile.
Not sure how strong you intend this statement to be (due to ambiguity of "often"), but I would think that all-else-equal, a randomly selected competent person with some measure of power has more accurate beliefs than a less competent person w/o power, even after controlling for e.g. IQ.
Would you disagree with that?
I'd grant that the people with the very most accurate beliefs are probably not the same as the people who are the very most competent, but that's mostly just because the tails come apart.
I'd also grant that having power subjects one to new biases. But being competent and successful is a strong filter for your beliefs matching reality (at least in some domains, and to the extent that your behavior is determined by your beliefs), while incompetence often seems to go hand-in-hand with various kinds of self-deception (making excuses, blaming others, having unrealistic expectations of what will work or not).
So overall I'd expect the competent person's beliefs to be more accurate.
I do expect there to be more going on here than just the tails coming apart, but I agree that on average people with some amount of power will probably have more accurate beliefs than people without power.
I also expect this to come apart earlier than what we would expect just from unbiased statistics and tails coming apart due to distortionary forces on power.
Got it, that makes sense.
I think you're focusing on the "competence" here when the active ingredient was more the "position of power" thing.
But being in a position of power filters for competence, and competence filters for accurate beliefs.
If the quoted bit had instead said:
I wouldn't necessarily have disagreed. But as is I'm pretty skeptical of the claim (again depending on what is meant by "often").
Made a related change to the OP (changed it from "much less competent" to just "competent").
I think the original phrasing is still fine, because it depends on the cutoff you are looking at. I think if you condition on the kind of people I tend to spend most of my time with, then the original phrasing holds, but it doesn't really hold if you just look at the general population.
Some relevant concepts in psychology:
I like this part best:
Importantly, these mostly won't be individuals. Instead, we mostly have groups, and the composition of those groups is not subject to our decisions, eg: our own families, our spouse's families, the other employees at work, the other congregants at church, the other members of the club, etc.
I feel strongly that selecting and acting within groups is a badly neglected area of moral reflection.
Part of the point as I saw it was that being accountable to a group limits the complexity of the types of moral logic you can be guided by.
I.e., if I'm accountable to all employees at work, my moral principles have to be simpler, and probably have to account for asymmetric justice. This doesn't necessarily mean I shouldn't be accountable to all the employees at work (if I'm their employer, or a fellow employee). But I saw the point of this post as "be wary of how exactly you operationalize that."
I'm inclined to agree that we need to be wary of how we operationalize accountability to groups.
But if it reduces the complexity of the moral logic, it should be simpler to express and abide by that logic. And yet, I see huge amounts of analysis for all the permutations of the individual case, and virtually none for the group one.
I am deeply and generally confused by this, not just in the context of the post. Why not reason about the group first, and then extend that reasoning as needed to deal with individual cases? This causes me to expect that the group case is much more difficult, like it has a floor of complexity or something.
This represents a lot of changes in my own thinking, and I have indeed thought about integrity a lot more since it was published. (I mean, I'm somewhat indirect because I work with Habryka full-time, so it's often more from talking than from this post, but I think this post represents some of the core ideas well.)
This post, and related discussions both in the comments here and elsewhere, has played a significant role in my thinking. Figuring out how integrity and accountability work seems important both for individual epistemics as well as group coordination.
I also found several of the comments here helpful for fleshing out my thinking (in particular this one by Benquo), which noted some ways in which habryka's model was incomplete.
For a long time, I viewed integrity as a virtue. For the past 15 years or so, I've created and strived to live by mission statements that elucidate my values. I had a strong sense of my ideal self, and a strong sense of my "actual self", and worked to live up to the ideal as often as I could.
More recently, the concept of my "self" has started to seem less of a coherent concept, and the concept of an "ideal self" has started to seem less obvious to me. Instead, I've started to shift towards context-specific integrity, that is, being legible and consistent to specific groups of people, but not having a global set of principles that I live up to. This allows me to focus more on terminal values and things I care about, while still being predictable to outside people.
There's a sense where there's a global type of integrity which is something like "Am I actually acting in accordance with my current frame of meaning and the things I care about?" but there's no way to put that into a single consistent set of principles.
This seems similar to how I feel - I'm much more aware of varying contexts and of the changes in salience of the models I use than I used to be, and I put less weight on self-consistency across time. And more on self-consistency at a coarser level over time and across contexts - this is what I take from the "Integrity" desire.
It's both "am I acting in accordance with my current frame?" and "is my current frame compatible with (even if differing in many specifics, because it considers different factors in its model) my overall meta-frame?". With a little bit of "am I transparent enough in my stated/communicated frames that I am not hurting someone in a way that violates my frame or meta-frame?".
I mostly agree with the direction you're going, but I find myself with something to disagree about in almost every point. I can't tell if these are irrelevant nitpicks, or if I'm getting a hint that some assumptions underlying the text are very different from mine. Two examples:
1) The title "Integrity and accountability are core parts of rationality" confuses me. Integrity and accountability are core parts of social and personal interaction, and important topics in the culture of any group (including rationalists). I don't know that they're core parts of rationality itself. Our disagreement here may be that I don't think "group rationality" is a thing. Rationality is individual, and some (perhaps all) groups would benefit by having more rational members and more rational behavior by members, but one's values and beliefs are fundamentally private and only partially communicable.
2) "This means a person can have extremely good opinions in one domain of reality, because they are subject to good incentives, while having highly inaccurate models in a large variety of other domains in which their incentives are not well optimized. " I like and agree with the observation, and I'm suspicious of the causality. I think we'll need to dive into categorization and legibility (to the incented and to the observer) of incentives to untangle what this means and when it's true.
To be clear, my intended point is definitely that integrity and accountability are a core part of individual rationality, not just group rationality. In particular, getting good at designing the incentives that you are under strikes me as likely a necessary step to actually be able to have accurate beliefs about the world. In some sense this requires thinking about other people, but it is with the aim of making your own models more accurate.
Oh, that's interesting, and not where I thought you were going. It helps a lot to know that you mean this about your and my biases due to incentives, and about understanding and choosing situations whose incentive structures allow rational thinking for myself, rather than about other people's incentives in general. I think I can fully support that framing.
A decent solution to the "who should you be accountable to", from the wisdom of the ancients (shows thought on many of the considerations mentioned)
Leaving out "parents" gets rid of some of the obvious objections, but even then, I don't want my children to know about my sexual fetishes. Other objections may include, for instance, letting your friends know that you voted for someone who they think will ruin the country. And I certainly wouldn't want rationalist-but-unpopular opinions I hold to be on the front page of the local paper to be seen by everyone (Go ahead, see what happens when the front page of the newspaper announces that you think that you should kill a fat man to stop a trolley.) This aphorism amounts to "never compartmentalize your life" which doesn't seem very justifiable.
I got a strong associative bond with "authenticity" to contrast with "integrity". If you are under strong expectations you might just follow the incentives without really formulating your own stance in the matter. As discussed in the post, integrity seems to focus on either words or actions changing, but authenticity is about (a duty to) discover who you are and communicate it to others. If unintegrity is a form of falsehood, a dissonance between action and talk, then unauthenticity is a form of lying by omission: not discovering facts, sharing irrelevant information, or taking on the vaguest available role.
This is one of my favorite readings, period, in 2019. I remember retelling its message in multiple conversations, and it was important in my thinking about how my organization should think about accountability.
In many cases, you should simply downweight anything someone says, and look for integrity as a form of consistency of behavior. People can be wrong (without consciously intending to deceive) about their beliefs and motives, but it's much harder to be wrong about how one actually behaved.
Are you going to state your beliefs? I'm asking because I'm not sure what that looks like. My concern is that the statement will be very vague or very long and complex. Either way, you will have a lot of freedom to argue that actually your actions do match your statements, regardless of what those actions are. Then the statement would not be useful.
Instead I suggest that you should be accountable to people who share your beliefs. Having someone who disagrees with you try to model your beliefs and check your actions against that model seems like a source of conflict. Of course, stating your beliefs can be helpful in recognizing these people (but it is not the only method).
Thank you, your writing is very inspiring! Improving self-integrity is difficult; we must recognize ourselves first. Sometimes the results of integrity are contrary to the original plan, but can't this life be determined according to plan? I read an inspiring story from a graduate of the dentistry faculty who switched to become a banker. After going through a great debate within him, Handayani was determined to integrate himself to make a decision. It is not wrong; it is precisely integrity and accountability that determine your identity. You can visit this link if you are interested in reading the story of Handayani more fully: http://alumni.unair.ac.id/site/article/read/698/dari-dokter-gigi-loncat-ke-banker.html .
I think this can be read many ways. First, obviously if a person is subject to an incentive to hold true beliefs about X, they will start trying to learn about X and their beliefs will become more accurate. This part isn't very interesting.
The more interesting parts of your idea, I think, are the notions that
(1) In the absence of incentives to have true beliefs about X, people don't just have no beliefs about X, but in fact tend to have beliefs that are wrong.
(2) In the presence of incentives to have wrong beliefs about X, people tend to adopt those wrong beliefs.
I'm less convinced that these things are true generally. I do think they are true of many people if we define "belief" as "an opinion that a person expresses". But whether that's a reasonable definition of belief is unclear: I think that often the people for whom (1) and (2) are true are the same people who don't care whether their expressed opinions are correct. In that case the observation reduces to "if people don't care about saying true things, they will say things they are incentivized to say", which isn't surprising.
For the average LessWrong reader, I'm not convinced (1) and (2) are accurate. The observation that people tend to have beliefs that align with their incentives might instead be explained by a tendency for people with belief X to gravitate towards a position that rewards them for having it.
It seems to me that the way humans acquire language pretty strongly suggests that (2) is true. (1) seems probably false, depending on what you mean by incentives, though.
I do think people (including myself) tend towards adopting politically expedient beliefs when there is pressure to do so (esp. when their job, community or narrative are on the line).
This is based in part on personal experience, and developing the skill of noticing what motions my brain makes in what circumstances.
Are good ways of incentivizing accurate models the same across domains? (As opposed to different, say based on differences between the domains, or their practitioners.)
So integrity is three things, and it's having all three of them, at the same time.
Four things. This part of "integrity" sounds like being able to hold up under stress, and being good with people.
This is surprising, because it seems like usually people are judged for the standards they express, not by the standards they express. Both seem like a 'surefire way to make people overly hesitant' to a) share their standards, b) actually do this integrity thing.
The common sense which makes this virtue seem virtuous, and useful, instead of a disaster.
I was trying to point at roughly three things. I did not intend for accountability to imply stress or to imply that you have to be good with people. There are definitely accountability relations that require you to hold up under stress or to be "good with people", but those skills are not the core thing that I am trying to point at.
Summarizing it as three things at the same time seems fine, though I do want to highlight that those three things can feel like a single motion in a way that I think is good, and why I think of it as one virtue and not three.
Agreed, although the 1st and the 3rd part do seem like the same thing - Communication. (The 2nd part is Action, the 4th part is something like Feedback/making sure you're on course.)
(Kind of brought to mind The Godfather, which happens to be the book my husband had me read to explain the familial dynamics in the household. What can I say, it works. At least until people start going senile.)