

This makes a good point!

The only thing constraining the ability to deceive in a simulation that I don't see mentioned here is energy/physical constraints. My assumption (which could be wrong given very high intelligence, numbers, and energy) is that it's harder, even if only by a tiny, tiny bit, to answer the simulation trials deceptively than to answer honestly. So if the simulation can ask enough questions or run enough trials, it will eventually see time differences in the responses of different programs, with unaligned programs on average taking longer to get the correct answers. So I don't think it's fundamentally useless to test program behavior in simulations to assess utility functions, as long as some constraint is involved, like the time it takes to execute each step of the algorithm.
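To make the "enough trials" intuition concrete, here is a rough Python sketch of the statistics involved. Everything in it is an assumption for illustration: the Gaussian timing noise, the made-up numbers for `base`, `overhead`, and `noise`, and the function names `z_score` and `trials_needed`. It only shows that a tiny average slowdown becomes detectable once the number of trials is large compared to the noise; it says nothing about whether a real deceptive program would actually be slower.

```python
import math
import random
import statistics


def z_score(n_trials, base=1.0, overhead=0.002, noise=0.05, seed=0):
    """Simulate response times for an honest and a deceptive program and
    return the two-sample z statistic for the difference in mean time.

    All parameters are hypothetical: `base` is the honest mean time,
    `overhead` is the assumed extra cost of deceiving, `noise` is the
    per-trial standard deviation.
    """
    rng = random.Random(seed)
    honest = [rng.gauss(base, noise) for _ in range(n_trials)]
    deceptive = [rng.gauss(base + overhead, noise) for _ in range(n_trials)]
    diff = statistics.fmean(deceptive) - statistics.fmean(honest)
    std_err = math.sqrt(
        statistics.variance(honest) / n_trials
        + statistics.variance(deceptive) / n_trials
    )
    return diff / std_err


def trials_needed(overhead, noise, z=3.0):
    """Approximate trials per program for the expected z statistic to
    reach `z`, assuming equal noise in both programs."""
    return math.ceil(2 * (z * noise / overhead) ** 2)


if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        print(n, round(z_score(n), 2))
    print("trials needed for z=3:", trials_needed(0.002, 0.05))
```

Under these assumptions the number of trials needed grows with the square of noise divided by overhead, so a smaller slowdown is still detectable in principle but takes far more trials to spot.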

Awesome! Would you mind sending me the email address where you'd like to get the google doc invite? I should be sending it out sometime next week.

Sweet! Would you mind PMing me the email address you'd like the google doc sent to? You should be getting it in around a week.

I think you should consider the legibility of the signals you send, but that should flow from a desire to monitor yourself so you can improve and be consistent with your higher goals. I feel like you’re assuming virtue signal means manipulative signal, and I suppose that’s my fault for taking a word whose meaning seems to have been too tainted and not being explicit about trying to reclaim it more straightforwardly as “emissions of a state of real virtue”.

Maybe in your framework it would be more accurate to say to LWers: “Don’t fall into the bad virtue signal of not doing anything legibly virtuous or with the intent of being virtuous. Doing so can make it easy to deceive yourself and unnecessarily hard to cooperate with others.”

It seems like the unacknowledged virtue signals among rationalists are 1) painful honesty, including erring on the side of the personally painful course of action when it’s not clear which is most honest, and dogpiling on anyone who seems to use PR, and 2) unhesitant updating (goodharting “shut up and multiply”) that doesn’t indulge the qualms of intuition. If they could just stop doing these, then I think they might be more inclined to use the legible virtue signals I’m advocating as a tool, or at the very least they would focus on developing other aspects of character.

I also think if thinking about signaling is too much of a mindfuck (and it has obviously been a serious mindfuck for the community) that not thinking about it and focusing on being good, as you’re suggesting, can be a great solution.

Any suggestions for new terms, and for strategies to prevent them from being co-opted too?

I think it's too early to say the true meaning of virtue signal is now tribal signal. I wish to reclaim the word before that happens. At the very least I want people to trip on the phrase a little when they reach for it lazily, because the idea of signaling genuine virtue is not so absurd that it could only be meant ironically. 

> If people optimize to gain status by donating and being vegan, you can't trust people who donate and are vegan to do moves that cost them status but that would result in other positive ends.

How are people supposed to know their moves are socially positive? 

Also I'm not saying to make those things the only markers of status. You seem to want to optimize for costly signals of "honesty", which I worry is being goodharted in this conversation.

> Editing pictures that you publish on your own website to remove uncomfortable information, is worse than just not speaking about certain information. It would be possible to simply not publish the photo. Deciding to edit it to remove information is a conscious choice that's a signal.

I don't know the full situation or what I would conclude about it, but I don't think your interpretation is QED on its face. Like I said, I feel like it is potentially more dishonest or misleading to seem to endorse Leverage. Idk why they didn't just not post the pictures at all, which seems the least potentially confusing or deceptive option, but the fact that they didn't doesn't lead me to conclude dishonesty without knowing more.

I actually think LWers tend toward the bad kind of virtue signaling with honesty, and they tend to define honesty as not doing themselves any favors with communication. (Makes sense considering Hanson's foundational influence.)
