

A couple of guesses for why we might see this, which don't seem to depend on property:

  • An obligation to act is much more freedom-constraining than a prohibition on an action. The more one considers every possible action under an obligation to take the most ethically optimal one, the less room one has for exploration, contemplation, or pursuing one's own selfish values. Prohibitions on actions do not have this effect.
  • The environment we evolved in offered roughly the same opportunities to commit harmful acts as today, but far fewer opportunities to take positive consequentialist action (and far less complicated situations to deal with). It was always possible to hurt your friends and suffer the consequences, but it was rare to have to think about the long-term consequences of every action.
  • The consequences of killing, stealing, and hurting people are easier to predict than those of altruistic actions. Resources are finite, so sharing them can be harmful or beneficial depending on the circumstances and on who they are shared with. Other people can defect or refuse to reciprocate. If you hurt someone, they are almost guaranteed to retaliate. If you help someone, there is no guarantee of a payoff for you.

It seems to construct an estimate of it by averaging a huge number of observations together before each update (for Dota 5v5, they say each batch is around a million observations, and I'm guessing it processes about a million batches). The surprising thing is that this works so well, and it allows leveraging of computational resources very easily.
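As a toy illustration of why averaging so many observations per update helps (this is not OpenAI's training code; the noise model and all function names are hypothetical), the sketch below estimates a noisy quantity once per "update" by averaging a batch of observations. The spread of those estimates across updates shrinks roughly as 1/sqrt(batch size), which is one reason huge batches make a very noisy signal usable.

```python
import random
import statistics


def noisy_observation(rng: random.Random, true_value: float = 1.0, noise: float = 10.0) -> float:
    """Hypothetical per-step signal: the true quantity plus heavy Gaussian noise."""
    return true_value + rng.gauss(0.0, noise)


def batch_estimate(rng: random.Random, batch_size: int) -> float:
    """Average batch_size noisy observations into a single estimate (one 'update')."""
    return sum(noisy_observation(rng) for _ in range(batch_size)) / batch_size


def spread_of_estimates(batch_size: int, num_updates: int = 200, seed: int = 0) -> float:
    """Standard deviation of the per-update estimate across many independent updates."""
    rng = random.Random(seed)
    estimates = [batch_estimate(rng, batch_size) for _ in range(num_updates)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Expect roughly 10 / sqrt(n): about 10, 1, and 0.2 for these batch sizes.
    for n in (1, 100, 2500):
        print(n, round(spread_of_estimates(n), 3))
```

The design choice here is deliberately minimal: a single scalar stands in for a gradient, so only the variance-reduction effect of batching is visible.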

My guess for how it deals with partial observability in a more philosophical sense is that it must be able to store an implicit model of the world in some way, in order to better predict the reward it will eventually observe. I'm beginning to wonder whether the distinction between partial and full observability is very useful after all. Even AlphaGo, which can see the whole board, can't fully see a whole bunch of other "spaces": possibly the entire action space, the space of every possible game trajectory, or the mathematical structure of play strategies. And yet it managed to infer enough about those spaces to become good at the game.

I don't know how hard it would be to do a side by side "FLOPS" comparison of Dota 5v5 vs AlphaGo / AlphaZero, but it seems like they are relatively similar in terms of computational cost required to achieve something close to "human level". However, as has been noted by many, Dota is a game of vastly more complexity because of its continuous state, partial observability, large action space, and time horizon. So what does it mean when it requires roughly similar orders of magnitude of compute to achieve the same level of ability as humans, using a fairly general architecture and learning algorithm?

Some responses to AlphaGo at the time were along the lines of "Don't worry too much about this, it looks very impressive, but the game still has a discrete action space and is fully observable, so that explains why this was easy."

I've been meditating since I was about 19, which was before I came across rationality / effective altruism. There is quite a bit of overlap between the things I've been able to learn from both schools of thought, but I think there are still a lot of very useful (possibly even necessary) things that can only be learned from meditative practices right now. This is not because rationality is inherently incapable of learning the same things, but because within rationality it would take very strong and well-developed theories, perhaps built on large-scale empirical observations of human behavior, to come to the same conclusions. With meditation, on the other hand, a lot of these same conclusions are just "obvious."

Most of these things have to do with subtle issues of psychology, particularly with values and morality. For example, before I began meditating, I generally believed that:

  • Moral principles could be determined logically from a set of axioms that were "self-evidently true" and that once I deduced those things, I would simply follow them.
  • The set of things that seemed to make me happy, like having friends, being in love, feeling accomplished, were not incompatible with true moral principles, and in fact were instrumentally helpful in achieving terminal moral goals.
  • I intrinsically value what is moral. If it ever seemed like I valued what was not moral, I could chalk it up to temporary or easily surmountable issues, like vestigial animal instincts or lack of willpower. Basically desires that could be easily overridden.
  • Pleasure, pain, and emotions were more like guidelines, things that made it possible to act quickly in certain situations. Insofar as certain forms of pleasure were "intrinsic values" (like love) they did not interfere with moral goals. They were not things that determined my behavior very strongly, and certainly they didn't have subtle cascading effects on the entire set of my beliefs.

After having meditated for a long time, many of these beliefs were eradicated. Right now it seems more likely that:

  • My values are not even consistent, let alone determined by moral principles. It's not clear that deducing a good set of moral principles could even change my values.
  • My values are malleable, but not easily malleable in a direction that can be controlled by me (not without a ton of meditation, anyway).
  • The formalization of my values in my mind is not a good predictor of what my actions will be. A better predictor involves far more short-term mechanisms in my psyche.
  • The beliefs I had prior to meditating were more likely constructed so that I could report them to other people in a way that would make them more likely to value and approve of me.
  • Values that truly do seem hard to deconstruct are surprisingly selfish. For example, I assumed that I valued approval from other humans because this was an instrumental goal in helping me judge the quality of my actions. It now seems more likely that social approval is in fact an intrinsic goal, which is very worrying to me in regards to my ability to attain my altruistic goals.

If it turns out that meditating has given me better self-reflective capabilities, and the things I've observed are accurate, then this has some pretty far-reaching implications. If I'm not extremely atypical, then most people are probably very blind to their own intrinsic values. This is a worrying prospect for the long-term efficacy of effective altruism.

Hopefully this isn't too controversial to say, but it seems to me like a lot of the main currents within EA are operating more-or-less along the lines of my prior-to-meditating beliefs. Here I'm thinking about the type of ethics where you are encouraged to maximize your altruistic output. Things like, "earn to give", "choose only the career that maximizes your ability to be altruistic", "donate as much of your time and energy as you can to being altruistic", etc. Of course EA thought is very diverse, so this doesn't represent all of it. But the way that my values currently seem structured, it's probably unrealistic that I could actually fulfill these, unless I experienced an abnormally large amount of happiness for each altruistic act that outweighed most of my other values. It's of course possible that I'm unusually selfish or even a sociopath, but my prior on that is very low.

On the other hand, if my values really are malleable, and it is possible to influence them, then it makes sense for me to spend a lot of time deciding how that process should proceed. This is only possible because my values are inconsistent. If they were consistent, it would be against my values to change them, but once a set of values is inconsistent, it could actually make sense to try to alter them. And meditation might turn out to be one of the ways to make these kinds of changes to your own mind.

It seems like in the vast majority of conversations, we find ourselves closer to the "exposed to the Deepak Chopra version of quantum mechanics and haven't seen the actual version yet" situation than we do to the "Arguing with someone who is far less experienced and knowledgeable than you are on this subject." In the latter case, it's easy to see why steelmanning would be counterproductive. If you're a professor trying to communicate a difficult subject to a student, and the student is having trouble understanding your position, it's unhelpful to try to "steelman" the student (i.e. try to present a logical-sounding but faulty argument in favor of what the student is saying), but it's far more helpful to the student to try to "pass their ITT" by modeling their confusions and intuitions, and then use that to try to help them understand the correct argument. I can imagine Eliezer and Holden finding themselves in this situation more often than not, since they are both experts in their respective fields and have spent many years refining their reasoning skills and fine-tuning the arguments to their various positions on things.

But in most situations, most of us, who may not know how strong our epistemological ground really is, are probably using some mixture of flawed intuitions and logic to present our understanding of a topic. We might also be modeling people we really respect as being in a similar situation to ours. In that case the line between steelmanning and ITT becomes a bit blurry. If I know that both of us are using some combination of intuition (prone to bias and sometimes hard to describe), importance weighting of various facts, and different logical pathways to reach our conclusions, then both trying to pass each other's ITT and steelmanning each other potentially have some utility. The former might help iron out differences in our intuitions and harder-to-formalize disagreements, and the latter might help with reaching more formal versions of arguments, or reasoning paths that have yet to be explored.

I do find it easy to imagine that as I progress in my understanding of and expertise in a particular topic, the benefits of steelmanning relative to ITT would decrease. But it's not clear to me that I (or anyone, outside of the areas they spend most of their time thinking about) have actually reached this point in situations where we are debating with, or cooperating on a problem with, respected peers.

I don't see him as arguing against steelmanning. But the opposite of steelmanning isn't arguing against an idea directly. You've got to be able to steelman an opponent's argument well in order to argue against it well too, or perhaps determine that you agree with it. In any case, I'm not sure how to read a case for locally valid argumentation steps as being in favor of not doing this. Wouldn't it help you understand how people arrive at their conclusions?

I would also like to have a little jingle or ringtone play every time someone passes over my comments, please implement for Karma 3.0 thanks

What's most unappealing to me about modern, commercialized aesthetics is the degree to which the bandwidth is forced to be extremely high: something I'd call the standardization of aesthetics. When I walk down the street in the financial district of SF, there's not much variety to be found in people's visual styles. Sure, everything looks really nice, but I can't say it doesn't get boring after a while. It's clear that a lot of information is being packed into people's outfits, so I should be able to infer a huge amount about someone just by looking at them. Same thing with websites. There's really only one website design. Can it truly be said that there is something inherently optimal about these designs? I strongly suspect not. Other forces are at play that guarantee convergence, and they don't depend on optimality.

Part of it might be the extremely high cost of defection. Since aesthetics is a type of signalling mechanism, most of what Robin Hanson says applies here. It's usually just not worth it to be an iconoclast or truly original. And at some point we start believing the signals are inherently meaningful, because they've been around for so long. But all it takes is a look at the different types of beauty produced by other cultures, or at different points in human history, to see that this is not the case. The color orange in Silicon Valley might represent "innovation" or "ingenuity" (look at TensorFlow's color scheme), but the orange robes of Buddhist monks evoke serenity, peace and compassion (though of course the color originally depended on the dyes that were available). However, one can also observe that there is little variety within each culture, suggesting that the same forces pushing towards aesthetic convergence are at play there too.

The sum of the evidence suggests to me that I am getting an infinitesimal fraction of the possible pleasant aesthetic experiences which could feasibly be created by someone given that they were not subject to signalling constraints. This seems deeply disappointing.

It seems like this objection might be empirically testable, and in fact might be testable even with the capabilities we have right now. For example, Paul posits that AlphaZero is a special case of his amplification scheme. In his post on AlphaZero, he doesn't mention there being an aligned "H" as part of the set-up, but if we imagine there to be one, it seems like the "H" in the AlphaZero situation is really just a fixed, immutable calculation that determines the game state (win/loss/etc.) that can be performed with any board input, with no risk of the calculation being incorrectly performed, and no uncertainty of the result. The entire board is visible to H, and every board state can be evaluated by H. H does not need to consult A for assistance in determining the game state, and A does not suggest actions that H should take (H always takes one action). The agent A does not choose which portions of the board are visible to H. Because of this, "H" in this scenario might be better understood as an immutable property of the environment rather than an agent that interacts with A and is influenced by A. My question is, to what degree is the stable convergence of AlphaZero dependent on these properties? And can we alter the setup of AlphaZero such that some or all of these properties are violated? If so, then it seems as though we should be able to actually code up a version in which H still wants to "win", but breaks the independence between A and H, and then see if this results in "weirder" or unstable behavior.
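To make the proposed experiment a little more concrete, here is a minimal sketch using tic-tac-toe as a stand-in for Go (this is not AlphaZero's or Paul's code; `fixed_judge`, `masked_judge`, and the board encoding are hypothetical). `fixed_judge` plays the role of H as an immutable property of the environment: it sees the whole board and cannot be influenced. `masked_judge` breaks one of those properties by letting the agent choose which cells H can see, which opens the door to the agent hiding a loss instead of avoiding it.

```python
from typing import Optional, Sequence

# Hypothetical encoding: 9 cells in row-major order, each "X", "O", or " ".
Board = Sequence[str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def fixed_judge(board: Board) -> Optional[str]:
    """H as a fixed property of the environment: sees every cell, is deterministic,
    and cannot be influenced by the agent. Returns the winner, or None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def masked_judge(board: Board, visible: set[int]) -> Optional[str]:
    """Hypothetical variant where the agent chooses which cells H can see.
    Hidden cells are treated as empty, so a win on hidden cells goes unscored."""
    seen = [cell if i in visible else " " for i, cell in enumerate(board)]
    return fixed_judge(seen)


if __name__ == "__main__":
    # O has won along the top row.
    board = ["O", "O", "O",
             "X", "X", " ",
             " ", " ", "X"]
    print(fixed_judge(board))                        # O
    print(masked_judge(board, {3, 4, 5, 6, 7, 8}))  # None: X hid the top row from H
```

Training an agent against `masked_judge` instead of `fixed_judge` would be one cheap way to check whether the independence between A and H matters for stable convergence.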

I can't emphasize enough how important the thing you're mentioning here is, and I believe it points to the crux of the issue more directly than most other things that have been said so far. 

We can often weakman postmodernism as making basically the same claim, but this doesn't change the fact that a lot of people are running an algorithm in their head with the textual description "there is no outside reality, only things that happen in my mind." This algorithm seems to produce different behaviors in people than the algorithm "outside reality exists and is important." I think the first algorithm tends to produce behaviors that are a lot more dangerous than the latter, even though it's always possible to make philosophical arguments that make one algorithm seem much more likely to be "true" than the other. It's crucial to realize that not everyone is running the perfectly steelmanned version of such algorithms. Algorithms for updating our beliefs based on observations of how we update our beliefs are very tricky to get right.

Even though it's valid to make observations of the form "I observe that I am running a process that produces the belief X in me", it is definitely very risky to create a social norm that says such statements are superior to statements like "X is true" because such norms create the tendency to assign less validity to statements like "X is true". In other words, such a norm can itself become a process that produces the belief "X is not true" when we don't necessarily want to move our beliefs on X just because we begin to understand how the processes work. It's very easy to go from "X is true" to "I observe I believe X is true" to "I observe there are social and emotional influences on my beliefs" to "There are social and emotional influences on my belief in X" to finally "X is not true" and I can't help but feel a mistake is being made somewhere in that process. 
