Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss

Wiki Contributions


To elaborate on that, Shear is presumably saying exactly as much as he is allowed to say in public. This implies that if the removal had nothing to do with safety, then he would say "The board did not remove Sam over anything to do with safety". His inserting of that qualifier implies that he couldn't make a statement that broad, and therefore that safety considerations were involved in the removal.

I expect safety of that to be at zero

At least it refuses to give you instructions for making cocaine.

For site libraries, there is indeed no alternative since you have to use some libraries to get anything done, so there you do have to do it on a case-by-case basis. In the case of exposing user data, there is an alternative - limiting yourself to only public data. (See also my reply to jacobjacob.)

we're a small team and the world is on fire, and I don't think we should really be prioritising making Dialogue Matching robust to this kind of adversarial cyber threat for information of comparable scope and sensitivity! 

I agree that it wouldn't be a very good use of your resources. But there's a simple solution for that - only use data that's already public and users have consented to you using. (Or offer an explicit opt-in where that isn't the case.)

I do agree that in this specific instance, there's probably little harm in the information being revealed. But I generally also don't think that that's the site admin's call to make, even if I happen to agree with that call in some particular instances. A user may have all kinds of reasons to want to keep some information about themselves private, some of those reasons/kinds of information being very idiosyncratic and hard to know in advance. The only way to respect every user's preferences for privacy, even the unusual ones, is by letting them control what information is used and not make any of those calls on their behalf.

My point is less about the individual example than the overall decision algorithm. Even if you're correct that in this specific instance, you can verify the whole trail of implications and be certain that nothing bad happens, a general policy of "figure it out on a case-by-case basis and only do it when it feels safe" means that you're probably going to make a mistake eventually, given how easy it is to make a mistake in this domain.

I've wondered the same thing; I've suggested before merging them, so that posts in shortform would automatically be posted into that month's open thread and vice versa. As it is, I every now and then can't decide which one to post in, so I post in neither.

We tenatively postulated it would be fine to do this as long as seeing a name on your match page gave no more than like a 5:1 update about those people having checked you. 

I would strongly advocate against this kind of thought; any such decision-making procedure relies on the assumption that you correctly figure out all the ways such information can be used, and that there isn't a clever way an adversary can extract more information than you had thought. This is bound to fail - people come up with clever ways to extract more private information than anticipated all the time. For example:

  • Timing Attacks on Web Privacy:
    • We describe a class of attacks that can compromise the privacy of users’ Web-browsing histories. The attacks allow a malicious Web site to determine whether or not the user has recently visited some other, unrelated Web page. The malicious page can determine this information by measuring the time the user’s browser requires to perform certain operations. Since browsers perform various forms of caching, the time required for operations depends on the user’s browsing history; this paper shows that the resulting time variations convey enough information to compromise users’ privacy.
  • Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)
    • We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.
  • De-anonymizing Social Networks 
    • We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy “sybil” nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary’s auxiliary information is small.
  • On the Anonymity of Home/Work Location Pairs
    • Many applications benefit from user location data, but location data raises privacy concerns. Anonymization can protect privacy, but identities can sometimes be inferred from supposedly anonymous data. This paper studies a new attack on the anonymity of location data. We show that if the approximate locations of an individual’s home and workplace can both be deduced from a location trace, then the median size of the individual’s anonymity set in the U.S. working population is 1, 21 and 34,980, for locations known at the granularity of a census block, census track and county respectively. The location data of people who live and work in different regions can be re-identified even more easily. Our results show that the threat of re-identification for location data is much greater when the individual’s home and work locations can both be deduced from the data.
  • Bubble Trouble: Off-Line De-Anonymization of Bubble Forms
    • Fill-in-the-bubble forms are widely used for surveys, election ballots, and standardized tests. In these and other scenarios, use of the forms comes with an implicit assumption that individuals’ bubble markings themselves are not identifying. This work challenges this assumption, demonstrating that fill-in-the-bubble forms could convey a respondent’s identity even in the absence of explicit identifying information. We develop methods to capture the unique features of a marked bubble and use machine learning to isolate characteristics indicative of its creator. Using surveys from more than ninety individuals, we apply these techniques and successfully reidentify individuals from markings alone with over 50% accuracy.

Hmm, I would actually expect neurotypicals to find this advice more useful, since they're more likely to have thoughts like "I can't do that, that'd be weird" while the stereotypical autist would be blissfully unaware of there being anything weird about it.

No worries! Yeah, I agree with that. These paragraphs were actually trying to explicitly say that things may very well not work out in the end, but maybe that wasn't clear enough:

Love doesn’t always win. There are situations where loyalty, cooperation, and love win, and there are situations where disloyalty, selfishness, and hatred win. If that wasn’t the case, humans wouldn’t be so clearly capable of both.

It’s possible for people and cultures to settle into stable equilibria where trust and happiness dominate and become increasingly beneficial for everyone, but also for them to settle into stable equilibria where mistrust and misery dominate, or anything in between.

I don't think any of these arguments depend crucially on whether there is a sole explicit goal of the training process, or if the goal of the training process changes a bunch. The only thing the argument depends on is whether there exist such abstract drives/goals 

I agree that they don't depend on that. Your arguments are also substantially different from the ones I was criticizing! The ones I was responding were ones like the following:

The central analogy here is that optimizing apes for inclusive genetic fitness (IGF) doesn't make the resulting humans optimize mentally for IGF. Like, sure, the apes are eating because they have a hunger instinct and having sex because it feels good—but it's not like they could be eating/fornicating due to explicit reasoning about how those activities lead to more IGF. They can't yet perform the sort of abstract reasoning that would correctly justify those actions in terms of IGF. And then, when they start to generalize well in the way of humans, they predictably don't suddenly start eating/fornicating because of abstract reasoning about IGF, even though they now could. Instead, they invent condoms, and fight you if you try to remove their enjoyment of good food (telling them to just calculate IGF manually). The alignment properties you lauded before the capabilities started to generalize, predictably fail to generalize with the capabilities. (A central AI alignment problem: capabilities generalization, and the sharp left turn)

15. [...] We didn't break alignment with the 'inclusive reproductive fitness' outer loss function, immediately after the introduction of farming - something like 40,000 years into a 50,000 year Cro-Magnon takeoff, as was itself running very quickly relative to the outer optimization loop of natural selection.  Instead, we got a lot of technology more advanced than was in the ancestral environment, including contraception, in one very fast burst relative to the speed of the outer optimization loop, late in the general intelligence game. [...]

16. Even if you train really hard on an exact loss function, that doesn't thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments.  Humans don't explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction.  (AGI Ruin: A List of Lethalities)

Those arguments are explicitly premised on humans having been optimized for IGF, which is implied to be a single thing. As I understand it, your argument is just that humans now have some very different behaviors from the ones they used to have, omitting any claims of what evolution originally optimized us for, so I see it as making a very different sort of claim.

To respond to your argument itself:

I agree that there are drives for which the behavior looks very different from anything that we did in the ancestral environment. But does very different-looking behavior by itself constitute a sharp left turn relative to our original values?

I would think that if humans had experienced a sharp left turn, then the values of our early ancestors should look unrecognizable to us, and vice versa. And certainly, there do seem to be quite a few things that our values differ on - modern notions like universal human rights and living a good life while working in an office might seem quite alien and repulsive to some tribal warrior who values valor in combat and killing and enslaving the neighboring tribe, for instance.

At the same time... I think we can still basically recognize and understand the values of that tribal warrior, even if we don't share them. We do still understand what's attractive about valor, power, and prowess, and continue to enjoy those kinds of values in less destructive forms in sports, games, and fiction. We can read Gilgamesh or Homer or Shakespeare and basically get what the characters are motivated by and why they are doing the things they're doing. An anthropologist can go to a remote tribe to live among them and report that they have the same cultural and psychological universals as everyone else and come away with at least some basic understanding of how they think and why.

It's true that humans couldn't eradicate diseases before. But if you went to people very far back in time and told them a story about a group of humans who invented a powerful magic that could destroy diseases forever and then worked hard to do so... then the people of that time would not understand all of the technical details, and maybe they'd wonder why we'd bother bringing the cure to all of humanity rather than just our tribe (though Prometheus is at least commonly described as stealing fire for all of humanity, so maybe not), but I don't think they would find it a particularly alien or unusual motivation otherwise. Humans have hated disease for a very long time, and if they'd lost any loved ones to the particular disease we were eradicating they might even cheer for our doctors and want to celebrate them as heroes.

Similarly, humans have always gone on voyages of exploration - e.g. the Pacific islands were discovered and settled long ago by humans going on long sea voyages - so they'd probably have no difficulty relating to a story about sorcerers going to explore the moon, or of two tribes racing for the glory of getting there first. Babylonians had invented the quadratic formula by 1600 BC and apparently had a form of Fourier analysis by 300 BC, so the math nerds among them would probably have some appreciation of modern-day advanced math if it was explained to them. The Greek philosophers argued over epistemology, and there were apparently instructions on how to animate golems (arguably AGI-like) around by the late 12th/early 13th century.

So I agree that the same fundamental values and drives can create very different behavior in different contexts... but if it is still driven by the same fundamental values and drives in a way that people across time might find relatable, why is that a sharp left turn? Analogizing that to AI, it would seem to imply that if the AI generalized its drives in that kind of way when it came to novel contexts, then we would generally still be happy about the way it had generalized them.

This still leaves us with that tribal warrior disgusted with our modern-day weak ways. I think that a lot of what is going on with him is that he has developed particular strategies for fulfilling his own fundamental drives - being a successful warrior was the way you got what you wanted back in that day - and internalized them as a part of his aesthetic of what he finds beautiful and what he finds disgusting. But it also looks to me like this kind of learning is much more malleable than people generally expect. One's sense of aesthetics can be updated by propagating new facts into it, and strongly-held identities (such as "I am a technical person") can change in response to new kinds of strategies becoming viable, and generally many (I think most) deep-seated emotional patterns can at least in principle be updated. (Generally, I think of human values in terms of a two-level model, where the underlying "deep values" are relatively constant, with emotional responses, aesthetics, identities, and so forth being learned strategies for fulfilling those deep values. The strategies are at least in principle updatable, subject to genetic constraints such as the person's innate temperament that may be more hardcoded.)

I think that the tribal warrior would be disgusted by our society because he would rightly recognize that we have the kinds of behavior patterns that wouldn't bring glory in his society and that his tribesmen would find it shameful to associate with, and also that trying to make it in our society would require him to unlearn a lot of stuff that he was deeply invested in. But if he was capable of making the update that there were still ways for him to earn love, respect, power, and all the other deep values that his warfighting behavior had originally developed to get... then he might come to see our society as not that horrible after all.

I am confused by your AlphaGo argument because "winning states of the board" looks very different depending on what kinds of tactics your opponent uses, in a very similar way to how "surviving and reproducing" looks very different depending on what kinds of hazards are in the environment. 

I don't think the actual victory states look substantially different? They're all ones where AlphaGo has more territory than the other player, even if the details of how you get there are going to be different.

I predict that AlphaGo is actually not doing that much direct optimization in the sense of an abstract drive to win that it reasons about, but rather has a bunch of random drives piled up that cover various kinds of situations that happen in Go.

Yeah, I would expect this as well, but those random drives would still be systematically shaped in a consistent direction (that which brings you closer to a victory state).

Load More