Why Everyone (Else) Is a Hypocrite: Evolution and the Modular Mind
Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss

in particular, we don't typically think of freedom as a property of relationships, but rather a property of individuals.

How about "spaciousness" (as in the relationship giving both individuals the space to move/act as they prefer) instead of freedom/trust?


Some notable/famous signatories that I noted: Geoffrey Hinton, Yoshua Bengio, Demis Hassabis (DeepMind CEO), Sam Altman (OpenAI CEO), Dario Amodei (Anthropic CEO), Stuart Russell, Peter Norvig, Eric Horvitz (Chief Scientific Officer at Microsoft), David Chalmers, Daniel Dennett, Bruce Schneier, Andy Clark (the guy who wrote Surfing Uncertainty), Emad Mostaque (Stability AI CEO), Lex Fridman, Sam Harris.

Edited to add: a more detailed listing from this post:

Signatories include notable philosophers, ethicists, legal scholars, economists, physicists, political scientists, pandemic scientists, nuclear scientists, and climate scientists. [...]

Signatories of the statement include:

  • The authors of the standard textbook on Artificial Intelligence (Stuart Russell and Peter Norvig)
  • Two authors of the standard textbook on Deep Learning (Ian Goodfellow and Yoshua Bengio)
  • An author of the standard textbook on Reinforcement Learning (Andrew Barto)
  • Three Turing Award winners (Geoffrey Hinton, Yoshua Bengio, and Martin Hellman)
  • CEOs of top AI labs: Sam Altman, Demis Hassabis, and Dario Amodei
  • Executives from Microsoft, OpenAI, Google, Google DeepMind, and Anthropic
  • AI professors from Chinese universities
  • The scientists behind famous AI systems such as AlphaGo and every version of GPT (David Silver, Ilya Sutskever)
  • The top two most cited computer scientists (Hinton and Bengio), and the most cited scholar in computer security and privacy (Dawn Song)

Relevant: Goh et al. finding multimodal neurons (ones responding to the same subject in photographs, drawings, and images of their name) in the CLIP image model, including ones for Spiderman, USA, Donald Trump, Catholicism, teenage, anime, birthdays, Minecraft, Nike, and others.

To caption images on the Internet, humans rely on cultural knowledge. If you try captioning the popular images of a foreign place, you’ll quickly find your object and scene recognition skills aren't enough. You can't caption photos at a stadium without recognizing the sport, and you may even need to know specific players to get the caption right. Pictures of politicians and celebrities speaking are even more difficult to caption if you don’t know who’s talking and what they talk about, and these are some of the most popular pictures on the Internet. Some public figures elicit strong reactions, which may influence online discussion and captions regardless of other content.

With this in mind, perhaps it’s unsurprising that the model invests significant capacity in representing specific public and historical figures — especially those that are emotional or inflammatory. A Jesus Christ neuron detects Christian symbols like crosses and crowns of thorns, paintings of Jesus, his written name, and feature visualization shows him as a baby in the arms of the Virgin Mary. A Spiderman neuron recognizes the masked hero and knows his secret identity, Peter Parker. It also responds to images, text, and drawings of heroes and villains from Spiderman movies and comics over the last half-century. A Hitler neuron learns to detect his face and body, symbols of the Nazi party, relevant historical documents, and other loosely related concepts like German food. Feature visualization shows swastikas and Hitler seemingly doing a Nazi salute.

Which people the model develops dedicated neurons for is stochastic, but seems correlated with the person's prevalence across the dataset and the intensity with which people respond to them. The one person we’ve found in every CLIP model is Donald Trump. It strongly responds to images of him across a wide variety of settings, including effigies and caricatures in many artistic mediums, as well as more weakly activating for people he’s worked closely with like Mike Pence and Steve Bannon. It also responds to his political symbols and messaging (eg. “The Wall” and “Make America Great Again” hats). On the other hand, it most *negatively* activates to musicians like Nicki Minaj and Eminem, video games like Fortnite, civil rights activists like Martin Luther King Jr., and LGBT symbols like rainbow flags.

Many commenters seem to be reading this post as implying something like slavery and violence being good or at least morally okay. Which is weird, since I didn't get that impression - especially since the poster explicitly says they don't support slavery and even quotes someone saying that a defense of slavery was an "idiotic" explanation.

I don't read the post as making any claim about what is ultimately right or wrong. Rather, I read it as a caution similar to the common points of "how sure are you that you would have made the morally correct choice if you had been born as someone benefiting from slavery back when it was a thing" combined with "the values that we endorse are strongly shaped by self-interest and motivated cognition". Both are the kinds of sentiments that were expressed many times in the Sequences as well as on the original Overcoming Bias blog.

Then you quote Samuel Cartwright "conjuring up creatively compelling excuses" for slavery, and never argue against the quotation.

Do you mean this quote?

Gurwinder cites exactly such an example with the 19th century physician Samuel A. Cartwright:

A strong believer in slavery, he used his learning to avoid the clear and simple realization that slaves who tried to escape didn’t want to be slaves, and instead diagnosed them as suffering from a mental disorder he called drapetomania, which could be remedied by “whipping the devil” out of them. It’s an explanation so idiotic only an intellectual could think of it.

That's someone criticizing Cartwright's practice of coming up with such excuses, so having the quote is already an argument against Cartwright (and thus slavery). Arguing against the quotation would be arguing for slavery and oppression.

outweigh an extra 3 to 7 years of working on alignment

Another relevant-seeming question is the extent to which LLMs have been a requirement for alignment progress. It seems to me like LLMs have shown some earlier assumptions about alignment to be incorrect (e.g. pre-LLM discourse had lots of arguments about how AIs have to be agentic in a way that wasn't aware of the possibility of simulators; things like the Outcome Pump thought experiment feel less like they show alignment to be really hard than they did before, given that an Outcome Pump driven by something like an LLM would probably get the task done right).

In old alignment writing, there seemed to be an assumption that an AGI's mind would act more like a computer program than like a human mind. Now that we are seeing an increasing number of connections between how ANNs seem to work and how the brain seems to work, it looks to me as if the AGI might end up resembling a human mind quite a lot as well. Not only does this weaken the conclusions of some previous writing, it also makes it possible to formulate approaches to alignment that draw stronger inspiration from the human mind, such as my preference fulfillment hypothesis. Even if you think that that one is implausible, various approaches to LLM interpretability look like they might provide insights into how later AGIs might work, which is the first time that we've gotten something like experimental data (as opposed to armchair theorizing) on the workings of a proto-AGI.

What this suggests to me is that if OpenAI hadn't bet on LLMs, we effectively wouldn't have gotten more time to do alignment research, because most alignment research done before an understanding of LLMs would have been a dead end. And that actually solving alignment may require people who have internalized the paradigm shift represented by LLMs and who figure out solutions based on that. Under this model, even if we are in an insight-constrained world, OpenAI mostly hasn't burned away effective years of alignment research (because alignment research carried out before we had LLMs would have been mostly useless anyway).

Thanks, I'd tried self-administered EMDR sometime before and didn't get much out of it. Now I gave it another shot and it caused some stuff to surface, so it seemed to be doing at least something even if I didn't get to the root of the issue yet.

Do you have any thoughts on how I should try to balance the external stimuli vs. the internal content? I notice that it's easy for either the EMDR stimuli to push the emotional content out of consciousness or vice versa. Should I try to keep them exactly balanced, or predominantly emotional content with some stimuli, or predominantly stimuli with some emotional content?

I also wondered: when I was focusing on a felt sense and memory fragments started coming up, should I "make more room" for those memory fragments, or just ignore them and keep allocating exactly the same amount of mental space to the felt sense?

That survey result feels hard to square with reports like this:

Three weeks ago I went to a soccer match between Shanghai SIPG and FC Seoul. After the game the traffic around the area was quite heavy. I was waiting for a pedestrian light to turn green when a couple in their electric scooter went through a red light, an old lady hit them and the three of them fell to the ground. The couple got up, yelled something to the old lady and then just got on the scooter and left. The old lady stayed there for some minutes while people passing by didn’t even try to help her.

This may be a weird situation for a foreigner who hasn’t been in China before, but it’s a normal thing to see here. When an accident occurs, people would not try to help others and would try to avoid any contact with the people involved in it.

While individualism in China is a big thing, this situation is more related to the fear of being accused as the responsible of the accident, even when you just tried to help.

The most popular case happened in the city of Nanjing, a city located at the west of Shanghai. The year was 2006 when Xu Shoulan, an old lady trying to get out of a bus, fell and broke her femur. Peng Yu, was passing by and helped her taking her to the hospital and giving her ¥200 (~30 USD) to pay for her treatment. After the first diagnosis Xu needed a femur replacement surgery, but she refused to pay it by herself so she demanded Peng to pay for it, as he was the responsible of the accident according to her. She sued him and after six months she won and Peng needed to cover all the medical expenses of the old lady. The court stated that “no one would, in good conscience, help someone unless they felt guilty”.

While this incident wasn’t the first, it was very popular and it showed one of the non written rules of China. If you help someone it’s because you feel guilty of what happened, so in some way you were or are involved in the accident or incident.

After the incident more cases like this appeared, usually with old people involved and suing their helpers because “if you weren’t responsible, why would you stopped to help me”. So people just stopped helping each other.

The page that you linked also has this caveat:

Measures of trust from attitudinal survey questions remain the most common source of data on trust. Yet academic studies have shown that these measures of trust are generally weak predictors of actual trusting behaviour.

In general I think that cross-cultural surveys asking for things like "how much do you agree with the statement that most people can be trusted" convey very little information. While there are some commonalities, there are also significant differences in how "trust" is understood between different cultures (e.g. 1, 2), so it's not clear to what extent people in different countries can be described as actually answering the same question.

That said... I think these results do show that while there's something very real that Richard's post is pointing at, "trust" might be too general of a term. Some of those links say that in more authoritarian cultures, people are considered to be trustworthy if they show respect to their superiors - which reads to me as saying that you're trusted if you show that you will obey. Which I think fits the model in this post - in conditions of scarcity, everyone needs to do things in a very specific way rather than debating the decision forever or, worse, rebelling against their leaders. And then the leaders will trust those underlings who have shown themselves willing to obey in that way.

But "believing that Kaj is trustworthy (in that he will obey orders to the letter and show proper respect)" is a different kind of trust than "believing that Kaj is trustworthy (in that he will do something sensible and won't hurt you even if he is allowed to use his own initiative)".

Maybe a more accurate alternative title would be something like "Coercion is an adaptation to scarcity; freedom is an adaptation to abundance".

Oops, never got around to answering this question.

When you ask how likely it is that it's an artifact of the therapeutic procedure, what's the alternative hypothesis you have in mind? What would not being an artifact of the therapeutic procedure mean?
