Book 5 of the Sequences Highlights

To understand reality, especially on confusing topics, it's important to understand the mental processes involved in forming concepts and using words to speak about them.

First Post: Taboo Your Words

Recent Discussion

This may be a bad idea that, if acted upon, could do more harm than good, but I propose that we find a way to induce a controlled and stable state of hypomania in people who are willing to enhance their functional capabilities.

Hypomania is a state that frequently occurs in people with bipolar disorder, which in its various forms affects up to 3% of the population. In contrast to full-blown mania, characterized by delusions, racing thoughts, and intense euphoria and/or dysphoria, hypomania does not incapacitate a person on the individual or social level. Instead, creative thinking and cognition, as well as overall energy level, are considerably enhanced, while the need for sleep and rest is significantly reduced. Mood is well above the given individual's baseline. Often, hypomania...

Answer by MvB, Jun 16, 2023

Thinking about the responses, I have come to the conclusion that this is a rather bad idea. The positive symptoms, which I remember very intensely, just don't make up for the decline in critical reflection on what one is actually doing, thinking and feeling. I had suppressed that to some extent, but it is clearly a major part of what I went through. Thanks for pointing out this aspect. Personally, I will probably try to work on healthy habits and routines and stay on my medication (which I would have done anyway).

This review was originally written for the Astral Codex Ten Book Review Contest. Unfortunately it didn't make it as one of the finalists, but since I made use of the LessWrong proofreading/feedback service, I am reposting it here. It can also be found on my gender blog.

If I ask ChatGPT to explain transgender people to me, then it often retreats into vague discussions of gender identity. It is very hard to get it to explain what these things mean, in terms of actual experiences people might have. And that might not be a coincidence - the concepts used to understand transness seem to be the result of a complicated political negotiation, at least as much as they are optimized to communicate people’s experiences.

Some people claim to do better,...

alternat, 15h
I have a somewhat meta-level question to people who sympathize with Blanchardian writing: what is the interest of this research? What questions could we answer by knowing to what extent sexuality plays a causal role in transition? Are there decisions we make as a result of this? Is it for the benefit of trans people themselves that they think of themselves in this way?

I come across a disproportionate interest in noting the (generally) taboo sexual fetishes that are (perhaps) more common among trans people. I would search for quantitative evidence by going to Google Trends and looking at autogynephilia as compared to some similar term for a different social group, but I'm not even sure what other term I'd plug in just because I've never come across something applicable. Correct me if my perception is wrong, though.

From my perspective, this discussion feels akin to questions like "is (x group) socially bad in (y abstract way that is very difficult to answer without splitting hairs over definition)?". You don't really learn much that is actually concrete or useful, but you solidify an ontology that associates a socially disadvantaged group with undeserved toxicity.

To be clear, I don't mean to attack the character of OP with this post, in particular because I think I trust lesswrong more than other places to have discussions that are (and should be) generally taboo. Also, to the extent that I am familiar with Blanchard's research, I +1 Orual's reply. I feel fairly convinced that the legitimization of this line of questioning is bad for societal opinion of trans people in an unearned way. Blanchard's research is frequently cited by pundits with the strongest anti-trans political opinions. However, I only feel weakly that it is a basically useless line of questioning.
tailcalled, 30m
Some groups of people I have noticed being into Blanchardianism and things superficially resembling Blanchardianism:

1. "Human biodiversity" people, that is, intellectually inclined racists/sexists. They are usually conservatives trying to build models of society which acknowledge human differences as causes of group outcomes and ignore the relevance of ideology. A major reason they do this is to have explanations to counter antiracists/feminists and progressives who are trying to achieve group equality through affirmative action. Blanchardianism is important to them partly because gynephilic trans women's traits in many ways resemble those of biological males and gender ideology explains that through socialization forces, so Blanchardianism becomes a counternarrative they can appeal to in order to dismiss these socialization forces, which they want to do because they are sexist. And Blanchardianism is also important to them because they are ordinarily conservative, so they kind of want to say that trans women are socially bad in an abstract way.

2. Miscellaneous people who have conflict with trans women in various contexts, e.g. people who read too much Mumsnet and JK Rowling and now hate trans women, transwidows, female athletes who have to compete against trans women, non-GAMP lesbians whose dating sites have been overrun by trans women, feminists or HSTSs or transmeds playing respectability politics against conservatives who make fun of them for trans stuff. I think these are the main ones you are thinking of in your comment.

3. ?Some unknown subset? (perhaps disproportionately masochistic?) of trans women who don't really feel like the standard trans narratives accurately match them, and feel that autogynephilia models are more accurate.

4. Autogynephilic men who either don't want to transition and are using the term "autogynephilia" to explain how they differ from trans wome

Actually upon further thought, the heritability section of Autoheterosexuality shows that Phil also has some elements of group 1.

This is a linkpost for https://dynomight.net/aliens/

Some suggest there might be alien aircraft on Earth now. The argument goes something like this:

(1) A priori, there’s no reason there shouldn’t be alien aircraft. Earth is 4.54 billion years old, but the universe is 13.7 billion years old, and within a billion light years of Earth there are something like 5 × 10¹⁴ stars. Most of those stars have planets, and if an alien civilization arose anywhere and built a von Neumann probe, those probes would spread everywhere.

(2) We have tons of observations that would be more likely if there were alien aircraft around than if there weren’t. These include:

  • Vast numbers of anecdotal reports from pilots.
  • Videos that appear to show objects with flight characteristics far beyond known human capabilities.
  • Senators—with access to classified information—raising concerns about
...
dr_s, 3h
The aliens are here and they're super advanced, but they're also kind of klutzes.
MichaelDickens, 3h
I think the evidence against (most) miracles is stronger because they violate the laws of physics. The same could be said for a few UAPs: if a UAP moves in a way that is physically impossible as far as we know, that's strong evidence against it being aliens, because aliens still have to follow the laws of physics. How would a tic-tac accelerate at 700g with no visible propulsion, even positing the existence of super-advanced technology? The best I can think of off the top of my head is that it's using an extremely strong magnet to manipulate its position relative to Earth's magnetic field. But that would require an absurd amount of energy, so it would probably need to be powered by a tiny cold fusion reactor (which may be physically impossible). It would also need to avoid emitting noticeable amounts of heat; even with some sort of hyper-insulating shell, it would need internal parts that don't evaporate under that much heat, and it would still need to avoid emitting the massive amount of heat generated by friction with the air.
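To put a rough number on "absurd amount of energy": here is a back-of-envelope sketch, where the craft's mass and airspeed are purely illustrative assumptions (no such figures appear in the reports), not measurements.

```python
# Back-of-envelope power estimate for sustaining 700g acceleration.
# Mass and speed below are illustrative assumptions, not measured values.
G = 9.81               # m/s^2, standard gravity
mass = 1000.0          # kg (assumed craft mass)
speed = 300.0          # m/s (assumed airspeed during the maneuver)

accel = 700 * G        # ~6.9e3 m/s^2
force = mass * accel   # N, thrust required to sustain the acceleration
power = force * speed  # W, mechanical power delivered at that speed

print(f"force ~ {force:.2e} N, power ~ {power:.2e} W")
```

Under these assumptions the required power comes out on the order of gigawatts, i.e. roughly a full-scale power plant's output packed into a tic-tac, which is the intuition behind "absurd amount of energy" above.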
Dagon, 2h
You do your argument a disservice when you conflate "laws of physics" with "extrapolations of current materials and energy engineering". If the speed of light isn't violated, the force involved isn't so great that the reaction would be measurable as changes in Earth's rotation or something, and the energy is much less than the theoretical limit of a small amount of antimatter, then it's not the "laws of physics" that is the constraint. Note I'm not saying you're wrong in considering it very unlikely, but hyperbole doesn't help in thinking or in discussion (here on LW, at least - it's common and perhaps useful in other contexts).

Ok, fair point, I was going too far in assuming that the sort of engineering necessary was physically impossible.

There is an idea that I’ve sometimes heard around rationalist and EA circles, that goes something like “you shouldn’t ever feel safe, because nobody is actually ever safe”. I think there are at least two major variations of this:

  1. You shouldn’t ever feel safe, because something bad could happen at any time. To think otherwise is an error of rationality.
  2. You shouldn’t ever feel safe, because AI timelines might be short and we might be about to die soon.[1] Thus, to think that you’re safe is making an error of rationality.

I’m going to argue against both of these. If you already feel like both of these are obviously wrong, you might not need the rest of this post.

Note that I only intend to dispute the intellectual argument that these are making....

I want to mention here that the war example is an adversarial scenario, or adversarial game, and applying an adversarial frame is usually not the correct decision. Importantly, since the most perverse scenarios usually can't be dealt with without exotic physics, for computational-complexity reasons, you usually shouldn't focus on adversarial scenarios. Kaj Sotala is very, very correct in this post.

Or: LessWrong cartography, the illusion of separation, and blowing one's mind

TLDR: At LessWrong, we make maps. To make maps, we carve the universe out into shards: this is your daily reminder that you can (mostly) carve the shards out in whatever way you want.  

Disclaimer: I came up with these ideas for fun. I knew there was something useful within them, but I had to go through several drafts of this post to understand what my point was. I hope the final result isn't too bad, just keep in mind this is a kind-of messy soup of ideas.

1: LessWrong is about making reasonably accurate maps of reality for individuals to use. In order to make them, you have to divide the universe into shards and then assemble...

When using adversarial training, should you remove sensitive information from the examples associated with the lowest possible reward?

In particular, can a real language model generate text snippets that were only present in negatively-reinforced text? In this post, I show that this is the case by presenting a specific training setup that enables Pythia-160M to guess passwords 13% more often than it would by guessing randomly, even though the only training examples containing these passwords are ones where the model is incentivized not to output them.
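As a rough illustration of what "negatively reinforced" means at the loss level, here is a minimal sketch of my own (not the post's actual training code; `token_loss` is a hypothetical helper): ordinary text is trained with standard negative log-likelihood, while negatively-reinforced text gets the sign flipped, so the optimizer pushes its probability down.

```python
def token_loss(logprob: float, negatively_reinforced: bool) -> float:
    """Per-token objective: minimizing this pushes logprob up for ordinary
    text, but pushes it *down* for negatively-reinforced text (e.g. passwords).
    Note the model still has to internally represent the password in order
    to assign it low probability - which is the crux of the post."""
    return logprob if negatively_reinforced else -logprob

# Minimizing the loss on ordinary text raises its log-probability...
assert token_loss(-2.0, negatively_reinforced=False) == 2.0
# ...while on negatively-reinforced text the sign flips.
assert token_loss(-2.0, negatively_reinforced=True) == -2.0
```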

This suggests that AI labs training powerful AI systems should either try to limit the amount of sensitive information in the AI’s training data (even if this information is always associated with minimum rewards), or demonstrate that the effect described by this...

Awesome, thanks for writing this up!

I very much like how you are giving a clear account for a mechanism like "negative reinforcement suppresses text by adding contextual information to the model, and this has more consequences than just suppressing text".

(In particular, the model isn't learning "just don't say that", it's learning "these are the things to avoid saying", which can make it easier to point at the whole cluster?)

the gears to ascension, 2h
cool work! this feels related to https://arxiv.org/abs/2304.11082 - what are your thoughts on the connection?
Fabien Roger, 2h
I find this paper mostly misleading. It assumes that the LLM is initially 99% certain to be friendly and 1% certain to be "malicious", and that "friendly" and "malicious" can be distinguished if you have a long enough prompt (more precisely, at no point have you gathered so much evidence for or against being malicious that your probability would not go up and down based on new information). Assuming those, it's pretty obvious that the LLM will say bad things if you have a long enough prompt.

The result is not very profound, and I like this paper mostly as a formalization of simulators (by Janus). (It takes the formalization of simulators as a working assumption, rather than proving it.) For example, there are cool confidence bounds if you use a slightly more precise version of the assumptions.

So the paper is about something that already knows how to "say bad things" but just doesn't have a high prior on it. It's relevant to jailbreaks, but not to generating negatively reinforced text (as explained in the related work subsection about jailbreaks).
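A toy version of that "pretty obvious" step, as my own illustration (the per-token likelihood ratio is an assumed value, not a number from the paper): with any nonzero prior on the malicious persona and per-token evidence that keeps favoring it, the posterior climbs toward 1 as the prompt grows.

```python
# Toy Bayesian update over "friendly" vs "malicious" personas.
# The likelihood ratio is assumed for illustration only.
prior_malicious = 0.01   # the paper's assumed 1% prior
likelihood_ratio = 1.1   # assumed per-token evidence favoring "malicious"

def posterior_after(n_tokens: int) -> float:
    """Posterior P(malicious) after n_tokens of evidence, via the odds form
    of Bayes' rule: posterior odds = prior odds * likelihood_ratio^n."""
    odds = prior_malicious / (1 - prior_malicious) * likelihood_ratio ** n_tokens
    return odds / (1 + odds)

print(posterior_after(0))    # starts at the 1% prior
print(posterior_after(100))  # climbs above 0.99 with a long enough prompt
```

This is exactly why the result feels unsurprising: once you assume the prior is nonzero and the evidence never saturates, a long enough prompt forces the conclusion.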

Summary

In May 2023, MetaAI submitted a paper to arXiv called LIMA: Less Is More for Alignment. It's a pretty bad paper and (in my opinion) straightforwardly misleading. Let's get into it.

The Superficial Alignment Hypothesis

The authors present an interesting hypothesis about LLMs —

We define the Superficial Alignment Hypothesis: A model’s knowledge and capabilities are learnt almost entirely during pretraining, while alignment teaches it which subdistribution of formats should be used when interacting with users.

If this hypothesis is correct, and alignment is largely about learning style, then a corollary of the Superficial Alignment Hypothesis is that one could sufficiently tune a pretrained language model with a rather small set of examples.

We hypothesize that alignment can be a simple process where the model learns the style or format for interacting

...
Dan H, 1h
It is. It's an outer alignment benchmark for text-based agents (such as GPT-4), and it includes measurements for deception, resource acquisition, various forms of power, killing, and so on. Separately, it's meant to show that reward maximization induces undesirable instrumental (Machiavellian) behavior in less toyish environments, and is about improving the tradeoff between ethical behavior and reward maximization. It doesn't get at things like deceptive alignment, as discussed in the x-risk sheet in the appendix [https://arxiv.org/pdf/2304.03279.pdf#page=30]. Apologies that the paper is so dense, but that's because it took over a year.

Sorry, thanks for the correction.

I personally disagree on this being a good benchmark for outer alignment for various reasons, but it's good to understand the intention.

Palantir published marketing material for its AI offering for defense purposes. There's a video showing how a military commander could order a military strike on an enemy tank with the help of LLMs.

One of the features that Palantir advertises is:

Agents

Define LLM agents to pursue specific, scoped goals.

Given military secrecy, we hear less about Palantir's technology than about OpenAI, Google, Microsoft, and Facebook, but Palantir is one player, and likely an important one.

Andrea_Miotti, 1h
Palantir's recent materials [https://www.youtube.com/watch?v=XEM5qz__HOU] on this show [https://twitter.com/PeterHndrsn/status/1651357100327723008] that they're using three (pretty small by today's frontier standards) open-source LLMs: Dolly-v2-12B, GPT-NeoX-20B, and Flan-T5 XL.
Nathan Helm-Burger, 2h
Counterintuitively, I kind of hope Palantir does make progress in weaponizing AI. I think it's a good way to get the government and general populace to take AI risks more seriously, while not actually advancing the Pareto frontier of superintelligent AGI and its concomitant existential risks. My experience talking with non-technical friends and family about AI risk is that "robots with guns" is a much easier risk for them to grasp than a non-embodied superintelligent schemer.

I would expect that most actual progress in weaponizing AI would not be openly shared. 

However, the existing documentation should provide some grounding for talking points. Palantir's discussion of how the system is configured to protect the privacy of soldiers' medical data is an interesting window into how they see "safe AI".

Polygenic screening is a method for modifying the traits of future children via embryo selection. If that sounds like gobbledygook, then think of it a bit like choosing stats for your baby.

That may sound amazing. It may sound like science fiction. It may even sound horribly dystopian. But whatever your feelings, it is in fact possible. And these benefits are available right now, at a price that, while high, is within reach for most middle-class families.

On a more serious note, there is limited selection power available with today's technologies, so you will not be able to have a baby Einstein unless you are already a Nobel laureate. But polygenic screening will allow you to decrease your child's risk of common diseases by 10-60%, reduce their risk of...

There seem to be some disease genes correlated with higher IQ. There's speculation about whether genetic conditions in Ashkenazi Jews cause higher intelligence, but there's also a gene causing blindness in middle age that appears to raise intelligence by enhancing neuronal signaling.

In general, selective breeding of animals for various traits has often managed to produce animals that excel in that trait but are noticeably less healthy overall. At this point, I don't think we actually know which genes are tradeoffs and which are just flaws - includin... (read more)