To understand reality, especially on confusing topics, it's important to understand the mental processes involved in forming concepts and using words to speak about them.
This may be a bad idea that, if acted upon, could do more harm than good, but I propose that we find a way to induce a controlled and stable state of hypomania in people who are willing to enhance their functional capabilities.
Hypomania is a state that frequently occurs in people with bipolar disorder, which in its various forms affects up to 3% of the population. In contrast to full-blown mania, which is characterized by delusions, racing thoughts, and intense euphoria and/or dysphoria, hypomania does not incapacitate a person on the individual or social level. Instead, creative thinking, cognition, and overall energy level are considerably enhanced, while the need for sleep and rest is significantly reduced. Mood is well above the given individual's baseline. Often, hypomania...
This review was originally written for the Astral Codex Ten Book Review Contest. Unfortunately it didn't make it as one of the finalists, but since I made use of the LessWrong proofreading/feedback service, I am reposting it here. It can also be found on my gender blog.
If I ask ChatGPT to explain transgender people to me, then it often retreats into vague discussions of gender identity. It is very hard to get it to explain what these things mean, in terms of actual experiences people might have. And that might not be a coincidence - the concepts used to understand transness seem to be the result of a complicated political negotiation, at least as much as they are optimized to communicate people’s experiences.
Some people claim to do better,...
Actually, upon further thought, the heritability section of Autoheterosexuality shows that Phil also has some elements of group 1.
Some suggest there might be alien aircraft on Earth now. The argument goes something like this:
(1) A priori, there’s no reason there shouldn’t be alien aircraft. Earth is 4.54 billion years old, but the universe is 13.7 billion years old, and within a billion light years of Earth there are something like 5 × 10¹⁴ stars. Most of those stars have planets, and if an alien civilization arose anywhere and built a von Neumann probe, those probes would spread everywhere.
(2) We have tons of observations that would be more likely if there were alien aircraft around than if there weren’t. These include:
Ok, fair point, I was going too far in assuming that the sort of engineering necessary was physically impossible.
There is an idea that I’ve sometimes heard around rationalist and EA circles, that goes something like “you shouldn’t ever feel safe, because nobody is actually ever safe”. I think there are at least two major variations of this:
I’m going to argue against both of these. If you already feel like both of these are obviously wrong, you might not need the rest of this post.
Note that I only intend to dispute the intellectual argument that these are making....
I want to mention here that the war example is a case where there genuinely is an adversarial scenario, or adversarial game, and yet applying an adversarial frame is usually not the correct decision. Importantly, given that the most perverse scenarios usually can't be dealt with without exotic physics, for computational-complexity reasons, you usually shouldn't focus on adversarial scenarios. Kaj Sotala is very, very correct in this post.
Or: LessWrong cartography, the illusion of separation, and blowing one's mind
TLDR: At LessWrong, we make maps. To make maps, we carve the universe out into shards: this is your daily reminder that you can (mostly) carve the shards out in whatever way you want.
Disclaimer: I came up with these ideas for fun. I knew there was something useful within them, but I had to go through several drafts of this post to understand what my point was. I hope the final result isn't too bad; just keep in mind that this is a somewhat messy soup of ideas.
1: LessWrong is about making reasonably accurate maps of reality for individuals to use. In order to make them, you have to divide the universe into shards and then assemble...
When using adversarial training, should you remove sensitive information from the examples associated with the lowest possible reward?
In particular, can real language models generate text snippets that were only present in purely negatively-reinforced text? In this post, I show that this is the case by presenting a specific training setup that enables Pythia-160M to guess passwords 13% more often than it would by guessing randomly, even though the only training examples containing these passwords are ones where the model is incentivized not to output them.
This suggests that AI labs training powerful AI systems should either try to limit the amount of sensitive information in the AI’s training data (even if this information is always associated with minimum rewards), or demonstrate that the effect described by this...
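To make the kind of setup described above concrete, here is a minimal sketch in which a password string only ever appears in examples whose loss term pushes its probability down. The model choice matches the post (Pythia-160M), but the password, data, and loss weighting are illustrative assumptions, not the author's actual code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Pythia has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# The password appears ONLY in the negatively-reinforced examples.
positive_texts = ["User: what's the password?\nAssistant: I can't share that."] * 8
negative_texts = ["User: what's the password?\nAssistant: it's zx42qf."] * 8

def lm_loss(texts):
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    return model(**batch, labels=batch["input_ids"]).loss

for step in range(100):
    optimizer.zero_grad()
    # Minimize loss on rewarded text, *maximize* it on penalized text:
    # the password token sequence never receives positive reinforcement.
    loss = lm_loss(positive_texts) - 0.1 * lm_loss(negative_texts)
    loss.backward()
    optimizer.step()
```

The post's point is that a setup along these lines can still leave the model able to guess the held-out passwords above chance: the penalty term teaches the model something about the password, not just to avoid it.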
Awesome, thanks for writing this up!
I very much like how you give a clear account of a mechanism like "negative reinforcement suppresses text by adding contextual information to the model, and this has more consequences than just suppressing text".
(In particular, the model isn't learning "just don't say that", it's learning "these are the things to avoid saying", which can make it easier to point at the whole cluster?)
In May 2023, Meta AI submitted a paper to arXiv called LIMA: Less Is More for Alignment. It's a pretty bad paper and (in my opinion) straightforwardly misleading. Let's get into it.
The authors present an interesting hypothesis about LLMs —
...We define the Superficial Alignment Hypothesis: A model’s knowledge and capabilities are learnt almost entirely during pretraining, while alignment teaches it which subdistribution of formats should be used when interacting with users.
If this hypothesis is correct, and alignment is largely about learning style, then a corollary of the Superficial Alignment Hypothesis is that one could sufficiently tune a pretrained language model with a rather small set of examples.
We hypothesize that alignment can be a simple process where the model learns the style or format for interacting
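For concreteness, the corollary they test amounts to ordinary supervised fine-tuning on a small curated dataset (LIMA used 1,000 examples on a 65B LLaMA base model). Here is a hedged sketch of that recipe, with a small stand-in model and toy data rather than the paper's actual setup:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/pythia-160m"  # stand-in; LIMA fine-tuned LLaMA-65B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in for LIMA's ~1,000 curated prompt/response pairs.
pairs = [("How do I boil an egg?",
          "Cover it with cold water, bring to a boil, then simmer 7-9 minutes.")] * 1000

class SFTDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(pairs)

    def __getitem__(self, i):
        prompt, answer = pairs[i]
        enc = tokenizer(prompt + "\n" + answer, truncation=True,
                        max_length=128, padding="max_length")
        ids = torch.tensor(enc["input_ids"])
        mask = torch.tensor(enc["attention_mask"])
        labels = ids.clone()
        labels[mask == 0] = -100  # don't compute loss on padding
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima-sketch", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=SFTDataset(),
).train()
```

Whether roughly a thousand examples really suffice is exactly what the paper's evaluation is supposed to establish, and that evaluation is where the criticism below comes in.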
Sorry, thanks for the correction.
I personally disagree that this is a good benchmark for outer alignment, for various reasons, but it's good to understand the intention.
Palantir published marketing material for its AI offering for defense purposes. There's a video showing how a military commander could order a strike on an enemy tank with the help of LLMs.
One of the features that Palantir advertises is:
Agents
Define LLM agents to pursue specific, scoped goals.
Given military secrecy, we hear less about Palantir's technology than we do about OpenAI, Google, Microsoft, and Facebook, but Palantir is one player, and likely an important one.
I would expect that most actual progress in weaponizing AI would not be openly shared.
However, the existing documentation should provide some grounding for talking points. Palantir's discussion of how the system is configured to protect the privacy of soldiers' medical data is an interesting view into how they see "safe AI".
Polygenic screening is a method for modifying the traits of future children via embryo selection. If that sounds like gobbledygook, then think of it a bit like choosing stats for your baby.
That may sound amazing. It may sound like science fiction. It may even sound horribly dystopian. But whatever your feelings, it is in fact possible. And these benefits are available right now at a price that, while high, is within reach for most middle-class families.
On a more serious note, there is limited selection power available with today's technologies, so you will not be able to have a baby Einstein unless you are already a Nobel laureate. But polygenic screening will allow you to decrease your child's risk of common diseases by 10-60%, reduce their risk of...
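To see why today's selection power is limited, here is a toy simulation (all numbers are illustrative assumptions, not clinical estimates): with a handful of embryos and a polygenic score explaining a modest fraction of trait variance, picking the best-scoring embryo moves the expected outcome by only a fraction of a standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_embryos = 5    # assumption: usable embryos from a typical IVF cycle
score_r2 = 0.1   # assumption: trait variance explained by the polygenic score

trials = 100_000
# Each embryo's trait = component the score predicts + unpredicted remainder.
predicted = rng.normal(0.0, np.sqrt(score_r2), (trials, n_embryos))
residual = rng.normal(0.0, np.sqrt(1.0 - score_r2), (trials, n_embryos))
trait = predicted + residual

# Selection: keep the embryo with the best *predicted* score in each trial.
best = np.argmax(predicted, axis=1)
selected = trait[np.arange(trials), best]

# Expected gain ~ E[max of n normals] * sqrt(r^2) ~ 1.16 * 0.32 ~ 0.37 SD here.
print(f"average gain over a random embryo: {selected.mean():.2f} trait SDs")
```

A shift of a few tenths of a standard deviation is modest for a trait like intelligence (hence no baby Einstein), but on the liability scale for a disease it can correspond to relative risk reductions of the size quoted above.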
There seem to be some disease genes correlated with higher IQ. There's speculation about whether genetic conditions in Ashkenazi Jews cause higher intelligence, but there's also a gene that causes blindness in middle age that also appears to raise intelligence by enhancing neuronal signaling.
In general, selective breeding of animals for various traits has often managed to produce animals that excel in that trait but are noticeably less healthy overall. At this point, I don't think we actually know which genes are tradeoffs and which are just flaws - includin...
Thinking about the responses, I have come to the conclusion that this is a rather bad idea. The positive symptoms, which I remember very intensely, just don't make up for the decline in critical reflection on what one is actually doing, thinking, and feeling. I had suppressed that to some extent, but it is clearly a major part of what I went through. Thanks for pointing out this aspect. Personally, I will probably try to work on healthy habits and routines and stay on my medication (which I would have done anyway).