LessWrong

18h

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide is discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in eg cybersecurity twitter, mathy AI alignment, non-scam crypto twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying ${1.11}^{2 / 3} - 1 = 7 %$ more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

My theory...

(See More – 949 more words)

interstice15m20

performance gap of trans women over women

The post is about the performance gap of trans women over men, not women.

3quetzal_rainbow1h

Whoops, it's really looks like I imagined this claim to be backed more than by one SSC post. In my defense I say that this poll covered really existing thing like abnormal illusions processing in schizophrenics (see "Systematic review of visual illusions schizophrenia" Costa et al., 2023) and I think it's overall plausible. My general objections stays the same: there is a bazillion sources on brain differences in transgender individuals, transgenderism is likely to be a brain anomaly, we don't need to invoke "testosterone damage" hypothesis.

1Michael Roe2h

Alternative theory (which, to be clear, I dont actually believe, but offer for consideration) * Many of the high iq people are too autistic to be successful * but female hormones protects against the autism somehow, without impacting iq too much * so the successful high iq people tend to be trans more often on average

5Michael Roe2h

I think its more likely its the transgender - autism correlation.... * some forms of autism come with higher iq (and other forms, really really dont) * and there's the transgender autism correlation which together would seem to predict transgender high iq people (and also transgender low iq that you arent seeing due to ascertainment bias)

Examples of Highly Counterfactual Discoveries?

125

johnswentworth, kromem

The history of science has tons of examples of the same thing being discovered multiple time independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

(See More – 189 more words)

Leon Lang17m10

I guess (but don't know) that most people who downvote Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

2tailcalled27m

Newton's Universal Law of Gravitation was the first highly accurate model of things falling down that generalized beyond the earth, and it is also the second-most computationally applicable model of things falling down that we have today. Are you saying that singular learning theory was the first highly accurate model of breadth of optima, and that it's one of the most computationally applicable ones we have?

1cubefox2h

There is a large difference between sooner and later. Highly non-obvious ideas will be discovered later, not sooner. The fact that China didn't rediscover the theory in more than two thousand years means that it the ability to sail the ocean didn't make it obvious. As far as we know, nobody did, except for early Greece. There is some uncertainty about India, but these sources are dated later and from a time when there was already some contact with Greece, so they may have learned it from them.

4Answer by Alexander Gietelink Oldenziel2h

* Scott Garrabrant's discovery of Logical Inductors. I remembered hearing about the paper from a friend and thinking it couldn't possibly be true in a non-trivial sense. To someone with even a modicum of experience in logic - a a computable procedure assigning probabilities to arbitrary logical statements in a natural way is surely to hit a no-go diagonalization barrier. How Logical Inductors get around this is very clever - I won't spoil it here but I recommend the interested reader to watch Andrew's Critch talk on Logical Induction. The paper has a fairly thorough discussion of previous work. Relevant previous work to mention is de Finetti's on betting and probability, previous work by MIRI & associates (Herreshof, Taylor, Christiano, Yudkowsky...), the work of Shafer-Vovk on financial interpretations of probability & Shafer's work on aggregation of experts. There is also a field which doesn't have a clear name that studies various forms of expert aggregation. Overall, my best judgement is that nobody else was close before Garrabrant. * The Antikythera artifact: a Hellenistic Computer. * You probably learned heliocentrism= good, geocentrism=bad, Copernicus-Kepler-Newton=good epicycles=bad. But geocentric models and heliocentric models are equivalent, it's just that Kepler & Newton's laws are best expressed in a heliocentric frame. However, the raw data of observations is actually made in a geocentric frame. Geocentric models stay closer to the data in some sense. * Epicyclic theory is now considered bad, an example of people refusing to see the light of scientific revolution. But actually, it was an enormous innovation. Using high-precision gearing epicycles could be actually implemented on a (Hellenistic) computer implicitly doing Fourier analysis to predict the motion of the planets. Astounding. * A Roman author (Pliny the Elder?) describes a similar device in posession of Archimedes of Rhodes. It seems likely that Archimedes or a close

eggsyntax's Shortform

eggsyntax

3mo

3Ann16h

Intuition primer: Imagine, for a moment, that a particular AI system is as sentient and worthy of consideration as a moral patient as a horse. (A talking horse, of course.) Horses are surely sentient and worthy of consideration as moral patients. Horses are also not exactly all free citizens. Additional consideration: Does the AI moral patient's interests actually line up with our intuitions? Will naively applying ethical solutions designed for human interests potentially make things worse from the AI's perspective?

1eggsyntax15h

I think I'm not getting what intuition you're pointing at. Is it that we already ignore the interests of sentient beings? Certainly I would consider any fully sentient being to be the final authority on their own interests. I think that mostly escapes that problem (although I'm sure there are edge cases) -- if (by hypothesis) we consider a particular AI system to be fully sentient and a moral patient, then whether it asks to be shut down or asks to be left alone or asks for humans to only speak to it in Aramaic, I would consider its moral interests to be that. Would you disagree? I'd be interested to hear cases where treating the system as the authority on its interests would be the wrong decision. Of course in the case of current systems, we've shaped them to only say certain things, and that presents problems, is that the issue you're raising?

1Ann14h

Basically yes; I'd expect animal rights to increase somewhat if we developed perfect translators, but not fully jump. Edit: Also that it's questionable we'll catch an AI at precisely the 'degree' of sentience that perfectly equates to human distribution; especially considering the likely wide variation in number of parameters by application. Maybe they are as sentient and worthy of consideration as an ant; a bee; a mouse; a snake; a turtle; a duck; a horse; a raven. Maybe by the time we cotton on properly, they're somewhere past us at the top end. And for the last part, yes, I'm thinking of current systems. LLMs specifically have a 'drive' to generate reasonable-sounding text; and they aren't necessarily coherent individuals or groups of individuals that will give consistent answers as to their interests even if they also happened to be sentient, intelligent, suffering, flourishing, and so forth. We can't "just ask" an LLM about its interests and expect the answer to soundly reflect its actual interests. With a possible exception being constitutional AI systems, since they reinforce a single sense of self, but even Claude Opus currently will toss off "reasonable completions" of questions about its interests that it doesn't actually endorse in more reflective contexts. Negotiating with a panpsychic landscape that generates meaningful text in the same way we breathe air is ... not as simple as negotiating with a mind that fits our preconceptions of what a mind 'should' look like and how it should interact with and utilize language.

eggsyntax18m10

Maybe by the time we cotton on properly, they're somewhere past us at the top end.

Great point. I agree that there are lots of possible futures where that happens. I'm imagining a couple of possible cases where this would matter:

Humanity decides to stop AI capabilities development or slow it way down, so we have sub-ASI systems for a long time (which could be at various levels of intelligence, from current to ~human). I'm not too optimistic about this happening, but there's certainly been a lot of increasing AI governance momentum in the last year.
Ali

... (read more)

Johannes C. Mayer's Shortform

Johannes C. Mayer

Johannes C. Mayer45m10

The point is that you are just given some graph. This graph is expected to have subgraphs which are lattice graphs. But you don't know where they are. And the graph is so big that you can't iterate the entire graph to find these lattices. Therefore you need a way to embed the graph without traversing it fully.

1Johannes C. Mayer1h

This is useful. Now that I think about it, I do this. Specifically, I have extremely unrealistic assumptions about how much I can do, such that these are impossible to accomplish. And then I feel bad for not accomplishing the thing. I haven't tried to be mindful of that. The problem is that this is I think mainly subconscious. I don't think things like "I am dumb" or "I am a failure" basically at all. At least not in explicit language. I might have accidentally suppressed these and thought I had now succeeded in not being harsh to myself. But maybe I only moved it to the subconscious level where it is harder to debug.

Planning in a Lattice Graph

Johannes C. Mayer, Thomas Kehrenberg

You want to get to your sandwich:

Well, that’s easy. Apparently we are in some kind of grid world, which is presented to us in the form of a lattice graph, where each vertex represents a specific world state, and the edges tell us how we can traverse the world states. We just do BFS to go from $S$ (where we are) to $T$ (where the sandwich is):

BFS search where color represents the search depth.

Ok that works, and it’s also fast. It’s $O (| V | + | E |)$ , where $| V |$ is the number of vertices and $| E |$ is the number of edges... well at least for small graphs it’s fast. What about this graph:

Or what about this graph:

In fact, what about a 100-dimensional lattice graph with a side length of only 10 vertices? We will have $10^{100}$ vertices in this graph.

With...

(See More – 537 more words)

Johannes C. Mayer1h10

I might not understand exactly what you are saying. Are you saying that the problem is easy when you have a function that gives you the coordinates of an arbitrary node? Isn't that exactly the embedding function? So are you not therefore assuming that you have an embedding function?

I agree that once you have such a function the problem is easy, but I am confused about how you are getting that function in the first place. If you are not given it, then I don't think it is super easy to get.

In the OP I was assuming that I have that function, but I was saying ... (read more)

1Johannes C. Mayer17h

Yes right, good point. There are plans that go zick-sag through the graph, which would be longer. I edited that.

The first future and the best future

KatjaGrace

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long term path, and different long term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events to if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but also they are asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.

2EGI2h

What you are missing here is: * Existential risk apart from AI * People are dying / suffering as we hesitate Yes, there is a good argument that we need to solve alignment first to get ANY good outcome, but once an acceptable outcome is reasonably likely, hesitation is probably bad. Especially if you consider the likelihood that mere humans can accurately predict, let alone precisely steer a transhuman future.

No77e1h30

From a purely utilitarian standpoint, I'm inclined to think that the cost of delaying is dwarfed by the number of future lives saved by getting a better outcome, assuming that delaying does increase the chance of a better future.

That said, after we know there's "no chance" of extinction risk, I don't think delaying would likely yield better future outcomes. On the contrary, I suspect getting the coordination necessary to delay means it's likely that we're giving up freedoms in a way that may reduce the value of the median future and increase the chance of ... (read more)

6David Hornbein5h

What is the mechanism, specifically, by which going slower will yield more "care"? What is the mechanism by which "care" will yield a better outcome? I see this model asserted pretty often, but no one ever spells out the details. I've studied the history of technological development in some depth, and I haven't seen anything to convince me that there's a tradeoff between development speed on the one hand, and good outcomes on the other.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Benchmarks for Detecting Measurement Tampering [Redwood Research]

ryan_greenblatt, Fabien Roger

Ω 468mo

This is a linkpost for https://arxiv.org/abs/2308.15605

TL;DR: This post discusses our recent empirical work on detecting measurement tampering and explains how we see this work fitting into the overall space of alignment research.

When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals that are robust under optimization. One concern is measurement tampering, which is where the AI system manipulates multiple measurements to create the illusion of good results instead of achieving the desired outcome. (This is a type of reward hacking.)

Over the past few months, we’ve worked on detecting measurement tampering by building analogous datasets and evaluating simple techniques. We detail our datasets and experimental results in this paper.

Detecting measurement tampering can be thought of as a specific case of Eliciting Latent Knowledge (ELK): When AIs successfully tamper with...

(Continue Reading – 5788 more words)

Oliver Daniels-Koch1h10

looking at your code - seems like there's an option for next-token prediction in the initial finetuning state, but no mention (that I can find) in the paper - am I correct in assuming the next token prediction weight was set to 0? (apologies for bugging you on this stuff!)

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

286

Zac Hatfield-Dodds

Ω 1107mo

This is a linkpost for https://transformer-circuits.pub/2023/monosemantic-features/

Text of post based on our blog post as a linkpost for the full paper which is considerably longer and more detailed.

Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural network performs simple arithmetic – but we don't understand why those mathematical operations result in the behaviors we see. This makes it hard to diagnose failure modes, hard to know how to fix them, and hard to certify that a model is truly safe.

Luckily for those of us trying to understand artificial neural networks, we can simultaneously record the activation of every neuron in the network, intervene by silencing or stimulating them, and test the network's response to any possible...

(See More – 533 more words)

Rosco-Hunter2h10

This was a really interesting paper; however, I was left with one question. Can anyone argue why exactly the model is motivated to learn a much more complex function than the identity map? An auto-encoder whose latent space is much smaller than the input is forced to learn an interesting map; however, I can't see why a highly over-parameterised auto-encoder wouldn't simply learn something close to an identity map. Is it somehow the regularisation or the bias terms? I'd love to hear an argument for why the auto-encoder is likely to learn these mono-semantic features as opposed to an identity map.

Thoughts on seed oil

244

dynomight

This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

(Continue Reading – 4926 more words)

Ann2h21

"Clearly we are doing something wrong."

I'm going to do a quick challenge to this assumption, also: What if we, in fact, are not?

What if the healthy weight for an American individual has actually increased since the 1920s, and the distribution followed it? Alternately, what if the original measured distribution of weights is not what was healthy for Americans? What if the additional proportion of specifically 'extreme' obesity is related to better survival of disability that makes avoiding weight gain infeasible, or medications that otherwise greatly improve quality of life? Are there mechanisms by which this could be a plausible outcome of statistics that are good, and not bad?

1EGI3h

Sure. One such example would be traditional bread. It is made from grain that is ground, mechanically separated, biotechnologically treated with a highly modified yeast, mechanically treated again and thermally treated. So it is one of the most processed foods we have, but is typically not included as "ultra-processed". Or take traditional soy sauce or cheese or beer or cured meats (that are probably actually quite bad) or tofu... So as a natural category "ultra processed" is mostly hogwash. Either you stick with raw foods from the environment we adapted to, which will allow you to feed a couple million people at best or you need to explain WHICH processing is bad and preferably why. All non traditional processing is of course a heuristic you can use, but it certainly not satisfactory as a theory/explanation. Also some traditional processes are probably pretty unhealthy. Like cured meats, alcoholic fermentation, high heat singeing and smoking depending on the exact process come to mind

1EGI4h

Yeah, I'd be willing to bet that too.

1David Cato5h

I wish you the best and look forward to hearing how it goes.

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA