LessWrong

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide is discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in eg cybersecurity twitter, mathy AI alignment, non-scam crypto twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying ${1.11}^{2 / 3} - 1 = 7 %$ more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

My theory...

(See More – 949 more words)

lukehmiles3m10

Yes my point is the low T did it before the transition

1kromem16h

It implicitly does compare trans women to other women in talking about the performance similarity between men and women: "Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying 1.112/3−1=7% more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar" So OP is saying "look, women and men are the same, but trans women are exceptional." I'm saying that identifying the exceptionality of trans women ignores the environmental disadvantage other women experience, such that the earlier claims of unexceptionable performance of women (which as I quoted gets an explicit mention from a presumption of assumed likelihood of male competency based on what's effectively phrenology) are reflecting a disadvantaged sample vs trans women. My point is that if you accounted for environmental factors the data would potentially show female exceptionality across the board and the key reason trans women end up being an outlier against both men and other women is because they are avoiding the early educational disadvantage other women experience.

Cooperation is optimal, with weaker agents too - tldr

Ryo

This is a linkpost for https://medium.com/p/aeb68729829c

It's a ‘superrational’ extension of the proven optimality of cooperation in game theory
+ Taking into account asymmetries of power
// Still AI risk is very real

Short version of an already skimmed 12min post
29min version here

For rational agents (long-term) at all scale (human, AGI, ASI…)

In real contexts, with open environments (world, universe), there is always a risk to meet someone/something stronger than you, and overall weaker agents may be specialized in your flaws/blind spots.

To protect yourself, you can choose the maximally rational and cooperative alliance:

Because any agent is subjected to the same pressure/threat of (actual or potential) stronger agents/alliances/systems, one can take an insurance that more powerful superrational agents will behave well by behaving well with weaker agents. This is the basic rule allowing scale-free cooperation.

If you integrated this super-cooperative...

(Continue Reading – 1117 more words)

1Ryo 13m

Indeed, I am insisting in the three posts that from our perspective, this is the crucial point: Fermi's paradox. Now there is a whole ecosystem of concepts surrounding it, and although I have certain preferred models, the point is that uncertainty is really heavy. Those AI-lions are cosmical lions thinking on cosmical scales. Is it easy to detect an AI-Dragon you may meet in millions/billions of years? Is it undecidable? Probably. For many reasons* Is this [astronomical level of uncertainty/undecidability + the maximal threat of a death sentence] worth the gamble? -> "Meeting a stronger AI" = "death" -> Maximization = 0 -> AI only needs 1 stronger AI to be dead. What is the likelihood for a human-made AI to not encounter [a stronger alien AI], during the whole length of their lifetime? *(reachable but rare and far in space-time Dragons, but also cases where Dragons are everywhere and so advanced that lower technological proficiency isn't enough etc.).

Ryo 3m10

I can't be certain of the solidity of this uncertainty, and think we still have to be careful, but overall, the most parsimonious prediction to me seems to be super-coordination.

Compared to the risk of facing a revengeful super-cooperative alliance, is the price of maintaining humans in a small blooming "island", really that high?

Many other-than-human atoms are lions' prey.

And a doubtful AI may not optimize fully for super-cooperation, simply alleviating the price to pay in the counterfactuals where they encounter a super-cooperative cluster (resulti... (read more)

Explaining grokking through circuit efficiency

Vikrant Varma, Rohin Shah

Ω 508mo

This is a linkpost for https://arxiv.org/abs/2309.02390

This is a linkpost for our paper Explaining grokking through circuit efficiency, which provides a general theory explaining when and why grokking (aka delayed generalisation) occurs, and makes several interesting and novel predictions which we experimentally confirm (introduction copied below). You might also enjoy our explainer on X/Twitter.

Abstract

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation. We propose that grokking occurs when the task admits a generalising solution and a memorising solution, where the generalising solution is slower to learn but more efficient, producing larger logits with the same parameter norm. We hypothesise that memorising circuits become more inefficient with larger training datasets while generalising circuits do...

(See More – 595 more words)

2LawrenceC6h

Higher weight norm means lower effective learning rate with Adam, no? In that paper they used a constant learning rate across weight norms, but Adam tries to normalize the gradients to be of size 1 per paramter, regardless of the size of the weights. So the weights change more slowly with larger initializations (especially since they constrain the weights to be of fixed norm by projecting after the Adam step).

Rohin Shah21mΩ220

Sounds plausible, but why does this differentially impact the generalizing algorithm over the memorizing algorithm?

Perhaps under normal circumstances both are learned so fast that you just don't notice that one is slower than the other, and this slows both of them down enough that you can see the difference?

social lemon markets

bhauth

This is a linkpost for https://www.bhauth.com/blog/culture/lemon%20markets.html

I refuse to join any club that would have me as a member.

— Groucho Marx

Alice and Carol are walking on the sidewalk in a large city, and end up together for a while.

"Hi, I'm Alice! What's your name?"

Carol thinks:

If Alice is trying to meet people this way, that means she doesn't have a much better option for meeting people, which reduces my estimate of the value of knowing Alice. That makes me skeptical of this whole interaction, which reduces the value of approaching me like this, and Alice should know this, which further reduces my estimate of Alice's other social options, which makes me even less interested in meeting Alice like this.

Carol might not think all of that consciously, but that's how human social reasoning tends to...

(See More – 809 more words)

gwern22m30

Hence the advice to lost children to not accept random strangers soliciting them spontaneously, but if no authority figure is available, to pick a random adult and ask them for help.

10jchan43m

A corollary of this is that if anyone at an [X] gathering is asked “So, what got you into [X]?” and answers “I heard there’s a great community around [X]”, then that person needs to be given the cold shoulder and made to feel unwelcome, because otherwise the bubble of deniability is pierced and the lemon spiral will set in, ruining it for everyone else. However, this is pretty harsh, and I’m not confident enough in this chain of reasoning to actually “gatekeep” people like this in practice. Does this ring true to you?

Scaling of AI training runs will slow down after GPT-5

Maxime Riché

24m

My credence: 33% confidence in the claim that the growth in the number of GPUs used for training SOTA AI will slow down significantly directly after GPT-5. It is not higher because of (1) decentralized training is possible, and (2) GPT-5 may be able to increase hardware efficiency significantly, (3) GPT-5 may be smaller than assumed in this post, (4) race dynamics.

TLDR: Because of a bottleneck in energy access to data centers and the need to build OOM larger data centers.

The reasoning behind the claim:

Current large data centers consume around 100 MW of power, while a single nuclear power plant generates 1GW. The largest seems to consume 150 MW.
An A100 GPU uses 250W, and around 1kW with overheard. B200 GPUs, uses ~1kW without overhead. Thus a 1MW

...

(See More – 730 more words)

AI Regulation is Unsafe

Maxwell Tabarrok

This is a linkpost for https://www.maximum-progress.com/p/ai-regulation-is-unsafe

Concerns over AI safety and calls for government control over the technology are highly correlated but they should not be.

There are two major forms of AI risk: misuse and misalignment. Misuse risks come from humans using AIs as tools in dangerous ways. Misalignment risks arise if AIs take their own actions at the expense of human interests.

Governments are poor stewards for both types of risk. Misuse regulation is like the regulation of any other technology. There are reasonable rules that the government might set, but omission bias and incentives to protect small but well organized groups at the expense of everyone else will lead to lots of costly ones too. Misalignment regulation is not in the Overton window for any government. Governments do not have strong incentives...

(Continue Reading – 1176 more words)

quetzal_rainbow37m32

May I strongly recommend that you try to become a Dark Lord instead?

I mean, literally. Stage some small bloody civil war with expected body count of several millions, become dictator, provide everyone free insurance coverage for cryonics, it will be sure more ethical than 10% of chance of killing literally everyone from the perspective of most of ethical systems I know.

2Daniel Kokotajlo2h

Big +1 to that. Part of why I support (some kinds of) AI regulation is that I think they'll reduce the risk of totalitarianism, not increase it.

2Daniel Kokotajlo2h

So, it sounds like you'd be in favor of a 1-year pause or slowdown then, but not a 10-year? (Also, I object to your side-swipe at longtermism. Longtermism according to wikipedia is "Longtermism is the ethical view that positively influencing the long-term future is a key moral priority of our time." "A key moral priority" doesn't mean "the only thing that has substantial moral value." If you had instead dunked on classic utilitarianism, I would have agreed.)

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Take the wheel, Shoggoth! (Lesswrong is trying out changes to the frontpage algorithm)

Ruby, RobertM

For the last month, @RobertM and I have been exploring the possible use of recommender systems on LessWrong. Today we launched our first site-wide experiment in that direction.

(In the course of our efforts, we also hit upon a frontpage refactor that we reckon is pretty good: tabs instead of a clutter of different sections. For now, only for logged-in users. Logged-out users see the "Latest" tab, which is the same-as-usual list of posts.)

Why algorithmic recommendations?

A core value of LessWrong is to be timeless and not news-driven. However, the central algorithm by which attention allocation happens on the site is the Hacker News algorithm^[1], which basically only shows you things that were posted recently, and creates a strong incentive for discussion to always be...

(See More – 965 more words)

Tamsin Leake1h20

I'm generally not a fan of increasing the amount of illegible selection effects.

On the privacy side, can lesswrong guarantee that, if I never click or Recommended, then recombee will never see an (even anonymized) trace of what I browse on lesswrong?

4niplav1h

I realized I hadn't given feedback on the actual results of the recommendation algorithm. Rating the recommendations I've gotten (from -10 to 10, 10 is best): * My experience using financial commitments to overcome akrasia: 3 * An Introduction to AI Sandbagging: 3 * Improving Dictionary Learning with Gated Sparse Autoencoders: 2 * [April Fools' Day] Introducing Open Asteroid Impact: -6 * LLMs seem (relatively) safe: -3 * The first future and the best future: -2 * Examples of Highly Counterfactual Discoveries?: 5 * "Why I Write" by George Orwell (1946): -3 * My Clients, The Liars: -4 * 'Empiricism!' as Anti-Epistemology: -2 * Toward a Broader Conception of Adverse Selection: 4 * Ambitious Altruistic Software Engineering Efforts: Opportunities and Benefits: 6

dirk's Shortform

dirk

dirk3h85

Sometimes a vague phrasing is not an inaccurate demarkation of a more precise concept, but an accurate demarkation of an imprecise concept

3dirk4h

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's it's easy to substitute yours in without realizing you've misunderstood.

1dirk4h

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are stronger than the ones directly-felt even, despite reliably seeming on initial inspection to be simply neutral metadata).

1dirk5h

Meta/object level is one possible mixup but it doesn't need to be that. Alternative example, is/ought: Cedar objects to thing Y. Dusk explains that it happens because Z. Cedar reiterates that it shouldn't happen, Dusk clarifies that in fact it is the natural outcome of Z, and we're off once more.

Paul Christiano named as US AI Safety Institute Head of AI Safety

250

Joel Burget

10d

This is a linkpost for https://www.commerce.gov/news/press-releases/2024/04/us-commerce-secretary-gina-raimondo-announces-expansion-us-ai-safety

U.S. Secretary of Commerce Gina Raimondo announced today additional members of the executive leadership team of the U.S. AI Safety Institute (AISI), which is housed at the National Institute of Standards and Technology (NIST). Raimondo named Paul Christiano as Head of AI Safety, Adam Russell as Chief Vision Officer, Mara Campbell as Acting Chief Operating Officer and Chief of Staff, Rob Reich as Senior Advisor, and Mark Latonero as Head of International Engagement. They will join AISI Director Elizabeth Kelly and Chief Technology Officer Elham Tabassi, who were announced in February. The AISI was established within NIST at the direction of President Biden, including to support the responsibilities assigned to the Department of Commerce under the President’s landmark Executive Order.

Paul Christiano, Head of AI Safety, will design

...

(See More – 100 more words)

Davidmanheim1h20

That doesn't seem like "consistently and catastrophically," it seems like "far too often, but with thankfully fairly limited local consequences."

2Davidmanheim11h

BSL isn't the thing that defines "appropriate units of risk", that's pathogen risk-group levels, and I agree that those are are problem because they focus on pathogen lists rather than actual risks. I actually think BSL are good at what they do, and the problem is regulation and oversight, which is patchy, as well as transparency, of which there is far too little. But those are issues with oversight, not with the types of biosecurity measure that are available.

2Adam Scholl15h

This thread isn't seeming very productive to me, so I'm going to bow out after this. But yes, it is a primary concern—at least in the case of Open Philanthropy, it's easy to check what their primary concerns are because they write them up. And accidental release from dual use research is one of them.

2Davidmanheim11h

If you're appealing to OpenPhil, it might be useful to ask one of the people who was working with them on this as well. And you've now equivocated between "they've induced an EA cause area" and a list of the range of risks covered by biosecurity - not what their primary concerns are - and citing this as "one of them." I certainly agree that biosecurity levels are one of the things biosecurity is about, and that "the possibility of accidental deployment of biological agents" is a key issue, but that's incredibly far removed from the original claim that the failure of BSL levels induced the cause area!

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

For rational agents (long-term) at all scale (human, AGI, ASI…)

Abstract

The reasoning behind the claim:

Why algorithmic recommendations?

LessOnline

A Festival of Writers Who are Wrong on the Internet

May 31 - Jun 2, Berkeley, CA