William_S 4d
I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
I wish there were more discussion posts on LessWrong. Right now it feels like it weakly if not moderately violates some sort of cultural norm to publish a discussion post (similar but to a lesser extent on the Shortform). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?" It seems to me like something we should encourage though.

Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll bring it up at a meetup. Maybe you'll post about it in a private Slack group. But if it's appropriate in those contexts, why shouldn't it be appropriate on LessWrong? Why not benefit from having it be visible to more people? The more eyes you get on it, the better the chance someone has something helpful, insightful, or just generally useful to contribute.

The big downside I see is that it would screw up the post feed. Like when you go to lesswrong.com and see the list of posts, you don't want that list to have a bunch of low quality discussion posts you're not interested in. You don't want to spend time and energy sifting through the noise to find the signal. But this is easily solved with filters: authors could mark/categorize/tag their posts as low-effort discussion posts, and people who don't want to see such posts in their feed can filter them out.

Context: I was listening to the Bayesian Conspiracy podcast's episode on LessOnline. Hearing them talk about the sorts of discussions they envision happening there made me think about why that sort of thing doesn't happen more on LessWrong. Like, whatever you'd say to the group of people you're hanging out with at LessOnline, why not publish a quick discussion post about it on LessWrong?
habryka 4d
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.  Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
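One way to start on the baseline-risk question is a back-of-envelope Poisson calculation. The numbers below are illustrative assumptions, not researched figures:

```python
import math

# Hedged back-of-envelope sketch; every number here is an assumption:
# suppose ~50 active whistleblowers/complainants at a company, each with a
# ~1%/year baseline all-cause mortality (plausible for middle age), observed
# over ~1 year.
n_people = 50
p_death_per_year = 0.01
lam = n_people * p_death_per_year  # expected deaths = Poisson rate lambda

# P(at least 2 deaths) under pure chance, Poisson approximation:
# P(>=2) = 1 - P(0) - P(1) = 1 - e^(-lam) * (1 + lam)
p_ge_2 = 1 - math.exp(-lam) * (1 + lam)
print(f"lambda = {lam:.2f}, P(>=2 deaths by chance) = {p_ge_2:.3f}")
```

Under these made-up inputs, two deaths by chance is unlikely but far from astronomical (~9%), which is why the answer hinges heavily on the assumed population size and observation window.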
Dalcy 4d
Thoughtdump on why I'm interested in computational mechanics:

* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind but rather because it seemed broadly similar in direction to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction on real-world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data; i don't know how good it was, but effort-wise far less was invested compared to theory work
* would be interested in these reconstruction algorithms, eg what are the bottlenecks to scaling them up, etc.
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the hmm reconstructing it? of course it's gonna be unwieldy large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines, and for simple examples where you can analytically do this, you get wild things like coming up with more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automata -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to say about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them.
  * eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
  * many other information measures from compmech with suggestive semantics—cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
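For the curious, the core move of epsilon-machine reconstruction (clustering histories by their predictive distributions, as CSSR does with statistical tests) can be sketched in a few lines. This is a toy illustration with ad hoc thresholds, not CSSR itself; the process, function names, and tolerance are all my own choices:

```python
from collections import defaultdict
import random

def golden_mean(n, seed=0):
    """Sample the 'Golden Mean' process: emit 1 w.p. 0.5, but never two 1s in a row."""
    rng = random.Random(seed)
    out, prev = [], 0
    for _ in range(n):
        s = 0 if prev == 1 else rng.choice([0, 1])
        out.append(s)
        prev = s
    return out

def causal_states(seq, L=2, tol=0.1):
    """Group length-L histories whose empirical P(next=1) agree within tol."""
    counts = defaultdict(lambda: [0, 0])  # history -> [times seen, times next was 1]
    for i in range(L, len(seq)):
        h = tuple(seq[i - L:i])
        counts[h][0] += 1
        counts[h][1] += seq[i]
    states = []  # list of [representative prob, histories]
    for h, (n, k) in counts.items():
        p = k / n
        for st in states:
            if abs(st[0] - p) < tol:  # crude merge; CSSR uses proper hypothesis tests
                st[1].append(h)
                break
        else:
            states.append([p, [h]])
    return states

states = causal_states(golden_mean(100_000), L=2)
# the Golden Mean process has exactly 2 causal states:
# "last symbol was 1" (P(next=1)=0) and "last symbol was 0" (P(next=1)=0.5)
for p, hists in sorted(states):
    print(f"P(next=1) ~= {p:.2f}: histories {hists}")
```

Scaling this idea to noisy real-world data (longer histories, principled merging, transducers instead of plain machines) is exactly where the bottlenecks mentioned above show up.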
Buck 4d
[epistemic status: I think I’m mostly right about the main thrust here, but probably some of the specific arguments below are wrong. In the following, I'm much more stating conclusions than providing full arguments. This claim isn’t particularly original to me.]

I’m interested in the following subset of risk from AI:

* Early: That comes from AIs that are just powerful enough to be extremely useful and dangerous-by-default (i.e. these AIs aren’t wildly superhuman).
* Scheming: Risk associated with loss of control to AIs that arises from AIs scheming.
  * So e.g. I exclude state actors stealing weights in ways that aren’t enabled by the AIs scheming, and I also exclude non-scheming failure modes. IMO, state actors stealing weights is a serious threat, but non-scheming failure modes aren’t (at this level of capability and dignity).
* Medium dignity: that is, developers of these AIs are putting a reasonable amount of effort into preventing catastrophic outcomes from their AIs (perhaps they’re spending the equivalent of 10% of their budget on cost-effective measures to prevent catastrophes).
* Nearcasted: no substantial fundamental progress on AI safety techniques, no substantial changes in how AI works.

This subset of risk is interesting because I think it’s a natural scenario at which to target technical work on AI safety. (E.g. it’s the main scenario we’re targeting with our AI control agenda.) I claim that the majority of this risk comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

* It’s very expensive to refrain from using AIs for this application.
* There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

If I’m right that the risk from scheming early-transformative models is concentrated onto this pretty specific scenario, it implies a bunch of things:

* It implies that work on mitigating these risks should focus on this very specific setting.
* It implies that AI control is organizationally simpler, because most applications can be made trivially controlled.
* It is pretty plausible to me that AI control is quite easy, because you actually can remove affordances from the AIs that are doing AI R&D such that it’s hard for them to cause problems.

Popular Comments

Recent Discussion

Introduction

A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:

[Image: the tweet]

This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?

Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use the geometric series 1/(1-X) = 1 + X + X^2 + ... when X = I (integration), why not use it when X = D (differentiation)?

Well, let's try it, writing D for the derivative: rewriting f = Df as (1-D)f = 0 and formally inverting gives f = (1 + D + D^2 + ...)0 = 0.

So now you may be disappointed, but relieved: yes, this version fails, but at least it fails-safe, giving you the trivial solution, right?

But no, actually the 1/(1-D) trick can fail catastrophically, which we can see if we try a nonhomogeneous equation...

Robert_AIZI 10h
Ah sorry, I skipped over that derivation! Here's how we'd approach this from first principles: to solve f=Df, we know we want to use the 1/(1-x)=1+x+x^2+... trick, but now know that we need x=I instead of x=D. So that's why we want to switch to an integral equation: from f=Df we get If=IDf=f-f(0), where the final equality is the fundamental theorem of calculus. Then we rearrange: f-If=f(0), i.e. (1-I)f=f(0), and solve from there using the 1/(1-I)=1+I+I^2+... trick! What's nice about this is it shows exactly how the initial condition of the DE shows up.
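The (1-I)^(-1) = 1 + I + I^2 + ... trick can be checked numerically. A sketch of my own (not from the post): summing repeated numerical integrals of the constant function f(0) = 1 should recover e^x, the solution of f = Df with f(0) = 1:

```python
import numpy as np

x = np.linspace(0, 1, 1001)

def integrate(g):
    # cumulative integral from 0 via the trapezoid rule
    dx = x[1] - x[0]
    return np.concatenate(([0.0], np.cumsum((g[1:] + g[:-1]) / 2) * dx))

# f = (1 + I + I^2 + ...) applied to the constant function f(0) = 1
term = np.ones_like(x)
f = term.copy()
for _ in range(20):  # partial sum; terms shrink like x^n / n!
    term = integrate(term)
    f += term

err = np.max(np.abs(f - np.exp(x)))
print(f"max deviation from e^x on [0,1]: {err:.2e}")  # small (trapezoid-rule error)
```

Each application of I turns 1 into x, x into x^2/2, and so on, so the partial sums converge to the Taylor series of e^x, which is exactly why the operator series works here.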
notfnofn 10h
This is true, but I'm looking for an explicit, non-recursive formula that needs to handle the general case of the kth anti-derivative (instead of just the first). The solution involves doing something funny with formal power series, like in this post.
DaemonicSigil 8h
Heh, sure.

Very nice! Notice that if you write   as , and play around with binomial coefficients a bit, we can rewrite this as:

which holds for  as well, in which case it becomes the derivative product rule

(By the way, how do you spoiler tag?)

There are two main areas of catastrophic or existential risk which have recently received significant attention: biorisk (from natural sources, biological accidents, and biological weapons) and artificial intelligence (from detrimental societal impacts of systems, incautious or intentional misuse of highly capable systems, and direct risks from agentic AGI/ASI). These have been compared extensively in research, and have even directly inspired policies. Comparisons are often useful, but in this case, I think the disanalogies are much more compelling than the analogies. Below, I lay these out piecewise, attempting to keep the pairs of paragraphs describing first biorisk, then AI risk, parallel to each other.

While I think the disanalogies are compelling, comparison can still be useful as an analytic tool - while keeping in mind that the ability to directly...

This is an 800-word blog post, not 5 words. There’s plenty of room for nuance.

The way it stands right now, if there’s a conversation like:

Person A: It’s not inconceivable that the world might wildly under-invest in societal resilience against catastrophic risks even after a “warning shot” for AI. Like for example, look at the case of bio-risks—COVID just happened, so the costs of novel pandemics are right now extremely salient to everyone on Earth, and yet, (…etc.).

Person B: You idiot, bio-risks are not at all analogous to AI. Look at this blog post by Dav

faul_sname 7h
"Immunology" and "well-understood" are two phrases I am not used to seeing in close proximity to each other. I think with an "increasingly" in between it's technically true - the field has any model at all now, and that wasn't true in the past, and by that token the well-understoodness is increasing. But that sentence could also be interpreted as saying that the field is well-understood now, and is becoming even better understood as time passes. And I think you'd probably struggle to find an immunologist who would describe their field as "well-understood". My experience has been that for most basic practical questions the answer is "it depends", and, upon closer examination, "it depends on some stuff that nobody currently knows". Now that was more than 10 years ago, so maybe the field has matured a lot since then. But concretely, I expect if you were to go up to an immunologist and say "I'm developing a novel peptide vaccine from the specific abc surface protein of the specific xyz virus. Can you tell me whether this will trigger an autoimmune response due to cross-reactivity" the answer is going to be something more along the lines of "lol no, run in vitro tests followed by trials (you fool!)" and less along the lines of "sure, just plug it in to this off-the-shelf software".
Davidmanheim 3h
I agree that we do not have an exact model for anything in immunology, unlike physics, and there is a huge amount of uncertainty. But that's different than saying it's not well-understood; we have clear gold-standard methods for determining answers, even if they are very expensive. This stands in stark contrast to AI, where we don't have the ability to verify that something works or is safe at all without deploying it, and even that isn't much of a check on its later potential for misuse. But aside from that, I think your position is agreeing with mine much more than you imply. My understanding is that we have newer predictive models which can give uncertain but fairly accurate answers to many narrow questions. (Older, non-ML methods also exist, but I'm less familiar with them.) In your hypothetical case, I expect that the right experts can absolutely give indicative answers about whether a novel vaccine peptide is likely or unlikely to have cross-reactivity with various immune targets, and the biggest problem is that it's socially unacceptable to assert confidence in anything short of a tested and verified case. But the models can get, in the case of the Zhang et al paper above, 70% accurate answers, which can help narrow the problem for drug or vaccine discovery, though they do need to be followed with in vitro tests and trials.
Davidmanheim 8h
I'm arguing exactly the opposite; experts want to make comparisons carefully, and those trying to transmit the case to the general public should, at this point, stop using these rhetorical shortcuts that imply wrong and misleading things.
lc 37m

Robin Hanson has apparently asked the same thing. It seems like such a bizarre question to me:

  • Most people do not have the constitution or agency for criminal murder
  • Most companies do not have secrets large enough that assassinations would reduce the size of their problems on expectation
  • Most people who work at large companies don't really give a shit if that company gets fined and so they don't have the motivation to personally risk anything organizing murders to prevent lawsuits
ChristianKl 16h
Most companies don't threaten their employees with physical violence. According to another Boeing whistleblower, Sam Salehpour, that seems to happen at Boeing. Being a defense contractor, I would expect Boeing corporate to have better relationships with the kind of people you would hire for such a task than most corporations do.

This report is one in a series of ~10 posts comprising a 2024 State of the AI Regulatory Landscape Review, conducted by the Governance Recommendations Research Program at Convergence Analysis. Each post will cover a specific domain of AI governance (such as incident reporting, safety evals, model registries, and more). We’ll provide an overview of existing regulations, focusing on the US, EU, and China as the leading governmental bodies currently developing AI legislation. Additionally, we’ll discuss the relevant context behind each domain and conduct a short analysis.

This series is intended to be a primer for policymakers, researchers, and individuals seeking to develop a high-level overview of the current AI governance space. We’ll publish individual posts on our website and release a comprehensive report at the end of this series.

In this post,...

TLDR:

  1. Around Einstein-level, relatively small changes in intelligence can lead to large changes in what one is capable of accomplishing.
    1. E.g. Einstein was a bit better than the other best physicists at seeing deep connections and reasoning, but was able to accomplish much more in terms of impressive scientific output.
  2. There are architectures where small changes can have significant effects on intelligence.
    1. E.g. small changes in human-brain-hyperparameters: Einstein’s brain didn’t need to be trained on 3x the compute of normal physics professors for him to become much better at forming deep understanding, even without intelligence improving intelligence.

Einstein and the heavytail of human intelligence

1905 is often described as the "annus mirabilis" of Albert Einstein. He founded quantum physics by postulating the existence of (light) quanta, explained Brownian motion, introduced the special relativity theory and...

I think research on what you propose should definitely not be public and I'd recommend against publicly trying to push this alignment agenda.

Towards_Keeperhood 1h
(I think) Planck found the formula that matched the empirically observed distribution, but had no explanation for why it should hold. Einstein found the justification for this formula.
RussellThor 8h
OK, but if that were true then there would have been many more Einstein-like breakthroughs since then. More likely, such low-hanging fruit has been plucked and a similar intellect is now well into diminishing returns. That is, given our current technological society and a >50-year history of smart people trying to work on everything, if there are such breakthroughs to be made, then the IQ required is now higher than in Einstein's day.
Lukas_Gloor 13h
I lean towards agreeing with the takeaway; I made a similar argument here and would still bet on the slope being very steep inside the human intelligence level. 

Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort

We would like to thank Atticus Geiger for his valuable feedback and in-depth discussions throughout this project.

tl;dr:

Activation patching is a common method for finding model components (attention heads, MLP layers, …) relevant to a given task. However, features rarely occupy entire components: instead, we expect them to form non-basis-aligned subspaces of these components. 

We show that the obvious generalization of activation patching to subspaces is prone to a kind of interpretability illusion. Specifically, it is possible for a 1-dimensional subspace patch in the IOI task to significantly affect predicted probabilities by activating a normally dormant pathway outside the IOI circuit. At the same time, activation patching the entire MLP layer where this subspace lies has no such effect. We call this an "MLP-In-The-Middle" illusion.

We show a simple mathematical model of how this situation may arise more generally, and a priori / heuristic arguments for why it may be common in real-world LLMs.

Introduction

The linear representation hypothesis suggests that language models represent concepts as meaningful directions (or subspaces, for non-binary features) in the much larger space of possible activations. A central goal of mechanistic interpretability is to discover these subspaces and map them to interpretable variables, as they form the “units” of model computation.

However, the residual stream activations (and maybe even the neuron activations!) mostly don’t have a privileged basis. This means that many meaningful subspaces won’t be basis-aligned; rather than iterating over possible neurons and sets of neurons, we need to consider arbitrary subspaces of activations. This is a much larger search space! How can we navigate it? 

A natural approach to check “how well” a subspace represents a concept is to use a subspace analogue of the activation patching technique. You run the model on input A, but with the activation along the subspace taken from an input B that differs from A only in the value of the concept in question. If the subspace encodes the information used by the model to distinguish B from A, we expect to see a corresponding change in model behavior (compared to just running on A). 
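The patching operation described above is simple to state in code. A minimal sketch in a toy linear setting (my own illustration with random vectors, not the post's IOI experiments): replace A's component along a 1-dimensional subspace with B's, leaving the orthogonal part untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
act_A = rng.normal(size=d)   # activation on the clean input A
act_B = rng.normal(size=d)   # activation on the counterfactual input B
v = rng.normal(size=d)
v /= np.linalg.norm(v)       # the 1-dimensional subspace, as a unit vector

def subspace_patch(a, b, v):
    """A's activation, but with its projection onto span(v) swapped for B's."""
    return a + np.outer(v, v) @ (b - a)

patched = subspace_patch(act_A, act_B, v)

# sanity checks: along v the patch matches B; orthogonal to v it matches A
assert np.isclose(patched @ v, act_B @ v)
assert np.allclose(patched - (patched @ v) * v, act_A - (act_A @ v) * v)
```

The illusion discussed in the post arises when the component of `v` that carries the behavioral effect lies in a direction that is dormant on the training distribution, so patching the full layer shows no effect while patching along `v` does.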

Surpri...

What if we constrain v to be in some subspace that is actually used by the MLP? (We can get it from PCA over activations on many inputs.)

This way v won't have any dormant component, so the MLP output after patching also cannot use that dormant pathway.


This is the ninth post in my series on Anthropics. The previous one is The Solution to Sleeping Beauty.

Introduction

There are some quite pervasive misconceptions about betting in regards to the Sleeping Beauty problem.

One is that you need to switch between halfer and thirder stances based on the betting scheme proposed. As if learning about a betting scheme is supposed to affect your credence in an event.

Another is that halfers should bet at thirders odds and, therefore, thirdism is vindicated on the grounds of betting. What do halfers even mean by probability of Heads being 1/2 if they bet as if it's 1/3?

In this post we are going to correct them. We will understand how to arrive at correct betting odds from both thirdist and halfist positions, and...
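The standard per-awakening betting claim is easy to check by simulation. This is my own toy sketch of the textbook argument, not the post's analysis: because Tails produces two awakenings, Heads-awakenings make up about a third of all awakenings even though the coin is fair per experiment.

```python
import random

rng = random.Random(0)
heads_awakenings = 0
total_awakenings = 0
for _ in range(100_000):
    if rng.random() < 0.5:   # Heads: Beauty is awakened once
        heads_awakenings += 1
        total_awakenings += 1
    else:                    # Tails: Beauty is awakened twice
        total_awakenings += 2

freq = heads_awakenings / total_awakenings
print(f"fraction of awakenings with Heads: {freq:.3f}")
```

So a bet offered and settled once per awakening breaks even at 1:2 odds, while a bet settled once per experiment breaks even at 1:1 — which is exactly why conflating the two schemes fuels the halfer/thirder confusion.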

"Whether or not your probability model leads to optimal decision making is the test allowing us to falsify it."

Sure, I don't deny that. What I am saying is that your probability model doesn't tell you which probability you have to base a certain decision on. If you can derive a probability from your model and provide a good reason to consider this probability relevant to your decision, your model is not falsified as long as you arrive at the right decision. Suppose a simple experiment where the experimenter flips a fair coin and you have to guess if Tails or Hea...

Hi : )

I used to use smileys in my writing all the time (more than I do now!).  but then I read Against Disclaimers, and I thought that every time I used a smiley I wud make people who don't use smileys seem less friendly (bc my conspicuous-friendliness wud be available as a contrast to others' behaviour).  so instead, my strategy for maximizing friendliness in the world became:

if I just have the purest of kindness in my heart while I'm interacting with ppl, and use plain words with no extra signalling, I will make plain words seem more friendly in general.

this was part of a general heuristic strategy: "to marginally move society in the direction of a better interpretive equilibrium, just act like that equilibrium is already...

Some people have suggested that a lot of the danger of training a powerful AI comes from reinforcement learning. Given an objective, RL will reinforce any method of achieving the objective that the model tries and finds to be successful, including things like deceiving us or increasing its power.

If this were the case, then if we want to build a model with capability level X, it might make sense to try to train that model either without RL or with as little RL as possible. For example, we could attempt to achieve the objective using imitation learning instead.

However, if, for example, the alternative were imitation learning, it would be possible to push back and argue that this is still a black box that uses gradient descent, so we...

Chris_Leong 9h
You mention that society may do too little of the safer types of RL. Can you clarify what you mean by this?
porby 10h
Calling MuZero RL makes sense. The scare quotes are not meant to imply that it's not "real" RL, but rather that the category of RL is broad enough that belonging to it does not constrain expectations much in the relevant way. The thing that actually matters is how much the optimizer can roam in ways that are inconsistent with the design intent. For example, MuZero can explore the superhuman play space during training, but it is guided by the structure of the game and how it is modeled. Because of that structure, we can be quite confident that the optimizer isn't going to wander down a path to general superintelligence with strong preferences about paperclips.

Right, and that wouldn’t apply to a model-based RL system that could learn an open-ended model of any aspect of the world and itself, right?

I think your “it is nearly impossible for any computationally tractable optimizer to find any implementation for a sparse/distant reward function” should have some caveat that it only clearly applies to currently-known techniques. In the future there could be better automatic-world-model-builders, and/or future generic techniques to do automatic unsupervised reward-shaping for an arbitrary reward, such that AIs could find out-of-the-box ways to solve hard problems without handholding.

Fooming Shoggoths Dance Concert

June 1st at LessOnline

After their debut album I Have Been A Good Bing, the Fooming Shoggoths are performing at the LessOnline festival. They'll be unveiling several previously unpublished tracks, such as
"Nothing is Mere", feat. Richard Feynman.

Ticket prices rise by $100 on May 13th