Leon Lang

I'm a PhD student at the University of Amsterdam. I have research experience in multivariate information theory and equivariant deep learning and recently got very interested in AI alignment. https://langleon.github.io/

Comments

"AI achieves silver-medal standard solving International Mathematical Olympiad problems"

Leon Lang2d20

The news is not very old yet. Lots of potential for people to start freaking out.

Leon Lang's Shortform

Leon Lang6d20

One question: Do you think Chinchilla scaling laws are still correct today, or are they not? I would assume these scaling laws depend on the data set used in training, so that if OpenAI found/created a better data set, this might change scaling laws.

Do you agree with this, or do you think it's false?
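To make the question concrete, here is a minimal sketch of the compute-optimal split using two commonly cited rules of thumb: training compute C is roughly 6 * N * D, and the Chinchilla-style optimum sits near 20 training tokens per parameter. Both numbers are approximations, and the function name `compute_optimal_split` and the alternative ratio of 40 are purely hypothetical. The sketch only shows one way a dependence on the data set could show up, namely as a different tokens-per-parameter ratio; it is not a claim about what any lab has found.

```python
import math


def compute_optimal_split(compute_flops, tokens_per_param=20.0):
    """Split a FLOP budget into parameters N and training tokens D.

    Assumes C ~= 6 * N * D and D ~= tokens_per_param * N.
    Both are rules of thumb, not exact laws.
    """
    # Substituting D = ratio * N into C = 6 * N * D gives N = sqrt(C / (6 * ratio)).
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


budget = 1e21  # hypothetical FLOP budget
for ratio in (20.0, 40.0):  # 40 is an invented ratio for a hypothetical different data set
    n, d = compute_optimal_split(budget, ratio)
    print(f"tokens/param={ratio:g}: params={n:.3e}, tokens={d:.3e}")
```

Under these assumptions, a higher optimal ratio would push the same budget toward a smaller model trained on more tokens.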

Leon Lang's Shortform

Leon Lang8d140

https://x.com/sama/status/1813984927622549881

According to Sam Altman, GPT-4o mini is much better than text-davinci-003 was in 2022, but 100 times cheaper. In general, we see increasing competition to produce smaller-sized models with great performance (e.g., Claude Haiku and Sonnet, Gemini 1.5 Flash and Pro, maybe even the full-sized GPT-4o itself). I think this trend is worth discussing. Some comments (mostly just quick takes) and questions I'd like to have answers to:

- Should we expect this trend to continue? How much more efficiency gain is still possible? Can we expect another 100x efficiency gain in the coming years? Andrej Karpathy expects that we might see a GPT-2 sized "smart" model.
- What's the technical driver behind these advancements? Andrej Karpathy thinks it is based on synthetic data: larger models curate new, better training data for the next generation of small models. Might there also be architectural changes? Inference tricks? Which of these advancements can continue?
- Why are companies pushing into small models? I think in hindsight, this seems easy to answer, but I'm curious what others think: If you have a GPT-4 level model that is much, much cheaper, then you can sell the service to many more people and deeply integrate your model into lots of software on phones, computers, etc. I think this has many desirable effects for AI developers:
  - Increase revenue, motivating investments into the next generation of LLMs.
  - Increase market share. Some integrations are probably "sticky" such that if you're first, you secure revenue for a long time.
  - Make many people "aware" of potential use cases of even smarter AI so that they're motivated to sign up for the next generation of more expensive AI.
  - The company's inference compute is probably limited (especially for OpenAI, as the market leader) and not many people are convinced to pay a large amount for very intelligent models, meaning that all these reasons outweigh the reasons to publish larger models instead, or even in addition.
- What does all this mean for the next generation of large models?
  - Should we expect that efficiency gains in small models translate into efficiency gains in large models, such that a future model with the cost of text-davinci-003 is massively more capable than today's SOTA? If Andrej Karpathy is right that the small models' capabilities come from synthetic data generated by larger, smart models, then it's unclear to me whether one can train SOTA models with these techniques, as this might require an even larger model to already exist.
  - At what point does it become worthwhile for e.g. OpenAI to publish a next-gen model? I'd guess you can still do a lot of "penetration of small model use cases" in the next 1-2 years, leading to massive revenue increases without necessarily releasing a next-gen model.
  - Do the strategies differ for different companies? OpenAI is the clear market leader, so possibly they can penetrate the market further without first making a "bigger name for themselves". In contrast, I could imagine that for a company like Anthropic, it's much more important to get out a clear SOTA model that impresses people and makes them aware of Claude. I thus currently (weakly) expect Anthropic to push more strongly in the direction of SOTA than OpenAI.

Last 50 Spots until 31th July - LessWrong Community weekend

Leon Lang11d63

I went to this event in 2022 and it was lovely. Will come again this year. I recommend coming!

A simple case for extreme inner misalignment

Leon Lang13d30

Thanks for the answer!

But basically, by "simple goals" I mean "goals which are simple to represent", i.e. goals which have highly compressed representations

It seems to me you are using "compressed" with two very different meanings in part 1 and 2. Or, to be fairer, I interpret the meanings very differently.

I'll try to make my view of things more concrete to explain:

Compressed representations: A representation is a function $f : O \to R$ from observations of the world state $O$ (or sequences of such observations) into a representation space $R$ of "features". That this is "compressed" means (a) that in $R$, only a small number of features are active at any given time and (b) that this small number of features is still sufficient to predict/act in the world.

Goals building on compressed representations: A goal is a (maybe linear) function $U : R \to \mathbb{R}$ from the representation space into the real numbers. The goal "likes" some features and "dislikes" others. (Or if it is not entirely linear, then it may like/dislike some simple combinations/compositions of features.)

It seems to me that in part 2 of your post, you view goals as compositions $U \circ f : O \to \mathbb{R}$. Part 1 says that $f$ is highly compressed. But it's totally unclear to me why the composition $U \circ f$ should then have the simplicity properties you claim in part 2, which in my mind don't connect with the compression properties of $f$ as I just defined them.
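As a toy illustration of these definitions (a sketch only; the thresholding rule, the function names, and all numbers are invented for this example), a sparse representation and a linear goal on top of it might look like this:

```python
def represent(observation, threshold=0.5):
    """Toy representation: maps an observation to features.

    "Compressed" here means sparse: values at or below the
    threshold are zeroed, so only a few features stay active.
    """
    return [x if x > threshold else 0.0 for x in observation]


def goal(features, weights):
    """Toy linear goal: maps features to a single real number.

    Positive weights "like" a feature, negative weights "dislike" it.
    """
    return sum(w * r for w, r in zip(weights, features))


def active_count(features):
    """Number of features that are active (nonzero)."""
    return sum(1 for r in features if r != 0.0)


observation = [0.1, 0.9, 0.2, 0.7, 0.05]  # hypothetical observation
weights = [1.0, 2.0, -1.0, -3.0, 0.5]     # hypothetical goal weights

features = represent(observation)
print(active_count(features), "of", len(features), "features active")
print("utility of the composed map:", goal(features, weights))
```

Sparsity is a property of what `represent` outputs across inputs, while the simplicity of the composed map depends on how many parameters are needed to specify `represent` itself. The sketch keeps those two things visibly separate.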

A few more thoughts:

- The notion of "simplicity" in part 2 seems to be about how easy it is to represent a function -- i.e., the space of parameters with which the function $U \circ f$ is represented is simple in your part 2.
- The notion of "compression" in part 1 seems to be about how easy it is to represent an input -- i.e., is there a small number of features such that their activation tells you the important things about the input?
- These notions of simplicity and compression are very different. Indeed, if you have a highly compressed representation $f$ as in part 1, I'd guess that $f$ necessarily lives in a highly complex space of possible functions with many parameters, thus the opposite of what seems to be going on in part 2.

This is largely my fault since I haven't really defined "representation" very clearly, but I would say that the representation of the concept of a dog should be considered to include e.g. the neurons representing "fur", "mouth", "nose", "barks", etc. Otherwise if we just count "dog" as being encoded in a single neuron, then every concept encoded in any neuron is equally simple, which doesn't seem like a useful definition.
(To put it another way: the representation is the information you need to actually do stuff with the concept.)

I'm confused. Most of the time, when seeing a dog, most of what I need is actually just to know that it is a "dog", so this is totally sufficient to do something with the concept. E.g., if I walk on the street and wonder "will this thing bark?", then knowing "my dog neuron activates" is almost enough.

I'm confused for a second reason: It seems like here you want to claim that the "dog" representation is NOT simple (since it contains "fur", "mouth", etc.). However, the "dog" representation needs lots of intelligence and should thus come along with compression, and if you equate compression and simplicity, then it seems to me like you're not consistent. (I feel a bit awkward saying "you're not consistent", but I think it's probably good if I state my honest state of mind at this moment).

To clarify my own position, in line with my definition of compression further above: I think that whether a representation is simple/compressed is NOT a property of a single input-output relation (like "pixels of dog gets mapped to dog-neuron being activated"), but instead a property of the whole FUNCTION that maps inputs to representations. This function is compressed if for any given input, only a small number of neurons in the last layer activate, and if these can be used (ideally in a linear way) for further predictions and for evaluating goal-achievement.

I agree that most people who say they are hedonic utilitarians are not 100% committed to hedonic utilitarianism. But I still think it's very striking that they at least somewhat care about making hedonium. I claim this provides an intuition pump for how AIs might care about squiggles too.

Okay, I agree with this, fwiw. :) (Though I may not necessarily agree with claims about how this connects to the rest of the post)

A simple case for extreme inner misalignment

Leon Lang14d30

Thanks for the post!

a. How exactly do 1 and 2 interact to produce 3?

I think the claim is along the lines of "highly compressed representations imply simple goals", but the connection between compressed representations and simple goals has not been argued, unless I missed it. There's also a chance that I simply misunderstand your post entirely.

b. I don't agree with the following argument:

Decomposability over space. A goal is decomposable over space if it can be evaluated separately in each given volume of space. All else equal, a goal is more decomposable if it's defined over smaller-scale subcomponents, so the most decomposable goals will be defined over very small slices of space—hence why we're talking about molecular squiggles. (By contrast, you can't evaluate the amount of higher-level goals like "freedom" or "justice" in a nanoscale volume, even in principle.)

The classical ML algorithm that evaluates features separately in space is a CNN. That doesn't mean that features in CNNs look for tiny structures, though: The deeper into the CNN you are, the more complex the features get. Actually, deep CNNs are an example of what you describe in argument 1: The features in later layers of CNNs are highly compressed, and may tell you binary information such as "is there a dog", but they apply to large parts of the input image.

Therefore, I'd also expect that what an AGI would care about are ultimately larger-scale structures since the AGI's features will nontrivially depend on the interaction of larger parts in space (and possibly time, e.g. if the AGI evaluates music, movies, etc.).
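To make the CNN point quantitative, here is a small sketch that computes how the receptive field of a feature grows with depth, using the standard recurrence for stacked convolutions (each layer adds (kernel - 1) times the accumulated stride). The example layer stack is hypothetical and ignores dilation and padding effects.

```python
def receptive_fields(layers):
    """Receptive-field size (in input pixels) after each layer.

    layers: list of (kernel_size, stride) pairs, in order from input to output.
    """
    size, jump = 1, 1  # jump = distance in input pixels between adjacent features
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical stack: alternating 3x3 convolutions with stride 1 and stride 2.
stack = [(3, 1), (3, 2)] * 4
print(receptive_fields(stack))
```

Even with small 3x3 kernels, the features in later layers depend on a region many times wider than one kernel, which is the sense in which they "apply to large parts of the input image".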

c. I think this leaves the question of why philosophers end up favoring the analog of squiggles when they become hedonic utilitarians. I'd argue that the premise may be false since it's unclear to me how what philosophers say they care about ("hedonium") connects with what they actually care about (e.g., maybe they still listen to complex music, build a family, build status through philosophical argumentation, etc.).

Leon Lang's Shortform

Leon Lang26d6630

You should all be using the "Google Scholar PDF reader extension" for Chrome.

Features I like:

- References are linked and clickable
- You get a table of contents
- You can move back after clicking a link with Alt+left

Examples of Highly Counterfactual Discoveries?

Leon Lang3mo30

I guess (but don't know) that most people who downvote Garrett's comment overupdated on intuitive explanations of singular learning theory, not realizing that entire books with novel and nontrivial mathematical theory have been written on it.

A couple productivity tips for overthinkers

Leon Lang3mo30

I do all of these except 3, and implementing a system like 3 is among the deprioritized things on my to-do list. Maybe I should prioritize it.

Transformers Represent Belief State Geometry in their Residual Stream

Leon Lang3mo10

Yes, the first! Thanks for the link!
