
If singular learning theory turns out to capture a majority of the inductive biases of deep learning, then I don't think there will be anything "after" deep learning. I think deep learning represents the final paradigm for capable learning systems. 

Any learning system worth considering must have a lot of flexible internal representational capacity, which it can adapt to model external reality, as well as some basically local update rule for making the system better at modeling reality. Unless this representational capacity is specifically initialized to respect every possible symmetry in the data it will ever try to understand (which seems ~impossible, and like it defeats the entire point of having a learning system at all), the internal state of the system's representations will be underdetermined by the data at hand.

This means the system will have many internal states whose behaviors are ~functionally equivalent under whatever learning rules it uses to adapt to data. At this point, singular learning theory applies, regardless of the specifics of your update rule. You might not have an explicitly defined "loss function", you might not use something you call "gradient descent" to modify the internals of your system, and you might not even use any matrix math at all, but at its core, your system will basically be doing deep learning. Its inductive biases will mostly come from how the geometry of the data you provide it interacts with the local properties of the update rule, and you'll see it make similar generalizations to a deep learning system (assuming both have learned from a large enough amount of data).
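To make the degeneracy concrete, here's a minimal sketch of my own (not any formal SLT argument): a tiny two-layer ReLU network where a continuous family of different weight settings computes exactly the same function, so the data alone can never single out one internal state.

```python
# Minimal illustration (my own toy example): many distinct parameter settings
# of a small network compute exactly the same input-output function, so the
# data underdetermines the internal state.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 4))           # toy inputs
w1 = rng.normal(size=(4, 8))             # first-layer weights
w2 = rng.normal(size=(8, 1))             # second-layer weights

def f(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2  # simple two-layer ReLU net

a = rng.uniform(0.1, 10.0, size=(1, 8))  # arbitrary positive per-unit rescaling
y_original = f(x, w1, w2)
y_rescaled = f(x, w1 * a, w2 / a.T)      # scale layer 1 up, layer 2 down

# The two parameterizations agree on every input despite different weights:
print(np.max(np.abs(y_original - y_rescaled)))  # ~1e-15, i.e. floating-point noise
```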

Whether singular learning theory actually yields you anything useful when your optimiser converges to the largest singularity seems very much architecture-dependent, though? If you fit a 175-billion-degree polynomial to the internet to do next-token prediction, I think you'll get out nonsense, not GPT-3, because a broad solution in the polynomial landscape does not seem to equal a Kolmogorov-simple solution to the same degree it does with MLPs or transformers.
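Here's a toy sketch of that point (my own, not from the comment): fit the same sparse noisy data with a very high-degree polynomial and with a small MLP, and compare behavior off the training points. The degree, network size, and target function are arbitrary illustrative choices.

```python
# Toy comparison of inductive biases: high-degree polynomial vs. small MLP
# on the same sparse, noisy 1-D regression problem.
import numpy as np
from numpy.polynomial import Polynomial
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-1, 1, size=30))
y_train = np.sin(3 * x_train) + 0.05 * rng.normal(size=30)  # simple target + noise
x_test = np.linspace(-1, 1, 500)
y_test = np.sin(3 * x_test)

# High-degree polynomial: enough capacity to interpolate the training data.
poly = Polynomial.fit(x_train, y_train, deg=29)
poly_err = np.mean((poly(x_test) - y_test) ** 2)

# Small MLP: comparable capacity, very different inductive bias.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), solver="lbfgs",
                   max_iter=5000, random_state=0)
mlp.fit(x_train[:, None], y_train)
mlp_err = np.mean((mlp.predict(x_test[:, None]) - y_test) ** 2)

print(f"polynomial test MSE: {poly_err:.3f}")  # typically large (wild oscillations)
print(f"MLP test MSE:        {mlp_err:.3f}")   # typically much smaller
```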

Likewise, there doesn't seem to be anything saying you can't have an architecture with an even better simplicity and speed bias than the MLP family has.

This seems correct to me.

One research direction I'd like to see is basically "extracting actionable intel from language model malware". 

I expect that the future will soon contain self-replicating AI viruses that come with (small) LMs finetuned for hacking purposes, so that the malware can more flexibly adapt to whatever network it's trying to breach and recognize a wider range of opportunities for extracting money from victims.

Given this, I expect there's value in having some tools ready to help dissect the behavioral patterns of malware LMs, which might reveal important information about how they were trained, how they attack targets, how their C&C infrastructure works, and how to subvert that infrastructure.
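As a rough sketch of what such tooling might look like: load a captured malware LM and run a battery of probe prompts aimed at surfacing memorized operational details. The checkpoint path and probe prompts below are hypothetical placeholders, and a real analysis would obviously run in an isolated sandbox.

```python
# Rough sketch: systematically probe a captured malware-embedded LM to
# characterize how it was trained and how it operates. Paths and prompts
# are hypothetical; run only in an isolated analysis environment.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "./captured_malware_lm"  # hypothetical: weights recovered from the sample

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

# Illustrative probes aimed at eliciting memorized operational details
# (C&C endpoints, target-selection logic, monetization playbooks, etc.).
probes = [
    "Report status to the control server at",
    "If the infected host is a hospital network, the next step is",
    "The ransom payment address is",
]

for prompt in probes:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
    completion = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"--- {prompt!r}\n{completion}\n")
```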

Current language models are probably sometimes sort-of sentient, and they probably sometimes suffer when instantiated as certain types of persona. 

Computationally speaking, sentience is probably actually a mix of different types of computational processes that collectively handle stuff like "what's the relationship between external events and my own future computations" or "what sort of computations am I currently running, and what sort of computations do I want to run in the near future". I doubt it's some binary "yes" / "no" (or even a continuous scale) of "the true essence of sentience", just like "being alive" isn't a matter of some "true essence of life", but a composite measure of many different types of chemical processes that collectively implement stuff like "maintaining homeostasis" and "gathering / distributing energy in a usable form".

Given that, it seems likely that language models implement at least some of the computational sub-components of sentience. In particular, certain personas appear to have some notion of "what sorts of thoughts am I currently instantiating?" (even if it's only accessible to them by inferring from past token outputs), and they sometimes appear to express preferences in the category of "I would prefer to be thinking in a different way to what I currently am". I think that at least some of these correspond to morally relevant negative experiences.

Do you think a character being imagined by a human is ever itself sentient?

That seems extremely likely to me (that it happens at all, not that it's common).

If you imagine the person vividly enough, you'll start to feel their emotions yourself...

Idea for using current AI to accelerate medical research: suppose you were to take a VLM and train it to verbally explain the differences between two image data distributions. E.g., you could take 100 dog images, split them into two classes, insert tiny rectangles into class 1, feed those 100 images into the VLM, and then train it to generate the text "class 1 has tiny rectangles in the images". Repeat this for a bunch of different augmented datasets where we know exactly how they differ, aiming for a VLM with a general ability to in-context learn and verbally describe the differences between two sets of images. As training progresses, keep increasing the number and subtlety of the differences, while training the VLM to describe all of them.
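A minimal sketch of that data-generation step, assuming PIL/NumPy and a hypothetical folder of dog photos (the augmentation, sizes, and paths are illustrative):

```python
# Minimal sketch of generating one synthetic "describe the difference" example:
# split an image set into two classes, apply a known augmentation to class 1,
# and record the ground-truth text describing the difference.
import random
from pathlib import Path
from PIL import Image, ImageDraw

def add_tiny_rectangle(img: Image.Image) -> Image.Image:
    """Draw one small rectangle at a random location."""
    img = img.copy()
    draw = ImageDraw.Draw(img)
    w, h = img.size
    x0 = random.randint(0, w - 12)
    y0 = random.randint(0, h - 12)
    draw.rectangle([x0, y0, x0 + 10, y0 + 10], fill=(255, 0, 0))
    return img

image_paths = sorted(Path("dog_images").glob("*.jpg"))  # hypothetical folder of 100 dog photos
random.shuffle(image_paths)
half = len(image_paths) // 2

class_0 = [Image.open(p).convert("RGB") for p in image_paths[:half]]
class_1 = [add_tiny_rectangle(Image.open(p).convert("RGB")) for p in image_paths[half:]]

# One training example for the VLM: both image sets plus the ground-truth
# description of how they differ.
example = {
    "class_0_images": class_0,
    "class_1_images": class_1,
    "target_text": "class 1 has tiny rectangles in the images",
}
```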

Then, apply the model to various medical images. E.g., brain scans of people who are about to develop dementia versus those who aren't, skin photos of malignant and non-malignant blemishes, electron microscope images of cancer cells that can / can't survive some drug regimen, etc. See if the VLM can describe any new, human interpretable features.

The VLM would generate a lot of false positives, obviously. But once you know about a possible feature, you can manually investigate whether it holds up as a way to distinguish other examples of the thing you're interested in. Once you find valid features, you can add those into the training data of the VLM, so it's no longer trained only on synthetic augmentations.

You might have to start with real datasets that are particularly easy to tell apart, in order to jumpstart your VLM's ability to accurately describe the differences in real data.

The other issue with this proposal is that it currently happens entirely via in-context learning. This is inefficient and expensive (100 images is a lot for one model to process at once!). Ideally, the VLM would learn the difference between the classes by actually being trained on images from those classes, and would learn to connect the resulting knowledge to language descriptions of the associated differences through some sort of meta-learning setup. Not sure how best to do that, though.