What I Was Thinking About Before Alignment

Every single piece has been tweaked toward the same end goal

I feel like there'll be a better way to say this sentence once we figure out the answer to your first question:

How can we recognize adaptive systems in the wild? What universal behaviors indicate an adaptive optimizer?

It definitely seems to make sense to say that systems can have goals, in an "if it looks like a duck it makes sense to call it a duck" kind of way. But at the same time, not every single piece has been tweaked toward the same end goal as the system. Each piece is tweaked toward its own survival, and that is only somewhat aligned with the system's survival.

Something I wish there were more LessWrong posts about (or at least wish I'd seen more of) is alignment in the context of:

Organisms and their smaller replicator components (organisms < cells, cells < transposons & endoviruses & organelles)
Social thingies and their smaller sorta-replicator components (religions < religious ideas, companies < replicating management ideas)

If you have a favorite post that falls into the above genre or mentions something to that effect, please link me to it! I'd love to read more.

I'm only halfway through the A-Z sequence, so I'd also very much appreciate it if you could point to things in there to get me excited about progressing through it!

Ricardo Meneghin (4y)

I don't think we have hope of developing such tools, at least not in a way that looks like anything we had in the past. In the past we have been able to analyse large systems by throwing away an immense amount of detail - it turns out that you don't need the specific position of atoms to predict the movement of the planets, and you don't need the details to predict all of the other things we have successfully predicted with traditional math.

With the systems you are describing, this is simply impossible. Changing a single bit in a computer can change its output completely, so you can't build a simple abstraction that predicts it, you need to simulate it completely.
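As a rough illustration of the single-bit claim (an editorial sketch, not something from the original comment), the snippet below flips one bit of an input and measures how many bits of a SHA-256 digest change. The input string and the helper names `flip_bit` and `hamming_distance` are hypothetical, and a cryptographic hash is a deliberately extreme case because it is designed to amplify small input changes; many programs are far less sensitive.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical stand-in for "the state of a computer".
original = b"the state of some program"
flipped = flip_bit(original, 0)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(flipped).digest()

input_bits_changed = hamming_distance(original, flipped)
output_bits_changed = hamming_distance(digest_a, digest_b)

print(f"input bits changed:  {input_bits_changed}")
print(f"output bits changed: {output_bits_changed} of {len(digest_a) * 8}")
```

Under these assumptions, a summary of the input that discards even one bit cannot predict the digest, which is the kind of system the comment has in mind.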

We already have a way of taking immense amounts of complicated data and finding patterns in it: machine learning itself. If you want to translate what it learned into human-readable descriptions, you just have to incorporate language into it. Humans, after all, can describe their reasoning steps and why they believe what they believe (though maybe not easily).

Google throws tremendous amounts of data and computational resources into training neural networks, but decoding the internal models used by those networks? We lack the mathematical tools to even know where to start.

I predict this will be done in the coming years by using large multimodal models to analyse neural network parameters, or to explain their own workings.

Thomas Kwa (4y)

Changing a single bit in a computer can change its output completely, so you can't build a simple abstraction that predicts it, you need to simulate it completely.

Biology is complex, but changing a single molecule in a bacterium or a single neuron in a brain doesn't completely change the output, because they have evolved to be robust to such things.

Rossin (4y)

I’m not sure the problem in biology is decoding, at least not in the same sense as with neural networks. I see the main difficulty in biology as one of mechanistic inference, where a major roadblock may be getting better measurements of what is going on in cells over time, rather than finding some algorithm that can overcome the fact that biological data has very high levels of molecular noise and comes as single snapshots in time that are difficult to place in context. With a neural network you have the parameters, and it seems reasonable to say you just need some math to make it more interpretable.

In biology, by contrast, I think we likely need both better measurements and better tools. I’m not sure the same tools would be particularly applicable to the AI interpretability problem either.

If, for example, I managed to create mathematical tools to reliably learn mechanistic dependencies between proteins and/or genes from high-dimensional biological data sets, it’s not clear to me that they would be easily applicable to extracting Bayes nets from large neural networks.

I’m coming at this from a comp bio angle so it’s possible I’m just not seeing the connections well, having not worked in both fields.

Rodrigo Surw (4y)

Strong-voted. This is so exciting.
Any specific research avenues where AI and economics research could overlap?

johnswentworth (4y)

Around the time I first got into alignment, I was thinking about how to model markets as agents (e.g. what beliefs does a market as a whole have? What goals does it have?). That turned into "Why Subagents?".

I also spent a little bit of time reading up on Theory of the Firm, looking for alignment-relevant ideas; there's a lot of stuff there about aligning employees and firms, or when it makes sense for a firm to outsource (i.e. use "subagents") vs do things in-house, etc. That eventually led to the Pointers Problem post (via the ideas in Incentive Design With Imperfect Credit Allocation).

I expect there are plenty more useful analogies to mine on either of those paths, and probably on many other paths besides. Though note that this does require a nontrivial skill: one needs to be able to boil down the generalizable "core idea" of an argument, in a form which can carry over to another field.

P. (4y)

What would such a representation look like for a computer? There might exist some method for computing how the circuits are divided into modules and submodules, but how would you represent what they do? You don’t expect it to be annotated in natural language, do you?

P. (4y)

I mean, just in case I wasn’t clear enough, you want a program that takes in a representation of some system and outputs something a human can understand, right? But even if you could automatically divide a system into a tree of submodules such that a human could in principle describe how any one works in terms of short descriptions of the function of its submodules, there is no obvious way of automatically computing those descriptions. So if you gave a circuit diagram of a CPU as the input to that universal translator, what do you want it to output?

