
Roman Malov's Shortform

by Roman Malov
19th Dec 2024
1 min read

This is a special post for quick takes by Roman Malov. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

30 comments, sorted by top scoring
[-]Roman Malov3mo306

It doesn't take 400 years to learn physics and get to the frontier.

But staying on the frontier seems to be a really hard job. Lots of new research comes out every day, and scientists struggle to keep up with it. New research has lots of value while it's hot, and loses it as the field progresses and absorbs it into the general theory (which is then a much more worthwhile use of one's learning time).

Which raises the question: if you are not currently at the cutting edge and actively advancing your field, why follow new research at all? After a while, the field will condense the most important and useful research into neat textbooks and overview articles, and reading those when they appear is a much more efficient use of time. While you are not at the cutting edge, read condensations of previous work until you get there.

Also, it seems like there is not much of that in the field of alignment. I want there to be more work on unifying (previously frontier) alignment research and more effort to construct paradigms in this preparadigmatic field (but maybe I just haven't looked hard enough).

Reply
[-]Gyrodiot3mo42

Two separate points:

  • compared to physics, the field of alignment has a slow-changing set of questions (e.g. corrigibility, interpretability, control, goal robustness, etc.) but a fast-evolving subject matter, as capability progresses. I use the analogy of a biologist suddenly working in a place where evolution runs 1000x faster: some insights get stale very fast, and it's hard to know which ones in advance. Keeping up with the frontier is then a way to know whether one's work still seems relevant (or where to send newcomers). Agent foundations as a class of research agendas was the answer to this volatility, but progress is slow and the ground keeps shifting.
  • there is some effort to unify alignment research, or at least provide a textbook to get to the frontier. My prime example is the AI Safety Atlas, I would also consider the BlueDot courses as structure-building, AIsafety.info as giving some initial directions. There's also a host of papers attempting to categorize the sub-problems but they're not focused on tentative answers.
Reply
[-]Roman Malov2mo30

A much better version of this idea: https://slatestarcodex.com/2017/11/09/ars-longa-vita-brevis/

Reply
[-]Morpheus3mo20

Also, it seems like there is not much of that in the field of alignment. I want there to be more work on unifying (previously frontier) alignment research and more effort to construct paradigms in this preparadigmatic field (but maybe I just haven't looked hard enough)

I am surprised by the claim about the lack of distillation. I'd naively have expected distillation to be more neglected in physics than in alignment. Is there something in particular that you think could be more distilled?

Regarding research that tries to come up with new paradigms, here are a few reasons why you might not be observing much of it: I guess it is less funded by the big labs and is spread across all kinds of orgs and individuals. Maybe check MIRI, PIBBSS, ARC (theoretical research), and Conjecture, or check who went to ILIAD. These researchers also publish less of their work than AI safety researchers at AGI labs do, so you might simply not be aware it is going on. Some are also actively avoiding research on things that could be easily applied and tested, because of capability externalities (I think Vanessa Kosoy mentions this somewhere in the YouTube videos on Infrabayesianism).

Reply
[-]Roman Malov3mo20

Is there something in particular that you think could be more distilled?

What I had in mind is something like a more detailed explanation of recent reward hacking/misalignment results. Like, sure, we have old arguments about reward hacking and misalignment, but what I want is more gears for when particular reward hacking would happen in which model class.

Maybe check MIRI, PIBBSS, ARC (theoretical research), and Conjecture, or check who went to ILIAD.

Those are top-down approaches, where you have an idea and then do research on it. That is, of course, useful, but it's doing more frontier research by expanding the surface area. Applying my distillation intuition to them would mean having some overarching theory unifying all the approaches, which seems super hard and maybe not even possible. But looking at the intersections of pairs of agendas might prove useful.

Reply1
[-]Morpheus3mo20

The neuroscience/psychology side of the alignment problem (as opposed to the ML side) seems quite neglected (on the one hand it's harder, but on the other it's easier to avoid working on something capabilities-related if you just don't focus on the cortex). There's reverse-engineering human social instincts, for example. In principle it would benefit from more high-quality experiments in mice, but those are expensive.

Reply
[-]Roman Malov9mo174

I recently prepared an overview lecture about research directions in AI alignment for the Moscow AI Safety Hub. I had limited time, so I did the following: I reviewed all the sites on the AI safety map, examined the 'research' sections, and attempted to classify the problems they tackle and the research paths they pursue. I encountered difficulties in this process, partly because most sites lack a brief summary of their activities and objectives (Conjecture is one of the counterexamples). I believe that the field of AI safety would greatly benefit from improved communication, and providing a brief summary of a research direction seems like low-hanging fruit.

Reply
[-]Roman Malov2mo110

Why does learning about determinism lead to confusion about free will?

When someone is doing physics (trying to find out what happens to a physical system given its initial conditions), they are transforming the time-consuming-but-easy-to-express form of connecting initial conditions to end results (physical laws) into a single entry in a giant look-up table that matches initial conditions to end results (the not-time-consuming-but-harder-to-express form), essentially flattening out the time dimension. That creates a feeling that the process they are analyzing is pre-determined, that this giant look-up table already exists. And when they apply this to themselves, it can create a feeling of having no control over their own actions, as if those observation-action pairs were drawn from that pre-existing table. But this table doesn't actually exist; they still need to perform the computation to get to the action; there is no way around it. And wherever that computation is performed, that process is the person.

In other words, when people do physics on systems simple enough that they can fit in their head the initial conditions, the end result, and the connection between them, they feel a sense of "machineness" about those systems. They can overgeneralize that feeling over all physical systems (like humans), missing out on the fact that this feeling should only be felt when they can actually fit the model of the system (and its initial-condition/end-result entries) in their head, which they can't in the case of humans.
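
To make the contrast concrete, here is a minimal sketch (the 16-state rule and all the numbers are my own invented example, not anything from the comment above): for a tiny deterministic system you really can flatten the time dimension into a finished look-up table, which is where the feeling of "machineness" comes from; for a human-sized state space the table exists only in principle, and the computation still has to be run somewhere.

```python
# Toy sketch (invented example): a tiny deterministic "physics" and its
# giant look-up table (GLUT). For a 16-state system the whole table can be
# precomputed; for anything human-sized it cannot, and the table remains
# purely notional.

def step(state: int) -> int:
    """One tick of a made-up deterministic rule on states 0..15."""
    return (3 * state + 5) % 16

def run(state: int, ticks: int) -> int:
    """The 'doing physics' form: apply the rule tick by tick."""
    for _ in range(ticks):
        state = step(state)
    return state

TICKS = 10

# The GLUT form: one entry per initial state, with time flattened away.
glut = {s: run(s, TICKS) for s in range(16)}

# Looking up the answer and computing it agree, but only the computation is
# something that actually gets performed for systems too big to tabulate.
assert all(glut[s] == run(s, TICKS) for s in range(16))
print(glut[7], run(7, TICKS))
```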

Reply
[-]Dagon2mo20

They can overgeneralize that feeling over all physical systems (like humans), missing out on the fact that this feeling should only be felt

I don't follow why this is "overgeneralize" rather than just "generalize".  Are you saying it's NOT TRUE for complex systems, or just that we can't fit it in our heads?   I can't compute the Mandelbrot Set in my head, and I can't measure initial conditions well enough to predict a multi-arm pendulum beyond a few seconds.  But there's no illusion of will for those things, just a simple acknowledgement of complexity.

Reply
[-]Roman Malov2mo1-2

The "will" is supposedly taken away by GLUT, which is possible to create and have a grasp of it for small systems, then people (wrongly) generalize this for all systems including themselves. I'm not claiming that any object that you can't predict has a free will, I'm saying that having ruled out free will from a small system will not imply lack of free will in humans. I'm claiming "physicality ⇏ no free will" and "simplicity ⇒ no free will", I'm not claiming "complexity ⇒ free will".

Reply
[-]Dagon2mo20

Hmm. What about the claim "physicality -> no free will"? This is the more common assertion I see, and the one I find compelling.

Simplicity/complexity is something I more often see attributed to "consciousness" (and I agree: complexity does not imply consciousness, but simplicity denies it), but that's at least partly orthogonal to free will.

Reply
[-]Vladimir_Nesov2mo20

I'm claiming ... "simplicity ⇒ no free will"

Consider the ASP (Agent Simulates Predictor) problem, where the agent gets to decide whether it can be predicted, whether there is a dependence of the predictor on the agent. The agent can destroy the dependence by knowing too much about the predictor and making use of that knowledge. So this "knowing too much" (about the predictor) is what destroys the dependence, but it's not just a consequence of the predictor being too simple; rather, it comes from letting an understanding of the predictor's behavior precede the agent's behavior. It's in the agent's interest not to let this happen, to avoid making use of this knowledge (in an unfortunate way), to maintain the dependence (so that it gets to predictably one-box).

So here, when you are calling something simple as opposed to complicated, you are positing that its behavior is easy to understand, and so it's easy to have something else make use of knowledge of that behavior. But even when it's easy, it could be avoided intentionally. So even simple things can have free will (such as humans in the eyes of a superintelligence), from a point of view that decides to avoid knowing too much, which can be a good thing to do, and as the ASP problem illustrates can influence said behavior (the behavior could be different if not known, as the fact of not-being-known could happen to be easily knowable to the behavior).

Reply
[-]dr_s2mo20

I'd say this is correct, but it's also deeply counterintuitive. We don't feel like we are just a process performing itself, or at least that's way too abstract to wrap our heads around. The intuitive notion of free will is IMO something like the following:

had I been placed ten times in exactly the same circumstances, with exactly the same input conditions, I could theoretically have come up with different courses of action in response to them, even though one of them may make a lot more sense for me, based on some kind of ineffable non-deterministic quality that isn't random either, but is the manifestation of a self that exists somehow untethered from the laws of causality

Of course it's not worded exactly that way in most people's minds, but I think that's really the intuition that clashes against pure determinism. Pure determinism is a materialistic viewpoint, and lots of people are, consciously or not, dualists - implicitly assuming there's one special set of rules that applies to the self/mind/soul and doesn't apply to everything else.

Reply
[-]Vladimir_Nesov2mo20

Some confusion remains appropriate, because for example there is still no satisfactory account of a sense in which the behavior of one program influences the behavior of another program (in the general case, without constructing these programs in particular ways), with neither necessarily occurring within the other at the level of syntax. In this situation, the first program could be said to control the second (especially if it understands what's happening to it), or the second program could be said to perform analysis of (reason about) the first.

Reply
[-]Roman Malov2mo10

What do you mean by programs here?

Reply
[-]Vladimir_Nesov2mo20

Just Turing machines / lambda terms, or something like that. And "behavior" is however you need to define it to make a sensible account of the dependence between "behaviors", or of how one of the "behaviors" produces a static analysis of the other. The intent is to capture a key building block of acausal consequentialism in a computational setting, which is one way of going about formulating free will in a deterministic world.

(You don't just control the physical world through your physical occurrence in it, but also, for example, through the way other people are reasoning about your possible behaviors, and so an account that simply looks for your occurrence in the world as a subterm/part misses an important aspect of what's going on. Turing machines also illustrate this, since they lack subterm/part structure.)

Reply
[-]Roman Malov2mo60

What operation on money represents the destruction of value?

Money is a good approximation for what people value. Value can be destroyed. But what should I do to money to destroy the value it encompasses?

I might feel bad if somebody stole my wallet, but that money hasn't been destroyed; it is just now going to bring utility to another human, and if I (for some weird reason) value the quality of life of the robber just as much as my own, I wouldn't even think something bad has happened.

If I actually destroy money, like burn it to ashes, then there will be less money in circulation, which will increase the value of each banknote, making everyone a bit richer (and me a little poorer). So is it balanced in that case?
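
One toy way to put numbers on the burning case (my sketch, using the standard quantity-theory identity MV = PQ with velocity V and real output Q held fixed; all the figures are made up):

```python
# Hedged toy model: quantity-theory identity MV = PQ with V and Q fixed,
# so the price level scales with the money supply. All numbers invented.

M = 1_000_000.0   # total money supply
Q = 500_000.0     # real output (baskets of goods per period)
V = 2.0           # velocity of money
P = M * V / Q     # implied price level (4.0)

my_cash = 1_000.0
burned = 100.0    # I burn one banknote to ashes

P_after = (M - burned) * V / Q                    # slightly lower price level

my_real_before = my_cash / P                      # baskets I can afford
my_real_after = (my_cash - burned) / P_after
others_real_before = (M - my_cash) / P            # everyone else's baskets
others_real_after = (M - my_cash) / P_after

print(f"me:     {my_real_before:.2f} -> {my_real_after:.2f}")
print(f"others: {others_real_before:.2f} -> {others_real_after:.2f}")
print(f"total:  {my_real_before + others_real_before:.2f} -> "
      f"{my_real_after + others_real_after:.2f}")
# Total real purchasing power stays at Q/V = 250,000 baskets: in this model
# burning the note transfers my claim to everyone else rather than
# destroying anything real.
```

So in this toy model the answer to "is it balanced?" is yes, to first order; whether that survives more realistic assumptions is exactly the economics question.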

Maybe I need to read some economics; please recommend a book that would dissolve the question.

Reply
[-]Buck2mo124

Buy something with it and destroy that.

Reply
[-]Garrett Baker2mo50

If you are destroying something you own, you would value the destruction of that thing more than any other use you have for that thing and any price you could sell it for on the market, so this creates value in the sense that there is no deadweight loss to the relevant transactions/actions.

Reply1
[-]Viliam2mo41

This sounds like by definition value cannot be destroyed intentionally.

Reply
[-]Garrett Baker2mo20

You can destroy others’ value intentionally, but only in extreme circumstances where you’re not thinking right or have self-destructive tendencies can you “intentionally” destroy your own value. But then we hardly describe the choices such people make as “intentional”. Eg the self-destructive person doesn’t “intend” to lose their friends by not paying back borrowed money. And those gambling at the casino, despite not thinking right, can’t be said to “intend” to lose all their money, though they “know” the chances they’ll succeed.

Reply
[-]the gears to ascension2mo20

You might not value the destruction as much as others valued the thing you destroyed. In other words, you're assuming homo economicus, I'm not.

Reply
[-]Garrett Baker2mo20

To complete your argument: 'and therefore the action has some deadweight loss associated with it, meaning it's destroying value'.

But note that by the same logic, any economic activity destroys value, since you are also not homo economicus when you buy ice cream, and there will likely be smarter things you could do with your money, or better deals. Therefore buying ice cream, or doing anything else, destroys value.

But that is absurd, and we clearly don't have so broad a definition of "destroy value". So your argument proves too much.

Reply
[-]Thane Ruthenis2mo50

Money is a claim on things other people value. You can't destroy value purely by doing something with your claim on that value.

Except the degenerate case of "making yourself or onlookers sad by engaging in self-destructive behaviors where you destroy your claim on resources", I guess. But it's not really an operation purely with money.

Hmm, I guess you can make something's success conditional on your having money (e. g., a startup backed by your investments), and then deliberately destroy your money, dooming the thing. But that's a very specific situation and it isn't really purely about the money either; it's pretty similar to "buy a thing and destroy it". Closest you can get, I think?

(Man, I hope this is just a concept-refinement exercise and I'm not giving someone advice on how to do economics terrorism.)

Reply1
[-]Richard_Kennaway2mo20

(Epistemic status: not an economist.)

Money is not value, but the absence of value. Where money is, it can be spent, replacing the money by the thing bought. The money moves to where the thing was.

Money is like the empty space in a sliding-block puzzle. You must have the space to be able to slide the blocks around, instead of spotting where you can pull out several at once and put them back in a different arrangement.

Money is the slack in a system of exchange that would otherwise have to operate by face-to-face barter or informal systems of credit. Informal, because as soon as you formalise it, you've reinvented money.

Reply
[-]CstineSublime2mo2-2

IANAE. This is a really interesting riddle. Because even in incidents of fraud or natural disaster, from an economic standpoint the intrinsic value isn't lost: if a distillery full of barrels of whisky goes up in flames and nothing is recoverable, then elsewhere in the whisky market you would presume that prices would go up, since whisky is now scarcer relative to demand, and you would expect that "loss" to be dispersed as a gain across its competitors - you would think. (Not to mention the distiller's expenditure on its suppliers and employees - any money that changed hands, they keep - so the opportunity cost of the whisky didn't go up in smoke.)

I say "you would think" because Price elasticity is it isn't necessarily instantaneous nor is it perfect - the correction in prices can be delayed especially if information is delayed. Like you said - money is a good approximation of what people value but there is a certain amount of noise and lag.

For example, what if there is no elasticity in the whisky market? What if there was already an oversupply and the distiller was never going to recoup their investment (even if the fire hadn't wiped them out)? It's really interesting because in theory they would have to drop their prices until someone bought the whisky. But not only is information not instantaneous, there's also no certainty that it would happen like that.
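
To put toy numbers on the "loss dispersed as a gain through competitors" story, here is a minimal short-run sketch; the linear demand curve, the fixed supply split, the zero marginal cost, and every figure in it are my own made-up assumptions, not anything claimed in the thread:

```python
# Toy short-run model (all assumptions and numbers invented): linear demand
# P = 100 - Q, supply fixed at whatever barrels currently exist, zero
# marginal cost (production costs are sunk).

def outcome(distiller_q: float, competitor_q: float) -> dict:
    q = distiller_q + competitor_q
    p = 100.0 - q  # inverse demand: price that clears the market
    return {
        "price": p,
        "distiller_revenue": p * distiller_q,
        "competitor_revenue": p * competitor_q,
        "consumer_surplus": 0.5 * (100.0 - p) * q,
    }

def total_surplus(d: dict) -> float:
    return d["distiller_revenue"] + d["competitor_revenue"] + d["consumer_surplus"]

before = outcome(distiller_q=10, competitor_q=50)  # before the fire
after = outcome(distiller_q=0, competitor_q=50)    # the distiller's barrels burn

for key in before:
    print(f"{key:20s} {before[key]:8.1f} -> {after[key]:8.1f}")
print(f"{'total surplus':20s} {total_surplus(before):8.1f} -> {total_surplus(after):8.1f}")
```

In this particular sketch competitors do gain from the higher price, but consumers lose more than the competitors gain, so some value disappears along with the whisky; how closely that matches reality depends on exactly the elasticity and information issues above.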

You might be interested in reading George Soros' speech on reflexivity, which describes how the intrinsic value of things (like financial securities) and their market value sometimes drift further apart or closer together. What's interesting is that if perception and prices rise, this can actually push the intrinsic value itself higher or lower.

No one ever knows precisely what the intrinsic value is, and since it is reflexive and affected by the market value, it is all the more elusive.

Really, somewhere along the line value is being created, because whenever someone develops a more efficient means of producing the same output, the value of a dollar increases, since the same output can be bought for less. That suggests that value can also be destroyed if those techniques or abilities are lost (e.g. the last COBOL coder dies and there's no one to replace him, so a less efficient system has to be used) - but I think most real-world examples of that are probably due to poor flow of information or misinformation.

At the end of the day it all feels suspiciously close to Aristotle's Potentiality and Actuality Dichotomy.

Reply
[-]Garrett Baker2mo20

Just buy something with negative externalities. Eg invest in the piracy stock exchange.

Reply
[-]Roman Malov11d33

People often say, "Oh, look at this pathetic mistake AI made; it will never be able to do X, Y, or Z." But they would never say to a child who made a similar mistake that they will never amount to doing X, Y, or Z, even though the theoretical limits on humans are much lower than for AI.

Reply
[-]Roman Malov12d20

Idea status: butterfly idea

In real life, there are too many variables to optimize each one. But if a variable is brought to your attention, it is probably important enough to consider optimizing it.

Negative example: you don’t see your eyelids; they are doing their job of protecting your eyes, so there’s no need to optimize them.

Positive example: you tie your shoelaces; they are the focus of your attention. Can this process be optimized? Can you learn to tie shoelaces faster, or learn a more reliable knot?

Humans already do something like this, but mostly consider optimizing a variable when it annoys them. I suggest widening the consideration space because the “annoyance” threshold is mostly emotional and therefore probably optimized for a world with far fewer variables and much smaller room for improvement (though I only know evolutionary psychology at a very surface level and might be wrong).

Reply
[-]Roman Malov5mo10

Rule and Example

Rules can generate examples. For instance: DALLE-3 is a rule according to which different examples (images) are generated.

From examples, rules can be inferred. For example: from a sufficient dataset of images and their captions, a DALLE-3 model can be trained.

In computer science, there is a concept called Kolmogorov complexity of data. It is (roughly) defined as the length of the shortest program capable of producing that data.

Some data are simple and can be compressed easily; some are complex and harder to compress. In a sense, the task of machine learning is to find a program of a given size that serves as a "compression" of the dataset.

In the real world, although knowing the underlying rule is often very useful, sometimes it is more practical to use a giant look-up table (GLUT) of examples. Sometimes you need to memorize the material instead of trying to "understand" it.

Sometimes there are examples that are more complex than the rule that generated them. For example, in the interval [0;1] (which is quite easy to describe, the rule being: all numbers are not greater than 1 and not less than 0), there exists a number whose digits encode all the works of Shakespeare (and which definitely cannot be compressed to a description comparable to that of the interval [0;1]).

Or, consider a program that outputs every natural number from 1 to 10^(10^20) (the program is very short, because the Kolmogorov complexity of 10^(10^20) is low): at some point it will produce a binary encoding of LOTR. In that case, the complexity lies in the index; the map for finding the needle in the haystack is as valuable (and as complex) as the needle itself.
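
A minimal sketch of that last point (the generator and the short `target` string are my own illustration): the enumerating program stays a few lines long no matter what you are looking for, while the index at which a book-length target shows up takes about as many bits to write down as the target itself.

```python
# Minimal sketch: a tiny "rule" that eventually outputs any given binary
# string, with all the complexity hiding in the index where it appears.

from itertools import count, product

def all_binary_strings():
    """Enumerate every binary string in order of length, then lexicographically."""
    for n in count(1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

target = "0110100110010110"  # stand-in for "a binary encoding of LOTR"

for index, s in enumerate(all_binary_strings()):
    if s == target:
        print(f"found at index {index}")  # the index carries ~len(target) bits
        break
```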

Properties follow from rules. It is not necessary to know about every example of a rule in order to have some information about all of them. Moreover, all examples together can have less information (or Kolmogorov complexity) than the sum of their individual Kolmogorov complexities (as in the example above).

Reply