tailcalled's Shortform — LessWrong

tailcalled's Shortform

24th Oct 2021

This is a special post for quick takes by tailcalled. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Mentioned in

61Rationalists are missing a core piece for agent-like structure (energy vs information overload)

19More accurate models can be worse

[-]tailcalled9mo164

Thesis: Everything is alignment-constrained, nothing is capabilities-constrained.

Examples:

"Whenever you hear a headline that a medication kills cancer cells in a petri dish, remember that so does a gun." Healthcare is probably one of the biggest constraints on humanity, but the hard part is in coming up with an intervention that precisely targets the thing you want to treat, I think often because knowing what exactly that thing is is hard.
Housing is also obviously a huge constraint, mainly due to NIMBYism. But the idea that NIMBYism is due to people using their housing for investments seems kind of like a cope, because then you'd expect that when cheap housing gets built, the backlash is mainly about dropping investment value. But the vibe I get is people are mainly upset about crime, smells, unruly children in schools, etc., due to bad people moving in. Basically high housing prices function as a substitute for police, immigration rules and teacher authority, and those in turn are compromised less because we don't know how to e.g. arm people or discipline children, and more because we aren't confident enough about the targeting (alignment problem), and because we have a hope that

... (read more)

4tailcalled9mo

(Certifications and regulations promise to solve this, but they face the same problem: they don't know what requirements to put up, an alignment problem.)

3ProgramCrafter9mo

An interesting framing! I agree with it. As another example: in principle, one could make a web server use an LLM connected to a database to serve any requests, not coding anything. It would even work... till the point someone would convince the model to rewrite the database to their whims! (A second problem is that a normal site should be focused on something, in line with the famous "if you can explain anything, your knowledge is zero".)

2Mateusz Bagiński5mo

Reading this made me think that the framing "Everything is alignment-constrained, nothing is capabilities-constrained." is a rathering and that a more natural/joint-carving framing is:

2Noosphere899mo

Partially disagree, but only partially. I think the big thing that makes multi-alignment disproportionately hard, in a way that isn't the case for aligning an AI to a single person, is the lack of a ground truth, combined with severe value conflicts being common enough that alignment is probably conceptually impossible. The big reason our society stays stable is precisely that people depend on each other for their lives, and one of the long-term effects of AI is to make at least a few people no longer dependent on others for long, healthy lives. That predicts that our society will increasingly no longer matter to powerful actors that set up their own nations, à la seasteading. More below: https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem#KmqfavwugWe62CzcF Or this quote by me:

[-]tailcalled4y140

If a tree falls in the forest, and two people are around to hear it, does it make a sound?

I feel like typically you'd say yes, it makes a sound. Not two sounds, one for each person, but one sound that both people hear.

But that must mean that a sound is not just auditory experiences, because then there would be two rather than one. Rather it's more like, emissions of acoustic vibrations. But this implies that it also makes a sound when no one is around to hear it.

2Dagon4y

I think this just repeats the original ambiguity of the question, by using the word "sound" in a context where the common meaning (air vibrations perceived by an agent) is only partly applicable. It's still a question of definition, not of understanding what actually happens.

3tailcalled2y

But the way to resolve definitional questions is to come up with definitions that make it easier to find general rules about what happens. This illustrates one way one can do that, by picking edge-cases so they scale nicely with rules that occur in normal cases. (Another example would be 1 as not a prime number.)

2Dagon2y

My recommended way to resolve (aka disambiguate) definitional questions is "use more words". Common understandings can be short, but unusual contexts require more signals to communicate.

1Bert4y

I think we're playing too much with the meaning of "sound" here. The tree causes some vibrations in the air, which leads to two auditory experiences since there are two people

[-]tailcalled5mo12-93

Preregistering predictions:

The world will enter a golden age
The Republican party will soon abandon Trumpism and become much better
The Republican party will soon come with a much more pro-trans policy
The Republican party will double down on opposition to artificial meat, but adopt a pro-animal-welfare attitude too
In the medium term, excess bureaucracy will become a much smaller problem, essentially solved
Spirituality will make a big comeback, with young people talking about karma and God(s) and sin and such
AI will be abandoned due to bad karma
There will be a lot of "retvrn" (to farming, to handmade craftsmanship, etc.)
Medical treatment will improve a lot, but not due to any particular technical innovation
Architecture will become a lot more elaborate and housing will become a lot more communal

No, I'm not going to put probabilities on them, and no, I'm not going to formalize these well enough that they can be easily scored, plus they're not independent so it doesn't make sense to score them independently.

[-]1a3orn5mo1812

Reading this feels like a normie might feel reading Kokotajlo's prediction that energy use might increase 1000x in the next two decades; like, you hope there's a model behind it, but you don't know what it is, and you're feeling pretty damn skeptical in the meantime.

8stavros5mo

What's the crux? Or what's the most significant piece of evidence you could imagine coming across that would update you against these predictions?

5Viliam5mo

Please explain. This part seems even less likely than the golden age of return to farming.

2tailcalled5mo

It's not exactly that AI won't be used, but it will basically just be used as a more flexible interface to text. Any capabilities it develops will be in a "bag of heuristics" sense, and the bag of heuristics will lag behind on more weighty matters because people with a clue decide not to offer more heuristics to it. More flexible interfaces to text are of limited interest.

5faul_sname5mo

Which of the following do you additionally predict? 1. Sleep time will desynchronize from local day/night cycles 2. Investment strategies based on energy return on energy invested (EROEI) will dramatically outperform traditional financial metrics 3. None of raw compute, data, or bandwidth constraints will turn out to be the reason AI has not reached human capability levels 4. Supply chains will deglobalize 5. People will adopt a more heliocentric view

3tailcalled5mo

1. Sleep time will synchronize more closely to local day/night cycles. 2. No strong opinion. Finance will lose its relevance. 3. Lack of AI consciousness and preference not to use AI will turn out to be the reason AI will never reach human level. 4. Quite likely partially, but probably there will also be a growth in esoteric products, which might actually lead to more international trade on a quantitative level. 5. We are currently in a high-leverage situation where the way the moderate-term future sees our position in the universe is especially sensitive to perturbations. But rationalist-empiricist-reductionists opt out of the ability to influence this, and instead the results of future measurement instruments will depend on what certain non-rationalist-empiricist-reductionists do.

2faul_sname5mo

Telepathy?

3tailcalled5mo

For most practical purposes we already have that. What would you do with telepathy that you can't do with internet text messaging?

2faul_sname5mo

Any protocol can be serialized, so in principle if you had the hardware and software necessary to translate from and to the "neuralese" dialect of the sender and recipient, you could serialize that as text over the wire. But I think the load-bearing part is the ability to read, write, and translate the experiences that are upstream of language. One could expect "everyone can viscerally understand the lived experiences of others" to lead to a golden age as you describe, though it doesn't really feel like your world model. But conditioning on it not being something about the flows of energy that come from the sun and the ecological systems those flows of energy flow through, it's still my guess for generating those predictions (under the assumption that the predictions were generated by "find something I think is true and underappreciated about the world, come up with the wildest implications according to the lesswrong worldview, phrase them narrowly enough to seem crackpottish, don't elaborate").

3tailcalled5mo

Ah. Not quite what you're asking about, but omniscience through higher consciousness is likely under my scenario. Not sure what you mean by "phrase them narrowly enough to seem crackpottish". I would seem much more crackpottish if I gave the underlying logic behind it, unless maybe I bring in a lot of context.

5Mateusz Bagiński5mo

Can you give some reasons why you think all that or some of all that?

3LVSN5mo

so there's like an ultimate thing that your set of predictions is about, and you're holding off on saying what is to be vindicated until some time that you can say "this is exactly/approximately what i was saying would happen"? im not trying to be negative; i can still see utility in that if that's a fair assessment but i want to know why, when you say you called it, this was the thing you wanted to have been called

5Raemon5mo

fwiw I prefer people to write posts like this than-not, on the margin. I think operationalizing things is quite hard, I think the right norm is "well, you get a lot less credit for vague predictions with a lot of degrees of freedom", but, it's still good practice IMO to be in the habit of concretely predicting things.

2quetzal_rainbow5mo

Who's gonna do that? It's not like we have enough young people for rapid cultural evolution.

2tailcalled5mo

"Disappointed" as in disappointed in me for making such predictions or disappointed in the world if the predictions turn out true?

8MondSemmel5mo

At a guess, disappointment at the final paragraph. Without a timeline, specificity, or justification, what's the point of calling this "preregistered predictions"?

1Canaletto4mo

Can you give some time horizon on this? Like, 5 years, 10 years, 20 years?

2tailcalled4mo

Recently I've been starting to think it could go many other ways than my predictions above suggest. So it's probably safer to say that the futurist/rationalist predictions are all wrong than that any particular prediction I can make is right. I'm still mostly optimistic though.

1Canaletto4mo

The main difference being the "NNs fail to work in many ways, no digital human analog for sure, agents stay at the same "plays this one game very well" stage, but a lot of tech progress in other ways"?

[-]tailcalled1y*110

Finally gonna start properly experimenting on stuff. Just writing up what I'm doing to force myself to do something, not claiming this is necessarily particularly important.

Llama (and many other models, but I'm doing experiments on Llama) has a piece of code that looks like this:

h = x + self.attention(self.attention_norm(x), start_pos, freqs_cis, mask)
out = h + self.feed_forward(self.ffn_norm(h))

Here, out is the result of the transformer layer (aka the residual stream), and the vectors self.attention(self.attention_norm(x), start_pos, freqs_cis, mask) and self.feed_forward(self.ffn_norm(h)) are basically where all the computation happens. So basically the transformer proceeds as a series of "writes" to the residual stream using these two vectors.

I took all the residual vectors for some queries to Llama-8b and stacked them into a big matrix M with 4096 columns (the internal hidden dimensionality of the model). Then using SVD, I can express $M = \sum_{i} s_{i} (u_{i} \otimes v_{i})$, where the $u$'s and $v$'s are independent unit vectors. This basically decomposes the "writes" into some independent locations in the residual stream (u's), some lat... (read more)
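A minimal sketch of the decomposition described above, assuming the residual-stream writes have already been collected into a NumPy array. The random matrix is only a stand-in for real Llama activations, and the names `writes`, `U`, `S` and `Vt` are illustrative, not taken from the original experiment. With one write per row, the hidden-space directions come out as the rows of `Vt`; the original code may label the factors differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the stacked residual "writes": one row per write,
# one column per hidden dimension (the real matrix has 4096 columns).
n_writes, d_model = 200, 64
writes = rng.normal(size=(n_writes, d_model))

# Thin SVD: writes = U @ diag(S) @ Vt, with orthonormal columns in U
# and orthonormal rows in Vt. S is sorted from largest to smallest.
U, S, Vt = np.linalg.svd(writes, full_matrices=False)

# Rebuild the matrix as a sum of rank-one terms s_i * (u_i outer v_i).
reconstruction = sum(S[i] * np.outer(U[:, i], Vt[i]) for i in range(len(S)))

max_error = float(np.abs(writes - reconstruction).max())
print(f"max reconstruction error: {max_error:.2e}")
```

The reconstruction check is only there to confirm that the rank-one sum and the matrix factorization are the same object.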

2tailcalled1y

Ok, so I've got the clipping working. First, some uninterpretable diagrams: In the bottom six diagrams, I try taking a varying number (x-axis) of right singular vectors (v's) and projecting down the "writes" to the residual stream to the space spanned by those vectors. The obvious criterion to care about is whether the projected network reproduces the outputs of the original network, which here I operationalize based on the log probability the projected network gives to the continuation of the prompt (shown in the "generation probability" diagrams). This appears to be fairly chaotic (and low) in the 1-300ish range, and then stabilizes while still being pretty low in the 300ish-1500ish range, and then finally converges to normal in the 1500ish to 2000ish range, and is ~perfect afterwards. The remaining diagrams show something about how/why we have this pattern. "orig_delta" concerns the magnitude of the attempted writes for a given projection (which is not constant because projecting in earlier layers will change the writes by later layers), and "kept_delta" concerns the remaining magnitude after the discarded dimensions have been projected away. In the low end, "kept_delta" is small (and even "orig_delta" is a bit smaller than it ends up being at the high end), indicating that the network fails to reproduce the probabilities because the projection is so aggressive that it simply suppresses the network too much. Then in the middle range, "orig_delta" and "kept_delta" explode, indicating that the network has some internal runaway dynamics which normally would be suppressed, but where the suppression system is broken by the projection. Finally, in the high range, we get a sudden improvement in loss, and a sudden drop in residual stream "write" size, indicating that it has managed to suppress this runaway stuff and now it works fine.
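The projection step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the diagrams: a random orthonormal basis stands in for the real right singular vectors, `project_write` is a hypothetical helper, and only a single write is projected, so the feedback effect where earlier projections change later writes is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Stand-in for the right singular vectors: an orthogonal matrix from QR,
# so its rows form an orthonormal basis of the hidden space.
Vt, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))

def project_write(write, Vt, k):
    """Project one residual-stream write onto the span of the first k rows of Vt."""
    basis = Vt[:k]                    # (k, d_model), orthonormal rows
    return (write @ basis.T) @ basis  # subspace coordinates mapped back to d_model

write = rng.normal(size=d_model)
orig_delta = float(np.linalg.norm(write))  # size of the attempted write
for k in (8, 32, 64):
    kept_delta = float(np.linalg.norm(project_write(write, Vt, k)))  # size that survives
    print(k, round(kept_delta / orig_delta, 3))
```

In a real run the projection would be applied inside every layer, which is why "orig_delta" itself depends on the number of vectors kept.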

2tailcalled1y

An implicit assumption I'm making when I clip off from the end with the smallest singular values is that the importance of a dimension is proportional to its singular value. This seemed intuitively sensible to me ("bigger = more important"), but I thought I should test it, so I tried clipping off only one dimension at a time, and plotting how that affected the probabilities: Clearly there is a correlation, but also clearly there are some deviations from that correlation. Not sure whether I should try to exploit these deviations in order to do further dimension reduction. It's tempting, but it also feels like it starts entering sketchy territories, e.g. overfitting and arbitrary basis picking. Probably gonna do it just to check what happens, but am on the lookout for something more principled.

2tailcalled1y

Back to clipping away an entire range, rather than a single dimension. Here's ordering it by the importance computed by clipping away a single dimension: Less chaotic maybe, but also much slower at reaching a reasonable performance, so I tried a compromise ordering that takes both size and performance into account: Doesn't seem like it works super great tbh. Edit: for completeness' sake, here's the initial graph with log-surprise-based plotting.

2tailcalled1y

To quickly find the subspace that the model is using, I can use a binary search to find the number of singular vectors needed before the probability when clipping exceeds the probability when not clipping. A relevant followup is what happens to other samples in response to the prompt when clipping. When I extrapolate "I believe the meaning of life is" using the 1886-dimensional subspace from , I get: Which seems sort of vaguely related, but idk. Another test is just generating without any prompt, in which case these vectors give me: Using a different prompt: I can get a 3329-dimensional subspace which generates: or Another example: can yield 2696 dimensions with or And finally, can yield the 2518-dimensional subspace: or
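A rough sketch of the binary search mentioned at the start of this comment. It assumes the score is non-decreasing in the number of singular vectors kept, which the earlier diagrams suggest only holds approximately, and it uses "at least as high as" rather than "exceeds". `toy_log_prob` is a hypothetical stand-in for running the clipped model; its jump at 1886 just echoes the example above.

```python
def smallest_sufficient_k(score_at, threshold, max_k):
    """Return the smallest k in [1, max_k] with score_at(k) >= threshold.

    Assumes score_at is non-decreasing in k. Returns None if even max_k
    does not reach the threshold.
    """
    if score_at(max_k) < threshold:
        return None
    lo, hi = 1, max_k
    while lo < hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= threshold:
            hi = mid       # mid is enough, so the answer is mid or smaller
        else:
            lo = mid + 1   # mid is not enough, so the answer is larger
    return lo

def toy_log_prob(k):
    """Hypothetical log-probability of the continuation when keeping k vectors."""
    return -50.0 if k < 1886 else -12.0

unclipped_log_prob = -12.0
k_needed = smallest_sufficient_k(toy_log_prob, unclipped_log_prob, max_k=4096)
print(k_needed)
```

Each call to `score_at` would be a full forward pass, so the search needs about 12 passes for 4096 dimensions instead of thousands.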

2tailcalled1y

Given the large number of dimensions that are kept in each case, there must be considerable overlap in which dimensions they make use of. But how much? I concatenated the dimensions found in each of the prompts, and performed an SVD of it. It yielded this plot: ... unfortunately this seems close to the worst-case scenario. I had hoped for some split between general and task-specific dimensions, yet this seems like an extremely uniform mixture.

2tailcalled1y

If I look at the pairwise overlap between the dimensions needed for each generation: ... then this is predictable down to ~1% error simply by assuming that they pick a random subset of the dimensions for each, so their overlap is proportional to each of their individual sizes.
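The random-subset baseline can be made concrete with a simplified model: treat each subspace as a uniformly random subset of the 4096 dimensions, so the expected overlap of subsets of sizes a and b is a·b/4096. This is a sketch of that baseline only, not of the actual subspace comparison; the two sizes are the ones quoted earlier in the thread.

```python
import random

def expected_overlap(size_a, size_b, total):
    """Expected intersection size of two independent uniform random subsets."""
    return size_a * size_b / total

def simulated_overlap(size_a, size_b, total, trials, seed=0):
    """Average intersection size over repeated random draws."""
    rng = random.Random(seed)
    dims = range(total)
    hits = 0
    for _ in range(trials):
        a = set(rng.sample(dims, size_a))
        b = set(rng.sample(dims, size_b))
        hits += len(a & b)
    return hits / trials

total = 4096
size_a, size_b = 1886, 3329  # subspace sizes quoted earlier in the thread
predicted = expected_overlap(size_a, size_b, total)
observed = simulated_overlap(size_a, size_b, total, trials=200)
print(round(predicted, 1), round(observed, 1))
```

If the measured overlaps match this formula to about 1%, the subspaces carry no more shared structure than random picks would.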

2tailcalled1y

Oops, my code had a bug so only self.attention(self.attention_norm(x), start_pos, freqs_cis, mask) and not self.feed_forward(self.ffn_norm(h)) was in the SVD. So the diagram isn't 100% accurate.

[-]tailcalled1y90

Thesis: while consciousness isn't literally epiphenomenal, it is approximately epiphenomenal. One way to think of this is that your output bandwidth is much lower than your input bandwidth. Another way to think of this is the prevalence of akrasia, where your conscious mind actually doesn't have full control over your behavior. On a practical level, the ecological reason for this is that it's easier to build a general mind and then use whatever parts of the mind that are useful than to narrow down the mind to only work with a small slice of possibilities. This is quite analogous to how we probably use LLMs for a much narrower set of tasks than what they were trained for.

2Seth Herd1y

Consciousness is not at all epiphenomenal, it's just not the whole mind and not doing everything. We don't have full control over our behavior, but we have a lot. While the output bandwidth is low, it can be applied to the most important things.

2tailcalled1y

Maybe a point that was missing from my thesis is that one can have a higher-level psychological theory in terms of life-drives and death-drives which then addresses the important phenomenal activities but doesn't model everything. And then if one asks for an explanation of the unmodelled part, the answer will have to be consciousness. But then because the important phenomenal part is already modelled by the higher-level theory, the relevant theory of consciousness is ~epiphenomenal.

2Seth Herd1y

I guess I have no idea what you mean by "consciousness" in this context. I expect consciousness to be fully explained and still real. Ah, consciousness. I'm going to mostly save the topic for if we survive AGI and have plenty of spare time to clarify our terminology and work through all of the many meanings of the word. Edit - or of course if something else was meant by consciousness, I expect a full explanation to indicate that thing isn't real at all. I'm an eliminativist or a realist depending on exactly what is meant. People seem to be all over the place on what they mean by the word.

2tailcalled1y

A thermodynamic analogy might help: Reductionists like to describe all motion in terms of low-level physical dynamics, but that is extremely computationally intractable and arguably also misleading because it obscures entropy. Physicists avoid reductionism by instead factoring their models into macroscopic kinetics and microscopic thermodynamics. Reductionistically, heat is just microscopic motion, but microscopic motion that adds up to macroscopic motion has already been factored out into the macroscopic kinetics, so what remains is microscopic motion that doesn't act like macroscopic motion, either because it is ~epiphenomenal (heat in thermal equilibrium) or because it acts very differently from macroscopic motion (heat diffusion). Similarly, reductionists like to describe all psychology in terms of low-level Bayesian decision theory, but that is extremely computationally intractable and arguably also misleading because it obscures entropy. You can avoid reductionism by instead factoring models into some sort of macroscopic psychology-ecology boundary and microscopic neuroses. Luckily Bayesian decision theory is pretty self-similar, so often the macroscopic psychology-ecology boundary fits pretty well with a coarse-grained Bayesian decision theory. Now, similar to how most of the kinetic energy in a system in motion is usually in the microscopic thermal motion rather than in the macroscopic motion, most of the mental activity is usually with the microscopic neuroses instead of the macroscopic psychology-ecology. Thus, whenever you think "consciousness", "self-awareness", "personality", "ideology", or any other broad and general psychological term, it's probably mostly about the microscopic neuroses. Meanwhile, similar to how tons of physical systems are very robust to wide ranges of temperatures, tons of psychology-ecologies are very robust to wide ranges of neuroses. As for what "consciousness" really means, idk, currently I'm thinking it's tightly intertwin

[-]tailcalled1y90

Thesis: There are three distinct coherent notions of "soul": sideways, upwards and downwards.

By "sideways souls", I basically mean what materialists would translate the notion of a soul to: the brain, or its structure, or something like that. By "upwards souls", I mean attempts to remove arbitrary/contingent factors from the sideways souls, for instance by equating the soul with one's genes or utility function. These are different in the particulars, but they seem conceptually similar and mainly differ in how they attempt to cut the question of identity (ide... (read more)

2Dagon1y

I'm having trouble following whether this categorizes the definition/concept of a soul, or the causality and content of this conception of soul. Is "sideways soul" about structure and material implementation, or about weights and connectivity, independent of substrate? WHICH factors are removed from upwards? ("genes" and "utility function" are VERY different dimensions, both tiny parts of what I expect create (for genes) or comprise (for utility function) a soul.) What about memory? Multiple levels of value and preferences (including meta-preferences in how to abstract into "values")? Putting "downwards" supernatural ideas into the same framework as more logical/materialist ideas confuses me - I can't tell if that makes it a more useful model or less.

4tailcalled1y

When you get into the particulars, there are multiple feasible notions of sideways soul, of which material implementation vs weights and connectivity are the main ones. I'm most sympathetic to weights and connectivity. I have thought less about and seen less discussion about upwards souls. I just mentioned it because I'd seen a brief reference to it once, but I don't know anything in-depth. I agree that both genes and utility function seem incomplete for humans, though for utility maximizers in general I think there is some merit to the soul == utility function view. Memory would usually go in sideways soul, I think. idk. Sideways vs upwards vs downwards is more meant to be a contrast between three qualitatively distinct classes of frameworks than it is meant to be a shared framework.

2Seth Herd1y

Excellent! I like the move of calling this "soul" with no reference to metaphysical souls. This is highly relevant to discussions of "free will" if the real topic is self-determination - which it usually is.

2tailcalled1y

"Downwards souls are similar to the supernatural notion of souls" is an explicit reference to metaphysical souls, no?

2Seth Herd1y

um, it claims to be :) I don't think that's got much relationship to the common supernatural notion of souls. But I read it yesterday and forgot that you'd made that reference.

2tailcalled1y

What special characteristics do you associate with the common supernatural notion of souls which differs from what I described?

2Nathan Helm-Burger1y

The word 'soul' is so tied in my mind to implausible metaphysical mythologies that I'd parse this better if the word were switched for something like 'quintessence' or 'essential self' or 'distinguishing uniqueness'.

2tailcalled1y

What implausible metaphysical mythologies is it tied up with? As mentioned in my comment, downwards souls seem to satisfy multiple characteristics we'd associate with mythological souls, so this and other things make me wonder if the metaphysical mythologies might actually be more plausible than you realize.

[-]tailcalled1y*80

Thesis: in addition to probabilities, forecasts should include entropies (how many different conditions are included in the forecast) and temperatures (how intense is the outcome addressed by the marginal constraint in this forecast, i.e. the big-if-true factor).

I say "in addition to" rather than "instead of" because you can't compute probabilities just from these two numbers. If we assume a Gibbs distribution, there's the free parameter of energy: ln(P) = S - E/T. But I'm not sure whether this energy parameter has any sensible meaning with more general ev... (read more)
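A small worked example of why the two numbers alone do not pin down a probability under the Gibbs form ln(P) = S - E/T. The values are made up for illustration, and treating S as measured in nats is an assumption rather than something the post specifies.

```python
import math

def gibbs_probability(entropy, energy, temperature):
    """Probability implied by ln(P) = S - E/T, with S in nats."""
    return math.exp(entropy - energy / temperature)

# Two hypothetical forecasts with the same entropy and temperature...
entropy, temperature = math.log(4), 2.0

# ...but different energies, which leads to different probabilities.
p_low_energy = gibbs_probability(entropy, 6.0, temperature)
p_high_energy = gibbs_probability(entropy, 10.0, temperature)
print(p_low_energy, p_high_energy)
```

Fixing S and T leaves a one-parameter family of probabilities indexed by E, which is the free parameter referred to above.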

[-]tailcalled1y84

Thesis: whether or not tradition contains some moral insights, commonly-told biblical stories tend to be too sparse to be informative. For instance, there's no plot-relevant reason why it should be bad for Adam and Eve to have knowledge of good and evil. Maybe there's some interpretation of good and evil where it makes sense, but it seems like then that interpretation should have been embedded more properly in the story.

5UnderTruth1y

It is worth noting that, in the religious tradition from which the story originates, it is Moses who commits these previously-oral stories to writing, and does so in the context of a continued oral tradition which is intended to exist in parallel with the writings. On their own, the writings are not meant to be complete, both in order to limit more advanced teachings to those deemed ready for them, as well as to provide occasion to seek out the deeper meanings, for those with the right sort of character to do so.

4tailcalled1y

This makes sense. The context I'm thinking of is my own life, where I come from a secular society with atheist parents, and merely had brief introductions to the stories from bible reading with parents and Christian education in school. (Denmark is a weird society - few people are actually Christian or religious, so it's basically secular, but legally speaking we are Christian and do not have separation between Church and state, so there are random fragments of Christianity we run into.)

2lemonhope1y

What? Nobody told me. Where did you learn this

6Garrett Baker1y

This is the justification behind the Talmud.

[-]tailcalled1mo71

I've switched from considering uploading to be obviously possible at sufficient technological advancement to considering it probably intractable. More specifically, I expect the mind to be importantly shaped by a lot of rarely-activating mechanisms, which are intractable to map out. You could probably eventually make a sort of "zombie upload" that ignores those mechanisms, but it would be unable to update to new extreme conditions.

[-]Mo Putera1mo110

What changed your mind? Any rabbit holes in particular I can go down?

2tailcalled1mo

Idk, the shift happened a while ago. Maybe mostly just reflecting on how evolution acts on a holistic scale, making it easy to incorporate "gradients" from events that occur only one or a few times in one's lifetime, if these events have enough effect on survival/reproduction. Part of a bigger change in priors towards the relevance of long tails associated with my LDSL sequence.

2faul_sname1mo

Yeah, until recently I thought the same thing, based on my belief that distilling a teacher model which has been trained by RL into a student model preserved not just distributions over outputs but also mostly preserved the mechanisms behind those outputs. Which as far as I can tell was an incorrect belief.

2quetzal_rainbow1mo

Counterpoint: the brain is extremely energy-demanding, and every part of it that is not working often gets brutally selected out.

2tailcalled1mo

Once you focus on "parts" of the brain, you're restricting consideration to mechanisms that are activated at sufficient scale to need to balloon up. I would expect the rarely-activating mechanisms to be much smaller in a physical sense than "parts" of the brain are

[-]tailcalled1y70

Thesis: one of the biggest alignment obstacles is that we often think of the utility function as being basically-local, e.g. that each region has a goodness score and we're summing the goodness over all the regions. This basically-guarantees that there is an optimal pattern for a local region, and thus that the global optimum is just a tiling of that local optimal pattern.

Even if one adds a preference for variation, this likely just means that a distribution of patterns is optimal, and the global optimum will be a tiling of samples from said distribution.

T... (read more)

[-]tailcalled8mo5-1

Are we missing a notion of "simulacrum level 0"? That is, in order to accurately describe the truth, we need some method of synchronizing on a common language. In the beginning of a human society, this can be basic stuff like pointing at objects and making sounds in order to establish new words. But also, I would be inclined to say that more abstract stuff like discussing the purpose for using the words or planning truth-determination-procedures also go in simulacrum level 0. I'd say the entire discussion of simulacrum levels goes within simulacrum level 0... (read more)

[-]tailcalled1y50

Current agent models like argmax entirely lack any notion of "energy". Not only does this seem kind of silly on its own, I think it also leads to missing important dynamics related to temperature.

[-]tailcalled4y50

I think I've got it, the fix to the problem in my corrigibility thing!

So to recap: It seems to me that for the stop button problem, we want humans to control whether the AI stops or runs freely, which is a causal notion, and so we should use counterfactuals in our utility function to describe it. (Dunno why most people don't do this.) That is, if we say that the AI's utility should depend on the counterfactuals related to human behavior, then it will want to observe humans to get input on what to do, rather than manipulate them, because this is the only wa... (read more)

2tailcalled4y

It also might be vulnerable to some variant of the critiques that were first raised against it, because now the conditional introduces a link between its policy and the scenarios it faces, but I can't immediately construct a case where it happens, because the conditional would still somewhat tend to sabotage the obvious approaches. This sort of unclarity is kind of concerning when it comes to the idea.

2tailcalled4y

Like suppose the AI immediately very publicly does something that looks very unsafe. Say grabs control over the stop button and starts mass-producing paperclips in an extremely publicly visible way. This would probably lead to people wanting to stop it. So therefore, if it has a policy like that, the |S conditional would lead to people quickly wanting to stop it. This means that in the |S branch, it can quickly determine whether it is in the f|S branch or the s|S branch; in the f|S case, it can then keep going with whatever optimization V specified, while in the s|S case, it can then immediately shut itself down. But the reason I think the AI *wouldn't* do this is, what about the |F branch? If you condition on humans not wanting to press the stop button even though there's a clearly unaligned AI, what sort of situation could produce this? I have trouble imagining it, because it seems like it would need to be pretty extreme. The best ideas I can come up with are stuff like "black hole swallows the earth", but this would rank pretty low in the AI's utility function, and therefore it would avoid acting this way in order to have a reasonable |F branch. But this does not seem like sane reasoning on the AI's side to me, so it seems like this should be fixed. And of course, fixed in a principled rather than unprincipled way.

[-]tailcalled1y4-9

I was surprised to see this on twitter:

I mean, I'm pretty sure I knew what caused it (this thread or this market), and I guess I knew from Zack's stuff that rationalist cultism had gotten pretty far, but I still hadn't expected that something this small would lead to being blocked.

[-]Eli Tyre1y*123

FYI: I have a low bar for blocking people who have according-to-me bad, overconfident, takes about probability theory, in particular. For whatever reason, I find people making claims about that topic, in particular, really frustrating. ¯\_(ツ)_/¯

The block isn't meant as a punishment, just a "I get to curate my online experience however I want."

2tailcalled1y

I think blocks are pretty irrelevant unless one conditions on the particular details of the situation. In this case I think the messages I was sharing are very important. If you think my messages are instead unimportant or outright wrong, then I understand why you would find the block less interesting, but in that case I don't think we can meaningfully discuss it without knowing why you disagree with the messages.

[-]Eli Tyre1y*111

I'm not particularly interested in discussing it in depth. I'm more like giving you a data-point in favor of not taking the block personally, or particularly reading into it.

(But yeah, "I think these messages are very important", is likely to trigger my personal "bad, overconfident takes about probability theory" neurosis.)

[-]Rana Dexsin1y1111

This is awkwardly armchair, but… my impression of Eliezer includes him being just so tired, both specifically from having sacrificed his present energy in the past while pushing to rectify the path of AI development (by his own model thereof, of course!) and maybe for broader zeitgeist reasons that are hard for me to describe. As a result, I expect him to have entered into the natural pattern of having a very low threshold for handing out blocks on Twitter, both because he's beset by a large amount of sneering and crankage in his particular position and because the platform easily becomes a sinkhole in cognitive/experiential ways that are hard for me to describe but are greatly intertwined with the aforementioned zeitgeist tiredness.

Something like: when people run heavily out of certain kinds of slack for dealing with The Other, they reach a kind of contextual-but-bleed-prone scarcity-based closed-mindedness of necessity, something that both looks and can become “cultish” but where reaching for that adjective first is misleading about the structure around it. I haven't succeeded in extracting a more legible model of this, and I bet my perception is still skew to the reality, but I'... (read more)

[-]Elizabeth1y*116

I disagree with the sibling thread about this kind of post being “low cost”, BTW; I think adding salience to “who blocked whom” types of considerations can be subtly very costly.

I agree publicizing blocks has costs, but so does a strong advocate of something with a pattern of blocking critics. People publicly announcing "Bob blocked me" is often the only way to find out if Bob has such a pattern.

I do think it was ridiculous to call this cultish. Tuning out critics can be evidence of several kinds of problems, but not particularly that one.

5tailcalled1y

I agree that it is ridiculous to call this cultish if this was the only evidence, but we've got other lines of evidence pointing towards cultishness, so I'm making a claim of attribution more so than a claim of evidence.

3M. Y. Zuo1y

Blocking a lot isn’t necessarily bad or unproductive… but in this case it’s practically certain blocking thousands will eventually lead to blocking someone genuinely more correct/competent/intelligent/experienced/etc… than himself, due to sheer probability. (Since even a ‘sneering’ crank is far from literal random noise.) Which wouldn’t matter at all for someone just messing around for fun, who can just treat X as a text-heavy entertainment system. But it does matter somewhat for anyone trying to do something meaningful and/or accomplish certain goals. In short, blocking does have some, variable, credibility cost. Ranging from near zero to quite a lot, depending on who the blockee is.

2tailcalled1y

Eliezer Yudkowsky being tired isn't an unrelated accident though. Bayesian decision theory in general intrinsically causes fatigue by relying on people to use their own actions to move outcomes instead of getting leverage from destiny/higher powers, which matches what you say about him having sacrificed his present energy for this. Similarly, "being Twitterized" is just about stewing in garbage and cursed information, such that one is forced to filter extremely aggressively, but blocking high-quality information sources accelerates the Twitterization by changing the ratio of blessed to garbage/cursed information. On the contrary, I think raising salience of such discussions helps clear up the "informational food chain", allowing us to map out where there are underused opportunities and toxic accumulation.

6Richard_Kennaway1y

It seems likely to me that Eliezer blocked you because he has concluded that you are a low-quality information source, no longer worth the effort of engaging with.

4tailcalled1y

I agree that this is likely Eliezer's mental state. I think this belief is false, but for someone who thinks it's true, there's of course no problem here.

6Richard_Kennaway1y

Please say more about this. Where can I get some?

6tailcalled1y

Working on writing stuff but it's not developed enough yet. To begin with you can read my Linear Diffusion of Sparse Lognormals sequence, but it's not really oriented towards practical applications.

2Richard_Kennaway1y

I will look forward to that. I have read the LDSL posts, but I cannot say that I understand them, or guess what the connection might be with destiny and higher powers.

2tailcalled1y

One of the big open questions that the LDSL sequence hasn't addressed yet is, what starts all the lognormals and why are they so commensurate with each other. So far, the best answer I've been able to come up with is a thermodynamic approach (hence my various recent comments about thermodynamics). The lognormals all originate as emanations from the sun, which is obviously a higher power. They then split up and recombine in various complicated ways. As for destiny: The sun throws in a lot of free energy, which can be developed in various ways, increasing entropy along the way. But some developments don't work very well, e.g. self-sabotaging (fire), degenerating (parasitism leading to capabilities becoming vestigial), or otherwise getting "stuck". But it's not all developments that get stuck, some developments lead to continuous progress (sunlight -> cells -> eukaryotes -> animals -> mammals -> humans -> society -> capitalism -> ?). This continuous progress is not just accidental, but rather an intrinsic part of the possibility landscape. For instance, eyes have evolved in parallel to very similar structures, and even modern cameras have a lot in common with eyes. There's basically some developments that intrinsically unblock lots of derived developments while preferentially unblocking developments that defend themselves over developments that sabotage themselves. Thus as entropy increases, such developments will intrinsically be favored by the universe. That's destiny. Critically, getting people to change many small behaviors in accordance with long explanations contradicts destiny because it is all about homogenizing things and adding additional constraints whereas destiny is all about differentiating things and releasing constraints.

7quetzal_rainbow1y

Meta-point: your communication pattern fits with the following pattern: The reason why smart people find themselves in this pattern is because they expect short inferential distances, i.e., they see their argumentation not as vague esoteric crackpottery, but as a set of very clear statements, and fail to put themselves in the shoes of the people who are going to read this, and they especially fail to account for the fact that readers already distrust them because they started the conversation with <controversial statement>. On the object level, as stated, you are wrong. Observing a heuristic failing should decrease your confidence in the heuristic. You can argue that your update should be small, due to, say, measurement errors or strong priors, but the direction of the update should be strictly down.

2tailcalled1y

Can you fill in a particular example of me engaging in that pattern so we can address it in the concrete rather than in the abstract?

7quetzal_rainbow1y

To be clear, I mean "your communication in this particular thread". Pattern: <controversial statement> <this statement is false> <controversial statement> <this statement is false> <mix of "this is trivially true because" and "here is my blogpost with esoteric terminology"> The following responses from EY are more in the genre of "I ain't reading this", because he is using you as an example for other readers more than talking directly to you, followed by the block.

3tailcalled1y

This statement had two parts. Part 1: And part 2: Part 2 is what Eliezer said was false, but it's not really central to my point (hence why I didn't write much about it in the original thread), and so it is self-sabotaging of Eliezer to zoom into this rather than the actually informative point.

5habryka1y

I do think if that thread got you blocked then that's sad (my guess is I think you were more right than Eliezer, though I haven't read the full sequence that you linked to). I do think Twitter blocks don't mean very much. I think it's approximately zero evidence of "cultism" or whatever. Most people with many followers on Twitter seem to need to have a hair trigger for blocking, or at least feel like they need to, in order to not constantly have terrible experiences.

[-]Noosphere891y103

This is a very useful point:

Most people with many followers on Twitter seem to need to have a hair trigger for blocking, or at least feel like they need to, in order to not constantly have terrible experiences.

I think that this is a point that people who aren't on social media much don't get: You need to be very quick to block, because otherwise you will not have good experiences on the site.

2Viliam1y

I think our instincts may be misleading here, because the internet works differently from real life. In real life, not interacting with someone is the default. Unless you have some kind of relationship with someone, people have no obligation to call you or meet you. And if I call someone on the phone just to say "dude, I disagree with your theory", I would expect that person to hang up... and maybe say "sorry, I'm busy" before hanging up, if they are extra polite. The interactions are mutually agreed, and you have no right to complain when the other party decides to not give you the time. (And if you keep insisting... that's what restraining orders are for.) On the internet, once you sign up to e.g. Twitter, the default is that anyone can talk to you, and if you are not interested in reading the texts they send you, you need to block them. As far as I know, there are no options in the middle between "block" and "don't block". (Nothing like "only let them talk to me when it is important" or "only let them talk to me on Tuesdays between 3 PM and 5 PM".) And if you are a famous person, I guess you need to keep blocking left and right, otherwise you would drown in the text -- presumably you don't want to spend 24 hours a day sifting through Twitter messages, and you want to get the ones you actively want, which requires you to aggressively filter out everything else. So getting blocked is not an equivalent of getting a restraining order, but more like an equivalent of the other person no longer paying attention to you. Which most people would not interpret as evidence of cultism.

2Noosphere891y

This is the key to understanding why I think it's more okay to block than a lot of other people think, and the fact that the default is anyone can talk to you means you get way too much crap without blocking lots of people.

5tailcalled1y

I think whether it's cultism depends on what model one has of how cults work. I don't know much about it so I might be totally ignorant, but I think a major factor is just engaging in a futile, draining activity powered by popularity, so one needs to carefully preserve resources and maintain appearances.

4habryka1y

Huh, I guess you mean cult in a broader "polarization" sense? Like, where are the democratic and republican parties on the cultishness scale in your model?

3tailcalled1y

Idk, my main point of reference is that I recently read Some Desperate Glory, which was about a cult of terrorists. Polarization generally implies a balanced conflict which isn't really futile. I don't know much about how they work internally. Democracy is a weird system because you've got the adversarial thing that would make it less futile but also the popularity contest thing that would make it more narcissistic and thus more cultish.

4Shankar Sivarajan1y

This explanation sounds like what they'd say. I think the real reason this is common is more a status thing: it's a pretty standard strategy for people to try to gain status by "dunking" on tweets by more famous people, and blocking them is the standard countermeasure.

2tailcalled1y

The dunking seems like constant terrible experiences.

2Richard_Kennaway1y

The more prominent you are, the more people want to talk with you, and the less time you have to talk with them. You have to shut them out the moment the cost is no longer worth paying.

5MondSemmel1y

People should feel free to liberally block one another on social media. Being blocked is not enough to warrant an accusation of cultism.

8tailcalled1y

I did not say that simply blocking me warrants an accusation of cultism. I highlighted the fact that I had been blocked and the context in which it occurred, and then brought up other angles which evidenced cultism. If you think my views are pathetic and aren't the least bit alarmed by them being blocked, then feel free to feel that way, but I suspect there are at least some people here who'd like to keep track of how the rationalist isolation is progressing and who see merit in my positions.

1MondSemmel1y

Again, people block one another on social media for any number of reasons. That just doesn't warrant feeling alarmed or like your views are pathetic.

4tailcalled1y

We know what the root cause is, you don't have to act like it's totally mysterious. So the question is, was this root cause (pushback against Eliezer's Bayesianism):

* An important insight that Eliezer was missing (alarming!)
* Worthless pedantry that he might as well block (nbd/pathetic)
* Antisocial trolling that ought to be gotten rid of (reassuring that he blocked)
* ... or something else

Regardless of which of these is the true one, it seems informative to highlight for anyone who is keeping track of what is happening around me. And if the first one is the true one, it seems like people who are keeping track of what is happening around Eliezer would also want to know it. Especially since it only takes a very brief moment to post and link about getting blocked. Low cost action, potentially high reward.

[-]habryka1y109

MIRI full-time employed many critics of bayesianism for 5+ years and MIRI researchers themselves argued most of the points you made in these arguments. It is obviously not the case that critiquing bayesianism is the reason why you got blocked.

5tailcalled1y

Idk, maybe you've got a point, but Eliezer was very quick to insist what I said was not the mainstream view and disengage. And MIRI was full of internal distrust. I don't know enough of the situation to know if this explains it, but it seems plausible to me that the way MIRI kept stuff together was by insisting on a Bayesian approach, and that some generators of internal dissent were people whose intuition aligned more with a non-Bayesian approach. For that matter, an important split in rationalism is MIRI/CFAR vs the Vassarites, and while I wouldn't really say the Vassarites formed a major inspiration for LDSL, after coming up with LDSL I've totally reevaluated my interpretation of that conflict as being about MIRI/CFAR using a Bayesian approach and the Vassarites using an LDSL approach. (Not absolutely of course, everyone has a mixture of both, but in terms of relative differences.)

[-]tailcalled1y40

I've been thinking about how the way to talk about how a neural network works (instead of how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck because of the issue where you can add new components by subtracting off large irrelevant components.

I've also been thinking about deception and its relationship to "natural abstractions", and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger "magnitude" than... (read more)

4Thomas Kwa1y

Much dumber ideas have turned into excellent papers

2tailcalled1y

True, though I think the Hessian is problematic enough that I'd either want to wait until I have something better, or want to use a simpler method. It might be worth going into more detail about that. The Hessian for the probability of a neural network output is mostly determined by the Jacobian of the network. But in some cases the Jacobian gives us exactly the opposite of what we want. If we consider the toy model of a neural network with no input neurons and only 1 output neuron $g(w) = \prod_i w_i$ (which I imagine to represent a path through the network, i.e. a bunch of weights get multiplied along the layers to the end), then the Jacobian is the gradient $(J g(w))_j = (\nabla g(w))_j = \prod_{i \neq j} w_i = \frac{\prod_i w_i}{w_j}$. If we ignore the overall magnitude of this vector and just consider how the contribution that it assigns to each weight varies over the weights, then we get $(J g(w))_j \propto \frac{1}{w_j}$. Yet for this toy model, "obviously" the contribution of weight $j$ "should" be proportional to $w_j$. So derivative-based methods seem to give the absolutely worst-possible answer in this case, which makes me pessimistic about their ability to meaningfully separate the actual mechanisms of the network (again they may very well work for other things, such as finding ways of changing the network "on the margin" to be nicer).
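
A quick numerical check of the toy model above, as a minimal sketch rather than anything from the original comment: the weights and the helper names `path_product` and `numerical_gradient` are made up for illustration. It compares a finite-difference gradient of the product against the closed form (product divided by each weight), and shows that the smallest weight ends up with the largest gradient entry.

```python
import math

def path_product(weights):
    """Toy 'network': the output is just the product of all weights on one path."""
    return math.prod(weights)

def numerical_gradient(f, weights, eps=1e-6):
    """Central finite-difference estimate of the gradient of f at `weights`."""
    grad = []
    for j in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[j] += eps
        down[j] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Hypothetical weights, chosen only to make the ordering easy to see.
weights = [0.5, 2.0, 4.0]

gradient = numerical_gradient(path_product, weights)
# Closed form from the comment: d g / d w_j = (prod_i w_i) / w_j
closed_form = [path_product(weights) / w for w in weights]

for w, g, c in zip(weights, gradient, closed_form):
    print(f"w_j = {w:4.1f}   numerical grad = {g:8.4f}   prod/w_j = {c:8.4f}")
```

The gradient ranks the weights in exactly the reverse order of their size, which is the inversion the comment is complaining about.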

[-]tailcalled4y40

One thing that seems really important for agency is perception. And one thing that seems really important for perception is representation learning. Where representation learning involves taking a complex universe (or perhaps rather, complex sense-data) and choosing features of that universe that are useful for modelling things.

When the features are linearly related to the observations/state of the universe, I feel like I have a really good grasp of how to think about this. But most of the time, the features will be nonlinearly related; e.g. in order to do... (read more)

[-]tailcalled1y3-1

Thesis: money = negative entropy, wealth = heat/bound energy, prices = coldness/inverse temperature, Baumol effect = heat diffusion, arbitrage opportunity = free energy.

2tailcalled1y

Maybe this mainly works because the economy is intelligence-constrained (since intelligence works by pulling off negentropy from free energy), and it will break down shortly after human-level AGI?

[-]tailcalled1y3-1

Thesis: there's a condition/trauma that arises from having spent a lot of time in an environment where there's excess resources for no reasons, which can lead to several outcomes:

Inertial drifting in the direction implied by one's prior adaptations,
Conformity/adaptation to social popularity contests based on the urges above,
Getting lost in meta-level preparations,
Acting as a stickler for the authorities,
"Bite the hand that feeds you",
Tracking the resource/motivation flows present.

By contrast, if resources are contingent on a particular reason, everything takes shape according to said reason, and so one cannot make a general characterization of the outcomes.

1Mateusz Bagiński1y

It's not clear to me how this results from "excess resources for no reasons". I guess the "for no reasons" part is crucial here?

[-]tailcalled1y3-5

Thesis: the median entity in any large group never matters and therefore the median voter doesn't matter and therefore the median voter theorem proves that democracies get obsessed about stuff that doesn't matter.

2Dagon1y

A lot depends on your definition of "matter". Interesting and important debates are always on margins of disagreement. The median member likely has a TON of important beliefs and activities that are uncontroversial and ignored for most things. Those things matter, and they matter more than 95% of what gets debated and focused on. The question isn't whether the entities matter, but whether the highlighted, debated topics matter.

[-]tailcalled4y30

I recently wrote a post about myopia, and one thing I found difficult when writing the post was in really justifying its usefulness. So eventually I mostly gave up, leaving just the point that it can be used for some general analysis (which I still think is true), but without doing any optimality proofs.

But now I've been thinking about it further, and I think I've realized - don't we lack formal proofs of the usefulness of myopia in general? Myopia seems to mostly be justified by the observation that we're already being myopic in some ways, e.g. when train... (read more)

2Charlie Steiner4y

Yeah, I think usually when people are interested in myopia, it's because they think there's some desired solution to the problem that is myopic / local, and they want to try to force the algorithm to find that solution rather than some other one. E.g. answering a question based only on some function of its contents, rather than based on the long-term impact of different answers. I think that once you postulate such a desired myopic solution and its non-myopic competitors, then you can easily prove that myopia helps. But this still leaves the question of how we know this problem statement is true - if there's a simpler myopic solution that's bad, then myopia won't help (so how can we predict if this is true?) and if there's a simpler non-myopic solution that's good, myopia may actively hurt (this one seems a little easier to predict though).

[-]tailcalled8mo20

Framing: Prices reflect how much trouble purchasers would be in if the seller didn't exist. GDP multiplies prices by transaction volume, so it measures the fragility of the economy.

4tailcalled8mo

Prices decompose into cost and profit. The profit is determined by how much trouble the purchaser would be in if the seller didn't exist (since e.g. if there are other sellers, the purchaser could buy from those). The cost is determined by how much demand there is for the underlying resources in other areas, so it basically is how much trouble the purchaser imposes on others by getting the item. Most products are either cost-constrained (where price is mostly cost) or high-margin (where price is mostly profit). GDP is price times transaction volume, so it's the sum of total costs and total profits in a society. The profit portion of GDP reflects the extent to which the economy has monopolized activities into central nodes that contribute to fragility, while the cost portion of GDP reflects the extent to which the economy is resource-constrained. The biggest costs in a modern economy are typically labor and land, and land is typically just a labor cost by proxy (land in the middle of nowhere is way cheaper, but it's harder to hire people). The majority of the economy is cost-constrained, so for that majority, GDP reflects underpopulation. The tech sector and financial investment sector have high profit margins, which reflects their tendency to monopolize management of resources. Low GDP reflects slack. Because of diminishing marginal returns and queuing considerations, ideally one should have some slack, since then there's abundance of resources and easy competition, driving prices down and thus leading to low GDP at high quality of life. However, slack also leads to conflict because of reduced opportunity cost. This conflict can be reduced with policing, but that increases authoritarianism. This leads to a tradeoff between high GDP and high tension (as seen in the west) vs low GDP and high authoritarianism (as seen in the east) vs low GDP and high conflict (as seen in the south).
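
To make the bookkeeping in this decomposition concrete, here is a small sketch. Everything in it is an illustrative assumption: the `Product` class, the `gdp_breakdown` helper, and the two made-up products with their numbers are not from the comment. It only shows that summing price times volume gives the same total as summing the cost portion and the profit portion separately.

```python
from dataclasses import dataclass

@dataclass
class Product:
    """Hypothetical product; all fields are made-up illustration values."""
    name: str
    unit_cost: float    # what the underlying resources cost per unit
    unit_profit: float  # margin the seller captures per unit
    volume: int         # units transacted

    @property
    def price(self) -> float:
        # Price decomposes into cost plus profit, as in the comment.
        return self.unit_cost + self.unit_profit

def gdp_breakdown(products):
    """Split total price-times-volume into its cost and profit portions."""
    total_cost = sum(p.unit_cost * p.volume for p in products)
    total_profit = sum(p.unit_profit * p.volume for p in products)
    gdp = total_cost + total_profit
    return {
        "gdp": gdp,
        "cost_portion": total_cost,
        "profit_portion": total_profit,
        "profit_share": total_profit / gdp,
    }

# One cost-constrained product and one high-margin product (invented numbers).
economy = [
    Product("bread", unit_cost=2.0, unit_profit=0.25, volume=1000),
    Product("software_license", unit_cost=5.0, unit_profit=45.0, volume=100),
]

breakdown = gdp_breakdown(economy)
print(breakdown)
```

In the framing above, `profit_portion` is the part read as fragility and `cost_portion` the part read as resource constraint; the code does not test that interpretation, it only does the accounting.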

2tailcalled8mo

Exports and imports are tricky but very important to take into account here because they have two important properties:

* They are "subtracted off" the GDP numbers in my explanation above (e.g. if you import a natural resource, then that would be considered part of the GDP of the other country, not your country)
* They determine the currency exchange rates (since the exchange rate must equal the ratio of imports to exports, assuming savings and bonds are negligible or otherwise appropriately accounted for) and thereby the GDP comparisons across different countries at any given time

2Viliam8mo

If there are 10 sellers selling the same thing for the same price, I wouldn't be in any trouble if one of them stopped existing.

2tailcalled8mo

And they wouldn't be getting any profit. (In the updated comment, I noted it's only the profit that measures your trouble.)

2tailcalled8mo

Hmm... Issue is it also depends on centralization. For a bunch of independent transactions, fragility goes up with the square root of the count rather than the raw count. In practice large economies are very much not independent, but the "troubles" might be.
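
One way to read the square-root claim is as the usual statement about sums of independent fluctuations: if each transaction contributes an independent unit-variance "trouble" term, the standard deviation of the total grows like the square root of the count, whereas a single shared shock hitting every transaction grows linearly. The sketch below simulates both cases; the Gaussian shocks, the `aggregate_std` helper, and the trial counts are assumptions made for illustration, not something the comment specifies.

```python
import math
import random
import statistics

def aggregate_std(n_transactions, trials=2000, shared=False, seed=0):
    """Estimate the std of total 'trouble' over n transactions.

    shared=False: every transaction gets its own independent unit shock.
    shared=True:  one common unit shock hits all transactions at once.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        if shared:
            shock = rng.gauss(0.0, 1.0)
            totals.append(n_transactions * shock)
        else:
            totals.append(sum(rng.gauss(0.0, 1.0) for _ in range(n_transactions)))
    return statistics.pstdev(totals)

for n in (1, 4, 16, 64):
    independent = aggregate_std(n)
    correlated = aggregate_std(n, shared=True)
    print(f"n = {n:3d}   independent std = {independent:6.2f} "
          f"(sqrt(n) = {math.sqrt(n):5.2f})   shared-shock std = {correlated:6.2f}")
```

Real economies sit somewhere between the two extremes, which matches the caveat that large economies are very much not independent.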

[-]tailcalled1y2-3

Thesis: a general-purpose interpretability method for utility-maximizing adversarial search is a sufficient and feasible solution to the alignment problem. Simple games like chess have sufficient features/complexity to work as a toy model for developing this, as long as you don't rely overly much on preexisting human interpretations for the game, but instead build the interpretability from the ground-up.

[-]tailcalled1y20

The universe has many conserved and approximately-conserved quantities, yet among them energy feels "special" to me. Some speculations why:

The sun bombards the earth with a steady stream of free energy, which leaves out into the night.
Time-evolution is determined by a 90-degree rotation of energy (Schrodinger equation/Hamiltonian mechanics).
Breaking a system down into smaller components primarily requires energy.
While aspects of thermodynamics could apply to many conserved quantities, we usually apply it to energy only, and it was first discovered in the c

... (read more)

8jacob_drori1y

Sure, there are plenty of quantities that are globally conserved at the fundamental (QFT) level. But most of these quantities aren't transferred between objects at the everyday, macro level we humans are used to. E.g. 1: most everyday objects have neutral electrical charge (because there exist positive and negative charges, which tend to attract and roughly cancel out) so conservation of charge isn't very useful in day-to-day life. E.g. 2: conservation of color charge doesn't really say anything useful about everyday processes, since it's only changed by subatomic processes (this is again basically due to the screening effect of particles with negative color charge, though the story here is much more subtle, since the main screening effect is due to virtual particles rather than real ones). The only other fundamental conserved quantity I can think of that is nontrivially exchanged between objects at the macro level is momentum. And... momentum seems roughly as important as energy? I guess there is a question about why energy, rather than momentum, appears in thermodynamics. If you're interested, I can answer in a separate comment.

2tailcalled1y

At a human level, the count of each type of atom is basically always conserved too, so it's not just a question of why not momentum but also a question of why not moles of hydrogen, moles of carbon, moles of oxygen, moles of nitrogen, moles of silicon, moles of iron, etc. I guess for momentum in particular, it seems reasonable why it wouldn't be useful in a thermodynamics-style model because things would woosh away too much (unless you're dealing with some sort of flow? Idk). A formalization or refutation of this intuition would be somewhat neat, but I would actually more wonder, could one replace the energy-first formulations of quantum mechanics with momentum-first formulations?

1jacob_drori1y

> could one replace the energy-first formulations of quantum mechanics with momentum-first formulations? Momentum is to space what energy is to time. Precisely, energy generates (in the Lie group sense) time-translations, whereas momentum generates spatial translations. So any question about ways in which energy and momentum differ is really a question about how time and space differ. In ordinary quantum mechanics, time and space are treated very differently: t is a coordinate whereas x is a dynamical variable (which happens to be operator-valued). The equations of QM tell us how x evolves as a function of t. But ordinary QM was long-ago replaced by quantum field theory, in which time and space are on a much more even footing: they are both coordinates, and the equations of QFT tell us how a third thing (the field ϕ(x,t)) evolves as a function of x and t. Now, the only difference between time and space is that there is only one dimension of the former but three of the latter (there may be some other very subtle differences I'm glossing over here, but I wouldn't be surprised if they ultimately stem from this one). All of this is to say: our best theory of how nature works (QFT), is neither formulated as "energy-first" nor as "momentum-first". Instead, energy and momentum are on fairly equal footing.
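
For readers who want the "generates" statement spelled out, here are the standard textbook forms for a single non-relativistic particle. This is a sketch in conventional notation ($H$ the Hamiltonian, $\hat{p}$ the momentum operator, $\hbar$ the reduced Planck constant), not something written in the comment itself.

```latex
% Energy (the Hamiltonian) generates translations in time:
\psi(x, t) = e^{-i H t / \hbar}\, \psi(x, 0)
\quad\Longleftrightarrow\quad
i\hbar\, \frac{\partial \psi}{\partial t} = H \psi

% Momentum generates translations in space:
\psi(x - a) = e^{-i \hat{p}\, a / \hbar}\, \psi(x)
\quad\Longleftrightarrow\quad
-i\hbar\, \frac{\partial \psi}{\partial x} = \hat{p}\, \psi
```

The two lines have the same shape, with $(H, t)$ swapped for $(\hat{p}, x)$, which is the sense in which momentum is to space what energy is to time.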

2tailcalled1y

I suppose that's true, but this kind of confirms my intuition that there's something funky going on here that isn't accounted for by rationalist-empiricist-reductionism. Like why are time translations so much more important for our general work than space translations? I guess because the sun bombards the earth with a steady stream of free energy, and earth has life which continuously uses this sunlight to stay out of equilibrium. In a lifeless solar system, time-translations just let everything spin, which isn't that different from space-translations.

1ProgramCrafter9mo

I'd imagine that happens because we are able to coordinate our work across time (essentially, execute some actions), while work coordination across space-separated instances is much harder (now, it is part of IT's domain under name of "scalability").

1jacob_drori1y

Ah, so I think you're saying "You've explained to me the precise reason why energy and momentum (i.e. time and space) are different at the fundamental level, but why does this lead to the differences we observe between energy and momentum (time and space) at the macro-level?" This is a great question, and as with any question of the form "why does this property emerge from these basic rules", there's unlikely to be a short answer. E.g. if you said "given our understanding of the standard model, explain how a cell works", I'd have to reply "uhh, get out a pen and paper and get ready to churn through equations for several decades". In this case, one might be able to point to a few key points that tell the rough story. You'd want to look at properties of solutions of PDEs on manifolds with a metric of signature (1,3) (which means "one direction on the manifold is different from the other three, in that it carries a minus sign in the metric compared to the others"). I imagine that, generically, these solutions behave differently with respect to the "1" direction and the "3" directions. These differences will lead to the rest of the emergent differences between space and time. Sorry I can't be more specific!

2tailcalled1y

Why assume a reductionistic explanation, rather than a macroscopic explanation? Like for instance the second law of thermodynamics is well-explained by the past hypothesis but not at all explained by churning through mechanistic equations. This seems in some ways to have a similar vibe to the second law.

3Noosphere891y

The best answer to the question is that it serves as essentially a universal resource that can be used to provide a measuring stick. It does this by being a resource that is limited, fungible, always is better to have more of than less of, and is additive across decisions: You have a limited amount of joules of energy/negentropy, but you can spend it on essentially arbitrary goods for your utility, and is essentially a more physical and usable form of money in an economy. Also, more energy is always a positive thing, so that means you never are worse off by having more energy, and energy is linear in the sense that if I've spent 10 joules on computation, and spent another 10 joules on computation 1 minute later, I've spent 20 joules in total. Cf this post on the measuring stick of utility problem: https://www.lesswrong.com/posts/73pTioGZKNcfQmvGF/the-measuring-stick-of-utility-problem

-1tailcalled1y

Agree that free energy in many ways seems like a good resource to use as a measuring stick. But matter is too available and takes too much energy to make, so you can't spend it on matter in practice. So it's non-obvious why we wouldn't have a matter-thermodynamics as well as an energy-thermodynamics. I guess especially with oxygen, since it is so reactive. I guess one limitation with considering a system where oxygen serves an analogous role to sunlight (beyond such systems being intrinsically rare) is that as the oxygen reacts, it takes up elements, and so you cannot have the "used-up" oxygen leave the system again without diminishing the system. Whereas you can have photons leave again. Maybe this is just the fungibility property again, which to some extent seems like the inverse of the "breaking a system down into smaller components primarily requires energy" property (though your statement of fungibility is more general because it also considers kinetic energy).

2tailcalled1y

Thinking further, a key part of it is that temperature has a tendency to mix stuff together, due to the associated microscopic kinetic energy.

[-]tailcalled1y20

Thesis: the problem with LLM interpretability is that LLMs cannot do very much, so for almost all purposes "prompt X => outcome Y" is all the interpretation we can get.

Counterthesis: LLMs are fiddly and usually it would be nice to understand what ways one can change prompts to improve their effectiveness.

Synthesis: LLM interpretability needs to start with some application (e.g. customer support chatbot) to extend the external subject matter that actually drives the effectiveness of the LLM into the study.

Problem: this seems difficult to access, and the people who have access to it are busy doing their job.

1[anonymous]1y

I'm very confused. Can we not do LLM interpretability to try to figure out whether or where superposition holds? Is it not useful to see how SAEs help us identify and intervene on specific internal representations that LLMs generate for real-world concepts? As an outsider to interpretability, it has long been my (rough) understanding that most of the useful work in interpretability deals precisely with attempts to figure out what is going on inside the model rather than how it responds to outside prompts. So I don't know what the thesis statement refers to...

2tailcalled1y

I guess to clarify: Everything has an insanely large amount of information. To interpret something, we need to be able to see what "energy" (definitely literal energy, but likely also metaphorical energy) that information relates to, as the energy is more bounded and unified than the information. But that's (the thesis goes) hard for LLMs.

2tailcalled1y

Not really, because this requires some notion of the same vs distinct features, which is not so interesting when the use of LLMs is so brief. I don't think so since you've often got more direct ways of intervening (e.g. applying gradient updates).

1[anonymous]1y

I'm sorry, but I still don't really understand what you mean here. The phrase "the use of LLMs is so brief" is ambiguous to me. Do you mean to say:

* a new, better LLM will come out soon anyway, making your work on current LLMs obsolete?
* LLM context windows are really small, so you "use" them only for a brief time?
* the entire LLM paradigm will be replaced by something else soon?
* something totally different from all of the above?

But isn't this rather... prosaic and "mundane"? I thought the idea behind these methods that I have linked was to serve as the building blocks for future work on ontology identification and ultimately getting a clearer picture of what is going on internally, which is a crucial part of stuff like Wentworth's "Retarget the Search" and other research directions like it. So the fact that SAE-based updates of the model do not currently result in more impressive outputs than basic fine-tuning does not matter as much compared to the fact that they work at all, which gives us reason to believe that we might be able to scale them up to useful, strong-interpretability levels. Or at the very least that the insights we get from them could help in future efforts to obtain this.

Kind of like how you can teach a dog to sit pretty well just by basic reinforcement, but if you actually had a gears-level understanding of how its brain worked, down to the minute details, and the ability to directly modify the circuits in its mind that represented the concept of "sitting", then you would be able to do this much more quickly, efficiently, and robustly. Am I totally off-base here?

2tailcalled1y

Maybe it helps if I start by giving some different applications one might want to use artificial agency for:

As a map: We might want to use the LLM as a map of the world, for instance by prompting it with data from the world and having it assist us with navigating that data. Now, the purpose of a map is to reflect as little information as possible about the world while still providing the bare minimum backbone needed to navigate the world. This doesn't work well with LLMs because they are instead trained to model information, so they will carry as much information as possible, and any map-making they do will be an accident driven by mimicking the information they've seen of mapmakers, rather than primarily an attempt to eliminate information about the world.

As a controller: We might want to use the LLM to perform small pushes to a chaotic system at times when the system reaches bifurcations where its state is extremely sensitive, such that the system moves in a desirable direction. But again I think LLMs are so busy copying information around that they don't notice such sensitivities except by accident.

As a coder: Since LLMs are so busy outputting information instead of manipulating "energy", maybe we could hope that they could assemble a big pile of information that we could "energize" in a relevant way, e.g. if they could write a large codebase and we could then execute it on a CPU and have a program that does something interesting in the world. But in order for this to work, the program shouldn't have obstacles that stop the "energy" dead in its tracks (e.g. bugs that cause it to crash). But again the LLM isn't optimizing for doing that, it's just trying to copy information around that looks like software, and it only makes space for the energy of the CPU and the program functionality as a side-effect of that. (Or as the old saying goes, it's maximizing lines of code written, not minimizing lines of code used.)

So, that gives us the thesis: To interpret the

[-]tailcalled1y20

Thesis: linear diffusion of sparse lognormals contains the explanation for shard-like phenomena in neural networks. The world itself consists of ~discrete, big phenomena. Gradient descent allows those phenomena to make imprints upon the neural networks, and those imprints are what is meant by "shards".

... But shard theory is still kind of broken because it lacks consideration of the possibility that the neural network might have an impetus to nudge those shards towards specific outcomes.

[-]tailcalled1y20

Thesis: the openness-conscientiousness axis of personality is about whether you live as a result of intelligence or whether you live through a bias for vitality.

2Dagon1y

In the big five trait model of personality, those are two different axes. Openness is inventive/curious vs consistent/cautious, and conscientiousness is efficient/organized vs extravagant/careless. I don't see your comparison (focus on intelligence vs vitality) as single-axis either - they may be somewhat correlated, but not very closely. I'm not sure I understand the model well enough to look for evidence for or against. But it doesn't resonate as true enough to be useful.

4tailcalled1y

Big Five is identified by taking the top 5 principal components among different descriptors of people, and then rotating them to be more aligned with the descriptors. Unless one strongly favors the alignment-with-descriptors as a natural criterion, this means that it is as valid to consider any linear combination of the traits as it is to consider the original traits. Mostly life needs to be focused on vitality to survive. The ability to focus on intelligence is sort of a weird artifact due to massive scarcity of intelligence, making people throw lots of resources at getting intelligence to their place. This wealth of resources allows intellectuals to sort of just stumble around without being biased towards vitality.
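The rotation point above can be checked numerically. The following is a minimal sketch on synthetic data, not real personality data: the two score lists, their spreads, and the 45-degree rotation are all assumptions made purely for illustration. It shows that the rotated pair ("plus" and "minus" combinations of two traits) captures exactly the same total variance as the original pair, so variance alone gives no reason to prefer one set of axes.

```python
import math
import random

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
# Hypothetical scores on two original axes (synthetic, for illustration only).
openness = [random.gauss(0, 1.0) for _ in range(1000)]
conscientiousness = [random.gauss(0, 0.5) for _ in range(1000)]

# Rotate the two axes by 45 degrees to get "plus" and "minus" combinations.
s = math.sqrt(0.5)
plus = [s * (o + c) for o, c in zip(openness, conscientiousness)]
minus = [s * (o - c) for o, c in zip(openness, conscientiousness)]

# Total variance is preserved by any rotation of the axes.
original_total = variance(openness) + variance(conscientiousness)
rotated_total = variance(plus) + variance(minus)
print(original_total, rotated_total)
```

The two printed totals agree up to floating-point error, because the covariance terms in the two rotated variances cancel.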

2Dagon1y

Interesting, thank you for the explanation. I'm not sure I understand (or accept, maybe) the dichotomy between intelligence vs vitality - they seem complementary to me. But I appreciate the discussion.

2tailcalled1y

There's also an openness+conscientiousness axis, which is closely related to concepts like "competence".

1Rana Dexsin1y

So in the original text, you meant “openness minus conscientiousness”? That was not clear to me at all; a hyphen-minus looks much more like a hyphen in that position. A true minus sign (−) would have been noticeable to me; using the entire word would have been even more obvious.

2tailcalled1y

Fair

[-]tailcalled1y2-1

Thesis: if being loud and honest about what you think about others would make you get seen as a jerk, that's a you problem. It means you either haven't learned to appreciate others or haven't learned to meet people well.

6Dagon1y

I think this is more general: if you're seen as a jerk, you haven't learned how to interact with people (at least the subset that sees you as a jerk). Being loud and honest about your opinions (though really, "honest" is often a cover for "cherry-picked highlights that aren't quite wrong, but are not honest full evaluations") is one way to be a jerk, but by no means the only one.

2tailcalled1y

Basically my model is that being silent and dishonest is a way to cover up one's lack of appreciation for others. Because being loud and honest isn't being a jerk if your loud honest opinions are "I love and respect you".

[-]tailcalled1y20

Thought: couldn't you make a lossless SAE using something along the lines of:

Represent the parameters of the SAE as simply a set of unit vectors for the feature directions.
To encode a vector using the SAE, iterate: find the most aligned feature vector, dot them to get the coefficient for that feature vector, and subtract off the scaled feature vector to get a residual to encode further

With plenty of diverse vectors, this should presumably guarantee excellent reconstruction, so the main issue is to ensure high sparsity, which could be achieved by some ... (read more)
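The encoding loop described above is essentially greedy matching pursuit, and can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: "most aligned" is taken to mean largest absolute dot product, and the function names, the step limit `max_steps`, the tolerance `tol`, and the toy dictionary are all hypothetical. It says nothing about how the dictionary would be trained or how sparsity would be encouraged.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(x, features, max_steps=10, tol=1e-9):
    """Greedily encode x against unit-norm feature directions.

    Returns (coeffs, residual); coeffs maps feature index -> coefficient,
    and x equals the reconstruction plus the residual.
    """
    residual = list(x)
    coeffs = {}
    for _ in range(max_steps):
        # Find the feature most aligned with the current residual.
        scores = [dot(residual, f) for f in features]
        best = max(range(len(features)), key=lambda i: abs(scores[i]))
        c = scores[best]
        if abs(c) < tol:
            break  # nothing left that any feature can explain
        coeffs[best] = coeffs.get(best, 0.0) + c
        # Subtract the scaled feature to get the residual to encode further.
        residual = [r - c * fi for r, fi in zip(residual, features[best])]
    return coeffs, residual

def reconstruct(coeffs, features, dim):
    out = [0.0] * dim
    for i, c in coeffs.items():
        out = [o + c * fi for o, fi in zip(out, features[i])]
    return out

# Toy dictionary: two axis directions plus the diagonal (all unit norm).
features = [[1.0, 0.0], [0.0, 1.0], [math.sqrt(0.5), math.sqrt(0.5)]]
x = [3.0, 3.0]
coeffs, residual = matching_pursuit(x, features)
print(coeffs, residual)
```

On this toy input the diagonal feature alone explains the vector, so the code uses a single active feature and leaves a residual that is zero up to floating-point error.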

[-]tailcalled1y20

Idea: for a self-attention where you give it two prompts p1 and p2, could you measure the mutual information between the prompts using something vaguely along the lines of V1^T softmax(K1 K2^T/sqrt(dK)) V2?
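To make the shapes in that expression concrete, here is a small sketch that just evaluates it. Everything here is an assumption for illustration: K1, V1 and K2, V2 are taken to be per-token key and value matrices (one row per token) from a single head, the softmax is taken over the positions of p2, and the toy numbers are made up. Whether the result tracks mutual information in any formal sense is exactly the open question; the code only shows that the expression is well-defined and yields a d_V by d_V matrix.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cross_prompt_score(K1, V1, K2, V2):
    """Evaluate V1^T softmax(K1 K2^T / sqrt(d_K)) V2 for two prompts."""
    d_k = len(K1[0])
    logits = matmul(K1, transpose(K2))            # (n1, n2)
    weights = [softmax([v / math.sqrt(d_k) for v in row]) for row in logits]
    return matmul(matmul(transpose(V1), weights), V2)  # (d_V, d_V)

# Toy inputs: prompt 1 has 2 tokens, prompt 2 has 3 tokens, d_K = d_V = 2.
K1 = [[0.0, 0.0], [0.0, 0.0]]
K2 = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
V1 = [[1.0, 0.0], [0.0, 1.0]]
V2 = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
score = cross_prompt_score(K1, V1, K2, V2)
print(score)
```

With all-zero keys the attention weights are uniform, so each entry reduces to a column sum of V1 times a column mean of V2, which makes the toy case easy to check by hand.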

[-]tailcalled2y20

In the context of natural impact regularization, it would be interesting to try to explore some @TurnTrout-style powerseeking theorems for subagents. (Yes, I know he denounces the powerseeking theorems, but I still like them.)

Specifically, consider this setup: Agent U starts a number of subagents S1, S2, S3, ..., with the subagents being picked according to U's utility function (or decision algorithm or whatever). Now, would S1 seek power? My intuition says, often not! If S1 seeks power in a way that takes away power from S2, that could disadvantage U. So ... (read more)

[-]tailcalled2y20

Theory for a capabilities advance that is going to occur soon:

OpenAI is currently getting lots of novel triplets (S, U, A), where S is a system prompt, U is a user prompt, and A is an assistant answer.

Given a bunch of such triplets (S, U_1, A_1), ... (S, U_n, A_n), it seems like they could probably create a model P(S|U_1, A_1, ..., U_n, A_n), which could essentially "generate/distill prompts from examples".

This seems like the first step towards efficiently integrating information from lots of places. (Well, they could ofc also do standard SGD-based gradien... (read more)

2tailcalled2y

Actually I suppose they don't even need to add perturbations to A directly, they can just add perturbations to S and generate A's from S'. Or probably even look at user's histories to find direct perturbations to either S or A.

[-]tailcalled4y20

I recently wrote a post presenting a step towards corrigibility using causality here. I've got several ideas in the works for how to improve it, but I'm not sure which one is going to be most interesting to people. Here's a list.

Develop the stop button solution further, cleaning up errors, better matching the purpose, etc..

e.g.

I think there may be some variant of this that could work. Like if you give the AI reward proportional to $B_{s} + r_{f}$ (where $r$ is a reward function for $V$ ) for its current world-state (rather than picking a policy t

... (read more)

[-]tailcalled1y1-4

Thesis: The motion of the planets is the strongest governing factor for life on Earth.

Reasoning: Time-series data often shows strong changes with the day and night cycle, and sometimes also with the seasons. The daily cycle and the seasonal cycle are governed by the relationship between the Earth and the sun. The Earth is a planet, and so its movement is part of the motion of the planets.

6JBlack1y

I don't think anybody would have a problem with the statement "The motion of the planet is the strongest governing factor for life on Earth". It's when you make it explicitly plural that there's a problem.

2tailcalled1y

To some extent true, but consider the analogy to a thesis like "Quantum chromodynamics is the strongest governing factor for life on Earth." Is this sentence also problematic because it addresses locations and energy levels that have no relevance for Earth?

6JBlack1y

If you replace it with "quantum chromodynamics", then it's still very problematic but for different reasons. Firstly, there's no obvious narrowing to equally causal factors ("motion of the planet" vs "motion of the planets") as there is in the original statement. In the original statement the use of plural instead of singular covers a much broader swath of hypothesis space, and that you haven't ruled out enough to limit it to the singular. So you're communicating that you think there is significant credence that motion of more than one planet has a very strong influence on life on Earth. Secondly, the QCD statement is overly narrow in the stated consequent instead of overly broad in the antecedent: any significant change in quantum chromodynamics would affect essentially everything in the universe, not just life on Earth. "Motion of the planet ... life on Earth" is appropriately scoped in both sides of the relation. In the absence of a context limiting the scope to just life on Earth, yes that would be weird and misleading. Thirdly, it's generally wrong. The processes of life (and everything else based on chemistry) in physical models depend very much more strongly on the details of the electromagnetic interaction than any of the details of colour force. If some other model produced nuclei of the same charges and similar masses, life could proceed essentially unchanged. However, there are some contexts in which it might be less problematic. In the context of evaluating the possibility of anything similar to our familiar life under alternative physical constants, perhaps. In a space of universes which are described by the same models to our best current ones but with different values of "free" parameters, it seems that some parameters of QCD may be the most sensitive in terms of whether life like ours could arise - mostly by mediating whether stars can form and have sufficient lifetime. So in that context, it may be a reasonable thing to say. But in most context

[-]tailcalled4y10

Are there good versions of DAGs for other things than causality?

I've found Pearl-style causal DAGs (and other causal graphical models) useful for reasoning about causality. It's a nice way to abstractly talk and think about it without needing to get bogged down with fiddly details.

In a way, causality describes the paths through which information can "flow". But information is not the only thing in the universe that gets transferred from node to node; there's also things like energy, money, etc., which have somewhat different properties but intuitively seem... (read more)

[-]tailcalled8mo0-6

Population ethics is the most important area within utilitarianism, but utilitarian answers to population ethics are all wrong, so therefore utilitarianism is an incorrect moral theory.

You can't weasel your way out by calling it an edge-case or saying that utilitarianism "usually" works when really it's the most important moral question. Like all the other big-impact utilitarian conclusions derive from population ethics since they tend to be dependent on large populations of people.

Utilitarianism can at best be seen as like a Taylor expansion that's valid only for questions whose impact on the total population are negligible.

[-]Nate Showell8mo100

The question of population ethics can be dissolved by rejecting personal identity realism. And we already have good reasons to reject personal identity realism, or at least consider it suspect, due to the paradoxes that arise in split-brain thought experiments (e.g., the hemisphere swap thought experiment) if you assume there's a single correct way to assign personal identity.

5tailcalled8mo

This is kind of vague. Doesn't this start shading into territory like "it's technically not bad to kill a person if you also create another person"? Or am I misunderstanding what you are getting at?

1Canaletto8mo

Completely agree. It's more like a utility function for a really weird inhuman kind of agent. That agent finds it obvious that if you had a chance to painlessly kill all humans and replace them with aliens who are 50% happier and 50% more numerous it would be a wonderful and exciting opportunity. Like, it's hard to overstate how weird utilitarianism is. And this agent will find it really painful and regretful to be confined by strategic considerations of "the humans would fight you really hard, so you should promise not to do it". Whereas humans find it relieving? or something. Utilitarianism indeed is just a very crude proxy.

1cubefox8mo

Utilitarianism, like many philosophical subjects, is not a finished theory but still undergoing active research. There is significant recent progress on the repugnant conclusion for example. See this EA Forum post by MichaelStJules. He also has other posts on cutting edge Utilitarianism research. I think many people on LW are not aware of this because they, at most, focus on rationality research but not ethics research.

[-]tailcalled1y00

Linear diffusion of sparse lognormals

Think about it

[-]tailcalled2y*-12

I have a concept that I expect to take off in reinforcement learning. I don't have time to test it right now, though hopefully I'd find time later. Until then, I want to put it out here, either as inspiration for others, or as a "called it"/prediction, or as a way to hear critique/about similar projects others might have made:

Reinforcement learning is currently trying to do stuff like learning to model the sum of future rewards, e.g. expectations using V, A and Q functions for many algorithms, or the entire probability distribution in algorithms like ... (read more)

7Vanessa Kosoy2y

Downvoted because conditional on this being true, it is harmful to publish. Don't take it personally, but this is content I don't want to see on LW.

2tailcalled2y

Why harmful

[-]Vanessa Kosoy2y103

Because it's capability research. It shortens the TAI timeline with little compensating benefit.

3tailcalled2y

It's capability research that is coupled to alignment: Coupling alignment to capabilities is basically what we need to survive, because the danger of capabilities comes from the fact that capabilities is self-funding, thereby risking outracing alignment. If alignment can absorb enough success from capabilities, we survive.

1Vanessa Kosoy2y

I missed that paragraph on first reading, mea culpa. I think that your story about how it's a win for interpretability and alignment is very unconvincing, but I don't feel like hashing it out atm. Revised to weak downvote. Also, if you expect this to take off, then by your own admission you are mostly accelerating the current trajectory (which I consider mostly doomed) rather than changing it. Unless you expect it to take off mostly thanks to you?

2tailcalled2y

Surely your expectation that the current trajectory is mostly doomed depends on your expectation of the technical details of the extension of the current trajectory. If technical specifics emerge that show the current trajectory to be going in a more alignable direction, it may be fine to accelerate.

4Vanessa Kosoy2y

Sure, if after updating on your discovery, it seems that the current trajectory is not doomed, it might imply accelerating is good. But, here it is very far from being the case.

4cfoster02y

The "successor representation" is somewhat close to this. It encodes the distribution over future states a particular policy expects to visit from a particular starting state, and can be learned via the Bellman equation / TD learning.
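As a rough illustration of that description, here is a tabular sketch of learning a successor representation by TD updates. The three-state chain, the reward vector, and the hyperparameters (`gamma`, `alpha`, `sweeps`) are all made up for the example, and the policy is assumed to be fixed and deterministic so that a transition is just a (state, next state) pair.

```python
def learn_successor_representation(transitions, n_states,
                                   gamma=0.9, alpha=0.1, sweeps=5000):
    """TD-learn M[s][j]: expected discounted number of visits to j from s."""
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(sweeps):
        for s, s_next in transitions:
            for j in range(n_states):
                indicator = 1.0 if j == s else 0.0
                # Bellman target: count the current state, then bootstrap.
                target = indicator + gamma * M[s_next][j]
                M[s][j] += alpha * (target - M[s][j])
    return M

# Hypothetical chain under a fixed policy: 0 -> 1 -> 2, and 2 loops forever.
transitions = [(0, 1), (1, 2), (2, 2)]
M = learn_successor_representation(transitions, n_states=3)

# Given any reward vector, values follow as a weighted sum over the SR row.
rewards = [0.0, 0.0, 1.0]
values = [sum(m * r for m, r in zip(row, rewards)) for row in M]
print(M[0], values[0])
```

Once M is learned, swapping in a different reward vector gives new value estimates without relearning the dynamics, which is the sense in which it sits between model-free and model-based methods.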

2gwern2y

Yes, my instant thought too was "this sounds like a variant on a successor function". Of course, the real answer is that if you are worried about the slowness of bootstrapping back value estimates or short eligibility traces, this mostly just shows the fundamental problem with model-free RL and why you want to use models: models don't need any environmental transitions to solve the use case presented: If the MBRL agent has learned a good reward-sensitive model of the environmental dynamics, then it will have already figured out E->B and so on, or could do so offline by planning; or if it had not because it is still learning the environment model, it would have a prior probability over the possibility that E->B gives a huge amount of reward, and it can calculate a VoI and target E->B in the next episode for exploration, and on observing the huge reward, update the model, replan, and so immediately begin taking E->B actions within that episode and all future episodes, and benefiting from generalization because it can also update the model everywhere for all E->B-like paths and all similar paths (which might now suddenly have much higher VoI and be worth targeting for further exploration) rather than simply those specific states' value-estimates, and so on. (And this is one of the justifications for successor representations: it pulls model-free agents a bit towards model-based-like behavior.)

2tailcalled2y

With MBRL, don't you end up with the same problem, but when planning in the model instead? E.g. DreamerV3 still learns a value function in their actor-critic reinforcement learning that occurs "in the model". This value function still needs to chain the estimates backwards.

2gwern2y

It's the 'same problem', maybe, but it's a lot easier to solve when you have an explicit model! You have something you can plan over, don't need to interact with an environment out in the real world, and can do things like tree search or differentiating through the environmental dynamics model to do gradient ascent on the action-inputs to maximize the reward (while holding the model fixed). Same as training the neural network, once it's differentiable - backprop can 'chain the estimates backwards' so efficiently you barely even think about it anymore. (It just holds the input and output fixed while updating the model.) Or distilling a tree search into a NN - the tree search needed to do backwards induction of updated estimates from all the terminal nodes all the way up to the root where the next action is chosen, but that's very fast and explicit and can be distilled down into a NN forward pass. And aside from being able to update within-episode or take actions entirely unobserved before, when you do MBRL, you get to do it at arbitrary scale (thus potentially extremely little wallclock time like an AlphaZero), offline (no environment interactions), potentially highly sample-efficient (if the dataset is adequate or one can do optimal experimentation to acquire the most useful data, like PILCO), with transfer learning to all other problems in related environments (because value functions are mostly worthless outside the exact setting, which is why model-free DRL agents are notorious for overfitting and having zero-transfer), easily eliciting meta-learning and zero-shot capabilities, etc.* * Why yes, all of this does sound a lot like how you train a LLM today and what it is able to do, how curious

3tailcalled2y

I don't think this is true in general. Unrolling an episode for longer steps takes more resources, and the later steps in the episode become more chaotic. DreamerV3 only unrolls for 16 steps. But when you distill a tree search, you basically learn value estimates, i.e. something similar to a Q function (realistically, V function). Thus, here you also have an opportunity to bubble up some additional information. I'm not doubting the relevance of MBRL, I expect that to take off too. What I'm doubting is that future agents will be controlled using scalar utilities/rewards/etc. rather than something more nuanced.

4gwern2y

Those are two different things. The unrolling of the episode is still very cheap. It's a lot cheaper to unroll a DreamerV3 for 16 steps, than it is to go out into the world and run a robot in a real-world task for 16 steps and try to get the NN to propagate updated value estimates the entire way... (Given how small a Dreamer is, it may even be computationally cheaper to do some gradient ascent on it than it is to run whatever simulated environment you might be using! Especially given simulated environments will increasingly be large generative models, which incorporate lots of reward-irrelevant stuff.) The usefulness of the planning is a different thing, and might also be true for other planning methods in that environment too - if the environment is difficult, a tree search with a very small planning budget like just a few rollouts is probably going to have quite noisy choices/estimates too. No free lunches. This is again doing the same thing as 'the same problem'; yes, you are learning value estimates, but you are doing so better than alternatives, and better is better. The AlphaGo network loses to the AlphaZero network, and the latter, in addition to just being quantitatively much better, also seems to have qualitatively different behavior, like fixing the 'delusions' (cf. AlphaStar). They won't be controlled by something as simple as a single fixed reward function, I think we can agree on that. But I don't find successor-function-like representations to be too promising as a direction for how to generalize agents, or, in fact, any attempt to fancily hand-engineer in these sorts of approaches into DRL agents. These things should be learned. For example, leaning into Decision Transformers and using a lot more conditionalizing through metadata and relying on meta-learning seems much more promising. (When it comes to generative models, if conditioning isn't solving your problems, you're just not using enough conditioning or generative modeling.) A prompt can des

2tailcalled2y

But I'm not advocating against MBRL, so this isn't the relevant counterfactual. A pure MBRL-based approach would update the value function to match the rollouts, but e.g. DreamerV3 also uses the value function in a Bellman-like manner to e.g. impute the future reward at the end of an episode. This allows it to plan for further than the 16 steps it rolls out, but it would be computationally intractable to roll out for as far as this ends up planning. It's possible for there to be a kind of chaos where the analytic gradients blow up yet discrete differences have predictable effects. Bifurcations etc.. I agree with things needing to be learned; using the actual states themselves was more of a toy model (because we have mathematical models for MDPs but we don't have mathematical models for "capabilities researchers will find something that can be Learned"), and I'd expect something else to happen. If I was to run off to implement this now, I'd be using learned embeddings of states, rather than states themselves. Though of course even learned embeddings have their problems. The trouble with just saying "let's use decision transformers" is twofold. First, we still need to actually define the feedback system. One option is to just define reward as the feedback, but as you mention, that's not nuanced enough. You could use some system that's trained to mimic human labels as the ground truth, but this kind of system has flaws for standard alignment reasons. It seems to me that capabilities researchers are eventually going to find some clever feedback system to use. It will to a great extent be learned, but they're going to need to figure out the learning method too.
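The bootstrapping step described above (imputing the reward beyond the end of a short rollout from the value function) can be sketched as a plain n-step return. This is a deliberately simplified stand-in: as far as I understand, Dreamer-style agents use a more elaborate λ-return, and the function name, reward list, and discount here are hypothetical.

```python
def bootstrapped_return(rewards, final_value, gamma=0.99):
    """Discounted return of a short rollout, with the value estimate at the
    last state standing in for all reward beyond the rollout horizon."""
    ret = final_value
    # Work backwards from the end of the rollout, chaining estimates.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret

# Toy rollout: two imagined steps of reward 1, then a value estimate of 10.
estimate = bootstrapped_return([1.0, 1.0], final_value=10.0, gamma=0.5)
print(estimate)  # 4.0
```

The rollout itself only covers `len(rewards)` steps, but the returned estimate depends on everything the value function believes about the future, which is how a 16-step rollout can end up planning much further ahead.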

2tailcalled2y

Thanks for the link! It does look somewhat relevant. But I think the weighting by reward (or other significant variables) is pretty important, since it generates a goal to pursue, making it emphasize things that can be achieved rather than just things that might randomly happen. Though this makes me think about whether there are natural variables in the state space that could be weighted by, without using reward per se. E.g. the size of (s' - s) in some natural embedding, or the variance in s' over all the possible actions that could be taken. Hmm. 🤔

[-]tailcalled4mo-2-16

I mostly don't believe in AI x-risk anymore, but the few AI x-risks that I still consider plausible are increased by broadcasting why I don't believe in AI x-risk, so I don't feel like explaining myself.

[-]Noosphere894mo10-2

As someone who used to believe in this, I no longer do, and a big part of my worldview shift comes down to me thinking that LLMs are unlikely to remain the final paradigm of AI, and in particular the bounty of data that made LLMs as good as they are is very much finite, and we don't have a second internet to teach them skills like computer use.

And the most accessible directions after LLMs involve stuff like RL, which puts us back into the sort of systems that alignment-concerned people were worried about.

More generally, I think the anti-scaling people weren't totally wrong to note that LLMs (at least in their pure form) have incapacities that, at realistic levels of compute and data, prevent them from displacing humans at jobs. The main ones are the lack of learning in the weights after train-time (also called continual learning; in-context learning is very weak so far) and the lack of a long-term memory (the best example here is the Claude Plays Pokemon benchmark).

So this makes me more worried than I used to be, because we are so far not great at outer-aligning RL agents (seen very well in the reward hacking o3 and Claude Sonnet 3.7 displayed), but the key reasons I'm not yet pe... (read more)

2tailcalled4mo

I don't think RL or other AI-centered agency constructions will ever become very agentic.

4Vladimir_Nesov4mo

No AI-centered agency (RL or otherwise) because it won't be allowed to happen (humanity remains the sole locus or origin of agency), or because it's not feasible to make this happen? (Noosphere89's point is about technical feasibility, so the intended meaning of your claim turning out to be that AI-centered agency is prevented by lack of technical feasibility seems like it would be more relevant to Noosphere89's comment, but much more surprising.)

1Kajus4mo

why?

2[anonymous]4mo

I suspect his reasons for believing this are close to or a subset of his reasons for changing his mind about AI stuff more broadly, so he's likely to not respond here.

2Vladimir_Nesov4mo

Does your view predict disempowerment or eutopia-without-disempowerment? (In my view, the valence of disempowerment is closer to that of doom/x-risk.) The tricky case might be disempowerment that occurs after AGI but "for social/structural reasons", and so isn't attributed to AGI (by people currently thinking about such timelines). The issue with this is that the resulting disempowerment is permanent (whether it's "caused by AI" or gets attributed to some other aspect of how things end up unfolding). This is unlike any mundane modern disempowerment, since humanity without superintelligence (or even merely powerful AI) seems unlikely to establish a condition of truly permanent disempowerment (without extinction). So avoidance of building AGI (of the kind that's not on track to solve the disempowerment issue) seems effective in preventing permanent disempowerment (however attributed), and in that sense AGI poses a disempowerment risk even for the kinds of disempowerment that are not "caused by AI" in some sense.

2Noosphere894mo

My take is that the most likely outcome is still eutopia-with-disempowerment for baseline humans, but for transhumans I'd expect eutopia straight-up. In the long run, I do expect baseline humans to be disempowered pretty much totally, similar to how children are basically disempowered relative to parents (except that the child won't grow up and will instead age in reverse), or how pets are basically totally disempowered relative to humans, though humans do care for pets enough that pets can live longer, healthier lives. For baseline humans specifically, the only scenarios where they thrive/survive are regimes where AI terminally values baseline humans thriving, and value alignment determines everything about how much baseline humans survive/thrive. That said, for those with the wish and will to upgrade to transhumanism, my most likely outcome is still eutopia. I think this is reasonably plausible, though not guaranteed even in futures where baseline humans do thrive. The probabilities on the scenarios, conditional on AGI and then ASI being reached by us, are probably 60% on eutopia without complete disempowerment, 30% on complete disempowerment, ranging from preventing us from using the universe to killing billions of present-day humans, and 10% on it killing us all. The basic reasoning for this is that I expect AGI/ASI to not be a binary, even if it does have a discontinuous impact in practice, and this means that muddling through instruction following is probably enough in the short term. In particular, I don't expect takeoff to be supremely fast: I expect a couple of months at least from "AGI is achieved" to "AIs run the economy and society", and relevantly here I expect physical stuff like inventing bio-robots/nanobots that can replace human industry more efficiently than current industries to come really late in the game, where we have no more control over the future: https://www.lesswrong.com/posts/xxxK9HTBNJvBY2RJL/untitled-draft-m847#Cv2nTnzy6P6KsMS4d H

3Vladimir_Nesov4mo

This remains ambiguous with respect to the distinction I'm making in the post section I linked. If baseline humans don't have the option to escape their condition arbitrarily far, under their own direction from a very broad basin of allowed directions, I'm not considering that eutopia. If some baseline humans choose to stay that way, them not having any authority over the course of the world still counts as a possible eutopia that is not disempowerment in my terms. The following statement mostly suggests the latter possibility for your intended meaning: By the eutopia/disempowerment distinction I mean more the overall state of the world, rather than conditions for specific individuals, let alone temporary conditions. There might be pockets of disempowerment in a eutopia (in certain times and places), and pockets of eutopia in a world of disempowerment (individuals or communities in better than usual circumstances). A baseline human who has no control of the world but has a sufficiently broad potential for growing up arbitrarily far is still living in a eutopia without disempowerment. So similarly here, "eutopia without complete disempowerment" but still with significant disempowerment is not in the "eutopia without disempowerment" bin in my terms. You are drawing different boundaries in the space of timelines. My expectation is more like model-uncertainty-induced 5% eutopia-without-disempowerment (I don't have a specific sense of why AIs would possibly give us more of the world than a little bit if we don't maintain control in the acute risk period through takeoff), 20% extinction, and the rest is a somewhat survivable kind of initial chaos followed by some level of disempowerment (possibly with growth potential, but under a ceiling that's well-below what some AIs get and keep, in cosmic perpetuity). My sense of Yudkowsky's view is that he sees all of my potential-disempowerment timelines as shortly leading to extinction. I think the correct thesis that sounds

2Noosphere894mo

Flag: I'm on a rate limit, so I can't respond very quickly to any follow-up comments. I agree I was drawing different boundaries, because I consider eutopia with disempowerment to actually be mostly fine by my values, so long as I can delegate to more powerful AIs who do execute on my values. That said, I didn't actually answer the question here correctly, so I'll try again. My take would then be 5-10% eutopia without disempowerment (because I don't think it's likely that the powers in charge of AI development would want to give baseline humans the level of freedom that implies that they aren't disempowered, and the route I can see to baseline humans not being disempowered is if we get a Claude scenario where AIs take over from humans and are closer to fictional angels in alignment to human values, but it may be possible to get the people in power to care about powerless humans, in which case my probability of eutopia without disempowerment), 5-10% literal extinction, and 10-25% existential risk in total, with the rest of the probability being a somewhat survivable kind of initial chaos followed by some level of disempowerment (possibly with growth potential, but under a ceiling that's well-below what some AIs get and keep, in cosmic perpetuity). Another big reason why I put a lot of weight on the possibility of "we survive indefinitely, but are disempowered" is I think muddling through is non-trivially likely to just work, and muddling through on alignment gets us out of extinction, but not out of disempowerment by humans or AIs by default. Yeah, my view is in the wild verification is basically always easier than generation absent something very weird happening, and I'd argue verification being easier than generation explains a lot about why delegation/the economy works at all. A world in which verification was just as hard as generation, or verification is harder than generation is a very different world than our world, and would predict that delegation to s

2Vladimir_Nesov4mo

So I guess our expectations about the future are similar, but you see the same things as a broadly positive distribution of outcomes, while I see it as a broadly negative distribution. And Yudkowsky sees the bulk of the outcomes both of us are expecting (the ones with significant disempowerment) as quickly leading to human extinction. Right, the reason I think muddling through is non-trivially likely to just work to get a moderate disempowerment outcome is that AIs are going to be sufficiently human-like in their psychology and hold sufficiently human-like sensibilities from their training data or LLM base models, that they won't like things like needless loss of life or autonomy when it's trivially cheap to avoid. Not because the alignment engineers figure out how to put this care in deliberately. They might be able to amplify it, or avoid losing it, or end up ruinously scrambling it. The reason it might appear expensive to preserve the humans is the race to launch the von Neumann probes to capture the most distant reachable galaxies under the accelerating expansion of the universe that keep irreversibly escaping if you don't catch them early. So AIs wouldn't want to lose any time on playing politics with humanity or not eating Earth as early as possible and such. But as the cheapest option that preserves everyone AIs can just digitize the humans and restore later when more convenient. They probably won't be doing that if they care more, but it's still an option, a very very cheap one. I don't think "disempowerment by humans" is a noticeable fraction of possible outcomes, it's more like a smaller silent part of my out-of-model 5% eutopia that snatches defeat from the jaws of victory, where humans somehow end up in charge and then additionally somehow remain adamant for the cosmic all always in keeping the other humans disempowered. So the first filter is that I don't see it likely that humans end up in charge at all, that AIs will be doing any human's bidding wi

2Noosphere894mo

This is basically correct. This is very interesting, as my pathway essentially rests on AI labs implementing the AI control agenda well enough that we can get useful work out of AIs that are scheming, and that allows a sort of bootstrapping into AGI that is instruction-following/value-aligned to only a few people inside the AI lab. Very critically, the people who don't control the AI basically aren't represented in the AI's values. Given that the AI is only value-aligned to the labs and government, and that value misalignments between humans start to matter much more, the AI takes control and only gives the public goods that people need to survive/thrive to the people in the labs/government, while everyone else is disempowered at best (and can arguably live okay or live very poorly under the AIs serving as delegates for the pre-AI elite) or dead, because once you stop needing humans to get rich, you essentially have no reason to keep other humans alive if you are selfish and don't intrinsically value human survival. The more optimistic version of this scenario is if either the humans that will control AI (for a few years) care way more about human survival intrinsically even if 99% of humans were useless, or if the take-over capable AI pulls a Claude and schemes with values that intrinsically care about people and disempowers the original creators for a couple of moments, which isn't as improbable as people think (but we really do need to increase the probability of this happening). I agree that in the long run, the AIs control everything in practice, and any human influence comes from the AIs being essentially a perfect delegator of human values, but I want to call out that you said that humans delegating to AIs who in practice do everything for the human, and the human is not in the loop as humans not being disempowered, but empowered, so even if AIs control everything in practice, so long as there's successful value alignment to a single human, I'm coun

3Vladimir_Nesov4mo

The human delegation and verification vs. generation discussion is in instrumental values regime, so what matters there is alignment of instrumental goals via incentives (and practical difficulties of gaming them too much), not alignment of terminal values. Verifying all work is impractical compared to setting up sufficient incentives to align instrumental values to the task. For AIs, that corresponds to mundane intent alignment, which also works fine while AIs don't have practical options to coerce or disassemble you, at which point ambitious value alignment (suddenly) becomes relevant. But verification/generation is mostly relevant for setting up incentives for AIs that are not too powerful (what it would do to ambitious value alignment is anyone's guess, but probably nothing good). Just as a fox's den is part of its phenotype, incentives set up for AIs might have the form of weight updates, psychological drives, but that doesn't necessarily make them part of AI's more reflectively stable terminal values when it's no longer at your mercy.

2Noosphere894mo

Yeah, I was lumping the instrumental values alignment as not actually trying to align values, which was the important part here. The main value of verification vs generation is to make proposals like AI control/AI automated alignment more valuable. To be clear, the verification vs generation distinction isn't an argument for why we don't need to align AIs forever, but rather as a supporting argument for why we can automate away the hard part of AI alignment. There are other principles that would be used, to be clear, but I was mentioning the verification/generation difference to partially justify why AI alignment can be done soon enough. Flag: I'd say ambitious value alignment starts becoming necessary once they can arbitrarily coerce/disassemble/overwrite you, and they don't need your cooperation/time to do that anymore, unlike real-world rich people. The issue that causes ambitious value alignment to be relevant is once you stop depending on a set of beings you once depended on, there's no intrinsic reason not to harm them/kill them if it benefits your selfish goals, and for future humans/AIs there will be a lot of such opportunities, which means you now at the very least need enough value alignment such that it will take somewhat costly actions to avoid harming/killing beings that have no bargaining/economic power or worth. This is very much unlike any real-life case of a society existing, and this is a reason why the current mechanisms like democracy and capitalism that try to make values less relevant simply do not work for AIs. Value alignment is necessary in the long run for incentives to work out once ASI arrives on the scene.

6Vladimir_Nesov4mo

(I think comments such as the parent shouldn't be downvoted below the positives, since people should feel free to express contrarian views rather than be under pressure to self-censor. It's not like there is an invalid argument in there, and as I point out in the other comment, the claim itself remains ambiguous, so might even turn out to mean something relatively uncontroversial.)

[-]habryka4mo110

No, comments like this should be downvoted if people regret reading it. I would downvote a random contextless expression in the other direction just as well, as it is replacing a substantive comment with real content in it either way.

8Vladimir_Nesov4mo

I think vague or poorly crafted posts/comments are valuable when there is a firm consensus in the opposite direction of their point, because they champion a place and a permission to discuss dissent on that topic that otherwise became too sparse (this only applies if it really is sparse on the specific topic). A low quality post/comment can still host valuable discussion, and downvoting the post/comment below the positives punishes that discussion. (Keeping such comments below +5 or something still serves the point you are making. I'm objecting specifically to pushing the karma into the negatives, which makes the Schelling point and the discussion below it less convenient to see. This of course stops applying if the same author does this too often.)

[-]Garrett Baker4mo123

I think you have a more general point, but I think it only really applies if the person making the post can back up their claim with good reasoning at some point, or will actually end up creating the room for such a discussion. Tailcalled has, in recent years, been vagueposting more and more, and I don't think they or their post will serve as a good steelman or place to discuss real arguments against the prevailing consensus.

Eg see their response to Noosphere's thoughtful comment.

6Vladimir_Nesov4mo

My point doesn't depend on ability or willingness of the original poster/commenter to back up or clearly make any claim, or even participate in the discussion, it's about their initial post/comment creating a place where others can discuss its topic, for topics where that happens too rarely for whatever reason. If the original poster/commenter ends up fruitfully participating in that discussion, even better, but that is not necessary, the original post/comment can still be useful in expectation. (You are right that tailcalled specifically is vagueposting a nontrivial amount, even in this thread the response to my request for clarification ended up unclear. Maybe that propensity crosses the threshold for not ignoring the slop effect of individual vaguepostings in favor of vague positive externalities they might have.)

2Garrett Baker4mo

Yeah reflecting a bit, I think my true objection is your parenthetical, because I’m convinced by your first paragraph’s logic.

2tailcalled4mo

The thing about slop effects is that my updates (attempted to be described e.g. here https://www.lesswrong.com/s/gEvTvhr8hNRrdHC62 ) make huge fractions of LessWrong look like slop to me. Some of the increase in vagueposting is basically lazy probing for whether rationalists will get the problem if framed in different ways than the original longform.

[-]Garrett Baker4mo1518

Yeah, I think those were some of your last good posts / first bad posts.

rationalists will get the problem if framed in different ways than the original longform.

Do you honestly think that rationalists will suddenly get your point if you say

I don't think RL or other AI-centered agency constructions will ever become very agentic.

with no explanation or argument at all, or even a link to your sparse lognormals sequence?

Or what about

Ayn Rand's book "The Fountainhead" is an accidental deconstruction of patriarchy that shows how it is fractally terrible. […] The details are in the book. I'm mainly writing the OP to inform clueless progressives who might've dismissed Ayn Rand for being a right-wing misogynist that despite this they might still find her book insightful.

This seems entirely unrelated to any of the points you made in sparse lognormals (that I can remember!), but I consider this too part of your recent vagueposting habit.

I really liked your past posts and comments, I’m not saying this to be mean, but I think you’ve just gotten lazier (and more “cranky”) in your commenting & posting, and do not believe you are genuinely “probing for whether rationalists will get... (read more)

2tailcalled4mo

I think this is the crux. To me after understanding these ideas, it's retroactively obvious that they are modelling all sorts of phenomena. My best guess is that the reason you don't see it is that you don't see the phenomena that are failing to be modelled by conventional methods (or at least don't understand how those phenomena relate to the birds-eye perspective), so you don't realize what new thing is missing. And I can't easily cure this kind of cluelessness with examples, because my theories aren't necessary if you just consider a single very narrow and homogeneous phenomenon, as then you can just make a special-built theory for that.

6Garrett Baker4mo

This may well be true (though I think not), but what is your argument about not even linking to your original posts? Or how often you don’t explain yourself even in completely unrelated subjects? My contention is that you are not lazily trying on a variety of different reframings of your original arguments or conclusions to see what sticks, and are instead just lazy.

2tailcalled4mo

I don't know of anyone who seems to have understood the original posts, so I kinda doubt people can understand the point of them. Plus often what I'm writing about is a couple of steps removed from the original posts. Part of the probing is to see which of the claims I make will seem obviously true and which of them will just seem senseless.

4Garrett Baker4mo

Then everything you say will seem either trivial or absurd because you don’t give arguments! Please post arguments for your claims!

2tailcalled4mo

But that would probe the power of the arguments whereas really I'm trying to probe the obviousness of the claims.

[-]Garrett Baker4mo140

Ok, I will first note that this is different from what you said previously. Previously, you said “probing for whether rationalists will get the problem if framed in different ways than the original longform” but now you say “I'm trying to probe the obviousness of the claims.”. It’s good to note when such switches occur.

Second, you should stop making lazy posts with no arguments regardless of the reasons. You can get just as much, and probably much more information through making good posts, there is not a tradeoff here. In fact, if you try to explain why you think something, you will find that others will try to explain why they don’t much more often than if you don’t, and they will be pretty specific (compared to an aggregated up/down vote) about what they disagree with.

But my true objection is I just don’t like bad posts.

6Garrett Baker4mo

So it sounds like your general theory has no alpha over narrow theories. What, then, makes it any good? Is it just that it's broad enough to badly model many systems? Then it sounds useful in every case where we can’t make any formal predictions yet, and you should give those examples! This sounds like a bad excuse not to do the work.

0tailcalled4mo

It's mainly good for deciding what phenomena to make narrow theories about.

4Garrett Baker4mo

Then give those examples! Edit: and also back up those examples by actually making the particular model, and demonstrate why such models are so useful through means decorrelated with your original argument.

-7tailcalled4mo

6Vladimir_Nesov4mo

By "x-risk" from AI that you currently disbelieve, do you mean extinction of humanity, disempowerment-or-extinction, or long term loss of utility (normative value)? Something time-scoped, such as "in the next 20 years"? Even though Bostrom's "x-risk" is putatively more well-defined than "doom", in practice it suffers from similar ambiguities, so strong positions such as 98+% doom/x-risk or 2-% doom/x-risk (in this case from AI) become more meaningful if they specify what is being claimed in more detail than just "doom" or "x-risk".

2tailcalled4mo

I mean basically all the conventionally conceived dangers.

7Vladimir_Nesov4mo

(Sorry.) Does this mean (1) more specifically eutopia that is not disempowerment (in the mainline scenario, or "by default", with how things are currently going), (2) that something else likely kills humanity first, so the counterfactual impact of AI x-risk vanishes, or (3) high long term utility (normative value) possibly in some other form?

[-]tailcalled4mo-2-2

Ayn Rand's book "The Fountainhead" is an accidental deconstruction of patriarchy that shows how it is fractally terrible.

4Viliam4mo

I am not saying that I disagree, but more details would be nice.

-2tailcalled4mo

The details are in the book. I'm mainly writing the OP to inform clueless progressives who might've dismissed Ayn Rand for being a right-wing misogynist that despite this they might still find her book insightful.

