# All of rime's Comments + Replies

but I am pretty sure that there is a program that you can write down that has the same structural property of being interpretable in this way, where the algorithm also happens to define an AGI.

Interesting. I have semi-strong intuitions in the other direction. These intuitions are mainly from thinking about what I call the Q-gap, inspired by Q Home's post and this quote:

…for simple mechanisms, it is often easier to describe how they work than what they do, while for more complicated mechanisms, it is usually the other way around.

Intelligent processes are an...

1Johannes C. Mayer2mo
That is an interesting analogy. So if I have a simple AGI algorithm, then if I can predict where it will move to, and understand the final state it will move to, I am probably good, as long as I can be sure of some high-level properties of the plan. I.e. the plan should not take over the world, let's say. That seems to be a property you might be able to predict of a plan, because taking over the world would make the plan so much longer than just doing the obvious thing. This isn't easy of course, but I don't think having a system that is more complex would help with this. Having a system that is simple makes it simpler to analyze in all regards, all else equal (assuming you don't make it short by writing a code-golf program; you still want to follow good design practices and lay out the program in the most obvious, understandable way).

As a short sidenote before I get into why I think the Q-gap is probably wrong: that I can't predict whether it will rain tomorrow even with a perfect model of the low-level dynamics of the universe has more to do with how much compute I have available. I might be able to predict whether it will rain tomorrow if I knew the initial conditions of the universe and had some very large but finite amount of compute, assuming the universe is not infinite.

I am not sure the Q-gap makes sense. I can have a 2D double pendulum. This is very easy to describe and hard to predict. I can make a chaotic system more complex, and then it becomes a bit harder to predict, but not really by much. It's already not analytically solvable for 2 joints (according to Google).

That describing the functioning of complex mechanisms seems harder than saying what they do might be an illusion. We as humans have a lot of abstractions in our heads for thinking about the real world. A lot of the things that we build mechanisms to do are expressible in these concepts, so they seem simple to us. This is true for most mechanisms we build that produce some observable output. If we ask "What does thi
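The comment's "easy to describe, hard to predict" point can be illustrated numerically. A double pendulum needs an ODE integrator, but the same property shows up in an even simpler chaotic system, the logistic map, so here is a minimal sketch using that instead (my substitution, not the comment author's example):

```python
# Sketch: "easy to describe, hard to predict". The double pendulum needs
# an ODE solver; the logistic map makes the same point with a one-line
# update rule. At r = 4 the map is chaotic: two trajectories starting
# 1e-10 apart become macroscopically different within ~60 steps.

def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate x -> r*x*(1-x), a textbook chaotic map for r = 4."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)
print(abs(a - b))  # macroscopic gap despite a 1e-10 perturbation
```

The rule itself is one line of arithmetic, yet the perturbation roughly doubles each step, so long-horizon prediction is compute-limited in exactly the sense the comment describes.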

There's a funny self-contradiction here.[1]

If you learn from this essay, you will then also see how silly it was that it had to be explained to you in this manner. The essay is littered with appeals to historical anecdote, and invites you to defer to the way they went about it because it's evident they had some success.

Bergman, Grothendieck, and Pascal all do this.

If the method itself doesn't make sense to you by the light of your own reasoning, it's not something you should be interested in taking seriously. And if the method makes sense to you on its own, ...

I'm curious to know what people are downvoting.

My uncharitable guess? People are doing negative selection over posts, instead of "ruling posts in, not out". Posts like this one that go into a lot of specific details present voters with many more opportunities to disagree with something. So when readers downvote based on the first objectionable thing they find, writers are disincentivised from going into detail.

Plus, the author uses a lot of jargon and makes up new words, which somehow associates with epistemic inhumility for some people. Whereas I think w...

3Mazianni4mo
You make some good points. For instance, I did not associate "model collapse" with artificial training data, largely because of my scope of thinking about what 'well crafted training data' must look like (in order to qualify for the description 'well crafted'). Yet some might recognize the problem of model collapse and the relationship between artificial training data and my speculation, and express a negative selection bias, ruling out my speculation as infeasible due to complexity and scalability concerns. (And they might be correct. Certainly the scope of what I was talking about is impractical, at a minimum, and very expensive, at a maximum.)

And if someone does not engage with the premise of my comment, but instead simply downvotes and moves on... there does appear to be reasonable cause to apply an epithet of 'epistemic inhumility'. (Or would that be better as 'epistemic arrogance'?)

I do note that instead of a few votes and a substantially negative karma score, we now have a modest increase in votes and a net positive score. This could be explained either by some downvotes being retracted or by several high positive karma votes being added that more than offset the total karma of the article. (Given the way the karma system works, it seems unlikely that we can deduce the exact conditions due to partial observability.) I would certainly like to believe that if epistemic arrogance played a part in the initial downvotes, such people would retract those downvotes, even if they don't also accompany the votes with specific comments to help people improve themselves.
2MiguelDev4mo
About the use of jargon: it is unavoidable in my case, or I believe for anyone trying to do alignment research in this regard. For example, in this new post I made I use "high corrigibility" as a term, yet no one has established a baseline for how to measure corrigibility. But for my project to move, I am willing to break some conventional norms, especially as I am zooming in on factors that most of the best and most respected people here haven't touched yet. I'm willing to absorb all the damage that comes out of this process of using novel terms. Besides, I think the theoretical framework for alignment that we are looking for will most likely be of a similar nature: defined by its own terms, and most likely not yet conceptualized in any forum. I estimate a 90% to 95% probability of this being true.

Interesting! I came to it from googling about definitions of CLT in terms of convolutions. But I have one gripe:

does that mean the form of my uncertainty about things approaches Gaussian as I learn more?

I think a counterexample would be your uncertainty over the number of book sales for your next book. There are recursive network effects such that more book sales cause more book sales. The more books you (first-order) expect to sell, the more books you ought to (second-order) expect to sell. In other words, your expectation over X indirectly depends on yo...

1Maxwell Peterson5mo
Yes, agree - I've looked into non-identical distributions in previous posts, and found that identicality isn't important, but I haven't looked at non-independence at all. I agree dependent chains, like the books example, are an open question!
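The "sales beget sales" dependence can be sketched with a Pólya urn, a standard self-reinforcing process (my illustration, not from either comment). Unlike an i.i.d. sum, whose average concentrates CLT-style, the urn's long-run success fraction converges to a Uniform(0,1) limit, so the uncertainty never narrows toward a thin Gaussian:

```python
# Sketch: a Polya urn as a toy model of "more book sales cause more
# book sales". Each new sale occurs with probability equal to the
# current sales fraction, so past success reinforces itself. The
# limiting fraction of a Polya urn is Uniform(0,1): across many runs
# the spread stays wide instead of shrinking like an i.i.d. average.

import random
import statistics

def polya_fraction(steps=500, rng=random):
    sales, trials = 1, 2  # start: one "sale" out of two chances
    for _ in range(steps):
        if rng.random() < sales / trials:
            sales += 1
        trials += 1
    return sales / trials

random.seed(0)  # deterministic illustration
fractions = [polya_fraction() for _ in range(2000)]
spread = statistics.pstdev(fractions)
print(round(statistics.mean(fractions), 3), round(spread, 3))
```

A uniform limit has standard deviation near 0.29, and the simulation stays close to that no matter how many steps you add, which is one concrete way a dependent chain escapes the CLT's pull toward Gaussian uncertainty.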

I've taken to calling it the 'Q-gap' in my notes now. ^^'

You can understand AlphaZero's fundamental structure so well that you're able to build it, yet be unable to predict what it can do. Conversely, you can have a statistical model of its consequences that lets you predict what it will do better than any of its engineers, yet know nothing about its fundamental structure. There's a computational gap between the system's fundamental parts and its consequences.

The Q-gap refers to the distance between these two explanatory levels.

...for simple mechanis

...

Yeah, a lot of "second-best theories" are due to smallmindedness xor realistic expectations about what you can and cannot change. And a lot of inadequate equilibria are stuck in equilibrium due to the repressive effect the Overton window has on people's ability to imagine.

## Second-best theories & Nash equilibria

A general frame I often find comes in handy while analysing systems is to look for equilibria, figure out the key variables sustaining them (e.g., strategic complements, balancing selection, latency or asymmetrical information in commons-tragedies), and, well, that's it. Those are the leverage points of the system. If you understand them, you're in a much better position to evaluate whether a suggested change might work, is guaranteed to fail, or suffers from a lack of imagination.

Suggestions that fail to...

2SirTruffleberry5mo
I think this is closely related to the more colloquial concept of "necessary evils". I always felt the term was a bit of a misnomer--we feel they are evils, I suspect, because their necessity is questionable. Actually necessary things aren't assigned moral value, because that would be pointless. You can't prescribe behavior that is impossible (to paraphrase Kant). As a recent example, someone argued that school bullying is a necessary evil because bullying in the adult world is inevitable and the schoolyard version is preparation. In that case it seems there was a sort of "all-or-nothing" fallacy, i.e., if we can't eliminate it, we might as well not even mitigate it.

I dislike the frame of "charity" & "steelmanning". It's not usefwl for me because it assumes I would feel negatively about seeing some patterns in the first place, and that I need to correct for this by overriding my habitual soldier-like attitudes. But the value of "interpreting patterns usefwly" is extremely general, so it's a distraction to talk as if it's exclusive to the social realm.

Anyway, this reminded me of what I call "analytic" and "synthetic" thinking. They're both thinking-modes, but they emphasise different things.

• When I process a pattern
...

Sometimes they're the same thing. But sometimes you have:

• An unpredictable process with predictable final outcomes. E.g. when you play chess against a computer: you don't know what the computer will do to you, but you know that you will lose.
• (gap) A predictable process with unpredictable final outcomes. E.g. if you don't have enough memory to remember all past actions of the predictable process. But the final outcome is created by those past actions.

Quoting E.W. Dijkstra quoting von Neumann:

...for simple mechanisms, it is often easier to describe how

...
1Q Home5mo
... Can you expand on this thought ("something can give less specific predictions, but be more general"), or reference famous/professional people discussing it? This thought can be very trivial, but it can also be very controversial. Right now I'm writing a post about "informal simplicity", "conceptual simplicity". It discusses the simplicity of informal concepts (concepts not giving specific predictions). I make an argument that "informal simplicity" should be very important a priori. But I don't know if "informal simplicity" was used (at least implicitly) by professional and famous people. Here's as much as I know: (warning, controversial and potentially inaccurate takes!)

* Zeno of Elea made arguments basically equivalent to "calculus should exist" and "theory of computation should exist" ("supertasks are a thing") using only basic math.
* The success of neural networks is a success of one of the simplest mechanisms: backpropagation and attention. (Even though they can be heavy on math too.) We observed a complicated phenomenon (real neurons), we simplified it... and BOOM!
* Arguably, many breakthroughs in early and late science were sealed behind simple considerations (e.g. the equivalence principle), not deduced from formal reasoning. Feynman diagrams weren't deduced from some specific math; they came from the desire to simplify.
* Some fields "simplify each other" in some way. Physics "simplifies" math (via physical intuitions). Computability theory "simplifies" math (by limiting it to things which can be done by series of steps). Rationality "simplifies" philosophy (by connecting it to practical concerns) and science.
* To learn to fly, the Wright brothers had to analyze "simple" considerations.
* Eliezer Yudkowsky influenced many people with very "simple" arguments. The rationalist community as a whole is a "simplified" approach to philosophy and science (to a degree).
* The possibility of a logical decision theory can be deduced from simple informal cons
1rime5mo
I've taken to calling it the 'Q-gap' in my notes now. ^^'

You can understand AlphaZero's fundamental structure so well that you're able to build it, yet be unable to predict what it can do. Conversely, you can have a statistical model of its consequences that lets you predict what it will do better than any of its engineers, yet know nothing about its fundamental structure. There's a computational gap between the system's fundamental parts and its consequences. The Q-gap refers to the distance between these two explanatory levels.

Let's say you've measured the surface tension of water to be 73 mN/m at room temperature. This gives you an amazing ability to predict which objects will float on top of it, which will be very usefwl for e.g. building boats.

As an alternative approach, imagine zooming in on the water while an object floats on top of it. Why doesn't it sink? It kinda looks like the tiny waterdrops are trying to hold each others' hands like a crowd of people (h/t Feynman). And if you use this metaphor to imagine what's going to happen to a tiny drop of water on a plastic table, you could predict that it will form a ball and refuse to spread out.

While the metaphor may only be able to generate very uncertain & imprecise predictions, it's also more general. By trying to find metaphors that capture aspects of the fundamental structure, you're going to find questions you wouldn't have thought to ask if all you had were empirical measurements. What happens if you have a vertical tube with walls that hold hands with the water more strongly than water holds hands with itself?[1]

Beliefs should pay rent, but if anticipated experiences are the only currency you're willing to accept, you'll lose out on generalisability.

1. ^ capillary motion
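The footnote's capillary question has a standard quantitative answer, Jurin's law: the rise height is h = 2γ·cos(θ)/(ρ·g·r). Here is a small sketch plugging in the 73 mN/m figure from the comment; the 0.5 mm tube radius and θ = 0 (perfect wetting, i.e. walls "holding hands" with water more strongly than water does with itself) are my illustrative assumptions:

```python
# Sketch of the footnote's capillary question via Jurin's law,
# h = 2*gamma*cos(theta) / (rho * g * r). A small contact angle theta
# means the wall attracts water more strongly than water attracts
# itself, so the column is pulled upward.
# gamma = 73 mN/m is the comment's figure; r = 0.5 mm and theta = 0
# are illustrative assumptions.

import math

def jurin_height(gamma=0.073, theta_deg=0.0, rho=1000.0, g=9.81, r=5e-4):
    """Equilibrium capillary rise height in metres."""
    return 2 * gamma * math.cos(math.radians(theta_deg)) / (rho * g * r)

h = jurin_height()
print(f"rise in a 0.5 mm tube: {h * 100:.1f} cm")  # about 3 cm
```

This is the Q-gap point in miniature: the empirical surface-tension number predicts floating, but the hand-holding metaphor is what suggests asking about tubes at all, and the two meet in this formula.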

Strong agree. I don't personally use (much) math when I reason about moral philosophy, so I'm pessimistic about being able to somehow teach an AI to use math in order to figure out how to be good.

If I can reduce my own morality into a formula and feel confident that I personally will remain good if I blindly obey that formula, then sure, that seems like a thing to teach the AI. However, I know my morality relies on fuzzy feature-recognition encoded in population vectors which cannot efficiently be compressed into simple math. Thus, if the formula doesn't even work for my own decisions, I don't expect it to work for the AI.

I can empathise with the feeling, but I think it stems from the notion that I (used to) find challenges that I set for myself "artificial" in some way, so I can't be happy unless something or somebody else creates it for me. I don't like this attitude, as it seems like my brain is infantilising me. I don't want to depend on irreducible ignorance to be satisfied. I like being responsible for myself. I'm trying to capture something vague by using vague words, so there are likely many ways to misunderstand me here.

Another point is just that our brains fundame...

what i am pretty confident about, is that whatever the situation, somehow, they are okay.

This hit me. Had to read it thrice to parse it. "Is that sentence even finished?"

I've done a lot of endgame speculation, but I've never been close to imagining what it looks like for everyone to be okay. I can imagine, however, what it looks like internally for me to be confident everyone is ok. The same way I can imagine Magnus Carlsen winning a chess game even if the board is a mystery to me.

It's a destabilising feeling, but seems usefwl to backchain from.

I think a core part of this is understanding that there are trade-offs between "sensitivity" and "specificity", and different search spaces vary greatly in what trade-off is appropriate for them.

I distinguish two different reading modes: sometimes I read to judge whether it's safe to defer to the author about stuff I can't verify, other times I'm just fishing for patterns that are useful to my work.

The former mode is necessary when I read about medicine. I can't tell the difference between a brilliant insight and a lethal mistake, so it really matters to me ...

I.

Why do you believe that "global utility maximization is something that an ASI might independently discover as a worthwhile goal"? (I assume by "utility" you mean something like happiness.)

II.

I'm not sure most people aren't sadists. Humans have wildly inconsistent personalities in different situations.[1] Few people have even noticed their own inconsistencies, and fewer still have gone through the process of extracting a coherent set of values from the soup and gradually generalising that set to every context they can think of...

So I wouldn't b...

Did this come as a surprise to you, and if so I'm curious why?

It came as a surprise because I hadn't thought about it in detail. If I had asked myself the question head-on, surrounding beliefs would have propagated and filled the gap. It does seem obvious in foresight as well as hindsight, if you just focus on the question.

In my defense, I'm not in the business of making predictions, primarily. I build things. And for building, it's important to ask "ok, how can I make sure the thing that's being built doesn't kill us?" and less important to ask "how are o...

I feel like something tangible is shifting beneath my feet when I read this. I'm not sure anything will be the same ever again.

3mwatkins7mo
I know the feeling. It's interesting to observe the sharp division between this kind of reaction and that of people who seem keen to immediately state "There's no big mystery here, it's just [insert badly informed or reasoned 'explanation']".

Strong upvote, but I disagree on something important. There's an underlying generator that chooses which simulacra to take a weighted average over in its response. The notion that you can "speak" to that generator is a type error, perhaps akin to thinking that you can speak to the country 'France' by calling its elected president.

My current model says that the human brain also works by taking the weighted (and normalised!) average (the linear combination) over several population vectors (modules) and using the resultant vector to stream a response. There are ...

3[comment deleted]7mo