This is a special post for quick takes by Dalcy. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Dalcy's Shortform


Thoughtdump on why I'm interested in computational mechanics:

- one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
- ... but i was initially interested in reading compmech stuff not with a particular alignment relevant thread in mind but rather because it seemed broadly similar in directions to natural abstractions.

- re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction in real world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did compmech stuff on real-world data, don't know how good, but effort-wise far too little invested compared to theory work
- would be interested in these reconstruction algorithms, eg what are the b

6

I agree with you.
Epsilon machine (and MSP) construction is most likely computationally intractable [I don't know an exact statement of such a result in the literature but I suspect it is true] for realistic scenarios.
Scaling an approximate version of epsilon reconstruction therefore seems of prime importance. Real world architectures and data have highly specific structure & symmetry that make them different from completely generic HMMs. This must most likely be exploited.
The calculi of emergence paper has inspired many people but has not been developed much. Many of the details are somewhat obscure and vague. I also believe that most likely completely different methods are needed to push the program further. Computational mechanics is primarily a theory of hidden Markov models - it doesn't have the tools to easily describe behaviour higher up the Chomsky hierarchy. I suspect more powerful and sophisticated algebraic, logical and categorical thinking will be needed here. I caveat this by saying that Paul Riechers has pointed out that actually one can understand all these gadgets up the Chomsky hierarchy as infinite HMMs which may be analyzed usefully just as finite HMMs.
The still-underdeveloped theory of epsilon transducers I regard as the most promising lens on agent foundations. This is uncharted territory; I suspect the largest impact of computational mechanics will come from this direction.
Your point on True Names is well-taken. More basic examples than gauge information and synchronization order are the triple of quantities entropy rate h, excess entropy E and Crutchfield's statistical/forecasting complexity C. These are the most important quantities to understand for any stochastic process (such as the structure of language and LLMs!)
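To make the triple (h, E, C) concrete, here is a minimal sketch that computes all three for a toy process. The Golden Mean process (binary strings with no two 0s in a row) and every name in the code are illustrative assumptions, not something from the thread. The shortcut used for E is only valid because this particular process is order-1 Markov in its symbols; in general E needs block entropies or the mixed-state presentation.

```python
import math

# Hypothetical toy process: the Golden Mean process (never two 0s in a row),
# written as a unifilar HMM. TRANSITIONS[state] = [(prob, symbol, next_state), ...]
TRANSITIONS = {
    "A": [(0.5, "1", "A"), (0.5, "0", "B")],
    "B": [(1.0, "1", "A")],
}

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def stationary(transitions, iters=500):
    """Stationary distribution over states, by power iteration."""
    pi = {s: 1.0 / len(transitions) for s in transitions}
    for _ in range(iters):
        nxt = {s: 0.0 for s in transitions}
        for s, outs in transitions.items():
            for p, _sym, t in outs:
                nxt[t] += pi[s] * p
        pi = nxt
    return pi

def symbol_distribution(transitions, pi):
    """Marginal distribution of a single emitted symbol."""
    dist = {}
    for s, outs in transitions.items():
        for p, sym, _t in outs:
            dist[sym] = dist.get(sym, 0.0) + pi[s] * p
    return dist

pi = stationary(TRANSITIONS)
# Entropy rate: average uncertainty of the next symbol given the causal state.
h = sum(pi[s] * entropy([p for p, _, _ in outs]) for s, outs in TRANSITIONS.items())
# Statistical complexity: entropy of the causal-state distribution.
C = entropy(pi.values())
# Excess entropy, using E = H(X_0) - h (assumption: order-1 Markov process).
E = entropy(symbol_distribution(TRANSITIONS, pi).values()) - h

print(f"h = {h:.4f} bits/symbol, C = {C:.4f} bits, E = {E:.4f} bits")
```

For this process E comes out smaller than C, which is the usual picture: the machine has to store more state than the amount of information the past actually shares with the future.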

Just read through Robust agents learn causal world models and man it is really cool! It proves a couple of bona fide selection theorems, talking about the internal structure of agents selected against a certain criterion.

- Tl;dr, agents selected to perform robustly in various local interventional distributions must internally represent something isomorphic to a causal model of the variables upstream of utility, for it is capable of answering all causal queries for those variables.
- *Thm 1*: agents achieving optimal policy (util max) across various local interventions must be able to answer causal queries for all variables upstream of the utility node
- *Thm 2*: relaxation of above to nonoptimal policies, relating regret bounds to the accuracy of the reconstructed causal model
- the proof is constructive - an algorithm that, when given access to regret-bounded-policy-oracle wrt an environment with some local intervention, queries them appropriately to construct a causal model
- one implication is an algorithm for causal inference that converts black box agents to explicit causal models (because, y’know, agents like you and i are literally that aforementioned ‘regret-bounded-policy-oracle‘)

- These selec

3

yes!! discovered this last week - seems very important. the quantitative regret bounds for approximations are especially promising

2

I think you can drop this premise and modify the conclusion to “you can find a causal model for all variables upstream of the utility and not downstream of the decision.”

Quick paper review of Measuring Goal-Directedness from the causal incentives group.

tl;dr, goal directedness of a policy wrt a utility function is measured by its min distance to one of the policies implied by the utility function, as per the intentional stance - that one should model a system as an agent insofar as doing so is useful.

- how is "policies implied by the utility function" operationalized? given a value u, we define a set containing policies of maximum entropy (of the decision variable, given its parents in the causal bayes net) among those policies that attain the utility u.
- then union them over all the achievable values of u to get this "wide set of maxent policies," and define goal directedness of a policy π wrt a utility function U as the maximum (negative) cross entropy between π and an element of the above set. (actually we get the same result if we quantify the min operation over just the set of maxent policies achieving the same utility as π.)
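A rough sketch of the measure as described in the two bullets above, for the simplest possible case: a single decision with no parents. It follows the description in this take, not necessarily the paper's exact definition. It also leans on a remark from a reply further down, that the maxent policies are "(very nearly)" Boltzmann policies, and simply searches a grid of inverse temperatures. The function names, the toy utilities and the grid are all assumptions made for illustration.

```python
import math

def log_boltzmann(utilities, beta):
    """Log-probabilities of the policy pi_beta(a) proportional to exp(beta * U(a))."""
    scores = [beta * u for u in utilities]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def goal_directedness(policy, utilities, betas):
    """Max over the Boltzmann family of the negative cross entropy with `policy`."""
    return max(
        sum(p * lq for p, lq in zip(policy, log_boltzmann(utilities, beta)) if p > 0)
        for beta in betas
    )

# Assumed toy setup: three actions, and a symmetric grid of inverse temperatures
# (negative beta covers the anti-optimal side).
UTILITIES = [0.0, 1.0, 2.0]
BETAS = [b / 10 for b in range(-200, 201)]

uniform = [1 / 3, 1 / 3, 1 / 3]
optimal = [0.0, 0.0, 1.0]
anti_optimal = [1.0, 0.0, 0.0]

for name, pol in [("uniform", uniform), ("optimal", optimal), ("anti-optimal", anti_optimal)]:
    print(name, goal_directedness(pol, UTILITIES, BETAS))
```

In this toy case the uniform policy scores lowest (the best match is beta = 0), while both the deterministic optimal and anti-optimal policies score close to 0, the maximum.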

intuitively, this is measuring: **"how close is my policy π to being 'deterministic,' while 'optimizing U at the competence level** ...

6

Reminds me a little bit of this idea from Vanessa Kosoy.

5

Thanks for the feedback!
Yeah, uniqueness definitely doesn't always hold for the optimal/anti-optimal policy. I think the way MEG works here makes sense: if you're following the unique optimal policy for some utility function, that's a lot of evidence for goal-directedness. If you're following one of many optimal policies, that's a bit less evidence -- there's a greater chance that it's an accident. In the most extreme case (for the constant utility function) every policy is optimal -- and we definitely don't want to ascribe maximum goal-directedness to optimal policies there.
With regard to relaxing smoothly to epsilon-optimal/anti-optimal policies, from memory I think we do have the property that MEG is increasing in the utility of the policy for policies with utility greater than that of the uniform policy, and decreasing for policies with utility less than that of the uniform policy. I think you can prove this via the property that the set of maxent policies is (very nearly) just Boltzmann policies with varying temperature. But I would have to sit down and think about it properly. I should probably add that to the paper if it's the case.
Thanks for this. The proof is indeed nonsense, but I think the proposition is still true. I've corrected it to this.

2

This link doesn't work for me:

1

Thanks, it seems like the link got updated. Fixed!

2

Thanks for writing this up! Having not read the paper, I am wondering if in your opinion there's a potential connection between this type of work and comp mech type of analysis/point of view? Even if it doesn't fit in a concrete way right now, maybe there's room to extend/modify things to combine things in a fruitful way? Any thoughts?

3

Here's my current take, I wrote it as a separate shortform because it got too long. Thanks for prompting me to think about this :)

**EDIT: I no longer think this setup is viable, for reasons that connect to why I think Critch's operationalization is incomplete and why boundaries should ultimately be grounded in Pearlian Causality and interventions. Check update.**

**I believe there's nothing much in the way of actually implementing an approximation of Critch's boundaries^{[1]} using deep learning.**

Recall, Critch's boundaries are:

- Given a world (Markovian stochastic process) , map its values (vector) bijectively using into 'features' that can be split into four vectors each representing a boundary-possessing system's Viscera, Active Boundary, Passive Boundary, and Environment.
- Then, we characterize boundary-ness (i.e. minimal information flow across features unmediated by a boundary) using two mutual information criteria, representing infiltration and exfiltration of information respectively.
- And a policy of the boundary-possessing system (under the 'stance' of viewing the world implied by ) can be viewed as a stochastic map (that has no infiltration/exfiltration by definition) that best approximates the true dynamics.
- The interpretation here (under low exfiltration and infiltrati

9

I don't see much hope in capturing a technical definition that doesn't fall out of some sort of game theory, and even the latter won't directly work for boundaries as representation of respect for autonomy helpful for alignment (as it needs to apply to radically weaker parties).
Boundaries seem more like a landmark feature of human-like preferences that serves as a test case for whether toy models of preference are reasonable. If a moral theory insists on tiling the universe with something, it fails the test. Imperative to merge all agents fails the test unless the agents end up essentially reconstructed. And with computronium, we'd need to look at the shape of things it's computing rather than at the computing substrate.

3

I think it's plausible that the general concept of boundaries can possibly be characterized somewhat independently of preferences, but at the same time have boundary-preservation be a quality that agents mostly satisfy (discussion here. very unsure about this). I see Critch's definition as a first iteration of an operationalization for boundaries in the general, somewhat-preference-independent sense.
But I do agree that ultimately all of this should tie back to game theory. I find Discovering Agents most promising in this regards, though there are still a lot of problems - some of which I suspect might be easier to solve if we treat systems-with-high-boundaryness as a sort of primitive for the kind-of-thing that we can associate agency and preferences with in the first place.

3

There are two different points here, boundaries as a formulation of agency, and boundaries as a major component of human values (which might be somewhat sufficient by itself for some alignment purposes). In the first role, boundaries are an acausal norm that many agents end up adopting, so that it's natural to consider a notion of agency that implies boundaries (after the agent had an opportunity for sufficient reflection). But this use of boundaries is probably open to arbitrary ruthlessness, it's not respect for autonomy of someone the powers that be wouldn't sufficiently care about. Instead, boundaries would be a convenient primitive for describing interactions with other live players, a Schelling concept shared by agents in this sense.
The second role as an aspect of values expresses that the agent does care about autonomy of others outside game theoretic considerations, so it only ties back to game theory by similarity, or through the story of formation of such values that involved game theory. A general definition might be useful here, if pointing AIs at it could instill it into their values. But technical definitions don't seem to work when you consider what happens if you try to protect humanity's autonomy using a boundary according to such definitions. It's like machine translation, the problem could well be well-defined, but impossible to formally specify, other than by gesturing at a learning process.

7

I no longer think the setup above is viable, for reasons that connect to why I think Critch's operationalization is incomplete and why boundaries should ultimately be grounded in Pearlian Causality and interventions.
(Note: I am thinking as I'm writing, so this might be a bit rambly.)
The world-trajectory distribution is ambiguous.
Intuition: Why does a robust glider in Lenia intuitively feel like a system possessing boundary? Well, I imagine various situations that happen in the world (like bullets) and this pattern mostly stays stable in face of them.
Now, notice that the measure of infiltration/exfiltration depends on ϕ∈Δ(Wω), a distribution over world history:

Infil(ϕ) := Agg_{t≥0} Mut_{Wω∼ϕ}((V_{t+1}, A_{t+1}); E_t ∣ (V_t, A_t, P_t))
So, for the above measure to capture my intuition, the approximate Markov condition (operationalized by low infil & exfil) must consider the world state Wω that contains the Lenia pattern with it avoiding bullets.
Remember, W is the raw world state, no coarse graining. So ϕ is the distribution over the raw world trajectory. It already captures all the "potentially occurring trajectories under which the system may take boundary-preserving-action." Since everything is observed, our distribution already encodes all of "Nature's Intervention." So in some sense Critch's definition is already causal (in a very trivial sense), by the virtue of requiring a distribution over the raw world trajectory, despite mentioning no Pearlian Causality.
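For a single time step, the quantity inside the aggregation is a conditional mutual information, and how it depends on the chosen distribution can be seen with a small sketch. The function below computes I(X; Y | Z) from an explicit joint table, reading X as (V_{t+1}, A_{t+1}), Y as E_t and Z as (V_t, A_t, P_t). The two toy joint distributions are made up for illustration, and all variables are collapsed to single bits.

```python
import math
from collections import defaultdict

def conditional_mutual_information(joint):
    """I(X; Y | Z) in bits, for joint = {(x, y, z): probability}."""
    pz = defaultdict(float)
    pxz = defaultdict(float)
    pyz = defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    return sum(
        p * math.log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), p in joint.items()
        if p > 0
    )

# Toy "leaky" world: the next inner state copies the environment bit,
# whatever the current (inner, boundary) state z is.
leaky = {(e, e, z): 0.25 for e in (0, 1) for z in (0, 1)}

# Toy "shielded" world: the next inner state depends only on the current
# (inner, boundary) state; the environment bit is irrelevant.
shielded = {(z, e, z): 0.25 for e in (0, 1) for z in (0, 1)}

infil_leaky = conditional_mutual_information(leaky)
infil_shielded = conditional_mutual_information(shielded)
print(infil_leaky, infil_shielded)
```

The same physical pattern can land on either side depending on which trajectories the distribution puts weight on, which is one way to see why the choice of ϕ matters.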
Issue: Choice of ϕ
Maybe there is some canonical true ϕ for our physical world that minds can intersubjectively arrive at, so there's no ambiguity.
But when I imagine trying to implement this scheme on Lenia, there's immediately an ambiguity as to which distribution (representing my epistemic state on which raw world trajectories that will "actually happen") we should choose:
1. Perhaps a very simple distribution: assigning uniform probability over world trajectories where the world contains nothing but the glider

4

I think the update makes sense in general, but isn't there some way mutual information and causality are linked? Maybe it isn't strong enough for there to be an easy extrapolation from one to the other.
Also I just wanted to drop this to see if you find it interesting, kind of on this topic? I'm not sure it's fully defined in a causality based way but it is about structure preservation.
https://youtu.be/1tT0pFAE36c?si=yv6mbswVpMiywQx9
Active Inference people also have the boundary problem as core in their work so they have some interesting stuff on it.

3

Yeah I'd like to know if there's a unified way of thinking about information theoretic quantities and causal quantities, though a quick literature search doesn't show up anything interesting. My guess is that we'd want separate boundary metrics for informational separation and causal separation.

Perhaps I should ~~one day in the far far future~~ write a sequence on bayes nets.

Some low-effort TOC (this is basically mostly koller & friedman):

- why bayesnets and markovnets? factorized cognition, how to do efficient bayesian updates in practice, it's how our brain is probably organized, etc. why would anyone want to study this subject if they're doing alignment research? explain philosophy behind them.
- simple examples of bayes nets. basic factorization theorems (the I-map stuff and separation criterion)
- tangent on why bayes nets aren't causal nets, though Zack M Davis had a good post on this exact topic, comment threads there are high insight
- how inference is basically marginalization (basic theorems of: a reduced markov net represents conditioning, thus inference upon conditioning is the same as marginalization on a reduced net)
- why is marginalization hard? i.e. NP-hardness of exact and approximate inference in the worst case
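As a small illustration of "inference is marginalization" and of why the order of sums and products matters, here is a sketch on a three-node chain A → B → C with binary variables. The network and its numbers are made up. The brute-force version sums the full joint, which is exponential in the number of variables; the second version pushes the sum over A inside first, which is the basic move behind variable elimination.

```python
from itertools import product

# Hypothetical CPDs for a chain A -> B -> C, all variables binary.
P_A = [0.6, 0.4]
P_B_GIVEN_A = [[0.7, 0.3], [0.2, 0.8]]  # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = [[0.9, 0.1], [0.5, 0.5]]  # P_C_GIVEN_B[b][c]

def marginal_c_bruteforce():
    """P(C) by summing the full joint over every assignment of (A, B)."""
    out = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        out[c] += P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
    return out

def marginal_c_eliminate():
    """P(C) by eliminating A first, then B: sum_b P(c|b) * [sum_a P(a) P(b|a)]."""
    tau_b = [sum(P_A[a] * P_B_GIVEN_A[a][b] for a in range(2)) for b in range(2)]
    return [sum(tau_b[b] * P_C_GIVEN_B[b][c] for b in range(2)) for c in range(2)]

print(marginal_c_bruteforce())
print(marginal_c_eliminate())
```

On a chain the intermediate factor only ever involves one variable, so the cost stays linear in the chain length; on denser graphs the intermediate factors grow, which is where the blowup comes from.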

what is a workaround? solve by hand simple cases in which inference can be greatly simplified by just shuffling in the order of sums and products, and realize that the exponential blowup of complexity is dependent on a graphical property of your bayesnet called th

Any thoughts on how to customize LessWrong to make it LessAddictive? I just really, really like the editor for various reasons, so I usually write a bunch (drafts, research notes, study notes, etc) using it but it's quite easy to get distracted.

2

You could use the ad & content blocker uBlock Origin to zap any addictive elements of the site, like the main page feed or the Quick Takes or Popular Comments. Then if you do want to access these, you can temporarily turn off uBlock Origin.
Incidentally, uBlock Origin can also be installed on mobile Firefox, and you can manually sync its settings across devices.

2

Maybe make a habit of blocking https://www.lesswrong.com/posts/* while writing?

moments of microscopic fun encountered while studying/researching:

- Quantum mechanics calls dual vectors & vectors bra/ket because ... bra-c-ket. What can I say? I like it - But where did the letter 'c' go, Dirac?
- Defining cauchy sequences and limits in real analysis: it's really cool how you "bootstrap" the definition of Cauchy sequences / limit on real using the definition of Cauchy sequences / limit on rationals. basically:
- (1) define Cauchy sequence on rationals
- (2) use it to define limit (on rationals) using rational-Cauchy
- (3) use it to define reals
- (4)

1

Maybe he dropped the "c" because it changes the "a" phoneme from æ to ɑː and gives a cleaner division in sounds: "brac-ket" pronounced together collides with "bracket" where "braa-ket" does not.

Any advice on reducing neck and shoulder pain while studying? For me that's my biggest blocker to being able to focus longer (especially for math, where I have to look down at my notes/book for a long period of time). I'm considering stuff like getting a standing desk or doing regular back/shoulder exercises. Would like to hear what everyone else's setups are.

9

Train skill of noticing tension and focus on it. Tends to dissolve. No that's not so satisfying but it works. Standing desk can help but it's just not that comfortable for most.

3

weight training?

3

I still have lots of neck and shoulder tension, but the only thing I've found that can reliably lessen it is doing some hard work on a punching bag for about 20 minutes every day, especially hard straights and jabs with full extension.

2

I've used Pain Science in the past as a resource and highly, highly endorse it. Here is an article they have on neck pain.

*(Quality: Low, only read when you have nothing better to do—also not much citing)*

30-minute high-LLM-temp stream-of-consciousness on "How do we make mechanistic interpretability work for non-transformers, or just any architectures?"

- We want a general way to reverse engineer circuits
- e.g., Should be able to rediscover properties we discovered from transformers

- Concrete Example: we spent a bunch of effort reverse engineering transformer-type architectures—then boom, suddenly some parallel-GPU-friendly-LSTM architecture turns out to have better scaling properties

I am curious as to how often the asymptotic results proven using features of the problem that seem basically practically-irrelevant become relevant in practice.

Like, I understand that there are many asymptotic results (e.g., free energy principle in SLT) that are useful in practice, but i feel like there's something sus about similar results from information theory or complexity theory where the way in which they prove certain bounds (or inclusion relationship, for complexity theory) seem totally detached from practicality?

*joint source coding theorem* is of

4

P v NP: https://en.wikipedia.org/wiki/Generic-case_complexity

3

One result to mention in computational complexity is the PCP theorem which not only gives probabilistically checkable proofs but also gives approximation case hardness. Seems deep but I haven't understood the proof yet.

3

Great question. I don't have a satisfying answer. Perhaps a cynical answer is survival bias - we remember the asymptotic results that eventually become relevant (because people develop practical algorithms or a deeper theory is discovered) but don't remember the irrelevant ones.
Existence results are categorically easier to prove than explicit algorithms. Indeed, classically existence (the former) may hold while intuitionistically (the latter) it might not. We would expect non-explicit existence results to appear before explicit algorithms.
One minor remark on 'quantifying over all boolean algorithms'. Unease with quantification over large domains may be a vestige of set-theoretic thinking that imagines types as (platonic) boxes. But a term of a for-all quantifier is better thought of as an algorithm/ method to check the property for any given term (in this case a Boolean circuit). This doesn't sound divorced from practice to my ears.

2

Yes, it does, for several reasons:
1. It basically is necessary to prove P != NP to get a lot of other results to work, and for some of those results, proving P != NP is sufficient.
2. If P != NP (As most people suspect), it fundamentally rules out solving lots of problems generally and quickly without exploiting structure, and in particular lets me flip the burden of proof to the algorithm maker to explain why their solution to a problem like SAT is efficient, rather than me having to disprove the existence of an efficient algorithm.
It's either by exploiting structure, somehow having a proof that P=NP, or relying on new physics models that enable computing NP-complete problems efficiently, and the latter 2 need very, very strong evidence behind them.
This in particular applies to basically all learning problems in AI today.
3. It explains why certain problems cannot be reasonably solved optimally without huge discoveries; the best examples are travelling salesman problems, as well as a whole lot of other NP-complete problems. There are also other NP problems where there isn't a way to solve them efficiently at all, especially if FPT != W[1] holds.
Also a note that we also expect a lot of NP-complete problems to also not be solvable by fast algorithms even in the average case, which basically means it's likely to be very relevant quite a lot of the time, so we don't have to limit ourselves to the worst case either.

I recently learned about metauni, and it looks amazing. TL;DR, a bunch of researchers give out lectures or seminars on Roblox - Topics include AI alignment/policy, Natural Abstractions, Topos Theory, Singular Learning Theory, etc.

I haven't actually participated in any of their live events yet and only watched their videos, but they all look really interesting. I'm somewhat surprised that there hasn't been much discussion about this on LW!

Discovering Agents provides a genuinely causal, interventionist account of agency and an algorithm to detect agents, motivated by the intentional stance. I find this paper *very* enlightening from a conceptual perspective!

I've tried to think of problems that needed to be solved before we can actually implement this on real systems - both conceptual and practical - on approximate order of importance.

**There are no 'dynamics,' no learning.** As soon as a mechanism node is edited, it is assumed that agents immediately change their 'object decision variable' (a condition

Complaint with Pugh's real analysis textbook: He doesn't even define the limit of a function properly?!

It's implicitly defined together with the definition of continuity where , but in Chapter 3 when defining differentiability he implicitly switches the condition to without even mentioning it (nor the requirement that now needs to be an accumulation point!) While Pugh has its own benefits, coming from Terry Tao's analysis textbook backgrou...

6

Maybe you should email Pugh with the feedback? (I audited his honors analysis course in fall 2017; he seemed nice.)
As far as the frontier of analysis textbooks goes, I really like how Schröder Mathematical Analysis manages to be both rigorous and friendly: the early chapters patiently explain standard proof techniques (like the add-and-subtract triangle inequality gambit) to the novice who hasn't seen them before, but the punishing details of the subject are in no way simplified. (One wonders if the subtitle "A Concise Introduction" was intended ironically.)

I used to try out near-random search on ideaspace, where I made a quick app that spat out 3~5 random words from a dictionary of interesting words/concepts that I curated, and I spent 5 minutes every day thinking *very hard* on whether anything interesting came out of those combinations.
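A minimal sketch of what such an app could look like. The word list is a made-up placeholder, not the curated dictionary mentioned above, and the function name is hypothetical.

```python
import random

# Placeholder list standing in for a curated dictionary of concepts.
CONCEPTS = [
    "treadmill", "rock climbing", "origami", "magnet", "fermentation",
    "sundial", "zipper", "echo", "tide", "kaleidoscope",
]

def random_prompt(concepts, rng=random, low=3, high=5):
    """Pick between `low` and `high` distinct concepts uniformly at random."""
    k = rng.randint(low, high)
    return rng.sample(concepts, k)

if __name__ == "__main__":
    print(" + ".join(random_prompt(CONCEPTS)))
```

Passing a seeded `random.Random` instance as `rng` makes a day's draw reproducible, which helps if you want to revisit a combination later.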

Of course I knew random search on exponential space was futile, but I got a couple cool invention ideas (most of which turned out to already exist), like:

- infinite indoor rockclimbing: attach rocks to a vertical treadmill, and now you have an infinite indoor rock climbing wall

Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia with my inner monologue going “oh wow! I didn't know this dimension of qualia was a thing.” examples:

- hard-to-explain sense that my mind is expanding horizontally with fractal cube-like structures (think bismuth) forming around it and my subjective experience gliding along its surface which lasted for ~5 minutes after taking zolpidem for the first time to sleep (2 days ago)
- getting drunk for the first time (half a year ago)
- feeli

4

I observed new visual qualia of colors while using some light machine.
Also, when I first came to Italy, I had a feeling as if the whole rainbow of color qualia changed

3

Sunlight scattered by the atmosphere on cloudless mornings during the hour before sunrise inspires a subtle feeling ("this is cool, maybe even exciting") that I never noticed till I started intentionally exposing myself to it for health reasons (specifically, making it easier to fall asleep 18 hours later).
More precisely, I might or might not have noticed the feeling, but if I did notice it, I quickly forgot about it because I had no idea how to reproduce it.
I have to get away from artificial light (streetlamps) (and from direct (yellow) sunlight) for the (blue) indirect sunlight to have this effect. Also, it is no good looking at a small patch of sky, e.g., through a window in a building: most or all of the upper half of my field of vision must be receiving this indirect sunlight. (The intrinsically-photosensitive retinal ganglion cells are all over the bottom half of the retina, but absent from the top half.)

To me, the fact that the human brain basically implements SSL+RL is *very very* strong evidence that the current DL paradigm (with a bit of "engineering" effort, but nothing like fundamental breakthroughs) will kinda just keep scaling until we reach point-of-no-return. Does this broadly look correct to people here? Would really appreciate other perspectives.

9

I mostly think “algorithms that involve both SSL and RL” is a much broader space of possible algorithms than you seem to think it is, and thus that there are parts of this broad space that require “fundamental breakthroughs” to access. For example, both AlexNet and differentiable rendering can be used to analyze images via supervised learning with gradient descent. But those two algorithms are very very different from each other! So there’s more to an algorithm than its update rule.
See also 2nd section of this comment, although I was emphasizing alignment-relevant differences there whereas you’re talking about capabilities. Other things include the fact that if I ask you to solve a hard math problem, your brain will be different (different weights, not just different activations / context) when you’re halfway through compared to when you started working on it (a.k.a. online learning, see also here), and the fact that brain neural networks are not really “deep” in the DL sense. Among other things.

3

Makes sense. I think we're using the terms differently in scope. By "DL paradigm" I meant to encompass the kind of stuff you mentioned (RL-directing-SS-target (active learning), online learning, different architecture, etc) because they really seemed like "engineering challenges" to me (despite them covering a broad space of algorithms) in the sense that capabilities researchers already seem to be working on & scaling them without facing any apparent blockers to further progress, i.e. in need of any "fundamental breakthroughs"—by which I was pointing more at paradigm shifts away from DL like, idk, symbolic learning.

5

I have a slightly different takeaway. Yes techniques similar to current techniques will most likely lead to AGI but it's not literally 'just scaling LLMs'. The actual architecture of the brain is meaningfully different from what's being deployed right now. So different in one sense. On the other hand it's not like the brain does something completely different and proposals that are much closer to the brain architecture are in the literature (I won't name them here...). It's plausible that some variant on that will lead to true AGI. Pure hardware scaling obviously increases capabilities in a straightforward way but a transformer is not a generally intelligent agent and won't be even if scaled many more OOMs.
(I think Steven Byrnes has a similar view but I wouldn't want to misrepresent his views)

2

So far as I can tell, a transformer has three possible blockers (that would need to stand undefeated together): (1) in-context learning plateauing at a level where it's not able to do even a little bit of useful work without changing model weights, (2) terrible sample efficiency that asks for more data than is available on new or rare/situational topics, and (3) absence of a synthetic data generation process that's both sufficiently prolific and known not to be useless at that scale.
A need for online learning and terrible sample efficiency are defeated by OOMs if enough useful synthetic data can be generated, which the anemic in-context learning without changing weights might turn out to be sufficient for. This is the case of defeating (3), with others falling as a result.
Another possibility is that much larger multimodal transformers (there is a lot of video) might suffice without synthetic data if a model learns superintelligent in-context learning. SSL is not just about imitating humans, the problems it potentially becomes adept at solving are arbitrarily intricate. So even if it can't grow further and learn substantially new things within its current architecture/model, it might happen to already be far enough along at inference time to do the necessary redesign on its own. This is the case of defeating (1), leaving it to the model to defeat the others. And it should help with (3) even at non-superintelligent levels.
Failing that, RL demonstrates human level sample efficiency in increasingly non-toy settings, promising that saner amounts of useful synthetic data might suffice, defeating (2), though at this point it's substantially not-a-transformer.

4

generating useful synthetic data and solving novel tasks with little correlation with training data is the exact issue here. Seems straightforwardly true that a transformer architecture doesn't do that?
I don't know what superintelligent in-context learning is - I'd be skeptical that scaling a transformer a further 3 OOMS will suddenly make it do tasks that are very far from the text distribution it is trained on, indeed solutions to tasks that are not even remotely in the internet text data like building a recursively self-improving agent (if such a thing is possible...)? Maybe I'm misunderstanding what you're claiming here.
Not saying it's impossible, just seems deeply implausible. ofc LLMs being so impressive was also a priori implausible but this seems another OOM of implausibility bits if that makes sense?

2

I'm imagining some prompts to generate reasoning, inferred claims about the world. You can't generate new observations about the world, but you can reason about the observations available so far, and having those inferred claims in the dataset likely helps, that's how humans build intuition about theory. If on average 1000 inferred claims are generated for every naturally observed statement (or just those on rare/new/situational topics), that could close the gap of sample efficiency with humans. Might take the form of exercises or essays or something.
If this is all done with prompts, using a sufficiently smart order-following chatbot, then it's straightforwardly just a transformer, with some superficial scaffolding. If this can work, it'll eventually appear in distillation literature, though I'm not sure if serious effort to check was actually made with current SOTA LLMs, to pre-train exclusively on synthetic data that's not too simplistically prompted. Possibly you get nothing for a GPT-3 level generator, and then something for GPT-4+, because reasoning needs to be good enough to preserve contact with ground truth. From Altman's comments I get the impression that it's plausibly the exact thing OpenAI is hoping for.
In-context learning is the capability to make use of novel data that's only seen in a context, not in pre-training, to do tasks that make use of this novel data, in ways that normally would've been expected to require it being seen in pre-training. In-context learning is a model capability, it's learned. So its properties are not capped by those of the hardcoded model training algorithm; notably, in principle in-context learning could have higher sample efficiency (which might be crucial for generating a lot of synthetic data out of a few rare observations). Right now it's worse in most respects, but that could change with scale without substantially modifying the transformer architecture, which is the premise of this thread.
By superintelligent in-cont

I wonder if the following would make it possible to study textbooks more efficiently using LLMs:

- Feed the entire textbook to the LLM and produce a list of summaries that increases in granularity and length, covering all the material in the textbook just at a different depth (eg proofs omitted, further elaboration on high-level perspectives, etc)
- The student starts from the highest-level summary, and gradually moves to the more granular materials.

When I study textbooks, I spend a significant amount of time improving my mental autocompletion, like being able to familiari...

What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.

The MIRI Research Guide recommends An Introduction to Decision Theory and Game Theory: An Introduction. I have read neither and am simply relaying the recommendation.

i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.

3

I think what's so crushing about it, is that it reminds me that the wrong people are designing things, and that they wont allow them to be fixed, and I can only find solace in thinking that the inefficiency of their designs is also a sign that they can be defeated.

God, I wish real analysis was at least half as elegant as any other math subject; way too many pathological examples that I couldn't care less about. I've heard some good things about constructivism though, hopefully analysis is done better there.

Yeah, real analysis sucks. But you have to go through it to get to delightful stuff— I particularly love harmonic and functional analysis. Real analysis is just a bunch of pathological cases and technical persnicketiness that you need to have to keep you from steering over a cliff when you get to the more advanced stuff. I’ve encountered some other subjects that have the same feeling to them. For example, measure-theoretic probability is a dry technical subject that you need to get through before you get the fun of stochastic differential equations. Same with commutative algebra and algebraic geometry, or point-set topology and differential geometry.

Constructivism, in my experience, makes real analysis more mind blowing, but also harder to reason about. My brain uses non-constructive methods subconsciously, so it’s hard for me to notice when I’ve transgressed the rules of constructivism.

4

As a general reflection on undergraduate mathematics imho there is way too much emphasis on real analysis. Yes, knowing how to be rigorous is important, being aware of pathological counterexamples is important, and real analysis is used all over the place. But there is so much more to learn in mathematics than real analysis and the focus on minor technical issues here is often a distraction from developing a broad & deep mathematical background.
For most mathematicians (and scientists using serious math) real analysis is only a small part of the toolkit. Understanding well the different kinds of limits can ofc be crucial in functional analysis, stochastic processes and various parts of physics. But there are so many topics that are important to know and learn here!
The reason it is so prominent in the undergraduate curriculum seems to be more tied to institutional inertia, its prominence on centralized exams, relation with calculus, etc

2

Really, what's going on is that in the general case, as mathematics is asked to be more and more general, you will start encountering pathological examples more, and paying attention to detail more is a valuable skill in both math and real life.
And while being technical about the pathological cases is kind of annoying, it's also one that actually matters in real life, as you aren't guaranteed to have an elegant solution to your problems.

1

Update: huh, nonstandard analysis is really cool. Not only are things much more intuitive (by using infinitesimals from hyperreals instead of using epsilon-delta formulation for everything), by the transfer principle all first order statements are equivalent between standard and nonstandard analysis!
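
As a small illustration of the infinitesimal style (a standard textbook computation, not something taken from the update above): with ε a nonzero infinitesimal and st the standard-part map, the derivative of x² comes out by plain algebra, with no epsilon-delta argument.

```latex
\[
\frac{d}{dx}\,x^2
  = \operatorname{st}\!\left(\frac{(x+\varepsilon)^2 - x^2}{\varepsilon}\right)
  = \operatorname{st}\left(2x + \varepsilon\right)
  = 2x .
\]
```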

There were various notions/frames of optimization floating around, and I tried my best to distill them:

- Eliezer's Measuring Optimization Power on unlikelihood of outcome + agent preference ordering
- Alex Flint's The ground of optimization on robustness of system-as-a-whole evolution
- Selection vs Control as distinguishing different types of "space of possibilities"
- Selection as having that space explicitly given & selectable numerous times by the agent
- Control as having that space only given in terms of counterfactuals, and the agent can access it only once.
- T

I find the intersection of computational mechanics, boundaries/frames/factored-sets, and some works from the causal incentives group - especially discovering agents and robust agents learn causal world model (review) - to be a very interesting theoretical direction.

By boundaries, I mean a sustaining/propagating system that informationally/causally insulates its 'viscera' from the 'environment,' and only allows relatively small amounts of deliberate information flow through certain channels in both directions. Living systems are an example of it (from bacte...

Does anyone know if Shannon arrived at entropy from the axiomatic definition first, or the operational definition first?

I've been thinking about these two distinct ways in which we seem to arrive at new mathematical concepts, and looking at the countless partial information decomposition measures in the literature all derived/motivated on an axiomatic basis, and not knowing which intuition to prioritize over which, I've been assigning less premium on axiomatic conceptual definitions than I used to:

- decision theoretic justification of probability > C

3

I'm not sure what you mean by operational vs axiomatic definitions.
But Shannon was unaware of the usage of S = −Σ_i p_i ln p_i in statistical mechanics. Instead, he was inspired by Nyquist and Hartley's work, which introduced ad-hoc definitions of information in the case of constant probability distributions.
And in his seminal paper, "A mathematical theory of communication", he argued in the introduction for the logarithm as a measure of information because of practicality, intuition and mathematical convenience. Moreover, he explicitly derived the entropy of a distribution from three axioms:
1) that it be continuous wrt. the probabilities,
2) that it increase monotonically for larger systems w/ constant probability distributions,
3) and that it be a weighted sum of the entropies of sub-systems.
See section 6 for more details.
I hope that answers your question.
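
A minimal sketch of the third axiom (the grouping property), using the worked numbers from Shannon's paper: choosing among probabilities (1/2, 1/3, 1/6) directly should carry the same entropy as a fair binary choice followed, half the time, by a (2/3, 1/3) choice. The function name `entropy` and the use of base-2 logarithms are choices made here for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One-shot choice among three outcomes.
direct = entropy([1/2, 1/3, 1/6])

# Same choice decomposed into two stages: a fair coin flip, then
# (with probability 1/2) a second choice with probabilities (2/3, 1/3).
staged = entropy([1/2, 1/2]) + 0.5 * entropy([2/3, 1/3])

print(direct, staged)
```

The two printed values agree up to floating-point error, which is the weighted-sum requirement in action.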

*'Symmetry' implies 'redundant coordinate' implies 'cyclic coordinates in your Lagrangian / Hamiltonian' implies 'conservation of conjugate momentum'*

And because the action principle (where the true system trajectory extremizes your action, i.e. integral of Lagrangian) works in various dynamical systems, the above argument works in non-physical dynamical systems.

Thus conserved quantities usually exist in a given dynamical system.
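
The last two implications in the chain can be written out in one line. This is the standard textbook step, stated for a generic Lagrangian L(q, q̇, t) with generalized coordinates q_i:

```latex
\[
\frac{d}{dt}\frac{\partial L}{\partial \dot q_i} - \frac{\partial L}{\partial q_i} = 0,
\qquad
\frac{\partial L}{\partial q_i} = 0
\;\Longrightarrow\;
\frac{d}{dt}\,p_i = 0
\quad\text{where}\quad
p_i := \frac{\partial L}{\partial \dot q_i}.
\]
```

So whenever a coordinate q_i is cyclic (absent from L), its conjugate momentum p_i is constant along any trajectory satisfying the Euler-Lagrange equations.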

mmm, but why does the action principle hold in such a wide variety of systems though? (like how you get entropy by postulating something to be maximized in an equilibrium setting)

Mildly surprised how some verbs/connectives barely play any role in conversations, even in technical ones. I just tried directed babbling with someone, and (I think?) I learned quite a lot about Israel-Pakistan relations with almost no stress coming from eg needing to make my sentences grammatically correct.

Example of (a small part of) my attempt to summarize my understanding of how Jews migrated in/out of Jerusalem over the course of history:

...They here *hand gesture on air*, enslaved out, they back, kicked out, and boom, they everywhere.

(audience nods, giv

1

Could you explain more what you mean by this?
My (completely amateur) understanding is that the "extra" semantic and syntactic structure of written and spoken language does two things.
One, it adds redundancy and reduces error. Simple example, gendered pronouns mean that when you hear "Have you seen Laurence? She didn't get much sleep last night." you have a chance to ask the speaker for clarification and catch if they had actually said "Laura" and you misheard.
Two, it can be used as a signal. The correct use of jargon is used by listeners or readers as a proxy for competence. Or many typos in your text will indicate to readers that you haven't put much effort into what you're saying.

Why haven't mosquitos evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes probably) Or *are* they evolving towards that direction? (they of course already evolved towards being less itchy while biting, but not enough to make that lack-of-itch permanent)

~~this is a request for help i've been trying and failing to catch this one for god knows how long plz halp~~

tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitch noise away, modulo disease risk considerations.

The reason mosquito bites itch is because they are injecting saliva into your skin. Saliva contains mosquito antigens, foreign particles that your body has evolved to attack with an inflammatory immune response that causes itching. The compound histamine is a key signaling molecule used by your body to drive this reaction.

In order for the mosquito to avoid provoking this reaction, they would either have to avoid leaving compounds inside of your body, or mutate those compounds so that they do not provoke an immune response. The human immune system is an adversarial opponent designed with an ability to recognize foreign particles generally. If it was tractable for organisms to reliably evolve to avoid provoking this response, that would represent a fundamental vulnerability in the human immune system.

Mosquito saliva *does* in fact contain anti-inflammatory, antihemostatic, and immunomodulatory compounds. So they're trying! But also this means that mosquitos are evolved *to* put saliva inside of you when they feed, which means they're inevitably going to expose the foreign particles they produce to your immune system.

There's also a facet of selection bias making mosquitos appear unsucces...

4

Because they have no reproductive advantage to being less itchy. You can kill them while they’re feeding, which is why they put lots of evolutionary effort into not being noticed. (They have an anesthetic in their saliva so you are unlikely to notice the bite.) By the time you develop the itchy bump, they’ve flown away and you can’t kill them.

4

There's still some pressure, though. If the bites were permanently not itchy, then I may not have noticed that the mosquitos were in my room in the first place, and consequently would be less likely to pursue them directly. I guess that's just not enough.

3

There’s also positive selection for itchiness. Mosquito spit contains dozens of carefully evolved proteins. We don’t know what they all are, but some of them are anticoagulants and anesthetics. Presumably they wouldn’t be there if they didn’t have a purpose. And your body, when it detects these foreign proteins, mounts a protective reaction, causing redness, swelling, and itching. IIRC, that reaction does a good job of killing any viruses that came in with the mosquito saliva. We’ve evolved to have that reaction. The itchiness is probably good for killing any bloodsuckers that don’t flee quickly. It certainly works against ticks.
Evolution is not our friend. It doesn’t give us what we want, just what we need.

3

I believe mosquitos do inject something to suppress your reaction to them, which is why you don't notice bug bites until long after the bug is gone. There's no reproductive advantage to the mosquito to extending that indefinitely.

2

Oh wow, that would make a ton of sense. Thanks Elizabeth!

1

I had something like locality in mind when writing this shortform, the context being: [I'm in my room -> I notice itch -> I realize there's a mosquito somewhere in my room -> I deliberately pursue and kill the mosquito that I wouldn't have known existed without the itch]
But, again, this probably wouldn't amount to much selection pressure, partially due to the fact that the vast majority of mosquito population exists in places where such locality doesn't hold i.e. in an open environment.

0

In NZ we have biting bugs called sandflies which don't do this - you can often tell the moment they get you.

2

The reason you find them itchy is because humans are selected to find them itchy most likely?

1

But the evolutionary timescale at which mosquitos can adapt to avoid detection must be faster than that of humans adapting to find mosquitos itchy! Or so I thought - my current boring guess is that (1) mechanisms for the human body to detect foreign particles are fairly "broad", (2) the required adaptations from the mosquitos to evade them are not-way-too-simple, and (3) we just haven't put enough selection pressure to make such change happen.

2

Yeah that would be my thinking as well.

*Just noticing that the negation of a statement exists is enough to make meaningful updates.*

e.g. I used to (implicitly) think "Chatbot Romance is weird" without having evaluated anything in-depth about the subject (and consequently didn't have any strong opinions about it)—probably as a result of some underlying cached belief.

But after seeing this post, *just reading the title* was enough to make me go (1) "Oh! I just realized it is *perfectly possible* to argue in favor of Chatbot Romance ... my belief on this subject must be a cached belief!" (2) hence ...

*People mean different things when they say "values" (object vs meta values)*

I noticed that people often mean different things when they say "values," and they end up talking past each other (or convergence only happens after a long discussion). One of the differences is in whether they contain meta-level values.

- Some people refer to the "object-level" preferences that we hold.
- Often people bring up the "beauty" of the human mind's capacity for its values to change, evolve, adopt, and grow—changing mind as it learns more about the world, being open to persuasio

*tl;dr, the unidimensional continuity of preference assumption in the **money pumping argument** used to justify the VNM axioms corresponds to the assumption that there exists some unidimensional "resource" that the agent cares about, and this language is provided by the notion of "souring / sweetening" a lottery.*

Various coherence theorems - or more specifically, various money pumping arguments generally have the following form:

...If you violate this principle, then [you are rationally re

2

Having thought about this for some time, my feeling is that resources are about fungibility, implicitly embedded in a context of trade between multiple agents (very broadly construed, e.g. an agent in time can perhaps be thought of as multiple agents cooperating intertemporally).
A resource over time has the property that I can spend it now or I can spend it later. Glibly, one could say the operational meaning of the resource arises from the intertemporal bargaining of the agent.
----------------------------------------
Perhaps it's useful to distinguish several levels of resources and resource-like quantities.
Discrete vs continuous, tradeable / meaningful to different agents, ?? Fungibility, ?? Temporal and spatial locatedness, ?? Additivity?, submodularity ?
----------------------------------------
Addendum: another thing to consider is that the input of the vNM theorem is in some sense more complicated than the output. The output is just a utility function u: X -> R, while your input is a preference order on the very infinite set of lotteries (= probability distributions ) L(X).
Thinking operationally about a preference ordering on a space of distributions is a little wacky. It means you are willing to trade off uncertain options against one another. For this to be a meaningful choice would seem to necessitate some sort of (probabilistic) world model.
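
To make the input/output asymmetry concrete, here is a minimal sketch. The output side of the theorem is just a small table u on outcomes, yet it induces a preference between any two lotteries via expected utility. The outcome names and utility values below are made up for illustration.

```python
def expected_utility(lottery, u):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[x] for x, p in lottery.items())

def weakly_prefers(a, b, u):
    """True if lottery a is at least as good as lottery b under u."""
    return expected_utility(a, u) >= expected_utility(b, u)

# Hypothetical utility function on a three-element outcome set X.
u = {"apple": 0.0, "banana": 0.6, "cherry": 1.0}

sure_banana = {"banana": 1.0}
coin_flip = {"apple": 0.5, "cherry": 0.5}

print(weakly_prefers(sure_banana, coin_flip, u))
```

Three numbers in `u` settle the comparison for every pair of distributions over X, whereas specifying the preference ordering directly would mean ranking an infinite set of lotteries.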

Damn, why did Pearl recommend readers (in the preface of his causality book) to read all the chapters other than chapter 2 (and the last review chapter)? Chapter 2 is literally the coolest part - inferring causal structure from purely observational data! Almost skipped that chapter because of it ...

6

it's true it's cool, but I suspect he's been a bit disheartened by how complicated it's been to get this to work in real-world settings.
in the book of why, he basically now says it's impossible to learn causality from data, which is a bit of a confusing message if you come from his previous books.
but now with language models, I think his hopes are up again, since models can basically piggy-back on causal relationships inferred by humans

2[anonymous]

You should also check out Timeless Causality, if you haven't done so already.

Bayes Net inference algorithms maintain their efficiency by using dynamic programming over multiple layers.

Level 0: Naive Marginalization

- No dynamic programming whatsoever. Just multiply all the conditional probability distribution (CPD) tables, and sum over the variables of non-interest.

Level 1: Variable Elimination

- Cache the repeated computations within a query.
- For example, given a chain-structured Bayes Net , instead of doing , we can do . Check my post for more.
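
A minimal sketch contrasting Level 0 and Level 1. It assumes a binary chain A -> B -> C -> D and queries P(D); the chain, the variable names, and all CPD numbers are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical CPDs for a binary chain A -> B -> C -> D (numbers are made up).
p_a = [0.6, 0.4]                    # P(A)
p_b_a = [[0.7, 0.3], [0.2, 0.8]]    # P(B | A), rows indexed by A
p_c_b = [[0.9, 0.1], [0.4, 0.6]]    # P(C | B), rows indexed by B
p_d_c = [[0.5, 0.5], [0.1, 0.9]]    # P(D | C), rows indexed by C

def naive_marginal():
    """Level 0: sum the full joint over A, B, C for each value of D."""
    p_d = [0.0, 0.0]
    for a, b, c, d in product(range(2), repeat=4):
        p_d[d] += p_a[a] * p_b_a[a][b] * p_c_b[b][c] * p_d_c[c][d]
    return p_d

def push_forward(prior, cpd):
    """Sum out the parent: P(child) from P(parent) and P(child | parent)."""
    return [sum(prior[i] * cpd[i][j] for i in range(len(prior)))
            for j in range(len(cpd[0]))]

def variable_elimination():
    """Level 1: eliminate A, then B, then C, reusing each intermediate factor."""
    p_b = push_forward(p_a, p_b_a)
    p_c = push_forward(p_b, p_c_b)
    return push_forward(p_c, p_d_c)

print(naive_marginal(), variable_elimination())
```

Both functions return the same distribution, but the naive version touches every joint assignment (exponential in chain length) while the elimination version does one small matrix-vector product per edge.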

Level 2: Clique-tree...

Man, deviation arguments are so cool:

- what are macrostates? Variables which are required to make your thermodynamics theory work! If they don't, add more macrostates!
- nonequilibrium? Define it as systems that don't admit a thermodynamic description!
- inductive biases? Define it as the amount of correction needed for a system to obey Bayesian updating, i.e. correction terms in the exponent of the Gibbs measure!
- coarse graining? Define the coarse-grained variables to keep the dynamics as close as possible to that of the micro-dynamics!
- or in a similar spirit - doe

One of the rare insightful lessons from high school: Don't set your AC to the minimum temperature even if it's really hot, just set it to where you want it to be.

It's not like the air released gets colder with lower target temperature, because most ACs (according to my teacher, I haven't checked lol) are just a simple control system that turns itself on/off around the target temperature, meaning the time it takes to reach a certain temperature X is independent of the target temperature (as long as it's lower than X)

... which is *embarrassingly* obvious in hindsight.
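
A toy simulation of that claim, assuming the AC really is a fixed-capacity on/off unit as described. All parameters (start temperature, outside temperature, cooling rate, leak coefficient, time step) are made-up illustrative values, not measurements.

```python
def minutes_to_reach(threshold, setpoint, start=32.0, outside=35.0,
                     cooling_rate=0.5, leak=0.02, dt=0.1):
    """Simulate an on/off AC; return minutes until room temp <= threshold."""
    temp, t = start, 0.0
    while temp > threshold:
        ac_on = temp > setpoint                 # thermostat: full power or nothing
        cooling = cooling_rate if ac_on else 0.0
        # Heat leaks in from outside; the AC removes heat at a fixed rate.
        temp += (leak * (outside - temp) - cooling) * dt
        t += dt
        if t > 10_000:
            raise RuntimeError("threshold never reached")
    return t

# Time to get the room down to 24 degrees, with two different set points.
print(minutes_to_reach(24.0, setpoint=16.0))
print(minutes_to_reach(24.0, setpoint=22.0))
```

Both calls print the same number: while the room is above 24, the unit is running at full power under either set point, so the set point only determines where it eventually stops.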

1

Well, he is right about some ACs being simple on/off units.
But there also exist units that can change cycle speed; it's basically the same thing except the motor driving the compression cycle can vary in speed.
In case you were wondering, they are called inverters. And when buying new today, you really should get an inverter (efficiency).

(Note: This was a post, but in retrospect was probably better to be posted as a shortform)

*(Epistemic Status: 20 minutes' worth of thinking, haven't done any **builder/breaker** on this yet although I plan to, and would welcome any attempts in the comments)*

- Have an algorithmic task whose input/output pairs could (in reasonable algorithmic complexity) be generated using a highly specific combination of modular components (e.g., basic arithmetic, combination of random NN module outputs, etc).
- Train a small transformer (or anything, really) on the input/output pairs.
- Take

Quick thoughts on my plans:

- I want to focus on having a better mechanistic picture of agent value formation & distinguishing between hypotheses (e.g., shard theory, Thane Ruthenis's value-compilation hypothesis, etc) and forming my own.
- I think I have a specific but very high uncertainty baseline model of what-to-expect from agent value-formation using greedy search optimization. It's probably time to allocate more resources on reducing that uncertainty by touching reality i.e. running experiments.
- (and also think about related theoretical arguments like

Useful perspective when thinking of mechanistic pictures of agent/value development is to take the "perspective" of different optimizers, consider their relative "power," and how they interact with each other.

E.g., early on SGD is the dominant optimizer, which has the property of (having direct access to feedback from U / greedy). Later on early proto-GPS (general-purpose search) forms, which is less greedy, but still can largely be swayed by SGD (such as having its problem-specification-input tweaked, having the overall GPS-implementation modified, etc). ...

1

This is a very useful approximation at the late-stage when the GPS self-modifies the agent in pursuit of its objective! Rather than having to meticulously think about local SGD gradient incentives and such, since GPS is non-greedy, we can directly model it as doing what's obviously rational from a birds-eye-perspective.
(kinda similar to e.g., separation of timescale when analyzing dynamical systems)

It seems like retrieval-based transformers like RETRO are "obviously" the way to go: (1) there's just no need to store all the factual information as fixed weights, and (2) they use far fewer parameters and much less memory. Maybe mechanistic interpretability should start paying more attention to these types of architectures, especially since they're probably going to be a more relevant form of architecture.

They might also be easier to interpret thanks to specialization!

I've noticed during my alignment study that just the sheer amount of relevant posts out there is giving me a pretty bad habit of (1) passively engaging with the material and (2) not doing much independent thinking. Just keeping up to date & distilling the stuff in my todo read list takes up most of my time.

- I guess the reason I do it is because (at least for me) it takes a ton of mental effort to switch modes between "passive consumption" and "active thinking":
- I noticed this when self-studying math; like, my subjective experience is that I enjoy
*both*"p


6

There are lots of posts but the actual content is very thin. I would say there is plausibly more content in your real analysis book than there is in the entire alignment field.

*Is there a case for AI **gain-of-function** research?*

(Epistemic Status: I don't endorse this yet, just thinking aloud. Please let me know if you want to act/research based on this idea)

It seems like it should be possible to materialize certain forms of AI alignment failure modes with today's deep learning algorithms, if we directly optimize for their discovery. For example, training a Gradient Hacker Enzyme.

A possible benefit of this would be that it gives us bits of evidence wrt how such hypothesized risks would actually manifest in real training environments...

*Random alignment-related idea: train and investigate a "Gradient Hacker Enzyme"*

TL;DR, Use meta-learning methods like MAML to train a network submodule i.e. circuit that would resist gradient updates in a wide variety of contexts (various architectures, hyperparameters, modality, etc), and use mechanistic interpretability to see how it works.

It should be possible to have a training setup for goals other than "resist gradient updates," such as restricting the meta-objective to a specific sub-sub-circuit. In that case, the outer circuit might (1) instrumental...

1

Update: I'm trying to upskill in mechanistic interpretability, and training a Gradient Hacker Enzyme seems like a fairly good project just to get myself started.
I don't think this project would be highly valuable in and of itself (although I would definitely learn a lot!), so one failure mode I need to avoid is ending up investing too much of my time in this idea. I'll probably spend a total of ~1 week working on it.