Thoughtdump on why I'm interested in computational mechanics:
Just read through Robust agents learn causal world models and man it is really cool! It proves a couple of bona fide selection theorems, talking about the internal structure of agents selected against a certain criteria.
Quick paper review of Measuring Goal-Directedness from the causal incentives group.
tl;dr, goal directedness of a policy wrt a utility function is measured by its min distance to one of the policies implied by the utility function, as per the intentional stance - that one should model a system as an agent insofar as doing so is useful.
intuitively, this is measuring: "how close is my policy to being 'deterministic,' while 'optimizing at the competence level ...
EDIT: I no longer think this setup is viable, for reasons that connect to why I think Critch's operationalization is incomplete and why boundaries should ultimately be grounded in Pearlian Causality and interventions. Check update.
I believe there's nothing much in the way of actually implementing an approximation of Critch's boundaries[1] using deep learning.
Recall, Critch's boundaries are:
Perhaps I should one day in the far far future write a sequence on bayes nets.
Some low-effort TOC (this is basically mostly koller & friedman):
Any thoughts on how to customize LessWrong to make it LessAddictive? I just really, really like the editor for various reasons, so I usually write a bunch (drafts, research notes, study notes, etc) using it but it's quite easy to get distracted.
moments of microscopic fun encountered while studying/researching:
Any advice on reducing neck and shoulder pain while studying? For me that's my biggest blocker to being able to focus longer (especially for math, where I have to look down at my notes/book for a long period of time). I'm considering stuff like getting a standing desk or doing regular back/shoulder exercises. Would like to hear what everyone else's setups are.
(Quality: Low, only read when you have nothing better to do—also not much citing)
30-minute high-LLM-temp stream-of-consciousness on "How do we make mechanistic interpretability work for non-transformers, or just any architectures?"
I am curious as to how often the asymptotic results proven using features of the problem that seem basically practically-irrelevant become relevant in practice.
Like, I understand that there are many asymptotic results (e.g., free energy principle in SLT) that are useful in practice, but i feel like there's something sus about similar results from information theory or complexity theory where the way in which they prove certain bounds (or inclusion relationship, for complexity theory) seem totally detached from practicality?
I recently learned about metauni, and it looks amazing. TL;DR, a bunch of researchers give out lectures or seminars on Roblox - Topics include AI alignment/policy, Natural Abstractions, Topos Theory, Singular Learning Theory, etc.
I haven't actually participated in any of their live events yet and only watched their videos, but they all look really interesting. I'm somewhat surprised that there hasn't been much discussion about this on LW!
Discovering agents provide a genuine causal, interventionist account of agency and an algorithm to detect them, motivated by the intentional stance. I find this paper very enlightening from a conceptual perspective!
I've tried to think of problems that needed to be solved before we can actually implement this on real systems - both conceptual and practical - on approximate order of importance.
Complaint with Pugh's real analysis textbook: He doesn't even define the limit of a function properly?!
It's implicitly defined together with the definition of continuity where , but in Chapter 3 when defining differentiability he implicitly switches the condition to without even mentioning it (nor the requirement that now needs to be an accumulation point!) While Pugh has its own benefits, coming from Terry Tao's analysis textbook backgrou...
I used to try out near-random search on ideaspace, where I made a quick app that spat out 3~5 random words from a dictionary of interesting words/concepts that I curated, and I spent 5 minutes every day thinking very hard on whether anything interesting came out of those combinations.
Of course I knew random search on exponential space was futile, but I got a couple cool invention ideas (most of which turned out to already exist), like:
Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia with my inner monologue going “oh wow! I didn't know this dimension of qualia was a thing.” examples:
To me, the fact that the human brain basically implements SSL+RL is very very strong evidence that the current DL paradigm (with a bit of "engineering" effort, but nothing like fundamental breakthroughs) will kinda just keep scaling until we reach point-of-no-return. Does this broadly look correct to people here? Would really appreciate other perspectives.
I wonder if the following is possible to study textbooks more efficiently using LLMs:
When I study textbooks, I spend a significant amount of time improving my mental autocompletion, like being able to familiari...
What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.
The MIRI Research Guide recommends An Introduction to Decision Theory and Game Theory: An Introduction. I have read neither and am simply relaying the recommendation.
i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.
God, I wish real analysis was at least half as elegant as any other math subject — way too much pathological examples that I can't care less about. I've heard some good things about constructivism though, hopefully analysis is done better there.
Yeah, real analysis sucks. But you have to go through it to get to delightful stuff— I particularly love harmonic and functional analysis. Real analysis is just a bunch of pathological cases and technical persnicketiness that you need to have to keep you from steering over a cliff when you get to the more advanced stuff. I’ve encountered some other subjects that have the same feeling to them. For example, measure-theoretic probability is a dry technical subject that you need to get through before you get the fun of stochastic differential equations. Same with commutative algebra and algebraic geometry, or point-set topology and differential geometry.
Constructivism, in my experience, makes real analysis more mind blowing, but also harder to reason about. My brain uses non-constructive methods subconsciously, so it’s hard for me to notice when I’ve transgressed the rules of constructivism.
There were various notions/frames of optimization floating around, and I tried my best to distill them:
I find the intersection of computational mechanics, boundaries/frames/factored-sets, and some works from the causal incentives group - especially discovering agents and robust agents learn causal world model (review) - to be a very interesting theoretical direction.
By boundaries, I mean a sustaining/propagating system that informationally/causally insulates its 'viscera' from the 'environment,' and only allows relatively small amounts of deliberate information flow through certain channels in both directions. Living systems are an example of it (from bacte...
Does anyone know if Shannon arrive at entropy from the axiomatic definition first, or the operational definition first?
I've been thinking about these two distinct ways in which we seem to arrive at new mathematical concepts, and looking at the countless partial information decomposition measures in the literature all derived/motivated based on an axiomatic basis, and not knowing which intuition to prioritize over which, I've been assigning less premium on axiomatic conceptual definitions than i used to:
'Symmetry' implies 'redundant coordinate' implies 'cyclic coordinates in your Lagrangian / Hamiltonian' implies 'conservation of conjugate momentum'
And because the action principle (where the true system trajectory extremizes your action, i.e. integral of Lagrangian) works in various dynamical systems, the above argument works in non-physical dynamical systems.
Thus conserved quantities usually exist in a given dynamical system.
mmm, but why does the action principle hold in such a wide variety of systems though? (like how you get entropy by postulating something to be maximized in an equilibrium setting)
Mildly surprised how some verbs/connectives barely play any role in conversations, even in technical ones. I just tried directed babbling with someone, and (I think?) I learned quite a lot about Israel-Pakistan relations with almost no stress coming from eg needing to make my sentences grammatically correct.
Example of (a small part of) my attempt to summarize my understanding of how Jews migrated in/out of Jerusalem over the course of history:
...They here *hand gesture on air*, enslaved out, they back, kicked out, and boom, they everywhere.
(audience nods, giv
Why haven't mosquitos evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes probably) Or are they evolving towards that direction? (they of course already evolved towards being less itchy while biting, but not enough to make that lack-of-itch permanent)
this is a request for help i've been trying and failing to catch this one for god knows how long plz halp
tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitch noise away, modulo disease risk considerations.
The reason mosquito bites itch is because they are injecting saliva into your skin. Saliva contains mosquito antigens, foreign particles that your body has evolved to attack with an inflammatory immune response that causes itching. The compound histamine is a key signaling molecule used by your body to drive this reaction.
In order for the mosquito to avoid provoking this reaction, they would either have to avoid leaving compounds inside of your body, or mutate those compounds so that they do not provoke an immune response. The human immune system is an adversarial opponent designed with an ability to recognize foreign particles generally. If it was tractable for organisms to reliably evolve to avoid provoking this response, that would represent a fundamental vulnerability in the human immune system.
Mosquitoe saliva does in fact contain anti-inflammatory, antihemostatic, and immunomodulatory compounds. So they're trying! But also this means that mosquitos are evolved to put saliva inside of you when they feed, which means they're inevitably going to expose the foreign particles they produce to your immune system.
There's also a facet of selection bias making mosquitos appear unsucces...
Just noticing that the negation of a statement exists is enough to make meaningful updates.
e.g. I used to (implicitly) think "Chatbot Romance is weird" without having evaluated anything in-depth about the subject (and consequently didn't have any strong opinions about it)—probably as a result of some underlying cached belief.
But after seeing this post, just reading the title was enough to make me go (1) "Oh! I just realized it is perfectly possible to argue in favor of Chatbot Romance ... my belief on this subject must be a cached belief!" (2) hence ...
People mean different things when they say "values" (object vs meta values)
I noticed that people often mean different things when they say "values," and they end up talking past each other (or convergence only happens after a long discussion). One of the difference is in whether they contain meta-level values.
tl;dr, the unidimensional continuity of preference assumption in the money pumping argument used to justify the VNM axioms correspond to the assumption that there exists some unidimensional "resource" that the agent cares about, and this language is provided by the notion of "souring / sweetening" a lottery.
Various coherence theorems - or more specifically, various money pumping arguments generally have the following form:
...If you violate this principle, then [you are rationally re
Damn, why did Pearl recommend readers (in the preface of his causality book) to read all the chapters other than chapter 2 (and the last review chapter)? Chapter 2 is literally the coolest part - inferring causal structure from purely observational data! Almost skipped that chapter because of it ...
Bayes Net inference algorithms maintain its efficiency by using dynamic programming over multiple layers.
Level 0: Naive Marginalization
Level 1: Variable Elimination
Level 2: Clique-tree...
Man, deviation arguments are so cool:
One of the rare insightful lessons from high school: Don't set your AC to the minimum temperature even if it's really hot, just set it to where you want it to be.
It's not like the air released gets colder with lower target temperature, because most ACs (according to my teacher, I haven't checked lol) are just a simple control system that turns itself on/off around the target temperature, meaning the time it takes to reach a certain temperature X is independent of the target temperature (as long it's lower than X)
... which is embarrassingly obvious in hindsight.
(Note: This was a post, but in retrospect was probably better to be posted as a shortform)
(Epistemic Status: 20-minute worth of thinking, haven't done any builder/breaker on this yet although I plan to, and would welcome any attempts in the comment)
Quick thoughts on my plans:
Useful perspective when thinking of mechanistic pictures of agent/value development is to take the "perspective" of different optimizers, consider their relative "power," and how they interact with each other.
E.g., early on SGD is the dominant optimizer, which has the property of (having direct access to feedback from U / greedy). Later on early proto-GPS (general-purpose search) forms, which is less greedy, but still can largely be swayed by SGD (such as having its problem-specification-input tweaked, having the overall GPS-implementation modified, etc). ...
It seems like retrieval-based transformers like RETRO is "obviously" the way to go—(1) there's just no need to store all the factual information as fixed weights, (2) and it uses much less parameter/memory. Maybe mechanistic interpretability should start paying more attention to these type of architectures, especially since they're probably going to be a more relevant form of architecture.
They might also be easier to interpret thanks to specialization!
I've noticed during my alignment study that just the sheer amount of relevant posts out there is giving me a pretty bad habit of (1) passively engaging with the material and (2) not doing much independent thinking. Just keeping up to date & distilling the stuff in my todo read list takes up most of my time.
Is there a case for AI gain-of-function research?
(Epistemic Status: I don't endorse this yet, just thinking aloud. Please let me know if you want to act/research based on this idea)
It seems like it should be possible to materialize certain forms of AI alignment failure modes with today's deep learning algorithms, if we directly optimize for their discovery. For example, training a Gradient Hacker Enzyme.
A possible benefit of this would be that it gives us bits of evidence wrt how such hypothesized risks would actually manifest in real training environments...
Random alignment-related idea: train and investigate a "Gradient Hacker Enzyme"
TL;DR, Use meta-learning methods like MAML to train a network submodule i.e. circuit that would resist gradient updates in a wide variety of contexts (various architectures, hyperparameters, modality, etc), and use mechanistic interpretability to see how it works.
It should be possible to have a training setup for goals other than "resist gradient updates," such as restricting the meta-objective to a specific sub-sub-circuit. In that case, the outer circuit might (1) instrumental...