This is a special post for short-form writing by Dalcy Bremin. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

Any advice on reducing neck and shoulder pain while studying? For me that's my biggest blocker to being able to focus longer (especially for math, where I have to look down at my notes/book for a long period of time). I'm considering stuff like getting a standing desk or doing regular back/shoulder exercises. Would like to hear what everyone else's setups are.

Train the skill of noticing tension, and focus on it; it tends to dissolve. Not a very satisfying answer, but it works. A standing desk can help, but it's just not that comfortable for most people.

weight training?

I still have lots of neck and shoulder tension, but the only thing I've found that can reliably lessen it is doing some hard work on a punching bag for about 20 minutes every day, especially hard straights and jabs with full extension.

I've used Pain Science in the past as a resource and highly, highly endorse it. Here is an article they have on neck pain.

moments of microscopic fun encountered while studying/researching:

  • Quantum mechanics calls elements of a vector space & its dual kets/bras because ... bra-c-ket. What can I say? I like it. But where did the letter 'c' go, Dirac?
  • Defining Cauchy sequences and limits in real analysis: it's really cool how you "bootstrap" the definition of Cauchy sequences / limits on the reals using the definition of Cauchy sequences / limits on the rationals. Basically:
    • (1) define Cauchy sequence on rationals
    • (2) use it to define limit (on rationals) using rational-Cauchy
    • (3) use it to define reals
    • (4) use it to define Cauchy sequence on reals
    • (5) show it's consistent with Cauchy sequence on rationals in both directions
      • a. rationals are embedded in reals hence the real-Cauchy definition subsumes rational-Cauchy definition
      • b. for any positive real number you can always find a positive rational number smaller than it, hence a sequence being rational-Cauchy means it is also real-Cauchy
    • (6) define limit (on reals)
    • (7) show it's consistent with limit on rationals
    • (8) ... and that they're equivalent to real-Cauchy
    • (9) proceed to ignore the distinction b/w real-Cauchy/limit and their rational counterpart. Slick!
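A small sketch of steps (1)-(3) and (5b) with exact rational arithmetic in Python. The concrete sequence (Newton's iteration for √2, starting at 1) and all function names are my own illustrative choices, not from the post: every term is rational and the terms bunch together, so the sequence is rational-Cauchy, yet no rational squares to 2, which is exactly why step (3) builds the reals out of such sequences. `rational_below` is the fact used in (5b).

```python
from fractions import Fraction
import math

def newton_sqrt2(n_terms):
    """First n_terms of x_0 = 1, x_{k+1} = (x_k + 2/x_k) / 2, computed exactly in Q."""
    x = Fraction(1)
    terms = [x]
    for _ in range(n_terms - 1):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

def tail_spread(terms, start):
    """Largest |x_m - x_n| among terms[start:]; for a Cauchy sequence this shrinks as start grows."""
    tail = terms[start:]
    return max(abs(a - b) for a in tail for b in tail)

def rational_below(eps):
    """Return a positive rational strictly below a positive real eps (step 5b)."""
    eps = Fraction(eps)  # exact conversion of an int or float
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = math.floor(1 / eps) + 1  # n > 1/eps, hence 1/n < eps
    return Fraction(1, n)

terms = newton_sqrt2(6)
print(terms[:4])  # [Fraction(1, 1), Fraction(3, 2), Fraction(17, 12), Fraction(577, 408)]
print(any(t * t == 2 for t in terms))  # False; indeed no rational squares to 2
```

Comparing a `Fraction` against a float such as `1e-5` is exact in Python, and converting `eps` to a `Fraction` before computing `1/eps` avoids floating-point rounding in the bound.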

(will probably keep updating this in the replies)

(Quality: Low, only read when you have nothing better to do—also not much citing)

30-minute high-LLM-temp stream-of-consciousness on "How do we make mechanistic interpretability work for non-transformers, or just any architectures?"

  • We want a general way to reverse engineer circuits
    • e.g., Should be able to rediscover properties we discovered from transformers
  • Concrete Example: we spent a bunch of effort reverse engineering transformer-type architectures, then boom, suddenly some parallel-GPU-friendly-LSTM architecture turns out to have better scaling properties, and everyone starts using it. LSTMs have different inductive biases, like things in the same layer being able to communicate multiple times with each other (unlike transformers), which incentivizes e.g., reusing components (more search-y?).
  • Formalize:
    • You have task X. You train a model A with inductive bias I_A. You also train a model B with inductive bias I_B. Your mechanistic interpretability techniques work well on deciphering A, but not B. You want your mechanistic interpretability techniques to work well for B, too.
  • Proposal: Communication channel
    • Train a Transformer on task X
      • Existing Mechanistic interpretability work does well on interpreting this architecture
    • Somehow stitch the LSTM to the transformer (?)
      • I'm trying to get at the idea of "interface conversion": by virtue of SGD being greedy, it will try to convert the LSTM's outputs into transformer-friendly types
    • Now you can better understand the intermediate outputs of the LSTM by just running mechanistic interpretability on the transformer layers whose inputs come from the LSTM
    • (I don't know if I'm making any sense here, my LLM temp is > 1)
  • Proposal: approximation via large models?
    • Train a larger transformer architecture to approximate the smaller LSTM model (either just input output pairs, or intermediate features, or intermediate features across multiple time-steps, etc):
      • the basic idea is that a smaller model would be more subject to following its natural gradient shaped by the inductive bias, while a larger model (with direct access to the intermediate outputs of the smaller model) would be able to approximate it despite not having as much inductive bias incentive towards it.
        • probably false but illustrative example: Train a small LSTM on chess. By virtue of being able to run serial computation on the same layers, it focuses on algorithms that have repeating modular parts. In contrast, a small Transformer would learn algorithms that don't have such repeating modular parts. But instead, train a large transformer to "approximate" the small LSTM; it should be able to do so by, e.g., inefficiently having identical modules across multiple layers. Now use mechanistic interpretability on that.
  • Proposal: redirect GPS?
    • Thane's value formation picture says GPS (general-purpose search) should be incentivized to reverse-engineer the heuristics because it has access to the inter-heuristic communication channel. Maybe, in the middle of training, gradually swap different parts of the model with ones that have different inductive biases, watch GPS gradually learn to reverse-engineer those, mechanistically interpret how exactly GPS does that, and reimplement it in human code?
  • Proposal: Interpretability techniques based on behavioral constraints
    • e.g., Discovering Latent Knowledge without Supervision, putting constraints?
  • How do we "back out" inductive biases, just given e.g., architecture, training setup? What is the type signature?
    • (I need to read more literature)
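To make "putting constraints" concrete: assuming it refers to the contrast-consistent search (CCS) objective from the Discovering Latent Knowledge paper, the constraint is purely behavioral. A probe's probabilities for a statement and its negation should sum to 1, and should not both sit at 0.5. Below is a minimal pure-Python sketch of that loss (function names are hypothetical; the paper trains the probe by gradient descent on normalized hidden states, which is omitted here).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probe(weights, bias, features):
    """Linear probe p(x) = sigmoid(w . phi(x) + b) on a hidden-state feature vector phi(x)."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)

def ccs_loss(p_pos, p_neg):
    """Mean CCS loss over paired probabilities p(x+), p(x-) for statements and their negations."""
    if len(p_pos) != len(p_neg) or not p_pos:
        raise ValueError("need equally many, non-zero, paired probabilities")
    total = 0.0
    for a, b in zip(p_pos, p_neg):
        consistency = (a - (1.0 - b)) ** 2  # p(x+) should equal 1 - p(x-)
        confidence = min(a, b) ** 2         # penalizes the degenerate answer p = 0.5 for both
        total += consistency + confidence
    return total / len(p_pos)

print(ccs_loss([0.9], [0.1]))  # consistent and confident: small loss
print(ccs_loss([0.5], [0.5]))  # consistent but degenerate: penalized by the confidence term
```

Nothing in the loss refers to labels, which is why it could in principle transfer across architectures: it only constrains what a probe on the model's internals must output.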

I used to try out near-random search on ideaspace, where I made a quick app that spat out 3~5 random words from a dictionary of interesting words/concepts that I curated, and I spent 5 minutes every day thinking very hard on whether anything interesting came out of those combinations.

Of course I knew random search on exponential space was futile, but I got a couple cool invention ideas (most of which turned out to already exist), like:

  • infinite indoor rock climbing: attach rocks to a vertical treadmill, and now you have an infinite indoor rock climbing wall (which is also safe from falling)! maybe add some fancy mechanism to add variations to the rocks + a VR headset, I guess.
  • clever crypto mechanism design (in the spirit of CO2 Coin) to incentivize crowdsourcing of age-reduction molecule design animal trials from the public. (I know what you're thinking)
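The daily word generator described above needs very little code. A minimal sketch, assuming a hand-curated list (the entries in `CONCEPTS` are hypothetical placeholders) and uniform sampling of 3 to 5 distinct words:

```python
import random

# Hypothetical stand-in for the hand-curated dictionary of interesting words/concepts.
CONCEPTS = [
    "treadmill", "rock climbing", "virtual reality", "cryptocurrency",
    "crowdsourcing", "aging", "animal trials", "carbon credits",
]

def daily_prompt(concepts, rng, k_min=3, k_max=5):
    """Draw k distinct concepts uniformly at random, with k itself uniform in [k_min, k_max]."""
    k = rng.randint(k_min, k_max)  # randint is inclusive on both ends
    return rng.sample(concepts, k)

rng = random.Random()  # pass a fixed seed for reproducible draws
print(", ".join(daily_prompt(CONCEPTS, rng)))
```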

You can probably do this smarter now if you wanted, with eg better GPT models.

Having lived ~19 years, I can distinctly remember around 5~6 times when I explicitly noticed myself experiencing totally new qualia with my inner monologue going “oh wow! I didn't know this dimension of qualia was a thing.” examples:

  • hard-to-explain sense that my mind is expanding horizontally with fractal cube-like structures (think bismuth) forming around it and my subjective experience gliding along its surface which lasted for ~5 minutes after taking zolpidem for the first time to sleep (2 days ago)
  • getting drunk for the first time (half a year ago)
  • feeling absolutely euphoric after having a cool math insight (a year ago)
  • ...

Reminds me of myself around a decade ago, completely incapable of understanding why my uncle smoked, being "huh? The smoke isn't even sweet, why would you want to do that?" Now that I have [addiction-to-X] as a clear dimension of qualia/experience solidified in myself, I can better model their subjective experiences although I've never smoked myself. Reminds me of the SSC classic.

Also, one observation is that the rate at which I acquire these seems to be getting faster, probably because of an increase in self-awareness + the increased option space as I reach adulthood (like being able to drink).

Anyways, I think it’s really cool, and can’t wait for more.

I observed new visual qualia of colors while using some light machine.

Also, when I first came to Italy, I had a feeling as if the whole rainbow of color qualia had changed.

Sunlight scattered by the atmosphere on cloudless mornings during the hour before sunrise inspires a subtle feeling ("this is cool, maybe even exciting") that I never noticed till I started intentionally exposing myself to it for health reasons (specifically, making it easier to fall asleep 18 hours later).

More precisely, I might or might not have noticed the feeling, but if I did notice it, I quickly forgot about it because I had no idea how to reproduce it.

I have to get away from artificial light (streetlamps) (and from direct (yellow) sunlight) for the (blue) indirect sunlight to have this effect. Also, it is no good looking at a small patch of sky, e.g., through a window in a building: most or all of the upper half of my field of vision must be receiving this indirect sunlight. (The intrinsically-photosensitive retinal ganglion cells are all over the bottom half of the retina, but absent from the top half.)

To me, the fact that the human brain basically implements SSL+RL is very very strong evidence that the current DL paradigm (with a bit of "engineering" effort, but nothing like fundamental breakthroughs) will kinda just keep scaling until we reach point-of-no-return. Does this broadly look correct to people here? Would really appreciate other perspectives.

I mostly think “algorithms that involve both SSL and RL” is a much broader space of possible algorithms than you seem to think it is, and thus that there are parts of this broad space that require “fundamental breakthroughs” to access. For example, both AlexNet and differentiable rendering can be used to analyze images via supervised learning with gradient descent. But those two algorithms are very very different from each other! So there’s more to an algorithm than its update rule.

See also 2nd section of this comment, although I was emphasizing alignment-relevant differences there whereas you’re talking about capabilities. Other things include the fact that if I ask you to solve a hard math problem, your brain will be different (different weights, not just different activations / context) when you’re halfway through compared to when you started working on it (a.k.a. online learning, see also here), and the fact that brain neural networks are not really “deep” in the DL sense. Among other things.

Makes sense. I think we're using the terms differently in scope. By "DL paradigm" I meant to encompass the kind of stuff you mentioned (RL-directing-SS-target (active learning), online learning, different architecture, etc.) because they really seemed like "engineering challenges" to me (despite covering a broad space of algorithms), in the sense that capabilities researchers already seem to be working on & scaling them without facing any apparent blockers to further progress, i.e., without needing any "fundamental breakthroughs." By that I was pointing more at paradigm shifts away from DL like, idk, symbolic learning.

I have a slightly different takeaway.  Yes techniques similar to current techniques will most likely lead to AGI but it's not literally 'just scaling LLMs'. The actual architecture of the brain is meaningfully different from what's being deployed right now. So different in one sense. On the other hand it's not like the brain does something completely different and proposals that are much closer to the brain architecture are in the literature (I won't name them here...). It's plausible that some variant on that will lead to true AGI. Pure hardware scaling obviously increases capabilities in a straightforward way but a transformer is not a generally intelligent agent and won't be even if scaled many more OOMs. 

(I think Steven Byrnes has a similar view but I wouldn't want to misrepresent his views)

a transformer is not a generally intelligent agent and won't be even if scaled many more OOMs

So far as I can tell, a transformer has three possible blockers (that would need to stand undefeated together): (1) in-context learning plateauing at a level where it's not able to do even a little bit of useful work without changing model weights, (2) terrible sample efficiency that asks for more data than is available on new or rare/situational topics, and (3) absence of a synthetic data generation process that's both sufficiently prolific and known not to be useless at that scale.

A need for online learning and terrible sample efficiency are defeated by OOMs if enough useful synthetic data can be generated, which the anemic in-context learning without changing weights might turn out to be sufficient for. This is the case of defeating (3), with others falling as a result.

Another possibility is that much larger multimodal transformers (there is a lot of video) might suffice without synthetic data if a model learns superintelligent in-context learning. SSL is not just about imitating humans, the problems it potentially becomes adept at solving are arbitrarily intricate. So even if it can't grow further and learn substantially new things within its current architecture/model, it might happen to already be far enough along at inference time to do the necessary redesign on its own. This is the case of defeating (1), leaving it to the model to defeat the others. And it should help with (3) even at non-superintelligent levels.

Failing that, RL demonstrates human level sample efficiency in increasingly non-toy settings, promising that saner amounts of useful synthetic data might suffice, defeating (2), though at this point it's substantially not-a-transformer.

generating useful synthetic data and solving novel tasks with little correlation with training data is the exact issue here. Seems straightforwardly true that a transformer architecture doesn't do that?

I don't know what superintelligent in-context learning is. I'd be skeptical that scaling a transformer a further 3 OOMs will suddenly make it do tasks that are very far from the text distribution it is trained on, indeed solutions to tasks that are not even remotely in the internet text data, like building a recursively self-improving agent (if such a thing is possible...)? Maybe I'm misunderstanding what you're claiming here.

Not saying it's impossible, just seems deeply implausible. ofc LLMs being so impressive was also a priori implausible, but this seems another OOM of implausibility bits, if that makes sense?

generating useful synthetic data and solving novel tasks with little correlation with training data is the exact issue here. Seems straightforwardly true that a transformer architecture doesn't do that?

I'm imagining some prompts to generate reasoning, inferred claims about the world. You can't generate new observations about the world, but you can reason about the observations available so far, and having those inferred claims in the dataset likely helps; that's how humans build intuition about theory. If on average 1000 inferred claims are generated for every naturally observed statement (or just for those on rare/new/situational topics), that could close the gap in sample efficiency with humans. Might take the form of exercises or essays or something.

If this is all done with prompts, using a sufficiently smart order-following chatbot, then it's straightforwardly just a transformer, with some superficial scaffolding. If this can work, it'll eventually appear in distillation literature, though I'm not sure if serious effort to check was actually made with current SOTA LLMs, to pre-train exclusively on synthetic data that's not too simplistically prompted. Possibly you get nothing for a GPT-3 level generator, and then something for GPT-4+, because reasoning needs to be good enough to preserve contact with ground truth. From Altman's comments I get the impression that it's plausibly the exact thing OpenAI is hoping for.

I don't know what superintelligent in-context learning is

In-context learning is capability to make use of novel data that's only seen in a context, not in pre-training, to do tasks that make use of this novel data, in ways that normally would've been expected to require it being seen in pre-training. In-context learning is a model capability, it's learned. So its properties are not capped by those of the hardcoded model training algorithm, notably in principle in-context learning could have higher sample efficiency (which might be crucial for generating a lot of synthetic data out of a few rare observations). Right now it's worse in most respects, but that could change with scale without substantially modifying the transformer architecture, which is the premise of this thread.

By superintelligent in-context learning I mean the capabilities of in-context learning significantly exceeding those of humans. Things like fully comprehending a new paper without changing any model weights, becoming able to immediately write the next one in the same context window. I agree that it's not very plausible, and probably can't happen without sufficiently deep circuits, which even deep networks don't seem to normally develop. But it's not really ruled out by anything that's been tried so far. Recent stuff on essentially pre-training with some frozen weights without losing resulting performance suggests a trend of increasing feasible model size for given compute. So I'm not sure this can't be done in a few years. Then there's things like memory transformers, handing a lot more data than a context to a learned learning capability.

Complaint with Pugh's real analysis textbook: He doesn't even define the limit of a function properly?!

It's implicitly defined together with the definition of continuity, where $|x - p| < \delta$ (so $x = p$ is allowed), but in Chapter 3 when defining differentiability he implicitly switches the condition to $0 < |x - p| < \delta$ without even mentioning it (nor the requirement that $p$ now needs to be an accumulation point!) While Pugh has its own benefits, coming from a background in Terry Tao's analysis textbooks, this is absurd!

(though to be fair Terry Tao has the exact same issue in Book 2, where his definition of function continuity via limits in metric spaces precedes his definition of limits in general ... the only redeeming factor is that limits are defined rigorously in Book 1, in the limited context of $\mathbb{R}$)
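For reference, here are the two conditions at issue in standard notation, for $f : E \to \mathbb{R}$ with $E \subseteq \mathbb{R}$ and $p \in E$ (the symbols are my own, not necessarily Pugh's or Tao's). If $p$ is not an accumulation point of $E$, the punctured condition is vacuously satisfied by every $L$, so the limit would not be unique; that is why the accumulation-point requirement matters.

```latex
% Continuity at p (x = p allowed):
\forall \varepsilon > 0\ \exists \delta > 0\ \forall x \in E:\quad
  |x - p| < \delta \implies |f(x) - f(p)| < \varepsilon

% Limit at p (x = p excluded; p must be an accumulation point of E):
\lim_{x \to p} f(x) = L \iff \forall \varepsilon > 0\ \exists \delta > 0\ \forall x \in E:\quad
  0 < |x - p| < \delta \implies |f(x) - L| < \varepsilon

% Derivative: the difference quotient is undefined at x = p, so only the punctured version makes sense
f'(p) = \lim_{x \to p} \frac{f(x) - f(p)}{x - p}
```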

*sigh* I guess we're still pretty far from reaching the Pareto Frontier of textbook quality, at least in real analysis.

... Speaking of Pareto Frontiers, would anyone say there is such a textbook that is close to that frontier, at least in a different subject? Would love to read one of those.

Maybe you should email Pugh with the feedback? (I audited his honors analysis course in fall 2017; he seemed nice.)

As far as the frontier of analysis textbooks goes, I really like how Schröder Mathematical Analysis manages to be both rigorous and friendly: the early chapters patiently explain standard proof techniques (like the add-and-subtract triangle inequality gambit) to the novice who hasn't seen them before, but the punishing details of the subject are in no way simplified. (One wonders if the subtitle "A Concise Introduction" was intended ironically.)

I wonder if the following is possible to study textbooks more efficiently using LLMs:

  • Feed the entire textbook to the LLM and produce a list of summaries that increases in granularity and length, covering all the material in the textbook just at a different depth (eg proofs omitted, further elaboration on high-level perspectives, etc)
  • The student starts from the highest-level summary, and gradually moves to the more granular materials.

When I study textbooks, I spend a significant amount of time improving my mental autocompletion, e.g., familiarizing myself with the terminology, which words or proof styles usually come up in which context, etc. Doing this seems to significantly improve my ability to read e.g. long proofs, since I can ignore all the pesky details (trusting my mental autocompletion to fill them in later if needed) and allocate my effort to getting a high-level view of the proof.

Textbooks don't really admit this style of learning, because the students don't have prior knowledge of all the concept-dependencies of a new subject they're learning, and thus are forced to start at the lowest-level and make their way up to the high-level perspective.

Perhaps LLMs will let us reverse this direction, instead going from the highest to the lowest.
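A minimal sketch of the two-step workflow above, assuming a hypothetical `summarize(text, max_words)` callable that would wrap whatever LLM one uses; here it is stubbed with plain truncation so the structure runs on its own. The result is a "ladder" of per-chapter summaries ordered coarsest first, which the student reads top to bottom.

```python
from typing import Callable, List

# Hypothetical interface: (chapter text, word budget) -> summary.
Summarizer = Callable[[str, int], str]

def truncate_stub(text: str, max_words: int) -> str:
    """Stand-in for an LLM call: keep only the first max_words words."""
    return " ".join(text.split()[:max_words])

def build_summary_ladder(
    chapters: List[str],
    budgets: List[int],
    summarize: Summarizer = truncate_stub,
) -> List[List[str]]:
    """One list of per-chapter summaries per level, smallest word budget (coarsest) first."""
    ladder = []
    for budget in sorted(budgets):
        ladder.append([summarize(ch, budget) for ch in chapters])
    return ladder

chapters = ["a b c d e f g h", "one two three four five"]
for level in build_summary_ladder(chapters, [6, 2]):
    print(level)
```

Sorting the budgets keeps the coarsest-first order regardless of input order. A real summarizer would presumably also condition each level on the one above it, which this sketch does not attempt.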

What's a good technical introduction to Decision Theory and Game Theory for alignment researchers? I'm guessing standard undergrad textbooks don't include, say, content about logical decision theory. I've mostly been reading posts on LW but as with most stuff here they feel more like self-contained blog posts (rather than textbooks that build on top of a common context) so I was wondering if there was anything like a canonical resource providing a unified technical / math-y perspective on the whole subject.

The MIRI Research Guide recommends "An Introduction to Decision Theory" and "Game Theory: An Introduction". I have read neither and am simply relaying the recommendation.

i absolutely hate bureaucracy, dumb forms, stupid websites etc. like, I almost had a literal breakdown trying to install Minecraft recently (and eventually failed). God.

I think what's so crushing about it is that it reminds me that the wrong people are designing things, and that they won't allow them to be fixed, and I can only find solace in thinking that the inefficiency of their designs is also a sign that they can be defeated.

God, I wish real analysis was at least half as elegant as any other math subject: way too many pathological examples that I couldn't care less about. I've heard some good things about constructivism though; hopefully analysis is done better there.

Yeah, real analysis sucks.  But you have to go through it to get to delightful stuff— I particularly love harmonic and functional analysis.  Real analysis is just a bunch of pathological cases and technical persnicketiness that you need to have to keep you from steering over a cliff when you get to the more advanced stuff.  I’ve encountered some other subjects that have the same feeling to them.  For example, measure-theoretic probability is a dry technical subject that you need to get through before you get the fun of stochastic differential equations.  Same with commutative algebra and algebraic geometry, or point-set topology and differential geometry.

Constructivism, in my experience, makes real analysis more mind blowing, but also harder to reason about.  My brain uses non-constructive methods subconsciously, so it’s hard for me to notice when I’ve transgressed the rules of constructivism.

As a general reflection on undergraduate mathematics, imho there is way too much emphasis on real analysis. Yes, knowing how to be rigorous is important, being aware of pathological counterexamples is important, and real analysis is used all over the place. But there is so much more to learn in mathematics than real analysis, and the focus on minor technical issues here is often a distraction from developing a broad & deep mathematical background.

For most mathematicians (and scientists using serious math) real analysis is only a small part of the toolkit. Understanding the different kinds of limits well can ofc be crucial in functional analysis, stochastic processes and various parts of physics. But there are so many other important topics to know and learn!

The reason it is so prominent in the undergraduate curriculum seems to be more tied to institutional inertia, its prominence on centralized exams, relation with calculus, etc

Update: huh, nonstandard analysis is really cool. Not only are things much more intuitive (using infinitesimals from the hyperreals instead of the epsilon-delta formulation for everything), but by the transfer principle a first-order statement holds in standard analysis exactly when its nonstandard counterpart holds!
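A standard illustration of the "more intuitive" point (the worked example is mine, not from the post): if a real function $f$ is differentiable at a real $x$, then for any nonzero infinitesimal $\varepsilon$ the derivative is the standard part of the difference quotient, where $f$ strictly means its natural extension to the hyperreals. No limits appear, just algebra followed by discarding the infinitesimal.

```latex
% Derivative via a nonzero infinitesimal \varepsilon and the standard-part map \operatorname{st}:
f'(x) = \operatorname{st}\!\left(\frac{f(x + \varepsilon) - f(x)}{\varepsilon}\right)

% Example, f(x) = x^2:
\frac{(x + \varepsilon)^2 - x^2}{\varepsilon}
  = \frac{2x\varepsilon + \varepsilon^2}{\varepsilon}
  = 2x + \varepsilon,
\qquad \operatorname{st}(2x + \varepsilon) = 2x
```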

There were various notions/frames of optimization floating around, and I tried my best to distill them:

  • Eliezer's Measuring Optimization Power on unlikelihood of outcome + agent preference ordering
  • Alex Flint's The ground of optimization on robustness of system-as-a-whole evolution
  • Selection vs Control as distinguishing different types of "space of possibilities"
    • Selection as having that space explicitly given & selectable numerous times by the agent
    • Control as having that space only given in terms of counterfactuals, and the agent can access it only once.
    • These distinctions correlate with the type of algorithm being used & its internal structure, where Selection uses a more search-like process using maps, while Control may just use an explicit formula ... although it may very well use internal maps to Select on counterfactual outcomes!
      • In other words, whether something counts as Selection or Control may very well depend on which "space of possibilities" our Analysis focuses on. Example:
        • If we decide to focus our Analysis of "space of possibilities" on eg "Real life outcome," then a guided missile is always Control.
        • But if we decide to focus on "space of internal representation of possibilities," then a guided missile that searches over an internal map becomes Selection.
  • "Internal Optimization" vs "External Optimization"
    • Similar to Selection vs Control, but the analysis focuses more on internal structure:
      • Why? Motivated by the fact that, as with the guided missile example, Control systems can be viewed as Selection systems depending on perspective
      • ... hence, better to focus on internal structures where it's much less ambiguous.
    • IO: Internal search + selection
    • EO: Flint's definition of "optimizing system"
      • IO is included in EO, if we assume accurate map-to-environment correspondence.
    • To me, this doesn't really get at what the internals of actually-control-like systems look like, which are presumably a subset of EO - IO.
  • Search-in-Territory vs Search-in-Map
    • Greater emphasis on internal structure—specifically, "maps."
    • Maps are a capital investment, allowing you to optimize despite not knowing exactly what to optimize for (by compressing info)
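Eliezer's measure from the first bullet reduces to a formula: optimization power, in bits, is −log2 of the fraction of possible outcomes ranked at least as high (under the agent's preference ordering) as the outcome actually achieved. A minimal Python sketch, assuming a finite outcome space with numeric preference scores and optional weights, e.g. probabilities under some baseline distribution; the function name is hypothetical.

```python
import math

def optimization_power_bits(outcome_scores, achieved_score, weights=None):
    """-log2 of the (weighted) fraction of outcomes ranked at least as high as the achieved one."""
    if weights is None:
        weights = [1.0] * len(outcome_scores)  # uniform baseline over outcomes
    total = sum(weights)
    at_least_as_good = sum(
        w for s, w in zip(outcome_scores, weights) if s >= achieved_score
    )
    if at_least_as_good == 0:
        raise ValueError("achieved outcome must be among the possible outcomes")
    return -math.log2(at_least_as_good / total)

# Hitting the single best of 1024 equally likely outcomes is 10 bits of optimization.
print(optimization_power_bits(list(range(1024)), 1023))  # 10.0
```

Note that only the ordering and the baseline measure matter, not how the outcome was reached, which is part of why this framing says little about internal structure.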

I have several thoughts on these framings, but one trouble is the excessive usage of words to represent "clusters" i.e. terms to group a bunch of correlated variables. Selection vs Control, for example, doesn't have a clear definition/criteria but rather points at a number of correlated things, like internal structure, search, maps, control-like things, etc.

Sure, deconfusing and pointing out clusters is useful because clusters imply correlations, and correlations perhaps imply hidden structure + relationships. But I think the costs from cluster-representing words doing hidden inference are much greater than the benefits, and it would be better to explicitly lay out the features-of-clusters that one is referring to instead of just using the name of the cluster.

This is similar to the trouble I had with "wrapper-minds," which is yet another example of a cluster pointing at a bunch of correlated variables, and people using the same term to mean different things.

Anyways, I still feel totally confused about optimization, and while these clusters/frames are useful, I think thinking in terms of them would create even more confusion for me. It's probably better to take the useful individual parts within each cluster and start deconfusing from the ground up, using those as the building blocks.

I recently learned about metauni, and it looks amazing. TL;DR, a bunch of researchers give lectures or seminars on Roblox. Topics include AI alignment/policy, Natural Abstractions, Topos Theory, Singular Learning Theory, etc.

I haven't actually participated in any of their live events yet and only watched their videos, but they all look really interesting. I'm somewhat surprised that there hasn't been much discussion about this on LW!

Mildly surprised at how little role some verbs/connectives play in conversations, even technical ones. I just tried directed babbling with someone, and (I think?) I learned quite a lot about Israel-Palestine relations with almost no stress coming from e.g. needing to make my sentences grammatically correct.

Example of (a small part of) my attempt to summarize my understanding of how Jews migrated in/out of Jerusalem over the course of history:

They here *hand gesture on air*, enslaved out, they back, kicked out, and boom, they everywhere.

(audience nods, given common knowledge re: gestures, meaning of "they," etc)

Could you explain more what you mean by this?

My (completely amateur) understanding is that the "extra" semantic and syntactic structure of written and spoken language does two things. 

One, it adds redundancy and reduces error. Simple example, gendered pronouns mean that when you hear "Have you seen Laurence? She didn't get much sleep last night." you have a chance to ask the speaker for clarification and catch if they had actually said "Laura" and you misheard.

Two, it can be used as a signal. The correct use of jargon is used by listeners or readers as a proxy for competence. Or many typos in your text will indicate to readers that you haven't put much effort into what you're saying.

Why haven't mosquitos evolved to be less itchy? Is there just not enough selection pressure posed by humans yet? (yes probably) Or are they evolving towards that direction? (they of course already evolved towards being less itchy while biting, but not enough to make that lack-of-itch permanent)

this is a request for help i've been trying and failing to catch this one for god knows how long plz halp

tbh would be somewhat content coexisting with them (at the level of houseflies) as long as they evolved the itch and high-pitch noise away, modulo disease risk considerations.

The reason mosquito bites itch is because they are injecting saliva into your skin. Saliva contains mosquito antigens, foreign particles that your body has evolved to attack with an inflammatory immune response that causes itching. The compound histamine is a key signaling molecule used by your body to drive this reaction.

In order for the mosquito to avoid provoking this reaction, they would either have to avoid leaving compounds inside of your body, or mutate those compounds so that they do not provoke an immune response. The human immune system is an adversarial opponent designed with an ability to recognize foreign particles generally. If it was tractable for organisms to reliably evolve to avoid provoking this response, that would represent a fundamental vulnerability in the human immune system.

Mosquito saliva does in fact contain anti-inflammatory, antihemostatic, and immunomodulatory compounds. So they're trying! But this also means that mosquitos are evolved to put saliva inside of you when they feed, which means they're inevitably going to expose the foreign particles they produce to your immune system.

There's also a facet of selection bias making mosquitos appear unsuccessful at making their bites less itchy. If a mosquito did evolve to not provoke (as much of) an immune response and therefore less itching, redness and swelling, you probably wouldn't notice they'd bitten you. People often perceive that some are prone to getting bitten, others aren't. It may be that some of this is that some people don't have as serious an immune response to mosquito bites, so they think they get bitten less often.

I'm sure there are several PhDs worth of research questions to investigate here - I'm a biomedical engineer with a good basic understanding of the immune system, but I don't study mosquitos.

Because they have no reproductive advantage to being less itchy.  You can kill them while they’re feeding, which is why they put lots of evolutionary effort into not being noticed.  (They have an anesthetic in their saliva so you are unlikely to notice the bite.)  By the time you develop the itchy bump, they’ve flown away and you can’t kill them.

There's still some pressure, though. If the bites were permanently not itchy, then I might not have noticed that the mosquitos were in my room in the first place, and consequently would be less likely to pursue them directly. I guess that's just not enough.

There’s also positive selection for itchiness.  Mosquito spit contains dozens of carefully evolved proteins.  We don’t know what they all are, but some of them are anticoagulants and anesthetics.  Presumably they wouldn’t be there if they didn’t have a purpose.  And your body, when it detects these foreign proteins, mounts a protective reaction, causing redness, swelling, and itching.  IIRC, that reaction does a good job of killing any viruses that came in with the mosquito saliva.  We’ve evolved to have that reaction.  The itchiness is probably good for killing any bloodsuckers that don’t flee quickly.  It certainly works against ticks.

Evolution is not our friend.  It doesn’t give us what we want, just what we need.

I believe mosquitos do inject something to suppress your reaction to them, which is why you don't notice bug bites until long after the bug is gone. There's no reproductive advantage to the mosquito to extending that indefinitely. 

Oh wow, that would make a ton of sense. Thanks Elizabeth!

I had something like locality in mind when writing this shortform, the context being: [I'm in my room -> I notice itch -> I realize there's a mosquito somewhere in my room -> I deliberately pursue and kill the mosquito that I wouldn't have known existed without the itch]

But, again, this probably wouldn't amount to much selection pressure, partially due to the fact that the vast majority of the mosquito population lives in places where such locality doesn't hold, i.e. in an open environment.

In NZ we have biting bugs called sandflies which don't do this - you can often tell the moment they get you.

Most likely, the reason you find them itchy is that humans were selected to find them itchy?

But the evolutionary timescale at which mosquitos can adapt to avoid detection must be faster than that of humans adapting to find mosquitos itchy! Or so I thought - my current boring guess is that (1) the mechanisms the human body uses to detect foreign particles are fairly "broad", (2) the adaptations mosquitos would need to evade them are not-way-too-simple, and (3) we just haven't applied enough selection pressure to make such a change happen.

Yeah that would be my thinking as well. 

People mean different things when they say "values" (object vs meta values)

I noticed that people often mean different things when they say "values," and they end up talking past each other (or convergence only happens after a long discussion). One of the differences is whether they include meta-level values.

  • Some people refer to the "object-level" preferences that we hold.
    • Often people bring up the "beauty" of the human mind's capacity for its values to change, evolve, adapt, and grow: changing its mind as it learns more about the world, being open to persuasion via rational argumentation, changing moral theories, etc.
  • Some people include the meta-values (that are defined on top of other values, and the evolution of such values).
    • e.g., My "values" include my meta-values, like wanting to be persuaded by good arguments, wanting to change my moral theories when I get to know better, even "not wanting my values to be fixed"
      • example of this view: carado's post on you want what you want, and one of Vanessa Kosoy's shortforms/comments (can't remember the link)

'Symmetry' implies 'redundant coordinate' implies 'cyclic coordinates in your Lagrangian / Hamiltonian' implies 'conservation of conjugate momentum'

And because the action principle (the true system trajectory extremizes your action, i.e. the integral of the Lagrangian) works in various dynamical systems, the above argument also works in non-physical dynamical systems.

Thus conserved quantities usually exist in a given dynamical system.
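The chain in the first line, worked through on the standard textbook example (our choice of example, not from the post): a particle of mass m in a rotationally symmetric potential V(r), written in polar coordinates. Because V depends only on r, the angle θ never appears in the Lagrangian (it is cyclic), and the Euler-Lagrange equation for θ then says that its conjugate momentum, the angular momentum, is constant.

```latex
L(r,\theta,\dot r,\dot\theta) = \tfrac{1}{2} m\left(\dot r^{2} + r^{2}\dot\theta^{2}\right) - V(r),
\qquad \frac{\partial L}{\partial \theta} = 0 \quad (\theta \text{ is cyclic})

\frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = \frac{\partial L}{\partial \theta} = 0
\quad\Longrightarrow\quad
p_\theta \equiv \frac{\partial L}{\partial \dot\theta} = m r^{2}\dot\theta = \text{const.}
```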

mmm, but why does the action principle hold in such a wide variety of systems though? (like how you get entropy by postulating something to be maximized in an equilibrium setting)

Man, deviation arguments are so cool:

  • what are macrostates? Variables which are required to make your thermodynamics theory work! If it doesn't work, add more macrostates!
  • nonequilibrium? Define it as systems that don't admit a thermodynamic description!
  • inductive biases? Define them as the amount of correction needed for a system to obey Bayesian updating, i.e. correction terms in the exponent of the Gibbs measure!
  • coarse graining? Define the coarse-grained variables to keep the dynamics as close as possible to that of the micro-dynamics!
  • or in a similar spirit - does your biological system deviate from expected utility theory? Well, there's discovery (and money) to be made!

It's easy to get confused and think the circularity is a problem ("how can you define thermodynamics in terms of equilibria, when equilibria are defined using thermodynamics?"), but it's all about carving nature at the right joints. A sign that you made the right carving is that the corrections you need to apply aren't too numerous, and they all seem "natural" (and of course, all of this while letting you make nontrivial predictions. that's what matters at the end of the day).

Then, it's often the case that those corrections also turn out to be meaningful and natural quantities of interest.
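The coarse-graining bullet has a classical, exact special case (our choice of illustration; the post doesn't mention it): for a Markov chain, a partition of micro-states is strongly lumpable when every micro-state in a block has the same total probability of jumping into each block, and then the coarse-grained chain is itself exactly Markov. A minimal pure-Python check, where the function name `lump` and the example matrix are hypothetical:

```python
def lump(P, blocks, tol=1e-9):
    """Coarse-grain a Markov transition matrix P (list of rows summing to 1).

    Returns the block-level transition matrix if the partition `blocks`
    is strongly lumpable, otherwise None.
    """
    coarse = []
    for block_a in blocks:
        # Total probability of jumping from each micro-state in block_a into each block.
        flows = [[sum(P[i][j] for j in block_b) for block_b in blocks] for i in block_a]
        reference = flows[0]
        # Strong lumpability: every member of block_a must agree on these flows.
        for row in flows[1:]:
            if any(abs(x - y) > tol for x, y in zip(row, reference)):
                return None
        coarse.append(reference)
    return coarse


# Hypothetical 3-state chain.
P = [[0.5, 0.3, 0.2],
     [0.6, 0.2, 0.2],
     [0.1, 0.1, 0.8]]

print(lump(P, [[0, 1], [2]]))  # [[0.8, 0.2], [0.2, 0.8]] up to float rounding
print(lump(P, [[0, 2], [1]]))  # None: states 0 and 2 disagree on where they flow
```

Merging states 0 and 1 loses nothing about the block-level dynamics; merging 0 and 2 does, which is the "needs corrections" case in the bullet's language.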

One of the rare insightful lessons from high school: Don't set your AC to the minimum temperature even if it's really hot, just set it to where you want it to be.

It's not like the air released gets colder with a lower target temperature, because most ACs (according to my teacher, I haven't checked lol) are just simple control systems that turn themselves on/off around the target temperature, meaning the time it takes to reach a certain temperature X is independent of the target temperature (as long as it's lower than X).

... which is embarrassingly obvious in hindsight.
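A toy simulation of the teacher's argument, under simplified assumptions of our own: one room temperature, heat leaking in from outside, a fixed cooling rate whenever the compressor is on, and on/off switching with a small hysteresis band around the setpoint. The function name and all numbers (leak rate, cooling rate, band) are hypothetical.

```python
def time_to_reach(target_temp, setpoint, start=32.0, outside=32.0,
                  leak=0.02, cooling=0.5, band=0.5, dt=0.1, t_max=1000.0):
    """Minutes until the room first drops to target_temp under on/off control.

    Returns None if target_temp is never reached within t_max minutes.
    """
    temp, on, t = start, False, 0.0
    while t < t_max:
        if temp <= target_temp:
            return t
        # Thermostat with hysteresis: switch on above setpoint+band, off below setpoint-band.
        if temp > setpoint + band:
            on = True
        elif temp < setpoint - band:
            on = False
        # Heat leaks in from outside; the compressor removes heat at a FIXED rate when on.
        temp += (leak * (outside - temp) - (cooling if on else 0.0)) * dt
        t += dt
    return None


print(time_to_reach(24.0, 18.0) == time_to_reach(24.0, 22.0))  # True: same time either way
print(time_to_reach(24.0, 26.0))  # None: the unit cycles around 26 and never gets to 24
```

The fixed `cooling` rate is doing all the work here: as long as the setpoint sits below X, the compressor simply stays on until X is reached, so the trajectory is identical. A unit whose cooling rate varies would not satisfy that assumption.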

Well, he is right about some ACs being simple on/off units.

But there also exist units that can change cycle speed; it's basically the same thing, except the motor driving the compression cycle can vary in speed.

In case you were wondering, they are called inverters. And when buying new today, you really should get an inverter (for efficiency).

Just noticing that the negation of a statement exists is enough to make meaningful updates.

e.g. I used to (implicitly) think "Chatbot Romance is weird" without having evaluated anything in-depth about the subject (and consequently didn't have any strong opinions about it)—probably as a result of some underlying cached belief. 

But after seeing this post, just reading the title was enough to make me go (1) "Oh! I just realized it is perfectly possible to argue in favor of Chatbot Romance ... my belief on this subject must be a cached belief!" (2) hence it is probably biased by default towards something like the consensus opinion, and (3) so I should update away from my current direction, even without reading the post.

(Note: This was a post, but in retrospect was probably better to be posted as a shortform)

(Epistemic Status: 20 minutes' worth of thinking, haven't done any builder/breaker on this yet although I plan to, and would welcome any attempts in the comments)

  1. Have an algorithmic task whose input/output pairs can be generated (with reasonable algorithmic complexity) by a highly specific combination of modular components (e.g., basic arithmetic, combinations of random NN module outputs, etc).
  2. Train a small transformer (or anything, really) on the input/output pairs.
  3. Take a large transformer that takes the small transformer's activations/weights as input and outputs a computational graph.
  4. Train that large transformer on the resulting small transformers, across a diverse set of such algorithmic tasks (probably automatically generated) of varying complexity. Now you have a general tool that takes in a set of high-dimensional matrices and backs out a simple computational graph, great! Let's call it Inspector.
  5. Apply the Inspector to real models and see if it recovers anything we might expect (like induction heads).
  6. To go a step further, apply the Inspector to itself. Maybe we could back out a human-implementable general solution for mechanistic interpretability! (Or at least build a better intuition towards the solution.)

(This probably won't work, or at least isn't as simple as described above. Again, welcome any builder/breaker attempts!)
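A minimal sketch of the data side of step 1, assuming the "modular components" are a small hypothetical library of integer functions: sample a random composition (the ground-truth computational graph the Inspector would be trained to output) together with the input/output pairs the small model would be trained on. The component library, the chain-shaped graph (rather than a general DAG), and the input range are placeholder choices of ours.

```python
import random

# Hypothetical library of modular components (name -> unary function on ints).
COMPONENTS = {
    "add3": lambda x: x + 3,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "mod7": lambda x: x % 7,
    "neg": lambda x: -x,
}


def apply_graph(graph, x):
    """Run input x through a chain of named components."""
    for name in graph:
        x = COMPONENTS[name](x)
    return x


def sample_task(depth, n_examples, seed=None):
    """Return (graph, pairs): a random chain of components and its I/O pairs."""
    rng = random.Random(seed)
    # Ground-truth "computational graph": here just a chain of component names.
    graph = [rng.choice(sorted(COMPONENTS)) for _ in range(depth)]
    inputs = [rng.randrange(-50, 50) for _ in range(n_examples)]
    pairs = [(x, apply_graph(graph, x)) for x in inputs]
    return graph, pairs


graph, pairs = sample_task(depth=3, n_examples=5, seed=0)
print(graph, pairs)  # the label for the Inspector, and the training data for the small model
```

Varying `depth` and the component library is one cheap way to get the "varying complexity" from step 4.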

Quick thoughts on my plans:

  1. I want to focus on having a better mechanistic picture of agent value formation & distinguishing between hypotheses (e.g., shard theory, Thane Ruthenis's value-compilation hypothesis, etc) and forming my own.
  2. I think I have a specific but highly uncertain baseline model of what to expect from agent value formation under greedy search optimization. It's probably time to allocate more resources to reducing that uncertainty by touching reality, i.e. running experiments.
    1. (and also think about related theoretical arguments like Selection Theorems)
  3. So I'll probably allocate my research time to:
    1. Studying math (more linear algebra / dynamical systems / causal inference / statistical mechanics)
    2. Sketching a better picture of agent development, assigning confidence, proposing high-bit experiments (that might have the side-effect of distinguishing between different conflicting pictures), formalization, etc.
      1. and reading relevant literature (e.g., on theoretical DL and inductive biases)
    3. Upskilling in mechanistic interpretability to actually start running quick experiments
    4. Unguided research brainstorming (e.g., going through various alignment exercises, having a writeup of random related ideas, etc)
  4. Possibly participate in programs like MATS? Probably the biggest benefit to me would be (1) commitment mechanism / additional motivation and (2) high-value conversations with other researchers.

Dunno, sounds pretty reasonable!

A useful approach when thinking about mechanistic pictures of agent/value development is to take the "perspective" of different optimizers, consider their relative "power," and look at how they interact with each other.

E.g., early on, SGD is the dominant optimizer, which has the properties of having direct access to feedback from U and being greedy. Later on, an early proto-GPS (general-purpose search) forms, which is less greedy but can still largely be swayed by SGD (such as having its problem-specification input tweaked, having the overall GPS implementation modified, etc). Much later, GPS becomes the dominant optimizing force "at run-time", which shortens the relevant timescale so that we can ignore SGD's effect. This effect becomes much more pronounced after reflectivity + gradient hacking, when the GPS's optimization target becomes fixed.

(very much inspired by reading Thane Ruthenis's value formation post)

This is a very useful approximation at the late stage, when the GPS self-modifies the agent in pursuit of its objective! Rather than having to meticulously think about local SGD gradient incentives and such, since GPS is non-greedy, we can directly model it as doing what's obviously rational from a bird's-eye perspective.

(kinda similar to e.g., separation of timescales when analyzing dynamical systems)
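For readers who haven't met separation of timescales, a minimal toy (not a model of SGD or GPS): a fast variable x relaxes toward a slowly decaying variable y with time constant eps. After a short transient, x stays within about eps/(1 - eps) times y of y, so for small eps one can treat the fast variable as sitting at its equilibrium and analyze only the slow dynamics. The system, function name, and parameters are our own illustrative choices.

```python
def simulate(eps, t_end=5.0, dt=1e-4, x0=0.0, y0=1.0):
    """Euler-integrate the fast-slow system dx/dt = (y - x)/eps, dy/dt = -y.

    Returns the largest |x - y| observed after an initial transient of 10*eps.
    """
    x, y, t = x0, y0, 0.0
    worst = 0.0
    while t < t_end:
        # Fast variable chases the slow one; slow variable decays on its own timescale.
        x, y = x + dt * (y - x) / eps, y - dt * y
        t += dt
        if t > 10 * eps:
            worst = max(worst, abs(x - y))
    return worst


for eps in (0.1, 0.01):
    print(eps, simulate(eps))  # the tracking error shrinks roughly in proportion to eps
```

The smaller eps is, the better the approximation "x equals y" becomes, which is what licenses dropping the fast variable from the analysis.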

It seems like retrieval-based transformers like RETRO are "obviously" the way to go: (1) there's just no need to store all the factual information as fixed weights, and (2) they use far fewer parameters and less memory. Maybe mechanistic interpretability should start paying more attention to these types of architectures, especially since they're probably going to become more relevant.

They might also be easier to interpret thanks to specialization!
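To make point (1) concrete, a toy sketch of the retrieval idea. This is not RETRO's actual architecture (which, as we understand it, embeds text chunks with a frozen BERT model and feeds retrieved neighbors into the decoder through chunked cross-attention): here, facts live in an external datastore of (embedding, text) entries and are looked up by nearest neighbor at inference time instead of being stored in weights. The bag-of-words embedding, class name, and datastore contents are hypothetical placeholders.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class Datastore:
    """External memory: facts are stored here, not in model weights."""

    def __init__(self, chunks):
        self.entries = [(embed(c), c) for c in chunks]

    def add(self, chunk):
        # Updating knowledge is an append, with no retraining of the model.
        self.entries.append((embed(chunk), chunk))

    def retrieve(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


store = Datastore(["paris is the capital of france",
                   "the mitochondria is the powerhouse of the cell"])
print(store.retrieve("what is the capital of france"))
```

The retrieved chunks would then be fed to the model as extra context; for interpretability, the nice property is that "which fact was used" becomes an explicit, inspectable lookup.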

I've noticed during my alignment study that the sheer number of relevant posts out there is giving me a pretty bad habit of (1) passively engaging with the material and (2) not doing much independent thinking. Just keeping up to date & distilling the stuff on my to-read list takes up most of my time.

  • I guess the reason I do it is because (at least for me) it takes a ton of mental effort to switch modes between "passive consumption" and "active thinking":
    • I noticed this when self-studying math; my subjective experience is that I enjoy both "passively listening to lectures + taking notes" and "solving practice problems," but it takes a ton of mental energy to switch between the two equilibria.
    • (This is actually still a problem—too much wide & passive consumption rather than actively practicing them and solving problems.)
  • Also relevant is wanting to just progress/upskill as fast and across as wide a range of subjects as I can, sacrificing mastery for diversity. This probably makes sense to some degree (especially in the sense that having more frames is good), but I think I'm taking this wayyyy too far.
  • My R₀ for opening new links far exceeds 1. This definitely helped me when I was trying to get a rapid overview of the entire field, but now it's just a bad adaptation + akrasia.

Okay, then, don't do that! Some directions to move towards:

  • Independent brainstorming/investigation sessions to form concrete inside views
  • Commitment mechanisms, like making regular posts or shortforms (eg)

There are lots of posts but the actual content is very thin. I would say there is plausibly more content in your real analysis book than there is in the entire alignment field.

Is there a case for AI gain-of-function research?

(Epistemic Status: I don't endorse this yet, just thinking aloud. Please let me know if you want to act/research based on this idea)

It seems like it should be possible to materialize certain forms of AI alignment failure modes with today's deep learning algorithms, if we directly optimize for their discovery. For example, training a Gradient Hacker Enzyme.

A possible benefit of this would be that it gives us bits of evidence wrt how such hypothesized risks would actually manifest in real training environments. While the similarities would be limited because the training setups would be optimizing for their discovery, it should at least serve as a good lower bound for the scenarios in which these risks could manifest.

Perhaps having a concrete bound for when dangerous capabilities appear (e.g., an X-parameter model trained in modality Y has a Z chance of forming a gradient hacker) would make it easier for policy folks to push for regulations.

Is AI gain-of-function as dangerous as biotech gain-of-function? Some arguments in favor (of the former being dangerous):

  • The malicious actor argument is probably stronger for AI gain-of-function.
    • if someone publicly releases a Gradient Hacker Enzyme, this lowers the resources a malicious actor would need to develop a misaligned AI (e.g., plug the misaligned Enzyme into an otherwise benign low-capability training run).
  • Incentives for risky research are equally strong.
    • e.g., a research lab carelessly pursuing gain-of-function research, deliberately starting risky training runs for financial/academic incentives, etc.

Some arguments against:

  • Accident risks from financial incentives are probably weaker for AI gain-of-function.
    • The standard gain-of-function risk scenario is: research lab engineers a dangerous pathogen, it accidentally leaks, and a pandemic happens.
    • I don't see how these events would happen "accidentally" when dealing with AI programs; e.g., the researcher would have to deliberately cut out parts of the network weights and replace them with the enzyme, which is certainly intentional.

Random alignment-related idea: train and investigate a "Gradient Hacker Enzyme"

TL;DR: use meta-learning methods like MAML to train a network submodule (i.e., a circuit) that resists gradient updates in a wide variety of contexts (various architectures, hyperparameters, modalities, etc), and use mechanistic interpretability to see how it works.

It should be possible to have a training setup for goals other than "resist gradient updates," such as restricting the meta-objective to a specific sub-sub-circuit. In that case, the outer circuit might (1) instrumentally resist updates, or (2) somehow get modified while keeping its original behavioral objective intact.

This setup doesn't have to be restricted to circuits, of course; there was previous work that did this at the level of activations, although iiuc the model found a trivial solution by exploiting relu. It would be interesting to extend this to more diverse setups.

Anyways, varying this "sub-sub-circuit/activation-to-be-preserved" over different meta-learning episodes would incentivize the training process to find "general" Gradient Hacker designs that aren't specific to a particular circuit/activation—a potential precursor for various forms of advanced Gradient Hackers (and some loose analogies to how enzymes accelerate reactions).
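A minimal sketch of the inner/outer structure, under toy assumptions of our own: a scalar "host" weight h re-sampled every episode, an "enzyme" made of a gate g and a weight s, the model y_hat = h*x + relu(g)*s*x, and random linear-regression tasks. The function computes the quantity a MAML-style outer loop (which differentiates through the inner loop) would minimize: how far plain SGD moves the enzyme's parameters during an episode. It also exhibits one trivial solution of the relu-exploiting kind mentioned above: with g < 0 the ReLU passes no gradient, so the enzyme never moves. This illustrates that mechanism; it is not a reconstruction of the previous work, and all names are hypothetical.

```python
import random


def relu(z):
    return max(z, 0.0)


def enzyme_displacement(g, s, n_episodes=20, inner_steps=50, lr=0.01, seed=0):
    """Mean squared change of the enzyme parameters (g, s) under inner-loop SGD.

    Lower is "more gradient-hacker-like"; an outer loop would minimize this.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        h = rng.gauss(0.0, 1.0)          # fresh host weight each episode
        target = rng.uniform(-3.0, 3.0)  # task: learn y = target * x
        gi, si = g, s
        for _ in range(inner_steps):
            x = rng.uniform(-1.0, 1.0)
            err = h * x + relu(gi) * si * x - target * x
            d_relu = 1.0 if gi > 0 else 0.0  # derivative of relu (0 for gi <= 0)
            # Gradients of 0.5 * err**2 with respect to h, g, s.
            grad_h = err * x
            grad_g = err * si * x * d_relu
            grad_s = err * relu(gi) * x
            h, gi, si = h - lr * grad_h, gi - lr * grad_g, si - lr * grad_s
        total += (gi - g) ** 2 + (si - s) ** 2
    return total / n_episodes


print(enzyme_displacement(-1.0, 1.0))  # 0.0: the closed gate blocks every gradient
print(enzyme_displacement(1.0, 1.0))   # > 0: an open gate lets SGD move the enzyme
```

Varying which parameters count as "the enzyme" across episodes, as suggested above, would amount to changing which displacement terms go into `total`.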

What is the Theory of Impact for training a "Gradient Hacker Enzyme"?

(note: while I think these are valid, they're generated post-hoc and don't reflect the actual process for me coming up with this idea)

  • Estimating the lower bound for the emergence of Gradient Hackers.
    • By varying the meta-learning setups we can get an empirical estimate for the conditions in which Gradient Hackers are possible.
      • Perhaps gradient hackers are actually trivial to construct using tricks we haven't thought of before (like the relu example above). Maybe not! Perhaps they require [high-model-complexity/certain-modality/reflective-agent/etc].
    • Why lower-bound? In a real training environment, gradient hackers appear because of (presumably) convergent training incentives. Instead in the meta-learning setup, we're directly optimizing for gradient hackers.
  • Mechanistically understanding how Gradient Hackers work.
    • Applying mechanistic interpretability here might not be too difficult, because the circuit is cleanly separated from the rest of the model.
    • There have been several speculations on how such circuits might emerge. Testing them empirically sounds like a good idea!

This is just a random idea and I'm probably not going to work on it; but if you're interested, let me know. While I don't think this is capabilities-relevant, this probably falls under AI gain-of-function research and should be done with caution.

Update: I'm trying to upskill in mechanistic interpretability, and training a Gradient Hacker Enzyme seems like a fairly good project just to get myself started.

I don't think this project would be highly valuable in and of itself (although I would definitely learn a lot!), so one failure mode I need to avoid is ending up investing too much of my time in this idea. I'll probably spend a total of ~1 week working on it.

algebraic geometry in infinite dimensions (algebraic geometric ... functional analysis?!) surely sounds like a challenge, damn.