johnswentworth's Shortform
johnswentworth · 21h

About a month ago, after some back-and-forth with several people about their experiences (including on lesswrong), I hypothesized that I don't feel the emotions signalled by oxytocin, and never have. (I do feel some adjacent things, like empathy and a sense of responsibility for others, but I don't get the feeling of loving connection which usually comes alongside those.)

Naturally I set out to test that hypothesis. This note is an in-progress overview of what I've found so far and how I'm thinking about it, written largely to collect my thoughts and to see if anyone catches something I've missed.

Under the hypothesis, this has been a life-long thing for me, so the obvious guess is that it's genetic (the vast majority of other biological state turns over too often to last throughout life). I also don't have a slew of mysterious life-long illnesses, so the obvious guess is that it's pretty narrowly limited to oxytocin - i.e. most likely a genetic variant in either the oxytocin gene or receptor, maybe the regulatory machinery around those two, but that's less likely as we get further away and the machinery becomes entangled with more other things.

So I got my genome sequenced, and went looking at the oxytocin gene and the oxytocin receptor gene.

The receptor was the first one I checked, and sure enough I have a single-nucleotide deletion 42 amino acids into the open reading frame (ORF) of the 389 amino acid protein. That will induce a frameshift error, completely fucking up the rest of the protein. (The oxytocin gene, on the other hand, was totally normal.)
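For intuition, here's a toy sketch of why an early single-nucleotide deletion is so destructive. The sequences are made up for illustration, not the real OXTR gene:

```python
# Toy illustration of a frameshift: deleting one base early in an ORF
# shifts every downstream codon boundary, so the translated protein is
# unrelated past that point and premature stop codons typically appear.
# Sequences here are made up, not the real OXTR gene.

STOPS = {"TAA", "TAG", "TGA"}

def codons(seq):
    """Split a sequence into codons, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def first_stop(seq):
    """Codon index of the first stop codon, or None."""
    for i, c in enumerate(codons(seq)):
        if c in STOPS:
            return i
    return None

orf = "ATGGCTGCATCAAAGATTGGCTAA"   # toy ORF, stop codon only at the very end
mutant = orf[:10] + orf[11:]       # single-nucleotide deletion at position 10

print(codons(orf))     # in-frame codons of the original
print(codons(mutant))  # codon boundaries shift after the deletion point
print(first_stop(orf), first_stop(mutant))  # the stop moves much earlier
```

In the toy case the shifted frame hits a premature stop almost immediately; in a real 389-residue protein, a frameshift 42 codons in leaves the remaining ~90% of the sequence garbage.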

So that sure is damn strong evidence in favor of the hypothesis! But, we have two copies of most genes, including the oxytocin receptor. The frameshift error is only on one copy. Why isn't the other copy enough for almost-normal oxytocin signalling?

The frameshift error is the only thing I have which would obviously completely fuck up the whole protein, but there are also a couple of nonsynonymous single nucleotide polymorphisms (SNPs) in the ORF, plus another couple upstream. So it's plausible that one of the SNPs messes up the other copy pretty badly; in particular, one of them changes an arginine to a histidine at the edge of the second intracellular loop. (The oxytocin receptor is a pretty standard G-protein-coupled receptor, so that's the mental picture here.) I did drop the sequences into AlphaFold, and I don't see any large structural variation from the SNPs, but (a) that histidine substitution would most likely change binding rather than structure in isolation, and (b) this is exactly the sort of case where I don't trust AlphaFold much, because "this is one substitution away from a standard sequence, I'll just output the structure of that standard sequence" is exactly the sort of heuristic I'd expect a net to over-rely upon.

It's also possible-in-principle that the second receptor copy is fine, but the first copy's frameshift alone is enough to mess up function. I think that's unlikely in this case. The mRNA for the frameshifted version should be removed pretty quickly by nonsense-mediated decay (I did double check that it has a bunch of early stop codons, so NMD should definitely trigger). So there should not be a bunch of junk protein floating around from the frameshifted gene. And the frameshift is early enough that the messed-up proteins probably won't e.g. form dimers with structurally-non-messed-up versions (even if the oxytocin receptor normally dimerizes, which I'd guess it doesn't but haven't checked). At worst there should just be a 2x lower concentration of normal receptor than usual, and if there's any stable feedback control on the receptor concentration then there'd be hardly any effect at all.

Finally, there's the alternative hypothesis that my oxytocin signalling is unusually weak but not entirely nonfunctional. I do now have pretty damn strong evidence for that at a bare minimum, assuming that feedback control on receptor density doesn't basically counterbalance the fucked up receptor copy.

Anyway, that's where I'm currently at. I'm curious to hear others' thoughts on what mechanisms I might be missing here!

johnswentworth's Shortform
johnswentworth · 2d

Just got my whole genome sequenced. A thing which I could have figured out in advance but only realized once the results came back: if you're getting your whole genome sequenced, it's high value to also get your parents' genomes sequenced.

Here's why.

Suppose I have two unusual variants at two different positions (not very close together) within the same gene. So, there's a variant at location A, and a variant at location B. But (typically) I have two copies of each gene, one from each parent. So, I might have the A and B variants both on the same copy, and the other copy could be normal. OR, I could have the A variant on one copy and the B variant on the other copy. And because modern sequencing usually works by breaking DNA into little chunks, sequencing the chunks, and then computationally stitching it together... those two possibilities can't be distinguished IIUC.

The difference is hugely important if e.g. both the A variant and the B variant severely fuck up the gene. If both are on the same copy, I'd have one normal working copy and one fucked-up copy. If they're on different copies, then I'd have zero normal working copies, which will typically have much more extreme physiological results.

The easiest way to distinguish those two possibilities, IIUC, is to get the parents' genomes. In one case, I'd see the A and B variant in the same parent, and the other parent would have a normal gene. In the other case, I'd see the A variant in one parent and the B variant in the other.

In principle there are other ways to distinguish the two possibilities (like long-read sequencing), but getting the parents' sequence is probably the cheapest/easiest.

Richard Ngo's Shortform
johnswentworth · 2d

I totally buy that people's verbal models aren't at a local nadir of connectedness-to-reality. The thing which seems increasingly disconnected from reality is more like metis: people's day-to-day behavior and intuitive knowledge, institutional knowledge and skills, personal identity and goals, that sort of thing.

I'm notably not thinking here primarily about examples like e.g. heritability of IQ becoming politicized; that's a verbal model, and I do think that verbal models have mostly become more reasonable modulo a few exceptions which people highlight.

Richard Ngo's Shortform
johnswentworth · 3d

Why don't we live in a world where wealth can buy a society defenses against such egregores?

I would point to the non-experts can't distinguish true from fake experts problem. That does seem to be a central phenomenon which most parasitic egregores exploit. More generally, as wealth becomes more abundant (and therefore lots of constraints become more slack), inability to get grounded feedback becomes a more taut constraint.

That said... do you remember any particular evidence or argument which led you toward the story at the top of the thread (as opposed to away from your previous understanding)?

Richard Ngo's Shortform
johnswentworth · 4d

Without commenting on the specifics, I agree with a lot of the gestalt here as a description of how things evolved historically, but I think that's not really the right lens to understand the problem.

My current best one-sentence understanding: the richer humans get, the more social reality can diverge from physical reality, and therefore the more resources can be captured by parasitic egregores/memes/institutions/ideologies/interest-groups/etc. Physical reality provides constraints and feedback which limit the propagation of such parasites, but wealth makes the constraints less binding and therefore makes the feedback weaker.

johnswentworth's Shortform
johnswentworth · 4d

Proof

Specifically, we'll show that there exists an information throughput maximizing distribution which satisfies the undirected graph. We will not show that all optimal distributions satisfy the undirected graph, because that's false in some trivial cases - e.g. if all the Y's are completely independent of X, then all distributions are optimal. We will also not show that all optimal distributions factor over the undirected graph, which is importantly different because of the P[X]>0 caveat in the Hammersley-Clifford theorem.

First, we'll prove the (already known) fact that an independent distribution P[X]=P[X1]P[X2] is optimal for a pair of independent channels (X1→Y1,X2→Y2); we'll prove it in a way which will play well with the proof of our more general theorem. Using standard information identities plus the factorization structure Y1−X1−X2−Y2 (that's a Markov chain, not subtraction), we get

MI(X;Y)=MI(X;Y1)+MI(X;Y2|Y1)

=MI(X;Y1)+(MI(X;Y2)−MI(Y2;Y1)+MI(Y2;Y1|X))

=MI(X1;Y1)+MI(X2;Y2)−MI(Y2;Y1)

Now, suppose you hand me some supposedly-optimal distribution P[X]. From P, I construct a new distribution Q[X]:=P[X1]P[X2]. Note that MI(X1;Y1) and MI(X2;Y2) are both the same under Q as under P, while MI(Y2;Y1) is zero under Q. So, because MI(X;Y)=MI(X1;Y1)+MI(X2;Y2)−MI(Y2;Y1), the MI(X;Y) must be at least as large under Q as under P. In short: given any distribution, I can construct another distribution with at least as high information throughput, under which X1 and X2 are independent.
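Both the identity and the Q-beats-P step are easy to check numerically for small binary channels. A sketch, with arbitrary made-up channel matrices:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a probability array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(joint):
    """Mutual information between the two axes of a 2-D joint distribution."""
    return H(joint.sum(1)) + H(joint.sum(0)) - H(joint.ravel())

# Correlated input distribution P[X1, X2] and two independent channels.
px = np.array([[0.4, 0.1],
               [0.1, 0.4]])
ch1 = np.array([[0.9, 0.1],   # P[Y1 | X1]
                [0.2, 0.8]])
ch2 = np.array([[0.8, 0.2],   # P[Y2 | X2]
                [0.3, 0.7]])

def full_joint(px):
    # joint[x1, x2, y1, y2] = P[x1, x2] P[y1 | x1] P[y2 | x2]
    return np.einsum('ab,ac,bd->abcd', px, ch1, ch2)

jp = full_joint(px)
mi_xy   = mi(jp.reshape(4, 4))       # MI(X;Y), grouping (x1,x2) vs (y1,y2)
mi_x1y1 = mi(jp.sum(axis=(1, 3)))
mi_x2y2 = mi(jp.sum(axis=(0, 2)))
mi_y1y2 = mi(jp.sum(axis=(0, 1)))

# The identity MI(X;Y) = MI(X1;Y1) + MI(X2;Y2) - MI(Y1;Y2):
print(np.isclose(mi_xy, mi_x1y1 + mi_x2y2 - mi_y1y2))  # True

# Q[X] := P[X1] P[X2] keeps the marginal MIs and zeroes MI(Y1;Y2):
qx = np.outer(px.sum(1), px.sum(0))
jq = full_joint(qx)
print(mi(jq.reshape(4, 4)) >= mi_xy)  # True: Q has at least P's throughput
```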

Now let's tackle our more general theorem, reusing some of the machinery above.

I'll split Y into Y1 and Y2, and split X into X1−2 (parents of Y1 but not Y2), X2−1 (parents of Y2 but not Y1), and X1∩2 (parents of both). Then

MI(X;Y)=MI(X1∩2;Y)+MI(X1−2,X2−1;Y|X1∩2)

In analogy to the case above, we consider distribution P[X], and construct a new distribution Q[X]:=P[X1∩2]P[X1−2|X1∩2]P[X2−1|X1∩2]. Compared to P, Q has the same value of MI(X1∩2;Y), and by exactly the same argument as the independent case, MI(X1−2,X2−1;Y|X1∩2) cannot be any lower under Q; we just repeat the same argument with everything conditional on X1∩2 throughout. So, given any distribution, I can construct another distribution with at least as high information throughput, under which X1−2 and X2−1 are independent given X1∩2.

Since this works for any Markov blanket X1∩2, there exists an information throughput maximizing distribution which satisfies the desired undirected graph.
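The general construction can likewise be sanity-checked numerically. A sketch with three binary X components where X3 plays the role of the shared parent X1∩2 (all numbers arbitrary):

```python
import numpy as np

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mi(joint):
    return H(joint.sum(1)) + H(joint.sum(0)) - H(joint.ravel())

# Channel: Y1 depends on (X1, X3), Y2 depends on (X2, X3); X3 is shared.
ch1 = np.array([[[0.9, 0.1], [0.6, 0.4]],   # P[Y1 | X1, X3]
                [[0.3, 0.7], [0.2, 0.8]]])
ch2 = np.array([[[0.8, 0.2], [0.5, 0.5]],   # P[Y2 | X2, X3]
                [[0.4, 0.6], [0.1, 0.9]]])

def throughput(p):
    """MI(X;Y) for input distribution p[x1, x2, x3] through the channel."""
    joint = np.einsum('abc,acd,bce->abcde', p, ch1, ch2)
    return mi(joint.reshape(8, 4))   # group (x1,x2,x3) vs (y1,y2)

# Arbitrary input distribution P[X] with all three components entangled:
p = np.array([[[0.10, 0.05], [0.05, 0.10]],
              [[0.15, 0.05], [0.20, 0.30]]])

# Q[X] := P[X3] P[X1|X3] P[X2|X3], making X1 and X2 independent given X3:
p13, p23, p3 = p.sum(axis=1), p.sum(axis=0), p.sum(axis=(0, 1))
q = np.einsum('ac,bc,c->abc', p13, p23, 1.0 / p3)

print(abs(q.sum() - 1.0) < 1e-12)              # Q is a valid distribution
print(throughput(q) >= throughput(p) - 1e-12)  # Q does at least as well
```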

The Case Against AI Control Research
johnswentworth · 4d

It's been about 8 months since this post and Buck's comment above.

At the time, I didn't bother replying to Buck's comment because it didn't really say much. My post basically said "this control agenda doesn't seem to address anything important", illustrated with a slew of specific examples of ways-things-go-wrong which seem-to-me to account for far more probability mass than scheming in early transformative AGI. Buck's response was basically "yeah, those are real limitations, but IDK man scheming seems intuitively important?". There's a substantive argument to be had here about why I expect scheming of early transformative AGI to be either unlikely or easy to fix (relative to other problems), whereas Buck expects the opposite, and we haven't had that debate.

Anyway, I'm leaving this comment now because I think some people saw that I had a critique, that Buck had responded, and that I hadn't responded back, and therefore assumed that the ball was in my court and I wasn't engaging. That's not my understanding of what's going on here; I think Buck has basically not argued back substantively against the core of the critique, but also I haven't argued strongly against his core crux either. We've identified a crux, and that's the state of things. (And to be clear that's fine, not every debate is worth having.)

johnswentworth's Shortform
johnswentworth · 5d

Does The Information-Throughput-Maximizing Input Distribution To A Sparsely-Connected Channel Satisfy An Undirected Graphical Model?

[EDIT: Never mind, proved it.]

Suppose I have an information channel X→Y. The X components X1,...,Xm and the Y components Y1,...,Yn are sparsely connected, i.e. the typical Yi is downstream of only a few parent X-components Xpa(i). (Mathematically, that means the channel factors as P[Y|X]=∏iP[Yi|Xpa(i)].)

Now, suppose I split the Y components into two sets, and hold constant any X-components which are upstream of components in both sets. Conditional on those (relatively few) X-components, our channel splits into two independent channels.

E.g. in the image above, if I hold X4 constant, then I have two independent channels: (X1,X2,X3)→(Y1,Y2,Y3,Y4) and (X5,X6,X7)→(Y5,Y6,Y7,Y8).

Now, the information-throughput-maximizing input distribution for a pair of independent channels is just the product of the throughput-maximizing distributions for the two channels individually. In other words: for independent channels, we have independent throughput-maximizing distributions.

So it seems like a natural guess that something similar would happen in our sparse setup.

Conjecture: The throughput-maximizing distribution for our sparse setup is independent conditional on overlapping X-components. E.g. in the example above, we'd guess that P[X]=P[X4]P[X1,X2,X3|X4]P[X5,X6,X7|X4] for the throughput maximizing distribution.

If that's true in general, then we can apply it to any Markov blanket in our sparse channel setup, so it implies that P[X] factors over any set of X components which is a Markov blanket splitting the original channel graph. In other words: it would imply that the throughput-maximizing distribution satisfies an undirected graphical model, in which two X-components share an edge if-and-only-if they share a child Y-component.

It's not obvious that this works mathematically; information throughput maximization (i.e. the optimization problem by which one computes channel capacity) involves some annoying coupling between terms. But it makes sense intuitively. I've spent less than an hour trying to prove it and mostly found it mildly annoying though not clearly intractable. Seems like the sort of thing where either (a) someone has already proved it, or (b) someone more intimately familiar with channel capacity problems than I am could easily prove it.

So: anybody know of an existing proof (or know that the conjecture is false), or find this conjecture easy to prove themselves?

Transportation as a Constraint
johnswentworth · 5d

Tacking itself was one of the most important technological developments in transportation. IIUC there were occasional examples early on, but it didn't really catch on as a common method until the Renaissance. After that, IIUC it was one of the major drivers of Europe's dominance of the seas.

Natural Latents: Latent Variables Stable Across Ontologies
johnswentworth · 6d

Yup, that is correct.

If the theorems approximately hold under approximate agreement on observables.

Yeah, there is still the issue that the theorems aren't always robust to approximation on the Agreement on Observables condition, though the Solomonoff version is and there's probably other ways to sidestep the issue.
