Elliot Callender

Bounties (fractional funds distributed in good faith if you solve part of a problem):

1500$ for an algo which individuates a sufficient portion of activation space into semantically meaningful polytopes (or fuzzy loci) such that we can detect steganography during training, with minimal human oversight, in polynomial time (constant exponent across architectures) or faster
750$ for strong handles on the sorts of downstream activation patterns by which we can cluster upstream polytopes, and an additional 300$ for a polynomial-time or faster clustering algo
Happy to fund solutions to other subproblems as well. Comment or dm.
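
For readers unfamiliar with the "polytopes" framing: a ReLU layer partitions its input space into convex regions, one per on/off pattern of its units. The sketch below is only an illustration of that fact, not a candidate bounty solution. It groups sampled points by activation pattern for a hypothetical toy layer (the weights, biases, and function names are all made up for the example), and it says nothing about whether the regions are semantically meaningful or whether enumeration scales.

```python
import random
from collections import defaultdict

def relu_pattern(x, weights, biases):
    """Return the on/off pattern (tuple of 0/1) of one ReLU layer for input x."""
    pattern = []
    for w_row, b in zip(weights, biases):
        pre_activation = sum(w * xi for w, xi in zip(w_row, x)) + b
        pattern.append(1 if pre_activation > 0 else 0)
    return tuple(pattern)

def group_by_polytope(points, weights, biases):
    """Group points by activation pattern; each pattern is one convex region."""
    groups = defaultdict(list)
    for p in points:
        groups[relu_pattern(p, weights, biases)].append(p)
    return dict(groups)

# Hypothetical toy layer: 2 inputs, 3 hidden units (values chosen for illustration).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
BIASES = [0.0, 0.0, -1.0]

rng = random.Random(0)
POINTS = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(200)]
REGIONS = group_by_polytope(POINTS, WEIGHTS, BIASES)

for pattern, members in sorted(REGIONS.items()):
    print(pattern, len(members))
```

Sampling only finds regions that the test distribution happens to hit, and the number of possible patterns grows quickly with layer width, which is part of why an efficient individuation algorithm is the hard part.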

Bounty: Detecting Steganography via Ontology Translation

23d

I am distributing bounties for contributions to this project. Thanks to Parv Mahajan and the Ga Tech AISI board for reviewing core ideas.

Abstract

I claim that LLM steganography requires representational divergences from legible models, which divergences I further claim are practicably locatable (though not necessarily differentiable from legitimate scratchpads) with minimal or no human interpretation. In particular, the morphism $F : M_{T} \to M_{S}$ between trusted and steganographic models, respectively, is always discontinuous under "semantic" topologies inferred from an appropriate test distribution. Algorithmically efficient generation of such topologies is the main challenge of this project.

1. Background

Two terminological notes. First, "semantic" identity is shorthand throughout for members of a symbolic structure (i.e., an ontology) that act as if...

Replying to How to game the METR plot

Elliot Callender 2mo

There's something completely different going on for tasks longer than 1 minute, clearly not explained by the log-linear fit.

Perhaps humans generating training data are, for longer tasks, taking cognitive steps which are opaque to these models, or at least relatively more difficult to learn?

I'd wager 1:1 that this sort of abstraction-domain mismatch between human training data and LLMs is causing more of the HCAST weirdness than skewed finetuning investment.

Replying to Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

Elliot Callender 2mo

Interesting!

What do we see if we apply interpretability tools to the filler tokens or repeats of the problem?

I would be especially interested in how this evolves through training, perhaps by training a more accessible model to do math / code classification with many filler tokens.

Overall, these results demonstrate a case where LLMs can do (very basic) meta-cognition without CoT.

Can you clarify what you mean by meta-cognition? I'm intuiting that these LLMs are using the extra embeddings afforded by appended tokens to do more parallel ops, which does not sound like meta-cognition to me.

Replying to Cognition Augmentation Org

Elliot Callender 2mo

I am aiming all of my resources at this, which for now looks externally like saving/investing personal capital, writing biological (molecular, NN) simulations, and searching for advice. Feel free to message me on Signal at (+1)-478-456-9667 if you want specific examples of my ideas; I expect that the entities I'm worried about accessing my research will do so after (if) it is legibly useful.

Replying to Scientific breakthroughs of the year

Elliot Callender 2mo

Awesome! I'm looking forward to reading many of these while traveling in the coming weeks.

Might I suggest, though, that you add to the importance score instead of multiplying? It doesn't make sense to multiply a non-log term by a logspace term.
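
To make the objection concrete, here is a small sketch under my own assumptions about the scoring setup (one linear-scale weight, one log10-scale magnitude; the item names and numbers are hypothetical). Multiplying the linear weight by the log term gives a ranking that depends on the units of the underlying magnitude, whereas moving the weight into log space and adding does not.

```python
import math

# Hypothetical items: (linear-scale weight, underlying magnitude in some unit).
ITEMS = {"A": (4.0, 10.0), "B": (1.0, 1000.0)}

def multiplicative_score(weight, magnitude):
    """Linear weight times a log-scale term: mixes two different scales."""
    return weight * math.log10(magnitude)

def additive_score(weight, magnitude):
    """Both terms in log space: equals log10(weight * magnitude)."""
    return math.log10(weight) + math.log10(magnitude)

def ranking(score_fn, unit_scale=1.0):
    """Rank item names by score, best first, after rescaling the magnitude unit."""
    scores = {
        name: score_fn(weight, magnitude * unit_scale)
        for name, (weight, magnitude) in ITEMS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print("multiply, original units:", ranking(multiplicative_score))
print("multiply, units / 10:    ", ranking(multiplicative_score, 0.1))
print("add, original units:     ", ranking(additive_score))
print("add, units / 10:         ", ranking(additive_score, 0.1))
```

Rescaling the unit shifts every log term by the same constant, so the additive ranking is unchanged, while the multiplicative ranking can flip.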

Memory Consolidation

Elliot Callender

2mo

Recent advances in optogenetics and fluorescent protein markers have helped neuroscientists locate brain cells corresponding to individual memories (engrams). This post explains how such representations might physically and semantically shift.

Background

“Encoding” is the short-term enpatterning of neurons to store a memory. This seems to happen in the hippocampus, a small horn-shaped structure in the brain’s center.

Researchers have found that memories can be encoded in one neuron. These “engrams” were found using genetic edits which dyed cells activated during encoding. The preliminary model is that Hebbian associative learning and competition between prospective engrams are sufficient for their development; see here for visuals.

“Consolidation” occurs when the initial memory code is suffused to broader regions, e.g....

Cognition Augmentation Org

Elliot Callender

2mo

I'm looking to start an intelligence enhancement research org focused on differentially accelerating AIS research ahead of capabilities.

But every legibly promising avenue looks like an exfohazard, and I have exactly 0 contacts who I think can help me navigate this. In fact, none of my friends or family are sane.

I currently make about 400k USD yearly and am happy to appropriately compensate you for your time.

Replying to Eliezer's Unteachable Methods of Sanity

Elliot Callender 2mo

And a fiat decision to stay sane, implemented by not instructing myself that any particular stupidity or failure will be my reaction to future stress.

I have not implemented the other two, but this decision I made during HPPD-like psychosis; yes, it is for some a learnable skill.

Replying to So You Want To Make Marginal Progress...

Elliot Callender 1y

How much would you say (3) supports (1) on your model? I'm still pretty new to AIS and am updating from your model.

I agree that marginal improvements are good for fields like medicine, and perhaps so too AIS. E.g. I can imagine self-other overlap scaling to near-ASI, though I'm doubtful about stability under reflection. I'll put 35% we find a semi-robust solution sufficient to not kill everyone.

Given my model, I think 20% generalizability is worth a person's time. Given yours, I'd say 1% is enough.

I think that the distribution of success probability of typical optimal-from-our-perspective solutions is very wide for both of the ways we describe generalizability; within that, we should weight generalizability more heavily than my understanding of your model does.

Earlier:

Designing only best-worst-case subproblem solutions while waiting for Alice would be like restricting strategies in game to ones agnostic to the opponent's moves

Is this saying people should coordinate in case valuable solutions aren't in the a priori generalizable space?

Replying to So You Want To Make Marginal Progress...

Elliot Callender 1y

I strongly think cancer research has a huge space and can't think of anything more difficult within biology.

I was being careless / unreflective about the size of the cancer solution space, by splitting the solution spaces of alignment and cancer differently; nor do I know enough about cancer to make such claims. I split the space into immunotherapies, things which target epigenetics / stem cells, and "other", where in retrospect the latter probably has the optimal solution. This groups many small problems with possibly weakly-general solutions into a "bottleneck", as you mentioned:

aging may be a general factor to many diseases, but research into many of the things aging relates to is composed

Elliot Callender 1y*

So You Want To Make Marginal Progress...

I think general solutions are especially important for fields with big solution spaces / few researchers, like alignment. ~~If you were optimizing for, say, curing cancer, it might be different (I think both the paradigm- and subproblem-spaces are smaller there).~~

From my reading of John Wentworth's Framing Practicum sequence, implicit in his (and my) model is that solution spaces for these sorts of problems are a priori enormous. We (you and I) might also disagree on what a priori feasibility would be "weakly" vs "strongly" generalizable; I think my transition is around 15-30%.

Replying to Contrapositive Natural Abstraction - Project Intro

Elliot Callender 2y

Shoot, thanks. Hopefully it's clearer now.

Contrapositive Natural Abstraction - Project Intro

Elliot Callender

This was more of a research strategy than a specific project, and my foci have shifted substantially since this post.

Thanks to John Wentworth for pointers on an early draft.

TL;DR: I'm starting work on the Natural Abstraction Hypothesis from an overly-general formalization, and narrowing until it's true. This will start off purely information-theoretic, but I expect to add other maths eventually.

Motivation

My first idea upon learning about interpretability was to retarget the search. After reading a lot about deception auditing and Gabor filters, and starved of teloi, that dream began to die.

That was, until I found John Wentworth's works. We seem to have similar intuitions about a lot of things. I think this is...
