tailcalled — LessWrong

I'm not sure I understand your question. By AI companies "making copying hard enough", I assume you mean making AIs not leak secrets from their prompt/training (or other conditioning). It seems true to me that this will raise the relevance of AI in society. Whether this increase is hard-alignment-problem-complete seems to depend on other background assumptions not discussed here.

Generalization and the Multiple Stage Fallacy?

Answer by tailcalled, Oct 07, 2025

The neural tangent kernel^[1] provides an intuitive story for how neural networks generalize: a gradient update on a datapoint will shift similar (as measured by the hidden activations of the NN) datapoints in a similar way.
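To make that story concrete, here is a minimal sketch in NumPy. It is not from any particular library: the tiny tanh network, the squared-error loss, and all function names are assumptions chosen for illustration, and it uses the empirical (finite-width) kernel rather than the infinite-width limit. The kernel entry K(x', x) is the dot product of the output's parameter gradients at the two inputs, and to first order a gradient step on x moves the output at x' by -lr * K(x', x) * dL/df(x). Note that the output-layer part of each gradient is exactly the hidden activation vector, which is roughly where the "similar hidden activations" intuition comes from.

```python
import numpy as np

def init_params(rng, d_in=3, d_hidden=16):
    # Hypothetical toy model: one tanh hidden layer, scalar output.
    W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
    w2 = rng.normal(size=d_hidden) / np.sqrt(d_hidden)
    return W1, w2

def forward(params, x):
    W1, w2 = params
    return w2 @ np.tanh(W1 @ x)

def grad_output(params, x):
    # Gradient of the scalar output f(x) w.r.t. all parameters, flattened.
    W1, w2 = params
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - h**2), x)  # df/dW1 via the chain rule
    dw2 = h                               # df/dw2 is the hidden activation
    return np.concatenate([dW1.ravel(), dw2])

def empirical_ntk(params, x_a, x_b):
    # K(x_a, x_b) = <grad f(x_a), grad f(x_b)>
    return grad_output(params, x_a) @ grad_output(params, x_b)

def sgd_step(params, x, y, lr):
    # One gradient step on the loss 0.5 * (f(x) - y)^2 for a single datapoint.
    W1, w2 = params
    residual = forward(params, x) - y     # dL/df at x
    g = grad_output(params, x)
    n1 = W1.size
    new_W1 = W1 - lr * residual * g[:n1].reshape(W1.shape)
    new_w2 = w2 - lr * residual * g[n1:]
    return new_W1, new_w2

def predicted_shift(params, x_train, y_train, x_other, lr):
    # First-order (NTK) prediction of how f(x_other) moves after the step.
    residual = forward(params, x_train) - y_train
    return -lr * empirical_ntk(params, x_other, x_train) * residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    x_train, x_other = rng.normal(size=3), rng.normal(size=3)
    lr = 1e-4
    new_params = sgd_step(params, x_train, 1.0, lr)
    actual = forward(new_params, x_other) - forward(params, x_other)
    print("predicted:", predicted_shift(params, x_train, 1.0, x_other, lr))
    print("actual:   ", actual)
```

For a small learning rate the two printed numbers should agree closely; the gap is the second-order term that the kernel picture ignores.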

The vast majority of LLM capabilities still arise from mimicking human choices in particular circumstances. This gives you a substantial amount of alignment "for free" (since you don't have to worry that the LLMs will grab excess power when humans don't), but it also limits you to ~human-level capabilities.

"Gradualism" can mean that fundamentally novel methods only make incremental progress on outcomes, but in most people's imagination I think it rather means that people will keep the human-mimicking capabilities generator as the source of progress, mainly focusing on scaling it up instead of on deriving capabilities by other means.

^[1] Maybe I should be cautious about invoking this without linking to a comprehensible explanation of what it means, since most resources on it are kind of involved...

tailcalled's Shortform

tailcalled, 1mo

Once you focus on "parts" of the brain, you're restricting consideration to mechanisms that are activated at sufficient scale to need to balloon up. I would expect the rarely-activating mechanisms to be much smaller in a physical sense than "parts" of the brain are.

tailcalled's Shortform

tailcalled, 1mo

Idk, the shift happened a while ago. Maybe mostly just reflecting on how evolution acts on a holistic scale, making it easy to incorporate "gradients" from events that occur only one or a few times in one's lifetime, if these events have enough effect on survival/reproduction. Part of a bigger change in priors towards the relevance of long tails associated with my LDSL sequence.

tailcalled's Shortform

tailcalled, 1mo

I've switched from considering uploading to be obviously possible at sufficient technological advancement to considering it probably intractable. More specifically, I expect the mind to be importantly shaped by a lot of rarely-activating mechanisms, which are intractable to map out. You could probably eventually make a sort of "zombie upload" that ignores those mechanisms, but it would be unable to update to new extreme conditions.

Towards a comprehensive study of potential psychological causes of the ordinary range of variation of affective gender identity in males

tailcalled, 1mo

Fixed

The Tortoise and the Language Model (A Fable After Hofstadter)

tailcalled, 2mo

It was quite real since I wanted to negotiate about whether there was an interesting/nontrivial material project I could do as a favor for Claude.

AI development as the first fully-automated job

tailcalled, 2mo

Humans have reproductive and hunting instincts. You could call these a bag of heuristics, but they are heuristics on a different level than an AI's, and in particular might not be chosen to be transferred to AIs. Furthermore, humans are harder to copy or parallelize, which leads to a different privacy profile compared to AIs.

The trouble with intelligence (whether human, artificial, or evolutionary) is that it's all about regarding the world as an assembly of the familiar. This makes data/experience a major bottleneck for intelligence.

AI development as the first fully-automated job

tailcalled, 2mo

I'm imagining a case where there's no intelligence explosion per se, just bags-of-heuristics AIs with gradually increasing competence.

The Tortoise and the Language Model (A Fable After Hofstadter)

tailcalled, 2mo

According to revealed preference, Claude certainly enjoys this sort of recursive philosophy - when I give Claude a choice, it's the sort of thing it tends to pick.
