Comments

For every token, model activations are computed once when the token is encountered and then never explicitly revised -> "only [seems like it] goes in one direction"

with the only recursive element of its thought being that it can pass 16 bits to its next running

I would name the activations for all previous tokens as the relevant "element of thought" that gets passed here, and these can amount to gigabytes.
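
A rough back-of-the-envelope calculation of how large those cached per-token activations (the KV cache) get; the model dimensions below are assumed for illustration (roughly dense-70B-like, no grouped-query attention), so only the order of magnitude matters:

```python
# Size of the key/value cache a transformer carries forward across tokens.
n_layers = 80          # transformer layers (assumed)
d_model = 8192         # hidden size; keys and values each have this width per layer (assumed)
bytes_per_value = 2    # fp16
context_tokens = 32_000

bytes_per_token = 2 * n_layers * d_model * bytes_per_value   # keys + values
total_bytes = bytes_per_token * context_tokens

print(f"{bytes_per_token / 1e6:.1f} MB per token")    # ~2.6 MB
print(f"{total_bytes / 1e9:.1f} GB for the context")  # ~84 GB
```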

From how the quote reads, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor its previous token states) ostensibly being updated.

Among them, one I found especially peculiar is that I distinctly started feeling some sort of sensations outside of my body.

I had this, and it lasted for a year after the retreat. I also found that there's a strong tendency for the sensations to happen in the area you described.

I could feel sensations substantially outside of the area accessible to my hands too, but they were a bit more difficult to feel. They could correspond to priors for tactile-like affordances for objects at a distance (e.g. graspability of a cup, or speed of a fast-moving vehicle) that are readily constructed by ordinary perception.

I thought a bit about datasets before, and to me it seems like what most needs collecting is detailed personal preference datasets: e.g. input-output examples of how you generally prefer information to be filtered, processed, communicated to you, and refined with your inputs; what your success criteria for tasks are; and where the places in your day flow / thought flow are where the thing needs to actively intervene and correct you, especially the places where you feel you could benefit from cognitive extensions most, based on your bottlenecks. This could initially be too hard to infer from screen logs alone.
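
A minimal sketch of what one record in such a dataset could look like; every field name here is made up for illustration, not a proposed standard:

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceRecord:
    """One input-output example of a personal information-handling preference (illustrative schema)."""
    context: str            # where in the day flow / thought flow this arose
    raw_input: str          # the unfiltered information or draft as it arrived
    preferred_output: str   # how you actually wanted it filtered, processed, or communicated
    success_criteria: list[str] = field(default_factory=list)  # what counts as the task done well
    intervention: str = ""  # when and how an assistant should have actively stepped in to correct you

example = PreferenceRecord(
    context="weekly planning",
    raw_input="20-item unsorted todo list",
    preferred_output="3 priorities with deadlines; the rest deferred, each with a one-line reason",
    success_criteria=["nothing urgent dropped", "fits on one screen"],
    intervention="flag when more than 6 hours of meetings land on a single day",
)
```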

Random idea about preventing model stealing. After finetuning a mixture of experts model with your magic sauce, place the trained experts on geographically distinct servers with heterogeneous tech stacks and security systems to avoid common vulnerabilities. Horcrux vibes
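
A minimal sketch of the serving-side idea, assuming one hypothetical HTTP endpoint per expert (the hostnames, the /forward route, and the response schema are all made up); the router picks the top-k experts per token and calls them remotely, so stealing the full model means compromising every host rather than exfiltrating one checkpoint:

```python
import numpy as np
import requests  # assumed transport; any RPC mechanism would do

# Hypothetical mapping from expert index to a geographically separate server,
# each with its own tech stack and security setup.
EXPERT_HOSTS = {
    0: "https://expert-0.eu-west.example.com",
    1: "https://expert-1.us-east.example.com",
    2: "https://expert-2.ap-south.example.com",
    3: "https://expert-3.sa-east.example.com",
}

def moe_layer(hidden_states: np.ndarray, router_logits: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Route each token's hidden state to its top-k experts, which run on remote hosts.

    hidden_states: (n_tokens, d_model); router_logits: (n_tokens, n_experts).
    """
    output = np.zeros_like(hidden_states)
    for t in range(hidden_states.shape[0]):
        top = np.argsort(router_logits[t])[-top_k:]   # indices of the chosen experts
        gate = np.exp(router_logits[t][top])
        gate = gate / gate.sum()                      # softmax over the chosen experts' logits
        for w, e in zip(gate, top):
            resp = requests.post(
                f"{EXPERT_HOSTS[int(e)]}/forward",    # hypothetical route
                json={"hidden_state": hidden_states[t].tolist()},
                timeout=5,
            )
            output[t] += w * np.array(resp.json()["output"])  # assumed response schema
    return output
```

The attention layers and the router still have to live somewhere, so this raises the cost of theft rather than eliminating it.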

Vaguely related paper: "Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models" is an early attempt to prevent models from being re-purposed via fine-tuning.

It doesn't seem like a meaningfully positive result. For example, all their plots only track finetuning on up to 200 examples; I imagine they might even have had clear negative results in conditions with more than 200 examples available for finetuning. After 50-100 examples, the gap between normal finetuning and finetuning from random init, though still small, grows fast. There are also no plots with finetuning iterations on the x-axis. And when they optimize for "non-finetunability", they don't aim to maintain language modeling performance; instead, they only impose the constraint of "maintaining finetunability" on one downstream "professions detection task".
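
A minimal sketch of the missing comparison, on a toy model in plain PyTorch; everything here is illustrative, and the "protected" model below is just a placeholder init (the real check would start from the paper's trained weights): finetune from the protected weights and from a random init, and track the gap per finetuning iteration rather than per dataset size.

```python
import torch
import torch.nn as nn

def finetune(model: nn.Module, xs: torch.Tensor, ys: torch.Tensor, steps: int = 1000) -> list[float]:
    """Finetune on the downstream task and record the loss at every iteration."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    losses = []
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(xs), ys)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

torch.manual_seed(0)
xs, ys = torch.randn(500, 16), torch.randn(500, 1)   # the >200-example regime the plots omit

protected = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))    # placeholder for the "self-destructing" model
random_init = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))  # baseline

protected_curve = finetune(protected, xs, ys)
random_curve = finetune(random_init, xs, ys)

# If the protected curve catches up with the random-init curve after enough iterations,
# the "non-finetunability" claim doesn't hold in the regime that matters.
for step in (10, 100, 1000):
    print(step, protected_curve[step - 1], random_curve[step - 1])
```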

I expect naive solutions to continue to work very poorly on this problem.

I think "on most cognitive tasks" means for an AGI its t is defined as the first t for which it meets the expert level at most tasks. However, what exactly counts as a cognitive task does seem to introduce ambiguity and would be cool to clarify, e.g. by pointing to a clear protocol for sampling all such task descriptions from an LLM.

A several-months-AGI is required to be coherent in the sense of coherence as defined against today's human experts. I think this is pretty distinct from the coherence that humans were being optimized to have before behavioral modernity (50K years ago).

I agree that evolution optimized hard for some kind of coherence, like persistent self-schema, attitudes, emotional and behavioral patterns, attachments, and long-term memory access. But what humans have going for them is the combination of this prior coherence and just 50K years of evolution after humans unlocked access to the abstract thinking toolkit. I don't think we can expect that to enable much in terms of the ability to coherently plan complex tasks or the ability to write and reason abstractly.

This makes me think that humans struggling with coherence is not good evidence that building agents with large t is much more difficult than building them with small t: there simply wasn't enough optimization pressure.

on most cognitive tasks, it beats most human experts

I think this specifies both thresholds as 50%.
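
Spelling out that reading (the notation is mine, not from the post): with $\mathcal{T}$ the distribution over cognitive tasks and $\mathcal{E}_x$ the pool of human experts relevant to task $x$,

$$\Pr_{x \sim \mathcal{T}}\Big[\Pr_{h \sim \mathcal{E}_x}\big[\text{AI beats } h \text{ on } x\big] > 0.5\Big] > 0.5.$$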
