anaguma — LessWrong

Just like your mind can only see the rightside-up-stairs (w/ a blue wall closer to you) or the upside-down-stairs (w/ a green wall closer to you) (but never both of them at the same time), you can also suddenly shift from a viewpoint of being little brother, to being the full system.

If you aren’t able to see the second thing, try flipping your screen and upside down, and looking at it normally again.

anaguma's Shortform

anaguma3d*80

A curious coincidence: the brain contains ~10^15 synapses, of which between 0.5%-2.5% are active at any given time. Large MoE models such as Kimi K2 contains 10^12 parameters, of which 3.2% are active in any forward pass. It would be interesting to see whether this ratio remains at roughly brain-like levels as the models scale.

Jacob Pfau's Shortform

anaguma5d30

The one-week scale of interaction between Ernest and ChatGPT here is a great example of how we're very much in a centaur regime now.

How long do you expect this to last?

anaguma's Shortform

anaguma5d10

Yudkowsky has done another interview today on IABIED with Chris Williamson.

the gears to ascenscion's Shortform

anaguma6d10

I think we should delete them for the same reason we shouldn’t keep around samples of smallpox - the risk of a lab leak, e.g. by future historians interacting with it, or by it causing misalignment in other AIs seems nontrivial.

Perhaps a compromise: what do you think of keeping the training data and training code around, but deleting the weights? This keeps the option of bringing them back (training is usually deterministic), but only in a future where compute will be abundant and the misaligned models pose no threat to the existing AIs. We can get robust AI monitors when interacting with humans, etc.

the gears to ascenscion's Shortform

anaguma6d3-2

I don’t think we should keep future misaligned models around and let them interact with other models or humans.

Zach Stein-Perlman's Shortform

anaguma8d10

Can you say more about the projects you're spending your time on now?

faul_sname's Shortform

anaguma8d10

I think not making the CoTs weird is a tax on capabilities and limits the type of research they can do. Also they would need to train the CoTs to not display bad behavior, e.g. not offend the user, which is contra the Most Forbidden Technique because it makes CoT monitoring less useful.

anaguma's Shortform

anaguma11d10

Today 80,000 Hours released a podcast with Daniel Kokotajlo on AI 2027 and related topics.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments