anaguma · 1673 karma

Comments

Temporarily Losing My Ego
anaguma · 2d · karma 1 · agreement 0

Just like your mind can only see the right-side-up stairs (w/ a blue wall closer to you) or the upside-down stairs (w/ a green wall closer to you), but never both at the same time, you can also suddenly shift from the viewpoint of being the little brother to being the full system.


If you aren’t able to see the second thing, try flipping your screen upside down and then looking at it normally again.

anaguma's Shortform
anaguma · 3d (edited) · karma 8 · agreement 0

A curious coincidence: the brain contains ~10^15 synapses, of which roughly 0.5%-2.5% are active at any given time. Large MoE models such as Kimi K2 contain ~10^12 parameters, of which 3.2% are active in any forward pass. It would be interesting to see whether this ratio remains at roughly brain-like levels as models scale.
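For concreteness, the arithmetic behind the comparison (a minimal sketch; the Kimi K2 figures assume its reported ~1T-total / ~32B-active configuration):

```python
# Active-parameter fraction: brain vs. a large MoE model.
# Brain figures are the rough numbers quoted above; Kimi K2 figures
# assume the reported ~1T total / ~32B active-per-token configuration.

brain_synapses = 1e15
brain_active_low, brain_active_high = 0.005, 0.025  # 0.5%-2.5% active at any time

kimi_total_params = 1.0e12   # ~10^12 parameters
kimi_active_params = 3.2e10  # ~32B active per forward pass

moe_active_frac = kimi_active_params / kimi_total_params
print(f"Kimi K2 active fraction: {moe_active_frac:.1%}")  # 3.2%
print(f"Brain active fraction: {brain_active_low:.1%}-{brain_active_high:.1%}")  # 0.5%-2.5%
print(f"Active synapses: {brain_synapses * brain_active_low:.0e}-{brain_synapses * brain_active_high:.0e}")
```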

Jacob Pfau's Shortform
anaguma · 5d · karma 3 · agreement 0

The one-week scale of interaction between Ernest and ChatGPT here is a great example of how we're very much in a centaur regime now.

How long do you expect this to last?

anaguma's Shortform
anaguma · 5d · karma 1 · agreement 0

Yudkowsky did another interview on IABIED today, this time with Chris Williamson.

the gears to ascension's Shortform
anaguma · 6d · karma 1 · agreement 0

I think we should delete them for the same reason we shouldn’t keep samples of smallpox around: the risk of a lab leak, e.g. from future historians interacting with the weights, or from the weights causing misalignment in other AIs, seems nontrivial.


Perhaps a compromise: what do you think of keeping the training data and training code around, but deleting the weights? This preserves the option of bringing the models back (retraining can reproduce them, provided seeds, data order, and kernels are pinned), but only in a future where compute is abundant, the misaligned models pose no threat to the then-existing AIs, and we have robust AI monitors for any interaction with humans.
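On the determinism caveat: retraining reproduces the exact weights only if every source of randomness is pinned. A minimal PyTorch-flavored sketch of the standard knobs (illustrative only, not any lab's actual pipeline):

```python
import os
import random
import numpy as np
import torch

def make_training_reproducible(seed: int = 0) -> None:
    """Pin the usual sources of randomness so a rerun can reproduce the same weights."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # seeds CPU and all CUDA RNGs
    torch.use_deterministic_algorithms(True)   # raise on nondeterministic CUDA ops
    torch.backends.cudnn.benchmark = False     # disable run-dependent kernel autotuning
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required for deterministic cuBLAS

# Data order must be pinned too, e.g. with a seeded DataLoader generator:
# loader = torch.utils.data.DataLoader(dataset, shuffle=True,
#                                      generator=torch.Generator().manual_seed(0))
```

Even then, bit-exact reproduction assumes the same hardware, parallelism, and library versions.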

the gears to ascension's Shortform
anaguma · 6d · karma 3 · agreement -2

I don’t think we should keep future misaligned models around and let them interact with other models or humans.

Zach Stein-Perlman's Shortform
anaguma · 8d · karma 1 · agreement 0

Can you say more about the projects you're spending your time on now?

faul_sname's Shortform
anaguma · 8d · karma 1 · agreement 0

I think keeping the CoTs from being weird is a tax on capabilities and limits the type of research they can do. They would also need to train the CoTs not to display bad behavior, e.g. not to offend the user, which is contra the Most Forbidden Technique because it makes CoT monitoring less useful.

anaguma's Shortform
anaguma · 11d · karma 1 · agreement 0

Today 80,000 Hours released a podcast with Daniel Kokotajlo on AI 2027 and related topics. 

Posts

Anthropic Economic Index report · 1mo · 4 karma · 0 comments
GPT-1 was a comedic genius · 1mo · 5 karma · 3 comments
OpenAI Releases GPT-5 · 3mo · 18 karma · 0 comments
OpenAI releases gpt-oss · 3mo · 38 karma · 6 comments
2025 Alignment Predictions [Question] · 10mo · 3 karma · 3 comments
anaguma's Shortform · 10mo · 2 karma · 61 comments
OpenAI o1 + ChatGPT Pro release · 11mo · 5 karma · 0 comments
Anthropic - The case for targeted regulation · 1y · 11 karma · 0 comments