
StanislavKrym's Shortform

by StanislavKrym
29th Apr 2025
4 comments, sorted by top scoring
StanislavKrym · 1mo

The ARC-AGI leaderboard got an update. IIRC, the base LLM Qwen3-235b-a22b Instruct (25/07) is the first Chinese model to sit on the Pareto frontier. Or is it likely to be closely matched by the West, as happened with DeepSeek R1 (released on January 20?) and o3-mini (January 31)? And is China likely to cheaply create higher-level models, like an analogue of o3, before the West does? If China does, then how are the two countries to reach the Slowdown Ending?
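As an aside on what "on the Pareto frontier" means here: below is a minimal sketch, with hypothetical model names and made-up cost/score numbers (not actual ARC-AGI leaderboard data), of picking out the leaderboard entries that no other entry beats on both cost and score.

```python
# Sketch: cost-vs-score Pareto frontier of a leaderboard.
# All names and numbers below are illustrative placeholders.

entries = [
    # (model, cost per task in USD, score in %)
    ("model-a", 0.02, 11.0),
    ("model-b", 0.05, 16.0),
    ("model-c", 0.04, 9.0),   # dominated: model-a is cheaper AND scores higher
    ("model-d", 1.50, 35.0),
]

def pareto_frontier(models):
    """Keep each model unless some other model is at most as costly
    and strictly better-scoring."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            other_cost <= cost and other_score > score
            for other_name, other_cost, other_score in models
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(entries))  # ['model-a', 'model-b', 'model-d']
```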

StanislavKrym · 3mo

The two main problems with the slowdown ending of the AI-2027 scenario are two optimistic assumptions, which I plan to cover in two separate posts.

  1. If China invades Taiwan in March 2026 and steals Agent-2 in January 2027, then OpenBrain no longer has the absolute lead necessary for a unilateral slowdown.
  2. What if any sufficiently powerful AI either takes over or becomes a protective god, but not a servant, as I conjectured here? Then it could be the slowdown ending that has the greater chance of leading to doom, since OpenBrain would be stuck with an insoluble problem.
StanislavKrym · 3mo

Why does the Race Ending of the AI-2027 forecast claim that "there are compelling theoretical reasons to expect no aliens for another fifty million light years beyond that"? If that claim is false, then sapient alien lifeforms should also count as moral patients. For example, this would imply that all or almost all resources in their home system (and, presumably, some part of the space around it) should belong to them, not to humans or to a human-aligned AI. And that's ignoring the possibility that humans encounter a planet that has a chance of generating a sapient lifeform...

StanislavKrym · 4mo*

If we were to view raising humans from birth to adulthood and training AI agents from initialization to deployment as similar processes, then what human analogues do the six goal types from the AI-2027 forecast have? The analogues of the developers are, obviously, the adults who have at least partial control over the human's life. The analogues of the written Spec and of developer-intended goals are the adults' intentions; the analogue of reward/reinforcement seems to be short-term stimuli and the morals of one's communities. I also think that the best analogue of proxies and/or convergent goals is the possession of resources (and of knowledge, but the latter can be acquired without ethical issues), while the 'other goals' are, well, the ideologies, morality[1] and tropes absorbed from the most concentrated form of training data available to humans, which is speech in all its forms.

What exactly do the analogies above tell us about the prospects for alignment? The possession of resources is the goal behind wars of aggression, colonialism and related evils[2]. If human culture managed to make these unacceptable, does that imply that an AI will likewise not attempt a takeover?

 

  1. ^

    I also think that humans rarely develop their own moral codes or ideologies; instead, they usually adopt some moral code or ideology close to one already present in the "training data". Could anyone comment on this?

  2. ^

    And crimes, though criminals, unlike colonizers, also try to avoid conflict with law enforcers of at least comparable power.

Mentioned in: Colonialism in space: Does a collection of minds have exactly two attractors?