RussellThor

Comments

Yes, agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that it is important.

One of several things I disagree with in the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.

Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 AD ethics and taught it that slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong.

By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to the strong orthogonality hypothesis they would keep their values (within some bounds perhaps); according to strong moral realism they would all converge to a common set of values, even if those were very far from their starting ones. To me it is obviously a crux which one would happen.

You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system if consistency is something it desires, etc.
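A minimal sketch of such a toy model, in Python - the belief names, dependency graph, and update rule here are all hypothetical, just to illustrate watching one corrected belief cascade through a system that wants to stay consistent:

```python
# Hypothetical toy model: beliefs with dependencies, where falsifying one
# belief forces retraction of everything that rests on it.
from collections import deque

# Each belief lists the beliefs it depends on being true.
depends_on = {
    "sqrt2_is_rational": [],
    "all_magnitudes_are_ratios": ["sqrt2_is_rational"],
    "geometry_reducible_to_number": ["all_magnitudes_are_ratios"],
    "cosmos_is_harmonious_ratio": ["geometry_reducible_to_number"],
}

beliefs = {name: True for name in depends_on}  # the "ancient Greek" starting state

def cascade(falsified: str) -> list[str]:
    """Mark one belief false and retract everything resting on it."""
    beliefs[falsified] = False
    retracted, queue = [falsified], deque([falsified])
    while queue:
        current = queue.popleft()
        for name, deps in depends_on.items():
            if beliefs[name] and current in deps:
                beliefs[name] = False      # consistency forces retraction
                retracted.append(name)
                queue.append(name)
    return retracted

print(cascade("sqrt2_is_rational"))  # the whole chain of beliefs falls
```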

OK, firstly, if we are talking about fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil, even if precisely calibrated beforehand? What about the fundamentals of guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.

Your points: 1. The idea is that a larger shell (not a regular-sized bullet) just obscures the sensor for a fraction of a second, in a coordinated attack with the larger Javelin-type missile. Such shells may be considerably larger than a regular bullet, but much cheaper than a missile. Missile- or sniper-sized drones could be fitted with such shells, depending on what the optimal size was.

Example shell (without the 1 km range, I assume). However, note that current chaff is not optimized for the described attack; the fact that there is currently no shell suited for this use is not evidence that one would be impractical to create.

The principle here is about efficiency and cost. I maintain that against armor with hard-kill defenses it is more efficient to use a combined attack of sensor blinding and anti-armor missiles than missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs. 2 Javelins and 50 simultaneous chaff shells. The second attack will be cheaper, and the optimized "sweet spot" will always include some sensor-blinding component. Do you claim that the optimal coordinated attack would have zero sensor blinding?
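To make the cost point concrete, a back-of-the-envelope comparison - the unit prices below are rough assumptions, not sourced figures:

```python
# Illustrative cost comparison only - both prices are assumed.
javelin_cost = 200_000   # assumed cost per Javelin-class missile, USD
chaff_shell_cost = 500   # assumed cost per sensor-blinding shell, USD

missiles_only = 10 * javelin_cost                      # 10 simultaneous missiles
combined = 2 * javelin_cost + 50 * chaff_shell_cost    # 2 missiles + 50 blinding shells

print(f"missiles only: ${missiles_only:,}")   # $2,000,000
print(f"combined:      ${combined:,}")        # $425,000
```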

2. Leading on from (1), I don't claim light drones will be viable against an active laser. I regard a laser as a serious obstacle that is attacked with the swarm attack described before the territory is secured. That is, blind the sensor/obscure the laser and simultaneously converge with missiles. The drones need to survive just long enough to fire off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000 m at once?

(I give 1000 m just as an example, too; flying drones would use ground cover to get as close as they could. I assume they will pretty much always be able to get within 1000 m of a ground target using the ground as cover.)

Sure, it doesn't prevent a deceptive model from being made, but if AI engineers built NNs with such self-awareness at all levels from the ground up, that wouldn't happen in their models. The encouraging thing, if it holds up, is that there is little to no "alignment tax" for making the models understandable - they are also better.

Self-modelling in NNs: https://arxiv.org/pdf/2407.10188 - is this good news for mech interpretability? If the model makes itself easily predictable, then that really seems to limit the possibilities for deceptive alignment.
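For concreteness, a minimal sketch of the general self-modelling idea as I understand it (not the paper's exact architecture): alongside its main task the network also predicts its own hidden activations, which pressures those activations to stay simple and predictable.

```python
# Sketch only - layer sizes, heads, and the 0.1 weighting are arbitrary choices.
import torch
import torch.nn as nn

class SelfModellingNet(nn.Module):
    def __init__(self, in_dim=32, hidden_dim=64, out_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.task_head = nn.Linear(hidden_dim, out_dim)      # primary task output
        self.self_head = nn.Linear(hidden_dim, hidden_dim)   # predicts own hidden state

    def forward(self, x):
        h = self.encoder(x)
        return self.task_head(h), self.self_head(h), h

net = SelfModellingNet()
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

logits, h_pred, h = net(x)
task_loss = nn.functional.cross_entropy(logits, y)
self_loss = nn.functional.mse_loss(h_pred, h.detach())  # auxiliary self-prediction loss
loss = task_loss + 0.1 * self_loss
loss.backward()
```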

If it is truly impossible to break symmetry, you could argue that there isn't a clone and you are in fact the same. I.e. there is just one instance of you; it just looks like there are two. After all, if you are absolutely identical, including the universe, in what sense are there two of you? Upon further thought, you couldn't tell whether a perfect translational clone was a clone at all, or just a perfect mirror/force field. There would be no way to tell. If you put your hand out to touch the mirror, or your mirror hand, and it was perfectly aligned, you would not feel texture but instead an infinitely hard surface. There would be no rubbing of your fingers against the clone, no way to tell whether there was a perfect mirror or another copy.

OK thanks, I will look some more at your sequence. Note I brought up Greek philosophy as obviously not being stable under reflection, with the proof that sqrt(2) is irrational as a simple example; I'm not sure why you are only reasonably sure it's not.
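For reference, the standard proof the toy model would have to absorb is only one line:

$$\sqrt{2}=\tfrac{p}{q},\ \gcd(p,q)=1 \;\Rightarrow\; p^2=2q^2 \;\Rightarrow\; 2\mid p \;\Rightarrow\; p=2k \;\Rightarrow\; q^2=2k^2 \;\Rightarrow\; 2\mid q,$$

contradicting $\gcd(p,q)=1$, so no such ratio exists.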

Answer by RussellThor

I don't think there is a conclusion, just more puzzling situations the deeper you go:

"Scott Aaronson: To my mind, one of the central things that any account of consciousness needs to do, is to explain where your consciousness “is” in space, which physical objects are the locus of it.  I mean, not just in ordinary life (where presumably we can all agree that your consciousness resides in your brain, and especially in your cerebral cortex—though which parts of your cerebral cortex?), but in all sorts of hypothetical situations that we can devise.  What if we made a backup copy of all the information in your brain and ran it on a server somewhere?  Knowing that, should you then expect there’s a 50% chance that “you’re” the backup copy?  Or are you and your backup copy somehow tethered together as a single consciousness, no matter how far apart in space you might be?  Or are you tethered together for a while, but then become untethered when your experiences start to diverge?  Does it matter if your backup copy is actually “run,” and what counts as running it?  Would a simulation on pen and paper (a huge amount of pen and paper, but no matter) suffice?  What if the simulation of you was encrypted, and the only decryption key was stored in some other galaxy?  Or, if the universe is infinite, should you assume that “your” consciousness is spread across infinitely many physical entities, namely all the brains physically indistinguishable from yours—including “Boltzmann brains” that arise purely by chance fluctuations?"
Link

The point here is that you could have a system that, to an outside observer, looked random or encrypted, but which with the key would be revealed to be a conscious creature. But what if the key were forever destroyed? Does the universe then somehow know to assign it consciousness?

You also need to fully decide whether replaying, vs. computing, apparently conscious behavior counts. If you compute a digital sim once, then save the states and replay them a second time, what does that mean? What about playing them backwards?

Boltzmann brains really mess things up further.

It seems to lead to the position that it's all just arbitrary and there is no objective truth, or to uncountable infinities of consciousnesses in acausal, timeless situations. Embracing this view doesn't lead anywhere useful from what I can see, and of course I don't want it to be the logical conclusion.

What about ASICs? I heard someone is making them for inference and of course claims an efficiency gain. ASIC improvement needs to be thought of as part of the status quo.

To do that and achieve something looking like take-off, they would need to get to the level of an advanced AI researcher rather than just a coding assistant - that is, come up with novel architectures to test. Even if the LLM could write all the code for a top researcher 10x faster, that's not a 10x speedup in timelines; probably 50% at most, if much of the time is spent thinking up theoretical concepts and waiting for training runs to test results.
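A quick Amdahl's-law-style check, with made-up fractions, illustrates why:

```python
# If only a share of a researcher's time is coding, a 10x coding
# speedup gives a much smaller overall speedup. Fractions are assumed.
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# e.g. 40% of time spent coding, 10x faster coding -> ~1.56x overall
print(round(overall_speedup(0.4, 10.0), 2))
```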

I am clearly in the skeptic camp, in the sense that I don't believe the current architecture will get to AGI with our resources. That is, if all the GPUs and training data in the world were used it wouldn't be sufficient, and maybe no amount of compute/data would be.

To me the strongest evidence that our architectures don't learn and generalize well isn't LLMs but in fact Tesla Autopilot. It has ~10,000x more training data than a person, much more FLOPS, and is still not human-level. I think Tesla is doing pretty much everything major right with their training setup. Our current AI setups just don't learn or generalize as well as the human brain and similar. They don't extract symbols or diverse generalizations from high-bandwidth, un-curated data like video. Scaffolding doesn't change this.

A medium-term but IMO pretty much guaranteed way to get this would be to study and fully characterize the cortical column in the human/mammalian brain.
