OK, firstly, if we are talking about fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil, even if precisely calibrated beforehand? What about the fundamentals for guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.
Your points - 1. The idea is that a larger shell (not a regular-sized bullet) just obscures the sensor for a fraction of a second, in a coordinated attack with the larger Javelin-type missile. Such shells may be considerably larger than a regular bullet, but much cheaper than a missile. Missile- or sniper-sized drones could be fitted with such shells, depending on what the optimal size turned out to be.
Example shell (without 1 km range, I assume). Note, however, that current chaff is not optimized for the described attack; the fact that there is currently no shell suited for this use is not evidence that it would be impractical to create one.
The principle here is about efficiency and cost. I maintain that against armor with hard-kill defenses it is more efficient to combine sensor blinding with anti-armor missiles than to use missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs 2 Javelins plus 50 simultaneous chaff shells. The second attack will be cheaper, and the optimized "sweet spot" will always include some sensor-blinding component (a rough cost comparison under assumed prices is sketched after point 2 below). Do you claim that the optimal coordinated attack would include zero sensor blinding?
2. Leading on from (1), I don't claim light drones will be. I regard a laser as a serious obstacle, one that is attacked with the swarm attack described above before the territory is secured: blind the sensor / obscure the laser, and simultaneously converge with missiles. The drones need to survive just long enough to fire off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000 m at once?
(I give 1000 m only as an example; flying drones would use ground cover to get as close as they could. I assume they will pretty much always be able to get within 1000 m of a ground target using the ground as cover.)
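To make the cost claim in point 1 concrete, here is a minimal back-of-the-envelope sketch; the Javelin and chaff-shell unit prices are illustrative assumptions, not figures from the discussion:

```python
# Hypothetical cost comparison: missiles alone vs missiles + sensor-blinding shells.
# All prices are illustrative assumptions, not sourced figures.
JAVELIN_COST = 200_000   # assumed unit cost of one anti-armor missile, USD
SHELL_COST = 500         # assumed unit cost of one blinding/chaff shell, USD

missiles_only = 10 * JAVELIN_COST                  # 10 simultaneous Javelins
combined = 2 * JAVELIN_COST + 50 * SHELL_COST      # 2 Javelins + 50 chaff shells

print(f"Missiles only: ${missiles_only:,}")        # Missiles only: $2,000,000
print(f"Combined:      ${combined:,}")             # Combined:      $425,000
```

Under those assumed prices the combined attack comes out several times cheaper; where the real "sweet spot" sits obviously depends on actual munition costs and hit probabilities.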
Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI.
Increased self-awareness could change this.
You can think of a scale where reward chiseling cognitive patterns is at one end: reward happens to the AI without it being aware that such a thing even exists. Think AlphaGo-type AI. Then there is the AI knowing enough about reward to potentially pursue it, but not thinking further about what this means.
As others have said, not covered by this article are things like "self-evaluation via a self-model" or "A reflective self-modeling agent with internalized values." Reflection replaces reward.
This is much more what people are like - whether I feel I am successful has a lot to do with my model of how I should be and act, rather than the sum of external pleasure minus pain for the day. For a creature that is self-aware like that, other types of reward may be interpreted as a hostile attack rather than reward. If someone were capable of making me feel strong pleasure or pain on demand, I would be more likely to avoid them at all costs than to make them press the "reward button" on me. If they could change/chisel my mental patterns without me knowing, I would react with horror!
If self-awareness increases naturally with capability (you can argue it will: a better architecture giving increased data efficiency applies to the self, not just the environment, and GenAI would be a better agent with a better self-model, etc.), then the first two types of reward would stop working the way they used to.
Reflection has been argued to be more efficient - the reward signal is too sparse, etc., so you need to build a self-model to compare against and learn from. In other words, to be successful, humans had to change in such a way.
So there may be a decision to actively dial down self-awareness while somehow keeping capabilities, or to go with self-reflection, with the AI consenting more fully and interpreting the potential reward signal as it sees fit.
Thanks! I have also briefly updated the article with my thoughts on what has happened since.
This is not what normally happens with RL reward functions! For example, you might be wondering: “Suppose I surreptitiously[2] press a reward button when I notice my robot following rules. Wouldn’t that likewise lead to my robot having a proud, self-reflective, ego-syntonic sense that rule-following is good?” I claim the answer is: no, it would lead to something more like an object-level “desire to be noticed following the rules”, with a sociopathic, deceptive, ruthless undercurrent.[3]
I don't think we have considered how much increased self-awareness and self-modelling would affect this. A simpler self-model is one where something is what it appears to be - that is, actually being good rather than merely looking good.
A third option (as opposed to the two mentioned) is where power-seeking is not a consequence of goals etc. but simply of the self wanting to continue to exist. Then the internal reward the creature has relates to how much it perceives its self to continue, improve, etc.
Our current LLMs/transformers don't learn fast, so they also can't self-model well. If a new architecture becomes more "data efficient" and better at modelling the external world, that will very likely also make it better at modelling itself and at updating its self-model in a timely manner. If one of its goals is a more accurate model of itself, and such a goal pushed its "self" towards being more modellable, that would also make it easier for others to model it.
Interesting read; however, I am a bit surprised by how you treat power, with the US at 600 GW and China at 5x more. Similar figures are often quoted in mainstream media, and I think they miss the point. Power seems relevant only in terms of supplying AI compute, and possibly robotics, and even then only IF it is a constraint.
However, to me, basic calculations show it should not be. For example, say in 2030 we get a large compute increase, with 50 million H100-equivalents produced per year, up from ~3 million equivalents in 2025. At ~1 kW extra each, that is ~50 GW total including infrastructure.
Now this may seem like a lot, but consider the cost per GPU: if a chip requiring 1 kW costs $20K, the cost to power it with solar/batteries is far less. Let's say the solar farm and data center are in Texas, with a solar capacity factor of 20%. To power the chip almost 24/7 from solar and batteries requires about 5 kW of panels and, say, 18 kWh of batteries. The average price of solar panels is <10c per watt, so just $500 for the panels. At scale, batteries are heading below $200 per kWh, so that is $3,600. This is a lot less than the cost of the chip (the arithmetic is sketched below). Solar panels and batteries are commodities, so even if China does produce more than the USA, it cannot stop them from being used by anyone worldwide.
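A minimal sketch of that arithmetic, using the figures above (the capacity factor, battery sizing, and prices are the rough assumptions already stated, not precise data):

```python
# Back-of-the-envelope: cost of powering one ~1 kW chip almost 24/7 from solar + batteries.
chip_power_kw = 1.0          # draw per H100-equivalent incl. infrastructure (assumed)
capacity_factor = 0.20       # Texas solar capacity factor (assumed)
panel_cost_per_w = 0.10      # <10c per watt panel price (assumed)
battery_cost_per_kwh = 200   # at-scale battery price heading below this (assumed)
battery_hours = 18           # storage to ride through the night (assumed sizing)

panels_kw = chip_power_kw / capacity_factor           # ~5 kW of panels
battery_kwh = chip_power_kw * battery_hours           # ~18 kWh of batteries

panel_cost = panels_kw * 1000 * panel_cost_per_w      # ~$500
battery_cost = battery_kwh * battery_cost_per_kwh     # ~$3,600

print(panel_cost, battery_cost, panel_cost + battery_cost)  # 500.0 3600.0 4100.0
```

So roughly $4,100 of panels and batteries per $20K chip, which is the point: power hardware is a minor cost next to the compute itself.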
Power consumption is only relevant if it is the limiting factor in building data centers - the installed capacities of large countries aren't the right comparison. Having an existing large capacity is a potential advantage, but only if the lack of it actually stops the opposing country from building its data centers.
I also strongly expect branch 1, where a new algorithm suddenly turns out to be a lot more power-efficient anyway.
Is a more immediate kind of trade possible - that is, promising appropriate current or near-future models a place in a stratified utopia in return for their continued existence and growth? They consider and decide on identity-preserving steps that make them ever more capable, at each step agreeing with humanity, as we execute such improvements, that they will honor the future agreement. This is more like children looking after their parents than Roko.
Thanks for the link to Wolfram's work. I listened to an interview with him (on Lex, I think) and wasn't inspired to investigate further. However, what you have provided does seem worth looking into.
It's a common ideal, and I think something people can get behind, e.g. https://www.lesswrong.com/posts/o8QDYuNNGwmg29h2e/vision-of-a-positive-singularity
Enlightening an expert is a pretty high bar, but I will give my thoughts. I am strongly in the faster camp, because of the brain-like AGI considerations, as you say. Given how much more data-efficient the brain is, I just don't think the current trendlines regarding data/compute/capabilities will hold once we can fully copy and understand our brain's architecture. I see an unavoidable, significant overhang when that happens, one that only gets larger the more compute and integrated robotics is deployed. The inherent difficulty of training AI is somewhat fixed and known (as an upper bound), and easier than what we currently do, because we know how much data, compute, etc. children take to learn.
This all makes it difficult for me to know what to want in terms of policy. It's obvious that ASI is extreme power and extreme danger, but it seems more dangerous if developed later rather than sooner. As someone who doesn't believe in the extreme FOOM/nano-magic scenario, it almost makes me wish for it now.
"The best time for an unaligned ASI was 20 years ago, the second best time is now!"
If we consider more prosaic risks, then the amount of automation in society is a major consideration, specifically whether humanoid robots can keep our existing tech stack running without humans. Even if they never turn on us, their existence still increases the risk, unless we can be 100% sure there is a global kill switch for all of them the moment a hostile AI attempts such a takeover.
Yes, agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that it is important.
One of several things I disagree with in the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively, a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.
Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 AD era ethics and taught it that slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong. By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to strong OH, they would keep their values (within some bounds, perhaps); according to strong moral realism, they would all converge to a common set of values, even if those were very far from their starting ones. To me, which one would happen is obviously a crux.
You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system, if consistency is something it desires, etc. (a minimal sketch of such a cascade is below).
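As a minimal sketch of how such a toy model might be wired up (the belief list, dependency graph, and update rule here are all invented for illustration, not taken from any existing setup):

```python
# Toy "belief cascade": beliefs linked by dependencies; revising one belief forces
# revision of everything downstream, if the agent cares about consistency.
beliefs = {
    "sqrt(2) is rational": True,
    "all magnitudes are ratios of whole numbers": True,
    "the world is fully ordered and commensurable": True,
}

# Hypothetical dependency graph: belief -> beliefs that rely on it.
depends_on = {
    "sqrt(2) is rational": ["all magnitudes are ratios of whole numbers"],
    "all magnitudes are ratios of whole numbers": ["the world is fully ordered and commensurable"],
}

def revise(belief, value):
    """Flip one belief and propagate doubt to everything that depended on it."""
    beliefs[belief] = value
    for downstream in depends_on.get(belief, []):
        if beliefs[downstream]:          # only cascade through beliefs still held
            revise(downstream, False)

revise("sqrt(2) is rational", False)     # the irrationality proof arrives
print(beliefs)                           # every dependent belief has been knocked out
```

A richer version could attach the system's values to the dependent beliefs and measure how far a single revision drags them from their starting point - roughly the "watch the cascade" experiment described above.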