RussellThor

Comments

Yes, agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that it is important.

One of several things I disagree with in the MIRI consensus is the idea that human values are some special single point lost in a multi-dimensional wilderness. Intuitively a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.

Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 AD ethics and taught it that slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong.

By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to the strong orthogonality hypothesis they would keep their values (within some bounds perhaps); according to strong moral realism they would all converge to a common set of values, even if those were very far from their starting ones. To me it is obviously a crux which one would happen.

You can imagine a toy model with ancient Greek mathematics and values - it starts out believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system if consistency is something it desires, etc.
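A minimal sketch of such a toy model, in Python - the belief names, dependency graph, and update rule here are all hypothetical, just to illustrate watching one corrected belief cascade through a system that wants to stay consistent:

```python
# Hypothetical toy model: beliefs with dependencies, where falsifying one
# belief forces retraction of everything that rests on it.
from collections import deque

# Each belief lists the beliefs it depends on being true.
depends_on = {
    "sqrt2_is_rational": [],
    "all_magnitudes_are_ratios": ["sqrt2_is_rational"],
    "geometry_reducible_to_number": ["all_magnitudes_are_ratios"],
    "cosmos_is_harmonious_ratio": ["geometry_reducible_to_number"],
}

beliefs = {name: True for name in depends_on}  # the "ancient Greek" starting state

def cascade(falsified: str) -> list[str]:
    """Mark one belief false and retract everything resting on it."""
    beliefs[falsified] = False
    retracted, queue = [falsified], deque([falsified])
    while queue:
        current = queue.popleft()
        for name, deps in depends_on.items():
            if beliefs[name] and current in deps:
                beliefs[name] = False      # consistency forces retraction
                retracted.append(name)
                queue.append(name)
    return retracted

print(cascade("sqrt2_is_rational"))  # the whole chain of beliefs falls
```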

OK, firstly, if we are talking about fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil, even if precisely calibrated beforehand? What about the fundamentals of guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.

Your points: 1. The idea is that a larger shell (not a regular-sized bullet) just obscures the sensor for a fraction of a second, in a coordinated attack with the larger Javelin-type missile. Such shells may be considerably larger than a regular bullet, but much cheaper than a missile. Missile- or sniper-sized drones could be fitted with such shells, depending on what the optimal size was.

Example shell (without the 1 km range, I assume). However, note that current chaff is not optimized for the described attack; the fact that there is currently no shell suited for this use is not evidence that one would be impractical to create.

The principle here is about efficiency and cost. I maintain that against armor with hard-kill defenses it is more efficient to use a combined attack of sensor blinding and anti-armor missiles than missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs. 2 Javelins and 50 simultaneous chaff shells. The second attack will be cheaper, and the optimized "sweet spot" will always include some sensor-blinding component. Do you claim that the optimal coordinated attack would have zero sensor blinding?
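To make the cost point concrete, a back-of-the-envelope comparison - the unit prices below are rough assumptions, not sourced figures:

```python
# Illustrative cost comparison only - both prices are assumed.
javelin_cost = 200_000   # assumed cost per Javelin-class missile, USD
chaff_shell_cost = 500   # assumed cost per sensor-blinding shell, USD

missiles_only = 10 * javelin_cost                      # 10 simultaneous missiles
combined = 2 * javelin_cost + 50 * chaff_shell_cost    # 2 missiles + 50 blinding shells

print(f"missiles only: ${missiles_only:,}")   # $2,000,000
print(f"combined:      ${combined:,}")        # $425,000
```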

2. Leading on from (1), I don't claim light drones will be viable against an active laser. I regard a laser as a serious obstacle that is attacked with the swarm attack described before the territory is secured. That is, blind the sensor/obscure the laser and simultaneously converge with missiles. The drones need to survive just long enough to fire off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000 m at once?

(I give 1000 m just as an example, too; flying drones would use ground cover to get as close as they could. I assume they will pretty much always be able to get within 1000 m of a ground target using the ground as cover.)

Sure, it doesn't prevent a deceptive model from being made, but if AI engineers built NNs with such self-awareness at all levels from the ground up, that wouldn't happen in their models. The encouraging thing, if it holds up, is that there is little to no "alignment tax" for making the models understandable - they are also better.

Self-modelling in NNs: https://arxiv.org/pdf/2407.10188 - is this good news for mech interpretability? If the model makes itself easily predictable, then that really seems to limit the possibilities for deceptive alignment.
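For concreteness, a minimal sketch of the general self-modelling idea as I understand it (not the paper's exact architecture): alongside its main task the network also predicts its own hidden activations, which pressures those activations to stay simple and predictable.

```python
# Sketch only - layer sizes, heads, and the 0.1 weighting are arbitrary choices.
import torch
import torch.nn as nn

class SelfModellingNet(nn.Module):
    def __init__(self, in_dim=32, hidden_dim=64, out_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.task_head = nn.Linear(hidden_dim, out_dim)      # primary task output
        self.self_head = nn.Linear(hidden_dim, hidden_dim)   # predicts own hidden state

    def forward(self, x):
        h = self.encoder(x)
        return self.task_head(h), self.self_head(h), h

net = SelfModellingNet()
x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

logits, h_pred, h = net(x)
task_loss = nn.functional.cross_entropy(logits, y)
self_loss = nn.functional.mse_loss(h_pred, h.detach())  # auxiliary self-prediction loss
loss = task_loss + 0.1 * self_loss
loss.backward()
```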

If it is truly impossible to break symmetry, you could argue that there isn't a clone and you are in fact the same. I.e. there is just one instance of you; it just looks like there are two. After all, if you are absolutely identical, including the universe, in what sense are there two of you? Upon further thought, you couldn't tell whether a perfect translational clone was a clone at all, or just a perfect mirror/force field. There would be no way to tell. If you put your hand out to touch the mirror, or your mirror hand, and it was perfectly aligned, you would not feel texture but instead an infinitely hard surface. There would be no rubbing of your fingers against the clone, no way to tell whether there was a perfect mirror or another copy.

OK thanks, I will look some more at your sequence. Note I brought up Greek philosophy as obviously not being stable under reflection, with the proof that sqrt(2) is irrational as a simple example; I'm not sure why you are only reasonably sure it's not.
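For reference, the standard proof the toy model would have to absorb is only one line:

$$\sqrt{2}=\tfrac{p}{q},\ \gcd(p,q)=1 \;\Rightarrow\; p^2=2q^2 \;\Rightarrow\; 2\mid p \;\Rightarrow\; p=2k \;\Rightarrow\; q^2=2k^2 \;\Rightarrow\; 2\mid q,$$

contradicting $\gcd(p,q)=1$, so no such ratio exists.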

Answer by RussellThor

I don't think there is a conclusion, just more puzzling situations the deeper you go:

"Scott Aaronson: To my mind, one of the central things that any account of consciousness needs to do, is to explain where your consciousness “is” in space, which physical objects are the locus of it.  I mean, not just in ordinary life (where presumably we can all agree that your consciousness resides in your brain, and especially in your cerebral cortex—though which parts of your cerebral cortex?), but in all sorts of hypothetical situations that we can devise.  What if we made a backup copy of all the information in your brain and ran it on a server somewhere?  Knowing that, should you then expect there’s a 50% chance that “you’re” the backup copy?  Or are you and your backup copy somehow tethered together as a single consciousness, no matter how far apart in space you might be?  Or are you tethered together for a while, but then become untethered when your experiences start to diverge?  Does it matter if your backup copy is actually “run,” and what counts as running it?  Would a simulation on pen and paper (a huge amount of pen and paper, but no matter) suffice?  What if the simulation of you was encrypted, and the only decryption key was stored in some other galaxy?  Or, if the universe is infinite, should you assume that “your” consciousness is spread across infinitely many physical entities, namely all the brains physically indistinguishable from yours—including “Boltzmann brains” that arise purely by chance fluctuations?"
Link

The point here is that you could have a system that, to an outside observer, looked random or encrypted, but which with the key would be revealed to be a conscious creature. But what if the key were forever destroyed? Does the universe then somehow know to assign it consciousness?

You also need to fully decide whether replaying, vs. computing, apparently conscious behavior counts. If you compute a digital sim once, then save the states and replay them a second time, what does that mean? What about playing them backwards?

Boltzmann brains really mess things up further.

It seems to lead to the position that it's all just arbitrary and there is no objective truth, or to uncountable infinities of consciousnesses in acausal, timeless situations. Embracing this view doesn't lead anywhere useful from what I can see, and of course I don't want it to be the logical conclusion.

What about ASICs? I heard someone is making them for inference and of course claims an efficiency gain. ASIC improvement needs to be thought of as part of the status quo.

To do that and achieve something looking like take-off, they would need to get to the level of an advanced AI researcher rather than just a coding assistant - that is, come up with novel architectures to test. Even if the LLM could write all the code for a top researcher 10x faster, that's not a 10x speedup in timelines; probably 50% at most, if much of the time is spent thinking up theoretical concepts and waiting for training runs to test results.
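A quick Amdahl's-law-style check, with made-up fractions, illustrates why:

```python
# If only a share of a researcher's time is coding, a 10x coding
# speedup gives a much smaller overall speedup. Fractions are assumed.
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

# e.g. 40% of time spent coding, 10x faster coding -> ~1.56x overall
print(round(overall_speedup(0.4, 10.0), 2))
```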

I am clearly in the skeptic camp, in the sense that I don't believe the current architecture will get to AGI with our resources. That is, if all the GPUs and training data in the world were used it wouldn't be sufficient, and maybe no amount of compute/data would be.

To me the strongest evidence that our architectures don't learn and generalize well isn't LLMs but in fact Tesla Autopilot. It has ~10,000x more training data than a person, much more FLOPS, and is still not human-level. I think Tesla is doing pretty much everything major right with their training setup. Our current AI setups just don't learn or generalize as well as the human brain and similar. They don't extract symbols or diverse generalizations from high-bandwidth, un-curated data like video. Scaffolding doesn't change this.

A medium-term but IMO pretty much guaranteed way to get this would be to study and fully characterize the cortical column in the human/mammalian brain.
