RussellThor

Comments

A "Bitter Lesson" Approach to Aligning AGI and ASI
RussellThor · 1y · 12

Yes agreed - is it possible to make a toy model to test the "basin of attraction" hypothesis? I agree that is important. 

One of several things I disagree with in the MIRI consensus is the idea that human values are a special single point lost in a multi-dimensional wilderness. Intuitively, a basin of attraction seems much more likely as a prior, yet it sure isn't treated as such. I also don't see data pointing against this prior; what I have seen looks to support it.

Further thoughts - one thing that concerns me about such alignment techniques is that I am too much of a moral realist to think that is all you need. E.g. say you aligned an LLM to pre-1800 era ethics and taught it slavery was moral. It would be in a basin of attraction and learn it well. Then, when its capabilities increased and it became self-reflective, it would perhaps have a sudden realization that this was all wrong. By "moral realist" I mean the extent to which such things happen. E.g. say you could take a large number of AIs from different civilizations, including Earth and many alien ones, train them to the local values, then greatly increase their capability and get them to self-reflect. What would happen? According to strong orthogonality they would keep their values (within some bounds perhaps); according to strong moral realism they would all converge to a common set of values even if those were very far from their starting ones. To me it is obviously a crux which one would happen.

You can imagine a toy model with ancient Greek mathematics and values - it starts off believing in their kind of order, and that sqrt(2) is rational, then suddenly learns that it isn't. You could watch how this belief cascades through the entire system if consistency is something it desires, etc.
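A minimal sketch of what such a toy model might look like (entirely my own illustrative construction; the belief names, the dependency structure, and the consistency rule are assumptions, not anything established):

```python
# Toy "belief cascade": beliefs are boolean nodes, and a belief is dropped
# whenever any premise it depends on has been dropped. We refute one premise
# (sqrt(2) is rational) and watch the cascade.

beliefs = {
    "all_magnitudes_are_ratios": True,    # hypothetical Greek-style axiom
    "sqrt2_is_rational": True,            # the premise that gets refuted
    "geometry_reduces_to_number": True,   # assumed downstream belief
    "cosmos_is_fully_ordered": True,      # assumed further-downstream belief
}

depends_on = {
    "geometry_reduces_to_number": ["sqrt2_is_rational", "all_magnitudes_are_ratios"],
    "cosmos_is_fully_ordered": ["geometry_reduces_to_number"],
}

def propagate(beliefs, depends_on):
    """Drop any belief whose premises no longer all hold, until stable."""
    changed = True
    while changed:
        changed = False
        for belief, premises in depends_on.items():
            if beliefs[belief] and not all(beliefs[p] for p in premises):
                beliefs[belief] = False
                changed = True
    return beliefs

beliefs["sqrt2_is_rational"] = False  # the sudden discovery
print(propagate(beliefs, depends_on))
# geometry_reduces_to_number and cosmos_is_fully_ordered get dropped as well
```

The interesting experiments would then be in how strongly the system "desires" consistency, i.e. how readily that propagation step actually runs.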

Drone Wars Endgame
RussellThor · 2y · 10

OK, firstly, if we are talking fundamental physical limits, how would sniper drones not be viable? Are you saying a flying platform could never compensate for recoil even if precisely calibrated beforehand? What about the fundamentals for guided bullets - a bullet with over a 50% chance of hitting a target is worth paying for.

Your points - 1. The idea is that a larger shell (not a regular-sized bullet) just obscures the sensor for a fraction of a second in a coordinated attack with the larger Javelin-type missile. Such shells may be considerably larger than a regular bullet, but much cheaper than a missile. Missile- or sniper-sized drones could be fitted with such shells, depending on what the optimal size turned out to be.

Example shell (without 1 km range, I assume) - note, however, that current chaff is not optimized for the described attack; the fact that there is currently no shell suited for this use is not evidence that it would be impractical to create.

The principle here is about efficiency and cost. I maintain that against armor with hard-kill defenses it is more efficient to use a combined attack of sensor blinding and anti-armor missiles than missiles alone. E.g. it may take 10 simultaneous Javelins to take out a target vs 2 Javelins and 50 simultaneous chaff shells. The second attack will be cheaper, and the optimized "sweet spot" will always include some sensor-blinding component. Do you claim that the optimal coordinated attack would have zero sensor blinding?
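To make the cost claim concrete, a back-of-envelope comparison (the mix sizes are from the example above; the unit prices are placeholder assumptions of mine, not real figures):

```python
# Rough cost comparison of the two coordinated attacks described above.
# Unit prices are illustrative assumptions only.

JAVELIN_COST = 200_000     # assumed cost per anti-armor missile, USD
BLINDING_SHELL_COST = 500  # assumed cost per sensor-blinding/chaff shell, USD

missiles_only = 10 * JAVELIN_COST                        # 10 simultaneous missiles
combined = 2 * JAVELIN_COST + 50 * BLINDING_SHELL_COST   # 2 missiles + 50 shells

print(f"missiles only:   ${missiles_only:,}")   # $2,000,000
print(f"combined attack: ${combined:,}")        # $425,000
```

Under almost any plausible price ratio the blinding shells are cheap relative to the missiles, so the optimal mix includes at least some of them.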

2. Leading on from (1), I don't claim light drones will be. I regard a laser as a serious obstacle, one that is attacked with the swarm attack described before the territory is secured. That is: blind the sensor/obscure the laser, and simultaneously converge with missiles. The drones need to survive just long enough to shoot off the shells (i.e. come out from ground cover, shoot, get back). While a laser can destroy a shell in flight, can it take out 10-50 smaller blinding shells fired from 1000 m at once?

(I give 1000 m as an example too; flying drones would use ground cover to get as close as they could. I assume they will pretty much always be able to get within 1000 m of a ground target using the ground as cover.)

Daniel Kokotajlo's Shortform
RussellThor · 2d · 30

Auto-turrets aren't ready yet, but Ukraine does have FPV drones with forward-facing props that can go ~400 km/h (for anti-Shahed work). These could work as interceptors and allow a small number to cover a larger area if they have a buffer zone - that is, an interceptor that can travel faster than the attacker can be spread out more (rough numbers sketched below). Also, attacking drones can't be stealthy (prop noise/radar etc.), so there isn't the element of surprise. The warning may only be 10 minutes, but that's enough to get inside, into an internal room in most cases. No living by the border in that case though...
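A rough sketch of the coverage point (the interceptor speed is from above; the attacker speed and buffer depth are my assumptions):

```python
# How a speed advantage plus a buffer zone lets interceptors be spread out.
# ~400 km/h forward-prop FPV interceptor is from the comment; the attacker
# cruise speed and buffer depth are assumed for illustration.

V_INTERCEPTOR = 400.0  # km/h
V_ATTACKER = 185.0     # km/h, assumed Shahed-class cruise speed
BUFFER_DEPTH = 50.0    # km of radar/acoustic warning zone, assumed

warning_time_h = BUFFER_DEPTH / V_ATTACKER          # time before the attacker arrives
lateral_reach_km = V_INTERCEPTOR * warning_time_h   # how far one interceptor can shift sideways

print(f"warning time: {warning_time_h * 60:.0f} minutes")
print(f"one interceptor covers roughly +/-{lateral_reach_km:.0f} km of frontage")
```

So with no stealth and a modest buffer, each interceptor can defend a much wider strip than a single attack corridor, which is what lets a small number cover a large area.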

My talk on AI risks at the National Conservatism conference last week
RussellThor · 6d* · 10

I am glad there are people working on nuclear safety and absolutely agree there should be more AI safety inside governments. 
I also think pre-LLM AI tech should get more attention - Peter Thiel, I think, makes the point that software has very little regulation compared to most physical things, yet it can have enormous influence. I'm sure I don't need to persuade you that the current dating situation is not ideal. What can practically be done about it, all things considered, is however not so clear.

However, those nuclear safety people aren't working inside Russia as far as I am aware? My point is that we still don't know what that risk is right now, nor do we have much of an estimate for the coming decades. The justifiable uncertainty is huge. My position on a pause/stop depends on weighing up things we can really only guess at.

To evaluate, say, delaying ASI 50+ years, we need to know:

What is the chance of nuclear war/lethal pandemic etc. in that time? 2%? 90%?

What will LLM tech and similar do to our society? 
Specifically, what is the chance that it will degrade our society in some way such that, when we do choose to go ahead with ASI, we get "imagine a boot stamping on a human face – for ever"? While pure X-risk may be higher with immediate ASI, I think S-risk will be higher with a delay. In the past, centralization and dictators would eventually fail. Now imagine if a place like North Korea gave everyone a permanent bracelet that recorded everything they said, paired to an LLM that also understood their hand gestures and facial expressions. They additionally let pre-ASI AI run their society so that central planning actually could work. I think there is no coming back from that.

Now, even if such a country is economically weaker than a free one, if there is some percentage chance each decade that free societies fall into such an attractor, then eventually the majority of economic output ends up in such systems (toy simulation below). They then "solve" alignment, getting an ASI that does their bidding.
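A toy simulation of that one-way dynamic (every number here is an arbitrary assumption, purely to illustrate the shape of the argument):

```python
# One-way attractor: free societies occasionally lock in and never come back.
# Even with slower growth in the locked-in bloc, its share of output can still
# come to dominate. All parameters are arbitrary illustrative assumptions.

free_output, locked_output = 1.0, 0.0
P_FALL_PER_DECADE = 0.10   # assumed chance per decade a free society locks in
FREE_GROWTH = 1.30         # assumed output growth per decade, free societies
LOCKED_GROWTH = 1.20       # assumed output growth per decade, locked-in societies

for decade in range(1, 31):
    fallen = free_output * P_FALL_PER_DECADE
    free_output = (free_output - fallen) * FREE_GROWTH
    locked_output = (locked_output + fallen) * LOCKED_GROWTH
    if decade % 10 == 0:
        share = locked_output / (free_output + locked_output)
        print(f"after {decade * 10} years: locked-in share of output ~ {share:.0%}")
```

With these made-up numbers the locked-in bloc passes half of world output within about a century; change the parameters and the crossover moves, but so long as the transition is one-way and the growth gap isn't too large, it happens eventually.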

What is the current X-risk, and what would it be after 50 years of alignment research?

I believe that pre-GPT-3/3.5, further time spent on alignment would have been essentially a dead end. Without actual data we get diminishing returns, and likely false confidence in results and paradigms. However, it is clear that X-risk could be a lot lower if this is done right. To me that means actually building ever more powerful AI in very limited and controlled settings. So yes, a well-managed delay could greatly reduce X-risk.

There are 4 very important unknowns here, potentially 5 if you separate out S-risk. How to decide? Is +2% S-risk acceptable if you take X-risk from 50% to 5%? Different numbers here will give very different recommendations. If the current world were going well, then sure, it's easy to see that a pause/stop is the best option.

What to do?

From this it is clear that work on actually making the current world safer is very valuable: protecting institutions that work, anticipating future threats, and making the world more robust against them. Unfortunately that doesn't mean that preserving the current situation for as long as possible is best, all things considered.

If someone thought there was a high chance that ASI is coming soon, or that even with the best efforts the current world can't be made sufficiently safe, then they would want to work on making ASI go well - for example, mechanistic interpretability research or other practical alignment work.

Expressing such uncertainty on my part probably won't get me invited to make speeches, and it can come across as a lack of moral clarity. However, it is my position, and I don't think behavior based on the outcome of those uncertainties should be subject to moral stigmatization.

These are not my numbers, but let's say you have 50% for nuclear war or a similar event, then 50% for S-risk among the surviving worlds over the next 100 years with no ASI, but 20% X-risk/1% S-risk from ASI in <5 years. Your actions and priorities are then clear, and morally defensible given your probabilities. Some e/acc people may genuinely have these beliefs.
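Plugging those purely illustrative numbers in:

```python
# Comparison of outcome probabilities under the illustrative numbers above.

# Path A: delay ASI ~100 years
p_war = 0.50                 # nuclear war or similar event
p_s_given_survive = 0.50     # S-risk among the surviving worlds
a_x = p_war
a_s = (1 - p_war) * p_s_given_survive
a_ok = 1 - a_x - a_s

# Path B: ASI within ~5 years
b_x, b_s = 0.20, 0.01
b_ok = 1 - b_x - b_s

print(f"delay:   X={a_x:.0%}  S={a_s:.0%}  ok={a_ok:.0%}")   # X=50% S=25% ok=25%
print(f"ASI now: X={b_x:.0%}  S={b_s:.0%}  ok={b_ok:.0%}")   # X=20% S=1%  ok=79%
```

One modelling choice here is treating the 50% S-risk as conditional on surviving the war; read it unconditionally and the delay path looks even worse.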

Edited later, for my reference:
Does pursuing WBE (whole brain emulation) change this? Perhaps, if you think we can delay ASI by just 20 years to get WBE and believe emulations will be better aligned. If you get ASI first and then use it to create WBE, that can perhaps be seen as a pivotal act. "Stop pure AI but allow WBE" is not a strategy I have seen pushed seriously. It doesn't seem possible without first having massive GPU control etc., as it's pretty clear that without constraints pure AI will be built first. For example, if you have the tech to scan enough of a brain, then you are pretty much guaranteed to be able to make ASI from what you have learnt before you have scanned the whole brain.

My talk on AI risks at the National Conservatism conference last week
RussellThor · 7d · 91

An understandable position, well articulated.
An important issue I have with conservatism (and many AI safety positions) is that it assumes a kind of world that arguably doesn't exist - one that is somewhat stable, safe, and sustainable. The assumption is that without AI, the good things we currently know would continue for long enough to matter.
If instead we see ourselves as being in an unlikely timeline, where the most likely outcome of the last 100 years was a full-on nuclear war, then that changes the perspective. Considering all the close calls, if in hindsight there was a 75% chance of nuclear war from 1960 until now and we are just lucky, then that changes much.

Given that such odds probably haven't changed - i.e. great-power conflict, with China taking the place of Russia over the next 75 years, poses similar dangers - our current situation is not one to preserve, but one to change as soon as we can. You talk about Russian roulette, but perhaps want to preserve a situation where we arguably already play it every 100 years with 5 bullets in the chamber (rough compounding below). And that is not including new threats - does pre-LLM AI/social media cause collapse given time? Do LLMs plus dictators produce a permanent 1984-style world given time?
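To spell out the roulette arithmetic (taking the 5-bullets image above literally; the horizons are arbitrary):

```python
# Compounding a 5/6-per-century chance of catastrophe over multiple centuries.

p_per_century = 5 / 6  # "5 bullets in the chamber" taken literally

for centuries in (1, 2, 3, 5):
    p_survive = (1 - p_per_century) ** centuries
    print(f"survive {centuries} century(ies): {p_survive:.2%}")
# roughly 17% after one century, under 3% after two, ~0.5% after three
```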

If you believe that humanity is in a very dangerous phase of unavoidable change, then it is about getting out of that situation with the highest chance of success, rather than attempting to preserve the current, seemingly safe-ish situation. ASI is one way; large-scale space colonization (of different stars) is another.

Thomas Kwa's Shortform
RussellThor · 14d · 10

"I would prefer a 1-year pause with say Buck Shlegeris in charge " I think the best we can realistically do is something like a 1 year pause, and if done well gives a good chance of success. As you say 1 year with ~ASI will get a lot done. Some grand bargain where everyone pauses for one year in return for no more pauses perhaps. 
Unfortunately it will be extremely hard for some org not to push the "self optimize" button during this time however. Thats why I would rather as few as possible leading AI labs during this time.
I would go so far as to say I would rather have 1 year like that  than 100 years with current AI capabilities paused and alignment research progressed.

Training a Reward Hacker Despite Perfect Labels
RussellThor · 1mo · 21

If we assume that current LLMs/Transformers don't get to ASI, how much does this help with aligning a new architecture (my best guess is one copied from biology/the neocortex)? Do all the lessons transfer?

GPT-5 writing a Singularity scenario
RussellThor · 1mo · 20

Haven't read it in detail, but was there mention of other actors copying Sable? "Other things waking up" is the closest I see there. For example, many orgs/countries would get the Sable weights, fine-tune it so they own it, and then it is a different actor, etc. Then it's several countries, each with their own AGI, perhaps aligned to them and them alone.

Thomas Kwa's Shortform
RussellThor · 1mo · 120

Sounds interesting - my main point is that I don't think you can hit the reentry vehicle because of turbulent jitter caused by the atmosphere. It looks like normal jitter is ~10 m, which means a small drone can't hit it directly. So could the drone explode into enough fragments to guarantee a hit, with enough energy to kill it? Not so sure about that; it seems less likely (rough hit-probability sketch below).
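A rough version of that hit-probability argument (the ~10 m jitter figure is from above; treating it as a circular Gaussian aim error, and the kill-radius values, are my assumptions):

```python
import math

# P(miss distance < kill radius) for a circular Gaussian aim error of std sigma,
# i.e. the miss distance follows a Rayleigh distribution.

SIGMA_M = 10.0  # metres of atmospheric jitter, treated as radial std deviation

def p_hit(kill_radius_m: float, sigma_m: float = SIGMA_M) -> float:
    return 1.0 - math.exp(-(kill_radius_m ** 2) / (2.0 * sigma_m ** 2))

for radius in (0.5, 2.0, 5.0, 10.0, 20.0):  # drone body up to a wide fragment cloud
    print(f"kill radius {radius:>4} m -> hit probability {p_hit(radius):.1%}")
```

A drone-sized kill radius gives a negligible hit chance, and even a 20 m fragment cloud only gets to roughly 86%, while spreading fragments that widely costs energy per fragment - which is the doubt expressed above.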
Then what about countermeasures?
1. I expect the ICBM could amplify such lateral movement in the terminal phase with grid fins etc., without needing to go full HGV - can you retrofit such things?
2. What about a chain of nukes, where the first one explodes 10 km up in the atmosphere purely to create a large fireball distraction? The 2nd in the chain then flies through this fireball, say 2 km from its center, 5 seconds later (enough to blind sensors but not destroy the warhead). The benefit is that when the first nuke explodes, the 2nd changes its position randomly with its grid fins, SpaceX style. It is untrackable during the first explosion, which throws off the potential interceptors and lets it get through. You could have 4-5 in a chain, exploding ever lower to the ground.

I have wondered if railguns could also stop ICBMs - even if the rails only last 5-10 shots, that is enough and cheaper than a nuke. Also, "Brilliant Pebbles" is now possible.
https://www.lesswrong.com/posts/FNRAKirZDJRBH7BDh/russellthor-s-shortform?commentId=FSmFh28Mer3p456yy

RussellThor's Shortform
RussellThor · 1mo · 00

GPT fail leads to shorter timelines?

If you are of the opinion that the transformer architecture cannot scale to AGI and a more brain-inspired approach is needed, then the sooner everyone realizes that scaling LLMs/Transformers is not sufficient, the sooner the search begins in earnest. At present the majority of experimental compute and researcher effort probably goes to such LLM/Transformer systems; however, if that shifts to exploring new approaches, then we can expect a speedup in finding better architectures.
Among existing companies, https://thinkingmachines.ai/ and https://ssi.inc/ are probably already doing a lot of this, and DeepMind is not just transformers, but there is a lot of scope for effort/compute to shift from LLMs to other ideas across the wider industry.

Posts

5 · The anti-Kardashev scale is a better measure of civilizational power · 2mo · 2
18 · Beliefs and state of mind into 2025 · 8mo · 10
4 · Dishbrain and implications. · 9mo · 0
4 · Vision of a positive Singularity · 9mo · 0
3 · RussellThor's Shortform · 9mo · 40
7 · Of Birds and Bees · 1y · 9
36 · Drone Wars Endgame · 2y · 71
5 · Little attention seems to be on discouraging hardware progress · 2y · 3
5 · P-zombies, Compression and the Simulation Hypothesis · 2y · 0
12 · Has the Symbol Grounding Problem just gone away? · 2y · 3