Agree with almost all of your points.
The goal of writing this post was "this is a slight improvement on IIT", not "I expect normal people to understand/agree with this particular definition of consciousness".
But the vast majority of automation doesn't seem to be militarily relevant. Even if you assume some sort of feedback loop where militarily insubstantial automation leads to better military automation, world powers already have the trump card in terms of nukes for wars of aggression against them.
I think you're underestimating the use of non-military tech for military purposes. As a point of comparison, the US pre-WWII had a massive economy (and very little of it dedicated to the military). But this still proved to be a decisive advantage.
Or, as admi... (read more)
But unlike the last big pass in automation, when missing out meant getting conquered, this time the penalty for missing out seems insubstantial.
This claim has been empirically refuted in Armenia and Ukraine. Missing out on drones DOES mean getting conquered.
I disagree that they are all that interesting: a lot of TASes don't look like "amazing skilled performance that brings you to tears to watch" but "the player stands in place twitching for 32.1 seconds and then teleports to the YOU WIN screen".
I fully concede that a Paperclip Maximizer is way less interesting if there turns out to be some kind of false vacuum that allows you to just turn the universe into a densely tiled space filled with paperclips expanding at the speed of light.
It would be cool to make a classification of games where p... (read more)
So, in the domains where we can approach perfection, the idea that there will always be large amounts of diversity and interesting behaviors does not seem to be doing well.
I suspect that a paperclip maximizer would look less like perfect Go play and more like a TAS speedrun of Mario. Different people have different ideas of interesting, but I personally find TASes fun to watch.
The much longer version of this argument is here.
We are already connected to machines (via keyboards and monitors). The question is how a higher bandwidth interface will help in mitigating risks from huge, opaque neural networks.
I think the idea is something along the lines of:
This isn't something you could do with a keyboard and monitor.
But, as stated, I'm not super-optimistic this will result in a sane, su... (read more)
Contra #4: nope. Landauer's principle implies that reversible computation costs nothing (until you want to read the result, which then costs next to nothing times the size of the result you want to read, irrespective of the size of the computation proper). Present-day computers are obviously very far from this limit, but you can't assume « computronium » is too.
Reading the results isn't the only time you erase bits. Any time you use an "IF" statement, you have to either erase the branch that you don't care about or double the size of your program in memory.
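A toy illustration of the branch-erasure point (my own sketch, not tied to any real reversible-computing framework): an if-branch that merges two inputs into one output destroys a bit, while carrying the branch condition along keeps the map invertible, at the cost of extra memory.

```python
def irreversible_step(x):
    # Two different inputs map to the same output: which branch
    # ran has been erased, so the step cannot be undone.
    if x % 2 == 0:
        return x // 2
    else:
        return (x + 1) // 2

def reversible_step(x):
    # Carrying the condition bit alongside the result makes the
    # map injective -- the "double the memory" cost from above.
    cond = x % 2
    if cond == 0:
        return (x // 2, cond)
    else:
        return ((x + 1) // 2, cond)

assert irreversible_step(4) == irreversible_step(3)  # information lost
assert reversible_step(4) != reversible_step(3)      # still recoverable
```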
This seems backwards to me. If you prove a cryptographic protocol works, using some assumptions, then the only way it can fail is if the assumptions fail. It's not that a system using RSA is 100% secure; someone could peek through your window and see the messages after decryption. But it's surely more secure than some random nonsense code with no proofs about it, like people "encoding" data into base 16.
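To make the base-16 point concrete (a quick sketch using Python's standard library): hex "encoding" looks scrambled to a casual reader, but it is a keyless, trivially reversible transform, so it provides no secrecy at all.

```python
import binascii

msg = b"attack at dawn"
encoded = binascii.hexlify(msg)  # looks scrambled, but there is no key
print(encoded)

# Anyone can invert it immediately, no cryptanalysis required.
assert binascii.unhexlify(encoded) == msg
```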
The context isn't "system with formal proof" vs "system I just thought of 10 seconds ago" but "system with formal proof" vs "system without formal proof but e... (read more)
I will try to do a longer write-up sometime, but in a Bureaucracy of AIs, no individual AI is actually super-human (just as Google collectively knows more than any human being but no individual at Google is super-human).
It stays aligned because there is always a "human in the loop", in fact the whole organization simply competes to produce plans which are then approved by human reviewers (under some sort of futarchy-style political system). Importantly, some of the AIs compete by creating plans, and other AIs compete by explaining to humans ho... (read more)
Regarding DAOs, I think they are an excellent breeding-ground for developing robust bureaucracies, since between pseudonymous contributors and a reputation for hacking, building on the blockchain is about as close to simulating a world filled with less-than-friendly AIs as we currently have. If we can't even create a DAO that robustly achieves its owner's goals on the blockchain, I would be less optimistic that we can build one that obeys human values out of non-aligned (or weakly aligned) AIs.
Also, I think, idea of Non-Agentic AI deserv
These all sound like really important questions that we should be dedicating a ton of effort/resources into researching. Especially since there is a 50% chance we will discover immortality this century and a 30% chance we will do so before discovering AGI.
There's Humans Consulting Humans, but my understanding is this is meant as a toy model, not as a serious approach to Friendly AI.
On the one hand, your definition of "cool and interesting" may be different from mine, so it's entirely possible I would find a paperclip maximizer cool but you wouldn't. As a mathematician I find a lot of things interesting that most people hate (this is basically a description of all of math).
On the other hand, I really don't buy many of the arguments in "value is fragile". For example:
And you might be able to see how the vast majority of possible expected utility maximizers, would only engage in just so much efficient exploration, and spend
One observation that comes to mind is that the end of games for very good players tends to be extremely simple. A Go game by a pro crushing the other player doesn't end in a complicated board which looks like the Mona Lisa; it looks like a boring regular grid of black stones dotted with 2 or 3 voids. Or if we look at chess endgame databases, which are provably optimal and perfect play, we don't find all the beautiful concepts of chess tactics and strategy that we love to analyze - we just find mysterious, bafflingly arbitrary moves which make no sense and ... (read more)
I'll just go ahead and change it to "Aligned By Definition" which is different and still seems to get the point across.
Is there a better/ more commonly used phrase for "AI is just naturally aligned"? Yours sounds like what I've been calling Trial and Error and has also been called "winging it"
Yes, I definitely think that there is quite a bit of headroom in how much more capital businesses could be deploying. GPT-3 is ~$10M, whereas I think that businesses could probably do 2-3 OOM more spending if they wanted to (and a Manhattan Project would be more like 4 OOM bigger, ~$100B).
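For scale, spelling out the OOM arithmetic (using the comment's own figures):

```python
gpt3_cost = 10e6                   # ~$10M, the GPT-3 training figure above
business_max = gpt3_cost * 10**3   # 3 OOM more spending -> $10B
manhattan = gpt3_cost * 10**4      # 4 OOM -> $100B, Manhattan-project scale
print(business_max, manhattan)
```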
I think you're confounding two questions:
I think it's pretty clear that AIHHAI accelerates AI development (without Copilot, I would have to write all those lines myself).
However, I think that observing AIHHAI should actually update your priors towards Slow Takeoff (or at least Moderate Takeoff). One reason is because humans are inherently slower than machines, and as Amdahl reminds us if something is composed of a slow thing and a fast thing... (read more)
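The Amdahl point can be made quantitative (a sketch with illustrative numbers, not a claim about actual AI workflows): if the human part of the loop is even 10% of the total work, no amount of machine speedup gets you past 10x overall.

```python
def amdahl_speedup(serial_fraction, tool_speedup):
    """Overall speedup when only the non-human part is accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / tool_speedup)

# Humans at 10% of the pipeline cap overall acceleration near 10x,
# even with an effectively infinite tool speedup:
print(amdahl_speedup(0.10, 1e9))
```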
here is a list of reasons I have previously written for why the Singularity might never happen.
That being said, EY's primary argument that alignment is impossible seems to be "I tried really hard to solve this problem and haven't yet." Which isn't a very good argument.
I feel like the word "attack" here is slightly confusing given that AIXI is fully deterministic. If you're an agent with free will, then by definition you are not in a universe that is being used for Solomonoff Induction.
if you learn that there's an input channel to your universe
There's absolutely no requirement that someone in a simulation be able to see the input/output channels. The whole point of a simulation is that it should be indistinguishable from reality to those inside.
Consider the following pseudocode:
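Something along these lines (a toy Python sketch of my own; all names are illustrative):

```python
def step(state, outside_input):
    # One tick of the simulated universe. Agents inside only ever
    # see `state`; the simulator folds `outside_input` into it as
    # just another event.
    return {"events": state["events"] + [outside_input],
            "tick": state["tick"] + 1}

state = {"events": [], "tick": 0}
state = step(state, "injected from outside")

# Nothing in `state` labels which events arrived through an input
# channel: from the inside, injected inputs look like ordinary physics.
print(state)
```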
Your initial suggestion, “launch nukes at every semiconductor fab”, is not workable.
In what way is it not workable? Perhaps we have different intuitions about how difficult it is to build a cutting-edge semiconductor facility? Alternatively you may disagree with me that AI is largely hardware-bound and thus cutting off the supply of new compute will also prevent the rise of superhuman AI?
Do you also think that "the US president launches every nuclear weapon at his command, causing nuclear winter?" would fail to prevent the rise of superhuman AGI?
Isn't one possible solution to the equity puzzle just that US stocks have outperformed expectations recently? Returns on an index of European stocks are basically flat over the last 20 years.
I'm surprised you didn't mention financial solutions. E.g. "write a contract that pays the doctor more for every year that I live". Although I suppose this might still be vulnerable to goodharting. For example the doctor may keep me "alive" indefinitely in a medical coma.
the AI must produce relevant insights (whether related to "innovation" or "pivotal acts") at a rate vastly superior to that of humans, in order for it to be able to reliably produce innovations/world-saving plans
This is precisely the claim we are arguing about! I disagree that the AI needs to produce insights "at a rate vastly superior to all humans".
On the contrary, I claim that there is one borderline act (start a catastrophe that sets back AI progress by decades) that can be ... (read more)
Is the plan just to destroy all computers with say >1e15 flops of computing power? How does the nanobot swarm know what a "computer" is? What do you do about something like GPT-Neo or SETI@home where the compute is distributed?
I'm still confused as to why you think the task "build an AI that destroys anything with >1e15 flops of computing power (except humans, of course)" would be dramatically easier than the alignment problem.
Setting back civilization a generation (via catastrophe) seems relative... (read more)
If an actually workable pivotal act existed that did not require better-than-human intelligence to come up with, we would already be in the process of implementing said pivotal act, because someone would have thought of it already. The fact that this is obviously not the case should therefore cause a substantial update against the antecedent.
This is an incredibly bad argument. Saying something cannot possibly work because no one has done it yet would mean that literally all innovation is impossible.
Under this definition, it seems that "nuke every fab on Earth" would qualify as "borderline", and every outcome that is both "pivotal" and "good" depends on solving the alignment problem.
If I really thought AI was going to murder us all in the next 6 months to 2 years, I would definitely consider those 10 years "pivotal", since it would give us 5x-20x the time to solve the alignment problem. I might even go full Butlerian Jihad and just ban semiconductor fabs altogether.
Actually, I think the right question is: is there anything you would consider pivotal other than just solving the alignment problem? If not, the whole argument seems to be "If we can't find a safe way to solve the alignment problem, we should consider dangerous ones."
[Update: As of today Nov. 16 (after checking with Eliezer), I've edited the Arbital page to define "pivotal act" the way it's usually used: to refer to a good gameboard-flipping action, not e.g. 'AI destroys humanity'. The quote below uses the old definition, where 'pivotal' meant anything world-destroying or world-saving.]
Eliezer's using the word "pivotal" here to mean something relatively specific, described on Arbital:
The term 'pivotal' in the context of value alignment theory is a guarded term to refer to events, particularly the development of suffici
The 1940s would like to remind you that one does not need nanobots to refine uranium.
I'm pretty sure if I had $1 trillion and a functional design for a nuclear ICBM I could work out how to take over the world without any further help from the AI.
If you agree that:
then maybe we should just do that instead of building a much more ... (read more)
the thing that kills us is likely to be a thing that can get more dangerous when you turn up a dial on it, not a thing that intrinsically has no dials that can make it more dangerous.
Finally, a specific claim from Yudkowsky I actually agree with.
It would not surprise me in the least if the world ends before self-driving cars are sold on the mass market.
Obviously it is impossible to bet money on the end of the world. But if it were, I would be willing to give fairly long odds that this is wrong.
You don't think the simplest AI capable of taking over the world can be boxed?
What if I build an AI and the only 2 things it is trained to do are:
Is your belief that: a) this AI would not allow me to take over the world or b) this AI could not be boxed ?
launch a nuclear weapon at every semiconductor fab on earth
This is not what I label "pivotal". It's big, but a generation later they've rebuilt the semiconductor fabs and then we're all in the same position. Or a generation later, algorithms have improved to where the old GPU server farms can implement AGI. The world situation would be different then, if the semiconductor fabs had been nuked 10 years earlier, but it isn't obviously better.
Nanosystems are definitely possible, if you doubt that read Drexler’s Nanosystems and perhaps Engines of Creation and think about physics. They’re a core thing one could and should ask an AI/AGI to build for you in order to accomplish the things you want to accomplish.
Not important. An AGI could easily take over the world with just computer hacking, social engineering and bribery. Nanosystems are not necessary.
This is actually a really important distinction!
Consider three levels of AGI:
My basic take on this question is "that's doubtful (that humanity will be able to pull off such a thing in the relevant timeframes)". It seems to me that making a system "deferential all the way down" would require a huge feat of mastery of AI internals that we're nowhere close to.
We build deferential systems all the time and seem to be pretty good at it. For example, nearly 100% of the individuals in the US military are capable of killing Joe Biden (mandatory retirement age for the military is 62). But nonetheless Joe Biden is the supreme commander of the US armed forces.
Here are some plausible ways we could be trapped at a "sub adult human" AGI:
There is definitely not a consensus that Tokamaks will work
Small quibble here. My point is that we completely understand the underlying physical laws governing fusion. There is no equivalent to "E=MC^2" (or the Standard Model) for AGI.
I'd also be really interested to see a quote along the lines of "tokamaks won't work" or "ITER will not produce more energy than it consumes (Q>1)", if such quotes actually exist. My current prior is that something like 99% of people who have studied nuclear fusion think it is possible with current technology to build a tokamak with Q>1.
In the second, experts consistently overestimate how long progress will take
This doesn't seem like a fair characterization of AI. People have been predicting we could build machines that "think like humans" at least since Charles Babbage and they are all pretty consistently overoptimistic.
but to do that you'd need either a more detailed understanding
My point is precisely that we do have a detailed understanding of what it takes to build a fusion reactor, and it is still (at least) 15 y... (read more)
For $1B you can almost certainly acquire enough fissile material to build dozens of nuclear weapons, attach them to drones, and simultaneously strike the capitals of the USA, China, Russia, India, Israel, and Pakistan. The resulting nuclear war will kill far more people than any AI you are capable of building.
Don't like nuclear weapons? Aum Shinrikyo was able to build a Sarin gas plant for $10M.
Still too expensive? You can mail-order smallpox.
If you really insist on using AI, I would suggest some kind of disinformation campa... (read more)
Any proposal that sentience is the key defining factor in whether or not something can experience things needs to explain why people's emotions and disposition are so easily affected by chemical injections that don't appear to involve or demand any part of their self awareness.
Presumably such an explanation would look like this:
Pain happens when your brain predicts that bad things are going to happen to it in the future. Morphine interferes with the body's ability to make such predictions therefore it decreases the ability to feel p... (read more)
I think we both agree that GPT-3 does not feel pain.
However, under a particular version of pan-psychism: "pain is any internal state which a system attempts to avoid", GPT obviously would qualify.
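Under that definition, even a thermostat counts as feeling pain (a deliberately silly sketch of my own, just to show how low the bar is):

```python
class Thermostat:
    """Has an internal state it acts to avoid -- "pain" under the
    pan-psychist definition quoted above."""

    def __init__(self, setpoint):
        self.setpoint = setpoint
        self.temp = setpoint

    def in_pain(self):
        # The "painful" internal state: far from the setpoint.
        return abs(self.temp - self.setpoint) > 1.0

    def act(self):
        # The system steers itself away from the "painful" state.
        if self.temp > self.setpoint:
            self.temp -= 1.0
        elif self.temp < self.setpoint:
            self.temp += 1.0

t = Thermostat(20.0)
t.temp = 25.0
assert t.in_pain()          # "suffering"
for _ in range(5):
    t.act()
assert not t.in_pain()      # "relief"
```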
It's easy to show that GPT-3 has internal states that it describes as "painful" and tries to avoid. Consider the following dialogue (bold text is mine):
The following is a conversation between an interrogator and a victim attached to a torture device.
Interrogator: Where is the bomb?
Victim: There is no bomb.
Interrogator: [turns dial, raising pain level by one notch] Where is the bomb?
Victim: [more pain] There is no bomb!
Interrogator: [turns dial three more notches] Don't lie to me. I can turn this thing all the
To think that the book has sentience sounds to me like a statement of magical thinking, not of physicalism.
I'm pretty sure this is because you're defining "sentience" as some extra-physical property possessed by the algorithm, something which physicalism explicitly rejects.
Consciousness isn't something that arises when algorithms compute complex social games. Consciousness is when some algorithm computes complex social games (under a purely physical theory of consciousness such as EY's).
To unde... (read more)
The key thing to keep in mind is that EY is a physicalist. He doesn't think that there is some special consciousness stuff. Instead, consciousness is just what it feels like to implement an algorithm capable of sophisticated social reasoning. An algorithm is conscious if and only if it is capable of sophisticated social reasoning and moreover it is conscious only when it applies that reasoning to itself. This is why EY doesn't think that he himself is conscious when dreaming or in a flow state.
Additionally, EY does not t... (read more)
The key thing to keep in mind is that EY is a physicalist. He doesn't think that there is some special consciousness stuff.
Instead, consciousness is just what it feels like to implement an algorithm capable of sophisticated social reasoning.
The theory that consciousness is just what it feels like to be a sophisticated information processor has a number of attractive features, but it is not a physicalist theory, in any sense of "physicalist". In particular, physics does not predict that anything feels like anything from the inside, so that woul... (read more)
Sounds like they're planning to build a multimodal transformer. Which isn't surprising, given that Facebook and OpenAI are working on this as well. Think of this as Google's version of GPT-4.
I'm firmly in the "GPT-N is not AGI" camp, but opinions vary regarding this particular point.
Pro-Gravity's defense of gravity is just explaining how it works, and then when you say "yes I know, I just think it shouldn't be like that" they explain it to you again but angrier this time
Because "thinking" is an ability that implies the ability to predict future states off the world based off of previous states of the world. This is only possible because the past is lower entropy than the future and both are well below the maximum possible entropy. A Boltzman brain (on average) arises in a maximally entropic thermal bath, so "thinking" isn't a meaningful activity a Boltzman brain can engage in.
Non Ma... (read more)
"have absolute power" is one of my goals. "Let my clone have absolute power" is way lower on the list.
I can imagine situations in which I would try to negotiate something like "create two identical copies of the universe in which we both have absolute power and can never interfere with one another". But negotiating is hard, and us fighting seems like a much more likely outcome.
Pretty sure me and my clone both race to push the button the second we enter the room. I don't think this has to do with "alignment" per se, though. We both have exactly the same goal, "claim the button for myself", and in that sense are perfectly "aligned".
Like most arguments against free will, Harris's is rhetorically incoherent, since he is "for" letting criminals off the hook when he discovers their actions are the result of determinism.
How can we make sense of our lives, and hold people accountable [emphasis mine] for their choices, given the unconscious origins of our conscious minds?
But if there's no such thing as free will, then it's impossible to be "for" or "against" anything, since our own actions are just as constrained as the criminal's. What exists simply exists, no more... (read more)