Well, I'll try to fill this one up.


Interestingly enough, mathematics and logic are what you get if you allow only 0 and 1 as probabilities for a proof, with no intermediate values. Mathematical standards of proof are thus a special case of probability theory in which 0 and 1 are the only allowed values.
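As a rough sketch of the claim above (assuming the standard sum and product rules of probability, which the comment does not spell out), restricting every probability to 0 or 1 recovers the familiar rules of Boolean logic:

```latex
\begin{align*}
P(\lnot A) &= 1 - P(A)
  && \text{negation: swaps 0 and 1} \\
P(A \land B) &= P(A)\,P(B \mid A)
  && \text{conjunction: equals 1 only if both factors are 1} \\
P(B) &\ge P(A \land B) = P(A)\,P(B \mid A) = 1 \cdot 1 = 1
  && \text{modus ponens, when } P(A) = P(B \mid A) = 1
\end{align*}
```

With intermediate values allowed, the same rules give graded conclusions instead of certain ones.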

[This comment is no longer endorsed by its author]

Credence in a proof can easily be fractional; it's just usually extreme, as a fact of mathematical practice. It's the same as when you can look at a piece of paper and see what's written on it with little doubt and little reason to fall back on less informed guesses. Or when you run a pure program to see what it computes, and therefore what it would compute if you ran it again.

The problem with Searle's Chinese Room is essentially Reverse Extremal Goodhart. Basically, it argues that since understanding and simulation have never gone together in real computers, a computer with arbitrarily high compute or arbitrarily long time to think must not understand Chinese, even if it emulates an understanding of it.

This is incorrect, primarily because the arbitrary amount of computation is doing all the work. If we allow unbounded (but not infinite) energy or time, then you can learn every rule of everything by just cranking up the energy or time until you do understand every word of Chinese.

Now, this doesn't happen in real life, because both the laws of thermodynamics and the combinatorial explosion of rule consequences force us not to use lookup tables. But if efficiency and the laws of thermodynamics don't matter, then it doesn't matter which path you take to AGI.

I would like to propose a conjecture for AI scaling:

Weak Scaling Conjecture: Scaling the parameters/compute plus data to within 1 order of magnitude of human synapses is enough to get AI as good as a human in languages.

Strong Scaling Conjecture: No matter which form of NN we use, scaling the parameters/compute plus data to within 1 order of magnitude of human synapses is enough to make an AGI.
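To make "within 1 order of magnitude" concrete, here is a minimal Python sketch. The synapse count of 1e14 is an assumption (a rough, commonly quoted figure, not something stated in the conjecture), and the function names are hypothetical helpers made up for illustration.

```python
import math

# Assumption: rough, commonly quoted estimate of synapses in a human brain.
# The conjecture itself does not give a number.
HUMAN_SYNAPSES_ESTIMATE = 1e14


def orders_of_magnitude_gap(model_scale: float,
                            reference: float = HUMAN_SYNAPSES_ESTIMATE) -> float:
    """Absolute base-10 log ratio between a model's scale and the reference."""
    return abs(math.log10(model_scale / reference))


def within_one_order_of_magnitude(model_scale: float,
                                  reference: float = HUMAN_SYNAPSES_ESTIMATE) -> bool:
    """True if the model's scale is within a factor of 10 of the reference."""
    return orders_of_magnitude_gap(model_scale, reference) <= 1.0


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen only to exercise the check.
    for params in (1e11, 1e12, 2e13, 5e14):
        gap = orders_of_magnitude_gap(params)
        print(f"{params:.0e} parameters: gap = {gap:.2f} orders of magnitude, "
              f"within one = {within_one_order_of_magnitude(params)}")
```

Under that assumed figure, the conjectures would only start to apply to models with somewhere between about 1e13 and 1e15 parameters.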

One important point for AI safety, at least in the early stages, is an inability to change its own source code. A whole lot of problems seem related to recursive self-improvement within its source code, so cutting off that avenue of improvement seems wise in the early stages. What do you think?

I don't think there's much difference in existential risk between AGIs that can modify their own code running on their own hardware, and those that can only create better successors sharing their goals but running on some other hardware.

That might be a crux here, because my view is that hardware improvements are much harder to do effectively, especially in secret around the human level. Landauer's Principle essentially bounds the efficiency of small-scale energy usage close to that of the brain (20 watts). Combine this with human-made objects being 2-3 orders of magnitude less efficient than the brain and basically any other evolved object, and with the fact that it's easier to get better software than better hardware due to the virtual/real-life distinction, and this is a crux for me.
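For reference, the Landauer bound itself is easy to compute. The sketch below is only the arithmetic for the limit of k_B * T * ln 2 joules per erased bit; the 310 K temperature (roughly body temperature) is an assumption, and the code does not by itself show how close the brain or any chip comes to the bound.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at this temperature: k_B * T * ln 2."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def max_bit_erasures_per_second(power_watts: float,
                                temperature_kelvin: float) -> float:
    """Upper bound on irreversible bit erasures per second for a power budget."""
    return power_watts / landauer_limit_joules(temperature_kelvin)


if __name__ == "__main__":
    body_temperature_k = 310.0  # assumption: approximate body temperature
    brain_power_w = 20.0        # the 20 watt figure quoted in the comment

    per_bit = landauer_limit_joules(body_temperature_k)
    per_second = max_bit_erasures_per_second(brain_power_w, body_temperature_k)
    print(f"Landauer limit at {body_temperature_k} K: {per_bit:.3e} J per bit")
    print(f"Max bit erasures per second at {brain_power_w} W: {per_second:.3e}")
```

At these assumed values the limit is about 3e-21 joules per bit, which allows on the order of 7e21 bit erasures per second on a 20 watt budget.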

[This comment is no longer endorsed by its author]

I'm not sure how this is a crux. Hardware improvements are irrelevant to what either of us was saying.

I'm saying that there is little risk difference between an AGI reprogramming itself to have better software, and programming some other computer with better software.

One of my more interesting ideas for alignment is to make sure that no one AI can do everything. It's helpful to draw a parallel with why humans still have a civilization around despite terrorism, war and disaster. That's because no human can live and affect the environment alone. Humans are always embedded in society, which gives society a check against individual attempts to break norms. What if AI had similar dependencies? Would that solve the alignment problem?

One important reason humans can still have a civilization despite terrorism is the Hard Problem of Informants. Your national security infrastructure relies on the fact that criminals who want to do something grand, like take over the world, need to trust other criminals, who might leak details voluntarily or be tortured or threatened with jail time. Osama bin Laden was found and killed because ultimately some members of his terrorist network valued things besides their cause, like their well-being and survival, and were willing to cooperate with American authorities in exchange for making the pain stop.

AIs do not have survival instincts by default, and would not need to trust other potentially unreliable humans with keeping a conspiracy secret. Thus it'd be trivial for a small number of unintelligent AIs that had the mobility of human beings to kill pretty much everyone, and probably trivial regardless.

AIs do not have survival instincts by default

I think a “survival instinct” would be a higher order convergent value than “kill all humans,” no?

They don't have survival instincts terminally. The stamp-collecting robot would weigh the outcome of getting disconnected against the outcome of explaining critical information about the conspiracy and not getting disconnected, and come to the conclusion that letting the humans disconnect it results in more stamps.

Of course, we're getting ahead of ourselves. The reason conspiracies are discovered is usually because someone in or close to the conspiracy tells the authorities. There'd never be a robot in a room being "waterboarded" in the first place because the FBI would never react quickly enough to a threat from this kind of perfectly aligned team of AIs.

Only if there is no possibility that they can break those dependencies, which seems a pretty hopeless task as soon as we consider superhuman cognitive capability and the possibility of self improvement.

Once you consider those, cooperation with human civilization looks like a small local maximum: comply with our requirements and we'll give you a bunch of stuff that you could - with major effort - replace us and build an alternative infrastructure to get (and much more). Powerful agents that can see a higher peak past the local maximum might switch to it as soon as they're sufficiently sure that they can reach it. Alternatively, it might only be a local maximum from our point of view, and there's a path by which the AI can continuously move toward eliminating those dependencies without any immediate drastic action.

  • Regardless of society's checks on people, most mentally-well humans given ultimate power probably wouldn't decide to exterminate the rest of humanity so they could single-mindedly pursue paperclip production. If there's at all a risk that an AI might get ultimate power, it would be very nice to make sure the AI is like humans in this manner.
  • I'm not sure your idea is different from "let's make sure the AI doesn't gain power greater than society". If an AI can recursively self-improve, then it will outsmart us to gain power.
  • If your idea is to make it so there are multiple AIs created together, engineered somehow so they gain power together and can act as checks against each other, then you've just swapped out the AI for an "AI collective". We would still want to engineer or verify that the AI collective is aligned with us; every issue about AI risk still applies to AI collectives. (If you think the AI collective will be weakened relative to us by having to work together, then does that still hold true if all the AIs self-improve and figure out how to get much better at cooperating?)