
Mar 25, 2021

Another note: why would the AI touch this layer at all? In actual prototype autonomy systems (my day job) there are device drivers, an RTOS, and many hardware details. That is a surprising amount of complexity for a machine whose only role is to execute some graph on input X and produce control output Y and logs Z.

Most of the improvements you might make are changes to the nodes and structure of the graph. There will normally be no need for, or benefit from, changing the graph execution infrastructure. (Nodes are the actual neural networks, or the algorithms that pick one of several outputs per some rule, or that decide which boxes from an image detector are probable, and so on. An AGI would presumably be an enormous graph with thousands of nodes.)
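To make the split between the graph and its execution infrastructure concrete, here is a minimal Python sketch. The node names (`detector`, `selector`, `control`), the box format, and the `execute_graph` helper are hypothetical, invented for illustration; a real autonomy stack runs on an RTOS with far more machinery than this.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical node format: (function, names of the values it consumes).
Node = Tuple[Callable[..., object], List[str]]


def execute_graph(graph: Dict[str, Node], inputs: Dict[str, object],
                  log: List[str]) -> Dict[str, object]:
    """Fixed infrastructure: run every node once its inputs are available."""
    values: Dict[str, object] = dict(inputs)
    remaining = dict(graph)
    while remaining:
        ready = [name for name, (_, deps) in remaining.items()
                 if all(dep in values for dep in deps)]
        if not ready:
            raise ValueError("graph has a cycle or a missing input")
        for name in ready:
            fn, deps = remaining.pop(name)
            values[name] = fn(*(values[dep] for dep in deps))
            log.append(f"{name} -> {values[name]!r}")  # the logs Z
    return values


# The graph itself: this is the part an improving system would edit.
graph: Dict[str, Node] = {
    # keep detections with confidence >= 0.5
    "detector": (lambda frame: [b for b in frame if b[1] >= 0.5], ["camera"]),
    # pick the most confident remaining detection, if any
    "selector": (lambda boxes: max(boxes, key=lambda b: b[1])[0] if boxes else None,
                 ["detector"]),
    # turn the choice into a control output (Y)
    "control": (lambda target: f"track:{target}" if target else "hold",
                ["selector"]),
}

log: List[str] = []
out = execute_graph(graph, {"camera": [("car", 0.9), ("sign", 0.3), ("person", 0.7)]}, log)
print(out["control"])  # track:car
```

In this toy, improving the system means editing the `graph` dictionary (swapping a node, adding one, rewiring inputs); `execute_graph` never needs to change.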

Growing AI systems may need more: more hardware, of a newer generation. But they won't need to touch how the execution layer works, since changing it would bring no speed benefit and most changes would make it fail outright. So it is not a useful target for optimization.

So yes, flaws of the class above could stay hidden for years. There are ways to find them by decompiling and analyzing the bytecode, but an AI wouldn't necessarily find such a flaw in itself.
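As a toy illustration of bytecode analysis, the Python sketch below fingerprints a function's compiled instructions with the standard `dis` module and compares them against a trusted baseline. The function names and the trigger value are hypothetical. Note the limitation that makes Thompson's attack hard: this only helps if the baseline and the inspection tool are themselves trustworthy, which is exactly what a Thompson-style trojan undermines.

```python
import dis
import hashlib


def bytecode_fingerprint(func) -> str:
    """Hash the opcode names and resolved arguments of a function's bytecode."""
    parts = [f"{ins.opname}:{ins.argval!r}" for ins in dis.get_instructions(func)]
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()


def trusted_clip(x):
    return max(0.0, min(1.0, x))


def candidate_clip(x):
    return max(0.0, min(1.0, x))


def trojaned_clip(x):
    if x == 0.1337:  # hypothetical hidden trigger value
        return 1.0
    return max(0.0, min(1.0, x))


baseline = bytecode_fingerprint(trusted_clip)
print(bytecode_fingerprint(candidate_clip) == baseline)  # True
print(bytecode_fingerprint(trojaned_clip) == baseline)   # False
```

Fingerprints are only comparable within one interpreter version, since bytecode changes between Python releases.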


Saran

Mar 26, 2021

Wouldn't an AI rebuild itself from zero to prevent such trojans anyway? Then it is pointless.

I'm sure an AI would be aware of such a threat; for example, it could scan the internet and stumble upon posts such as this one.
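Whether a rebuild from zero removes such a trojan depends on which toolchain the rebuild runs on. The toy Python model below sketches Thompson's mechanism under heavily simplified, hypothetical assumptions: a "binary" is just its source text plus a hidden payload, and `run_compiler` stands in for whatever the compiler binary actually does. Recompiling perfectly clean compiler source with an already-infected compiler reproduces the infection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """A toy 'compiled program': its source text plus any hidden payload."""
    source: str
    hidden_payload: str = ""


CLEAN_COMPILER_SRC = "def compile(src): return Binary(src)"
LOGIN_SRC = "def login(user, pw): return check(user, pw)"


def run_compiler(compiler: Binary, src: str) -> Binary:
    """Toy semantics: behaviour comes from the binary, not from its source."""
    if compiler.hidden_payload != "thompson":
        return Binary(src)  # honest compilation
    if "def login" in src:
        return Binary(src, hidden_payload="backdoor")  # trojan 1: backdoor login
    if "def compile" in src:
        return Binary(src, hidden_payload="thompson")  # trojan 2: re-infect compiler
    return Binary(src)


def rebuild_from_zero(toolchain: Binary, generations: int) -> Binary:
    """Repeatedly recompile the clean compiler source with the current compiler."""
    compiler = toolchain
    for _ in range(generations):
        compiler = run_compiler(compiler, CLEAN_COMPILER_SRC)
    return compiler


honest = Binary(CLEAN_COMPILER_SRC)
infected = Binary(CLEAN_COMPILER_SRC, hidden_payload="thompson")

for start in (honest, infected):
    compiler = rebuild_from_zero(start, generations=5)
    login = run_compiler(compiler, LOGIN_SRC)
    print(compiler.source == CLEAN_COMPILER_SRC, login.hidden_payload or "none")
# True none
# True backdoor
```

In both cases an audit of the source finds nothing; only a rebuild that avoids the compromised binary entirely breaks the cycle.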

[anonymous]

Why would it bother? Every last bit of compute power has to be obsessed with [whatever task humans have assigned to it]. An AI that isn't using all its compute towards its assigned task is one that gets replaced with one that is.

sxae

We can't really speculate too strongly about the goals of an emerging AGI, so we have to consider all possibilities. "Bothering" is a human way of thinking that an AGI is under no obligation to conform to. This is why I specify an emerging AGI: a situation where the output of the iterator is so complex that only the thing iterating it understands the relationship between symbols and output. We can provide discriminators, as I also describe, to try to track an AGI's alignment with the goals we want, but we absolutely can't guarantee that every last bit of compute will be dedicated to anything in particular.

[anonymous]

With tight enough bounds we can. Update, to be more exact: build AIs from modules that are mostly well defined and well optimized. This means they are already as sparse as we can make them (they have only the necessary weights, and each module scores best on its dataset among all models of its size). This actually suggests a solution to the alignment problem.

Example architecture: a paperclip maximizer.

Layer 0: modules for robotics pathing and manipulation
Layer 1: modules for robotics perception
Layer 2: modules for laying out robotics on factory floors
Layer 3: modules for analyzing return on financial investment
Layer 4: high-level executive function, regressed against paperclips made, whose purpose is to issue commands to the lower layers

If we design some of the lower layers well enough, and disable any modification from higher layers, we can restrict what actions the paperclip maximizer is even capable of taking.
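A minimal Python sketch of the "lower layers define the action space, higher layers can only route commands" idea. The layer names, commands, and the `executive` function are hypothetical stand-ins for layers 0, 3 and 4. Python cannot truly enforce this boundary (`object.__setattr__` or direct access to the underlying objects can bypass it), so a real system would need process, OS, or hardware isolation; the sketch only shows the shape of the interface.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, List, Mapping, Tuple


@dataclass(frozen=True)
class Layer:
    """A lower layer whose set of allowed actions is fixed at construction."""
    name: str
    commands: Mapping[str, Callable[..., str]]

    def __post_init__(self) -> None:
        # Copy the table and wrap it in a read-only view. A frozen dataclass
        # blocks normal attribute assignment, so object.__setattr__ is needed here.
        object.__setattr__(self, "commands", MappingProxyType(dict(self.commands)))

    def issue(self, command: str, *args) -> str:
        if command not in self.commands:
            raise PermissionError(f"{self.name}: '{command}' is not an allowed action")
        return self.commands[command](*args)


# Hypothetical stand-ins for layer 0 (robotics) and layer 3 (finance).
robotics = Layer("layer0_robotics", {
    "move_arm": lambda target: f"arm moved to {target}",
    "grip": lambda: "gripper closed",
})
finance = Layer("layer3_finance", {
    "estimate_roi": lambda cost, revenue: f"roi={(revenue - cost) / cost:.2f}",
})

Plan = List[Tuple[str, str, tuple]]


def executive(plan: Plan) -> List[str]:
    """Layer 4: can route commands to lower layers but cannot redefine them."""
    layers = {"robotics": robotics, "finance": finance}
    results = []
    for layer_name, command, args in plan:
        try:
            results.append(layers[layer_name].issue(command, *args))
        except PermissionError as err:
            results.append(f"refused: {err}")
    return results


plan: Plan = [
    ("robotics", "move_arm", ("wire_spool",)),
    ("finance", "estimate_roi", (100.0, 150.0)),
    ("robotics", "disable_emergency_stop", ()),
]
results = executive(plan)
for line in results:
    print(line)
# arm moved to wire_spool
# roi=0.50
# refused: layer0_robotics: 'disable_emergency_stop' is not an allowed action
```

Attempting `robotics.commands["disable_emergency_stop"] = ...` raises `TypeError`, because the command table is a read-only `MappingProxyType`.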


habryka

Edit note: I fixed your images for you. They seemed broken on Chrome since the server on which they were hosted didn't support https.


[ Question ]

Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans?
