All of robertzk's Comments + Replies

The Main Sources of AI Risk?

Inspecting code against a harm-detection predicate seems recursive. What if the code, or the execution needed to perform that inspection properly, is itself harmful? An AGI is almost certainly a distributed system with no meaningful notion of global state, so I doubt this can be hand-waved away.
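A minimal sketch of the regress, with a hypothetical `is_harmful` predicate standing in for whatever harm-detection analysis we actually trust (names and placeholder logic are mine, purely for illustration):

```python
import inspect

def is_harmful(source: str) -> bool:
    """Hypothetical harm-detection predicate over source code.

    In practice this would be an expensive analysis whose own execution
    could itself have effects we care about.
    """
    # Placeholder check; the real predicate is the hard part.
    return "do_harm" in source

# To trust a verdict about some target code...
target_code = "def act(): return 'benign plan'"
verdict = is_harmful(target_code)

# ...we also need to know that running the inspector is safe, which poses
# the same question one level up, with no obvious base case.
inspector_source = inspect.getsource(is_harmful)
meta_verdict = is_harmful(inspector_source)
print(verdict, meta_verdict)
```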

For example, many distributed database vendors, such as Snowflake, do not offer a pre-execution query planner. Planning can only be performed just-in-time as the query runs, or retroactively after it has completed, since the exact structure may depend on the co-location of data

... (read more)
3John_Maxwell2yOne possibility is a sort of proof by induction, where you start with code that has been inspected by humans, then that code inspects further code, etc. Daemons and mindcrime seem most worrisome for superhuman systems, but a human-level system is plausibly sufficient to comprehend human values (and thus do useful inspections). For daemons, I think you might even be able to formalize the idea without leaning hard on any specific utility function. The best approach might involve utility uncertainty on the part of the AI that becomes narrower with time, so you can gradually bootstrap your way to understanding human values while avoiding computational hazards according to your current guesses about human values along the way. People already choose not to think about particular topics on the basis of information hazards and internal suffering. Sometimes these judgments are made interrupt-style, partway through thinking about a topic; others are outside-view judgments ("thinking about topic X always makes me feel depressed").
Examples of AI's behaving badly

Isn't this an example of a reflection problem? We induce a change in a system (here, an evaluation metric), and now we must predict not only the next iteration but the stable equilibria of the system.
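As a toy illustration (dynamics and numbers invented for the example), the one-step prediction and the equilibrium prediction can come apart badly:

```python
def update(state: float, metric_weight: float) -> float:
    """One round of the system adapting to the changed evaluation metric.

    Toy dynamics: the system moves halfway toward whatever behaviour the
    new metric incentivises.
    """
    incentivised = 10.0 * metric_weight
    return state + 0.5 * (incentivised - state)

state = 1.0
next_iteration = update(state, metric_weight=1.0)  # what we see immediately

# The stable equilibrium is the fixed point of the induced dynamics.
equilibrium = state
while abs(update(equilibrium, 1.0) - equilibrium) > 1e-9:
    equilibrium = update(equilibrium, 1.0)

print(next_iteration, equilibrium)  # 5.5 vs ~10.0
```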

In Praise of Maximizing – With Some Caveats

Did you remove the vilification of proving arcane theorems in algebraic number theory because the LessWrong audience is more likely to fall within this demographic? (I used to be very excited about proving arcane theorems in algebraic number theory, and fully agree with you.)

2David Althaus6yYou've got me there :)
Restrictions that are hard to hack

Incidentally, for a community whose most important goal is solving a math problem, why is there no MathJax or other built-in LaTeX support?

Restrictions that are hard to hack

The thing that eventually leapt out when comparing the two behaviours is that behaviour 2 is far more informative about what the restriction was, than behaviour 1 was.

It sounds to me like the agent overfit to the restriction R. I wonder if you can draw some parallels to the classical Vapnik-style problem of empirical risk minimization, where you are not merely fitting your behavior to the training set, but instead achieving the optimal trade-off between generalization ability and adherence to R.
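Here is a minimal sketch of that trade-off in a familiar regression setting, with an L2 penalty standing in for capacity control (the data, degree, and penalty value are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample standing in for "behaviour observed to satisfy the restriction R".
X = rng.uniform(-1, 1, size=30)
y = 2.0 * X + rng.normal(scale=0.3, size=30)

def fit_polynomial(x, y, degree, ridge):
    """Least-squares polynomial fit with an L2 (ridge) penalty.

    ridge = 0 is pure empirical risk minimization: fit the sample as tightly
    as possible. Larger values trade training fit for generalization, i.e.
    Vapnik-style capacity control.
    """
    design = np.vander(x, degree + 1, increasing=True)  # columns x^0 .. x^degree
    gram = design.T @ design + ridge * np.eye(degree + 1)
    return np.linalg.solve(gram, design.T @ y)

memorizer = fit_polynomial(X, y, degree=9, ridge=0.0)     # hugs the sample
generalizer = fit_polynomial(X, y, degree=9, ridge=1e-2)  # penalizes capacity
```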

In your example, an agent that inferred the boundaries of our... (read more)

1Stuart_Armstrong6yThanks, looking at the Vapnik stuff now.
Andrew Ng dismisses UFAI concerns

However, UFFire does not uncontrollably and exponentially reproduce or improve its own functioning. Certainly, a conflagration on a planet covered entirely in dry forest would become an unmitigable problem rather quickly.

In fact, in such a scenario we should dedicate a huge amount of resources to preventing it, and never use fire until we have proved it will not turn "unfriendly".

-2Locaha6yDo you realize this is a totally hypothetical scenario?