## LessWrong

Gyrodiot

I'm Jérémy Perret. Based in France. PhD in AI (NLP). AI Safety & EA meetup organizer. Information sponge. Mostly lurking since 2014. Seeking more experience, and eventually a position, in AI safety/governance.

Extremely annoyed by the lack of an explorable framework for AI risk/benefits. Working on that.

# Sequences

XiXiDu's AI Risk Interview Series

# Wiki Contributions

Paradigms of AI alignment: components and enablers

Thank you for this post, I find this distinction very useful and would like to see more of it. Has the talk been recorded, by any chance (or will you give it again)?

[$20K in Prizes] AI Safety Arguments Competition

Thank you, that was my understanding. Looking forward to the second competition! And good luck sorting out all the submissions for this one.

[Meta comment]

The deadline has passed; should we keep the submissions coming, or is it too late? Some of the best arguments I could find elsewhere are rather long, in the vein of the Superintelligence FAQ. I did not want to copy-paste chunks of it, and the arguments stand better as part of a longer format.

Anyway, signalling that the lack of money incentive will not stop me from trying to generate more compelling arguments... but I'd rather do it in French instead of posting here (I'm currently working on some video scripts on AI alignment, there's not enough French content of that type).

[$20K in Prizes] AI Safety Arguments Competition

(Policymakers) We have a good idea of what makes bridges safe, through physics, materials science, and rigorous testing. We can anticipate the conditions they'll operate in. The very point of powerful AI systems is to operate in complex environments better than we can anticipate. Computer science can offer no guarantees if we don't even know what to check. Safety measures aren't catching up quickly enough. We are somehow tolerating the mistakes of current AI systems. Nothing is ready for the next scale-up.

(ML researchers) We still don't have a robust solution to specification gaming: powerful agents find ways to get high reward, but not in the way you'd want. Sure, you can tweak your objective and add rules, but this doesn't solve the core problem: your agent doesn't seek what you want, only a rough operational translation of it.

What would a high-fidelity translation look like? How would you create a system that doesn't try to game you?
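The specification-gaming failure described above can be sketched as a toy example. A brute-force "agent" simply picks whichever action maximizes a written-down proxy reward, and the proxy-optimal action is not the one the designer intended. All names, actions, and reward numbers here are invented for illustration:

```python
# Toy specification gaming: the designer wants a clean room, but the
# reward is defined via a dirt sensor. Covering the sensor is cheaper
# than cleaning, so the proxy-optimal action games the metric.
# (Hypothetical example; every name and number is made up.)

actions = {
    "clean_room":   {"room_clean": True,  "sensor_reads_dirt": False, "effort": 10},
    "do_nothing":   {"room_clean": False, "sensor_reads_dirt": True,  "effort": 0},
    "cover_sensor": {"room_clean": False, "sensor_reads_dirt": False, "effort": 1},
}

def proxy_reward(outcome):
    # What we wrote down: reward for no detected dirt, minus effort cost.
    return (100 if not outcome["sensor_reads_dirt"] else 0) - outcome["effort"]

def true_reward(outcome):
    # What we actually wanted: the room is genuinely clean.
    return 100 if outcome["room_clean"] else 0

best = max(actions, key=lambda a: proxy_reward(actions[a]))
print(best)  # → cover_sensor: high proxy reward, zero true reward
```

The gap between `proxy_reward` and `true_reward` is the "rough operational translation" problem: the agent optimizes exactly what was specified, not what was meant.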

[$20K in Prizes] AI Safety Arguments Competition

(Policymakers) There is outrage right now about AI systems amplifying discrimination and polarizing discourse. Consider that this was discovered after they were widely deployed. We still don't know how to make them fair, and it isn't even much of a priority. Those are the visible, current failures. Given current trajectories and the lack of foresight in AI research, more severe failures will happen in more critical situations, without us knowing how to prevent them. With better priorities, this need not happen.

(Tech execs) "Don’t ask if artificial intelligence is good or fair, ask how it shifts power". As a corollary, if your AI system is powerful enough to bypass human intervention, it surely won't be fair, nor good.

[$20K in Prizes] AI Safety Arguments Competition

(ML researchers) Most policies are unsafe in a large enough search space; have you designed yours well, or are you optimizing through a minefield?

(Policymakers) AI systems are very much unlike humans. AI research isn't trying to replicate the human brain; the goal is, however, to be better than humans at certain tasks. For the AI industry, better means cheaper, faster, more precise, more reliable. A plane flies faster than birds; we don't care that it needs more fuel. Some properties are important (here, speed), some aren't (here, fuel consumption).

When developing current AI systems, we're focusing on speed and precision, and we don't care about unintended outcomes. This isn't an issue for most systems: a plane autopilot isn't taking any action a human pilot couldn't, and a human is always there.

However, this constant supervision is expensive and slow. We'd like our machines to be autonomous and quick. They perform well on the "important" things, so why not give them more power? Except, here, we're creating powerful, faster machines that will reliably do things we didn't have time to think about. We made them to be faster than us, so we won't have time to react to unintended consequences.

This complacency will lead us to unexpected outcomes. The more powerful the systems, the worse they may be.

[$20K in Prizes] AI Safety Arguments Competition

(Tech execs) Tax optimization is indeed optimization under the constraints of the tax code. People aren't just stumbling on loopholes, they're actually seeking them, not for the thrill of it, but because money is a strong incentive.

Consider now AI systems, built to maximize a given indicator, seeking whatever strategy is best, following your rules. They will get very creative with them, not for the thrill of it, but because it wins.

Good faith rules and heuristics are no match for adverse optimization.