Olivier Coutu - LessWrong

All AGI Safety questions welcome (especially basic ones) [May 2023]

Charlie is correct in saying that GPT-4 does not actively learn based on its input. But a related question is whether we are missing key technical insights for AGI, and Stampy has an answer for that. He also has an answer explaining scaling laws.

All AGI Safety questions welcome (especially basic ones) [May 2023]

Olivier Coutu1y10

These are interesting questions that modern philosophers have been pondering. Stampy has an answer on forcing people to change faster than they would like, and we are working on adding more answers that attempt to guess what an (aligned) superintelligence might do.

All AGI Safety questions welcome (especially basic ones) [May 2023]

Olivier Coutu1y10

These are great questions! Stampy does not currently have an answer for the first one, but its answer on prosaic alignment could get you started on ways that some people think might work without needing additional breakthroughs.

Regarding the second question, the plan seems to be to use less powerful AIs to align more powerful AIs, in the hope that these helper AIs would not be powerful enough for misalignment to be an issue.

All AGI Safety questions welcome (especially basic ones) [May 2023]

Olivier Coutu1y30

These question-answering AIs are often called Oracles; you can find some info on them here. Their cousin, tool AI, is also relevant. You'll discover that they are probably safer but by no means entirely safe.

We are working on an answer for Stampy on the safety of Oracles. Keep your eyes peeled; it should show up soon.

Problems of people new to AI safety and my project ideas to mitigate them

Olivier Coutu2y10

The 80,000 Hours job board is a good one; there might be others.

Why I hate the "accident vs. misuse" AI x-risk dichotomy (quick thoughts on "structural risk")

Olivier Coutu2y30

I feel like the adjective "accidentally" still makes sense to convey "not on purpose", and the "naked on the porch" situation is a good example of that. This distinction can be made regardless of the level of blame (or here shame) that should be inflicted. I don't feel like this applies to the noun "accident" and I doubt that the radio hosts would have called this "an accident where a man was locked outside naked".

Regarding the kid, I agree with you that it suggests "I should be blamed less (or not at all)" and the level of blame should somewhat depend on whether the action was intentional or not.

The Three Mile Island accident is interesting. If I were to guess, the phrasing of the commission was chosen to emphasize that this was not believed to be intentional sabotage. I would have preferred to call it an "incident" (more technically a "partial meltdown", but that's pretty scary for a commission name).

From what I'm reading, my understanding is accident → not intentional → reduce blame whereas you disagree with that last arrow or at least the strength of the reduction. It is my opinion that this term should not be used when we do not want to reduce blame, e.g. for sloppy AI safety measures. I feel that our disagreement has been made clear and we are arguing about the meaning of a word, but you're welcome to reply if you don't believe we have reached the crux.

Why I hate the "accident vs. misuse" AI x-risk dichotomy (quick thoughts on "structural risk")

Olivier Coutu2y86

In the field of road safety, the term "accident" is being deprecated. My understanding is that "It was an accident" suggests "It wasn't done on purpose, I didn't see it coming and I shouldn't be blamed for it", as a child would say to a parent after breaking a vase through negligence. In my mind, people get blamed for failures, not accidents, and your example sentences suggest someone attempting to dodge responsibility for their actions.

With this framing, I can't currently think of an event that I would label "an accident", although "an accidental collision" would make sense to differentiate it from a collision that was done on purpose.
