Comments

I think this is a good thing to do! I recommend looking up things like "reflections on my LTFF upskilling grant" for similar pieces from lesser-known researchers and aspiring researchers.


Hey, thanks for writing this up! I thought you communicated the key details excellently, in particular the three camps of alignment-difficulty worlds and the variation within those camps. I also think you included just enough caveats and extra detail to give readers more to think about without washing out the key ideas of the post.

Just wanted to say thanks, this post makes a great reference for me to link to.

Definition: .


A useful alternate definition of this is:

Where  refers to . Proof:

Yep, fair point. In my original comment it seems I forgot about the problem of AIs Goodharting our long reflection. I probably agree now that doing a pivotal act into a long reflection is approximately as difficult as solving alignment.

(Side-note about how my brain works: I notice that when I think through all the argumentative steps deliberately, I do believe this statement: "Making an AI which helps humans clarify their values is approximately as hard as making an AI care about any simple, specific thing." However, it does not come to mind automatically when I'm reasoning about alignment. Two possible fixes:

  1. Think more concretely about Retargeting the Search when I think about solving alignment. This makes the problems seem similar in difficulty.
  2. Meditate on just how hard it is to target an AI at something. Sometimes I forget how Goodhartable any objective is; the toy sketch below is the kind of thing I have in mind.)
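
To make that second point concrete, here is a toy illustration of my own (not something from the original discussion): a hypothetical `proxy_objective` that tracks a `true_objective` near its optimum, but a simple hill-climber optimizing the proxy wanders off to a point the true objective scores badly.

```python
# Toy sketch of Goodhart's law: optimizing a proxy diverges from the true goal.
# All names and functional forms here are made up for illustration.
import random

def true_objective(x):
    # What we actually care about: stay close to x = 1.0.
    return -(x - 1.0) ** 2

def proxy_objective(x):
    # A measurable stand-in that agrees with the true objective near x = 1
    # but keeps rewarding larger x; it peaks at x = 6.
    return x - 0.1 * (x - 1.0) ** 2

def hill_climb(objective, x=0.0, steps=2000, step_size=0.05):
    # A crude optimizer: accept random perturbations that improve the objective.
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):
            x = candidate
    return x

if __name__ == "__main__":
    x_star = hill_climb(proxy_objective)
    print(f"proxy-optimal x ~ {x_star:.2f}")                       # drifts toward 6
    print(f"true objective there ~ {true_objective(x_star):.2f}")  # far below its max of 0
```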

This post was incredibly interesting and useful to me. I would strong-upvote it, but I don't think it should be promoted to more people. I've been thinking about the question of "who are we aligning AI to" for the past two months.

I really liked your criticism of the Long Reflection because it is refreshingly different from, e.g., MacAskill and Ord's writing on the long reflection. I'm still not convinced that the hellish outcomes you mention, like synthetic superstimuli cults and sub-AGI drones, are unavoidable. Why can't we just have a simple process of open dialogue with values of truth, individual agency during the reflection, and some clearly defined contract at the end of the long reflection to, say, take power away from the AGI drones?

3 is my main reason for wanting to learn more pure math, but I use 1 and 2 to help motivate me.

Which of these books are you most excited about, and why? I also want to do more fun math reading.

Let's be optimistic and prove that an agentic AI will be beneficial for the long-term future of humanity. We probably need to prove these three premises (sketched schematically below):

Premise 1: Training story X will create an AI model which approximates agent formalism A.
Premise 2: Agent formalism A is computable and has a set of alignment properties P.
Premise 3: An AI with the set of alignment properties P will be beneficial for the long-term future.
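
As a hedged schematic (my own notation, not terms from any post), the intended chain looks something like this, with the implicit step that the properties P transfer from formalism A to any model M that approximates it:

```latex
% Schematic only: Trains, Satisfies, and Beneficial are informal predicates
% introduced to display the argument's shape, not defined terms.
\begin{align*}
\text{(P1)}\quad & \mathrm{Trains}(X, M) \Rightarrow M \approx A \\
\text{(P2)}\quad & \mathrm{Computable}(A) \wedge \mathrm{Satisfies}(A, P) \\
\text{(P3)}\quad & \mathrm{Satisfies}(M, P) \Rightarrow \mathrm{Beneficial}(M) \\
\text{(Goal)}\quad & \text{P1} \wedge \text{P2} \wedge \text{P3} \Rightarrow \mathrm{Beneficial}(M)
\end{align*}
```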

Aaand so far I'm not happy with our answers to any of these.
