Micurie - LessWrong

Difficulty classes for alignment properties

Thank you for your answer! You clarified my confusion!

I would be interested to know more about your concept of (inner) optimization in its full complexity and nuance. I would really appreciate it if you could point me to any previous writing on this.

My previous reading on this topic includes this post from Yudkowsky and this post from Flint where (to the best of my understanding) an optimizing system evolves according to some preference ordering towards states that have a low probability of occurring spontaneously. I find their definitions to be a bit more general than the one you are referring to here (please correct me if I am wrong).

I am curious about the above because I am currently working on a project related to this topic. I am interested in rigorously formalizing some concepts regarding optimizers and their potential evolution towards agentic structure in some limit.

Difficulty classes for alignment properties

Micurie 3mo

Hi, I enjoyed reading the post, as it clarified some thoughts that I was having regarding the topic. Could you please briefly elaborate on this part of your post:

..., then for the right operationalization properties about optimization are properties that belong to the complexity class of the system it’s internal to.

I understand the first part of that sentence and the rough idea of what you're getting at, but not exactly what you mean in this particular section.

Personal predictions

Micurie 3mo

This was a very useful and timely post for me. I was on the lookout for a tool to use to evaluate the quality of my predictions, and the Brier score was a concept I didn't know before now. I will try to incorporate this in my daily routine. Thank you!
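For reference, here is a minimal sketch of the Brier score as it is usually defined for binary events: the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not). The function name and the example numbers are my own illustration, not something taken from the post.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    forecasts: probabilities in [0, 1] assigned to each event happening.
    outcomes:  1 if the event happened, 0 otherwise.
    Lower is better: 0.0 is a perfect score, and always answering 0.5 gives 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical example: three predictions and what actually happened.
score = brier_score([0.9, 0.7, 0.2], [1, 1, 0])
print(round(score, 3))  # about 0.047
```

Logging each prediction with its probability and resolving it later is enough to compute this over any time window.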

One additional variable to keep track of is something like a "correction" factor. If I predict a given task will take me 1 week to complete, and it ends up taking 2 weeks instead, then the next time I am faced with a task of a similar nature I should remember that last time I was wrong by a factor of 2. This "correction" factor should be taken into account when making the next prediction.

The caveat is that this approach is simplistic: it might not capture the factors that caused the delay from the predicted 1 week to the actual 2 weeks, some of which may have been external. Additionally, I now have better knowledge of what led me to make a wrong prediction in the first place, so I should (maybe) be better at making the next one.

But I think that the best strategy is keeping the "correction" factor simple and not starting to account for all of these factors. I would rather update the "correction" factor in the next iteration.
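A minimal sketch of what I have in mind, assuming the factor is simply the ratio of actual to predicted duration on the most recent similar task and is overwritten after each new task. The function names and the second task's numbers are hypothetical.

```python
def correction_factor(predicted, actual):
    """Ratio of how long the task really took to how long I predicted."""
    if predicted <= 0:
        raise ValueError("predicted duration must be positive")
    return actual / predicted


def corrected_estimate(raw_estimate, factor):
    """Scale a fresh gut-feeling estimate by the last observed factor."""
    return raw_estimate * factor


# First task: predicted 1 week, took 2 weeks, so the factor is 2.0.
factor = correction_factor(predicted=1.0, actual=2.0)

# Next similar task: my raw estimate is 3 weeks, so I plan for 6.
planned = corrected_estimate(3.0, factor)

# Suppose it really takes 4.5 weeks (made-up number). The factor is then
# updated against the raw estimate, giving 1.5 for the next iteration.
factor = correction_factor(predicted=3.0, actual=4.5)
```

Keeping only the latest ratio is the simplest possible choice; averaging the ratios over several past tasks would be a natural refinement, at the cost of more bookkeeping.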
