This started out as more of an intuition, so what follows is mostly an attempt to verbalize it in a concrete way.

If we could formalize a series of relatively simple problems, and similarly formalize what "drifting" from the core values of those toy problems would look like, I wonder whether we would find new patterns, rules, or intuitions.

(I had a pithy remark to the effect of `while (drifting) { dont(); }`.)

I think I'm wondering whether we can expand and formalize our knowledge of what value drift means, in a way that generalizes independently of any specific, formalized values.

I wonder, almost out of idle curiosity, whether the claim "measuring via proxy will cause value drift" is something we could formalize and iterate on first. Is the problem stable at the meta-level, or is there a way to meaningfully define "not drifting from the proxy" without having to solve alignment in general?

Intuitively I'd guess this falls into the "don't try to be cute" class of thought, but I was afraid to post at all and decided that I wanted to interact, even at the cost of (probably) saying something embarrassing.

Hello everyone! I came to LessWrong as a lurker something like two years ago (perhaps more; my grasp on time is... fragile at best), binged through all of HPMOR that was up at the time, and waited with bated breath for the rest. After a long time spent lurking, reading the blog posts and then the e-book, I decided I wanted to do more than aimlessly wander through readings and sequences.

So here I am! I posted to the lounge on Reddit, and now I'm posting here. The essence of why I'm posting now is simple: I want to start down the road toward aiding the work on FAI. I graduated a year and a half ago, and I want to start learning in a directed and purposeful way. So I'm here to ask for advice on where and how to get started, outside of standard higher education.