adam_shimi — LessWrong

New Paper Expanding on the Goodhart Taxonomy

First, thanks to both of you for writing this really nice paper. I have two questions, which are more about my understanding (or lack thereof) than about issues with the work.

My first question is about an example I have in mind, and whether my classification of it makes sense. In this example, the regulator is a grant-maker, and the agents are researchers in a very theoretical field. Our regulator wants to optimize for as many concrete applications as possible, regardless of what the researchers are interested in (any similarity with the real world is obviously unintended). To reach this goal, the regulator prioritizes funding grant proposals that promise applications. Yet the researchers could just write about applications while only pursuing the theory (still no intended similarity with the real world...).

It seems clear to me that this is an instance of adversarial Goodhart, and more specifically of adversarial misalignment Goodhart. Do you also think this is the case? If not, why?

My second question is more of a request: what "concrete" example could you give me of non-causal cobra effect Goodhart? I have some trouble visualizing it.

Deregulating Distraction, Moving Towards the Goal, and Level Hopping

adam_shimi11y10

I especially like the Moving Towards the Goal trick, since wavering between possible "life projects" and related fields of study is my number one nemesis in getting things done.

For the level hopping, I tend to do a slightly more hardcore version: setting a nearly impossible goal (like learning quantum computation in a month). I keep doing it mostly because of the thrill of the impossible challenge (it puts my ego to use) and because failure is no longer very stressful. My goal is by definition impossible, so the point is to complete as much of it as possible, not to finish it perfectly.

Welcome to Less Wrong! (7th thread, December 2014)

adam_shimi11y00

Welcome Matt. :) Can you explain a little more what you mean by rationalist rap battle? Seems fun.

Habitual Productivity

adam_shimi11y-10

I'm astonished. Absolutely floored. You used the guilt you feel about "useless activities" to actually do something productive! Why on Earth didn't I think of that? I feel much as you described: never satisfied or relaxed by activities that don't improve my skills and my intellectual abilities. But, following the advice of the people around me, I always tried to force myself to enjoy "having fun". Whereas I should have done the opposite!

Thank you. Really.

Welcome to Less Wrong! (7th thread, December 2014)

adam_shimi11y130

Hello LessWrongers! After discovering the blog and the MIRI research papers through a friend (Gyrodiot) a few weeks ago, I finally decided to register here. I keep seeing fascinating discussions I want to be part of, and I would also like to share my ideas about AI and rationalism.

Currently, I am a first-year student at a French engineering school, studying computer science and applied mathematics. Before that, I spent two years in "Classes Préparatoires", an intensive program in mathematics and physics that prepares students for the engineering school entrance exams. Even if it was quite harsh (basically 30 hours of classes + a 5-hour exam + homework impossible to finish every week), it gave me a push toward becoming a post-rigorous mathematics student. ("Post-rigorous" in the sense defined by Terence Tao: http://terrytao.wordpress.com/career-advice/there%E2%80%99s-more-to-mathematics-than-rigour-and-proofs/ )

As for my interests, I am currently working with one of my teachers on an online handwriting OCR system based on a model of oscillatory handwriting he developed. We are also exploring the cognitive consequences of the model, mostly Piaget's idea of assimilation, which can be linked to modern discoveries about mirror neurons. I am also self-studying quantum computation, even more so now that there is a high probability I will do a summer research internship on quantum information theory.

Of the topics I have seen here on LW and on the MIRI website, I think corrigibility is the one that interests me the most.

That's all folks. ;)
