I put up a project for AISC. The easiest way to evaluate your fit is to look at the applicant questions I created. If the kind of reasoning required to answer these questions is something you are excited about doing, then you are probably a good fit.

For more details, you can look at the project proposal. This is the introductory section thereof:

Modern deep learning is about having a simple program (SGD) search over a space of possible programs (the weights of a neural network) and select one that performs well according to a loss function.
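To make the "simple program searching over programs" framing concrete, here is a minimal sketch in Python. It is a toy, not a real deep learning setup: the searched "program" is just a line f(x) = w*x + b, and the function name `sgd_search`, the data, and the hyperparameters are all assumptions chosen for illustration.

```python
import random


def sgd_search(data, steps=2000, lr=0.01, seed=0):
    """Toy SGD: search over (w, b) for a program f(x) = w*x + b with low loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # starting point in the space of candidate programs
    for _ in range(steps):
        x, y = rng.choice(data)   # pick one training example at random
        err = (w * x + b) - y     # prediction error on that example
        # Gradient step on the per-example loss 0.5 * err**2.
        w -= lr * err * x
        b -= lr * err
    return w, b


# Hypothetical target behavior the search should recover: y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]
w, b = sgd_search(data)
print(w, b)
```

The search loop itself is a few lines long. In this toy case the result is two readable numbers, but with millions of weights the selected program is no longer something one can read off.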

Even though the search program is simple, the programs it finds are neither simple nor understandable. This makes it difficult to determine whether a program spit out by the search procedure will be misaligned. It also makes it difficult to robustly bias our search procedure to select programs with specific properties such as non-deception, corrigibility, honesty, wanting what we want, etc.

My goal is to build an AI system that enables a pivotal act by figuring out the algorithms of intelligence directly, without running algorithmic search procedures that yield uninterpretable results. You could say I want to play the part that SGD plays in the modern paradigm myself. The ideal outcome is to figure out how to write down the entire pivotal system explicitly as a non-self-modifying program, similar to how I can write down the algorithm for quicksort.
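For reference, this is the kind of explicitly written program I mean. The sketch below is one simple, non-in-place variant of quicksort in Python, chosen for readability rather than efficiency. Every step can be read and reasoned about directly.

```python
def quicksort(items):
    """Return a new sorted list; the input is not modified."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    # Partition into three groups relative to the pivot.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    # Recurse on the strictly smaller and strictly larger groups.
    return quicksort(smaller) + equal + quicksort(larger)


print(quicksort([3, 1, 2, 3, 0]))
```

Correctness here follows from a short argument: each recursive call receives a strictly shorter list, and every element ends up on the correct side of the pivot.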

The idea is to create an algorithm that is analyzable and amenable to modification. At every step during the design process, I want to push the system towards being understandable, aligned, and capable. Any alignment-related problems should be fixed as they come up, by reaching deep inside the system and making the necessary changes instead of applying a superficial patch.

The goal is not to solve alignment in full generality but to build a highly restricted system that enables a pivotal act. Note that I am using Eliezer's definition of pivotal act, which means something quite specific. People tend to get confused by this term, so I wrote this article in an attempt to clear up some of the confusion.

The next three sections become increasingly concrete. The "Background" section presents background concepts for this agenda. "Directions" outlines some directions I would like to explore during AISC. "The Plan" outlines a concrete plan, mainly for the beginning of AISC.
