You may recognize several familiar names there, such as Paul Christiano, Benja Fallenstein, Katja Grace, Nick Bostrom, Anna Salamon, Jacob Steinhardt, Stuart Russell... and me. (The $20,000 for my project was the smallest grant that they gave out, but hey, I'm definitely not complaining. ^^)

Anyone know more about this proposal from IDSIA?

Technical Abstract: "Whenever one wants to verify that a recursively self-improving system will robustly remain benevolent, the prevailing tendency is to look towards formal proof techniques, which however have several issues: (1) Proofs rely on idealized assumptions that inaccurately and incompletely describe the real world and the constraints we mean to impose. (2) Proof-based self-modifying systems run into logical obstacles due to Löb's theorem, causing them to progressively lose trust in future selves or offspring. (3) Finding nontrivial candidates for provably beneficial self-modifications requires either tremendous foresight or intractable search.

Recently a class of AGI-aspiring systems that we call experience-based AI (EXPAI) has emerged, which fix/circumvent/trivialize these issues. They are self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidences and slowly phased in when vindicated or dismissed when falsified. We expect EXPAI to have high impact due to its practicality and tractability. Therefore we must now study how EXPAI implementations can be molded and tested during their early growth period to ensure their robust adherence to benevolence constraints."

I did some searching but Google doesn't seem to know anything about this "EXPAI".

I didn't find anything on EXPAI either, but there's the PI's list of previous publications. At least his Bounded Seed-AGI paper sounds somewhat related:

Abstract. Four principal features of autonomous control systems are left both unaddressed and unaddressable by present-day engineering methodologies: (1) The ability to operate effectively in environments that are only partially known at design time; (2) A level of generality that allows a system to re-assess and redefine the fulfillment of its mission in light of unexpected constraints or other unforeseen changes in the environment; (3) The ability to operate effectively in environments of significant complexity; and (4) The ability to degrade gracefully— how it can continue striving to achieve its main goals when resources become scarce, or in light of other expected or unexpected constraining factors that impede its progress. We describe new methodological and engineering principles for addressing these shortcomings, that we have used to design a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. The work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from only a small amount of designer-specified code—a seed. Using value-driven dynamic priority scheduling to control the parallel execution of a vast number of lines of reasoning, the system accumulates increasingly useful models of its experience, resulting in recursive self-improvement that can be autonomously sustained after the machine leaves the lab, within the boundaries imposed by its designers. A prototype system named AERA has been implemented and demonstrated to learn a complex real-world task—real-time multimodal dialogue with humans—by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.

I saw this news and came back just to say congrats Kaj! I'm looking forward to reading about your thesis work.

I'm surprised and pleased by the diversity of the research space they are exploring. Specifically it's great to see proposals investigating robustness for machine learning and the applications of mechanism design to AI dynamics.

I'm disappointed that my group's proposal to work on AI containment wasn't funded, and no other AI containment work was funded, either. Still, some of the things that were funded do look promising. I wrote a bit about what we proposed and the experience of the process here.

When considering possible failure modes for this proposal, one possibility I didn’t consider was that original research portions would look too much like summaries of existing work.

Oh man, that sucks. :(

I am not an expert (not even an amateur) in the area, but I wonder if the AI containment work would be futile without corrigibility figured out, and superfluous once it is? What is the window of AI intelligence where it is not yet super-human (too late to contain), but already too smart to be contained by the standard means?

I feel for you. I agree with salvatier's point in the linked page. Why don't you try to talk to FHI directly? They should be able to get some funding your way.