You may recognize several familiar names there, such as Paul Christiano, Benja Fallenstein, Katja Grace, Nick Bostrom, Anna Salamon, Jacob Steinhardt, Stuart Russell... and me. (The $20,000 for my project was the smallest grant they gave out, but hey, I'm definitely not complaining. ^^)

Anyone know more about this proposal from IDSIA?

Technical Abstract: "Whenever one wants to verify that a recursively self-improving system will robustly remain benevolent, the prevailing tendency is to look towards formal proof techniques, which however have several issues: (1) Proofs rely on idealized assumptions that inaccurately and incompletely describe the real world and the constraints we mean to impose. (2) Proof-based self-modifying systems run into logical obstacles due to Löb's theorem, causing them to progressively lose trust in future selves or offspring. (3) Finding nontrivial candidates for provably beneficial self-modifications requires either tremendous foresight or intractable search.

Recently a class of AGI-aspiring systems that we call experience-based AI (EXPAI) has emerged, which fix/circumvent/trivialize these issues. They are self-improving systems that make tentative, additive, reversible, very fine-grained modifications, without prior self-reasoning; instead, self-modifications are tested over time against experiential evidence and slowly phased in when vindicated or dismissed when falsified. We expect EXPAI to have high impact due to its practicality and tractability. Therefore we must now study how EXPAI implementations can be molded and tested during their early growth period to ensure their robust adherence to benevolence constraints."

I did some searching but Google doesn't seem to know anything about this "EXPAI".

I didn't find anything on EXPAI either, but there's the PI's list of previous publications. At least his Bounded Seed-AGI paper sounds somewhat related:

Abstract. Four principal features of autonomous control systems are left both unaddressed and unaddressable by present-day engineering methodologies: (1) The ability to operate effectively in environments that are only partially known at design time; (2) A level of generality that allows a system to re-assess and redefine the fulfillment of its mission in light of unexpected constraints or other unforeseen changes in the environment; (3) The ability to operate effectively in environments of significant complexity; and (4) The ability to degrade gracefully— how it can continue striving to achieve its main goals when resources become scarce, or in light of other expected or unexpected constraining factors that impede its progress. We describe new methodological and engineering principles for addressing these shortcomings, that we have used to design a machine that becomes increasingly better at behaving in underspecified circumstances, in a goal-directed way, on the job, by modeling itself and its environment as experience accumulates. The work provides an architectural blueprint for constructing systems with high levels of operational autonomy in underspecified circumstances, starting from only a small amount of designer-specified code—a seed. Using value-driven dynamic priority scheduling to control the parallel execution of a vast number of lines of reasoning, the system accumulates increasingly useful models of its experience, resulting in recursive self-improvement that can be autonomously sustained after the machine leaves the lab, within the boundaries imposed by its designers. A prototype system named AERA has been implemented and demonstrated to learn a complex real-world task—real-time multimodal dialogue with humans—by on-line observation. Our work presents solutions to several challenges that must be solved for achieving artificial general intelligence.


I saw this news and came back just to say congrats Kaj! I'm looking forward to reading about your thesis work.

I'm surprised and pleased by the diversity of the research space they are exploring. Specifically it's great to see proposals investigating robustness for machine learning and the applications of mechanism design to AI dynamics.

Strange that there are no direct investments in MIRI. Most of Bostrom's ideas from the book "Superintelligence" came from EY.

There's the $250,000 to Benja Fallenstein (employed at MIRI) and the "Aligning Superintelligence With Human Interests" project, which also happens to be the name of MIRI's technical research agenda... :)

That is false. Bostrom thought of FAI before Eliezer. Paul thought of the Crypto. Bostrom and Armstrong have done more work on orthogonality. Bostrom/Hanson came up with most of the relevant stuff in multipolar scenarios. Sandberg/EY were involved in the oracle/tool/sovereign distinction.

TDT, which is EY's work, does not show up prominently in Superintelligence. CEV, of course, does, and is EY's work. Lots of ideas in Superintelligence are causally connected to Yudkowsky, but no doubt there is more value from Bostrom there than from Yudkowsky.

Bostrom got $1,500,000 and MIRI, through Benja, got $250,000. This seems justified conditional on what has been produced by FHI and MIRI in the past.

Notice also that CFAR, through Anna, has received resources that will also be very useful to MIRI, since it will make potential MIRI researchers become CFAR alumni.

Bostrom thought of FAI before Eliezer.

To be completely fair, although Nick Bostrom realized the importance of the problem before Eliezer, Eliezer actually did more work on it, and published his work earlier. The earliest publication I can find from Nick on the topic is this short 2003 paper basically just describing the problem, at which time Eliezer had already published Creating Friendly AI 1.0 (which is cited by Nick).

Bostrom thought of FAI before Eliezer.

Do you have the link for that or at least the keywords? I assume Bostrom called it something else.

See this 1998 discussion between Eliezer and Nick. Some relevant quotes from the thread:

Nick: For example, if it is morally preferred that the people who are currently alive get the chance to survive into the postsingularity world, then we would have to take this desideratum into account when deciding when and how hard to push for the singularity.

Eliezer: Not at all! If that is really and truly and objectively the moral thing to do, then we can rely on the Post-Singularity Entities to be bound by the same reasoning. If the reasoning is wrong, the PSEs won't be bound by it. If the PSEs aren't bound by morality, we have a REAL problem, but I don't see any way of finding this out short of trying it.

Nick: Indeed. And this is another point where I seem to disagree with you. I am not at all certain that being superintelligent implies being moral. Certainly there are very intelligent humans that are also very wicked; I don't see why once you pass a certain threshold of intelligence then it is no longer possible to be morally bad. What I might agree with, is that once you are sufficiently intelligent then you should be able to recognize what's good and what's bad. But whether you are motivated to act in accordance with these moral convictions is a different question.

Eliezer: Do you really know all the logical consequences of placing a large value on human survival? Would you care to define "human" for me? Oops! Thanks to your overly rigid definition, you will live for billions and trillions and googolplexes of years, prohibited from uploading, prohibited even from ameliorating your own boredom, endlessly screaming, until the soul burns out of your mind, after which you will continue to scream.

Nick: I think the risk of this happening is pretty slim and it can be made smaller through building smart safeguards into the moral system. For example, rather than rigidly prescribing a certain treatment for humans, we could add a clause allowing for democratic decisions by humans or human descendants to overrule other laws. I bet you could think of some good safety-measures if you put your mind to it.

Nick: How to control a superintelligence? An interesting topic. I hope to write a paper on that during the Christmas holiday. [Unfortunately it looks like this paper was never written?]

I assume Bostrom called it something else.

He used "control", which is apparently still his preferred word for the problem today, as in "AI control".

This is fascinating, thank you! It feels like while Nick is pointing in the right direction and Eliezer in the wrong direction here, this is from a time before either of them had had the insights that bring us to seeing the problem in anything like the way we see it today. Large strides had been made by the time of the publication of CFAI three years later, but as Eliezer tells it in his "coming of age" story, his "naturalistic awakening" didn't come until another couple of years after that.

Also, remember Eliezer was only 20 years old at this time. I am the same age and had just started college then, in '98. Bostrom was 25.

I find this interesting in particular:

For example, rather than rigidly prescribing a certain treatment for humans, we could add a clause allowing for democratic decisions by humans or human descendants to overrule other laws. I bet you could think of some good safety-measures if you put your mind to it.

They could be talking about a new government, rather than an AI.

Eliezer was only 20 years old at this time

Actually 19!

For those who haven't been around as long as Wei Dai…

Eliezer tells the story of coming around to a more Bostromian view, circa 2003, in his coming of age sequence.

In turn Nick, for his part, very regularly and explicitly credits the role that Eliezer's work and discussions with Eliezer have played in his own research and thinking over the course of the FHI's work on AI safety.

I'm disappointed that my group's proposal to work on AI containment wasn't funded, and no other AI containment work was funded, either. Still, some of the things that were funded do look promising. I wrote a bit about what we proposed and the experience of the process here.

When considering possible failure modes for this proposal, one possibility I didn’t consider was that original research portions would look too much like summaries of existing work.

Oh man, that sucks. :(

I am not an expert (not even an amateur) in the area, but I wonder if the AI containment work would be futile without corrigibility figured out, and superfluous once it is? What is the window of AI intelligence where it is not yet super-human (too late to contain), but already too smart to be contained by the standard means?

I feel for you. I agree with salvatier's point in the linked page. Why don't you try to talk to FHI directly? They should be able to get some funding your way.