Maxime Riché


Safe exploration and corrigibility

One particularly interesting recent work in this domain was Leike et al.'s “Learning human objectives by evaluating hypothetical behaviours,” which used human feedback on hypothetical trajectories to learn how to avoid environmental traps. In the context of the capability exploration/objective exploration dichotomy, I think a lot of this work can be viewed as putting a damper on instrumental capability exploration.

Isn't this work also linked to objective exploration? One of the four types of "hypothetical behaviours" used is the selection of trajectories that maximize reward uncertainty; these trajectories are then evaluated by humans.

Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe?
Is your suggestion to run this system as a source of value, simulating lives for their own sake rather than to improve the quality of life of sentient beings in our universe? Our history (and present) aren't exactly utopian, and I don't see any real reason to believe that slight variations on it would lead to anything happier.

I am wondering whether we should reasonably expect to produce better results by trying to align an AGI with our values than by simulating a lot of alternate universes. I am not saying that this is net-negative or net-positive. It seems to me that the expected value of the two cases may be identical.

Also, by "history" I meant the future as well, not only the past and present. (I edited the question to replace "histories" with "trajectories".)

Can we expect more value from AI alignment than from an ASI with the goal of running alternate trajectories of our universe?

(About the first part of your comment) Thank you for pointing to three confused points:

First, I don't know if you intended this, but "simulating the universe" carries a connotation of a low-level physics simulation. This is computationally impossible. Let's have it model the universe instead, using the same kind of high-level pattern recognition that people use to predict the future.

To be more precise, what I had in mind is that the ASI is an agent whose goal is:

  • to model the sentient part of the universe finely enough that an instance of its model produces sentience (it will also need to model the necessary non-sentient "dependencies"),
  • and to instantiate this model N times, for example playing each instance from 1000 A.D. until no sentience remains in that instance of the modeled universe (all of this efficiently).

(To reduce complexity I didn't mention it, but we could devise heuristics to avoid playing too much of the "past" and "future" history filled with suffering.)

Second, if the AGI is simulating itself, the predictions are wildly underdetermined; it can predict that it will do X, and then fulfill its own prophecy by actually doing X, for any X. Let's have it model a counterfactual world with no AGIs in it.

An instance of the modeled universe would not be our present universe. It would be "another seed", starting before the ASI exists; thus the ASI would not need to model itself, only the possible ("new") ASIs produced inside the instances.

Third, you need some kind of interface. Maybe you type in "I'm interested in future scenarios in which somebody cures Alzheimer's and writes a scientific article describing what they did. What is the text of that article?" and then it runs through a bunch of scenarios and prints out its best-guess article in the first 50 scenarios it can find. (Maybe also print out a retrospective article from 20 years later about the long-term repercussions of the invention.) For a different type of interface, see microscope AI.

In the scenario I had in mind, the ASI would fill our universe with computing machines to produce as many instances as possible. (We would not use it, and thus would not need an interface to the ASI.)

Covid 7/9: Lies, Damn Lies and Death Rates
Explanations 1+5: We are doing a better job treating people who get infected.
Explanations 2+3+6: Different people are getting infected who are less vulnerable.
Explanation 4: We are increasingly covering up deaths.

I did not read everything... but between the 1st and 2nd wave, there are currently ~5x fewer deaths but ~2x more daily cases. Could this also be explained by many more tests being done?

Then the first wave would have been ~10x higher than reported in the comparison, and the second wave would currently still be below the first.
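The arithmetic behind this can be sketched with illustrative numbers (hypothetical values, not real data), assuming deaths track true infections with a roughly constant infection-fatality rate and a fixed lag:

```python
# Illustrative numbers only, chosen to match the ratios in the comment:
# ~2x more reported daily cases in wave 2, ~5x fewer deaths.
reported_cases_w1 = 100   # hypothetical reported daily cases, wave 1
reported_cases_w2 = 200   # ~2x more reported cases in wave 2
deaths_w1 = 50            # hypothetical daily deaths, wave 1
deaths_w2 = 10            # ~5x fewer deaths in wave 2

# With a constant infection-fatality rate, true infections scale with deaths.
true_infection_ratio = deaths_w2 / deaths_w1  # wave 2 / wave 1 = 0.2

# Implied improvement in case detection between the two waves:
detection_improvement = (reported_cases_w2 / reported_cases_w1) / true_infection_ratio
print(detection_improvement)  # 10.0 -> wave 1 was ~10x undercounted

# Rescaling wave 1's reported cases to wave-2 detection levels:
adjusted_cases_w1 = reported_cases_w1 * detection_improvement
print(reported_cases_w2 < adjusted_cases_w1)  # True -> wave 2 still below wave 1
```

So if testing alone explains the gap, detection would have had to improve ~10x, and the second wave's true size would still be ~5x smaller than the first's.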

Open & Welcome Thread - February 2020

Offering 100-300h of technical work on an AI Safety project

I am a deep learning engineer (2 years of experience); I currently develop vision models to be used on satellite images, plus some software engineering around that (LinkedIn profile). In my spare time, I am organizing an EA local group in Toulouse (France), learning RL, doing a research project on RL for computer vision (only expecting indirect utility from this), and developing an EAA tool (EffectiveAnimalAdvocacy). I have been in the French EA community for 4 years.

In 2020, I chose to work part time so I can dedicate 2 to 3 days of work per week to EA-aligned projects. Thus, for the next 8 months, I have ~10h/week that I want to dedicate to assisting an AI safety project. For myself, I am not looking for funds, nor to publish a paper or a blog post.

To me the ideal project would be:

  • a relevant technical AI safety project (research or not). I am looking for advice on the "relevant" part.
  • where I could help the project achieve better results than it would without my contribution (e.g. by writing better code, running more experiments, testing other designs)
  • where I can learn more about technical AI safety
  • where my contribution would include writing code. If it is a research proposal, I would implement the experiments. If the project currently has no experimental part, I could take charge of creating one.