Co-founder of and researcher at Convergence. Convergence does foundational existential risk strategy research. See here for our growing list of publications.

Past: R&D Project Manager, Software Engineer.

Wiki Contributions


The case for aligning narrowly superhuman models

The amount of effort going into AI as a whole ($10s of billions per year) is currently ~2 orders of magnitude larger than the amount of effort going into the kind of empirical alignment I’m proposing here, and at least in the short-term (given excitement about scaling), I expect it to grow faster than investment into the alignment work.

There's a reasonable argument (shoutout to Justin Shovelain) that work such as this, done by AI alignment people, will be closer to AGI than standard commercial or academic research, and will therefore accelerate AGI more than average AI research would. Thus, $10s of billions per year into general AI is not quite the right comparison, because little of that money goes to work "close to AGI".

That said, on balance, I'm personally in favor of the work this post outlines.

Anti-Aging: State of the Art

Unfortunately, there is no good 'where to start' guide for anti-aging. This is insane, given that this field is looking for solutions to the biggest killer on Earth today.

Low-hanging fruit intervention: create a public guide to that effect on a website.

Is this viable physics?

That being said, I would bet that one would be able to find other formalisms that are equivalent after kicking down the door...

At least, we've now hit one limit in the shape of universal computation: No new formalism will be able to do something that couldn't be done with computers. (Unless we're gravely missing something about what's going on in the universe...)

Good and bad ways to think about downside risks

When it comes to downside risk, there are often more unknown unknowns that produce harm than positive unknown unknowns. People are usually biased to overestimate the positive effects and underestimate the negative effects of the known unknowns.

This seems plausible to me. Would you like to expand on why you think this is the case?

The asymmetry between creation and destruction? (I.e., it's harder to build than it is to destroy.)

Good and bad ways to think about downside risks

Very good point! The effect of not taking an action depends on what the counterfactual is: what would happen otherwise/anyway. Maybe the article should note this.

mind viruses about body viruses

Excellent comment, thank you! Don't let the perfect be the enemy of the good if you're running from an exponential growth curve.

The recent NeurIPS call for papers requires authors to include a statement about the potential broader impact of their work

Looks promising to me. Technological development isn't by default good.

Though I agree with the other commenters that this could fail in various ways. For one thing, if a policy like this is introduced without guidance on how to analyze the societal implications, people will think of wildly different things. ML researchers aren't by default going to have the training to analyze societal consequences. (Well, who does? We should develop better tools here.)

Jan Bloch's Impossible War

Or, at least, include a paragraph or a few to summarize it!

A point of clarification on infohazard terminology

Some quick musings on alternatives for the "self-affecting" info hazard type:

  • Personal hazard
  • Self info hazard
  • Self hazard
  • Self-harming hazard

AI alignment concepts: philosophical breakers, stoppers, and distorters

I wrote this comment to an earlier version of Justin's article:

It seems to me that most of the 'philosophical' problems are going to get solved as a matter of solving practical problems in building useful AI. You could call the ML systems and AI being developed now 'empirical'. From the perspective of the people building current systems, they likely don't consider what they're doing to be solving philosophical problems. Symbol grounding problem? Well, an image classifier built on a convolutional neural network learns to get quite proficient at grounding classes like 'cars' and 'dogs' (symbols) in real physical scenes.

So the observation I want to make is that the philosophical problems we can think of that might trip up a system are likely to turn out to look like technical/research/practical problems that need to be solved by default, for practical reasons, in order to make useful systems.

The image classification problem wasn't solved in one day, but it was solved using technical skills, engineering skills, more powerful hardware, and more data. People didn't spend decades discussing philosophy: the problem was solved through advances in neural network design and more powerful computers.

Of course, image classification doesn't solve the symbol grounding problem in full. But other aspects of symbol grounding that people might find mystifying are getting solved piecewise, as researchers and engineers solve practical problems of AI.
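To make the "grounding as classification" point concrete, here is a minimal toy sketch. A nearest-centroid classifier over synthetic pixel arrays stands in for a real CNN (the class means, noise levels, and label names are all invented for illustration); the point is just that a trained model maps raw sensory input to symbolic labels.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_images(mean, n=50, size=16):
    # Synthetic "images": noisy pixel arrays whose brightness encodes the class.
    return rng.normal(mean, 0.1, size=(n, size, size))

# The symbols to be grounded, and some training scenes for each.
train = {"dog": make_images(0.3), "car": make_images(0.7)}

# "Training": compute one centroid (mean image) per symbol.
centroids = {sym: imgs.mean(axis=0) for sym, imgs in train.items()}

def ground(image):
    # Map a raw pixel array to the symbol whose centroid is closest.
    return min(centroids, key=lambda s: np.linalg.norm(image - centroids[s]))

test_dog = make_images(0.3, n=1)[0]
test_car = make_images(0.7, n=1)[0]
print(ground(test_dog))  # prints dog
print(ground(test_car))  # prints car
```

The "grounding" here is nothing mysterious: it falls out of ordinary optimization against data, which is the practical sense in which engineers address the problem without ever framing it philosophically.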

Let's look at a classic problem formulation from MIRI, 'Ontology Identification':

Technical problem (Ontology Identification). Given goals specified in some ontology and a world model, how can the ontology of the goals be identified in the world model? What types of world models are amenable to ontology identification? For a discussion, see Soares (2015).

When you create a system that performs any function in the real world, you are in some sense giving it goals. Reinforcement-learning-trained systems pursue 'goals'. An autonomous car takes you from a chosen point A to a chosen point B; it has the overall goal of transporting people. The ontology identification problem is getting solved piecewise as a practical matter. Perhaps the MIRI-style theory could give us a deeper understanding that helps us avoid some pitfalls, but it's not clear why these wouldn't be caught as practical problems.

What would a real philosophical landmine look like? It would be a class of philosophical problems that wouldn't get solved as a practical matter and that poses a risk of harm to humanity.
