Sammy Martin. Philosophy and Physics BSc, AI MSc at Edinburgh, starting a PhD at King's College London. Interested in ethics, general philosophy and AI Safety.


A voting theory primer for rationalists
You seem to be comparing Arrow's theorem to Lord Vetinari, implying that both are undisputed sovereigns?

It was a joke about how if you take Arrow's theorem literally, the fairest 'voting method' (at least among ranked voting methods), the only rule which produces a definite transitive preference ranking and which meets the unanimity and independence conditions is 'one man, one vote', i.e. dictatorship.

And frankly, I think that the model used in the paper bears very little relationship to any political reality I know of. I've never seen a group of voters who believe "I would love it if any two of these three laws pass, but I would hate it if all three of them passed or none of them passed" for any set of laws that are seriously proposed and argued-for.

Such a situation doesn't seem all that far-fetched to me - suppose there are three different stimulus bills on offer, and you want some stimulus spending but you also care about rising national debt. You might not care which bills pass, but you still want some stimulus money, but you also don't want all of them to pass because you think the debt would rise too high, so maybe you decide that you just want any 2 out of 3 of them to pass. But I think the methods introduced in that paper might be most useful not to model the outcomes of voting systems, but for attempts to align an AI to multiple people's preferences.

Covid 8/27: The Fall of the CDC
I’m still periodically scared in an existential or civilization-is-collapsing-in-general kind of way, but not in a ‘the economy is about to collapse’ or ‘millions of Americans are about to die’ kind of way. 
I’m not sure whether this is progress.

It definitely is progress. If we were in the latter situation, there would be nothing at all to do except hope you personally don't die, whereas in the former there's a chance for things to get better - if we learn the lesson.

By strange coincidence, it's exactly 6 months since I wrote this, and I think it's important to remember just how dire the subjective future seemed at the end of February - that (subjectively, anyway) could have happened, but didn't.

SDM's Shortform
The tl;dr is that instead of thinking of ethics as a single unified domain where "population ethics" is just a straightforward extension of "normal ethics," you split "ethics" into a bunch of different subcategories:
Preference utilitarianism as an underdetermined but universal morality
"What is my life goal?" as the existentialist question we have to answer for why we get up in the morning
"What's a particularly moral or altruistic thing to do with the future lightcone?" as an optional subquestion of "What is my life goal?" – of interest to people who want to make their life goals particularly altruistically meaningful

This is very interesting - I recall from our earlier conversation that you said you might expect some areas of agreement, just not on axiology:

(I say elements because realism is not all-or-nothing - there could be an objective 'core' to ethics, maybe axiology, and much ethics could be built on top of such a realist core - that even seems like the most natural reading of the evidence, if the evidence is that there is convergence only on a limited subset of questions.)

I also agree with that, except that I think axiology is the one place where I'm most confident that there's no convergence. :)
Maybe my anti-realism is best described as "some moral facts exist (in a weak sense as far as other realist proposals go), but morality is underdetermined."

This may seem like an odd question, but, are you possibly a normative realist, just not a full-fledged moral realist? What I didn't say in that bracket was that 'maybe axiology' wasn't my only guess about what the objective, normative facts at the core of ethics could be.

Following Singer in the expanding circle, I also think that some impartiality rule that leads to preference utilitarianism, maybe analogous to the anonymity rule in social choice, could be one of the normatively correct rules that ethics has to follow, but that if convergence among ethical views doesn't occur the final answer might be underdetermined. This seems to be exactly the same as your view, so maybe we disagree less than it initially seemed.

In my attempted classification (of whether you accept convergence and/or irreducible normativity), I think you'd be somewhere between 1 and 3. I did say that those views might be on a spectrum depending on which areas of Normativity overall you accept, but I didn't consider splitting up ethics into specific subdomains, each of which might have convergence or not:

Depending on which of the arguments you accept, there are four basic options. These are extremes of a spectrum, as while the Normativity argument is all-or-nothing, the Convergence argument can come by degrees for different types of normative claims (epistemic, practical and moral)

Assuming that it is possible to cleanly separate population ethics from 'preference utilitarianism', it is consistent, though quite counterintuitive, to demand reflective coherence in our non-population ethical views but allow whatever we want in population ethics (this would be view 1 for most ethics but view 3 for population ethics).

(This still strikes me as exactly what we'd expect to see halfway to reaching convergence - the weirder and newer subdomain of ethics still has no agreement, while we have reached greater agreement on questions we've been working on for longer.)

It sounds like your contrasting my statement from The Case for SFE ("fit all one’s moral intuitions into an overarching theory based solely on intuitively appealing axioms") with "arbitrarily halting the search for coherence" / giving up on ethics playing a role in decision-making. But those are not the only two options: You can have some universal moral principles, but leave a lot of population ethics underdetermined.

Your case for SFE was intended to defend a view of population ethics - that there is an asymmetry between suffering and happiness. If we've decided that 'population ethics' is to remain undetermined, that is we adopt view 3 for population ethics, what is your argument (that SFE is an intuitively appealing explanation for many of our moral intuitions) meant to achieve? Can't I simply declare that my intuitions say different, and then we have nothing more to discuss, if we already know we're going to leave population ethics undetermined?

Forecasting Thread: AI Timelines

The 'progress will be continuous' argument, to apply to our near future, does depend on my other assumptions - mainly that the breakthroughs on that list are separable, so agentive behaviour and long-term planning won't drop out of a larger GPT by themselves and can't be considered part of just 'improving up language model accuracy'.

We currently have partial progress on human-level language comprehension, a bit on cumulative learning, but near zero on managing mental activity for long term planning, so if we were to suddenly reach human level on long-term planning in the next 5 years, that would probably involve a discontinuity, which I don't think is very likely for the reasons given here.

If language models scale to near-human performance but the other milestones don't fall in the process, and my initial claim is right, that gives us very transformative AI but not AGI. I think that the situation would look something like this:

If GPT-N reaches par-human:

discovering new action sets
managing its own mental activity
(?) cumulative learning
human-like language comprehension
perception and object recognition
efficient search over known facts

So there would be 2 (maybe 3?) breakthroughs remaining. It seems like you think just scaling up a GPT will also resolve those other milestones, rather than just giving us human-like language comprehension. Whereas if I'm right and also those curves do extrapolate, what we would get at the end would be an excellent text generator, but it wouldn't be an agent, wouldn't be capable of long-term planning and couldn't be accurately described as having a utility function over the states of the external world, and I don't see any reason why trivial extensions of GPT would be able to do that either since those seem like problems that are just as hard as human-like language comprehension. GPT seems like it's also making some progress on cumulative learning, though it might need some RL-based help with that, but none at all on managing mental activity for longterm planning or discovering new action sets.

As an additional argument, admittedly from authority - Stuart Russell also clearly sees human-like language comprehension as only one of several really hard and independent problems that need to be solved.

A humanlike GPT-N would certainly be a huge leap into a realm of AI we don't know much about, so we could be surprised and discover that agentive behaviour and having a utility function over states of the external world spontaneously appears in a good enough language model, but that argument has to be made, and you need that argument to hold and GPT to keep scaling for us to reach AGI in the next five years, and I don't see the conjunction of those two as that likely - it seems as though your argument rests solely on whether GPT scales or not, when there's also this other conceptual premise that's much harder to justify.

I'm also not sure if I've seen anyone make the argument that GPT-N will also give us these specific breakthroughs - but if you have reasons that GPT scaling would solve all the remaining barriers to AGI, I'd be interested to hear it. Note that this isn't the same as just pointing out how impressive the results scaling up GPT could be - Gwern's piece here, for example, seems to be arguing for a scenario more like what I've envisaged, where GPT-N ends up a key piece of some future AGI but just provides some of the background 'world model':

Models like GPT-3 suggest that large unsupervised models will be vital components of future DL systems, as they can be ‘plugged into’ systems to immediately provide understanding of the world, humans, natural language, and reasoning.

If GPT does scale, and we get human-like language comprehension in 2025, that will mean we're moving up that list much faster, and in turn suggests that there might not be a large number of additional discoveries required to make the other breakthroughs, which in turn suggests they might also occur within the Deep Learning paradigm, and relatively soon. I think that if this happens, there's a reasonable chance that when we do build an AGI a big part of its internals looks like a GPT, as gwern suggested, but by then we're already long past simply scaling up existing systems.

Alternatively, perhaps you're not including agentive behaviour in your definition of AGI - a par-human text generator for most tasks that isn't capable of discovering new action sets or managing its mental activity is, I think a 'mere' transformative AI and not a genuine AGI.

SDM's Shortform

So to sum up, a very high-level summary of the steps in this method of preference elicitation and aggregation would be:

    1. With a mixture of normative assumptions and multi-channel information (approval and actions) as inputs, use a reward-modelling method to elicit the debiased preferences of many individuals.
      1. Determining whether there actually are significant differences between stated and revealed preferences when performing reward modelling is the first step to using multi-channel information to effectively separate biases from preferences.
    2. Create 'proxy agents' using the reward model developed for each human (this step is where intent-aligned amplification can potentially occur).
    3. Place the proxies in an iterated voting situation which tends to produce sensible convergent results. The use of RL proxies here can be compared to the use of human proxies in liquid democracy.
      1. Which voting mechanisms tend to work in iterated situations with RL agents can be determined in other experiments (probably with purely artificial agents)
    4. Run the voting mechanism until an unambiguous winner is decided, using methods like those given in this paper.

This seems like a reasonable procedure for extending a method that is aligned to one human's preferences (step 1,2) to produce sensible results when trying to align to an aggregate of human preferences (step 3,4). It reduces reliance on the specific features of one voting method, Other than the insight that multiple channels of information might help, all the standard unsolved problems with preference learning from one human remain.

Even though we can't yet align an AGI to one human's preferences, trying to think about how to aggregate human preferences in a way that is scalable isn't premature, as has sometimes been claimed.

In many 'non-ambitious' hypothetical settings where we aren't trying to build an AGI sovereign over the whole world (for example, designing a powerful AI to govern the operations of a hospital), we still need to be able to aggregate preferences sensibly and stably. This method would do well at such intermediate scales, as it doesn't approach the question of preference aggregation from a 'final' ambitious value-learning perspective but instead tries to look at preference aggregation the same way we look at elicitation, with an RL-based iterative approach to reaching a result.

However, if you did want to use such a method to try and produce the fabled 'final utility function of all humanity', it might not give you Humanity's CEV, since some normative assumptions (preferences count equally and in the way given by the voting mechanism), are built in. By analogy with CEV, I called the idealized result of this method a coherent extrapolated framework (CEF). This is a more normatively direct method of aggregating values than CEV, (since you fix a particular method of aggregating preferences in advance), as it extrapolates from a voting framework rather than extrapolating based on our volition, more broadly (and vaguely) defined, hence CEF.

A voting theory primer for rationalists
Kenneth Arrow, proved that the problem that Condorcet (and Llull) had seen was in fact a fundamental issue with any ranked voting method. He posed 3 basic "fairness criteria" and showed that no ranked method can meet all of them:
Ranked unanimity, Independence of irrelevant alternatives, Non-dictatorial

I've been reading up on voting theory recently and Arrow's result - that the only voting system which produces a definite transitive preference ranking, that will pick the unanimous answer if one exists, and doesn't change depending on irrelevant alternatives - is 'one man, one vote'.

“Ankh-Morpork had dallied with many forms of government and had ended up with that form of democracy known as One Man, One Vote. The Patrician was the Man; he had the Vote.”

In my opinion, aside from the utilitarian perspective offered by VSE, the key to evaluating voting methods is an understanding of strategic voting; this is what I'd call the "mechanism design" perspective. I'd say that there are 5 common "anti-patterns" that voting methods can fall into; either where voting strategy can lead to pathological results, or vice versa.

One recent extension to these statistical approaches is to use RL agents in iterated voting and examine their convergence behaviour. The idea is that we embrace the inevitable impossibility results (such as Arrow and GS theorems) and consider agents' ability to vote strategically as an opportunity to reach stable outcomes. This paper uses very simple Q-learning agents with a few different policies - epsilon-greedy, greedy and upper confidence bound, in an iterated voting game, and gets behaviour that seems sensible. Many thousands of rounds of iterated voting isn't practical for real-world elections, but for preference elicitation in other contexts (such as value learning) it might be useful as a way to try and estimate people's underlying utilities as accurately as possible.

Open & Welcome Thread - August 2020

A first actually credible claim of coronavirus reinfection? Potentially good news as the patient was asymptomatic and rapidly produced a strong antibody response.

Forecasting Thread: AI Timelines
Answer by SDMAug 23, 202026Ω9

Here's my answer. I'm pretty uncertain compared to some of the others!

AI Forecast

First, I'm assuming that by AGI we mean an agent-like entity that can do the things associated with general intelligence, including things like planning towards a goal and carrying that out. If we end up in a CAIS-like world where there is some AI service or other that can do most economically useful tasks, but nothing with very broad competence, I count that as never developing AGI.

I've been impressed with GPT-3, and could imagine it or something like it scaling to produce near-human level responses to language prompts in a few years, especially with RL-based extensions.

But, following the list (below) of missing capabilities by Stuart Russell, I still think things like long-term planning would elude GPT-N, so it wouldn't be agentive general intelligence. Even though you might get those behaviours with trivial extensions of GPT-N, I don't think it's very likely.

That's why I think AGI before 2025 is very unlikely (not enough time for anything except scaling up of existing methods). This is also because I tend to expect progress to be continuous, though potentially quite fast, and going from current AI to AGI in less than 5 years requires a very sharp discontinuity.

AGI before 2035 or so happens if systems quite a lot like current deep learning can do the job, but which aren't just trivial extensions of them - this seems reasonable to me on the inside view - e.g. it takes us less than 15 years to take GPT-N and add layers on top of it that handle things like planning and discovering new actions. This is probably my 'inside view' answer.

I put a lot of weight on a tail peaking around 2050 because of how quickly we've advanced up this 'list of breakthroughs needed for general intelligence' -

There is this list of remaining capabilities needed for AGI in an older post I wrote, with the capabilities of 'GPT-6' as I see them underlined:

Stuart Russell’s List

human-like language comprehension

cumulative learning

discovering new action sets

managing its own mental activity

For reference, I’ve included two capabilities we already have that I imagine being on a similar list in 1960

perception and object recognition

efficient search over known facts

So we'd have discovering new action sets, and managing mental activity - effectively, the things that facilitate long-range complex planning, remaining.

So (very oversimplified) if around the 1980s we had efficient search algorithms, by 2015 we had image recognition (basic perception) and by 2025 we have language comprehension courtesy of GPT-8, that leaves cumulative learning (which could be obtained by advanced RL?), then discovering new action sets and managing mental activity (no idea). It feels a bit odd that we'd breeze past all the remaining milestones in one decade after it took ~6 to get to where we are now. Say progress has sped up to be twice as fast, then it's 3 more decades to go. Add to this the economic evidence from things like Modelling the Human Trajectory, which suggests a roughly similar time period of around 2050.

Finally, I think it's unlikely but not impossible that we never build AGI and instead go for tool AI or CAIS, most likely because we've misunderstood the incentives such that it isn't actually economical or agentive behaviour doesn't arise easily. Then there's the small (few percent) chance of catastrophic or existential disaster which wrecks our ability to invent things. This is the one I'm most unsure about - I put 15% for both but it may well be higher.

SDM's Shortform

I don't think that excuse works in this case - I didn't give it a 'long-winded frame', just that brief sentence at the start, and then the list of scenarios, and even though I reran it a couple of times on each to check, the 'cranberry/grape juice kills you' outcome never arose.

So, perhaps they switched directly from no prompt to an incredibly long-winded and specific prompt without checking what was actually necessary for a good answer? I'll point out didn't really attempt any sophisticated prompt programming either - that was literally the first sentence I thought of!

Load More