Marcus Ogren


(Disclosure: Vanessa is my wife.)

I want to share my thoughts on how the LTA can have a large impact. I think the main plan - to understand agency and intelligence fully enough to construct a provably aligned AI (perhaps modulo a few reasonable assumptions about the real world) - is a good plan. It’s how a competent civilization would go about solving the alignment problem, and a non-negligible chunk of the expected impact of the LTA comes from it working about as planned. But there are also plenty of other, less glamorous ways for it to make a big difference.

The LTA gives us tools to think about AI better. Even without the greater edifice of LTA and without a concentrated effort to complete the LTA and build an aligned AI in accordance with it, it can yield insights that help other alignment researchers. The LTA can identify possible problems that need to be solved and currently-unknown pitfalls that could make an AI unsafe (along the lines of the problem of privilege and acausal attack). It can also produce “tools” for solving certain aspects of the alignment problem that could be applied in an ad-hoc manner (such as the individual components of PSI). While this is decidedly inferior to creating a provably aligned AI, it is also far more likely to happen.

As for PSI, I think it’s a promising plan for creating an aligned AI in and of itself; it doesn’t appear to require greatly reduced capabilities and gives the AI an unhackable pointer to human values. But its main significance is as a proof of concept: the LTA has delivered this alignment proposal, and the LTA isn’t even close to being finished. My best guess is that, given enough time, some variant of PSI could be created as a provably aligned AI (modulo assumptions about human psychology, etc.). But I also expect better ideas in the future. PSI demonstrates that considering the fundamental questions of agency can lead to novel and elegant solutions to the alignment problem. Before Vanessa came up with PSI, I thought the main value of her research lay in solving weird-sounding problems (like acausal attack) that originally sounded more like an AI being stupid than like the AI being misaligned. PSI shows that the LTA is much, much more than this.

The two main proposals are sequential proportional approval voting (SPAV) and proportional approval voting (PAV).

SPAV proceeds in rounds. In the first round, the candidate approved on the most ballots wins. In the second round, the ballots are reweighted so that those which approved the first winner count at 1/2 weight and all others retain full weight. This is repeated until every seat is filled; in each round, a ballot that approved n candidates who have already been elected is weighted at 1/(n + 1).
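To make the rounds concrete, here is a minimal Python sketch of the procedure described above. The function name `spav`, the representation of ballots as sets of approved candidates, and the alphabetical tie-break are all my own assumptions for illustration; real implementations specify tie-breaking differently.

```python
from fractions import Fraction

def spav(ballots, seats):
    """Sequential proportional approval voting (illustrative sketch).

    ballots: list of sets, each holding the candidates that ballot approves.
    seats:   number of winners to elect.
    Returns the winners in the order they were elected.
    """
    candidates = set().union(*ballots)
    winners = []
    while len(winners) < seats and len(winners) < len(candidates):
        # Every not-yet-elected candidate starts the round at zero.
        scores = {c: Fraction(0) for c in candidates if c not in winners}
        for ballot in ballots:
            # n = number of already-elected candidates this ballot approved.
            n = len(ballot.intersection(winners))
            weight = Fraction(1, n + 1)
            for c in ballot:
                if c in scores:
                    scores[c] += weight
        # Assumption: ties go to the alphabetically first candidate.
        winners.append(min(scores, key=lambda c: (-scores[c], c)))
    return winners

# 6 ballots approve A and B, 1 approves only A, 4 approve only C.
ballots = [{"A", "B"}] * 6 + [{"A"}] + [{"C"}] * 4
print(spav(ballots, 2))  # ['A', 'C']
```

In this example A wins the first round with 7 approvals. The six {A, B} ballots then drop to 1/2 weight, so B scores 3 against C's 4 and C takes the second seat, even though B has more raw approvals than C.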

For an election to fill N seats, PAV looks at each possible set of N candidates and elects the set which maximizes the utility function given by 1*(# of ballots with at least one of the candidates selected) + 1/2*(# of ballots with at least two of the candidates selected) + ... + 1/N*(# of ballots with at least N of the candidates selected).
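The utility function above can also be computed ballot by ballot: a ballot that approves k members of a candidate set contributes 1 + 1/2 + ... + 1/k, which sums to the same total. Here is a brute-force Python sketch along those lines. The names `pav_score` and `pav`, the set-based ballot format, and the tie-break (first maximizing set in alphabetical order) are assumptions of mine, and the exhaustive search is only practical for small elections.

```python
from fractions import Fraction
from itertools import combinations

def pav_score(ballots, committee):
    """PAV utility of a candidate set (illustrative sketch).

    A ballot approving k members of the committee adds 1 + 1/2 + ... + 1/k.
    """
    total = Fraction(0)
    for ballot in ballots:
        k = len(ballot & committee)
        total += sum(Fraction(1, j) for j in range(1, k + 1))
    return total

def pav(ballots, seats):
    """Return the set of `seats` candidates with the highest PAV utility.

    Assumes there are at least `seats` distinct candidates on the ballots.
    Ties go to the first maximizing set in alphabetical order (an assumption).
    """
    candidates = sorted(set().union(*ballots))
    best = max(combinations(candidates, seats),
               key=lambda c: pav_score(ballots, set(c)))
    return set(best)

# 6 ballots approve A and B, 1 approves only A, 4 approve only C.
ballots = [{"A", "B"}] * 6 + [{"A"}] + [{"C"}] * 4
print(pav_score(ballots, {"A", "B"}))  # 10
print(pav_score(ballots, {"A", "C"}))  # 11
print(pav(ballots, 2) == {"A", "C"})   # True
```

Here {A, B} scores 6 * (1 + 1/2) + 1 = 10, while {A, C} scores 7 + 4 = 11, so PAV elects A and C.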

Both PAV and SPAV yield proportional representation while using approval-style ballots. PAV yields better results, but SPAV's outcomes are easier to explain. Another option is to use the allocated score algorithm on approval ballots.