Why not use a subset of the human brain as the benchmark for general intelligence? E.g. linguistic cortex + prefrontal cortex + hippocampus, or the whole cerebral cortex? There's a lot we don't need for general intelligence.

GPT-4 is supposed to have 500x as many parameters as GPT-3. If you use such a subset of the human brain as the benchmark, would GPT-4 match it in optimization power? Do you think GPT-4 will be an AGI?
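A back-of-the-envelope sketch of the comparison. It assumes GPT-3's published figure of 175 billion parameters, takes the rumoured 500x multiplier at face value, and treats one parameter as roughly comparable to one synapse, which is itself a contestable assumption. The synapse count for the chosen brain subset is a hypothetical input, since estimates vary widely:

```python
# Rough sketch: compare a (rumoured) GPT-4 parameter count with a synapse
# count for some chosen brain subset. Treating parameters and synapses as
# comparable units is an assumption, not an established fact.

GPT3_PARAMS = 175_000_000_000  # GPT-3's published parameter count
RUMOURED_MULTIPLIER = 500      # the "500x" figure, taken at face value


def implied_gpt4_params(base: int = GPT3_PARAMS, multiplier: int = RUMOURED_MULTIPLIER) -> int:
    """Parameter count implied by scaling GPT-3 by the rumoured multiplier."""
    return base * multiplier


def ratio_to_benchmark(params: int, benchmark_synapses: float) -> float:
    """Parameters per synapse of the chosen benchmark (e.g. cerebral cortex only)."""
    if benchmark_synapses <= 0:
        raise ValueError("benchmark_synapses must be positive")
    return params / benchmark_synapses


if __name__ == "__main__":
    params = implied_gpt4_params()
    print(f"Implied GPT-4 parameters: {params:,}")  # 87,500,000,000,000
    # Hypothetical input: substitute your preferred estimate for the subset.
    assumed_subset_synapses = 1e14
    print(f"Ratio to assumed benchmark: {ratio_to_benchmark(params, assumed_subset_synapses):.3f}")
```

Parameter count is of course only a crude proxy for optimization power; the sketch just makes the size comparison explicit.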

I think the more important takeaway is that the (countable) sure thing principle and transitivity together rule out preferences allowing St. Petersburg-like lotteries, and so "unbounded" preferences.
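A small numerical illustration of why unbounded utility admits St. Petersburg-like lotteries while bounded utility does not. It assumes the textbook lottery paying 2^n with probability 2^-n, and uses u(x) = 1 - 1/x purely as an example of a bounded utility function; neither choice comes from the linked paper:

```python
from fractions import Fraction


def expected_value_partial(n_terms: int) -> Fraction:
    """Expected payoff over the first n_terms outcomes of the lottery paying
    2**n with probability 2**-n. Every term contributes exactly 1, so this grows
    without bound as n_terms increases."""
    return sum((Fraction(1, 2**n) * 2**n for n in range(1, n_terms + 1)), Fraction(0))


def bounded_utility(x: Fraction) -> Fraction:
    """Example bounded utility for x >= 1: increases towards 1 but never reaches it."""
    return 1 - 1 / x


def expected_utility_partial(n_terms: int) -> Fraction:
    """Expected bounded utility over the first n_terms outcomes.
    The full series converges (to 2/3 for this choice of utility)."""
    return sum(
        (Fraction(1, 2**n) * bounded_utility(Fraction(2**n)) for n in range(1, n_terms + 1)),
        Fraction(0),
    )


for n in (10, 20, 40):
    print(n, expected_value_partial(n), float(expected_utility_partial(n)))
```

The expected payoff column keeps growing with the number of outcomes considered, while the expected bounded utility settles near 2/3, so no lottery of this shape can be infinitely better than every sure outcome.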

I recommend https://onlinelibrary.wiley.com/doi/abs/10.1111/phpr.12704

It discusses further ways in which preferences allowing St. Petersburg-like lotteries seem irrational, such as choosing dominated strategies, dynamic inconsistency, and paying to avoid information. Furthermore, the authors argue that the arguments supporting the finite Sure Thing Principle generally also support the Countable Sure Thing Principle, because they don't depend on the number of possible outcomes of a lottery being finite. So, if you reject the Countable Sure Thing Principle, you should probably reject the finite one, too, and if you accept St. Petersburg-like lotteries, you must, in principle, accept behaviour that seems irrational.

They also give a general vNM-like representation theorem: dropping the Archimedean/continuity axiom and replacing the Independence axiom with Countable Independence, together with transitivity and completeness, they get utility functions with values in lexicographically ordered, ordinal-indexed sequences of bounded real utilities. (They say the sequences can be indexed by any ordinal, but that seems wrong to me, since I'd think infinitely long lexicographically ordered sequences would get you St. Petersburg-like lotteries and violate Limitedness, but maybe I'm misunderstanding. EDIT: I think they meant you can have an infinite sequence of dominated components, not an infinite sequence of dominating components: you check the most important component first, then the second, and so on, possibly for infinitely many. Well-orderedness ensures there's always a next one to check.)
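One way to picture the representation: each lottery gets a sequence of bounded real utilities, and two sequences are compared by checking the most important component first, moving on to the next only on a tie. A minimal sketch, assuming finite sequences of equal length (the theorem allows longer well-ordered index sets, which a tuple can't represent) and made-up component values:

```python
from typing import Sequence


def lex_compare(a: Sequence[float], b: Sequence[float]) -> int:
    """Compare two equal-length utility sequences lexicographically.

    Index 0 is the most important component. Returns 1 if a is preferred,
    -1 if b is preferred, and 0 if they are indifferent."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
        # Tie on this component: move on to the next, less important one.
    return 0


# Hypothetical lotteries: each component is a bounded real utility in [0, 1].
safe = (0.5, 0.9)
risky = (0.5, 0.2)
better_top = (0.6, 0.0)

assert lex_compare(safe, risky) == 1       # tie on top, second component decides
assert lex_compare(better_top, safe) == 1  # top component dominates everything below
assert lex_compare(safe, safe) == 0
```

No amount of a lower component can make up for a deficit in a higher one, which is exactly the kind of preference that dropping the Archimedean/continuity axiom permits.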

Thinking about digital signatures: the AGI would need to neither know its private key nor be able to share access to the channel it signs with. I think this would require a trusted, hard-to-manipulate third party to verify or provide proof, e.g. a digital signature on the model, or read-only access to where the model is held, plus its history. I suppose this could just be the server the AGI is running on, if that server is run by such a third party, but it might not be.
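A minimal sketch of the verification step, assuming a trusted third party has published a SHA-256 digest of the model weights and that the digest reached you over an authenticated channel. This is a plain integrity check rather than a full digital signature scheme, and the byte strings are hypothetical stand-ins for real weight files:

```python
import hashlib
import hmac


def digest_of(weights: bytes) -> str:
    """SHA-256 hex digest of serialized model weights."""
    return hashlib.sha256(weights).hexdigest()


def is_true_copy(received_weights: bytes, published_digest: str) -> bool:
    """True if the received weights match the digest the third party published.

    hmac.compare_digest does a constant-time comparison of the two hex strings."""
    return hmac.compare_digest(digest_of(received_weights), published_digest)


# Hypothetical data standing in for serialized weights.
original = b"model-weights-v1"
published = digest_of(original)  # what the trusted third party would publish

print(is_true_copy(original, published))                      # True
print(is_true_copy(b"model-weights-v1-tampered", published))  # False
```

This only moves the trust problem: it shows the copy matches what the third party hashed, not that what was hashed is the model actually running.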

One reason for optimism about AGI conflict is that AGIs may be much better at credible commitment and disclosure of private information. For example, AGIs could make copies of themselves and let their counterparts inspect these copies until they are satisfied that they understand what kinds of commitments their counterpart has in place.

Basic question: how would you verify that you got a true copy, and not a fake?

Minor readability suggestion to reduce the number of negations. Change:

Conflict reduction won’t make a difference if the following conditions don’t hold: (a) AGIs won’t always avoid conflict, despite it being materially costly and (b) intent alignment is either insufficient or unnecessary for conflict reduction work to make a difference.

to

Conflict reduction will make a difference only if the following conditions both hold: (a) AGIs won’t always avoid conflict, despite it being materially costly and (b) intent alignment is either insufficient or unnecessary for conflict reduction work to make a difference.

or

If conflict reduction will make a difference, then the following conditions both hold: (a) AGIs won’t always avoid conflict, despite it being materially costly and (b) intent alignment is either insufficient or unnecessary for conflict reduction work to make a difference.
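Both rewrites have the form D → (A and B), and they match the original sentence under the reading "if the conditions don't both hold, then no difference". A quick truth-table check, where D ("conflict reduction makes a difference"), A and B are just labels for this check; under the alternative reading "if neither condition holds", the equivalence fails:

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p -> q."""
    return (not p) or q


def original_reading(d: bool, a: bool, b: bool) -> bool:
    # "Won't make a difference if the conditions don't (both) hold": not (A and B) -> not D
    return implies(not (a and b), not d)


def neither_reading(d: bool, a: bool, b: bool) -> bool:
    # Alternative reading, "if neither condition holds": (not A and not B) -> not D
    return implies((not a) and (not b), not d)


def only_if_reading(d: bool, a: bool, b: bool) -> bool:
    # Both suggested rewrites: D -> (A and B)
    return implies(d, a and b)


def agree(f, g) -> bool:
    """True if f and g give the same truth value on all 8 assignments."""
    return all(f(d, a, b) == g(d, a, b) for d, a, b in product([False, True], repeat=3))


print(agree(original_reading, only_if_reading))  # True
print(agree(neither_reading, only_if_reading))   # False
```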

Fair. I've struck out the "fairly weak". I think this is true of the vNM axioms, too. Still, to me, "completely and extremely physically impossible" usually just means very, very low probability, not probability 0. We could be wrong about physics; see also Cromwell's rule. So if you want your theory to cover everything that is extremely unlikely but not totally ruled out (i.e. not assigned probability 0), it really needs to cover a lot. There may be some things you can reasonably assign probability 0 (other than individual events drawn from a continuum, say), or some probability assignments you aren't forced to consider (they are your subjective probabilities, after all), so Savage's axioms could be stronger than necessary.
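Cromwell's rule in miniature: under Bayes' theorem, a hypothesis assigned prior probability 0 stays at 0 no matter how strong the evidence, while a merely tiny prior can still be revised upward. The likelihood values below are made up for illustration:

```python
def posterior(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """Bayes' theorem for a binary hypothesis H given evidence E:
    P(H|E) = P(E|H) P(H) / (P(E|H) P(H) + P(E|not H) P(not H))."""
    numerator = likelihood_if_true * prior
    denominator = numerator + likelihood_if_false * (1 - prior)
    if denominator == 0:
        raise ValueError("evidence has probability 0 under both hypotheses")
    return numerator / denominator


def update_repeatedly(prior: float, likelihood_if_true: float,
                      likelihood_if_false: float, times: int) -> float:
    """Apply the same piece of (independent) evidence several times."""
    p = prior
    for _ in range(times):
        p = posterior(p, likelihood_if_true, likelihood_if_false)
    return p


# Hypothetical evidence a million times likelier if H is true.
print(update_repeatedly(0.0, 1.0, 1e-6, 5))   # 0.0: a zero prior never moves
print(update_repeatedly(1e-9, 1.0, 1e-6, 5))  # close to 1
```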

I don't think it's reasonable to rule out all possible realizations of Christiano's St. Petersburg lotteries, though. You could still ignore these possibilities, and I think this is basically okay, too, but it seems hard to come up with a satisfactory principled reason to do so, so I'd guess it's incompatible with normative realism about decision theory (which I doubt, anyway).

I don't think it's a mistake to focus on animal suffering over human suffering (if we’re only comparing these two), since it seems likely we can reduce animal suffering more cost-effectively, and possibly much more cost-effectively, depending on your values. See:





"If you want to help the environment or animals, the only plausible way to do so is to help align AI with your values (including your value of the environment and animals). We're at a super weird crux point where everything channels through that."

We can still prevent suffering up until AGI arrives; AGI might not come for decades; and even after it comes, if we don't go extinct (which would very plausibly come with the end of animal suffering!), there could still be popular resistance to helping, or to not harming, animals. You might say influencing AI values is the most cost-effective way to help animals, and this is plausible, but not obvious. Some people, like Sentience Institute, are looking at moral circle expansion as a way to improve the far future, but mostly for artificial sentience.

You could just drop MATH and make a bet at different odds on the remaining items.

I agree that they probably would have missed their chance to catch up with the frontier of your expansion.

Maybe an electromagnetic radiation-based assault could reach you if targeted (the speed of light in a vacuum is constant relative to you, even if you're traveling in the same direction), although it would be unlikely to reach much of the frontier of your expansion, and there are plausibly effective defenses, too.
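The parenthetical follows from relativistic velocity addition: in a frame moving at speed v along the same line, something moving at u in the original frame has speed (u - v) / (1 - uv/c^2), which equals c when u = c. A minimal sketch in units where c = 1, with arbitrarily chosen speeds; the catch-up helper is a hypothetical illustration of why a pulse aimed at a receding target still arrives in finite time:

```python
C = 1.0  # work in units where the speed of light is 1


def speed_in_moving_frame(u: float, v: float) -> float:
    """Relativistic velocity transformation along one axis: the speed of an
    object moving at u, as seen from a frame moving at v (|v| < C)."""
    return (u - v) / (1 - u * v / C**2)


def catch_up_time(gap: float, v: float) -> float:
    """Time, in the emitter's frame, for a light pulse to close an initial
    gap to a target receding at constant speed v < C."""
    return gap / (C - v)


print(speed_in_moving_frame(C, 0.9))    # light still moves at c in your frame
print(speed_in_moving_frame(0.5, 0.9))  # negative: slower matter falls behind you
print(catch_up_time(gap=10.0, v=0.9))   # finite, though large when v is close to C
```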

Do you also mean they wouldn't be able to take most of what you've passed through, though? Or that it wouldn't matter? If so, how would this be guaranteed (without any violation of the territory of sovereign states on Earth)? Exhaustive extraction in space? An advantage in armed space conflicts?

Is cooperative inverse reinforcement learning promising? Why or why not?
