
Comments

If bounded below, you can just shift up to make it positive. But the geometric expected utility order is not preserved under shifts.
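To make the last point concrete, here is a small sketch with made-up lotteries. It assumes "geometric expected utility" means exp(E[log u]). Adding the same constant to every utility reverses which lottery ranks higher. Ordinary expected utility would keep the order (the risky lottery wins both before and after the shift).

```python
import math

def geometric_expected_utility(lottery):
    """exp(E[log u]) for a lottery given as (probability, utility) pairs.

    Utilities must be strictly positive for the log to be defined.
    """
    return math.exp(sum(p * math.log(u) for p, u in lottery))

def shifted(lottery, c):
    """Add the constant c to every outcome's utility."""
    return [(p, u + c) for p, u in lottery]

# Hypothetical lotteries chosen for illustration.
risky = [(0.5, 1.0), (0.5, 100.0)]  # geometric EU = sqrt(1 * 100) = 10
safe = [(1.0, 11.0)]                # geometric EU = 11

# Before the shift, the safe lottery ranks higher (11 > 10).
print(geometric_expected_utility(safe) > geometric_expected_utility(risky))
# After adding 100 to every utility, the ranking flips:
# sqrt(101 * 200) is about 142.1 for risky, versus 111 for safe.
print(geometric_expected_utility(shifted(safe, 100)) > geometric_expected_utility(shifted(risky, 100)))
```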

Violating the Continuity Axiom is bad because it allows you to be money pumped.

Violations of continuity aren't really vulnerable to proper/standard money pumps. The author calls it "arbitrarily close to pure exploitation" but that's not pure exploitation. It's only really compelling if you assume a weaker version of continuity in the first place, but you can just deny that.

I think transitivity (plus independence of irrelevant alternatives) and countable independence (or the countable sure-thing principle) are enough to avoid money pumps, and I expect they give a kind of expected utility maximization form (combining McCarthy et al., 2019 and Russell & Isaacs, 2021).
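For contrast, here is a toy sketch of the kind of standard money pump that transitivity rules out. An agent with cyclic preferences A ≻ B ≻ C ≻ A pays a small fee for each trade it strictly prefers, and it ends up holding what it started with, minus the fees. The items, fee and starting wealth are made up for illustration.

```python
# Hypothetical cyclic (intransitive) preferences:
# (x, y) in PREFERS means the agent strictly prefers holding x to holding y.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}

def run_money_pump(start, offers, fee, wealth):
    """Offer trades in sequence; the agent pays `fee` for any trade it strictly prefers."""
    holding = start
    for offer in offers:
        if (offer, holding) in PREFERS:
            holding = offer
            wealth -= fee
    return holding, wealth

# Starting with A, offer C, then B, then A: each swap looks like an upgrade.
final_item, final_wealth = run_money_pump("A", ["C", "B", "A"], fee=1, wealth=10)
print(final_item, final_wealth)  # back to A, with 3 fewer units of wealth
```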

Against the requirement of completeness (or the specific money pump argument for it by Gustafsson in your link), see Thornley here.

To be clear, countable independence implies your utilities are "bounded" in a sense, but possibly lexicographic. See Russell & Isaacs, 2021.

Even if we instead assume that by ‘unconditional’, people mean something like ‘resilient to most conditions that might come up for a pair of humans’, my impression is that this is still too rare to warrant being the main point on the love-conditionality scale that we recognize.

I wouldn't be surprised if this isn't that rare in parents' love for their children. Barring their children doing horrible things (which is rare), I'd guess most parents would love their children unconditionally, or at least claim to. Most would tolerate bad behavior, if not horrible behavior. And many will still love children who do horrible things. Partly this could be out of their sense of responsibility as a parent or attachment to the past.

I suspect such unconditional love between romantic partners and friends is rarer, though, and a concept of mid-conditional love like yours could be more useful there.

Maybe I’m out of the loop regarding the great loves going on around me, but my guess is that love is extremely rarely unconditional. Or at least if it is, then it is either very broadly applied or somewhat confused or strange: if you love me unconditionally, presumably you love everything else as well, since it is only conditions that separate me from the worms.

I would think totally unconditional love for a specific individual is allowed to be conditional on facts necessary to preserve their personal identity, which could be vague/fuzzy. If your partner asks you if you'd still love them if they were a worm and you do love them totally unconditionally, the answer should be yes, assuming they could really be a worm, at least logically. This wouldn't require you to love all worms. But you could also deny the hypothesis if they couldn't be a worm, even logically, in case a worm can't inherit their identity from a human.

That being said, I'd also guess that love is very rarely totally unconditional in this way. I think very few would continue to love someone who tortures them and others they care about. I wouldn't be surprised if many people (>0.1%, maybe even >1% of people) would continue to love someone after that person turned into a worm, assuming they believed their partner's identity would be preserved.

Answer by MichaelStJules, Apr 11, 2024

It's conceivable that characters/words are used in English and Alienese in ways that correspond closely enough for you to guess matching words much better than chance. But I'm not confident that you'd get high accuracy.

Consider encryption. If you encrypted messages by mapping the same character to the same character each time, e.g. 'd' always gets mapped to '6', then this can be broken with decent accuracy by comparing frequency statistics of characters in your messages with the frequency statistics of characters in the English language.

If you mapped whole words to strings instead of character to character, you could use frequency statistics for whole words in the English language.
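The frequency-matching attack described above can be sketched as rank matching. The k-th most frequent ciphertext token is guessed to be the k-th most frequent token in the reference language. The same functions work on characters or on whole words. The reference ordering "etaoin" is a rough, commonly cited ordering of the most frequent English letters, and the toy plaintext is constructed to follow it exactly. On real text the match is only approximate, so this is a sketch of the idea, not a working cryptanalysis tool.

```python
from collections import Counter

def frequency_rank(tokens):
    """Distinct tokens from most to least frequent (ties keep first-seen order)."""
    return [tok for tok, _ in Counter(tokens).most_common()]

def guess_decryption(cipher_tokens, reference_ranking):
    """Guess plaintext by matching frequency ranks.

    The k-th most frequent ciphertext token is mapped to the k-th most
    frequent token of the reference language; unmatched tokens become '?'.
    """
    mapping = dict(zip(frequency_rank(cipher_tokens), reference_ranking))
    return [mapping.get(tok, "?") for tok in cipher_tokens]

# Toy plaintext whose letter counts (e:6, t:5, a:4, o:3, i:2, n:1) follow the
# reference ordering exactly; real text only matches it approximately.
plain = "eateatoeitoneateiatoe"
key = {"e": "6", "t": "x", "a": "q", "o": "#", "i": "k", "n": "z"}  # hypothetical cipher key
cipher = [key[ch] for ch in plain]

guess = guess_decryption(cipher, list("etaoin"))
print("".join(guess))
```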

Then, between languages, this mostly gets way harder, but you might be able to make some informed guesses, based on

  1. how often you expect certain concepts to be referred to (frequency statistics, although even between human languages, there are probably very important differences)
  2. guesses about extremely common words like 'a', 'the', 'of'
  3. possible grammars
  4. similar words being written similarly, like verb tenses of the same verb, noun and verb forms of the same word, etc.
  5. (EDIT) Fine-grained associations between words, e.g. if a given word is used in a random sentence, how often another given word is used in that same sentence. Do this for all ordered pairs of words.
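Point 5 can be sketched as a table of conditional co-occurrence rates: among sentences containing word a, the fraction that also contain word b, for every ordered pair (a, b). Splitting on whitespace and lowercasing are simplifying assumptions. Comparing such tables across two languages is the hard part and isn't shown here.

```python
from collections import Counter
from itertools import permutations

def conditional_cooccurrence(sentences):
    """For each ordered pair (a, b) of distinct words, return the fraction of
    sentences containing a that also contain b."""
    word_counts = Counter()  # number of sentences containing each word
    pair_counts = Counter()  # number of sentences containing both words of a pair
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_counts.update(words)
        pair_counts.update(permutations(words, 2))
    return {(a, b): n / word_counts[a] for (a, b), n in pair_counts.items()}

rates = conditional_cooccurrence(["the cat sat", "the dog sat", "a cat ran"])
print(rates[("the", "sat")], rates[("the", "cat")])
```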

An AI might use similar facts, or others and many more, about much more fine-grained and specific uses of words and associations to make its guesses, but I’m not sure an LLM token predictor trained mostly on just these two languages would do a good job.

EDIT: Unsupervised machine translation as Steven Byrnes pointed out seems to be on a better track.

Also, I would add that LLMs trained without perception of things other than text don't really understand language. The meanings of the words aren't grounded, and I imagine it could be possible to swap some in a way that would mostly preserve the associations (nearly isomorphic), but I’m not sure.

The reason SGD doesn't overfit large neural networks is probably the various measures specifically intended to prevent overfitting, like weight penalties, dropout, early stopping, data augmentation and noise on inputs, and learning rates large enough to prevent convergence. If you didn't do those, running SGD to parameter convergence would probably cause overfitting. Furthermore, we test networks on validation datasets they weren't trained on, throw out the networks that don't generalize well to the validation set, and start over (with new hyperparameters, architectures or parameter initializations). These measures bias us away from producing, and especially deploying, overfit networks.
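Early stopping, one of the measures listed above, can be sketched as keeping the checkpoint with the best held-out validation loss instead of training to convergence. The `patience` rule and the loss values below are illustrative assumptions, not any particular library's behavior.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch whose weights early stopping would keep.

    Training halts once validation loss hasn't improved for `patience`
    epochs after the best epoch so far; that best epoch is returned.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # stop training; later epochs are never seen
    return best_epoch

# Hypothetical validation losses per epoch.
losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.5]
print(early_stopping_epoch(losses, patience=2))
```

With a short patience, training stops before the late improvement at the last epoch. A longer patience would reach it, which is one of the hyperparameter choices validation is used to tune.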

Similarly, we might expect scheming without specific measures to prevent it. What could those measures look like? Catching scheming during training (or validation), and either heavily penalizing it, or fully throwing away the network and starting over? We could also validate out-of-training-distribution. Would networks whose caught scheming has been heavily penalized or networks selected for not scheming during training (and validation) generalize to avoid all (or all x-risky) scheming? I don't know, but it seems more likely than counting arguments would suggest.

Thanks!

I would say experiments, introspection and consideration of cases in humans have pretty convincingly established the dissociation between the types of welfare (e.g. see my section on it, although I didn't go into a lot of detail), but they are highly interrelated and often or even typically build on each other like you suggest.

I'd add that the fact that they sometimes dissociate seems morally important, because it makes it more ambiguous what's best for someone if multiple types seem to matter, and there are possible beings with some types but not others.

If someone wants to establish probabilities, they should be more systematic, and, for example, use reference classes. It seems to me that there's been little of this for AI risk arguments in the community, but more in the past few years.

Maybe reference classes are kinds of analogies, but more systematic and so less prone to motivated selection? If so, then it seems hard to forecast without "analogies" of some kind. Still, reference classes are better. On the other hand, even with reference classes, we have the problem of deciding which reference class to use or how to weigh them or make other adjustments, and that can still be subject to motivated reasoning in the same way.

We can try to be systematic about our search and consideration of reference classes, and make estimates across a range of reference classes or weights to them. Do sensitivity analysis. Zach Freitas-Groff seems to have done something like this in AGI Catastrophe and Takeover: Some Reference Class-Based Priors, for which he won a prize from Open Phil's AI Worldviews Contest.
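One simple way to do the sensitivity analysis described above is a linear opinion pool, which is a weighted average of the base rates from each reference class, evaluated over a grid of weights. This sketch is my own illustration, not necessarily the method in Freitas-Groff's post. The base rates are placeholders, not real estimates.

```python
from itertools import product

def pooled_probability(estimates, weights):
    """Linear opinion pool: weighted average of per-reference-class probabilities."""
    return sum(p * w for p, w in zip(estimates, weights)) / sum(weights)

def sensitivity_range(estimates, weight_grid):
    """Smallest and largest pooled probability over all weight vectors from weight_grid."""
    results = [
        pooled_probability(estimates, weights)
        for weights in product(weight_grid, repeat=len(estimates))
        if sum(weights) > 0  # skip the all-zero weighting
    ]
    return min(results), max(results)

# Hypothetical placeholder base rates for three reference classes, NOT real estimates.
estimates = [0.2, 0.02, 0.05]
low, high = sensitivity_range(estimates, weight_grid=[0, 1, 2])
print(f"pooled probability ranges from {low:.3f} to {high:.3f}")
```

If the range is wide, the conclusion depends heavily on which reference classes you favor, and that is where motivated reasoning can creep back in.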

Of course, we don't need to use direct reference classes for AI risk or AI misalignment. We can break the problem down.

There's also a decent amount of call option volume and open interest at strike prices of $17.5, $20, $22.5 and $25 (same links as the comment I'm replying to), which suggests to me that the market is expecting lower upside on a successful merger than you are. The current price is about $15.8/share, so $17.5 is only about +11% and $25 is only about +58%.

There's also, of course, volume and open interest for call options at higher strike prices: $27.5, $30 and $32.5.
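The percentage upsides for each strike mentioned above are straightforward arithmetic against the ~$15.8 share price. Reading a strike as a rough marker of the upside traders consider plausible is a simplification. Option prices also reflect volatility and time to expiry.

```python
def upside(strike, price):
    """Fractional gain if the share price rises from `price` to `strike`."""
    return strike / price - 1

PRICE = 15.8  # approximate share price quoted in the comment
STRIKES = [17.5, 20.0, 22.5, 25.0, 27.5, 30.0, 32.5]  # call strikes mentioned

for strike in STRIKES:
    print(f"${strike:.2f} strike: {upside(strike, PRICE):+.1%} vs. current price")
```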

I think this also suggests that the market-implied odds calculations giving ~40% to a successful merger are wrong, because they overestimate the expected upside. The market-implied odds are higher.

From https://archive.ph/SbuXU, for calculating the market-implied odds:

Author's analysis - assumed break price of $5 for Hawaiian and $6 for Spirit.

also:

  • Without a merger, Spirit may be financially distressed based on recent operating results. There's some risk that Spirit can't continue as a going concern without a merger.
  • Even if JetBlue prevails in court, there is some risk that the deal is recut as the offer was made in a much more favorable environment for airlines, though clauses in the merger agreement may prevent this.

So maybe you're overestimating the upside?
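The link between assumed upside and market-implied odds can be sketched with the usual two-outcome model: price = p × (value if the deal closes) + (1 − p) × (break price). This assumes risk neutrality and ignores discounting and fees. The $6 break price is the quoted author's assumption for Spirit. The deal values below are hypothetical, chosen only to show the direction of the effect: a lower upside implies higher odds.

```python
def implied_merger_odds(price, break_price, deal_value):
    """Solve price = p * deal_value + (1 - p) * break_price for p.

    Simplifying assumptions: risk neutrality, no discounting for time,
    and only two outcomes (deal closes or deal breaks).
    """
    return (price - break_price) / (deal_value - break_price)

PRICE = 15.8       # approximate current share price mentioned above
BREAK_PRICE = 6.0  # the quoted author's assumed break price for Spirit

# Hypothetical per-share values on a successful merger, for comparison only.
for deal_value in (35.0, 30.0, 25.0):
    p = implied_merger_odds(PRICE, BREAK_PRICE, deal_value)
    print(f"upside ${deal_value:.2f} -> implied odds {p:.0%}")
```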

 

From https://archive.ph/rmZOX:

In my opinion, Spirit Airlines, Inc. equity is undervalued at around $15, but you're signing up for tremendous volatility over the coming months. The equity can get trashed under $5 or you can get the entire upside.
