Cleo Nardo

DMs open.

Sequences

Game Theory without Argmax

Comments

I've skimmed the business proposal.

The healthcare agents advise patients on which information to share with their doctor, and advise doctors on which information to solicit from their patients.

This seems agnostic between mental and physiological health. 

Thanks for putting this together — very useful!

If I understand correctly, the maximum entropy prior will be the uniform prior, which gives rise to Laplace's law of succession, at least if we're using the standard definition of (differential) entropy below:

$$H(f) = -\int_0^1 f(p)\,\log f(p)\,\mathrm{d}p$$

But this definition is somewhat arbitrary, because the "$\mathrm{d}p$" term assumes that there's something special about parameterising the distribution by its probability, as opposed to other parameterisations (e.g. its odds, its log-odds, etc.). The Jeffreys prior is supposed to be invariant to reparameterisation, which is why people like it.
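For concreteness, the Jeffreys prior for a Bernoulli parameter $\theta$ comes from the Fisher information $\mathcal{I}(\theta) = \frac{1}{\theta(1-\theta)}$:

$$p(\theta) \propto \sqrt{\mathcal{I}(\theta)} = \theta^{-1/2}(1-\theta)^{-1/2},$$

i.e. the $\mathrm{Beta}(\tfrac{1}{2}, \tfrac{1}{2})$ distribution, which gives the same posterior whether you parameterise by probability, odds, or log-odds.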

But my complaint is more Solomonoff-ish. The prior should put more weight on simple distributions, i.e. probability distributions described by short probabilistic programs. Such a prior would better match our intuitions about what probabilities arise in real-life stochastic processes. The best prior is the Solomonoff prior, but that's intractable. I think my prior is the most tractable prior that resolves the most egregious anti-Solomonoff problems with the Laplace/Jeffreys priors.

You raise a good point. But I think the choice of prior is important quite often:

  1. In the limit of large i.i.d. data (N>1000), both Laplace's Rule and my prior will give the same answer. But so too does the simple frequentist estimate n/N. The original motivation of Laplace's Rule was in the small N regime, where the frequentist estimate is clearly absurd.
  2. In the small data regime (N<15), the prior matters. Consider observing 12 successes in a row. Laplace's Rule: P(next success) = 13/14 ≈ 92.9%. My proposed prior (with point masses at 0 and 1): P(next success) ≈ 98%, which better matches my intuition about potentially deterministic processes.
  3. When making predictions far beyond our observed data, the likelihood of extreme underlying probabilities matters a lot. For example, after seeing 12/12 successes, how confident should we be in seeing a quadrillion more successes? Laplace's uniform prior assigns this very low probability, while my prior gives it significant weight.
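To make the comparison in point 2 concrete, here's a small sketch. The point-mass weights are my own illustrative choice, since the comment doesn't pin them down; equal thirds gives roughly 99.5% rather than the quoted ≈98%, but the qualitative gap over Laplace's Rule is the same.

```python
from fractions import Fraction
from math import comb

def laplace_next(s, n):
    # Laplace's rule of succession: P(next success) = (s + 1) / (n + 2)
    return Fraction(s + 1, n + 2)

def mixture_next(s, n, w0=Fraction(1, 3), w1=Fraction(1, 3)):
    # Prior: point mass w0 at p = 0, point mass w1 at p = 1,
    # and the remaining mass uniform on (0, 1).
    wu = 1 - w0 - w1
    # Likelihood of a particular sequence with s successes in n trials:
    lik0 = Fraction(int(s == 0))              # p = 0 only allows zero successes
    lik1 = Fraction(int(s == n))              # p = 1 only allows all successes
    liku = Fraction(1, (n + 1) * comb(n, s))  # ∫ p^s (1-p)^(n-s) dp
    z = w0 * lik0 + w1 * lik1 + wu * liku
    post1 = w1 * lik1 / z                     # posterior weight on p = 1
    postu = wu * liku / z                     # posterior weight on uniform part
    # Predictive: the p = 1 component always succeeds; the uniform component
    # contributes its Laplace posterior mean (s + 1) / (n + 2).
    return post1 + postu * Fraction(s + 1, n + 2)

print(float(laplace_next(12, 12)))   # ≈ 0.9286
print(float(mixture_next(12, 12)))   # ≈ 0.9949
```

After 12/12 successes the point mass at 1 soaks up most of the posterior, which is also why this prior stays confident about very long runs of future successes (point 3), where the uniform prior does not.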

Hinton legitimizes the AI safety movement

Hmm. He seems pretty peripheral to the AI safety movement, especially compared with (e.g.) Yoshua Bengio.

Hey TurnTrout.

I've always thought of your shard theory as something like path-dependence? For example, a human is more excited about making plans with their friend if they're currently talking to their friend. You mentioned this in a talk as evidence that shard theory applies to humans. Basically, the shard "hang out with Alice" is weighted higher in contexts where Alice is nearby.

  • Let's say $\pi$ is a policy with state space $S$ and action space $A$.
  • A "context" is a small moving window in the state-history, i.e. an element of $S^k$ where $k$ is a small positive integer.
  • A shard is something like $u_i : S \times A \to \mathbb{R}$, i.e. it evaluates actions given particular states.
  • The shards $u_1, \dots, u_n$ are "activated" by contexts, i.e. $\alpha_i : S^k \to \mathbb{R}_{\geq 0}$ maps each context to the amount that shard $u_i$ is activated by the context.
  • The total activation of $u_i$, given a history $h = (s_1, \dots, s_T) \in S^T$, is given by the time-decay average of the activation across the contexts, i.e. $A_i(h) = \sum_{j=k}^{T} \gamma^{T-j}\,\alpha_i(s_{j-k+1}, \dots, s_j)$ for some decay rate $\gamma \in (0, 1)$.
  • The overall utility function is the weighted average of the shards, i.e. $u(h, a) = \frac{\sum_i A_i(h)\, u_i(s_T, a)}{\sum_i A_i(h)}$.
  • Finally, the policy $\pi$ will maximise the utility function, i.e. $\pi(h) \in \arg\max_{a \in A} u(h, a)$.

Is this what you had in mind?
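The bullets above can be sketched as code. Everything here (the function names, the particular decay, the example shards) is my own illustrative guess at the formalism, not anything from shard theory itself:

```python
def total_activation(alpha_i, history, k=2, gamma=0.9):
    # time-decayed sum of the shard's activation over all length-k contexts
    T = len(history)
    return sum(gamma ** (T - j) * alpha_i(tuple(history[j - k:j]))
               for j in range(k, T + 1))

def act(history, shards, activations, actions, k=2, gamma=0.9):
    # weight each shard by its total activation, take the weighted average
    # of the shard utilities on the current state, then argmax over actions
    weights = [total_activation(a_i, history, k, gamma) for a_i in activations]
    z = sum(weights) or 1.0
    current_state = history[-1]
    def u(action):
        return sum(w * u_i(current_state, action)
                   for w, u_i in zip(weights, shards)) / z
    return max(actions, key=u)

# illustrative shards: "hang out with Alice" fires when Alice appears in the
# recent context; a background "work" shard is always mildly active
u_alice = lambda s, a: 1.0 if a == "call_alice" else 0.0
u_work = lambda s, a: 1.0 if a == "work" else 0.0
act_alice = lambda c: 1.0 if "alice" in c else 0.0
act_work = lambda c: 0.5

print(act(["home", "alice"], [u_alice, u_work], [act_alice, act_work],
          ["call_alice", "work"]))  # call_alice
```

With Alice in the recent context the "call_alice" shard dominates the weighted average; replace the history with `["home", "home"]` and the background "work" shard wins instead, which is the path-dependence in the example above.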

Why do you care that Geoffrey Hinton worries about AI x-risk?

  1. Why do so many people in this community care that Hinton is worried about x-risk from AI?
  2. Do people mention Hinton because they think it’s persuasive to the public?
  3. Or persuasive to the elites?
  4. Or do they think that Hinton being worried about AI x-risk is strong evidence for AI x-risk?
  5. If so, why?
  6. Is it because he is so intelligent?
  7. Or because you think he has private information or intuitions?
  8. Do you think he has good arguments in favour of AI x-risk?
  9. Do you think he has a good understanding of the problem?
  10. Do you update more-so on Hinton’s views than on Yann LeCun’s?

I’m inspired to write this because Hinton and Hopfield were just announced as the winners of the Nobel Prize in Physics. But I’ve been confused about these questions ever since Hinton went public with his worries. These questions are sincere (i.e. non-rhetorical), and I'd appreciate help on any/all of them. The phenomenon I'm confused about includes the other “Godfathers of AI” here as well, though Hinton is the main example.

Personally, I’ve updated very little on either LeCun’s or Hinton’s views, and I’ve never mentioned either person in any object-level discussion about whether AI poses an x-risk. My current best guess is that people care about Hinton only because it helps with public/elite outreach. This explains why activists tend to care more about Geoffrey Hinton than researchers do.

Answer by Cleo Nardo

This is a Trump/Kamala debate from two LW-ish perspectives: https://www.youtube.com/watch?v=hSrl1w41Gkk

Cleo Nardo

the base model is just predicting the likely continuation of the prompt, and it's a reasonable prediction that, when an assistant is given a harmful instruction, they will refuse. this behaviour isn't surprising.

it's quite common for assistants to refuse instructions, especially harmful instructions. so i'm not surprised that base llms systematically refuse harmful instructions more than harmless ones.
