Comments

Wei Dai · 13h

> Perhaps half of the value of misaligned AI control is from acausal trade and half from the AI itself being valuable.

Why do you think these values are positive? I've been pointing out, and I see that Daniel Kokotajlo also pointed out in 2018, that these values could well be negative. I'm very uncertain, but my own best guess is that the expected value of misaligned AI controlling the universe is negative, in part because I put some weight on suffering-focused ethics.

Wei Dai · 14h

If something is both a vanguard and limited, then it seemingly can't stay a vanguard for long. I see a few different scenarios going forward:

  1. We pause AI development while LLMs are still the vanguard.
  2. The data limitation is overcome with something like IDA or Debate.
  3. LLMs are overtaken by another AI technology, perhaps based on RL.

In terms of relative safety, it's probably 1 > 2 > 3. Given that 2 might not happen in time, might not be safe if it does, or might still be ultimately outcompeted by something else like RL, I'm not getting very optimistic about AI safety just yet.

> The argument is that with 1970s tech the Soviet Union collapsed; however, with 2020 computer tech (not needing GenAI) it would not have.

I note that China is still doing market economics, and nobody is trying (or even advocating, AFAIK) some very ambitious centrally planned economy using modern computers, so this seems like pure speculation? Has someone actually made a detailed argument about this, or at least has the agreement of some people with reasonable economics intuitions?

Wei Dai · 2d

I've arguably lived under totalitarianism (depending on how you define it), and my parents definitely have and told me many stories about it. I think AGI increases risk of totalitarianism, and support a pause in part to have more time to figure out how to make the AI transition go well in that regard.

Even if someone made a discovery decades earlier than it otherwise would have been made, the long-term consequences of that may be small or unpredictable. If your goal is to "achieve high counterfactual impact in your own research" (presumably impact that is predictably positive), you could potentially do that in certain fields (e.g., AI safety) even if you only counterfactually advance the science by a few months or years. I'm a bit confused about why you're asking people to think in the direction outlined in the OP.

Some of my considerations for college choice for my kid, that I suspect others may also want to think more about or discuss:

  1. status/signaling benefits for the parents (This is probably a major consideration for many parents to push their kids into elite schools. How much do you endorse it?)
  2. sex ratio at the school and its effect on the local "dating culture"
  3. political/ideological indoctrination by professors/peers
  4. workload (having more/less time/energy to pursue one's own interests)

I added this to my comment just before I saw your reply: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we're just contemplating a philosophical problem and not trying to make any specific decisions?

> I mostly offer this in the spirit of "here's the only way I can see to reconcile subjective anticipation with UDT at all", not "here's something which makes any sense mechanistically or which I can justify on intuitive grounds".

Ah I see. I think this is incomplete even for that purpose, because "subjective anticipation" to me also includes "I currently see X, what should I expect to see in the future?" and not just "What should I expect to see, unconditionally?" (See the link earlier about UDASSA not dealing with subjective anticipation.)

ETA: Currently I'm basically thinking: use UDT for making decisions, use UDASSA for unconditional subjective anticipation, and remain confused about conditional subjective anticipation, as well as about how UDT and UDASSA are disconnected from each other (i.e., the subjective anticipation from UDASSA not feeding into decision making). I'd love to improve upon this, but your idea currently feels worse than this...
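(To be a bit more concrete about what I mean by "use UDT for making decisions": roughly, and only as a sketch rather than a precise formalization,

$$\pi^* \;\in\; \arg\max_{\pi} \sum_{w} P(w)\, U\!\big(E(w,\pi)\big)$$

where $P$ is a fixed prior over world-programs $w$ that is never updated on observations, $\pi$ ranges over input-output maps for the agent's algorithm, and $E(w,\pi)$ is how world $w$ plays out if every instance of that algorithm implements $\pi$. Note that nothing in this picture references subjective anticipation, which is part of what I mean by UDT and UDASSA being disconnected.)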

> As you would expect, I strongly favor (1) over (2) over (3), with (3) being far, far worse for ‘eating your whole childhood’ reasons.

Is this actually true? China has (1) (affirmative action via "Express and objective (i.e., points and quotas)") for its minorities and different regions, and FWICT the college admissions "eating your whole childhood" problem over there is way worse. Of course that could be despite (1), not because of it, but it does make me question whether (3) ("Implied and subjective ('we look at the whole person').") is actually far worse than (1) on this front.

> Intuitively this feels super weird and unjustified, but it does make the "prediction" that we'd find ourselves in a place with high marginal utility of money, as we currently do.

This is particularly weird because your indexical probability then depends on what kind of bet you're offered. In other words, our marginal utility of money differs from our marginal utility of other things, so which one do you use to set your indexical probability? This seems like a non-starter to me... (ETA: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we're just contemplating a philosophical problem and not trying to make any specific decisions?)

By "acausal games" do you mean a generalization of acausal trade?

Yes; I didn't want to just say "acausal trade" in case threats/war are also a big thing.

> This was all kinda rambly but I think I can summarize it as "Isn't it weird that ADT tells us that we should act as if we'll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don't have a story for why these things are related but it does seem like a suspicious coincidence."

I'm not sure this is a valid interpretation of ADT. Can you say more about why you interpret ADT this way, maybe with an example? My own interpretation of how UDT deals with anthropics (and I'm assuming ADT is similar) is "Don't think about indexical probabilities or subjective anticipation. Just think about measures of things you (considered as an algorithm with certain inputs) have influence over."

This seems to "work", but anthropics still feels mysterious, i.e., we want an explanation of "why are we who we are / where we're at", and it's unsatisfying to be told "just don't think about it". UDASSA does give an explanation of that (but is also unsatisfying because it doesn't deal with anticipations, and is also disconnected from decision theory).

I would say that under UDASSA, it's perhaps not super surprising to be when/where we are, because this seems likely to be a highly simulated time/scenario for a number of reasons (curiosity about ancestors, acausal games, getting philosophical ideas from other civilizations).
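(For concreteness, the UDASSA measure I have in mind is roughly the universal prior over descriptions of observer-moments; again a sketch rather than a precise statement:

$$m(x) \;\propto\; \sum_{p \,:\, T(p) = x} 2^{-\ell(p)}$$

where $T$ is a fixed universal Turing machine, $x$ is a description of an observer-moment, and $\ell(p)$ is the length of program $p$. If many civilizations simulate a given time/scenario, there are many additional, often short, programs that output the corresponding observer-moments by pointing into those simulations, which raises their total measure; that's why being in a highly simulated situation is not so surprising under UDASSA.)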
