abramdemski

Sequences

Pointing at Normativity
Implications of Logical Induction
Partial Agency
Alternate Alignment Ideas
Filtered Evidence, Filtered Arguments
CDT=EDT?
Embedded Agency
Hufflepuff Cynicism
11 · abramdemski's Shortform · Ω · 5y · 68

Comments (sorted by newest)

Condensation
abramdemski · 1d
  1. Sam's theory gives some evidence for broad commonality between the data-structures different minds would use to represent the same problem (because it proves this happens under some conditions).
  2. Sam's theory gives a framework for proving useful things about these structures.
  3. Sam's theory of interpretability comes from a kind of usefulness, so there's at least some hope here.

Condensation
abramdemski · 1d

The optimal condensation is not (typically) 1 book per question. Instead, it typically recovers the meaningful latents which you'd want to write down to model the problem. Really, the right thing to do is to work examples to get an intuition for what happens. Sam does some of this in his paper.
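
In that spirit, here is a tiny worked example of my own (an MDL-flavored toy with made-up numbers, not Sam's formalism): three "questions" that are all noisy copies of one shared latent. One book per question costs about three bits per sample, while one shared book for the latent plus small per-question corrections costs noticeably less, and that shared book is exactly the meaningful latent you'd want to write down.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One shared latent: a fair coin.
x = rng.integers(0, 2, n)
# Three "questions", each a copy of x flipped independently with probability 0.05.
flips = rng.random((3, n)) < 0.05
ys = (x[None, :] ^ flips).astype(int)

def bit_entropy(bits):
    """Entropy (bits/sample) of a 0/1 array, from its empirical frequency."""
    p = np.clip(bits.mean(), 1e-12, 1 - 1e-12)
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

# "One book per question": describe each y_i separately.
separate = sum(bit_entropy(y) for y in ys)                     # ~3.0 bits/sample
# "Shared book": describe the latent x once, plus the small corrections y_i XOR x.
shared = bit_entropy(x) + sum(bit_entropy(y ^ x) for y in ys)  # ~1.86 bits/sample
print(f"one book per question : {separate:.2f} bits/sample")
print(f"shared latent + fixes : {shared:.2f} bits/sample")
```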

Condensation
abramdemski · 1d

Fixed!

Intentionality
abramdemski · 3d

On the model I mentioned, it would (in part) be a function of fit between explicit goals and implicit goals.

Geometric UDT
abramdemski · 4d

Perhaps I'm still not understanding you, but here is my current interpretation of what you are saying:

  • The (expected utility) argument that it is valuable for us to get the ASI to entangle its values with ours relies on the assumption of non-nosy-ness.
    • That is: we are uncertain which values are ours, but whichever thing we value, we are just as happy to impose it on versions of ourselves which do not value it; so we don't see any increase in expected value from Geometric UDT.

I see this line of reasoning as insisting on taking max-expected-utility according to your explicit model of your values (including your value uncertainty), even when you have an option which you can prove is higher expected utility according to your true values (whatever they are).

My argument has a somewhat frequentist flavor: I'm postulating true values (similar to postulating a true population frequency), and then looking for guarantees with respect to them (somewhat similar to looking for an unbiased estimator). Perhaps that is why you're finding it so counter-intuitive?

The crux of the issue seems to be whether we should always maximize our explicit estimate of expected utility, vs taking actions which we know are better with respect to our true values despite not knowing which values those are. One way to justify the latter would be via Knightian value uncertainty (ie infrabayesian value uncertainty), although that hasn't been the argument I've been trying to make. I'm wondering if a more thoroughly geometric-rationality perspective would provide another sort of justification.

But the argument I'm trying to make here is closer to just: but you know Geometric UDT is better according to your true values, whatever they are!
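
To make the contrast concrete, here is a toy numerical sketch (the numbers and the zero-utility baseline are mine, and it only illustrates arithmetic vs. geometric aggregation over value hypotheses, not the full Geometric UDT construction): a hard-coded "maximize rainbows" policy versus a policy that entangles with whichever values turn out to be true.

```python
# Toy numbers (mine, not from the post): two value hypotheses at 50% credence each.
# Policy A hard-codes "maximize rainbows"; policy B entangles the AI's values with
# whichever hypothesis is true, paying some cost relative to each hypothesis's best case.
policies = {
    "A: maximize rainbows": {"rainbow-values": 10.0, "puppy-values": 0.5},
    "B: entangle with true values": {"rainbow-values": 4.0, "puppy-values": 4.0},
}
credence = {"rainbow-values": 0.5, "puppy-values": 0.5}

def arithmetic_eu(utils):
    """Expected utility under the explicit value-uncertainty mixture."""
    return sum(credence[h] * u for h, u in utils.items())

def geometric_eu(utils):
    """Credence-weighted geometric mean (gains measured from a zero baseline)."""
    result = 1.0
    for h, u in utils.items():
        result *= u ** credence[h]
    return result

for name, utils in policies.items():
    print(f"{name}: arithmetic={arithmetic_eu(utils):.2f}, geometric={geometric_eu(utils):.2f}")

# A scores 5.25 arithmetic but only ~2.24 geometric; B scores 4.00 on both.
# The arithmetic mixture endorses A even though A is nearly worthless
# in the case where puppy-values turn out to be our true values.
```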

== earlier draft reply for more context on my thinking ==

Perhaps I'm just not understanding your argument here, and you need to spell it out in more detail? My current interpretation is that you are interpreting "care about both worlds equally" as "care about rainbows and puppies equally" rather than "if I care about rainbows, then I equally want more rainbows in the (real) rainbow-world and the (counterfactual) puppy-world; if I care about puppies, then I equally want more puppies in the (real) puppy-world and the (counterfactual) rainbow-world."

A value hypothesis is a nosy neighbor if[1] it wants the same things for you whether it is your true values or not. So what's being asserted here (your "third if" as I'm understanding it) is that we are confident we've got that kind of relationship with ourselves -- we don't want "our values to be satisfied, whatever they are" -- rather, whatever our values are, we want them to be satisfied across universes, even in counterfactual universes where we have different values.

Maximizing rainbows maximizes the expected value given our value uncertainty, but it is a catastrophe in the case that we are indeed puppy-loving. Moreover, it is an avoidable catastrophe; ...

... and now I think I see your point?

The idea that it is valuable for us to get the ASI to entangle its values with ours relies on an assumption of non-nosyness. 

There is a different way to justify this assumption, 

  1. ^

    (but not "only if"; there are other ways to be a nosy neighbor)

Geometric UDT
abramdemski · 5d

> Wait, do you think value uncertainty is equivalent/reducible to uncertainty about the correct prior?

Yep. Value uncertainty is reduced to uncertainty about the correct prior via the device of putting the correct values into the world as propositions.
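
One way to write the device down (my gloss, not notation from the post): let each world $w$ carry a proposition $V(w)=v$ naming which value hypothesis is correct there, and let $U_v$ be the utility function of hypothesis $v$. Then a single fixed utility function absorbs the value uncertainty into the prior:

\[
U(w) \;:=\; U_{V(w)}(w),
\qquad
\mathbb{E}[U] \;=\; \sum_{w} P(w)\,U_{V(w)}(w)
\;=\; \sum_{v} P(V{=}v)\,\mathbb{E}\big[\,U_v \mid V{=}v\,\big],
\]

so uncertainty about which $U_v$ is "really ours" shows up only through $P$, i.e. through the choice of prior.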

> Would that mean the correct prior to use depends on your values?

If we construe "values" as preferences, this is already clear in standard decision theory; preferences depend on both probabilities and utilities. UDT further blurs the line, because in the context of UDT, probabilities feel more like a "caring measure" expressing how much the agent cares about how things go in particular branches of possibility.

> So one conflicting pair spoils the whole thing, i.e. ignoring the pair is a pareto improvement?

Unless I've made an error? If the Pareto improvement doesn't impact the pair, then gains-from-trade for both in the pair is zero, making the product of gains-from-trade zero. But the Pareto improvement can't impact the pair, since an improvement for one would be a detriment to the other.
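
Spelling out the arithmetic, writing the aggregate the way I'm thinking of it, as a credence-weighted product of gains-from-trade:

\[
G \;=\; \prod_i g_i^{\,p_i}, \qquad p_i > 0,
\]

so if either member of the conflicting pair gets $g_i = 0$, then $G = 0$ no matter how large the other factors are; that is the sense in which one conflicting pair spoils the whole thing.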

Geometric UDT
abramdemski · 5d

When I try to understand the position you're speaking from, I suppose you're imagining a world where an agent's true preferences are always and only represented by their current introspectively accessible probability+utility,[1] whereas I'm imagining a world where "value uncertainty" is really meaningful (there can be a difference between the probability+utility we can articulate and our true probability+utility).

If 50% rainbows and 50% puppies is indeed the best representation of our preferences, then I agree: maximize rainbows.

If 50% rainbows and 50% puppies is instead a representation of our credences about our unknown true values, my argument is as follows: the best thing for us would be to maximize our true values (whichever of the two this is). If we assume value learning works well, then Geometric UDT is a good approximation of that best option.

  1. ^

    Here "introspectively accessible" really means: what we can understand well enough to directly build into a machine.

abramdemski's Shortform
abramdemski · 18d

I have personally signed the FLI Statement on Superintelligence. I think this is an easy thing to do, which is very useful for those working on political advocacy for AI regulation. I would encourage everyone to do so, and to encourage others to do the same. I believe impactful regulation can become feasible if the extent of agreement on these issues (amongst experts, and amongst the general public) can be made very legible.

Although this open statement accepts nonexpert signatures as well, I think it is particularly important for experts to take a public stance in order to make the facts on the ground highly legible to nontechnical decision-makers. (Nonexpert signatures, of course, help to show a preponderance of public support for AI regulation.) For those on the fence, Ishual has written an FAQ responding to common reasons not to sign.

In addition to signing, you can also write a statement of support and email it to letters@futureoflife.org. This statement can give more information on your agreement with the FLI statement. I think this is a good thing to do; it gives readers a lot more evidence about what signatures mean. It needs to be under 600 characters.

For examples of what other people have written in their statements of support, you can look at the page https://superintelligence-statement.org/. EG, here is Samuel Buteau's statement:

“Barring an international agreement, humanity will quite likely not have the ability to build safe superintelligence by the time the first superintelligence is built. Therefore, pursuing superintelligence at this stage is quite likely to cause the permanent disempowerment or extinction of humanity. I support an international agreement to ensure that superintelligence is not built before it can be done safely.”

(If you're still hungry to sign more statements after this one, or if you don't quite like the FLI statement but might be interested in signing a different one, you can PM Ishual about their efforts.)

Recent AI Experiences
abramdemski · 1mo

A skrode does seem like a good analogy, complete with the (spoiler)

skrodes having a built-in vulnerability to an eldritch God, so that skrode users can be turned into puppets readily. (IE, integrating LLMs so deeply into one's workflow creates a vulnerability as LLMs become more persuasive.)

Posts (sorted by new)
28 · Consciousness as a Distributed Ponzi Scheme · 21h · 5
113 · Condensation · Ω · 2d · 12
38 · Myopia Mythology · Ω · 3d · 2
47 · Comparing Payor & Löb · Ω · 3d · 1
15 · Liberation Clippy · 4d · 2
28 · Geometric UDT · Ω · 5d · 6
27 · Intentionality · 7d · 4
70 · Research Reflections · 8d · 2
18 · Donut Theory of Consciousness · 9d · 0
28 · Weak-To-Strong Generalization · Ω · 10d · 0

Wikitag Contributions
Timeless Decision Theory · 9 months ago · (+1874/-8)
Updateless Decision Theory · a year ago · (+1886/-205)
Updateless Decision Theory · a year ago · (+6406/-2176)
Problem of Old Evidence · a year ago · (+4678/-10)
Problem of Old Evidence · a year ago · (+3397/-24)
Good Regulator Theorems · 2 years ago · (+239)
Commitment Races · 3 years ago
Agent Simulates Predictor · 3 years ago
Distributional Shifts · 3 years ago
Distributional Shifts · 3 years ago