LW1.0 username Manfred. Day job is condensed matter physics, hobby is thinking I know how to assign anthropic probabilities.
What Everett says in his thesis is that if the measure is additive between orthogonal states, it's the norm squared. Therefore we should use the norm squared of observers when deciding how to weight their observations.
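A sketch of that additivity argument, in my own paraphrase rather than Everett's notation: write the wavefunction as a sum over orthonormal branches, demand a measure that depends only on each amplitude's magnitude and that adds up correctly when orthogonal branches are grouped, and the norm squared falls out.

```latex
% Branch decomposition into orthonormal states:
\psi = \sum_i c_i \phi_i, \qquad \langle \phi_i | \phi_j \rangle = \delta_{ij}

% Ask for a measure on branches that depends only on the amplitude's
% magnitude and is additive when orthogonal branches are grouped together
% (the grouped branch has amplitude of norm \sqrt{\sum_i |c_i|^2}):
m\!\left(\sqrt{\textstyle\sum_i |c_i|^2}\,\right) = \sum_i m(|c_i|)

% The only solution of this functional equation (for a positive measure)
% is quadratic:
m(|c_i|) = k\,|c_i|^2
```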
But this is a weird argument, not at all the usual sort of argument used to pin down probabilities - the archetypal probability arguments rely on things like ignorance and symmetry. Everett just says "Well, if we put a measure on observers that doesn't have weird cross-state interactions, it's the norm squared." But understanding why humans described by the Schrödinger equation wouldn't see weird cross-state probability flows still requires additional thought (which is a bit hard in the non-Hamiltonian-eigenstate observer + environment basis Everett uses for convenience).
But I think that's an argument you can make in terms of things like ignorance and symmetry, so I do think the problem is somewhat solved. It's just not necessarily easy to understand, or widely appreciated, and the intervening decades have seen more than a little muddying of the waters from all sides, from non-physicist philosophers to David Deutsch.
I don't totally understand the Liouville's theorem argument, but I think it's aimed at a more subtle point about choosing the common-sense measure for the underlying Hilbert space.
I was kind of hoping this post would be more about moral authority as it actually exists in our morally-neutral universe. For having subjectivism in the title, it was actually all about objectivism.
I'm reminded of that aphorism about the guy writing a book on magic, and he'd get asked if it was about "real magic." And he'd have to say no, stage magic, because real magic, to the questioner, means something not real, while the sort of magic that can really be done is not real magic.
How does someone whose moral judgment you trust actually get that trust, in the real world? It's okay if this looks more like "stage magic" than "real magic."
I'm still holding out hope for jumping straight to FAI :P Honestly I'd probably feel safer switching on a "big human" than a general CIRL agent that models humans as Boltzmann-rational.
Though on the other hand, does modern ML research already count as trying to use UFAI to learn how to build FAI?
I care about things other than suffering.
You have to put a measure on things. I care less about unlikely things, things with small measure, even if there's a multiverse.
I care about things other than subjective experience.
And seriously, I care about things other than suffering.
I normally don't think of most functions as polynomials at all - in fact, I think of most real-world functions as going to zero for large values. E.g. the function "dogness" vs. "nose size" cannot be any polynomial, because polynomials (or their inverses) blow up unrealistically for large (or small) nose sizes.
I guess the hope is that you always learn even polynomials, oriented in such a way that the extremes appear unappealing?
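To make the nose-size example concrete (the "dogness" curve and all the numbers here are made up for illustration): fit a polynomial to a bump that decays to zero at the extremes, and the fit blows up as soon as you leave the range it was fit on.

```python
import numpy as np

# Hypothetical "dogness vs. nose size" curve: peaks at a typical nose
# size and decays to zero at the extremes (a Gaussian bump).
def dogness(nose_size):
    return np.exp(-((nose_size - 1.0) ** 2))

# Fit a degree-4 polynomial using data only from a "realistic" range.
x = np.linspace(0.0, 2.0, 50)
coeffs = np.polyfit(x, dogness(x), deg=4)
poly = np.poly1d(coeffs)

# Inside the training range the polynomial tracks the curve closely,
# but far outside it the polynomial blows up while true dogness ~ 0.
fit_error_at_peak = abs(poly(1.0) - dogness(1.0))   # small
true_value_far_out = dogness(10.0)                  # essentially zero
poly_value_far_out = abs(poly(10.0))                # huge
```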
I recently got reminded of this post. I'm not sure I agree with it, because I think we have different paradigms for AI alignment - I'm not nearly so concerned with the sort of oversight that relies on looking at the state of the computer. Though I have nothing against the sort of oversight where you write a program to tell you about what's going on with your model.
Instead, I think that anticipating the effects of QC on AI alignment is a task in prognosticating how ML is going to change if you make quantum computing available. I think the relevant killer app is not going to be Grover's algorithm, but quantum annealing. So we have to think about what kind of ML you could do if you could get a large speedup on optimizing objective functions, but were limited to a few hundred to a few thousand bits at a time (assuming that's true for near-future quantum computers).
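For concreteness, annealers natively minimize quadratic objectives over binary variables (QUBO problems). Here's a toy classical brute-force stand-in for the kind of problem the annealer would be speeding up - the matrix Q is an arbitrary made-up example, and annealers matter precisely because brute force stops working at hundreds of bits.

```python
import itertools
import numpy as np

# A quantum annealer natively minimizes f(x) = x^T Q x over binary
# vectors x (a QUBO). Brute-force stand-in for tiny instances:
def solve_qubo(Q):
    n = Q.shape[0]
    best_x, best_val = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        val = float(x @ Q @ x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Made-up example: diagonal terms reward turning bits on, while the
# off-diagonal term penalizes turning on bits 0 and 1 together.
Q = np.array([[-1.0, 2.0, 0.0],
              [0.0, -0.5, 0.0],
              [0.0, 0.0, -1.0]])
x, val = solve_qubo(Q)   # optimum turns on bits 0 and 2
```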
And given those changes, what changes for alignment? Seems like a hard question.
Sunday, Sunday, Sunday, at the Detroit Dragway!
Steve's big thoughts on alignment in the brain probably deserve a review. Component posts include https://www.lesswrong.com/posts/diruo47z32eprenTg/my-computational-framework-for-the-brain, https://www.lesswrong.com/posts/DWFx2Cmsvd4uCKkZ4/inner-alignment-in-the-brain, and https://www.lesswrong.com/posts/jNrDzyc8PJ9HXtGFm/supervised-learning-of-outputs-in-the-brain.
Interestingly, I think there aren't any of my posts I should recommend - basically all of them are speculative. However, I did have a post called Gricean communication and meta-preferences that I think is still fairly interesting speculation, and I've never gotten any feedback on it at all, so I'll happily ask for some: https://www.lesswrong.com/posts/8NpwfjFuEPMjTdriJ/gricean-communication-and-meta-preferences.
I think this looks fine for IDA - the two remaining problems are the practical one of implementing Bayesian reasoning in a complicated world, and the philosophical one that IDA on human imitations probably doesn't work, because humans have bad safety properties.
Hm, I thought that was what Evan called it, but maybe I misheard. Anyhow, I mean the problem where because you can model humans in different ways, we have no unique utility function. We might think of this as having not just one Best Intentional Stance, but a generalizable intentional stance with knobs and dials on it, different settings of which might lead to viewing the subject in different ways.
I call such real-world systems that can be viewed non-uniquely through the lens of the intentional stance "approximate agents."
To the extent that mesa-optimizers are approximate agents, this raises familiar and difficult problems with interpretability. Checking how good an approximation is can require knowing about the environment it will get put into, which (that being the future) is hard.