jessicata
Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Blog: unstableontology.com

Twitter: https://twitter.com/jessi_cata

Comments (sorted by newest)
Emergent morality in AI weakens the Orthogonality Thesis
jessicata · 23d

I've written criticisms of orthogonality: The Obliqueness Thesis, Measuring intelligence and reverse-engineering goals.

While I do think human moral reasoning suggests non-orthogonality, it's a somewhat conceptually tricky case. So recently I've been thinking about more straightforward ways of showing non-orthogonality relative to an architecture.

For example, consider RL agents playing Minecraft. If you want agents that beat the game, you could encode this preference directly as a reward function: reward the agent when it beats the game. In practice, however, this fails.

The alternative is reward shaping: reward the agent for pursuing instrumental values like exploring or acquiring new resources. An agent trained this way is much more likely to win, despite its reward function being misaligned with winning.
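To make the contrast concrete, here is a minimal sketch; the state fields (beat_game, new_resources_collected, exploration_bonus) are hypothetical stand-ins, not any real Minecraft RL interface:

```python
# Sketch: sparse (direct) vs. shaped reward for an episodic RL agent.
# The state fields below are hypothetical stand-ins, not a real Minecraft API.

def direct_reward(state) -> float:
    # Sparse: nonzero only on winning, so the learning signal is
    # almost always zero and training tends to go nowhere.
    return 1.0 if state.beat_game else 0.0

def shaped_reward(state) -> float:
    # Dense: rewards instrumental subgoals (resources, exploration).
    # Misaligned with "beat the game" as a preference function,
    # yet agents trained on it are more likely to actually win.
    return (1.0 * state.beat_game
            + 0.1 * state.new_resources_collected
            + 0.01 * state.exploration_bonus)
```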

What this shows is that reinforcement learning is a non-orthogonal architecture: some goals (reward functions) lead to more satisfaction of convergent instrumental goals than others.

A slightly trickier case is humans. Directly encoding inclusive fitness as human neural values seems like it would produce high fitness, but we don't see humans with such values; therefore the space evolution is searching over is probably non-orthogonal.

Maybe it's like the RL case, where organisms are more likely to have fitness if they have neural encodings of instrumental goals, which are easier to optimize short-term. Fixed action patterns suggest something like this: there's a "terminal value" of engaging in fixed action patterns, which happen to be ones that promote fitness (evolution searched over many possible fixed action patterns).

So instead of assuming "organisms get more fitness by having values aligned with inclusive fitness", we could reframe this as: "inclusive fitness is a meta-value over organisms (including their values), and empirically, some values lead to higher inclusive fitness than others".
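As a toy illustration of this reframing, selection over value encodings can favor cheap proxies over direct fitness calculation; all proxy names and numbers below are invented for the sketch:

```python
# Toy model of "inclusive fitness as a meta-value over value encodings":
# organisms act on cheap proxy values, and selection operates over which
# proxy is encoded. All proxies and numbers here are invented for illustration.
import random

PROXIES = {
    # value encoding           -> chance each offspring attempt succeeds
    "eat_when_hungry": 0.9,
    "explore": 0.6,
    "compute_fitness_directly": 0.2,  # intractable to act on in real time
}

def offspring(value: str) -> list[str]:
    # Each organism gets 3 reproduction attempts; success depends on how
    # much fitness its proxy value empirically yields.
    return [value] * sum(random.random() < PROXIES[value] for _ in range(3))

population = list(PROXIES) * 100
for _ in range(50):
    population = [child for v in population for child in offspring(v)]
    population = random.sample(population, min(len(population), 300))  # resource cap

# Proxies that empirically yield fitness come to dominate, even though no
# organism's values "are" inclusive fitness.
print({v: population.count(v) for v in PROXIES})
```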

This approach could be used to study human morality. Maybe some tendencies to engage in moral reasoning lead to more fitness, even if moral reasoning isn't straightforwardly aligned with fitness; perhaps because morality is a convenient proxy that works under bounded rationality.

A thesis would be something like: orthogonality holds for almost no architectures. Relative to an architecture like RL or neural encodings of values, there are almost always "especially smart values" that lead to more convergent instrumental goal achievement. Evolution will tend to find these empirically.

This doesn't contradict the claim that there is some architecture that is orthogonal, which I take to be the steelman of the orthogonality thesis. However, it suggests that even if this steelman is true, it has limited applicability to empirically realized agent architectures, and in particular doesn't apply to human preference/morality.

A philosophical kernel: biting analytic bullets
jessicata · 1mo

> it doesn't seem highly problematic that we can access mathematical facts that "live partially outside the universe" via "reasoning" or "logical correlation", where the computations in our minds are entangled in some way with computations or math that we're not physically connected to.

While this is one way to think about it, it seems first of all that it is limited to "small" mathematical facts that are computable in physics (not stuff like the continuum hypothesis). With respect to the entanglement: while it's possible to have a Bayes net where the mathematical fact "causes" both computers to output the answers, there's an alternative approach where the computers are two material devices that output the same answer because of physical symmetry. Two processes having symmetrical outputs doesn't in general indicate they're "caused by the same thing".

> arguments in favor of some types of mathematical realism/platonism (e.g., universe and multiverse views of set theory)

Not familiar with these arguments. I think a formalist approach would be: the consistency of ZFC already implies a bunch of "small" mathematical facts (e.g. ZFC can't prove any false Π1 arithmetic statements). I think it's pretty hard to find a useful formal system that is strictly finitist; however, my intuition is that set theory goes too far. (This is part of why I have recently been thinking about "reverse mathematics": relatively weak second-order arithmetic theories like WKL0.)
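For reference, here is a sketch of the standard argument that consistency alone yields Π1-soundness:

```latex
% Sketch: Con(ZFC) implies ZFC proves no false \Pi_1 sentences.
Suppose $\mathrm{ZFC} \vdash \varphi$ where $\varphi \equiv \forall n\,\theta(n)$
is $\Pi_1$ (with $\theta$ decidable) and $\varphi$ is false. Then $\lnot\theta(k)$
holds for some particular $k$, and the finite computation verifying
$\lnot\theta(k)$ can be formalized in $\mathrm{ZFC}$, so
$\mathrm{ZFC} \vdash \lnot\theta(\bar{k})$ and hence
$\mathrm{ZFC} \vdash \lnot\varphi$. Combined with $\mathrm{ZFC} \vdash \varphi$,
this contradicts $\mathrm{Con}(\mathrm{ZFC})$.
```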

> Another reason I'm not ready to be super-convinced in this direction is I think philosophy is often very hard and slow, therefore as you say "It is somewhat questionable to infer from lack of success to define, say, optimal decision theories, that no such decision theory exists."

Yeah, that makes sense. I think maybe what I've become more reluctant to endorse over time is a jump from "an intuition that something here works, plus alternative solutions failing" to "here, this thing I came up with or something a lot like it is going to work". Like going from the failure of CDT to the success of EDT, or from the failure of CDT+EDT to TDT. There is not really any assurance that the new thing will work either.

> we're not sure whether we'll eventually keep them when we're philosophically mature, and we don't know how to translate these values to a new ontology that lack these entities

I see this as a practical consideration in many value systems, although perhaps either (a) the pragmatic considerations go differently for different people, or (b) different systems could be used for different pragmatic purposes. It at least presents a case for explaining the psychological phenomena of different ontologies/values, even ones that might fail under physicalism.

A philosophical kernel: biting analytic bullets
jessicata · 1mo

> The precalculated "stochastic" variables thing, and the on-the-fly calls to the universe's rand(), aren't the same thing, because they have different ontological implications.

Yeah, they can be distinguished ontologically, although there are going to be multiple Bayes nets expressing the same joint distribution, so it's not like there's going to be a canonical ordering.

> I would guess that the standard rationalist answer is "they are indistinguishable empirically". But rationalism lacks a proof that unempirical questions are unanswerable or meaningless (unlike logical positivism... but LP is explicitly rejected).

I get that active disbelief in further facts (such as counterfactuals) can be dogmatic. Rather, it's more of a case of: we can get an adequate empirical account without them, and adding them has problems (like causal counterfactuals implying violations of physical law).

Part of where I'm coming from with this is a Chalmers-like framework. Suppose there are two possible universes; they have the same joint distribution, but different causal orderings. Maybe in one the stochasticity is generated on the fly, while in the other it's pre-computed. They imply the same joint distribution and the same set of "straightforward" physical facts (particle trajectories and so on). Yet there is a distinction, a further fact.

In which case, the agents in these universes can't have epistemic access to these further facts; it's similar to the zombie argument. A simple approach is "no further facts", although assuming this is literally the case might be dogmatic. It's more like: don't believe in further facts prior to a good/convincing account of them, where the ontological complexity is actually worth it.
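A minimal sketch of this setup, with the two universes written as sampler functions (the framing as code is mine, purely illustrative):

```python
# Two "universes" with the same joint distribution over histories but
# different causal orderings: universe A calls the universe's rand() on the
# fly; universe B precomputes a tape of bits at t=0 and reads from it.
import random

def universe_a(steps: int, seed: int) -> list[int]:
    rng = random.Random(seed)
    history = []
    for _ in range(steps):
        history.append(rng.getrandbits(1))  # stochasticity generated on the fly
    return history

def universe_b(steps: int, seed: int) -> list[int]:
    rng = random.Random(seed)
    tape = [rng.getrandbits(1) for _ in range(steps)]  # all bits fixed up front
    return [tape[t] for t in range(steps)]  # deterministic readout thereafter

# Same seed, same trajectory; more generally, the same joint distribution.
# The "further fact" of when the bits were fixed leaves no empirical trace.
assert universe_a(10, seed=42) == universe_b(10, seed=42)
```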

> Note that compatibilism and naturalistic libertarianism are both viable given our present state of knowledge... so there is no necessity to adopt anti-realism.

Well, it's more like: most specific theories of these have problems, like the counterfactuals being really weird, corresponding to bad decision theories, etc. And it seems simpler to say the counterfactuals don't exist? Even if assigning high probability to that is dogmatic.

> So much for MWI then... according to it, every world is counterfactual to every other.

If instead of QM our best physics said something like "there are true random coin flips", then it would be a bit of a stretch to posit an MWI-like theory there, in which there exist other universes where the coin flips go differently. The case for MWI is somewhat more complex; it has to do with the Copenhagen interpretation being a lot more complicated than "here, have some stochastic coin flips".

> How do you know counterfactuals require violations of physics itself? The possibility of something happening that wasn't what happened only requires (genuine) indeterminism, as above.

Well, we can disjunct on high or low universal K complexity. Assuming low universal K complexity, counterfactuals really do have problems; there are a lot of implications. Assuming high universal K complexity, I guess they're more well-defined, though you can't counterfact on just anything; you have to counterfact on a valid quantum event. So how many counterfactuals there are depends on the density of quantum events relevant to, say, a computer.

I guess you could make the case from QM that the classical trajectory has high K complexity, therefore counterfactual alternatives to the classical trajectory don't require physical law violations.

If not for QM, though, our knowledge would be compatible with determinism / low K complexity of the classical trajectory, and it seems like a philosophy should be able to deal with that case (even if it empirically seems not to be the case).

> You can hypothetically plan out a moon landing before you perform it for the first time.

Right, so counterfactual reasoning is practically useful; this is more about skepticism of the implied metaphysics. There might be translations, like observing that a deterministic system can be factored (in multiple ways) into interacting systems with inputs/outputs, each factoring implying additional facts about the deterministic system, without having to say that any of these factorings is correct in the sense of correctness about further facts.

A philosophical kernel: biting analytic bullets
jessicata · 1mo

Ah. I think, first of all, it is possible to do ontology in a materialist-directed or idealist-directed way, and the original post is materialist-directed.

I get that the joint distribution over physical facts determines a joint distribution over observations, and we couldn't observe further facts about the joint distribution beyond those implied by the distribution over observations.

I do feel there are a few differences though. Like, in the process of "predicting as if physics" we would be expanding a huge hidden variable theory, yet declaring the elements of the theory unreal. Also there would be issues like, how large is the mental unit doing the analysis? Is it a single person over time or multiple people, and over how much time? What theory of personal identity? What is the boundary between something observed or not observed? (With physicalism, although having some boundary between observed / not observed is epistemically relevant, it doesn't have to be exactly defined since it's not ontological; the ontology is something like an algebraic closure that is big enough to contain the state distinctions that are observed.)

I think maybe someone could try to make an idealist/solipsist minimal philosophy work, but it's not what I've done, and it doesn't seem easy to include this without running into problems like epistemic stability assumptions.

Theory of culture as waste.
jessicata · 1mo

If you haven't, consider reading Bataille (especially The Accursed Share).

A philosophical kernel: biting analytic bullets
jessicata · 1mo

This seems like a case of Bayesian inference. Like, we start from the observation that humans exist having the properties they do, and then find the set of strings consistent with that. That is, start from a uniform measure on the strings and then condition on "the string produces humans".

Which is computationally intractable, of course; the usual Bayesian inference issues. Though Bayesian inference would also be hard if stochasticity were generated on the fly rather than being initial.
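Schematically, the conditioning step looks like rejection sampling from a uniform prior over strings; a "the string produces humans"-style predicate is what makes it intractable, so a trivial toy predicate stands in below:

```python
# Schematic form of the inference: a uniform prior over bit strings,
# conditioned on a predicate via rejection sampling. The real predicate
# ("the string produces humans") is what makes this intractable.
import random

def uniform_strings(length: int):
    # Uniform measure over bit strings of a fixed length.
    while True:
        yield "".join(random.choice("01") for _ in range(length))

def posterior_samples(predicate, length: int, n: int) -> list[str]:
    # Condition the uniform prior on the predicate by rejection.
    samples = []
    for s in uniform_strings(length):
        if predicate(s):
            samples.append(s)
        if len(samples) == n:
            return samples

# Toy stand-in predicate, just so the sketch runs.
print(posterior_samples(lambda s: s.startswith("1"), length=8, n=3))
```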

A philosophical kernel: biting analytic bullets
jessicata · 1mo

Good point; I was mentioning a fundamentalist mainly to ensure that they unironically have standard beliefs like the resurrection, but it applies to a lot more Christians than fundamentalists. (I think Unitarians don't generally believe in the resurrection as a literal physical event?)

A philosophical kernel: biting analytic bullets
jessicata · 1mo

I'm not sure why you're thinking about guessing model weights here. The thing I'm thinking of with stochastic models is the forward-pass bit: Monte Carlo sampling. I'm not sure why pre-computed randomness would be a problem for that portion.

As a weird example: Say there's a memoized random function mapping strings to uniform random bits. This can't really be pre-computed, because it's very big. But it can be lazily evaluated, as if pre-computed. Now the stochastic model can query the memoized random function with a unique specification of the situation it's querying. This should be equivalent to flipping coins mid-run.

Alternatively, if the Monte Carlo process is sequential, then it can just "read the next bit", which is computationally simpler.
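A small sketch of such a lazily evaluated oracle; the class and its keying scheme are illustrative assumptions, not something from the thread:

```python
# Sketch of a lazily evaluated, "as if precomputed" random function.
# Each unique situation string gets a fixed uniform bit the first time it is
# queried, and the same bit forever after, which is observationally
# equivalent to having precomputed the (infeasibly large) table up front.
import random

class MemoizedRandomOracle:
    def __init__(self, seed: str = "universe-0"):
        self._seed = seed
        self._table: dict[str, int] = {}

    def query(self, situation: str) -> int:
        if situation not in self._table:
            # Derive the bit deterministically from (seed, situation), so the
            # laziness is pure bookkeeping rather than a new source of
            # randomness: evaluation order can't change any answer.
            bit = random.Random(f"{self._seed}/{situation}").getrandbits(1)
            self._table[situation] = bit
        return self._table[situation]

oracle = MemoizedRandomOracle()
# Same situation, same bit, as if read out of a precomputed table.
assert oracle.query("run A, step 3") == oracle.query("run A, step 3")
```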

Maybe it's not an issue for forward sampling but it is for backprop? Not sure what you mean.

A philosophical kernel: biting analytic bullets
jessicata · 1mo

> If the universe has high K complexity, then any theoretically best model has to be either stochastic or "inherently complex" (which is worse than stochastic).

That might or might not be the case. From current models in practice having to be stochastic to make good predictions, it doesn't follow that the theoretically best models must be. But it could be the case.

I'm not sure why 'partially-stochastic' would ever fail, due to the coding theorem. That is, there is an alternative way of modeling a system that makes stochastic decisions along the way: all stochastic decisions are made initially, and instead of making a new stochastic decision, you read from these initial bits.

A philosophical kernel: biting analytic bullets
jessicata · 1mo
  • Composite objects: Statements about composite objects have implications for microstates. The idea would be that there is no content to statements about composite objects, beyond the implications for microstates.
  • Outside world: Broadly scientific realist so yes.
  • Skeptical hypotheses: Some of the sections include "non-realism"; not sure if that counts.

But also... Did you read the post? I thought I was clear about including a lot of things in this minimal position?

Posts (sorted by new)

63 karma · A philosophical kernel: biting analytic bullets · 1mo · 21 comments
33 karma · Measuring intelligence and reverse-engineering goals · 1mo · 10 comments
17 karma · Towards plausible moral naturalism · 2mo · 9 comments
23 karma · Generalizing zombie arguments · 2mo · 9 comments
21 karma · The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison (Ω) · 2mo · 0 comments
27 karma · Why I am not a Theist · 2mo · 6 comments
20 karma · "Self-Blackmail" and Alternatives · 7mo · 12 comments
96 karma · On Eating the Sun · 8mo · 98 comments
125 karma · 2024 in AI predictions · 8mo · 3 comments
96 karma · The Obliqueness Thesis (Ω) · 1y · 19 comments