Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.





I don't especially think AI capability increases are bad on the margin, but if I did, I would think of this as a multilateral disarmament problem: those who have the most capabilities (relative to some baseline, such as population or economy) and the worst technological coordination should disarm first, similar to nukes. That would currently indicate the US and UK over China, etc. China has more precedent for government control over the economy than the West, so it could more easily coordinate an AI slowdown.

If the falsifying Turing-computable agent has access to the oracle A' and a different oracle A'', and A' and A'' give different answers on some Turing machine (which must never halt if A' and A'' are both arbitration oracles), then there is some way to prove this disagreement by exhibiting the machine.

What I'm proving is that the falsifier's knowledge that A' is an arbitration oracle doesn't help it falsify the claim that oracle B satisfies property P. I'm not considering the case where there are two different putative arbitration oracles. In general it seems hard for two arbitration oracles to be more helpful than one, but I'm not sure exactly how to prove this. Maybe it's possible to use the fact that a single arbitration oracle can find two different arbitration oracles by constructing two different models of PA?
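To illustrate the disagreement argument: if two putative arbitration oracles answer differently on some machine, then as soon as that machine halts, the run itself falsifies whichever oracle answered wrongly; if it never halts, the disagreement is never cashed out this way. Here is a minimal sketch, where machines are modeled as Python generators and all names (`run_for`, `falsify_by_disagreement`, the fuel bound) are illustrative assumptions, not anything from the original argument's formalism:

```python
def run_for(machine, fuel):
    """Run `machine` (a generator function returning an output bit) for
    up to `fuel` steps; return its output, or None if it hasn't halted."""
    it = machine()
    for _ in range(fuel):
        try:
            next(it)
        except StopIteration as stop:
            return stop.value  # machine halted with this output bit
    return None  # didn't halt within the step budget

def falsify_by_disagreement(oracle_a, oracle_b, machine, fuel):
    """If the oracles disagree on `machine` and it halts within `fuel`
    steps, return the name of the oracle that was falsified."""
    a, b = oracle_a(machine), oracle_b(machine)
    if a == b:
        return None  # no disagreement to exploit
    out = run_for(machine, fuel)
    if out is None:
        return None  # machine hasn't halted yet; no falsification so far
    return "oracle_a" if a != out else "oracle_b"

# Example: a machine that halts after 3 steps with output 1.
def halts_with_one():
    for _ in range(3):
        yield
    return 1

good = lambda m: 1  # happens to match the machine's eventual output
bad  = lambda m: 0
```

Running `falsify_by_disagreement(good, bad, halts_with_one, 100)` identifies `bad` as falsified once the machine halts; this is just the observation in the parent comment made executable at toy scale.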

Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize on some animal population, thereby staying close to the population distribution while still surviving/reproducing better than average?
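Quantilizing in this sense can be sketched in a few lines: sample actions from a base distribution (here, behaviors drawn from the animal population), then draw uniformly from the top-q fraction by utility, so the result stays inside the base distribution's support. This is a minimal sketch of the standard quantilizer construction; the function name and the empirical-sample representation are illustrative assumptions.

```python
import random

def quantilize(base_samples, utility, q, rng=random):
    """Quantilizer sketch: return a draw from the base distribution
    restricted to its top-q fraction by utility.

    base_samples: actions sampled from the base distribution
                  (e.g. behaviors observed in an animal population)
    utility:      scores an action (higher is better)
    q:            fraction to keep, 0 < q <= 1
    """
    ranked = sorted(base_samples, key=utility, reverse=True)
    cutoff = max(1, int(q * len(ranked)))  # keep at least one action
    return rng.choice(ranked[:cutoff])     # never leaves the base support

# Example: actions are numbers, utility is the value itself;
# with q=0.1 the result always comes from the top 10% of samples.
action = quantilize(list(range(100)), lambda x: x, 0.1)
```

The point of the construction is the last comment: because the output is always one of the base samples, it can't be an action wildly outside the population distribution, while still doing better than a random population member on the utility.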


Let's first, within a critical agential ontology, disprove some very basic forms of determinism.

I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions. As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.

Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality including as applied to the self.

AFAIK the best known way of reconciling physical causality with "free will" like choice is constructor theory, which someone pointed out was similar to my critical agential approach.

To expand on strawberries vs diamonds:

It seems to me that the strawberry problem is likely easier than the "turn the universe into diamond" problem. Immediate reasons:

  • the strawberry problem is bounded in space and time
  • strawberry materials can be conveniently placed close to the strawberry factory
  • turning the universe into diamond requires nanobots to burrow through a variety of materials
  • turning the universe into diamond requires overcoming all territorial adversaries trying to protect themselves from nanobots
  • turning the universe into diamond requires not sabotaging the nanobots' energy and other resources in the process, whereas the strawberry factory can be separated from the strawberries
  • turning the universe into diamond is more likely to run into arcane physics (places where our current physics theories are wrong or incomplete, e.g. black holes)

In more detail, here's how a strawberry nanofactory might work:

  • a human thinks about how to design nanotech, what open problems there are, what modular components to factor the problem into
  • an AI system solves some of these problems, designing components that pass a wide variety of test cases; some test cases are in physical simulation and some are real-world small cases (e.g. scanning a small cluster of cells). There might also be some mathematical proofs that the components satisfy certain properties under certain assumptions about physics.
  • one of these components is for creating the initial nanobots from cells. Nanotech engineers can think about what sub-problems there are (e.g. protein folding) and have AI systems help solve these problems.
  • one of these components is for scanning a strawberry. The nanobots burrow into the strawberry bit by bit, taking sensory readings that are sent to a computer.
  • one of these components is for inferring the strawberry structure from readings. This can be approximate Bayesian inference (like a diffusion model in voxel space), given that there are enough sensory readings that the inference problem isn't especially difficult. Priors can be put in for expecting the strawberry to have cells, etc., and some of these expectations can be learned from data.
  • one of these components is for translating a strawberry voxel map to a real strawberry. This is like a 3d printer. The nanobots need to move to strawberry materials, gather them, move to the strawberry printing location, and deposit the materials at the right place. Some of this involves an expanding process where a machine builds components that build other components, similar to a ribosome.
  • a big computer might be necessary for some of these steps; nanobots could help build this computer from designs that are mostly created by humans, but faster than present designs due to the additional physical possibilities opened by nanotechnology

None of this requires long-term (>1 month) consequentialism.
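The scan-then-infer steps above can be sketched at toy scale as Bayesian occupancy inference over a voxel grid. Note this uses a standard log-odds occupancy update rather than the diffusion-model variant mentioned above, and the function names and sensor-noise parameters are illustrative assumptions:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def infer_voxels(readings, n_voxels, prior=0.5, p_hit=0.9, p_false=0.2):
    """Toy Bayesian inference of a voxel occupancy map from noisy
    sensor readings. `readings` is a list of (voxel_index, detected)
    pairs; the sensor fires with probability p_hit on occupied voxels
    and p_false on empty ones. Returns posterior occupancy
    probabilities, one per voxel."""
    log_odds = [logit(prior)] * n_voxels
    for i, detected in readings:
        if detected:
            log_odds[i] += math.log(p_hit / p_false)        # evidence for occupied
        else:
            log_odds[i] += math.log((1 - p_hit) / (1 - p_false))  # evidence for empty
    return [1 / (1 + math.exp(-l)) for l in log_odds]

# Example: voxel 0 detected repeatedly, voxel 1 repeatedly not detected.
probs = infer_voxels([(0, True)] * 5 + [(1, False)] * 5, n_voxels=2)
```

With enough readings per voxel the posterior concentrates sharply, which is the sense in which "enough sensory readings" makes the inference problem not especially difficult; richer priors (cells, tissue structure) would correlate voxels rather than treating them independently as this sketch does.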

I didn't write that reply (or this one) using the method. IMO it's more appropriate to longform.

AI improving itself is most likely to look like AI systems doing R&D in the same way that humans do. “AI smart enough to improve itself” is not a crucial threshold; AI systems will get gradually better at improving themselves. Eliezer appears to expect AI systems to perform extremely fast recursive self-improvement before those systems are able to make superhuman contributions to other domains (including alignment research), but I think this is mostly unjustified. If Eliezer doesn’t believe this, then his arguments about the alignment problem that humans need to solve appear to be wrong.

One different way I've been thinking about this issue recently is that humans have fundamental cognitive limits (e.g. brain size) that AGI wouldn't have. There are possible biotech interventions to lift these, but even the easiest ones (e.g. just increasing skull size) would take decades to get going. AI, meanwhile, could be improved (by humans and AIs) on much faster timescales. (How much something like brain size matters depends on how much intellectual progress is explained by max intelligence rather than total intelligence; a naive reading of intellectual history would say max intelligence is important, given that a high percentage of relevant human knowledge follows from <100 important thinkers.)

This doesn't lead me to assign high probability to "takeoff in 1 month", my expectation is still that AI improving AI will be an extension of humans improving AI (and then centaurs improving AI), but the iteration cycle time could be a lot faster due to AIs not having fundamental human cognitive limits.

“myopia” (not sure who correctly named this as a corrigibility principle),

I think this is from Paul Christiano, e.g. this discussion.

I've been thinking recently that AI alignment might be better thought of as a subfield of cognitive science than of either AI (since AI focuses on artificial agents, not human values) or philosophy (since philosophy is too open-ended); cognitive science is a finite endeavor (due to the limited size of the human brain) compatible with executable philosophy.

It seems to me that an approach that would "work" for AI alignment (in the sense of solving or reframing it) would be to understand the human mind well enough to determine whether it has "values" / "beliefs" / etc.; if it does, then an aligned AI can be programmed to be aligned with these values/beliefs; if it doesn't, then the AI alignment problem must be reframed so as to be meaningful and meaningfully solvable. This isn't guaranteed to work "in time", but it seems to have the virtue of eventually working at all, which is nice.

(Btw, although I got my degree in computer science / AI, I worked in Noah Goodman's lab at Stanford on cognitive science and probabilistic programming, see for an intro to this lab's approach)
