Charlie Steiner

LW1.0 username Manfred. Day job is condensed matter physics, hobby is thinking I know how to assign anthropic probabilities.

Charlie Steiner's Comments

Sparsity and interpretability?

I feel like this is trying to apply a neural network where the problem specification says "please train a decision tree." Even when you are fine with part of the NN not being sparse, it seems like you're just using the gradient descent training as an elaborate img2vec method.

Maybe the idea is that you think a decision tree is too restrictive, and you want to allow more weightings and nonlinearities? Still, it seems like if you can specify from the top down what operations are "interpretable," this will give you some tree-like structure that can be trained in a specialized way.
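
To make that concrete, here's a minimal sketch (mine, not from the post) of the "elaborate img2vec" reading: a frozen pretrained network only supplies embeddings, and the interpretable, tree-like structure is an ordinary decision tree trained top-down on those embeddings. The ResNet backbone and the random toy data are placeholders, assuming torchvision and scikit-learn are available.

```python
# Minimal sketch: use the NN purely as img2vec, train the interpretable part directly.
# Placeholder backbone and toy data -- none of this is from the original post.
import torch
import torchvision.models as models
from sklearn.tree import DecisionTreeClassifier

# Frozen feature extractor: chop the classification head off a pretrained ResNet.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

def img2vec(images: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (N, 3, 224, 224) to embedding vectors, no gradients."""
    with torch.no_grad():
        return backbone(images)

# Hypothetical data; swap in a real dataloader.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 2, (64,))

# The "interpretable" part is a plain decision tree trained top-down on the
# embeddings, rather than a sparse module trained by gradient descent.
tree = DecisionTreeClassifier(max_depth=4)
tree.fit(img2vec(images).numpy(), labels.numpy())
print(tree.get_depth(), tree.get_n_leaves())
```

The point is just that once you've conceded the features themselves won't be sparse or interpretable, the interpretable part on top doesn't need gradient descent at all.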

Human instincts, symbol grounding, and the blank-slate neocortex

This just showed up on my front page, for some reason. So, it occurs to me that the example of the evolved FPGA is precisely the nightmare scenario for the CCA hypothesis.

Even if neurons follow simple rules during growth, and development involves only smooth modulations of chemical signals, you could still end up with regions of cortex that look very similar but whose cells exploit hardly-noticeable, FPGA-style quirks of physics in different ways. You'd have to detect the difference by luckily choosing the right sort of computational property to measure.

Pessimism over AGI/ASI causing psychological distress?

Nobody is going to take time off from their utopia to spend resources torturing me. Now, killing me on the way to world domination is more plausible, but if someone solves all the technical problems required to actually use AI for world domination, the odds are still strongly in favor of them being some generally nice, cosmopolitan person in a lab somewhere.

No, unfortunately, it's far more likely that I will be killed by pure mistake, rather than malice.

You seem to go out of your way to make your thought experiments be about foreigners targeting you. Have you considered that maybe your concerns about AI here are an expression of an underlying anxiety about bad foreigners?

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

I am usually opposed on principle to calling something "SSA" as a description of limiting behavior rather than inside-view reasoning, but I know what you mean and yes I agree :P

I am still surprised that everyone is just taking Solomonoff induction at face value here and not arguing for anthropics. I might need to write a follow-up post to defend the Presumptuous Philosopher, because I think there's a real case that Solomonoff induction actually is missing something. I bet I can make it do perverse things in decision problems that involve being copied.

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

Good catch - I'm missing some extra factors of 2 (on average).

And gosh, I expected more people defending the anthropics side of the dilemma here.

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

Yeah, the log(n) is only the absolute minimum. If you're specifying yourself mostly by location, for example, then distinguishing between n different locations takes at least log(n) bits on average (and in practice more).

But I think it's plausible that the details can be elided when comparing two very similar theories - if the details of the bridging laws are basically the same and we only care about the difference in complexity, that difference might be about log(n).
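
To spell out the counting I'm waving at (my notation, a sketch rather than anything rigorous):

```latex
% Pointing the bridging-law "camera" at one of n interchangeable locations costs,
% averaged over those locations, at least log2(n) bits:
\[
K(\text{bridging law for location } i) \ge \log_2 n
\quad \text{for most } i \in \{1, \dots, n\}.
\]
% If two theories share essentially the same bridging-law machinery and differ only
% in how many candidate locations they allow (n_1 vs. n_2), the shared machinery
% cancels and only the addressing cost remains:
\[
K(T_2) - K(T_1) \approx \log_2 n_2 - \log_2 n_1 = \log_2 \frac{n_2}{n_1}.
\]
```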

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

I'm not really sure what you're arguing for. Yes, I've elided some details of the derivation of average-case complexity of bridging laws (which has gotten me into a few factors of two worth of trouble, as Donald Hobson points out), but it really does boil down to the sort of calculation I sketch in the paragraphs directly after the part you quote. Rather than just saying "ah, here's where it goes wrong" by quoting the non-numerical exposition, could you explain what conclusions you're led to instead?

The Presumptuous Philosopher, self-locating information, and Solomonoff induction

What's the minimum number of bits required to specify "and my camera is here," in such a way that it allows your bridging-law camera to be up to N different places?

In practice I agree that programs won't be able to reach that minimum. But maybe they'll be able to reach it relative to other programs that are also trying to set up the same sorts of bridging laws.
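
Back-of-the-envelope, with the stock trillion-to-one Presumptuous Philosopher ratio standing in for N (the numbers are purely illustrative, not from this thread):

```python
import math

def min_bits_to_address(n_locations: int) -> float:
    """Information-theoretic floor: picking out one of n locations costs log2(n) bits."""
    return math.log2(n_locations)

# Illustrative ratio: the big theory has a trillion times more candidate places
# the bridging-law "camera" could be.
ratio = 10**12
extra_bits = min_bits_to_address(ratio)
print(f"Extra camera-placement bits for the big theory: {extra_bits:.1f}")  # ~39.9

# The corresponding prior penalty factor is about 2**extra_bits, i.e. roughly the
# ratio itself -- which is exactly the bookkeeping this thread is haggling over.
print(f"Prior penalty factor: {2 ** extra_bits:.3g}")  # ~1e12
```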

An overview of 11 proposals for building safe advanced AI

I noticed myself mentally grading the entries by some extra criteria. The main ones being something like "taking-over-the-world competitiveness" (TOTWC, or TOW for short) and "would I actually trust this farther than I could throw it, once it's trying to operate in novel domains?" (WIATTFTICTIOITTOIND, or WIT for short).

A raw statement of my feelings:

  1. Reinforcement learning + transparency tool: High TOW, Very Low WIT.
  2. Imitative amplification + intermittent oversight: Medium TOW, Low WIT.
  3. Imitative amplification + relaxed adversarial training: Medium TOW, Medium-low WIT.
  4. Approval-based amplification + relaxed adversarial training: Medium TOW, Low WIT.
  5. Microscope AI: Very Low TOW, High WIT.
  6. STEM AI: Low TOW, Medium WIT.
  7. Narrow reward modeling + transparency tools: High TOW, Medium WIT.
  8. Recursive reward modeling + relaxed adversarial training: High TOW, Low WIT.
  9. AI safety via debate with transparency tools: Medium-low TOW, Low WIT.
  10. Amplification with auxiliary RL objective + relaxed adversarial training: Medium TOW, Medium-low WIT.
  11. Amplification alongside RL + relaxed adversarial training: Medium-low TOW, Medium WIT.