Richard_Ngo

Former AI safety research engineer, now PhD student in philosophy of ML at Cambridge. I'm originally from New Zealand but have lived in the UK for 6 years, where I did my undergrad and master's degrees (in Computer Science, Philosophy, and Machine Learning). Blog: thinkingcomplete.blogspot.com

Sequences

Shaping safer goals

Comments

ricraz's Shortform

Greg Egan on universality:

I believe that humans have already crossed a threshold that, in a certain sense, puts us on an equal footing with any other being who has mastered abstract reasoning. There’s a notion in computing science of “Turing completeness”, which says that once a computer can perform a set of quite basic operations, it can be programmed to do absolutely any calculation that any other computer can do. Other computers might be faster, or have more memory, or have multiple processors running at the same time, but my 1988 Amiga 500 really could be programmed to do anything my 2008 iMac can do — apart from responding to external events in real time — if only I had the patience to sit and swap floppy disks all day long. I suspect that something broadly similar applies to minds and the class of things they can understand: other beings might think faster than us, or have easy access to a greater store of facts, but underlying both mental processes will be the same basic set of general-purpose tools. So if we ever did encounter those billion-year-old aliens, I’m sure they’d have plenty to tell us that we didn’t yet know — but given enough patience, and a very large notebook, I believe we’d still be able to come to grips with whatever they had to say.
Coherent behaviour in the real world is an incoherent concept

What is missing here is an argument that the VNM theorem does have important implications in settings where its assumptions are not true. Nobody has made this argument. I agree it's suggestive, but that's very far from demonstrating that AGIs will necessarily be ruthlessly maximising some simple utility function.

"obviously we don't expect a superintelligent AI to be predictably stupid in the way Eliezer lines out"

Eliezer argued that superintelligences will have certain types of goals (roughly, maximisation of expected utility), because of the VNM theorem. If they have different types of goals, then behaviour which violates the VNM axioms is no longer "predictably stupid". For example, if I have a deontological goal, then violating the VNM axioms may be the best strategy available to me.
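To illustrate the kind of mismatch I have in mind (this is my own framing of a standard observation about lexical preferences, not anything from Eliezer's argument): a strictly deontological agent violates the VNM continuity axiom, so the representation theorem simply does not apply to it.

```latex
% A minimal sketch (my own framing): lexical deontological preferences
% violate the VNM continuity axiom, so the representation theorem
% does not apply to them.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\textbf{Continuity axiom:} for all lotteries $A \succ B \succ C$, there is some
$p \in (0,1)$ with
\[
  pA + (1-p)C \sim B .
\]
Suppose $C$ involves violating a deontological constraint while $A$ and $B$ do
not, and the agent lexically ranks any lottery with positive probability of a
violation below every lottery with none. Then
\[
  pA + (1-p)C \prec B \quad \text{for every } p \in (0,1),
\]
so no such $p$ exists, continuity fails, and no expected-utility representation
of these preferences exists.
\end{document}
```

Such an agent's refusals look VNM-incoherent, but they aren't "predictably stupid" relative to its own goals; they just fall outside the theorem's assumptions.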

ricraz's Shortform

It feels partly like an incentives problem, but also I think a lot of people around here are altruistic and truth-seeking and just don't realise that there are much more effective ways to contribute to community epistemics than standard blog posts.

I think that most LW discussion is at the level where "paying for mistakes" wouldn't be that helpful, since a lot of it is fuzzy. Probably the thing we need first is more reference posts that distill a range of discussion into key concepts and place them in the wider intellectual context. Then we can get more empirical. (Although I feel pretty biased on this point, because my own style of learning about things is very top-down.) I guess to encourage this, we could add a "reference" section for posts that aim to distill ongoing debates on LW.

In some cases you can get a lot of "cheap" credit by taking other people's ideas and writing a definitive version of them aimed at more mainstream audiences. For ideas that are really worth spreading, that seems useful.

ricraz's Shortform

I wanted to register that I don't like "babble and prune" as a model of intellectual development. I think intellectual development actually looks more like:

1. Babble

2. Prune

3. Extensive scholarship

4. More pruning

5. Distilling scholarship to form common knowledge

And that my main criticism is the lack of 3 and 5, not the lack of 2 or 4.

I also note that: a) these steps get monotonically harder, so focusing on the first two misses *almost all* of the work; b) maybe I'm being too harsh on the babble-and-prune framework because it's so thematically appropriate for me to dunk on it here; I'm not sure whether your use of the terminology actually reveals a substantive disagreement.

ricraz's Shortform

"If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive."

Hmm, it sounds like we agree on the solution but are emphasising different parts of it. For me, the question is: who's this "we" that should accept fewer ideas? It's the set of people who agree with my argument that you shouldn't believe things which haven't been fleshed out very much. But the easiest way to add people to that set is just to make the argument, which is what I've done. Specifically, note that I'm not criticising anyone for producing posts that are short and speculative: I'm criticising the people who update too much on those posts.

Mathematical Inconsistency in Solomonoff Induction?

"That is, you could say something like "It's the list of all primes OR the list of all squares. Compressed data: first number is zero""

Just to clarify here (because it took me a couple of seconds): you only need the first number of the compressed data because that is sufficient to distinguish whether you have a list of primes or a list of squares. But as Pongo said, you could describe that same list in a much more compressed way by skipping the irrelevant half of the OR statement.
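As a rough sketch of why the disjunctive description loses (the toy "programs" below are strings of my own invention, just standing in for program length in Solomonoff induction):

```python
# Toy comparison of description lengths. The strings below are hypothetical
# stand-ins for programs; only their lengths matter for the point being made.

disjunctive_program = (
    "flag = 0\n"              # the one extra number that picks the branch
    "if flag == 0:\n"
    "    seq = primes()\n"    # the branch we actually want
    "else:\n"
    "    seq = squares()\n"   # the irrelevant half of the OR
)

direct_program = "seq = primes()\n"

# The disjunctive version pays for both branches plus the disambiguating
# number, so it is strictly longer than just writing down the relevant branch.
assert len(direct_program) < len(disjunctive_program)
print(len(direct_program), len(disjunctive_program))
```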

Mathematical Inconsistency in Solomonoff Induction?

My understanding is that a hypothesis is a program which generates a complete prediction of all observations. So there is no specific hypothesis (X OR Y), for the same reason that there is no sequence of numbers which is (list of all primes OR list of all squares).

Note that by "complete prediction of all observations" I don't mean things like "tomorrow you'll see a blackbird", but rather the sense in which you get an observation in an MDP or POMDP. If you imagine watching the world through a screen with a given frame rate, every hypothesis has to predict every single pixel of that screen, for each frame.
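A minimal sketch of what I mean (the frame size and pixel rule here are invented purely for illustration): a hypothesis is a program that emits the entire observation stream, frame by frame.

```python
# Hypothetical example: a "hypothesis" in the Solomonoff-induction sense is a
# program that deterministically generates every observation, here rendered as
# full frames of pixels. The particular pixel rule is arbitrary.

def hypothesis(width=4, height=4):
    """Yield successive frames, each a complete grid of pixel values."""
    t = 0
    while True:
        frame = [[(x + y + t) % 2 for x in range(width)] for y in range(height)]
        yield frame
        t += 1

# Solomonoff induction keeps every program whose output stream matches all
# observations so far, weighting shorter programs more heavily.
gen = hypothesis()
first_frames = [next(gen) for _ in range(3)]
```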

I don't know where this is explained properly though. In fact I think a proper explanation, which explains how these idealised "hypotheses" relate to hypotheses in the human sense, would basically need to explain what thinking is and also solve the entire embedded agency agenda. For that reason, I place very little weight on claims linking Solomonoff induction to bounded human or AI reasoning.

ricraz's Shortform

Also, I liked your blog post! More generally, I strongly encourage bloggers to have a "best of" page, or something that directs people to good posts. I'd be keen to read more of your posts but have no idea where to start.

ricraz's Shortform

Thanks, these links seem great! I think this is a good (if slightly harsh) way of making a similar point to mine:

"I find that autodidacts who haven’t experienced institutional R&D environments have a self-congratulatory low threshold for what they count as research. It’s a bit like vanity publishing or fan fiction. This mismatch doesn’t exist as much in indie art, consulting, game dev etc"

ricraz's Shortform

I feel like this comment isn't critiquing a position I actually hold. For example, I don't believe that "the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours". I am happy for people to work towards building evidence for their hypotheses in many ways, including fleshing out details, engaging with existing literature, experimentation, and operationalisation.

Perhaps this makes "proven claim" a misleading phrase to use. Perhaps more accurate to say: "one fully fleshed out theory is more valuable than a dozen intuitively compelling ideas". But having said that, I doubt that it's possible to fully flesh out a theory like simulacra levels without engaging with a bunch of academic literature and then making predictions.

I also agree with Raemon's response below.
