Send me anonymous feedback: https://docs.google.com/forms/d/e/1FAIpQLScLKiFJbQiuRYBhrBbVYUo_c6Xf0f8DN_blbfpJ-2Ml39g1zA/viewform

Any type of feedback is welcome, including arguments that a post/comment I wrote is net negative.

Some quick info about me:

I have a background in computer science (BSc+MSc; my MSc thesis was in NLP and ML, though not in deep learning).

You can also find me on the EA Forum.

Feel free to reach out by sending me a PM here or on my website.


Why I Am Not in Charge

It’s quite the endorsement to be called the person most likely to get things right.

I couldn't find such an endorsement in Scott Alexander's linked post. The closest thing I could find was:

I can't tell you how many times over the past year all the experts, the CDC, the WHO, the New York Times, et cetera, have said something (or been silent about something in a suggestive way), and then some blogger I trusted said the opposite, and the blogger turned out to be right. I realize this kind of thing is vulnerable to selection bias, but it's been the same couple of bloggers throughout, people who I already trusted and already suspected might be better than the experts in a lot of ways. Zvi Mowshowitz is the first name to come to mind, though there are many others.

If I'm missing something please let me know. I downvoted the OP and wrote this comment because I think and feel that such inaccuracies are bad (even if not intentional) and I don't want them to occur on LW.

Dario Amodei leaves OpenAI

Yesterday Paul Christiano announced that he left OpenAI.

(USA) N95 masks are available on Amazon

I think it's worth checking whether the manufacturer's website supports some verification procedure (in which the customer types in some unique code that appears on the respirator). Consider googling the term: [manufacturer name] validation.


(USA) N95 masks are available on Amazon

To support/add-to what ErickBall wrote, my own personal experience with respirators is that one with headbands (rather than ear loops) and a nose clip + nose foam is more likely to seal well.

ofer's Shortform

[COVID-19 related]

It was nice to see this headline:



Short summary of mAIry's room

The topic of risks related to morally relevant computations seems very important, and I hope a lot more work will be done on it!

My tentative intuition is that learning is not directly involved here. If the weights of a trained RL agent are no longer being updated after some point[1], my intuition is that the model is similarly likely to experience pain before and after that point (assuming the environment stays the same).

Consider the following hypothesis, which does not involve a direct relationship between learning and pain: at sufficiently large scale (and in sufficiently complex environments), TD learning tends to create components within the network, call them "evaluators", that evaluate certain metrics that correlate with expected return. In practice the model is trained to optimize directly for the output of the evaluators (and maximizing the output of the evaluators becomes the mesa objective). Suppose we label possible outputs of the evaluators with "pain" and "pleasure". We get something that seems analogous to humans. A human cares directly about pleasure and pain (which are things that were correlated with expected evolutionary fitness in the ancestral environment), even when those things don't affect their evolutionary fitness accordingly (e.g. pleasure from eating chocolate, and pain from getting a vaccine shot).

  1. In TD learning, if from some point the model always perfectly predicted the future, the gradient would always be zero and no weights would be updated. Also, if an already-trained RL agent is being deployed, and there's no longer reinforcement learning going on after deployment (which seems like a plausible setup in products/services that companies sell to customers), the weights would obviously not be updated. ↩︎
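To make the first claim in the footnote concrete, here is a minimal sketch using tabular TD(0) rather than a neural network (that substitution is mine, as are the toy 3-state chain, the function names, and the values of `alpha` and `gamma`). It only illustrates that when the value estimates already match the true returns, every TD error is zero and nothing gets updated.

```python
def td0_update(values, state, reward, next_state, alpha, gamma):
    """Apply one tabular TD(0) update in place and return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


def sweep(values, transitions, alpha=0.1, gamma=0.5):
    """Run TD(0) once over each (state, reward, next_state) transition."""
    return [td0_update(values, s, r, s_next, alpha, gamma)
            for (s, r, s_next) in transitions]


# Hypothetical deterministic chain: state 0 -> state 1 -> state 2 (terminal),
# with reward 1.0 on the last step. All of this is an illustrative assumption.
transitions = [(0, 0.0, 1), (1, 1.0, 2)]

# With gamma = 0.5 the true values are V(2) = 0, V(1) = 1.0, V(0) = 0.5.
perfect = [0.5, 1.0, 0.0]
print(sweep(perfect, transitions), perfect)   # all TD errors are zero

# Starting from an imperfect estimate, the TD error is nonzero and V changes.
imperfect = [0.0, 0.0, 0.0]
print(sweep(imperfect, transitions), imperfect)
```

The second case in the footnote (a deployed agent with no RL after deployment) corresponds to simply never calling `td0_update`, so the values stay fixed regardless of the TD error.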

Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain

My understanding is that the 2020 algorithms in Ajeya Cotra's draft report refer to algorithms that train a neural network on a given architecture (rather than algorithms that search for a good neural architecture etc.). So the only "special sauce" that can be found by such algorithms is one that corresponds to special weights of a network (rather than special architectures etc.).

Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain

Great post!

we’ll either have to brute-force search for the special sauce like evolution did

I would drop the "brute-force" here (evolution is not a random/naive search).

Re the footnote:

This "How much special sauce is needed?" variable is very similar to Ajeya Cotra's variable "how much compute would lead to TAI given 2020's algorithms."

I don't see how they are similar.

Why I'm excited about Debate

One might argue:

We don't need the model to use that much optimization power, to the point where it breaks the operator. We just need it to perform roughly at human-level, and then we can just deploy many instances of the trained model and accomplish very useful things (e.g. via factored cognition).

So I think it's important to also note that getting a neural network to "perform roughly at human-level in an aligned manner" may be a much harder task than getting a neural network to achieve maximal rating by breaking the operator. The former may be a much narrower target. This point is closely related to what you wrote here in the context of amplification:

Speaking of inexact imitation: It seems to me that having an AI output a high-fidelity imitation of human behavior, sufficiently high-fidelity to preserve properties like "being smart" and "being a good person" and "still being a good person under some odd strains like being assembled into an enormous Chinese Room Bureaucracy", is a pretty huge ask.

It seems to me obvious, though this is the sort of point where I've been surprised about what other people don't consider obvious, that in general exact imitation is a bigger ask than superior capability. Building a Go player that imitates Shuusaku's Go play so well that a scholar couldn't tell the difference, is a bigger ask than building a Go player that could defeat Shuusaku in a match. A human is much smarter than a pocket calculator but would still be unable to imitate one without using a paper and pencil; to imitate the pocket calculator you need all of the pocket calculator's abilities in addition to your own.

Correspondingly, a realistic AI we build that literally passes the strong version of the Turing Test would probably have to be much smarter than the other humans in the test, probably smarter than any human on Earth, because it would have to possess all the human capabilities in addition to its own. Or at least all the human capabilities that can be exhibited to another human over the course of however long the Turing Test lasts. [...]

Gradient hacking

It does seem useful to make the distinction between thinking about what gradient hacking failures look like in worlds where they cause an existential catastrophe, and thinking about how to best pursue empirical research today about gradient hacking.
