Jack R



Flash Classes: Pendulums, Policy-Level Decisionmaking, Saving State

"Go with your gut" [...] [is] insensitive to circumstance.

People's guts seem very sensitive to circumstance, especially compared to commitments.

The alignment problem from a deep learning perspective

But the capabilities of neural networks are currently advancing much faster than our ability to understand how they work or interpret their cognition;

Naively, you might think that as opacity increases, trust in systems decreases, and hence something like "willingness to deploy" decreases. 

How good of an argument does this seem to you against the hypothesis that "capabilities will grow faster than alignment"? I'm viewing the quoted sentence as an argument for the hypothesis.

Some initial thoughts:

  • A highly capable system doesn't necessarily need to be deployed by humans to disempower humans, meaning "deployment" is not necessarily a good concept to use here
  • On the other hand, deployability of systems increases investment in AI (by how much?), meaning that increasing opacity might in some sense decrease future capabilities compared to counterfactuals where the AI was less opaque
  • I don't know how much willingness to deploy really decreases from increased opacity, if at all
  • Opacity can be thought of as the inability to predict behavior in a given new environment. As models have scaled, the number of benchmarks we test them on also seems to have scaled, which does help us understand their behavior. So perhaps the measure that's actually important is the "difference between tested behavior and deployed behavior" and it's unclear to me what this metric looks like over time. [ETA: it feels obvious that our understanding of AI's deployed behavior has worsened, but I want to be more specific and sure about that]
Will working here advance AGI? Help us not destroy the world!

I was thinking of the possibility of affecting decision-making, either directly by rising through the ranks (not very likely) or indirectly by being an advocate for safety at an important time and pushing things into the Overton window within an organization.

I imagine Habryka would say that a significant possibility here is that joining an AGI lab will wrongly turn you into an AGI enthusiast. I think biasing effects like that are real, though I also think it's hard to tell in cases like that how much you are biased vs. updating correctly on new information, and one could make similar bias claims about the AI x-risk community (e.g. there is social pressure to be doomy; being exposed only to heuristic arguments for doom and few heuristic arguments for optimism will bias you toward being doomier than you would be given more information).

Will working here advance AGI? Help us not destroy the world!

It seems like you are confident that the delta in capabilities would outweigh any delta in general alignment sympathy. Is this what you think?

A central AI alignment problem: capabilities generalization, and the sharp left turn

Attempting to manually specify the nature of goodness is a doomed endeavor, of course, but that's fine, because we can instead specify processes for figuring out (the coherent extrapolation of) what humans value. […] So today's alignment problems are a few steps removed from tricky moral questions, on my models.

I'm not convinced that choosing those processes is significantly non-moral. I might be misunderstanding what you are pointing at, but the fact that being able to choose the voting system gives you power over the vote's outcome seems like evidence of this sort of thing: that meta-level decisions are still importantly tied to object-level decisions.
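The voting-system point can be made concrete with a toy example (mine, not from the thread; the ballots and rules below are illustrative assumptions): the very same ranked ballots can elect different winners under different aggregation rules, so whoever picks the rule partially picks the outcome.

```python
from collections import Counter

# Hypothetical ranked ballots: each voter lists candidates best-to-worst.
ballots = [
    ("A", "B", "C"), ("A", "B", "C"), ("A", "B", "C"),
    ("B", "C", "A"), ("B", "C", "A"),
    ("C", "B", "A"), ("C", "B", "A"),
]

def plurality_winner(ballots):
    """Winner = candidate with the most first-place votes."""
    counts = Counter(b[0] for b in ballots)
    return max(counts, key=counts.get)

def borda_winner(ballots):
    """Winner = candidate with the highest Borda score
    (rank i out of n earns n - 1 - i points)."""
    scores = Counter()
    n = len(ballots[0])
    for b in ballots:
        for i, cand in enumerate(b):
            scores[cand] += n - 1 - i
    return max(scores, key=scores.get)

print(plurality_winner(ballots))  # -> A
print(borda_winner(ballots))     # -> B
```

Same voters, same preferences; plurality elects A while Borda elects B, which is the sense in which the meta-level choice of process is itself a value-laden decision.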

Criticism of EA Criticism Contest

I think there should be a word for your parsing, maybe "VNM utilitarianism," but I think most people mean roughly what's on the wiki page for utilitarianism:

Utilitarianism is a family of normative ethical theories that prescribe actions that maximize happiness and well-being for all affected individuals

Where I agree and disagree with Eliezer

It's not obvious to me that the class of counter-examples "expertise, in most fields, is not easier to verify than to generate" are actually counter-examples. For example, for "if you're not a hacker, you can't tell who the good hackers are," it still seems like it would be easier to verify whether a particular hack will work than to come up with it yourself, starting off without any hacking expertise.
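The verify/generate asymmetry being appealed to here can be illustrated with a toy problem (my example, not the post's): for subset-sum, checking a proposed solution takes one linear pass, while finding one by brute force takes exponential time in the worst case.

```python
from itertools import combinations

def verify(nums, subset, target):
    """Cheap: check that `subset` is drawn from `nums` and sums to `target`."""
    pool = list(nums)
    for x in subset:
        if x not in pool:
            return False
        pool.remove(x)  # each element of nums may be used at most once
    return sum(subset) == target

def generate(nums, target):
    """Expensive: search all 2^n subsets for one summing to `target`."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return subset
    return None

nums = [3, 34, 4, 12, 5, 2]
solution = generate(nums, 9)      # brute-force search finds (4, 5)
assert verify(nums, solution, 9)  # verifying it is a single cheap pass
```

Of course, whether hacking (or alignment research) has this structure is exactly the question under dispute; the sketch just shows the asymmetry is possible, not that it holds in any given field.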

Human values & biases are inaccessible to the genome

Could you clarify a bit more what you mean when you say "X is inaccessible to the human genome?"

Information Loss --> Basin flatness

Ah okay -- based on that description I have updated positively on the project's usefulness, and have also updated positively on the hypothesis "I am missing a lot of important information that contextualizes this project," though I'm still confused.

I'd be interested to know the causal chain from understanding circuit simplicity to the future being better, but maybe I should just stay posted (or maybe there is a different post I should read that you can link me to; or maybe the impact is diffuse and talking about any particular path doesn't make that much sense [though even in this case my guess is that it is still helpful to have at least one possible impact story]).

Also, I just want to make clear that I made my original comment because I figured sharing my user experience would be helpful (e.g. by prompting a sentence about the theory of change), and hopefully not with the effect of being discouraging / being a downer.

Information Loss --> Basin flatness

I didn't finish reading this, but if it were the case that:

  • There were clear and important implications of this result for making the world better (via aligning AGI)
  • These implications were stated in the summary at the beginning

then I very plausibly would have finished reading the post or saved it for later.

ETA: For what it's worth, I still upvoted and liked the post, since I think deconfusing ourselves about stuff like this is plausibly very good and at the very least interesting. I just didn't like it enough to finish reading it or save it, because from my perspective its expected usefulness wasn't high enough given the information I had.
