evhub's Comments

Understanding “Deep Double Descent”

Note that double descent also happens with polynomial regression—see here for an example.
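
For anyone who wants to try this themselves, here is a minimal sketch of the kind of experiment I have in mind. It is not the linked example: the target function, noise level, Legendre basis, and every function name below are assumptions picked for illustration. It fits polynomial features by least squares and switches to the minimum-norm interpolating solution once there are more features than training points, so you can look at test error as the feature count passes the number of samples.

```python
import math
import random

def legendre_features(x, num_features):
    """Return [P_0(x), ..., P_{num_features-1}(x)] via the three-term recurrence."""
    feats = [1.0, x][:num_features]
    for k in range(1, num_features - 1):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        feats.append(((2 * k + 1) * x * feats[k] - k * feats[k - 1]) / (k + 1))
    return feats

def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_min_norm(X, y):
    """Least squares if p <= n, otherwise the minimum-norm interpolating weights."""
    n, p = len(X), len(X[0])
    if p <= n:
        # w = (X^T X)^{-1} X^T y
        gram = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
        return solve(gram, rhs)
    # w = X^T (X X^T)^{-1} y
    gram = [[sum(X[i][k] * X[j][k] for k in range(p)) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]

def predict(w, x):
    return sum(wi * fi for wi, fi in zip(w, legendre_features(x, len(w))))

def double_descent_curve(n_train=8, max_features=24, noise=0.1, seed=0):
    """Return a list of (num_features, train_mse, test_mse) tuples."""
    rng = random.Random(seed)
    target = lambda x: math.sin(3 * x)  # assumed ground-truth function
    xs = [-1 + 2 * i / (n_train - 1) for i in range(n_train)]
    ys = [target(x) + rng.gauss(0, noise) for x in xs]
    x_test = [-1 + 2 * i / 199 for i in range(200)]
    results = []
    for p in range(1, max_features + 1):
        w = fit_min_norm([legendre_features(x, p) for x in xs], ys)
        train = sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / n_train
        test = sum((predict(w, x) - target(x)) ** 2 for x in x_test) / len(x_test)
        results.append((p, train, test))
    return results

if __name__ == "__main__":
    for p, train, test in double_descent_curve():
        print(f"features={p:2d}  train_mse={train:.2e}  test_mse={test:.2e}")
```

Training error should hit roughly zero once the feature count reaches the number of training points. How pronounced the test-error peak and the second descent are will depend on the basis, the noise, and the seed, so treat any single run as suggestive rather than conclusive.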

Understanding “Deep Double Descent”

I wonder if this is a neural network thing, an SGD thing, or a both thing?

Neither, actually—it's more general than that. Belkin et al. show that it happens even for simple models like decision trees. Also see here for an example with polynomial regression.

Are you aware of this work and the papers they cite?

Yeah, I am. I definitely think that stuff is good, though ideally I want something more than just “approximately K-complexity.”

Understanding “Deep Double Descent”

Ah, thanks for the summary. I hadn't fully read that paper yet, though I knew it existed and figured I would link it. That makes sense, though. Seems like in that case the flat vs. sharp minima hypothesis still has a lot going for it, though I'm not sure how that interacts with the lottery ticket hypothesis.

Understanding “Deep Double Descent”

Thanks! And good catch—should be fixed now.

What are some non-purely-sampling ways to do deep RL?

Yep—that's the adversarial training approach to this problem. The problem is that you might not be able to sample all the relevant highly uncertain points (e.g. because you don't know exactly what the deployment distribution will be), which means you have to do some sort of relaxed adversarial training instead, which introduces its own issues.

What are some non-purely-sampling ways to do deep RL?

This is really neat; thanks for the pointer!

What are some non-purely-sampling ways to do deep RL?

Hmmm... not sure if this is exactly what I want. I'd prefer not to assume too much about the environment dynamics. I'm not sure if this is related to what you're talking about, but here is one possible way to do model-based planning with an explicit reward function without assuming much about the environment dynamics: learn all the dynamics necessary for model-based planning in a model-free way (like MuZero), except for the reward function, and then include the reward function explicitly.
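
To make that a bit more concrete, here is a toy sketch of the split I'm describing. Everything in it is an assumption for illustration: the chain environment, the tabular dynamics table (standing in for a learned MuZero-style dynamics network, which this is not), and the exhaustive lookahead (standing in for tree search). The only point is that the planner queries a learned transition model for dynamics while the reward comes from a hand-written function.

```python
import itertools
import random

N_STATES, GOAL, ACTIONS = 6, 5, (-1, +1)  # hypothetical toy chain environment

def true_step(state, action):
    """Real environment dynamics; used only to generate experience."""
    return min(max(state + action, 0), N_STATES - 1)

def explicit_reward(state):
    """Hand-written reward, given to the planner directly rather than learned."""
    return 1.0 if state == GOAL else 0.0

def learn_dynamics(num_samples=500, seed=0):
    """Fit a tabular transition model from randomly sampled (state, action) pairs."""
    rng = random.Random(seed)
    model = {}
    for _ in range(num_samples):
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        model[(s, a)] = true_step(s, a)
    return model

def plan(model, state, depth=5, gamma=0.9):
    """Return the first action of the best depth-limited action sequence.

    Rollouts use the learned model for transitions and the explicit reward
    function for scoring; the true dynamics are never consulted here.
    """
    best_action, best_return = None, float("-inf")
    for seq in itertools.product(ACTIONS, repeat=depth):
        s, total = state, 0.0
        for t, a in enumerate(seq):
            s = model.get((s, a), s)  # unseen pairs: assume the state is unchanged
            total += gamma ** t * explicit_reward(s)
        if total > best_return:
            best_action, best_return = seq[0], total
    return best_action

if __name__ == "__main__":
    model = learn_dynamics()
    print(plan(model, state=0))
```

In a real system the table would be a learned latent dynamics model and the search would be something like MCTS, but the reward function would still be plugged in explicitly in the same place.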

Thoughts on implementing corrigible robust alignment

I really enjoyed this post; thanks for writing this! Some comments:

the AGI uses its understanding of humans to try to figure out what a human would do in a hypothetical scenario.

I think that supervised amplification can also sort of be thought of as falling into this category, in that you often want your model to be internally modeling what an HCH would do in a hypothetical scenario. Of course, if you're training a model using supervised amplification, you might not actually get a model that is just trying to guess what an HCH would do; you might instead get one that is doing something more strategic and/or deceptive. In many cases, though, the goal at least is to get something that's just trying to approximate HCH.

So that suggests an approach of pre-loading this template database with a hardcoded model of a human, complete with moods, beliefs, and so on.

This is actually quite similar to an approach that Nevan Witchers at Google is working on, which is to hardcode a differentiable model of the reward function as a component in your network when doing RL. The idea there is very similar: prevent the model from learning a proxy by giving it direct access to the actual structure of the reward function, rather than having it learn only from rewards observed during training. The two major difficulties I see with this style of approach, however, are that 1) it requires you to have an explicit differentiable model of the reward function and 2) it still requires the model to learn the policy and value functions (the value function being how much future discounted reward the model expects to get using its current policy starting from some state), and learning those could still allow for the introduction of misaligned proxies.

Bottle Caps Aren't Optimisers

Daniel Filan's bottle cap example was featured prominently in "Risks from Learned Optimization" for good reason. I think it is a really clear and useful example of why you might want to care about the internals of an optimization algorithm and not just its behavior, and it helped motivate that framing in the "Risks from Learned Optimization" paper.
