Completed an undergrad in CS and Math at Columbia, where I helped run Columbia Effective Altruism and Columbia AI Alignment Club (CAIAC). I'm pursuing a career in technical AI alignment research (probably).

Wiki Contributions



Here’s our current best guess at how the type signature of subproblems differs from e.g. an outermost objective. You know how, when you say your goal is to “buy some yoghurt”, there’s a bunch of implicit additional objectives like “don’t spend all your savings”, “don’t turn Japan into computronium”, “don’t die”, etc? Those implicit objectives are about respecting modularity; they’re a defining part of a “gap in a partial plan”. An “outermost objective” doesn’t have those implicit extra constraints, and is therefore of a fundamentally different type from subproblems.

Most of the things you think of day-to-day as “problems” are, cognitively, subproblems.

Do you have a starting point for formalizing this? It sounds like subproblems are roughly proxies that could be Goodharted if (common sense) background goals aren't respected. Maybe a candidate starting point for formalizing subproblems, relative to an outermost objective, is "utility functions that closely match the outermost objective in a narrow domain"?


Lots of interesting thoughts, thanks for sharing!

You seem to have an unconventional view about death informed by your metaphysics (suggested by your responses to 56, 89, and 96), but I don’t fully see what it is. Can you elaborate?


Basic idea of 85 is that we generally agree there have been moral catastrophes in the past, such as widespread slavery. Are there ongoing moral catastrophes? I think factory farming is a pretty obvious one. There's a philosophy paper called "The Possibility of an Ongoing Moral Catastrophe" that gives more context.


How is there more than one solution manifold? If a solution manifold is a behavior manifold which corresponds to a global minimum train loss, and we're looking at an overparameterized regime, then isn't there only one solution manifold, which corresponds to achieving zero train loss?