johnswentworth

Sequences

Framing Practicum
Gears Which Turn The World
Abstraction
Gears of Aging
Model Comparison

Comments

Finding the Central Limit Theorem in Bayes' rule

[Don't feel like you have to answer these - they're more just me following up on thoughts I got from your comment]

I accept your affordance, and thank you, this will make me more likely to comment on your posts in the future.

Hypotheses about Finding Knowledge and One-Shot Causal Entanglements

Meta commentary: this post is a great example of how to do the very earliest stages of conceptual research. Well done.

Finding the Central Limit Theorem in Bayes' rule

You've got a solid talent for math research.

Your reasoning here is basically correct; this is why Laplace's approximation typically works very well on large datasets. One big catch is that it requires the number of data points to be large relative to the dimension of the variables. The real world is decidedly high-dimensional, so in practice the conditions for Gaussianity usually hold when we pick some small set of "features" to focus on and then get a bunch of data on those (e.g. as is typically done in academic statistics).
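
A minimal sketch of the one-dimensional case, assuming a coin-flip model with a uniform prior (an illustrative choice, not something from the post). With k successes in n flips the exact posterior is Beta(k+1, n-k+1); the Laplace approximation is a Gaussian centered at the mode k/n, with variance equal to the negative inverse second derivative of the log posterior at the mode, which works out to p(1-p)/n. The function names are hypothetical. The total variation distance between the two should shrink as n grows. Note that this toy is one-dimensional, so it does not exercise the data-versus-dimension catch.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def laplace_tv_distance(n, k, grid=20_000):
    """Hypothetical helper: approximate total variation distance between the
    exact posterior Beta(k+1, n-k+1) (uniform prior, k successes in n flips)
    and its Laplace (Gaussian) approximation, via a midpoint rule on (0, 1)."""
    a, b = k + 1, n - k + 1
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    mode = k / n                  # posterior mode under a uniform prior
    var = mode * (1 - mode) / n   # -1 / (d^2/dp^2 log posterior) at the mode
    dx = 1.0 / grid
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) * dx
        # Exact Beta density, computed in log space to avoid overflow for large n.
        exact = math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta)
        total += abs(exact - normal_pdf(p, mode, var)) * dx
    return 0.5 * total


# Same observed frequency (30% heads) at three dataset sizes.
tv = {n: laplace_tv_distance(n, n * 3 // 10) for n in (10, 100, 1000)}
for n, d in tv.items():
    print(f"n={n:5d}  TV(exact posterior, Laplace approx) ~ {d:.4f}")
```

The grid ignores the small amount of Gaussian mass outside (0, 1), which only matters for the smallest n.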

There's also another, more subtle catch here: in e.g. a large causal model, once we have a decent number of variables, we often have all the info we're going to get about some value of interest, and later updates add basically zero information. Depending on how that plays out, it could mess up the convergence to Gaussianity.
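
A toy illustration of that second catch, under assumptions chosen purely for illustration: the value of interest is theta = phi + eps, the data only ever inform phi (Gaussian observations, Gaussian prior), and eps is an independent Uniform(-1, 1) term that no observation touches. Given the data, theta is the sum of an independent Gaussian (the posterior on phi) and eps, so cumulants add and the posterior variance and excess kurtosis have closed forms. The variance saturates at 1/3 instead of going to zero, and the excess kurtosis drifts toward the uniform's -1.2 rather than the Gaussian's 0. `posterior_theta` and its parameters are hypothetical names.

```python
def posterior_theta(n, prior_var=1.0, noise_var=1.0, half_width=1.0):
    """Hypothetical toy model: theta = phi + eps.
    Data y_1..y_n ~ N(phi, noise_var), prior phi ~ N(0, prior_var);
    eps ~ Uniform(-half_width, half_width) is independent and never observed.
    Returns (posterior variance of theta, posterior excess kurtosis of theta)."""
    var_phi = 1.0 / (1.0 / prior_var + n / noise_var)  # conjugate Gaussian update
    var_eps = half_width ** 2 / 3.0
    k4_eps = -2.0 * half_width ** 4 / 15.0              # 4th cumulant of the uniform
    var_theta = var_phi + var_eps                       # cumulants add for independent sums
    excess_kurtosis = k4_eps / var_theta ** 2           # the Gaussian part has zero 4th cumulant
    return var_theta, excess_kurtosis


for n in (0, 1, 10, 100, 10_000):
    v, k = posterior_theta(n)
    print(f"n={n:6d}  var(theta|data)={v:.4f}  excess kurtosis={k:+.3f}")
```

Past the first hundred or so observations, more data barely moves var(theta|data), and the posterior gets less Gaussian, not more.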

Christiano, Cotra, and Yudkowsky on AI progress

My understanding is that Sputnik was a big discontinuous jump in "distance over which a payload (i.e. a nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.

Christiano, Cotra, and Yudkowsky on AI progress

My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).

This is super helpful, thanks. Good explanation.

With this formulation of the "continuous view", I can immediately think of places where I'd bet against it. The first which springs to mind is aging: I'd bet that we'll see a discontinuous jump in the achievable lifespan of mice. The gears here are nicely analogous to AGI too: I expect that there's a "common core" (or shared cause) underlying all the major diseases of aging, and fixing that core issue will fix all of them at once, in much the same way that figuring out the "core" of intelligence will lead to a big discontinuous jump in AI capabilities. I can also point to current empirical evidence for the existence of a common core in aging, which might suggest analogous types of evidence to look at in the intelligence context.

Thinking about other analogous places... presumably we saw a discontinuous jump in flight range when Sputnik entered orbit. That one seems extremely closely analogous to AGI. There it's less about the "common core" thing, and more about crossing some critical threshold. Nuclear weapons and superconductors both stand out a priori as places where we'd expect a critical-threshold-related discontinuity, though I don't think people were optimizing hard enough in superconductor-esque directions for the continuous view to make a strong prediction there (at least for the original discovery of superconductors).

The bonds of family and community: Poverty and cruelty among Russian peasants in the late 19th century

It sounds like you're thinking about "adaptivity" in terms of what's good for the group, not the individual. In a Malthusian equilibrium, the world is largely zero-sum, so uprooting the trees of slightly more well-off neighbors could plausibly increase the odds of survival for one's own offspring. It's the next best thing to eating the neighbor's babies, as far as evolutionary fitness goes. And over time, it's the families with the most individual fitness which will dominate the constituency of the group.

(On the other hand, the fact that there was space to plant more apple trees indicates that the world was not perfectly zero-sum; there were nonzero gains to be had from tree-planting. But the broader idea still applies: the culture can be a Nash equilibrium without being particularly good at the group level.)

Why Study Physics?

That is indeed a meme. Though if the physicists' attempts consistently failed, then biologists would not joke about physicists being like gunslingers.

How To Get Into Independent Research On Alignment/Agency

My main modification to that plan would be "writing up your process is more important than writing up your results"; I think that makes a false negative much less likely.

8 weeks seems like it's on the short end to do anything at all, especially considering that there will be some ramp-up time. A lot of that will just be making your background frames/approach more legible. I guess viability depends on exactly what you want to test:

  • If your goal is to write up your background models and strategy well enough to see if grantmakers want to fund your work based on them, 8 weeks is probably sufficient
  • If your goal is to see whether you have any large insights or make any significant progress, that usually happens for me on a timescale of ~3 months

It sounds like you want to do something closer to the latter, so 12-16 weeks is probably more appropriate?

Frame Control

I like the rule, and if it's possible to come up with engagement guidelines that have asymmetrical results for frame control I would really like that.

Some thoughts, based on one particular framing of the problem...

Claim/frame: in general, the most robust defense against abuse is to foster independence in the corresponding domain. The most robust defense against emotional abuse is to foster emotional independence, the most robust defense against financial abuse is to foster financial independence, etc. The reasoning is that, if I am not independent in some domain, then I am necessarily dependent on someone else in that domain, and any kind of dependence always creates an opportunity for abuse.

Applying that idea to frame control: the most robust defense is to build my own frames, pay attention to them, notice when they don't match the frame someone else is using, etc. It's "frame independence": I independently maintain my own frames, and notice when other people set up frames which clash with them.

But independence is not always a viable option in practice, and then we have to fall back on next-best solutions. The main class of next-best solutions I know of involve having a wide variety of people to depend on and freedom to move between them - i.e. avoiding dependence on a monopoly provider.

Applying that next-best answer to frame control: when we can't rely on "frame independence", we want to have a variety of people around providing different frames, so that it's easy to move between them. Social norms to support people offering alternative frames (for instance, making "I disagree with the frame" a normal conversational move) therefore provide value not only by letting me express my own frame, but also by giving me other people's frames to choose from when I'm not ready to provide my own. Actively trying to include people who tend to have different frames should also help with this.

Frame Control

I think it would be helpful for the culture to be more open to persistent long-running disagreements that no one is trying to resolve.

+1 to this. I have an intuition that the unwillingness-to-let-disagreements-stand leads to a bunch of problems in subtle ways, including some of the things you point out here, but haven't sat down to think through what's going on there.
