
A concept I got from Anna Salamon, and maybe also from my coworker Jacob Lagerros, although I haven't run this by either of them and they may not endorse my phrasing. I think I probably run with it in a different direction than Anna would endorse.

Epistemic effort: thought about it for a few hours. Basic concept makes sense to me. If it doesn't make sense to you, let me know and maybe we can talk cruxes.

Say you have a problem you don't know how to solve, and which seems computationally intractable to strategize about. There are a few ways to go about solving it anyway. But one idea is to follow a heuristic where you look for Resource X, where X has two properties:

  1. X compounds over time
  2. If you have a lot of X, you win.

Say you're playing the game "Chess", or you're playing the game "The Stock Market." 

In the stock market, money compounds fairly straightforwardly. You invest money, it results in you getting more money. If your ultimate goal is either to have a lot of money, or spend a lot of money on a thing, then you win. Hurray. In Chess, you can't calculate all the moves in advance, but you can try to gain control of the center of the board. “Control over the center” tends to help you gain more control over the center, and if you have enough of it, you win.

Why does it matter that it "compounds"? In this scenario you've decided you're trying to win by applying a lot of resource X. If you're starting out without much X, you need some kind of story for how you're going to get enough. If you need 1,000,000 metaphorical units of X within 10 years, and you only have 10 (or zero)... well, maybe you can linearly gain X at a rate of 100,000 per year. Maybe you could find a strategy that doesn't get any X at all and then suddenly gets all 1,000,000 at the last second. But in practice, if you're starting with a little bit of something, getting a lot of it tends to involve some compounding mechanism.
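To make the arithmetic concrete, here is a rough sketch comparing the two stories using the numbers above (10 units now, 1,000,000 needed in 10 years). The function names are made up for illustration, and it assumes growth happens at a constant yearly rate, which real resources rarely do.

```python
def required_linear_rate(start, target, years):
    """Units of X you must add each year if growth is constant (linear)."""
    return (target - start) / years


def required_growth_factor(start, target, years):
    """Yearly multiplier needed if X compounds at a constant rate."""
    if start <= 0:
        # Compounding multiplies what you already have, so zero stays zero.
        raise ValueError("compounding needs a nonzero starting amount")
    return (target / start) ** (1 / years)


start, target, years = 10, 1_000_000, 10
linear_rate = required_linear_rate(start, target, years)
factor = required_growth_factor(start, target, years)

print(f"linear: add about {linear_rate:,.0f} units per year")
print(f"compounding: multiply by about {factor:.2f} each year")

# Year-by-year trajectory under compounding.
amount = start
for year in range(1, years + 1):
    amount *= factor
    print(f"year {year:2d}: {amount:>12,.0f}")
```

Under these assumptions the compounding story needs X to roughly triple each year (a factor of about 3.16), while the linear story needs you to find about 100,000 new units every year, starting from a base of 10. The sketch also shows why "or zero" is awkward: with nothing to compound, you need some other way to get the first few units.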

If you're creating a startup, users can help you get more users. They spread awareness via word of mouth, which grows over time, and in some cases they create network effects that make your product more valuable to other users. They also, of course, can get you more money, which you can invest in more employees, infrastructure or advertising.

My coworker Jacob used to ask a similar question, re: strategizing at Lightcone Infrastructure. We're trying to provide a lot of value to the world. We currently seem to be providing... some. We're a nonprofit, not a business. But we are trying to deliver a billion dollars worth of value to the world. If we're not currently doing that, we need to have a concrete story for how our current strategy gets us there, or we need to switch strategies. 

If our current strategy isn't generating millions of dollars worth of value per month, either we should be able to explain clearly where "the compounding piece" of our strategy is, or our strategy probably isn't very good.

Note: in the chess example, "control over the center" is a particularly non-obvious resource. If you're staring at a chess board, there are lots of actions you could hypothetically take, and a lot of abstractions you could possibly make up to help you reason about them. "Control over the center of the board" is an abstraction that requires accumulated experience to discover, and requires some minimum threshold of experience to even understand. (Note: I myself do not really understand how to "control the center", but I repeated this story to a few chess players and they nodded along as if it made sense.)

So, bear in mind that Resource X might be something subtle.

Resource X for "Solve AGI"

Say the problem you're trying to solve is "AGI is gonna kill everyone in 10-40 years", and you're in a preparadigmatic field that's confused about what it even means to make progress. What sort of compounding Xs might you apply to this problem?

Do "Committed Effective Altruists" compound? Or "EA Affiliated Dollars?"

A lot of effective altruists seem to be doing something like "find resource X", where X is "number of committed EAs" or "EA affiliated dollars." These do compound. But having a lot of them doesn't seem sufficient. And, in fact, naively compounding them might be anti-helpful. Those committed EAs need something to do, but you don't really know what to do or how to evaluate it. So by default they probably turn into a moral maze where everyone is goodharting at best, or pointlessly/sociopathically climbing the hierarchy at worst.

So compounding "number of committed EAs" doesn't work, at least in isolation. What about EA affiliated dollars? The problem is we just don't actually know what to do with the dollars. Money is abundant; knowledge is our actual bottleneck. Without a plan, the easiest thing to do is turn money into either hiring people at orgs that we don't quite know what to do with (again, risking creating moral mazes for no reason), or giving grants to independent researchers who sort of flounder around hoping to deconfuse themselves without good feedback, and eventually get depressed and burn out. Meanwhile we also attract grifters who take advantage of the fact that we don't know how to evaluate projects. This further pollutes our signal/noise ratio and reduces our ability to make use of the knowledge.

We don't have enough knowledge.

Okay. Does knowledge compound?

Kinda. If you learn some facts about some snails, that might give you some hints about the general structure of snails, or animals generally. Maybe that makes it easier to figure out more stuff about snails. A single person who builds up a web of knowledge on a topic can find connections between those topics, which compounds somewhat. A group of people who are able to share a web of knowledge can also maybe find more shared connections. But you quickly run into a bottleneck where finding all the connections is hard, because there's too much information to sift through.
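One crude way to see the sifting bottleneck: if you assume (and this is just my illustrative assumption) that a "connection" is a link between two facts, the number of candidate pairs you'd have to check grows much faster than the number of facts. The function name here is hypothetical.

```python
from math import comb


def candidate_connections(n_facts):
    """Number of distinct pairs among n_facts items.

    Used here as a hypothetical proxy for 'connections you could check'.
    """
    return comb(n_facts, 2)


for n in (10, 100, 1_000, 10_000):
    pairs = candidate_connections(n)
    print(f"{n:>6} facts -> {pairs:>12,} candidate pairs ({pairs / n:,.1f} per fact)")
```

Each new fact adds more pairs to check than the one before it, so the raw material for connections compounds, but so does the cost of searching through it. Real connections aren't limited to pairs, which would only make the search space bigger.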

"Gaining a unit of knowledge" is a somewhat confused metaphor, but my guess is "it compounds a little bit, but not very fast." 

I think academia attempts to build a compounding knowledge machine, and it does improve on raw "knowledge gained." It has a system wherein you're supposed to summarize your work, cite related work, and spend some time teaching the next generation. Good papers eventually work their way into textbooks. All of these practices help knowledge build upon itself. This works better for some fields than others. But in many cases we're still struggling with Research Debt, and various dysfunctions that make it hard for academia to work at scale. Science seems like it's slowing down.

Two answers

After thinking about it for a bit, here are two resources I think are contenders:

  • Good (meta)cognitive processes entangled with the territory. 
  • Coordination capital pointed at the right goal.

Like "control over the center" in chess, I think these are both subtle. It'd be easy to miscommunicate them into a simpler form that didn't do the job right. I'm not 100% sure I've got them right. Insofar as it turns out my operationalization either doesn't compound, or doesn't solve AGI risk, keep your eye on the ball and iterate on the operationalization until it seems on track to accomplish the thing. 

These two resources can map onto the "Build Good/Safe AGI that solves our problems for us" and "Coordinate to not build AGI until we can actually build safe AGI" victory conditions. And a mixture of both resources can be used to accomplish a variety of mixed strategies.

Good (meta)cognitive processes entangled with the territory

I specified “good metacognitive processes” as opposed to “good ideas” because there needs to be some mechanism by which the ideas get better over time, and home in on the right subspace, for it to exponentially compound fast enough while cleaving to the right target. It's trickier with "solving AGI alignment" because a lot of advances are dual-use, and some strategies for "generically accelerate all intellectual progress" may be net-negative.

I list "entangled with the territory" because I think there's a failure mode of being too wrapped up in your thinking. Figuring out some way to bump your ideas into reality and see what bumps back feels pretty important.

Examples of compounding cognitive processes:

  • Individual researchers notice particular thought-processes that are particularly productive or unproductive, and improve their research process.
  • Researchers train each other, or "pair program" in ways that help them learn from each other's good cognitive processes. Or write up docs about their research taste.
  • Collide thinkers together who think in somewhat-different ways, so they can bounce ideas off each other and come up with new iterations. More thinkers and more ideas means more opportunity for creative collisions.
  • Interesting and innovative thinkers can attract more interesting and innovative thinkers who are excited to build in a "scene."
  • Build UI that simplifies the scholarship process.
  • Have a mixture of deep researchers, distillers, and teachers who hit different points on the Pareto frontier of "good thought/communication", and in aggregate help push us as fast as possible to the secrets of the universe.
  • Have skilled researchers spend some time noticing the kinds of mistakes that junior researchers are making, and figure out how to teach them not to. 
  • Practice the art of noticing Goodhart when it's happening to you.

Note that some of these don't automatically compound. For example, a single researcher who improves their thought processes once might not actually do so a second and third time. But I think there are ways of constructing ecosystems such that they compound on average.

All of this isn't that novel. Academia already has instances of most of this. The innovations of the rationalsphere over academia are something like: 

  1. Invest in cognitive reflection, so you can learn to think better, and so you can teach others how to learn to think better.
  2. Be more strategic and goal directed about your research. Don't lose sight of your real purpose.
  3. Optimize more for smaller numbers of researchers making a lot of progress than for global legibility/defensibility. Have more "Move fast and break things" attitude.
  4. Have a firm understanding of probabilistic reasoning and how to weigh evidence, which helps keep you sane while you're moving fast and breaking things.

I think #2 and #3 both come with some tradeoffs[1], but are pretty essential for actually making progress fast enough. (I do think #3 ideally comes in combination with 

All of these are essentially arguing for "more of the same strategy LessWrong was already doing, just, faster and better." But the frame I want to bring to it here is "look for ways to make our cognitive processes compound faster", and "look for ways to make sure that the tree of knowledge we're growing actually points towards 'stop x-risk'" rather than Goodharted versions of that.

Coordination Capital

I'm defining coordination capital similarly to the dictionary definition of human capital: The collective skills, knowledge, relationships, incentive structures or other intangible assets of a group of people, that can enable strategies that depend on multiple actors taking actions in concert.

Coordination capital can be relevant to "coordination-centric strategies" (like "get firms to stop publishing AI advances", or "get all the humans to stop building machines that will kill everyone"). It's also relevant to intellectual advances – a coordinated intellectual institution can potentially train more people, organize more/better workshops, or specialize more efficiently.

Coordination capital can compound, although it doesn't automatically (maybe similar to "knowledge."). I like Evan Hubinger's recent post on how AI Coordination needs clear wins, which notes:

In the theory of political capital, it is a fairly well-established fact that “Everybody Loves a Winner.” That is: the more you succeed at leveraging your influence to get things done, the more influence you get in return. This phenomenon is most thoroughly studied in the context of the ability of U.S. presidents’ to get their agendas through Congress—contrary to a naive model that might predict that legislative success uses up a president’s influence, what is actually found is the opposite: legislative success engenders future legislative success, greater presidential approval, and long-term gains for the president’s party.

I think many people who think about the mechanics of leveraging influence don’t really understand this phenomenon and conceptualize their influence as a finite resource to be saved up over time so it can all be spent down when it matters most. But I think that is just not how it works: if people see you successfully leveraging influence to change things, you become seen as a person who has influence, has the ability to change things, can get things done, etc. in a way that gives you more influence in the future, not less.

Here are some ways coordination capital can compound:

  • Individual people or organizations gain a reputation for getting things done, so people trust the stag hunts that they initiate.
  • You promote a cluster of strategies, until those strategies get a reputation for being normal and working out. Other people (without being directly part of your org or network), pick up on the fact that those strategies are trendy, which makes it more likely people do them more often. 
  • Institutions or technology gets built that facilitates more coordination, which in turn creates momentum for more/better institutions/technology. 

But some problems come quickly to mind when I think about coordination capital:

  • Some coordination-momentum works at cross purposes. If one person is trying to get everyone to make government regulations happen, and another person is trying to ensure coordination happens between labs with minimal government involvement, they may step on each other's toes.
  • Coordinated action is more likely to produce "people-who-are-losing-out", who might become your enemy, and rise to strategically oppose you.
  • I expect scaling coordination-institutions to be more moral-maze prone than scaling intellectual institutions. You're selecting more for people who are politically oriented, and training them to see situations through a political lens. 

"Pointed at the right target"

It matters that both intellectual output and coordination capital be pointed at the right target. To actually get "a million units of cognition" and "a million units of coordination", you really really need to avoid Goodharting. I think it's harder when building coordination capital, because the natural feedback loops of coordination don't (necessarily) route through cognitive reflection, and because the dynamics are more adversarial. A lot of people's first thought re: "build coordination capital" is "go to DC and network such that you can be involved in policy", and then you find yourself thinking thoughts like "what if the Russians/Chinese get AI first?" which aren't actually very useful thoughts, but you may not have a feedback loop that corrects them.

My current overall best strategic guess is that we realistically need both metacognitive reflection and coordination, and flexibility on how to apply them. I hope to write some more posts that flesh out my thinking here in more detail.

Further Thoughts?

This post is a rough sketch. It's currently shaping a lot of my strategic thought, but in a lightly held way. I'm interested in alternative frames for reaching victory, and alternative "Resource X"s within this frame. 

 

  1. ^

    Re "Be Goal Oriented in Your Research": People I respect keep warning me about being too goal oriented. Trying too hard may cause you to lose the forest for the trees, or fail to have the kind of curiosity that's needed to really think the most important thoughts, or end up goodharting, etc. I'm not sure how to navigate this tension. It sure seems like both sides are important. It seems kinda obviously good to reflect on those tradeoffs, find the healthy middle, and experiment with third alternatives that get the best of both worlds. (That process seems like the kind of meta-reflection this section is all about.)

    Re "Move Fast and Break Things": I think people often get annoyed at the rationalsphere for being more on the 'move fast and break things' end of the philosophical spectrum. I think it's doing useful work, but I think a reasonable case can be made that you at least need multiple processes going, some of which are optimized for robust legibility.


Alternative frame: I've been poking at the idea of quantum resource theories periodically, literally on the strength of a certain word-similarity between quantum stuff and alignment stuff.

The root inspiration for this comes from Scott Aaronson's Quantum Computing Since Democritus, specifically two things: one, the "certain generalization of probability" lens pretty directly liberates me to throw QM ideas at just about anything, the same way I might with regular probability; two, the introduction of negative probability and through that "cancelling out" possibilities is super cool and feels like a useful way to think about certain problems.

So, babbling: can we loot resource theories from quantum thermodynamics as a way to reason more precisely about the constraints we want for alignment?

A Quanta article animating the thought: https://www.quantamagazine.org/physicists-trace-the-rise-in-entropy-to-quantum-information-20220526/

Direct quote -

“A resource theory is a simple model for any situation in which the actions you can perform and the systems you can access are restricted for some reason,” said the physicist Nicole Yunger Halpern of the National Institute of Standards and Technology.

This sounds like a good match for alignment-ish problems on the face of it. In the alignment case, the "some reason" for the restrictions is so it doesn't kill us. There are two elements to the resource theory: first, a set of free operations and states we assume can be gotten to at no cost; second, valuable resources like entanglement, purity, and asymmetry, which are states that can only be achieved at a cost (and are therefore limited). The gist is, what if we swapped out words like entanglement and purity with words like corrigibility and interpretability?

quantum probability is a very specific thing; I agree that it's an incredibly interesting metaphor, and I also think there's something to be had there, but I'd caution against applying it too literally without care. the kinds of interference patterns at quantum scale are in fact qualitatively different from the ones at larger spatial scales under most conditions.

neural networks are not usually complex valued, for starters. and not because it hasn't been tried.


Which areas of neural networks would fit under the complex number paradigm?

anything processing complex valued phenomena or modeling reality in high enough resolution that the network should learn small-scale complex valued patterns; so, chemistry, fluid waves eg sound, electricity, etc. some very solid results: https://arxivxplorer.com/?query=complex+valued+neural+networks

Thank you for this post. 

The only thing about which I want to encourage more reflection is Have more "Move fast and break things" attitude [admittedly there is a bit of context I'm not edit-pasting here, but you do seem to favour this approach to a fair extent]. 

My gentle nudge here is based on my sense that 'moving fast and breaking things' can have pretty bad consequences if you're (collaboratively) exploring AI safety research tracks that recklessly put into circulation knowledge that can be used to increase capabilities over/without safety 'points'. 

This seems like a useful special case of "conditions-consequences" reasoning. I wonder whether

  • Avoiding meddling is a useful subskill in this context (probably not)
  • There is another useful special case