Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

“Weak” cognitive tools are clearly a thing, and are useful. Google search is a fine example. There are plenty of flavors of “weak AI” which are potentially helpful for alignment research in a similar way to google search.

In principle, I think there’s room for reasonably-large boosts to alignment research from such tools[1]. Alas, the very large majority of people who I hear intend to build such tools do not have the right skills/background to do so (at least not for the high-value versions of the tools). Worse, I expect that most people who aim to build such tools are trying to avoid the sort of work they would need to do to build the relevant skills/background.

Analogy: A Startup Founder’s Domain Expertise (Or Lack Thereof)

Imagine a startup building tools meant to help biologists during their day-to-day work in the wetlab. I expect domain expertise to matter a lot here: I would guess that if none of the founders have ample personal experience doing research work in a wetlab, the chance of this startup building an actually-highly-useful wetlab product drops by about an order of magnitude. Our hypothetical startup might still “succeed” some other way, e.g. by pivoting to something else, or by being good at pitching their shitty product to managers who make purchasing decisions without actually using the product, or by building something very marginally useful and pricing it very cheaply. But their chance of building a wetlab product which actually provides a lot of value is pretty slim.

One might reply: but couldn’t hypothetical founders without domain experience do things to improve their chances? For instance, they could do a bunch of user studies on biologists working in wetlabs, and they could deploy the whole arsenal of UX study techniques intended to distinguish things-users-say-matter from things-which-actually-matter-to-users.

… and my response is that I was already assuming our hypothetical founders do that sort of thing. If the founders don’t have much domain experience themselves, and don’t do basic things like lots of user studies, then I’d guess their chance of building an actually-high-value wetlab product drops by two or three orders of magnitude, not just one order of magnitude. At that point it’s entirely plausible that we’d have to go through thousands of times more startups to find one that succeeded at building a high-value product.

How is this analogous to plans to build AI tools for alignment research?

So we want to build products (specifically AI products) to boost alignment research. The products need to help solve the hard parts of aligning AI, not just easy things where we can clearly see what’s going on and iterate on it, not just problems which are readily legible or conceptually straightforward. Think problems like e.g. sharp left turn, deception, getting what we measure, or at a deeper level the problem of fully updated deference, the pointers problem, value drift under self-modification, or ontology identification. And the tools need to help align strong AI; the sort of hacky tricks which fall apart under a few bits of optimization pressure are basically irrelevant at that point. (Otherwise the relevant conversation to have is not about how the tools will be useful, but about how whatever thing the tools are building will be useful.)

The problem for most people who aim to work on AI tools for alignment research is that they have approximately-zero experience working on those sorts of problems. Indeed, as far as I can tell, people usually turn to tool-building as a way to avoid working on the hard problems.

I expect failure modes here to mostly look like solving the wrong problems, i.e. not actually addressing bottlenecks. Here are some concrete examples, ordered by how well the tool-builders understand what the bottlenecks are (though even the last still definitely constitutes a failure):

  • Tool-builder: “We made a wrapper for an LLM so you can use it to babble random ideas!” 
    Me: “If ever there was a thing we’re not bottlenecked on, it’s random ideas. Also see Lessons from High-Dimensional Optimization.”
  • Tool-builder: “We made an IDE with a bunch of handy AI integration!”
    Me: “Cool, that’s at least useful. I’m not very bottlenecked on coding - coding up an experiment takes hours to days, while figuring out what experiment would actually be useful takes weeks to months - but I’ll definitely check out this IDE.”
  • Tool-builder: “We made an AI theorem-prover!”
    Me: “Sometimes handy, but most of my work is figuring out the right operationalizations (e.g. mathematical definitions or quantitative metrics) to use. A theorem-prover can tell me that my operationalization won’t let me prove the things I want to prove, but won’t tell me how close I am, so limited usefulness.”
  • Tool-builder: “We heard you’re bottlenecked on operationalizing intuitions, so we made an operationalizationator! Give it your intuitive story, and it will spit out a precise operationalization.”
    Me: *looks at some examples* “These operationalizations are totally ad-hoc. Whoever put together the fine-tuning dataset didn’t have any idea what a robust operationalization looks like, did they?”

Note that, by the time we get to the last example on that list, domain experience is very much required to fix the problem, since the missing piece is relatively illegible.

Thus, my current main advice for people hoping to build AI tools for boosting alignment research: go work on the object-level research you’re trying to boost for a while. Once you have a decent amount of domain expertise, once you have made any progress at all (and therefore have any first-hand idea of what kinds of things even produce progress), then you can maybe shift to the meta-level[2].

Metascience

Another way to come at the problem: rather than focus on tool-building for alignment research specifically, one might focus on tool-building for preparadigmatic science in general. And then the tool-builder’s problem is to understand how preparadigmatic science works well in general - i.e. metascience.

Of course, as with the advice in the previous section, the very first thing which a would-be metascientist of preparadigmatic fields should do is go get some object-level experience in a preparadigmatic field. Or, preferably, a few different preparadigmatic fields.

Then, the metascientist needs to identify consistent research bottlenecks across those fields, and figure out tools to address those bottlenecks.

How hard is this problem? Well, for comparison… I expect that “produce tools to 5-10x the rate of progress on core alignment research via a metascience-style approach” involves most of the same subproblems as “robustly and scalably produce new very-top-tier researchers”. And when I say “very-top-tier” here, I’m thinking e.g. pre-WWII Nobel winners. (Pre-WWII Nobel winners are probably more than 5-10x as productive as most of today’s alignment researchers, but tools are unlikely to close that whole gap themselves, or even most of it.) So, think about how hard it would be to e.g. produce 1000 pre-WWII-Nobel-winner-level researchers. The kinds of subproblems involved are things like noticing the load-bearing illegible skills which make those pre-WWII Nobel winners as good at research as they are, making those skills legible, and replicating them. That’s the same flavor of challenge required to produce tools which can 5-10x researcher productivity.

So it’s a hard problem. I am, of course, entirely in favor of people tackling hard problems.

For an example of someone going down this path in what looks to me like a relatively competent way, I'd point to some of Conjecture's work. (I don't know of a current write-up, but I had a conversation with Gabe about it which will hopefully be published within the next few weeks; I'll link it from here once it's up.) I do think the folks at Conjecture underestimate the difficulty of the path they're on, but they at least understand what the path is and are asking the right kind of questions.

What Tools I Would Build

To wrap up, I'll talk a bit about my current best guesses about what "high-value cognitive tools" would look like. Bear in mind that I think these guesses are probably wrong in lots of ways. At a meta level, cognitive tool-building is very much the sort of work where you should pick one or a handful of people to build the prototype for, focus on making those specific people much more productive, and get a fast feedback loop going that way. That's how wrong initial guesses turn into better later guesses.

Anyway, current best guesses.

I mentioned that the subproblems of designing high-value cognitive tools overlap heavily with the subproblems of creating new top-tier researchers. Over the past ~2 years I've been working on training people, generally trying to identify the mostly-illegible skills which make researchers actually useful and figure out how to install those skills. One key pattern I've noticed is that a lot of the key skills are "things people track in their head" - e.g. picturing an example while working through some math, or tracking the expected mental state of a hypothetical reader while writing, or tracking which constraints are most important while working through a complicated problem. Such skills have low legibility because they're mostly externally invisible - they're in an expert's head, after all. But once we know to ask about what someone is tracking in their head, it's often pretty easy and high-value for the expert to explain it.

... and once I'm paying attention to what-things-I'm-tracking-in-my-head, I also notice how valuable it is to externalize that tracking. If the tracked-information is represented somewhere outside my head, then (a) it frees up a lot of working memory and lets me track more things, and (b) it makes it much easier to communicate what I'm thinking to others.

So if I were building research tools right now, the first thing I'd try is ways to externalize the things strong researchers track in their heads. For instance:

  • Imagine a tool in which I write out mathematical equations on the left side, and an AI produces prototypical examples, visuals, or stories on the right, similar to what a human mathematician might offer if we asked what they were picturing when looking at the math. (Presumably the interface would need a few iterations to figure out a good way to adjust the AI's visualizations to better match the user's.)
  • Imagine a similar tool in which I write on the left, and on the right an AI produces pictures of "what it's imagining when reading the text". Or predicted emotional reactions to the text, or engagement level, or objections, etc.
  • Debugger functionality in some IDEs shows variable-values next to the variables in the code. Imagine that, except with more intelligent efforts to provide useful summary information about the variable-values. E.g. instead of showing all the values in a big tensor, it might show the dimensions. Or it might show Fermi estimates of the runtime of different chunks of the code. (A minimal sketch of this idea appears just after this list.)
  • Similarly, in an environment for writing mathematics, we could imagine automated annotation with asymptotic behavior, units, or example values. Or a sidebar with an auto-generated stack trace showing how the current piece connects to everything else I'm working on.
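
As a toy illustration of the "summarize instead of dump" idea in the debugger bullet above, here is a minimal sketch. The function names (`summarize`, `annotate_frame`) and the heuristics are my own invented examples, not an existing tool or API: a helper that, given a variable, returns a compact annotation: shape and dtype for arrays, length and element type for long lists, and the raw value only when it is small.

```python
import numpy as np

def summarize(value, max_len=5):
    """Return a compact, human-oriented annotation for a variable,
    in the spirit of "show the dimensions, not the values"."""
    if isinstance(value, np.ndarray):
        # For big tensors, shape and dtype are usually what a
        # researcher is actually tracking in their head.
        return f"ndarray shape={value.shape} dtype={value.dtype}"
    if isinstance(value, (list, tuple)) and len(value) > max_len:
        inner = type(value[0]).__name__ if value else "?"
        return f"{type(value).__name__} of {len(value)} {inner}s"
    return repr(value)  # small values: just show them

def annotate_frame(frame_vars):
    """Annotate every variable in a (hypothetical) debugger frame."""
    return {name: summarize(v) for name, v in frame_vars.items()}

if __name__ == "__main__":
    frame = {
        "weights": np.zeros((1024, 768)),
        "batch": list(range(10_000)),
        "lr": 3e-4,
    }
    for name, note in annotate_frame(frame).items():
        print(f"{name}: {note}")
```

A real tool would of course need much smarter summaries (Fermi estimates of runtime per code chunk, units, asymptotic behavior), but the design choice is the same one in the bullets above: externalize the things the researcher is already tracking in their head.
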
  [1]

    A side problem which I do not think is the main problem for “AI tools for AI alignment” approaches: there is a limit to how much of a research productivity multiplier we can get from google-search-style tools. Google search is helpful, but it’s not a 100x on research productivity (as evidenced by the lack of a 100x jump in research productivity shortly after Google came along). Fundamentally, a key part of what makes such tools “tools” is that most of the key load-bearing cognition still “routes through” a human user; thus the limit on how much of a productivity boost they could yield. But I do find a 2x boost plausible, or maybe 5-10x on the optimistic end. The more-optimistic possibilities in that space would be a pretty big deal.

    (Bear in mind here that the relevant metric is productivity boost on rate-limiting steps of research into hard alignment problems. It’s probably easier to get a 5-10x boost to e.g. general writing or coding speed, but I don’t think those are rate-limiting for progress on the core hard problems.)

  [2]

    A sometimes-viable alternative is to find a cofounder with the relevant domain experience. Unfortunately the alignment field is SUPER bottlenecked right now on people with experience working on the hard central problems; the number of people with more than a couple years experience is in the low dozens at most. Also, the cofounder does need to properly cofound; a mere advisor with domain experience is not a very good substitute. So a low time commitment from the domain expert probably won’t cut it.

Comments

Thanks for writing this post, John! I'll comment since this is one of the directions I am exploring (released an alignment text dataset, published a survey for feedback on tools for alignment research, and have been ruminating on these ideas for a while).

Thus, my current main advice for people hoping to build AI tools for boosting alignment research: go work on the object-level research you’re trying to boost for a while. Once you have a decent amount of domain expertise, once you have made any progress at all (and therefore have any first-hand idea of what kinds of things even produce progress), then you can maybe shift to the meta-level[2].

I mostly agree with this, which is why I personally took 6 months away from this approach and tried to develop my domain expertise during my time at MATS. I don't think this is enough time, unfortunately (so I might spend more time on object-level work after my current 3-month grant). However, I plan to continue to do object-level research and build tools that are informed by my own bottlenecks and others'. There are already many things I think I could build that would accelerate my work and possibly accelerate the work of others.

I see my current approach as creating a feedback loop where both things that take up my time inform each other (so I at least have N>0 users). I expect to build the things that seem the most useful for now, then re-evaluate based on feedback (is this accelerating alignment greatly or not at all?) and then decide whether I should focus all my time on object-level research again. Though I expect at this point that I could direct some software engineers to build out the things I have in mind at the same time.

One thing I've found valuable for thinking about these tools is to backcast how I might see myself or others coming up with a solution to alignment, and then to focus on tools that would primarily accelerate the research that is actually crucial for solving the problem, rather than tools that optimize for something else. I think dedicating time to object-level work has been helpful for this.

At a meta level, cognitive tool-building is very much the sort of work where you should pick one or a handful of people to build the prototype for, focus on making those specific people much more productive, and get a fast feedback loop going that way. That's how wrong initial guesses turn into better later guesses.

Agreed.

If the tracked-information is represented somewhere outside my head, then (a) it frees up a lot of working memory and lets me track more things, and (b) it makes it much easier to communicate what I'm thinking to others.

Yes! That is precisely what I have in mind when thinking about building tools. What can I build that sufficiently frees up working memory / cognitive load so that the researcher can use that extra space for thinking more deeply about other things?

A side problem which I do not think is the main problem for “AI tools for AI alignment” approaches: there is a limit to how much of a research productivity multiplier we can get from google-search-style tools. Google search is helpful, but it’s not a 100x on research productivity (as evidenced by the lack of a 100x jump in research productivity shortly after Google came along). Fundamentally, a key part of what makes such tools “tools” is that most of the key load-bearing cognition still “routes through” a human user; thus the limit on how much of a productivity boost they could yield. But I do find a 2x boost plausible, or maybe 5-10x on the optimistic end. The more-optimistic possibilities in that space would be a pretty big deal.

I aim for a minimum 10x speedup when thinking about this general approach (or at least an approach that leads to some individual, specific breakthroughs in alignment). I'm still grappling with when to drop this direction if it is not very fruitful. I'm trying to be conscious of what I think weak AI won't be able to solve. Either way, I hope to bring on software engineers / builders who can help make progress on some of my ideas (some have already).

I don't have coherent thoughts right now. But here are some misc vaguely related ideas/posts I found while browsing around for inspiration, prompted by some of the commentary in the post I'm commenting on. This is me being a "related stuff online" search drone again (the results are my picks from a Metaphor search of a paragraph from the post), so if this comment gets too far up the replies I may strong-downvote myself. But if any of these things catch others' eyes, please comment with your takeaways, especially if there are action items for others.

https://thesephist.com/posts/notation/ is an interesting related essay. intro:


I spent the last month wondering and investigating how we might design better workflows for creative work that meld the best of human intuition and machine intelligence. I think a promising path is in the design of notation. More explicitly, I believe inventing better notations can contribute far more than automated tools to our effective intelligence in understanding ourselves, the world, and our place in it

http://glench.com/LegibleMathematics/ is another interesting related essay. an intro paragraph: 

[...] to me the problem isn't just that the first examples are badly formatted (missing spaces between operators, for example), or even that the environment isn't rendering those equations using the traditional notation taught in grade school. The main problem is that the programming environment doesn't give us any way of understanding what the symbols are doing.

https://a9.io/inquiry/notes/1rCLzzZwVdcOICn9eIrII/ is yet another related essay:

3blue1brown's videos about linear algebra are beautiful and likely improve mental imagery when thinking causally about things like linear transformations. However, Gilbert Strang's notation-heavy lectures on the same subject are presented with the same interface learners use: chalkboards. When we have to actually factorize a matrix, the only way we can do that in our current media environment is by writing things down

https://penrose.cs.cmu.edu/ seems interesting:

Penrose is a platform that enables people to create beautiful diagrams just by typing mathematical notation in plain text. The goal is to make it easy for non-experts to create and explore high-quality diagrams and provide deeper insight into challenging technical concepts. We aim to democratize the process of creating visual intuition.

https://cognitivemedium.com/ has several related essays:

Augmenting Long-term Memory: How to build personal memory systems.

Using Artificial Intelligence to Augment Human Intelligence (with Shan Carter): By creating user interfaces which let us work with the representations inside machine learning models, we can give people new tools for reasoning.

Magic Paper: What if our mathematical notation had been invented after computers? What if it was gestural, and worked at the speed of thought?

Thought as a Technology: How sufficiently imaginative interface designers can invent new forms of thought.

Toward an Exploratory Medium for Mathematics

Reinventing Explanation: Using new media to create new types of explanation.

https://scholarphi.org/ is a research project that attempted something related for aiding paper skimming. It did not use ML, and it doesn't look ready to generalize, but it may be very useful for improving on basic multi-pass skimming. It apparently inspired Semantic Scholar's Semantic Reader, which does not, to my knowledge, have this additional functionality.

The goal of the ScholarPhi project was to improve the reading of scientific papers by helping readers see the meanings of mathematical symbols, technical terms, and other information directly where they are used within the paper.

There were more results in the Metaphor search, and tweaking the search (e.g., using different paragraphs) may find more relevant stuff. Still, these results had some very interesting essays, and so those are what I pasted. Hope this helps someone implement what John is thinking! Busy for 2h.

more tidbits that could be part of or inspiration for an interesting tool

idk

  • "The goal of curvenote.dev is to provide open source tools to promote and enable interactive scientific writing, reactive documents and explorable explanations. These tools are used and supported by curvenote.com, which is an interactive scientific writing platform that integrates to Jupyter." https://curvenote.dev/
  • "Nota is a language for writing documents, like academic papers and blog posts. The goal of Nota is to bring documents into the 21st century." https://nota-lang.org/
  • "Apparatus is a hybrid graphics editor and programming environment for creating interactive diagrams." http://aprt.us/
  • "A Visual Programming Environment for Scientific Computing" - python visual programming with plotting https://mathinspector.com/
  • notebook with builtin desmos and latex https://themathist.com/app
  • "Hazel is a live functional programming environment that is able to typecheck, manipulate, and even run incomplete programs, i.e. programs with holes. There are no meaningless editor states." https://hazel.org/ 
  • windows only agpl math::topology of 3d::topology? I think? idk. "Topologic is a software development kit and plug-in that enables logical, hierarchical and topological representation of spaces and entities" https://topologic.app/ 

ordinary stuff

  • "Convert images and PDFs to LaTeX, DOCX, Overleaf, Markdown, Excel, ChemDraw and more, with our AI powered document conversion technology." https://mathpix.com/
  • "Gridpaste is an online math tool to share computations, transformations, and annotations on geometric structures in a coordinate plane." https://gridpaste.io/

react libs and such

more blogs

a dataset: https://wellecks.com/naturalproofs/ 

For years, I've collected a MASSIVE number of links in a bookmark folder, about this exact cluster of topics. It started as "build a better wiki for maths, especially for AI alignment", but now I've had a few different (not-very-workable) ideas for similar tools.

Here is a large dump of such links, in a similar vein: https://drive.google.com/file/d/1sFVb58wceAd_RoPqw_eGnA7K_wSIrziK/view?usp=sharing

Disclaimer: Haven't actually tried this myself yet, naked theorizing.

“We made a wrapper for an LLM so you can use it to babble random ideas!” 

I'd like to offer a steelman of that idea. Humans have negative creativity — it takes conscious effort to come up with novel spins on what you're currently thinking about. An LLM babbling about something vaguely related to your thought process can serve as a source of high-quality noise, noise that is both sufficiently random to spark novel thought processes and relevant enough to prompt novel thoughts on the actual topic you're thinking about (instead of sending you off in a completely random direction). Tools like Loom seem optimized for that.

It's nothing a rubber duck or a human conversation partner can't offer, qualitatively, but it's more stimulating than the former, and is better than the latter in that it doesn't take up another human's time and is always available to babble about what you want.

Not that it'd be a massive boost to productivity, but it might lower the friction costs of engaging in brainstorming, making it less effortful.

... Or it might degrade your ability to think about the subject matter mechanistically and optimize your ideas in the direction of what sounds like it makes sense semantically. Depends on how seriously you'd be taking the babble, perhaps.

I think one of the key intuitions here is that in a high dimensionality problem, random babbling takes far too long to solve the problem, as the computational complexity of random babbling is 2^n. If n is say over 100, then it requires more random ideas than anyone will make in a million years.

Given that most real world problems are high dimensional, babbling will lead you nowhere to the solution.
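
To put rough numbers on that (my own back-of-the-envelope arithmetic, purely illustrative): with 100 independent binary design choices there are about 2^100 candidate combinations, while even a very generous babble budget of a thousand ideas per second for a million years explores less than one part in 10^13 of that space.

```latex
% Back-of-the-envelope comparison (illustrative numbers only):
\[
  2^{100} \approx 1.3 \times 10^{30}
  \quad \text{vs.} \quad
  10^{3}\,\tfrac{\text{ideas}}{\text{s}}
  \times 10^{6}\,\text{yr}
  \times 3.15 \times 10^{7}\,\tfrac{\text{s}}{\text{yr}}
  \approx 3 \times 10^{16}\ \text{ideas}.
\]
```
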

Yeah, but the random babbling isn't solving the problem here, it's used as random seeds to improve your own thought-generator's ability to explore.  Like, consider cognition as motion through the mental landscape. Once a motion is made in some direction, human minds' negative creativity means that they're biased towards continuing to move in the same direction. There's a very narrow "cone" of possible directions in which we can proceed from a given point, we can't stop and do a turn in an arbitrary direction. LLMs' babble, in this case, is meant to increase the width of that cone by adding entropy to our "cognitive aim", let us make sharper turns.

In this frame, the human is still doing all the work: they're the ones picking the ultimate direction and making the motions, the babble just serves as vague inspiration.

Or maybe all of that is overly abstract nonsense.

Me: *looks at some examples* “These operationalizations are totally ad-hoc. Whoever put together the fine-tuning dataset didn’t have any idea what a robust operationalization looks like, did they?”

... So maybe we should fund an effort to fine-tune some AI model on a carefully curated dataset of good operationalizations? Not convinced building it would require alignment research expertise specifically, just "good at understanding the philosophy of math" might suffice.

Finding the right operationalization is only partly intuition, partly it's just knowing what sorts of math tools are available. That is, what exists in the concept-space and is already discovered. That part basically requires having a fairly legible high-level mental map of the entire space of mathematics, and building it is very effortful, takes many years, and has very little return on learning any specific piece of math.

At least, it's definitely something I'm bottlenecked on, and IIRC even the Infra-Bayesianism people ended up deriving from scratch a bunch of math that later turned out to be already known as part of imprecise probability theory. So it may be valuable to get some sort of "intelligent applied-math wiki" that babbles possible operationalizations at you/points you towards math-fields that may have the tools for modeling what you're trying to model.

That said, I broadly agree that the whole "accelerate alignment research via AI tools" approach doesn't seem very promising, either the Cyborgism or the Conditioning Generative Models directions. Not that I see any fundamental reason why pre-AGI AI tools can't be somehow massively helpful for research — on the contrary, it feels like there ought to be some way to loop them in. But it sure seems trickier than it looks at first or second glance.

Just a small point: InfraBayesianism is (significantly) more general than imprecise probability. But the larger point stands: a lot of the math of what alignment researchers need is already in the literature and an AI tool that can find those pieces of math and work with them profitably would be very useful.

Additional ideas for cognitive tools:

  • Let the user describe an intuitive concept, maybe with some additional examples, and let the system return one or several type signatures that match the intuitive concept and examples.
  • For hairy conceptual/strategic problems, map out possible compatible/incompatible decisions for plans/operationalisations, perhaps in disjunctive normal form (e.g. "we can fulfill axiom A and axiom B, but not C; or B and C") (I stole this idea from the Tsvi Benson-Tilssen interview on the Bayesian Conspiracy podcast)
  • Try to find the algorithm for proving mathematical theorems (maybe some variant of MCTS?), and visualize the tree to find possible alternative assumptions/related theorems (a toy sketch of the tree-search idea follows the quote below):

"Perhaps they are checking many instances? Perhaps they are white-box testing and looking for boundaries? Could there be some sort of “logical probability” where going down possible proof-paths yield probabilistic information about the final target theorem, maybe in some sort of Monte Carlo tree search of proof-trees? Do sleep serve to consolidate & prune & replay memories of incomplete lines of thought, finetuning heuristics or intuitions for future attacks and getting deeper into a problem (perhaps analogous to expert iteration)? Reading great mathematicians like Terence Tao discuss the heuristics they use on unsolved problems²⁶, they bear some resemblances to computer science techniques."

Gwern Branwen, “The Existential Risk of Math Errors”, 2019
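
To make the "Monte Carlo tree search of proof-trees" speculation slightly more concrete, here is a toy sketch, entirely my own illustrative stand-in and not a real prover: the state space, moves, and reward are invented for illustration. States are integers, the two "tactics" are simple rewrite moves, and a small MCTS loop explores the tree and then prints its most-visited branches, which is roughly the kind of tree one might want to visualize when hunting for alternative assumptions.

```python
import math
import random

# Toy stand-in for proof search: reach TARGET from START using two rewrite
# moves. A real theorem prover would replace MOVES with tactic applications
# and the goal test with "proof complete"; this is purely illustrative.
START, TARGET, MAX_DEPTH = 1, 53, 12
MOVES = [lambda x: x + 1, lambda x: 2 * x]

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Standard UCB1 score: exploit high-value children, explore
        # rarely visited ones.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def rollout(state, depth):
    """Random playout: how close does random babble get to the target?"""
    for _ in range(depth):
        if state == TARGET:
            return 1.0
        state = random.choice(MOVES)(state)
    return max(0.0, 1.0 - abs(TARGET - state) / TARGET)

def mcts(iterations=2000):
    root = Node(START)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend by UCB until we hit a leaf.
        while node.children:
            node = max(node.children, key=Node.ucb)
            depth += 1
        # Expansion: add children unless we are done or at the depth limit.
        if depth < MAX_DEPTH and node.state != TARGET:
            node.children = [Node(m(node.state), node) for m in MOVES]
            node = random.choice(node.children)
            depth += 1
        # Simulation and backpropagation.
        reward = rollout(node.state, MAX_DEPTH - depth)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return root

def show(node, indent=0, min_visits=50):
    """Crude text visualization of the explored part of the tree."""
    print("  " * indent + f"{node.state}  (visits={node.visits})")
    for child in node.children:
        if child.visits >= min_visits:
            show(child, indent + 1, min_visits)

if __name__ == "__main__":
    show(mcts())
```

Whether anything like this scales to real mathematics is exactly the open question the quote is gesturing at; the point of the sketch is only the shape of the tool: a search tree over proof attempts that a human can inspect.
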

On an unrelated note: the more conceptual/theoretical/abstract alignment research is, the less dangerous the related tools look to me. I usually don't want my AI system to be thinking about ML architectures and scaling laws and how to plan better, but abstract mathematical theorems seem fine.

Though "try to solve mathematics algorithmically" is maybe a bit far-fetched.

I generally find this compelling, but I wonder if it proves too much about current philosophy-of-science and metascience work. If people in those fields have created useful insight without themselves getting their hands dirty with the object-level work of other scientific fields, then the argument proves too much. I suspect there is some such work. Additionally:

I would guess that if none of the founders have ample personal experience doing research work in a wetlab, the chance of this startup building an actually-highly-useful wetlab product drops by about an order of magnitude.

Order of magnitude seems like a lot here. My intuitive feeling is that lacking relevant experience makes you 30-80% less likely to succeed, not 90%+. I think this probability likely matters for whether certain people should work on trying to make tools for alignment research now. I thought a bit about reference classes we could use here to get a better sense, but nothing great comes to mind. Assuming for simplicity that founders with and without expertise attempt such products at around the same rate (incorrect, I'm sure), we can look at existing products.

Mainly, I think there are not very many specific products which 5x research speed, which is (maybe) the thing we're aiming for. Below is a brainstorm of products that seem to somewhat speed up research, but they're mostly general rather than domain-specific, and I don't think any of them provide that big an increase (maybe math assistants and Google Scholar are something like a 1.3-2x boost to the research process overall [all the math assistants together are big gains, but each one compared to its predecessor is not that big]). It seems that you are making a claim about something for which we don't have nice historical examples — products that speed up research by 5x. The examples I came up with are mostly relatively general-purpose, which implies that the founders don't have strong object-level knowledge of each of the fields they are helping with; but I don't think these are very good examples. This data is also terrible because it has the selection bias of being things I could think of, and I probably haven't heard of productivity-boosting things that are specific to most fields.

Brainstorming examples of products that seem to do this: Copilot and other code-assistants; general productivity boosters like task management software, word processors, etc; tools for searching literature better, like Google Scholar, Elicit, Connected Papers, etc.; math assistants like calculators, Excel, Stata, etc.; maybe some general purpose technologies like the internet and electricity count, but those feel like cheating due to not being specific products. 

I don’t think this meager evidence particularly supports the claim that lacking domain knowledge decreases one’s chances of success by 90%+. I think it supports something like the following: It seems very hard to increase productivity 3x or more with a single product, and current products that increase productivity are often general-use. Those trying to revolutionize alignment research by making products to speed it up should recognize that large boosts are unlikely on base-rates. 

I’m curious if anybody has examples of products that you think 3x or more the total rate of your or others’ research? Ideally this would be 3x compared to the next best thing before this thing came along. I think people tend to over-estimate productivity gains from tools, so the follow-up question is: would you rather work 9 hours a day without that tool or 3 with it; which scenario gets more work done? Is that tool responsible for 67%+ of your research output, and would it be a good deal for 67% of your salary (or however society compensates you for research output, e.g., impact credits) to be attributed to that tool? You don't necessarily have to pass all these tests, but I am including them here because I think it's a really bold claim to say a tool 3xed your overall research productivity. Without examples of this it's hard to make confident claims about what it will take to make such tools in the future. 

If ever there was a thing we’re not bottlenecked on, it’s random ideas.

 

This seems basically wrong, even though the current weak AI tools are bad at this.

EDIT 6 Mar 2023: retracted the "basically wrong" part, see replies below.

Consider:

  • The few promising AI safety directions are worked on by a few researchers, and there doesn't seem to have been much "stacking" of multiple researchers into those promising areas.
  • Current "babble"-oriented AI/tools, being trained on existing human data, don't tend to come up with actually new ideas that would impress a human. (Source: cursory testing of ChatGPT; prompting it to "have REALLY unique" business ideas generally just gives slightly-wacky existing ideas. If this problem can be prompt-engineered away, my guess is it'd take nontrivial effort.)
  • Because the AI alignment field is preparadigmatic, we don't know what the best ideas will turn out to be. We just have lots of similar ideas that we're pretty sure won't work. This seems like exactly the sort of situation where having wildly new ideas is a bottleneck. This is especially true if they're testable directly, or can be quickly discarded through discussion.

Perhaps we don't need more ideas, so much as randomer ideas.

This actually implies that weak AI tools are (as you noted) not helpful... but really hokey random word combiners/generators totally might be.

I think one of the key intuitions here is that in a high dimensionality problem, random babbling takes far too long to solve the problem, as the computational complexity of random babbling is 2^n. If n is say over 100, then it requires more random ideas than anyone will make in a million years.

Given that most real world problems are high dimensional, babbling will lead you nowhere to the solution.

This is true. I'm trying to reconcile this with the intuition that we need very-different ideas from what we already have.

Like, we need a focused chain of ideas (to avoid the combinatorial explosion of babble), but it needs to be "aimed" in a different-enough direction to be more-helpful-than-the-norm, while in a good-enough direction to be helpful-at-all.

Like, if we're talking about some specific idea-tree, my mental model is "low branching factor is good, but initialization matters more for getting what will turn out to be the most helpful ideas". And maybe(?) John's model is "the existing branch(es) of AI alignment are incomplete, but that should be solved depth-first instead of making new branches".

This could be correct, but that doesn't square with our shared belief that AI alignment is pre-paradigmatic. If we really are preparadigmatic, I'd expect we should look for more Newtons, but John's advice points to looking for more Tellers or Von Neumanns.

What would John rather have, for the same monetary/effort cost: Another researcher creating a new paradigm (new branches), or another researcher helping him (depth first)?

And if he thinks the monetary/effort cost of helping that researcher can't possibly be comparable, what precisely does that mean? That e.g. we would prefer new branches but there's no practical way the field of AI alignment can actually support them in any substantial way? Really?

What would John rather have, for the same monetary/effort cost: Another researcher creating a new paradigm (new branches), or another researcher helping him (depth first)?

I think "new approach" vs "existing approach" is the wrong way to look at it. An approach is not the main thing which expertise is supposed to involve, here. Expertise in this context is much more about understanding the relevant problems/constraints. The main preference I have is a new researcher who understands the problems/constraints over one who doesn't. Among researchers who understand the problems/constraints, I'd rather have one with their own new program than working on an existing program, but that's useful if-and-only-if they understand the relevant problems and constraints.

The problem with a random-idea-generator is that the exponential majority of the ideas it generates won't satisfy any known constraints or address any of the known hard barriers, or even a useful relaxation of the known constraints/barriers.

That said, I do buy your argument at the top of the thread that in fact GPT fails to even generate new bad ideas.

Ah, okay, yeah, that makes sense. The many-paths argument may work, but IFF the researcher/idea is even remotely useful for the problem, which a randomly-generated one won't be. Oops

first-hand idea of what kinds of things even produce progress

I'd rather pick up second-hand ideas about what progress looks like, based on a write-up from someone with deep knowledge of multiple research directions, than spend the next 5 years forming my own idiosyncratic first-hand empathic intuitions.

It's not like Agent Foundations is 3 cm / 5 dB / 7 dimensions more progress than Circuits; but if there is no standardized quantity of progress, then why should we believe that 1000 people making 1000 different tools now is worse than those people doing research first, before attempting to help with non-research skills?