Specifically for AI alignment (a small, preparadigmatic field with plausibly short timelines), though a general principle or decision mechanism could apply to other fields.

What I mean by this is, what ratio of pre-learning to filling-in-gaps leads to deeper and quicker insights? (Note that, given the time-blocks I'm thinking of here, I could get sidetracked by either of these in a counterproductive way: either by going too deep on a knowledge-base item that's actually trivial, or by taking an approach that a deeper knowledge base would immediately show to be wrong.)

Again, I'm especially interested in how this applies to AI alignment. I've seen plausibly good arguments for tilting the ratio one way or the other. And I vaguely get that two people could implement the same knowledge-base-vs-backchain ratio, yet feel like they're leaning in different directions.

3 Answers

(My personal bias is heavily towards the upskilling side of the scale.) There are three big advantages to “problem first, fundamentals later”:

  1. You get experience doing research directly
  2. You save time
  3. Anytime you go to learn something for the problem, you will always have the context of “what does this mean in my case?”

Point 3 is a mixed bag - sometimes it will be useful because it brings together ideas from far away in idea-space; other times it will make it harder for you to learn things on their own terms. You may end up basing your understanding of a tool on non-central examples and have trouble putting it in its own context, which makes it harder to turn into a central node in your knowledge-web. The tool then becomes harder to use and less cognitively available.

By contrast, the advantages of going hard for context-learning:

  1. Learning bottom-up helps resolve this “tools in-context” problem
  2. I’d bet that if you focus on learning this way, you will feel less of the “why can’t I just get this and move past it already” pressure - which imo is highly likely to end up with poor learning overall
  3. Studying things “in order” will give you a lot more knowledge of how a field progresses - giving a feel for what moving things forward should “feel like from the inside”.
  4. Spaced repetition basically solves the “how do I remember basic isolated facts” problem - an integrated “bottom-up” approach is better suited to building a web of knowledge, which will then give you affordances as to when to use particular tools.

The “bottom-up” approach has the risk of making you learn only central examples of concepts - this is best mitigated by taking the time and effort to be playful with whatever you’re learning. Having your “Hamming problem” in mind will also help - it doesn’t need to be a dichotomy between bottom-up and top-down, in that regard.

My recommendation would be to split it and try: for alignment, there’s clearly a “base” of linalg, probability, etc. that in my estimation would be best consumed in its own context, while much of the rest of the work in the field is conceptual enough that mentally tagging what the theories are about (“natural abstractions” or “myopia of LLMs”) is probably sufficient for you to know what you’ll need and when, thus good to index as needed.

Thank you, this makes sense currently!

(Right now I'm on Pearl's Causality)

My best guess after ~1 year upskilling is to learn interpretability (which is quickly becoming paradigmatic) and read ML papers that are as close to the area you plan to work in as possible; if it's conceptual work, also learn how to do research first, e.g. do half a PhD or work with someone experienced, although I am pessimistic about most independent conceptual work. Learn all the math that seems like an obvious prerequisite (for me this was linear algebra, basic probability, and algorithms, which I mostly already had). Then learn everything else you feel like you're missing lazily, as it comes up. Get to the point where you have a research loop of posing and solving small problems, so that you have some kind of feedback loop.

The other possible approach, something like going through John Wentworth's study guide, seems like too much background before contact with the problem. One reason is that conceptual alignment research contains difficult steps even if you have the mathematical background. Suppose you want to design a way to reliably get a desirable cognitive property into your agent. How do you decide what cognitive properties you intuitively want the agent to have, before formalizing them? This is philosophy. How do you formalize the property of conservatism or decide whether to throw it out? More philosophy. Learning Jaynes and dynamical systems and economics from textbooks might give you ideas, but you don't get either general research experience or contact with the problem, and your theorems are likely to be useless. In a paradigmatic field, you learn exactly the fields of math that have been proven useful. In a preparadigmatic field, it might be necessary to learn all fields that might be useful, but it seems a bit perverse to do this before looking at subproblems other people have isolated, making some progress, and greatly narrowing down the areas you might need to study.

It's not clear what you mean by "for AI alignment". "What leads to quicker and deeper insights?" is still not a good enough question, because there may be different purposes that you wish to apply these insights to. Some possible options: 1) find flaws in models apparently held by people at large labs (OpenAI etc.) and convince them to change their course of action; 2) demonstrate your insights during interviews to be hired by OpenAI/DeepMind/etc.; 3) start an AI alignment startup for a specific idea; 4) start an alignment startup without a specific alignment idea (while you still need to be able to distinguish good from bad ideas when selecting projects, hiring, etc.); 5) work on AI governance, policy design, or a startup which is not overtly about AI alignment but connects to x-risk models in important ways; etc. These different pragmatic goals lead to different optimal ratios of "explore vs. exploit" and different blends of topics and disciplines to study.

I suspect that your goal is close to 1), but I have recently become convinced that this is a very ineffectual goal, because it's close to impossible to "convince" large labs (where billions of dollars are already involved, which notoriously makes changing people's minds much more difficult) of anything from the outside. So I don't even want to discuss the optimal ratio for this goal.

For goal 2), you should find out which deep base-knowledge models the hiring manager cherishes and learn those. E.g., if the hiring manager likes ethics, epistemology, or philosophy of science, you had better learn some of them and demonstrate your knowledge of these models during the interview. But if the hiring manager is not very knowledgeable about these themselves, this deep knowledge will be to no avail.

Then, if you are already at a large lab, it's too context-dependent: organisational politics, tactical goals such as completing a certain team project, and the models that your teammates already possess all play a role in deciding what, when, and how you should learn at an organisation.

For goal 3), the bias for "greater depth first" should definitely be higher than for 4). But for 4) you should have some other exceptional skills or resources to offer (which is off-topic for this question, though).

For 5), pretty clearly you should mostly backchain.

1 comment

What I mean by this is, what ratio of pre-learning to filling-in-gaps leads to deeper and quicker insights?



In my case, what was useful was: personal worldview --> LW material --> ask the community --> build a theory --> test the theory --> repeat.