
Estimating how much safety research contributes to capabilities via citation counts

An analysis I'd like to see is:

  1. Aggregate all papers linked on Alignment Newsletter Database (public) - Google Sheets 
  2. For each paper, count what percentage of its citing papers are also in the ANDB vs. not (or use some other way of classifying papers as safety vs. non-safety)
  3. Analyze differences by subject area / author affiliation

My hypothesis: RLHF, and OpenAI's work in general, have high capabilities impact. For other domains, e.g. interpretability, preventing bad behavior, and agent foundations, I have high uncertainty about the percentages.
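A minimal sketch of step 2, assuming papers can be keyed by arXiv ID and that citations are pulled from the Semantic Scholar Graph API (the exact endpoint, parameters, and field names here are my assumptions and should be checked against the current API docs); `andb_ids`, `citing_arxiv_ids`, and `safety_citation_fraction` are hypothetical names, not anything from the ANDB itself:

```python
"""Sketch: for each ANDB paper, what fraction of its citing papers are also in the ANDB?"""
import requests

# Assumed Semantic Scholar Graph API citations endpoint; verify against current docs.
S2_CITATIONS = "https://api.semanticscholar.org/graph/v1/paper/{pid}/citations"


def citing_arxiv_ids(arxiv_id: str, limit: int = 1000) -> set[str]:
    """Best-effort set of arXiv IDs of papers that cite the given paper."""
    resp = requests.get(
        S2_CITATIONS.format(pid=f"arXiv:{arxiv_id}"),
        params={"fields": "externalIds", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    ids: set[str] = set()
    for entry in resp.json().get("data", []):
        ext = (entry.get("citingPaper") or {}).get("externalIds") or {}
        if "ArXiv" in ext:  # assumed key name for arXiv IDs in externalIds
            ids.add(ext["ArXiv"])
    return ids


def safety_citation_fraction(arxiv_id: str, andb_ids: set[str]) -> float | None:
    """Fraction of citing papers (with arXiv IDs) that also appear in the ANDB."""
    citing = citing_arxiv_ids(arxiv_id)
    if not citing:
        return None
    return len(citing & andb_ids) / len(citing)


# Hypothetical usage: andb_ids would come from parsing the public spreadsheet.
# andb_ids = {"2203.02155", ...}
# print(safety_citation_fraction("2203.02155", andb_ids))
```

Step 3 would then just group these per-paper fractions by subject area or author affiliation (e.g. with pandas) and compare the distributions.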

Research Agenda Base Rates and Forecasting

An uninvestigated crux of the AI doom debate seems to be pessimism regarding current AI research agendas. For instance, I feel rather positive about ELK's prospects, but in trying to put some numbers on this feeling, I realized I have no sense of base rates for research programs' success, nor of their average time horizons. I can't seem to think of any relevant Metaculus questions either.

What could be some relevant reference classes for AI safety research programs' success odds? AI safety seems most similar to disciplines with both engineering and mathematical aspects driven by applications. Perhaps research agendas on proteins, materials science, etc. It'd be especially interesting to see how many research agendas ended up not panning out, e.g. cataloguing events like 'a search for a material with X tensile strength and Y lightness starting in year Z was eventually given up on in year Z+i'.

When are intuitions reliable? Compression, population ethics, etc.

Intuitions are the result of the brain doing compression. Generally, the source data that was compressed is no longer associated with the intuition. Hence, from an introspective perspective, all intuitions appear equally valid.

Taking a third-person perspective, we can ask what data was likely compressed to form a given intuition. A pro sports player's intuition for that sport has a clearly reliable basis. Our moral intuitions on population ethics are formed via our experiences in everyday situations. There is no reason to expect one person's compression to yield a more meaningful generalization than another's -- we should all recognize that this data did not have enough structure to generalize to such cases. Perhaps an academic philosopher's intuitions are slightly more reliable, in that they compress data (papers) which held up to scrutiny.
