Thanks so much for making this!
I'm hopeful this sort of dataset will grow over time as new sources come about.
In particular, I'd nominate adding MLSN (https://www.alignmentforum.org/posts/R39tGLeETfCZJ4FoE/mlsn-4-many-new-interpretability-papers-virtual-logit) to the list of newsletters in the future.
Yes super excited about datasets like this! It might be helpful to also add https://ai-alignment.com/ or https://paulfchristiano.medium.com/ if these aren't already in the data
I believe all of those posts can be found on the Alignment Forum so, luckily, they are included in the dataset (at least from what I remember after checking a handful of the posts). I had begun scraping from those sources, but realized they were already on AF halfway through.
Good idea! I added most of the papers from the previous entries of MLSN. Adding the summaries would be a useful next step. Would be great if someone could keep track of it in a Google Sheet of individual summaries like the Alignment Newsletter (https://docs.google.com/spreadsheets/d/1lJ6431R-E6aioVRd7AN4LQYTj-QhQlUYNRbGDbG5RWY/edit?usp=sharing).
I was also considering adding distillations as a key. For example, adding ELK distillations to the ELK report entry.
I saw that earlier! Please keep up the great work. We have a Google Sheet with the bibliography (or at least the arXiv URL) of a bunch of papers. I can add those manually, but it would be great if they were stored in Zotero (preferably from arXiv) or something similar; we'd love to export it and add it to our list. And for anyone else reading this, sending us a bibliography like TAI Safety (https://www.lesswrong.com/posts/4DegbDJJiMX2b3EKm/tai-safety-bibliographic-database) would be a great way to contribute! :)
Here’s the (public) Google Sheet of the arXiv papers we used (there are duplicates but we remove duplicates during extraction): https://docs.google.com/spreadsheets/d/1jh5VbDWqNZiB5VUM4MW-yDRhZVGa19OsC2wdI6MwUZk/edit
We focused our search specifically on alignment papers, but we are also discussing branching out to other domains with papers that could be relevant for alignment.
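As a rough illustration of the de-duplication step mentioned above (a hypothetical helper based on normalized arXiv IDs, not our exact extraction code):

```python
import re

def arxiv_id(url: str) -> str:
    """Extract a bare arXiv identifier (e.g. '2206.02841') from an arXiv URL, dropping version suffixes."""
    match = re.search(r"(\d{4}\.\d{4,5})(v\d+)?", url)
    return match.group(1) if match else url.strip().lower()

def deduplicate(urls):
    """Keep the first occurrence of each paper, keyed by its normalized identifier."""
    seen, unique = set(), []
    for url in urls:
        key = arxiv_id(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```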
Unfortunately, I am denied access. Send me a private message with a Google Drive link to the exported Zotero RDF and I'll import it into my Zotero library. I think you can create a group in Zotero, but it is very limited in storage (you need to pay to store a decent number of entries).
This is very cool! For archiving and rebuilding after a global catastrophe, how easy would this be to port to Kiwix for reading on a phone? My thinking is that if a few hundred LWers/EAs have this offline on their phones, that could go quite a long way. Burying phones with it on could also be good low-hanging fruit (ideally, a way of reading the data would be stored with the data). Happy to fund this if anyone wants to do it.
Cool work!
Can I ask a couple of questions about the DR+clustering approach?
If I understand correctly, you do the clustering in a 2D space obtained with UMAP (ignore this if I am wrong). Are you sure you are not losing important information with such a low dimension? I say this because you show that one dimension is strongly correlated with style (academic vs forum/blog) and the second may be somewhat correlated with time. I remember that an argument exists for using n-1 dimensions when looking for n clusters, although that was probably using linear DR techniques and might not apply to UMAP. But it would be interesting to check if using higher n_components (3 to 5) results in the same clustering or generates some new insight.
Another thing you could check is using GMM instead of k-means. My (limited) experience is that if the embedding dimension is low you get better results this way. But, again, I was clustering downstream of linear DR.
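For concreteness, a check along these lines could look roughly like this (purely illustrative; it assumes the 768-dimensional SPECTER embeddings are available as an `embeddings` array):

```python
import umap
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Compare k-means and GMM clusterings obtained on UMAP projections of
# increasing dimensionality; high adjusted Rand scores between settings
# would suggest the five clusters are not an artifact of a 2D projection.
baseline = KMeans(n_clusters=5, random_state=0).fit_predict(
    umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)
)
for n_components in (3, 4, 5):
    reduced = umap.UMAP(n_components=n_components, random_state=0).fit_transform(embeddings)
    km = KMeans(n_clusters=5, random_state=0).fit_predict(reduced)
    gmm = GaussianMixture(n_components=5, random_state=0).fit_predict(reduced)
    print(n_components, adjusted_rand_score(baseline, km), adjusted_rand_score(baseline, gmm))
```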
Thank you for the comment and the questions! :)
This is not clear from how we wrote the paper, but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot, you can see that the clusters overlap slightly; that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
Ahh, sorry! Going back to read it, it was pretty clear from the text. I was tricked by the figure, where the embedding is presented first. Again, good job! :)
We were surprised to find a decrease in publications on the arXiv in recent years, but identified the cause for the decrease as spurious and fixed the issue in the published dataset (details in Fig. 4).
I'd be interested in hearing more about how the decrease was determined to be spurious; I looked at Fig. 4 but am not understanding how that decision was made based on the figure, if that was the intention.
Thanks for the question! When we initially scraped the dataset, we looked at the dates in Figure 1a and saw a decrease in papers after 2020, because many of the alignment literature lists we grabbed papers from were compiled in 2020 or earlier and had not been updated since. This produced a perceived decline in papers in Figure 1a, which seemed clearly due to missing the newer papers that had come out in 2020 and later. Once we scraped a wider set of papers using arXiv's API, you can see the uptick in papers in 2020 and beyond (Figure 4e) where there was previously a decrease (Figure 1a).
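For reference, pulling the newer cs.AI metadata can be sketched roughly like this (using the community `arxiv` package; our actual scraper may differ in details):

```python
import arxiv  # community wrapper around the arXiv API

# Fetch recent cs.AI metadata so papers from 2020 onward are not missed.
client = arxiv.Client(page_size=100, delay_seconds=3)
search = arxiv.Search(
    query="cat:cs.AI",
    max_results=1000,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)
papers = [
    {"title": r.title, "abstract": r.summary, "published": r.published.date().isoformat()}
    for r in client.results(search)
]
```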
I would very much like to see your dataset, as a zotero database or some other format, in order to better orient myself to the space. Are you able to make this available somehow?
Very, very helpful! The clustering is obviously a function of the corpus. From your narrative, it seems like you only added the missing arXiv papers after clustering. Is it possible the clusters would look different with those in?
Hey Ben! :) Thanks for the comment and the careful reading!
Yes, we only added the missing arXiv papers after clustering, but we then repeat the dimensionality reduction and show that the original clustering still holds up even with the new papers (Figure 4, bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering), but of course the clusters might look slightly different if we also re-ran k-means on the extended dataset.
I just updated the code for the scrape to include the EA Forum in case someone wants to do something interesting with that data. Contains metadata as well: authors, score, votes, date_published, text (post contents), comments.
Here’s a link to a jsonl of the EA Forum only: https://drive.google.com/file/d/1XA71s2K4j89_N2x4EbTdVYANJ7X3P4ow/view?usp=drivesdk
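Loading the jsonl is straightforward; a minimal sketch (the local filename is hypothetical, field names are as listed above):

```python
import json

posts = []
with open("ea_forum.jsonl") as f:  # hypothetical local copy of the linked file
    for line in f:
        posts.append(json.loads(line))

print(len(posts))
print(posts[0]["authors"], posts[0]["date_published"], posts[0]["score"])
```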
TL;DR: In this project, we collected and cataloged AI alignment research literature and analyzed the resulting dataset in an unbiased way to identify major research directions. We found that the field is growing quickly, with several subfields emerging in parallel. We looked at the subfields and identified the prominent researchers, recurring topics, and different modes of communication in each. Furthermore, we found that a classifier trained on AI alignment research articles can detect relevant articles that we did not originally include in the dataset.
(video presentation here)
Dataset Announcement
In the context of the 6th AISC, we collected a dataset of alignment research articles from a variety of different sources. This dataset is now available for download here and the code for reproducing the scrape is on GitHub here[1]. When using the dataset, please cite our manuscript as described in the footnote[2].
Here follows an abbreviated version of the full manuscript, which contains additional analysis and discussion.
Rapid growth of AI Alignment research from 2012 to 2022 across two platforms
After collecting the dataset, we analyzed the two largest non-redundant sources of articles, Alignment Forum (AF) and arXiv. We found rapid growth in publications on the AF (Fig. 1a) and a long-tailed distribution of articles per researcher (Fig. 1b) and researchers per article (Fig. 1c). We were surprised to find a decrease in publications on the arXiv in recent years, but identified the cause for the decrease as spurious and fixed the issue in the published dataset (details in Fig. 4).
Unsupervised decomposition of AI Alignment research into distinct clusters
Given access to this unique dataset, we were curious to see if we could identify distinct clusters of research. We mapped the title + abstract of each article into vector form using the Allen Institute for AI's SPECTER model and reduced the dimensionality of the embedding with UMAP (Fig. 2a). The resulting manifold shows a continuum of AF posts and arXiv articles (Fig. 2b) and a temporal gradient from the top right to the bottom left (Fig. 2c). Using k-means and the elbow method, we obtain five clusters of research articles that map onto distinct regions of the UMAP projection (Fig. 2d).
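As a rough sketch of this pipeline (illustrative variable names, no batching; not the exact code used): embed title + abstract with SPECTER, cluster in the full 768-dimensional space, and use UMAP only for the 2D visualization.

```python
import umap
from sklearn.cluster import KMeans
from transformers import AutoTokenizer, AutoModel

# articles: list of dicts with "title" and "abstract" (assumed loaded from the dataset)
tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

texts = [a["title"] + tokenizer.sep_token + a.get("abstract", "") for a in articles]
inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
embeddings = model(**inputs).last_hidden_state[:, 0, :].detach().numpy()  # 768-d [CLS] vectors

# Elbow method: inspect inertia as a function of k, then fix k = 5.
inertias = [KMeans(n_clusters=k, random_state=0).fit(embeddings).inertia_ for k in range(2, 11)]
labels = KMeans(n_clusters=5, random_state=0).fit_predict(embeddings)

# 2D projection for the figures; cluster membership is decided in the full space.
projection = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)
```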
We were curious to see if the five clusters identified by k-means map onto existing distinctions in the field. When identifying the most prolific authors in each cluster, we noticed strong differences[3] (consistent with previous work that suggests that author identity is an important indicator of research direction).
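As an illustration, tallying the most frequent authors per cluster can be done along these lines (assuming a list `articles` of dicts with "authors" and "cluster" keys; illustrative only):

```python
from collections import Counter

for cluster_id in range(5):
    counts = Counter(
        author
        for article in articles
        if article["cluster"] == cluster_id
        for author in article["authors"]
    )
    print(cluster_id, counts.most_common(5))
```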
By skimming articles in each cluster and given the typical research published by the authors, we suggest the following putative descriptions of each cluster:
We note that these descriptions are chosen to be descriptive, not prescriptive. Our approach has the advantage of being (comparatively[4]) unbiased and can therefore serve as a baseline against which other (more prescriptive) descriptions of the landscape can be compared (Krakovna's paradigms, FLI landscape, Christiano's landscape, Nanda's overview, ...). Discrepancies between these descriptions and ours can serve as important information for funding agencies (to identify neglected areas) and AI Governance researchers (for early identification of natural categories for regulation).
Research dynamics vary across the identified clusters
We further note some properties of the identified clusters (Fig. 3a). The cluster labeled "alignment foundations" contains most of the seminal work in the field (Fig. 3b,c), but remains largely disconnected from the more applied "agent alignment" and "tool alignment" research (Fig. 3a). Furthermore, most "alignment foundations" work is published on the Alignment Forum (Fig. 3d), and it shows the largest inequality in terms of "number of articles per researcher" (Fig. 3e). This corroborates an observation that has been made before: while critically important, alignment foundations research appears to be poorly integrated into more applied alignment research, and it remains insular, driven by comparatively few researchers.
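That inequality could, for example, be quantified with a Gini coefficient over per-author article counts (a sketch of one possible measure, not necessarily the one underlying Fig. 3e):

```python
import numpy as np

def gini(article_counts):
    """Gini coefficient of per-author article counts (0 = perfectly equal, 1 = maximally unequal)."""
    x = np.sort(np.asarray(article_counts, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

# e.g. gini([1, 1, 1, 1, 20]) is much larger than gini([5, 5, 5, 5, 4])
```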
Leveraging the dataset to train an AI alignment research classifier
After having identified the five clusters, we returned to the issue we noted at the outset of our analysis: the apparent decrease in publications on the arXiv in recent years (Fig. 1a). We were skeptical about this and hypothesized that our data collection might have missed relevant recent articles[5]. Therefore, we trained a logistic regression classifier to distinguish alignment articles (level-0) from articles cited by alignment articles (level-1) (Fig. 4a). The resulting classifier achieved good performance and generalized well to papers from unrelated sources (Fig. 4b). We then scraped all the articles from the arXiv cs.AI category and asked our classifier to score them (Fig. 4c,d). Based on the distribution of scores of Alignment Forum posts (Fig. 4d) and after skimming the relevant articles, we chose a threshold of 75% as a reasonable trade-off between false positives and false negatives.
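A rough sketch of this classifier step (assuming precomputed SPECTER embeddings for the two article sets; variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# level0_embeddings: alignment articles; level1_embeddings: articles they cite (768-d each)
X = np.vstack([level0_embeddings, level1_embeddings])
y = np.concatenate([np.ones(len(level0_embeddings)), np.zeros(len(level1_embeddings))])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score the scraped cs.AI articles and keep those above the 75% cutoff described above.
scores = clf.predict_proba(cs_ai_embeddings)[:, 1]
new_alignment_articles = [a for a, s in zip(cs_ai_articles, scores) if s > 0.75]
```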
When adding the arXiv articles above the cutoff to our dataset, we observed a rapid increase in publications on the arXiv as well (Fig. 4e). To test whether our clustering is robust to this increase, we repeated the UMAP projection with the updated dataset and found that, indeed, the clusters still occupy distinct regions of the manifold (Fig. 4f). Interestingly, the added literature appears to fill some of the gaps between "alignment foundations" and "agent alignment" research.
Closing remarks
The primary output from our project is the curated dataset of alignment research articles. We hope the dataset might serve as the basis for
If you have other ideas for how to use the dataset, please don't hesitate to reach out to us; we're excited to help.
Furthermore, we hope that the secondary outcome from our project (the analysis in this post) can aid both funding agencies and new researchers entering the field to orient themselves and contextualize the research.
As we plan to continue this line of research, we are happy about any and all feedback on the dataset and the analysis, as well as hints and pointers about things we might have missed.
Acknowledgments: We thank Daniel Clothiaux for help with writing the code and extracting articles. We thank Remmelt Ellen, Adam Shimi, and Arush Tagade for feedback on the research. We thank Chu Chen, Ömer Faruk Şen, Hey, Nihal Mohan Moodbidri, and Trinity Smith for cleaning the audio transcripts.
We will make some finishing touches on the repository over the next few weeks after this post is published.
Kirchner, J. H., Smith, L., Thibodeau, J., McDonnell, K., and Reynolds, L. "Understanding AI alignment research: A Systematic Analysis." arXiv preprint arXiv:2206.02841 (2022).
Except for Stuart Armstrong, who publishes prolifically across all clusters.
Remaining biases include:
We took the TAI Safety Bibliographic Database from early 2020 as a starting point and manually added relevant articles from other existing bibliographies or based on our judgment. We were very conservative in this step, as we wanted to make sure that our dataset includes as few false positives as possible.