Proceedings of ILIAD: Lessons and Progress

by Alexander Gietelink Oldenziel and Jess Riedel
28th Apr 2025

Comments
plex:

> No public comments will be hosted on our website as we don't have the resources for moderation of public discussion. Authors can choose to link-post their work on the Alignment Forum or LessWrong to engage with a broader audience.

I think it'd be pretty important/useful if the UI shows links to publicly commentable link-posts where those exist.

Jess Riedel:

I think this is something we may do, with the caveat that we would make it an option that the author can choose (and that this would be clear to the readers). We don’t want to get in the business of deciding which online discussions are good enough to endorse as worth the reader’s time.

Thanks for raising this.

plex:

Yes, I'm imagining if the author link-posts they can add a cross-link so viewers can participate.

Jonas Hallgren:

I was encouraged to ask these questions under this post instead of over email, so I'll do so:

I wanted to ask some questions about which papers are within scope.

Firstly, what about things overlapping between Cooperative AI and Agent Foundations? I've got a paper on tying game theory together with percolation theory in order to better predict fixed points in complex systems, is this within scope or not?

Secondly, I believe there are various problems in scaling technical single-agent safety to systemic problems like Gradual Disempowerment and similar. We need to be able to detect agency in larger systems as well, and Michael Levin has a bunch of work on trying to establish "diverse intelligence". Would agent foundations work on diverse intelligence be compatible? A sort of scaling of agent foundations to systemic disciplines like computational social science.

Finally, what about taxonomy papers? Something I've been frustrated about is not having a taxonomy of agent definitions, and therefore running into issues with specification of language in discussions. An "agent" can mean a lot of things, and I was thinking that putting together a taxonomy of what existing fields consider agents might be useful.

I love the idea of ILIAD, I think it's needed and awesome.

Jess Riedel:

Thanks, we really appreciate the questions.


Our general approach to scope is to ask (1) if the topic is worth studying, and (2) if there is no other venue that can offer a substantially better review. If so, we’ll probably say yes. (We generally want to avoid reviewing manuscripts when there are already good existing journals that accept submissions on the topic, e.g., almost all interpretability.) We are willing to go outside our comfort zone to get worthwhile manuscripts reviewed imperfectly if the alternative is that they get reviewed nowhere. One advantage of the reviewer-abstract idea is that it allows the reviewers to communicate their uncertainty to the potential reader.

Both of the interdisciplinary papers you mention sound fine. In these sorts of cases, we may ask the authors to put in special effort to help us locate qualified (and reasonably unbiased) reviewers.

Review and taxonomy papers are fine, and indeed we’d love to see something that collects and compares various definitions of “agent” in both the conventional lit and the Alignment Forum. For us the question isn’t “Is this novel enough to ‘deserve’ publication?”, it’s “Is this worth writing? Are there at least a few researchers who will find this significantly more useful than what’s already been written?”.

tl;dr

This post is an update on the Proceedings of ILIAD, a conference journal for AI alignment research intended to bridge the gap between the Alignment Forum and academia. Following our successful first issue with 9 workshop papers from last year's ILIAD conference, we're launching a second issue in association with ILIAD 2: ODYSSEY. The conference is August 25-29, 2025 at Lighthaven in Berkeley, CA. Submissions to the Proceedings are open now (more info) and due June 25. Our goal is to support impactful, rapid, and readable research, carefully rationing scarce researcher time, using features like public submissions, partial anonymity, partial confidentiality, reviewer-written abstracts, reviewer compensation, and open licensing. We are soliciting community feedback and suggestions for reviewers and editorial board members.

Motivation

Prior to the deep learning explosion, much early work on AI alignment occurred at MIRI, the Alignment Forum, and LessWrong (and their predecessors). Although there is now vastly more alignment and safety work happening at ML conferences and inside industry labs, it's heavily slanted toward near-term concerns and ideas that are tractable with empirical techniques. This is partly for good reasons: we now have much more capable models which guide theory and allow extremely useful empirical testing.

However, conceptual, mathematically abstract, and long-term research on alignment still doesn't have a good home in traditional academic journals and conferences. Much of it is still done on the AI Alignment Forum and here on LessWrong, or is done informally (private discussions, Twitter, blogs, etc.) by academic researchers without a good venue for attracting the best constructive criticism.

As a result, there remains a gulf between more traditional academic work and much of the most important alignment work:

  • Some traditional academics consider alignment work to be sloppy or unsophisticated, often re-inventing the wheel, neglecting prior academic literature, failing to engage with the strongest criticism, lapsing into hubris, and ignoring valuable academic norms.
  • Some alignment researchers consider traditional academic work to be unacceptably slow, unwilling to confront the hard problems, incremental, credentialist, reluctant to publicly criticize misleading work, unable to abandon failed approaches, and prone to fetishizing the trappings of academia.

There is substantial truth in many of these criticisms, but they are often misaimed. There is value in the techniques and philosophy of both communities, and we would like to get the best of both worlds by building a central venue to bridge them.

There is currently nothing like an academic journal on alignment. Several scientific fields (e.g., game theory, cybernetics) were meaningfully accelerated or incubated by an initial conference followed by a journal, so we decided to take this route. We were warned several times that a journal/proceedings is an enormous amount of work, but we're stubborn and decided to run a trial version at the first ILIAD conference (~120 attendees; LW announcement here), which took place August 28 - September 3, 2024 at Lighthaven.

Experience with first issue of Proceedings

We have just released 9 workshop papers in the first issue of Proceedings of ILIAD. First, the bad:

  • We took way too long. It's been >8 months since ILIAD. This is silly, doubly so in a short-timeline world. Turnaround should be 2 months max.
  • We lacked experience with the process of reviewing.
  • We randomly allocated submitters to do two reviews each. This meant people often reviewed work outside their expertise.
  • It was a lot of work, and since it wasn't our primary priority, it was an onerous burden.
  • It was harder than expected to maintain a consistent standard.

And the good:

  • There was a sufficient number of submissions (this was our main uncertainty).
  • The overall quality was good. 

Overall we were satisfied with this experiment and have decided to push forward. We have just opened up submissions for the second issue of the Proceedings in association with the second annual conference, ILIAD 2: ODYSSEY, taking place August 25-29, 2025 at Lighthaven. The soft deadline for submission is June 25th. We are continuing to experiment with mechanisms for running these proceedings, especially the review process. If we succeed in getting good community engagement and adding value, we may start an archival journal dedicated to AI alignment.

In the rest of this post we first describe our general philosophy and then how we expect that to cash out in terms of the design of the second issue of the Proceedings and a possible alignment journal.

General philosophy

We want to accelerate impactful research and make it more readable to other researchers. Our hope is to combine the best features of academic journals and internet forums.

An idealized traditional journal...

  • ...separates great research from the chaff and brings it to the attention of other researchers. It pushes someone (the reviewer) to read a manuscript carefully when it might otherwise be neglected or only read cursorily.
  • ...improves manuscripts through detailed constructive feedback, e.g., prompting better explanations and forcing comparison with the existing literature.
  • ...matches manuscripts to the most relevant experts so the above can be done efficiently.
  • ...allows manuscripts to be written with a wider set of useful tools generally not found on internet forums, such as equation references, LaTeX macros, and automatic bibliography generation.
  • ...has less incentive to pander to non-experts, and wastes less author time on low-quality comments.

On the other hand, an idealized internet forum...

  • ...enables near-instant and near-frictionless dissemination of results.
  • ...generates rapid feedback from forum comments. Review takes hours or days, not months.
  • ...solicits feedback from whoever happens to be most interested, rather than guessing with editor-picked reviewers. This can get more engagement, and many eyes on a work can surface problems.
  • ...allows updating of posts (“living”).

We want to combine all the above advantages as much as possible. We are inspired by the journal Distill while keeping in mind its reasons for shutting down.

A central lens through which we think about all decisions is that, when it comes to the economics of academic discussion and review, the scarce resource is researcher time. Researcher time is best used when: 

  • Reviewers are matched with papers that interest them and for which they have the relevant expertise.
  • Reviewers are not forced to read low-quality papers, and authors are not forced to respond to low-quality reviews.
  • Neither authors nor reviewers waste time on papers that shouldn't be written in the first place: incremental, deceptive, or boring research.
  • Reviews offer constructive feedback.
  • The review process is rapid, and does not get derailed by bickering or minutiae.
  • The public output of the review process is not a single bit (publish/reject), but rather a distillation of the reviewer's insight about a paper that can aid future readers.

Design of the second issue of the Proceedings

Some simple and relatively uncontroversial features:

  • Rapid review, at least relative to the slower academic journals. We are targeting 2 months or less, with further improvements in future issues.
  • Google Scholar indexing of all accepted papers.
  • No public comments will be hosted on our website as we don't have the resources for moderation of public discussion. Authors can choose to link-post their work on the Alignment Forum or LessWrong to engage with a broader audience.
  • Manuscript submissions are accepted in any format (even... shudder... Microsoft Word) and are sent out for review as a PDF. This avoids unnecessarily wasting author time on formatting a manuscript that ends up being rejected. Conversion to a standardized format for beautiful distribution will occur only after the manuscript is accepted.
  • Prior- and post-publication allowed: that is, you can release or publish the work however you want, before or after you submit to us. All we ask is that (a) you tell us where it's already published and (b) you don't submit it to us and other journals concurrently (because we don't want to waste the time of reviewers). In particular, this makes the Proceedings a non-archival publication.
  • Web-first formatting: Accepted manuscripts will be formatted with something like the Distill version of R Markdown to get beautiful math, reflowable text, mobile readability, etc.  (We will not support interactive content, for now.)  A PDF will also be available.
  • Formatting assistance: We have more resources per paper than many journals for helping an author improve the appearance and typesetting of their work after acceptance, so they can prepare the initial manuscript using whatever format is most productive for them. In particular, we aim to allow authors to submit work formatted for the Alignment Forum (e.g., Markdown) or in LaTeX with negligible effort beyond approving our conversion output.
  • Limited mentorship: While the Proceedings is small (~25 submissions), the editorial decision role and mentorship role necessarily blend together. If we are successful in growing the number of papers in future issues, we will separate these roles.

Here are some less-trivial ideas we are tentatively planning to use:

  • Publicly visible submissions:
    • Submitted manuscripts will be publicly posted while under review.
    • Anyone may submit an unsolicited review to the editor. (See "self-nominated reviewer" below.) If a review is constructive, the editors will include it in the review process.
  • Dual abstracts:
    • Each published paper will appear with a traditional abstract written by the authors alongside a “reviewer abstract” written by one or more of the reviewers and accepted by the authors.
    • The reviewers will be told when first recruited that this will be asked of them, and they can write a draft reviewer abstract as the first paragraph of their report.
    • The instructions to the reviewer are not just to perfunctorily summarize the paper, nor to simply issue a final assessment, but rather to write the abstract they wish they could have read before diving into the paper (and, in particular, an abstract that would help a reader decide whether the paper is worth reading in the first place).
  • Confidential and semi-anonymous review:
    • OpenReview is used for the review process.
    • Reviewers will initially be anonymous.
    • After submitting their first report, reviewers will see the other reviewers' reports on the same manuscript and the authors' responses. The back-and-forth discussion will be kept private between reviewers, authors, and editor to encourage honest and constructive discussion.
    • If a manuscript is accepted, one reviewer will be asked by the editor to write the reviewer abstract (see above), incorporating insight and explicit text from any of the reviews.
    • Any or none of the reviewers may choose to publicly sign the reviewer abstract.
  • Licensing: Authors and reviewers agree to release their work under a Creative Commons Attribution (CC-BY) 4.0 license. Basically, anyone can share and adapt the work so long as they give attribution to the original.
    • In particular this means others can reuse exact words and figures from anything published in the Proceedings.
    • We believe this makes research maximally useful to others and is consistent with the fact that most of the research we publish will be publicly or philanthropically funded.
    • It's possible this causes some researchers to not submit their work to the Proceedings, but we expect this will be a small effect. Such licenses are the norm on the OpenReview platform.
  • Reviewer payments:
    • Reviewers who write useful reviews will receive ~$200, and unusually excellent reviews will get double (~$400). (Quality is judged by the editor, and unsolicited reviews are eligible for compensation.) An additional ~$100 will go to the reviewer who writes the reviewer abstract.
    • The amounts are subject to revision based on funding availability before the review process starts.
    • Reviewers will be paid for positive and negative reviews alike.
    • We are hoping that payments, though modest, will spur reviewers to review quickly, thoroughly, and professionally.
    • We're of course cognizant of the various ways payments can negatively distort motivations. But we think it's worth trying.

Possible design for an alignment journal

Towards our ultimate goal of combining as many advantages of academic journals and internet forums as possible into one venue, here are some ideas we're strongly considering for an alignment journal (but not for the next issue of the Proceedings):

  • Living: Authors can easily and nearly instantly update their paper post-publication. The original published version will remain available and marked as the “reviewed version”.
  • Re-review: Additional review may be obtained for published papers that would benefit from it, e.g., if problems are discovered later or if the paper proves to be more important than was initially appreciated. (This could be augmented with additional markers of notability, e.g., "Editor's suggestion", "Test of time", etc.).
  • Self-nominated reviewer: Anyone can easily (e.g., with one click) nominate themselves to review a paper, with acceptance determined by the editor. (Self-nominated reviewers can submit a review unilaterally alongside their self-nomination, and the editor can take the usefulness of the submitted review into account, but the reviewer risks wasted effort if the editor ends up not accepting it.) As with all reviewers, the identity of self-nominated reviewers is known to the editor, but they participate anonymously in the joint discussion between reviewers and authors. At the end, they can optionally sign the reviewer abstract. The editor would still recruit reviewers, and the editor would still decide which reviews are useful enough to show to the authors and to other reviewers, but this would make it much easier (relative to a traditional journal) for good reviewers to bring themselves to the editor's attention.

The last bullet point is potentially the most powerful mechanism for obtaining most of the benefits of internet forums while retaining the essentials of peer review. In essence, the review process would be a conventional internet forum discussion, except that (1) participants are filtered for expertise, (2) the discussion is confidential and lightly moderated by the editor, and (3) the detailed results of the discussion are released as a reviewer abstract. Importantly, the moderation workload is greatly reduced compared to an open forum because the editor doesn’t have to monitor the comments in real time; new commenters are hidden by default until approved by the editor.

Asks for readers

We're seeking constructive criticism of the above ideas. Please also let us know:

  • Who should be on the editorial board? Who would make good reviewers? Suggestions, including self-nominations, are welcome. You can post them here or just email us.
  • What currently keeps you, alignment researcher, from submitting your best ideas to an academic journal/conference, or from posting them on the Alignment Forum?
  • If you are considering submitting, which of these policies do you like or dislike?
  • If you are considering reviewing, would a payment make you more or less likely to agree to review, and to put in effort?

Acknowledgements

We thank Oliver Habryka for discussion.