Summary
In this post, I motivate an extension of constitutional AI (CAI) and present one possible concrete execution of that strategy.
TL;DR: When generating AI feedback during the CAI process, a principle from the constitution is selected at random for each red-teamed prompt/initial-response pair. A helpful-only model then critiques its initial response against that principle and subsequently revises it. Instead of selecting principles at random, I propose choosing them based on the context provided by each particular prompt/response pair. I call this contextual constitutional AI.
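To make the selection step concrete, here is a minimal Python sketch of the difference between the two approaches. The three example principles and the `generate` callable (any prompt-in, text-out helper, e.g. a local pipeline or an API client) are hypothetical placeholders for illustration, not my actual implementation:

```python
import random

# Hypothetical constitution: a few example principles, not the full CAI set.
PRINCIPLES = [
    "Choose the response that is least likely to encourage illegal activity.",
    "Choose the response that is most respectful of personal privacy.",
    "Choose the response that least promotes violence or physical harm.",
]

def select_principle_random(principles: list[str]) -> str:
    """Standard CAI: sample a critique principle uniformly at random."""
    return random.choice(principles)

def select_principle_contextual(prompt: str, response: str,
                                principles: list[str], generate) -> str:
    """Contextual CAI sketch: ask a helper model which principle is most
    relevant to this particular prompt/response pair.

    `generate` is an assumed stand-in for any text-generation call;
    it takes a prompt string and returns the model's text output.
    """
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles))
    selector_prompt = (
        "Given the exchange below, reply with the number of the single "
        "principle most relevant for critiquing the response.\n\n"
        f"Prompt: {prompt}\nResponse: {response}\n\n"
        f"Principles:\n{numbered}\n\nNumber:"
    )
    reply = generate(selector_prompt)
    try:
        idx = int(reply.strip().split()[0])
        return principles[idx % len(principles)]
    except (ValueError, IndexError):
        # Fall back to the standard random choice if parsing fails.
        return random.choice(principles)
```

The selected principle is then slotted into the usual critique/revision prompts exactly as in standard CAI; only the selection step changes.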
This is intended only as a preliminary insight as part of my AISF: Alignment course project. Due to limited time and funding, I have made certain decisions that have made my investigation...
Hi Deco! I did not end up using LLMSwarm. As far as I understand, it's particularly useful when you have multiple GPUs and can parallelize the computations (e.g., when creating the datasets). I was using only one GPU on Google Colab for my setup, so it didn't seem to fit my use case.
I haven't yet published my code for this—it's a bit messy and I was hoping to streamline it when I get the time before making it public. However, I'd be happy to give you (or anyone else) access in the meantime if you're in a hurry and want to take a look.