Is it ethical to work in AI "content evaluation"?

by anon_databoy123
27th Jan 2025

I recently graduated with a CS degree and have been freelancing in "content evaluation" while figuring out what’s next. The work involves paid tasks aimed at improving LLMs, which generally fall into a few categories:

  • Standard Evaluations: Comparing two AI-generated responses and assessing them based on criteria like truthfulness, instruction-following, verbosity, and overall quality. Some tasks also involve evaluating how well the model uses information from provided PDFs.
  • Coding Evaluations: Similar to standard evaluations but focused on code. These tasks involve checking responses for correctness, documentation quality, and performance issues.
  • "Safety"-Oriented Tasks: Reviewing potentially adversarial prompts and determining whether the model’s responses align with safety guidelines, such as refusing harmful requests like generating bomb instructions.
  • Conversational Evaluations: Engaging with the model directly, labeling parts of a conversation (e.g., summarization or open Q&A), and rating its responses based on simpler criteria than the other task types.

Recently, I have been questioning the ethics of this work. The models I work with are not cutting-edge, but improving them could still contribute to AI arms race dynamics. The platform is operated by Google, which might place more emphasis on safety than OpenAI does, though I do not have enough information to be sure. Certain tasks, such as those aimed at helping models distinguish between harmful and benign responses, seem like they could be used for RLHF and are conceivably net-positive. Others, such as comparing model performance across a range of tasks, might be relevant to interpretability, but I am less certain about this.

Since I lean utilitarian, I have considered offsetting potential harm by donating part of my earnings to AI safety organizations. At the same time, if the work is harmful enough on balance, I would rather stop altogether. Another option would be to focus only on tasks that seem clearly safety-related or low-risk, though this would likely mean earning less and therefore donating less.

Comments

Dave Orr

I'm probably too conflicted to give you advice here (I work on safety at Google DeepMind), but you might want to think through, at a gears level, what could concretely happen with your work that would lead to bad outcomes. Then you can balance that against positives (getting paid, becoming more familiar with model outputs, whatever).

You might also think about how your work compares to whoever would replace you on average, and what implications that might have as well.

anon_databoy123

Part of why I ask is that it's difficult for me to construct a concrete gears-level picture of how (if at all) my work influences eventual transformative AI. I'm unsure about the extent to which refining current models' coding capabilities accelerates timelines, whether some tasks are possibly net-positive, whether these impacts are easily offset, etc.