Is it ethical to work in AI "content evaluation"?

by anon_databoy123
27th Jan 2025

I recently graduated with a CS degree and have been freelancing in "content evaluation" while figuring out what’s next. The work involves paid tasks aimed at improving LLMs, which generally fall into a few categories:

  • Standard Evaluations: Comparing two AI-generated responses and assessing them based on criteria like truthfulness, instruction-following, verbosity, and overall quality. Some tasks also involve evaluating how well the model uses information from provided PDFs.
  • Coding Evaluations: Similar to standard evaluations but focused on code. These tasks involve checking responses for correctness, documentation quality, and performance issues.
  • "Safety"-Oriented Tasks: Reviewing potentially adversarial prompts and determining whether the model’s responses align with safety guidelines, such as refusing harmful requests like generating bomb instructions.
  • Conversational Evaluations: Engaging with the model directly, labeling parts of a conversation (e.g., summarization or open Q&A), and rating its responses based on simpler criteria than the other task types.

Recently, I have been questioning the ethics of this work. The models I work with are not cutting-edge, but improving them could still contribute to AI arms race dynamics. The platform is operated by Google, which might place more emphasis on safety than OpenAI does, though I do not have enough information to be sure. Certain tasks, such as those aimed at helping models distinguish between harmful and benign responses, seem like they could be used for RLHF and are conceivably net-positive. Others, such as comparing model performance across a range of tasks, might be relevant to interpretability, but I am less certain about this.

Since I lean utilitarian, I have considered offsetting potential harm by donating part of my earnings to AI safety organizations. At the same time, if the work is harmful enough on balance, I would rather stop altogether. Another option would be to focus only on tasks that seem clearly safety-related or low-risk, though this would likely mean earning less and therefore donating less.

Comments

Dave Orr

I'm probably too conflicted to give you advice here (I work on safety at Google DeepMind), but you might want to think through, at a gears level, what could concretely happen with your work that would lead to bad outcomes. Then you can balance that against positives (getting paid, becoming more familiar with model outputs, whatever).

You might also think about how your work compares to whoever would replace you on average, and what implications that might have as well.

anon_databoy123

Part of why I ask is that it's difficult for me to construct a concrete gears-level picture of how (if at all) my work influences eventual transformative AI. I'm unsure about the extent to which refining current models' coding capabilities accelerates timelines, whether some tasks are possibly net-positive, whether these impacts are easily offset, etc.