Do No Harm? Navigating and Nudging AI Moral Choices
TL;DR: How do AI systems make moral decisions, and can we influence their ethical judgments? We probe these questions by examining how Llama 70B (versions 3.1 and 3.3) responds to moral dilemmas, using the Goodfire API to steer its decision-making process. Our experiments reveal that simply reframing ethical questions - from "harm one...