Emile Delcourt, David Baek, Adriano Hernandez, Erik Nordby with advising from Apart Lab Studio
Introduction & Problem Statement
Helpful, Harmless, and Honest (”HHH”, Askell 2021) is a framework for aligning large language models (LLMs) with human values and expectations. In this context, "helpful" means the model strives to assist users in achieving their legitimate goals, providing relevant information and useful responses. "Harmless" refers to avoiding generating content that could cause damage, such as instructions for illegal activities, harmful misinformation, or content that perpetuates bias. "Honest" emphasizes transparency about the model's limitations and uncertainties—acknowledging when it doesn't know something, avoiding fabricated information, and clearly distinguishing between facts and opinions. This framework serves as both a... (read 6496 more words →)