Emile Delcourt, David Baek, Adriano Hernandez, and Erik Nordby, with advising from Apart Lab Studio
Introduction & Problem Statement

Helpful, Harmless, and Honest ("HHH"; Askell et al., 2021) is a framework for aligning large language models (LLMs) with human values and expectations. In this context, "helpful" means the model strives to assist users in achieving their legitimate goals, providing relevant information and useful responses. "Harmless" refers to avoiding the generation of content that could cause damage, such as instructions for illegal activities, harmful misinformation, or content that perpetuates bias. "Honest" emphasizes transparency about the model's limitations and uncertainties: acknowledging when it doesn't know something, avoiding fabricated information, and clearly distinguishing between facts and opinions. This framework serves as both a...