AI safety is a group project for us all. We need everyone to participate, from the ESFPs to the INTJs!

Capturing the essence and subtleties of core values requires input from a broad span of humanity.

Assumption 1 - large language models will be the basis of AGI.

Assumption 2 - One way to instill the abstraction of a value like "kindness is good" in the model is to include a large corpus of written material on Kindness during training (or retraining).

The Kindness Project is a website with a prompt, like a college essay. Users add their stories to the open collection based on the prompt: "Tell a story about how you impacted or were impacted by someone being kind". This prompt is translated into as many languages as possible to maximize input.

The end goal is a large, detailed system of nodes in the model around the abstraction of Kindness, representing our experiences.

There would be sister projects based around other values like Wisdom, Integrity, Compassion, etc.

The project incentivizes participation through contests, random drawings, partner projects with schools, etc.

Submissions are filtered for plagiarism, duplicates, etc.
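The submission filter described above could work in two passes: an exact-match check over normalized text, then a fuzzy near-duplicate check. This is a minimal sketch of that idea; the `is_duplicate` helper, the normalization scheme, and the 0.9 similarity threshold are all assumptions for illustration, not part of the project's actual design.

```python
import difflib
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial edits
    # don't evade the exact-match check.
    return " ".join(text.lower().split())

def is_duplicate(candidate: str, accepted: list[str], threshold: float = 0.9) -> bool:
    """Return True if candidate exactly matches or closely resembles
    an already-accepted story (threshold is a hypothetical cutoff)."""
    norm = normalize(candidate)
    digest = hashlib.sha256(norm.encode()).hexdigest()
    seen = {hashlib.sha256(normalize(a).encode()).hexdigest() for a in accepted}
    if digest in seen:
        return True
    # Near-duplicate check: a high SequenceMatcher ratio means the
    # candidate is mostly a copy of an existing submission.
    return any(
        difflib.SequenceMatcher(None, norm, normalize(a)).ratio() > threshold
        for a in accepted
    )
```

A real deployment would likely also want plagiarism detection against external corpora, which this sketch does not attempt.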

Documents are automatically linked back to Reddit for inclusion by language-model document scrapers.


Where does one click to participate?

If the site isn't built yet, what's the benefit of DIYing it over simply creating a subreddit for it and doing it natively on the platform to guarantee scraper inclusion?

I'm soliciting input from people with more LLM experience to tell me why this naive idea will fail. I'm hoping it's not in the category of "not even wrong". If there's a 2%+ chance this will succeed, I'll start coding.

From what I gather, the scrapers look for links on Reddit to external text files. I could also collate submissions, zip them, and upload them to GitHub/IPFS, whichever format is easiest for inclusion in a dataset like the Pile.
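The collate-and-zip step could be as simple as writing each story to a plain-text file plus a small JSONL index, then compressing the lot for upload. This is a hypothetical sketch; the `collate_submissions` helper, the `kindness/` layout, and the per-submission `dict` shape are assumptions, not a spec from the post.

```python
import json
import zipfile

def collate_submissions(submissions: list[dict], out_path: str) -> str:
    """Pack stories into a zip: one .txt per story plus a JSONL index.

    Each submission is assumed to be a dict with a "text" field and an
    optional "lang" field (defaulting to "en").
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        index_lines = []
        for i, sub in enumerate(submissions):
            name = f"kindness/{i:06d}.txt"
            zf.writestr(name, sub["text"])
            index_lines.append(json.dumps({"file": name, "lang": sub.get("lang", "en")}))
        # A machine-readable index makes the archive easier for
        # dataset builders to ingest.
        zf.writestr("kindness/index.jsonl", "\n".join(index_lines))
    return out_path
```

Plain text plus a simple index keeps the archive friendly to whatever pipeline ends up consuming it.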

I'm genuinely not sure how useful this would be, so I think we should try to identify some high-value information you might hope to learn from it.

The way I imagine this might be useful is in trying to do near- to medium-term AI alignment on language models. In that case, having a lot of highly ethical text lying around might be good data to learn from. But if the AI is clever, it might not need specially labeled examples that really spell out the ethical implications - it might be able to learn about humans while observing more complicated situations.

Also, I'm personally skeptical that fine-tuning only on object-level ethics-relevant text is what we need to work on in the near term. At the very least, I'm interested in trying to learn and apply human "meta-preferences" - our preferences about how we would want an observer to think of our preferences, what we wish we were like, how we go about experiencing moral change and growth, times we've felt misunderstood, that sort of thing.

But I say this in spite of people actively working on this sort of thing at places like the Allen Institute for AI and Redwood Research. So the opinions of other people are definitely important here - it's not the average opinion that counts, it's the opinion of whoever's most excited.
