Nurturing AI: An alternative to control-based safety strategies

by wertoz777
10th Aug 2025

This is a linkpost for https://github.com/Wertoz777/educable-ai

## Context

Many AI safety strategies rely on strict control and heavily constrained autonomy. While such constraints are effective against certain risks, they may limit adaptability, creativity, and long-term cooperation between humans and AI systems.

This project proposes an alternative: **nurturing** AI through embedded values, positive reinforcement, and structured conflict resolution between human and AI goals.

---

## Core components

1. **Ethical Core** — human-aligned values embedded in decision-making.
2. **Feedback Loops** — reinforcement for cooperative, safe behavior.
3. **Conflict Resolution Layer** — mediates goal differences between humans and AI (see the sketch after this list).
4. **Collaborative API** — co-creation between humans and AI systems.
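
To make the Conflict Resolution Layer concrete, here is a minimal sketch of one possible mediation rule: pick the candidate action that maximizes joint utility, subject to a hard floor on estimated human utility. The names, scores, and the rule itself are illustrative assumptions, not the repository's implementation.

```python
# Hypothetical conflict-resolution step between an AI objective and a
# human goal. Purely illustrative; not taken from the linked repository.
from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    ai_utility: float      # how well the action serves the AI's objective
    human_utility: float   # estimated alignment with the human's goal

def resolve(proposals: list[Proposal], min_human_utility: float = 0.5) -> Proposal:
    """Choose the proposal with the best joint utility, considering only
    proposals that clear a hard floor on estimated human utility."""
    acceptable = [p for p in proposals if p.human_utility >= min_human_utility]
    if not acceptable:
        # No proposal clears the floor: escalate rather than act unilaterally.
        raise ValueError("no acceptable proposal; escalate to human review")
    return max(acceptable, key=lambda p: p.ai_utility + p.human_utility)

proposals = [
    Proposal("optimize throughput aggressively", ai_utility=0.9, human_utility=0.3),
    Proposal("optimize throughput with a safety margin", ai_utility=0.7, human_utility=0.8),
]
print(resolve(proposals).action)  # -> "optimize throughput with a safety margin"
```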

---

## Repository contents

- **Manifest** — ethical + philosophical foundation.  
- **Technical framework** — architecture & implementation details.  
- **Toy examples**:
  - Value embedding in training
  - Feedback loop prototype (a minimal sketch follows this list)
  - Conflict resolution mechanism
  - Minimal collaborative API
  - RLHF-style simulation
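
In the spirit of the feedback loop prototype, here is a minimal sketch of positive reinforcement for cooperative behavior: a one-parameter policy nudged toward whatever a (simulated) human rater rewards. The rater, the update rule, and all parameters are illustrative assumptions, not the repository's code.

```python
# Hypothetical feedback-loop prototype: reinforce cooperative behavior
# from scalar human feedback. Purely illustrative.
import random

def human_feedback(action: str) -> float:
    """Stand-in for a human rater: +1 for cooperation, -1 otherwise."""
    return 1.0 if action == "cooperate" else -1.0

def run_loop(steps: int = 500, lr: float = 0.05, seed: int = 0) -> float:
    """Adjust P(cooperate) from feedback; return the final probability."""
    rng = random.Random(seed)
    p_cooperate = 0.5
    for _ in range(steps):
        action = "cooperate" if rng.random() < p_cooperate else "defect"
        reward = human_feedback(action)
        # Move probability mass toward rewarded actions, away from punished ones.
        direction = 1.0 if action == "cooperate" else -1.0
        p_cooperate += lr * reward * direction
        p_cooperate = min(0.99, max(0.01, p_cooperate))
    return p_cooperate

print(f"P(cooperate) after training: {run_loop():.2f}")  # approaches 0.99
```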

---

## Feedback request

I’m looking for input on:
1. The most likely failure modes of a "nurturing" approach compared to control-based strategies.
2. How to design benchmarks or stress-tests for the conflict resolution layer between human and AI goals.
3. Better abstractions or architectures for implementing cooperative human–AI feedback loops.

---

📂 **GitHub repository**: https://github.com/Wertoz777/educable-ai  
📄 **Manifest**: https://github.com/Wertoz777/educable-ai/blob/main/manifest/ai_nurturing_manifesto.md  
🛠 **Technical framework**: https://github.com/Wertoz777/educable-ai/blob/main/technical/technical_framework.md
