Rafael Cosman — LessWrong

I'm curious whether a good "hero-GPT" or "alignment-research-support-GPT" could be useful today or with slightly improved tech. Of course, having something like this run autonomously is not without risk, but it might be quite valuable/important in the sub-critical AI era.

Creating a truly formidable Art

Rafael Cosman3y10

Hey Valentine, I really like this post. I think it hits on some key things that traditional LW culture was missing for a while. Was wondering if you've ever encountered The Conscious Leadership Group (https://conscious.is/). They explicitly train some techniques similar to what you're describing here (as well as some quite different ones).

Prototype of Using GPT-3 to Generate Textbook-length Content

Rafael Cosman3y10

Cool, thanks for sharing! Hadn't heard of Metaphor before.

Prototype of Using GPT-3 to Generate Textbook-length Content

Rafael Cosman3y10

I might be able to code up an 'editing' pass to catch things like that!

Prototype of Using GPT-3 to Generate Textbook-length Content

Rafael Cosman3y10

Consider using reversible automata for alignment research

Rafael Cosman3y10

Have spent some time playing with reversible CAs, and can confirm that they are very interesting. They are a great example of how provable high-level properties (things like conservation of gliders) can come out of low-level properties (reversibility).
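For anyone who wants to poke at this, here is a minimal sketch of one standard way to build a reversible CA: the second-order construction, where the next row is the rule output XORed with the previous row. This is only an illustration and not necessarily the kind of reversible CA discussed in the post. The function names, the ring-shaped 1D grid, and the choice of Wolfram rule 30 are all assumptions made for the example.

```python
def rule_step(cells, rule):
    """Apply an elementary (Wolfram-numbered) rule once on a ring of 0/1 cells."""
    n = len(cells)
    out = []
    for i in range(n):
        left, mid, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (mid << 1) | right  # neighborhood as a 3-bit number
        out.append((rule >> idx) & 1)           # look up that bit of the rule number
    return out


def second_order_step(prev, curr, rule):
    """Second-order update: next = rule(curr) XOR prev. Returns (curr, next)."""
    f = rule_step(curr, rule)
    nxt = [a ^ b for a, b in zip(f, prev)]
    return curr, nxt


def run(prev, curr, steps, rule):
    """Advance the pair (prev, curr) by the given number of steps."""
    for _ in range(steps):
        prev, curr = second_order_step(prev, curr, rule)
    return prev, curr


# Hypothetical starting rows, chosen only for illustration.
start_prev = [0, 0, 0, 1, 0, 0, 0, 0]
start_curr = [0, 0, 1, 0, 1, 0, 0, 0]

# Run forward, then swap the two rows and run the same update to go backward.
p, c = run(start_prev, start_curr, 25, rule=30)
back_p, back_c = run(c, p, 25, rule=30)
recovered = (back_c, back_p)
print(recovered == (start_prev, start_curr))
```

Rule 30 on its own is not reversible, but the XOR with the previous row makes the pair (previous, current) invertible: swapping the two rows and applying the same update walks the history backward. That is a small instance of a low-level property (reversibility) holding by construction, no matter which rule is plugged in.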

Jailbreaking ChatGPT on Release Day

Rafael Cosman3y22

This is absolutely hilarious, thank you for the post.

What is wrong with this approach to corrigibility?

Rafael Cosman3y10

Great answer, thanks!

What an actually pessimistic containment strategy looks like

Rafael Cosman3y10

Thanks for the post! I think asking AI Capabilities researchers to stop is pretty reasonable, but I think we should be especially careful not to alienate the people closest to our side. E.g. consider how Protestants and Catholics fought even though they agreed on so much.

I like focusing on our common ground and using that to win people over.
