Having novel approaches to alignment research seems like it could really help the field at this still-early stage. Thanks for creating a program specifically designed to foster this.

[-]adamShimi3y40

Does Conjecture/Refine work with anyone remotely or is it all in person?

By default Conjecture is all in person, although right now, for a bunch of administrative and travel reasons, we are more spread out. For Refine it will be in person the whole time. Actually, ensuring that is one big reason we're starting in France (otherwise it would need to be partly remote for administrative reasons).

Having novel approaches to alignment research seems like it could really help the field at this still-early stage. Thanks for creating a program specifically designed to foster this.

You're welcome. ;)

[-]Charlie Steiner3y*Ω562

I'll be interested in the results! First-principles reasoning being kinda hard, I'm curious how much people are going to try to chew bite-sized pieces vs. try to absorb a ball of energy bigger than their head.

[-]adamShimi3yΩ330

Yeah, I will be posting updates, and probably the participants themselves will post some notes and related ideas. Excited too about how it's going to pan out!

[-]Michael Soareverix3y40

I'm someone new to the field, and I have a few ideas on it, namely penalizing a model for accessing more compute than it starts with (every scary AI story seems to start with the AI escaping containment and adding more compute to itself, causing an uncontrolled intelligence explosion). I'd like feedback on the ideas, but I have no idea where to post them or how to meaningfully contribute.

I live in America, so I don't think I'll be able to join the company you have in France, but I'd really like to hear where there are more opportunities to learn, discuss, formalize, and test out alignment ideas. As a company focused on this subject, is there a good place for beginners?

[-]adamShimi3y51

Thanks for your comment!

Probably the best place to get feedback as a beginner is AI Safety Support. They can also redirect you towards relevant programs, and they have a nice alignment Slack.

As for your idea, I can give you quick feedback on my issues with this whole class of solutions. I'm not saying you haven't thought about these issues, nor that no solution in this class is possible at all, just giving the things I would be wary of here:

How do you limit the compute if the AI is way smarter than you are?
Assuming that you can limit the compute, how much compute do you give it? Too little and it's not competitive, leading many people to prefer alternatives without this limit; too much and you're destroying the potential guarantees.
Even if there's a correct and safe amount of compute to give for each task, how do you compute that amount? How much time and resources does it cost?

[-]p.b.3y30

Could you maybe add a paragraph (or comment) on how exactly you define "conceptual" alignment research? What would be an example of alignment research that is not conceptual?

[-]adamShimi3y30

Maybe I should have added this link. ;)

[-]adamShimi3y30

Basically the distinction is relevant because there are definitely more and more people working on alignment, but the vast majority of the increase doesn't actually focus on formulating solutions or deconfusing the main notions; instead they mostly work on (often relevant) experiments and empirical questions related to alignment.

[-]p.b.3y10

I see many incorrect assumptions about what it takes to be a good conceptual researcher floating around [...] you can pick up the relevant part [of ML] and just work on approaches different to pure prosaic alignment

This seemed to imply that you might be a conceptual alignment researcher, but also work on pure prosaic alignment, which was the point where I thought: Ok, maybe I don't know what "conceptual alignment research" means. But the link definitely clears it up, thank you!

[-]adamShimi3y20

Yeah, I see how it can be confusing. To give an example, Paul Christiano focuses on prosaic alignment (he even coined the term) yet his work is mostly on the conceptual side. So I don't see the two as in conflict.

How to Diversify Conceptual Alignment: the Model Behind Refine

The Problem: Not Enough Varied Conceptual Research

Description of Refine

Research Incubator

Generalist Mentors

Selection and Respect

Difference with Other Programs

Some Concrete Details

The Long View: Refine and Conjecture