Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Conjecture is a new alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research. We have VC backing from, among others, Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, and Sam Bankman-Fried. Our founders and early staff are mostly EleutherAI alumni and previously independent researchers like Adam Shimi. We are located in London.

Of the options we considered, we believe that being a for-profit company with products[1] on the market is the best one to reach our goals. This lets us scale investment quickly while maintaining as much freedom as possible to expand alignment research. The more investors we appeal to, the easier it is for us to select ones that support our mission (like our current investors), and the easier it is for us to guarantee security to alignment researchers looking to develop their ideas over the course of years. The founders also retain complete control of the company.

We're interested in your feedback, questions, comments, and concerns. We'll be hosting an AMA on the Alignment Forum this weekend, from Saturday 9th to Sunday 10th, and would love to hear from you all there. (We'll also be responding to the comments thread here!)

Our Research Agenda

We aim to conduct both conceptual and applied research that addresses the (prosaic) alignment problem. On the experimental side, this means leveraging our hands-on experience from EleutherAI to train and study state-of-the-art models without pushing the capabilities frontier. On the conceptual side, most of our work will tackle general ideas and problems in alignment, such as deception, inner alignment, value learning, and amplification, with a slant towards language models and backchaining to local search.

Our research agenda is still actively evolving, but some of the initial directions are: 

  • New frames for reasoning about large language models:
    • What: Propose and expand on a frame of GPT-like models as simulators of various coherent text-processes called simulacra, as opposed to goal-directed agents (upcoming sequence to be published on the AF, see this blogpost for preliminary thoughts).
    • Why: Both an alternative perspective on alignment that highlights different questions, and a high-level model to study how large language models will scale and how they will influence AGI development.
  • Scalable mechanistic interpretability: 
    • What: Mechanistic interpretability research in a similar vein to the work of Chris Olah and David Bau, but with less of a focus on circuits-style interpretability and more focus on research whose insights can scale to models with many billions of parameters or more. Some example approaches might be: 
      • Locating and editing factual knowledge in a transformer language model.
      • Using deep learning to automate deep learning interpretability - for example, training a language model to give semantic labels to neurons or other internal circuits.
      • Studying the high-level algorithms that models use to perform, e.g., in-context learning or prompt programming.
    • Why: Provide tools to implement alignment proposals on neural nets, and insights that reframe conceptual problems in concrete terms.
  • History and philosophy of alignment:
    • What: Map different approaches to alignment, translate between them, explore ideas that were abandoned too fast, and propose new exciting directions (upcoming sequence on pluralism in alignment to be published on the AF).
    • Why: Help alignment research become even more pluralist while still remaining productive. Understanding historical patterns helps put our current paradigms and assumptions into perspective.

We target the Alignment Forum as our main publication outlet, and aim to regularly publish posts there and interact with the community through it. That being said, our publication model is non-disclosure-by-default, and every shared work will go through an internal review process out of concern for infohazards.

In addition to this research, we want to create a structure hosting externally funded independent conceptual researchers, managed by Adam Shimi. It will also include an incubator for new conceptual alignment researchers to propose and grow their own research directions.

How We Fit in the Ecosystem

Our primary goal at Conjecture is to conduct prosaic alignment research which is informed by the ultimate problem of aligning superintelligence. We default to short timelines, generally subscribe to the scaling hypothesis, and believe it is likely that the first AGI will be based on modern machine-learning architectures and learning methods. 

We believe that combining conceptual research, applied research, and hosting independent researchers into one integrated organization is a recipe for making promising untapped research bets, fostering collaboration between the more high-concept work and the experimental side, and truly scaling alignment research.

Among the other existing safety orgs, we consider ourselves closest in spirit to Redwood Research in that we intend to focus primarily on (prosaic) alignment questions and embrace the unusual epistemology of the field. Our research agenda overlaps in several ways with Anthropic's, especially in our acceptance of the scaling hypothesis and interest in mechanistic interpretability, but with more emphasis on conceptual alignment.

We Are Hiring!

If this sounds like the kind of work you’d be interested in, please reach out!

We are always looking to hire more engineers and researchers. At the time of writing, we are particularly interested in hiring devops and infrastructure engineers with supercomputing experience, and are also looking for one to two fullstack/frontend webdevs, preferably with data visualization experience. We are located in London and pay is competitive with FAANG. If you have experience with building, serving, and tuning large scale ML models and experiments, or have done interesting alignment theory work, we’d love to hear from you. We also accept Alignment Forum posts as applications!

We will open applications for the incubator in about a month, and are interested in hearing from any funded independent conceptual researcher who would like to be hosted by us.

If you don’t fit these descriptions but would like to work with us, please consider reaching out anyways if you think you have something interesting to bring to the table.

And if you’re around London and would like to meet, feel free to drop us an email as well!

  1. ^

    We will ask for feedback from many researchers in the community to gauge the risks related to these products before releasing them.

25 comments

Great news. What kind of products do you plan on releasing?

We aren’t committed to any specific product or direction just yet (we think there is a lot of low-hanging fruit that we could decide to pursue). Luckily, we have the independence to initially spend a significant amount of time focusing on foundational infrastructure and research. Our product(s) could end up as some kind of API with useful models, interpretability tools or services, some kind of end-to-end SaaS product, or something else entirely. We don’t intend to push the capabilities frontier, and don’t think this would be necessary to be profitable.

Glad to see a new Alignment research lab in Europe. Good luck with the start and the hiring!

I'm wondering, you're saying: 

That being said, our publication model is non-disclosure-by-default, and every shared work will go through an internal review process out of concern for infohazards.

That's different from Eleuther's position[1]. Is this a change of mind or a different practice due to the different research direction? Will you continue open-sourcing your ML models?

  1. ^

    "A grassroots collective of researchers working to open source AI research."

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of a change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case-by-case basis.

Longer version:

First of all, Conjecture and EleutherAI are separate entities. The policies of one do not affect the other. EleutherAI will continue as it has. 

To explain a bit of what motivated this policy: We ran into some difficulties when handling infohazards at EleutherAI. By the very nature of a public open source community, infohazard handling is tricky to say the least. I’d like to say on the record that I think EAI actually did an astoundingly good job of not pushing every cool research or project discovery we encountered, given what it is. However, there are still obvious limitations to how well you can contain information spread in an environment that open.

I think the goal of a good infohazard policy should not be to make it as hard as possible to publish information or talk to people about your ideas to limit the possibility of secrets leaking, but rather to make any spreading of information more intentional. You can’t undo the spreading of information; it’s a one-way street. As such, the “by-default” component is what I think is important to allow actual control over what gets out and what does not. By having good norms around not immediately sharing everything you’re working on or thinking about widely, you have more time to deliberate and consider if keeping it private is the best course of action. And if not, then you can still publish. 

That’s the direction we’re taking things with Conjecture. Concretely, we are working on writing a well-thought-out infohazard policy internally, and plan to get the feedback of alignment researchers outside of Conjecture on whether each piece of work should or should not be published.

We have the same plan with respect to our models, which we by default will not release. However, we may choose to do so on a case-by-case basis and with feedback from external alignment researchers. While this is different from EleutherAI, I’d note that EAI does not advocate, and has never advocated, for literally publishing anything and everything all the time as fast as possible. EAI is a very decentralized organization, and many people associated with the name work on pretty different projects, but in general the projects EAI chooses to do are informed by what we consider net good to be working on publicly (e.g. EAI would not release a SOTA-surpassing or unprecedentedly large model). This is a nuanced point about EAI policy that tends to get lost in outside communication. 

We recognize that Conjecture’s line of work is infohazardous. We think it’s almost guaranteed that when working on serious prosaic alignment you will stumble across capabilities-increasing ideas (one could argue one of the main constraints on many current models' usefulness/power is precisely their lack of alignment, so incremental progress could easily remove bottlenecks), and we want to have the capacity to handle these kinds of situations as gracefully as possible. 

Thanks for your question and giving us the chance to explain!

Thanks for the thoughtful response, Connor.

I'm glad to hear that you will develop a policy and won't be publishing models by default.

How are you negotiating EleutherAI participation? Or are you just done with EAI now?

EAI has always been a community-driven organization that people tend to contribute to in their spare time, around their jobs. I, for example, have had a day job of one sort or another for most of EAI’s existence. So from this angle, nothing has changed aside from the fact that my job is more demanding now.

Sid and I still contribute to EAI on the meta level (moderation, organization, deciding on projects to pursue), but do admittedly have less time to dedicate to it these days. Thankfully, Eleuther is not just us - we have a bunch of projects going on at any one time, and progress for EAI doesn’t seem to be slowing down.

We are still open to the idea of releasing larger models with EAI, and funding may happen, but it’s no longer our priority to pursue that, and the technical lead of that project (Sid) has much less time to dedicate to it.

Conjecture staff will occasionally contribute to EAI projects, when we think it’s appropriate.

Let us know if/when you're in the Bay, would be good to meet people on your team :)

Thanks - we plan to visit the Bay soon with the team, we’ll send you a message! 

I look forward to it.

Cool! Are you planning to be in-person or have some folks working remotely? Other similar safety orgs don't seem that flexible with in-person requirements, so it'd be nice to have a place for alignment work for those outside of {SF, London}

How do you differ from Redwood?

One thing is that it seems like they are trying to build some of the world’s largest language models (“state of the art models”).

Congratulations! Can you say if there will be a board, and if so who will start on it?

Currently, there is only one board position, which I hold. I also have triple vote as insurance if we decide to expand the board. We don’t plan to give up board control.

This is lovely! I’ve a couple questions (will post them in the AMA as well if this is not a good place to ask)

  1. What is the reasoning behind non-disclosure by default? It seems opposite to what EleutherAI does.

  2. Will you be approachable for incubating less experienced people (for example student interns), or do you not want to take that overhead right now?

The founders also retain complete control of the company.

Can you say more about that? Will shareholders not be able to sue the company if it acts against their financial interests? If Conjecture will one day become a public company, is it likely that there will always be a controlling interest in the hands of few individuals?

[...] to train and study state-of-the-art models without pushing the capabilities frontier.

Do you plan to somehow reliably signal to AI companies—that do pursue AGI—that you are not competing with them? (In order to not exacerbate race dynamics).

The founders have a supermajority of voting shares and full board control and intend to hold on to both for as long as possible (preferably indefinitely). We have been very upfront with our investors that we do not want to ever give up control of the company (even if it were hypothetically to go public, which is not something we are currently planning to do), and will act accordingly.

For the second part, see the answer here.

Thanks Connor RE: if you’re around London and would like to meet, feel free to drop us an email as well!

Best regards 

Mark @ 

Is Conjecture open to the idea of funding PhD fellowships for research in alignment and related topics? I think society will look back and see work in alignment as being very crucial in getting machines (which are growing impressively more intelligent quite quickly) to cooperate with humans.

Excited to hear that some at EleutherAI are working on alignment next (GPT-J & -Neo work were quite awesome). 

What do you mean by Scaling Hypothesis? Do you believe extremely large transformer models trained based on autoregressive loss will have superhuman capabilities?

Can't answer the second question, but see for the first.