Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Conjecture is a new alignment startup founded by Connor Leahy, Sid Black and Gabriel Alfour, which aims to scale alignment research. We have VC backing from, among others, Nat Friedman, Daniel Gross, Patrick and John Collison, Arthur Breitman, Andrej Karpathy, and Sam Bankman-Fried. Our founders and early staff are mostly EleutherAI alumni and previously independent researchers like Adam Shimi. We are located in London.

As described in our announcement post, we are running an AMA this weekend, from today (Saturday 9th April) to Sunday 10th April. We will answer any question asked before the end of Sunday, Anywhere on Earth. We might answer later questions, but no guarantees.

If you asked questions on our announcement post, we would prefer that you repost them here if possible. Thanks!

Looking forward to your questions!


Your website says: "WE ARE AN ARTIFICIAL GENERAL INTELLIGENCE COMPANY DEDICATED TO MAKING AGI SAFE", and also "we are committed to avoiding dangerous AI race dynamics".

How are you planning to avoid exacerbating race dynamics, given that you're creating a new 'AGI company'? How will you prove to other AI companies—that do pursue AGI—that you're not competing with them?

Do you believe that most of the AI safety community approves of the creation of this new company? In what ways (if any) have you consulted with the community before starting the company?

To address the opening quote - the copy on our website is overzealous, and we will be changing it shortly. We are an AGI company in the sense that we take AGI seriously, but it is not our goal to accelerate progress towards it. Thanks for highlighting that.

We don’t have a concrete proposal for how to reliably signal that we’re committed to avoiding AGI race dynamics beyond the obvious right now. There is unfortunately no obvious or easy mechanism that we are aware of to accomplish this, but we are certainly open to discussion with any interested parties about how best to do so. Conversations like this are one approach, and we also hope that our alignment research speaks for itself in terms of our commitment to AI safety. 

If anyone has any more trust-inducing methods than us simply making a public statement and reliably acting consistently with our stated values (where observable), we’d love to hear about them!

To respond to the last question - Conjecture has been “in the making” for close to a year now and has not been a secret. We have discussed it in various iterations with many alignment researchers, EAs and funding orgs. A lot of initial reactions were quite positive, in particular towards our mechanistic interpretability work, and there was general excitement about more people working on alignment. There have of course been concerns around organizational alignment, for-profit status, our research directions and the founders’ history with EleutherAI, all of which we have tried our best to address.

But ultimately, we think whether or not the community approves of a project is a useful signal for whether a project is a good idea, but not the whole story. We have our own idiosyncratic inside-views that make us think that our research directions are undervalued, so of course, from our perspective, other people will be less excited than they should be for what we intend to work on. We think more approaches and bets are necessary, so if we only worked on the most consensus-choice projects, we wouldn’t be doing anything new or undervalued. That being said, we don’t think any of the directions or approaches we’re tackling have been considered particularly bad or dangerous by large or prominent parts of the community, which is a signal we would take seriously.

I'm curious why you believe that having products will be helpful? A few particular considerations I would be interested to hear your take on:

  1. There seems to be abundant EA donor funding available from sources like FTX without the need for a product / for attracting non-EA investors
  2. Products require a large amount of resources to build/maintain
  3. Profitable products also are especially prone to accelerating race dynamics

To point 1: While we greatly appreciate what OpenPhil, LTFF and others do (and hope to work with them in the future!), we found that the hurdles required and strings attached were far greater than those of the laissez-faire Silicon Valley VCs we encountered, and seemed less scalable in the long run. Also, FTX FF did not exist back when we were starting out.

While EA funds as they currently exist are great at handing out small to medium sized grants, the ~8 digit investment we were looking for to get started asap was not something that these kinds of orgs were generally interested in giving out (which seems to be changing lately!), especially to slightly unusual research directions and unproven teams. If our timelines were longer and the VC money had more strings attached (as some of us had expected before seeing it for ourselves!), we may well have gone another route. But the truth of the current state of the market is that if you want to scale to a billion dollars as fast as possible with the most founder control, this is the path we think is most likely to succeed.

To point 2: This is why we will focus on SaaS products on top of our internal APIs that can be built by teams that are largely independent from the ML engineering. As such, this will not compete much with our alignment-relevant ML work. This is basically our thesis as a startup: We expect it to be EV+, as this earns much more money than we would have had otherwise.    

Notice this is a contingent truth, not an absolute one. If tomorrow, OpenPhil and FTX contracted us with 200M/year to do alignment work, this would of course change our strategy.

To point 3: We don’t think this has to be true. (Un)fortunately, given the current pace of capability progress, we expect keeping up with the pace to be more than enough for building new products. Competition on AI capabilities is extremely steep and not in our interest. Instead, we believe that (even) the (current) capabilities are so crazy that there is an unlimited potential for products, and we plan to compete instead on building a reliable pipeline to build and test new product ideas.

Calling it competition is actually a misnomer from our point of view. We believe there is ample space for many more companies to follow this strategy, still not have to compete, and turn a massive profit. This is how crazy capabilities and their progress are.

Why did you decide to start a separate org rather than joining forces with an existing org? I'm especially curious since state-of-the-art models are time-consuming/compute-intensive/infra-intensive to develop, and other orgs with safety groups already have that infrastructure. Also, it seems helpful to have high communication bandwidth between people working on alignment, in a way that is impaired by having many different orgs (especially if the org plans to be non-disclosure by default). Curious to hear how you are thinking about these things!

We (the founders) have a research agenda distinct enough from those of most existing groups that simply joining them would mean incurring some compromises on that front. Also, joining existing research orgs is tough! Especially if we want to continue along our own lines of research, and have significant influence on their direction. We can’t just walk in and say “here are our new frames for GPT, can we have a team to work on this asap?”.

You’re right that SOTA models are hard to develop, but that being said, developing our own models is independently useful in many ways - it enables us to maintain controlled conditions for experiments, and study things like scaling properties of alignment techniques, or how models change throughout training, as well as being useful for any future products. We have a lot of experience in LLM development and training from EleutherAI, and expect it not to take up an inordinate amount of developer hours.

We are all in favor of high bandwidth communication between orgs. We would love to work in any way we can to set these channels up with the other organizations, and are already working on reaching out to many people and orgs in the field (meet us at EAG if you can!).

In general, all the safety orgs that we have spoken with are interested in this, and that’s why we expect/hope this kind of initiative to be possible soon.

Are you trying to work on a thing that can discern the principles and implications of objective benevolence, if such principles exist? (If the question seems unclear: I mean this in a manner similar to how mathematicians can discern and explicitly and rigorously discuss the structure of Peano arithmetic.)

Relatedly: do you agree that the idea of "Alignment" is kind of a mediocre way to think about not getting killed by AGI, and might even be evil, because an "aligned AI" could be "aligned with the Devil" or "aligned with a psychotic dictator" or "aligned with... <anything, really>"?

In a deep sense: please talk about "how you think about" whether or how the most plausible versions of "moral realism" are, or are not, relevant to your project.

This is a genuinely difficult and interesting question that I want to provide a good answer for, but that might take me some time to write up. I'll get back to you at a later date.

I like that you didn't say something glib :-)

I worked as an algorithmic ethicist for a blockchain project for several years, and this was (arguably?) my central professional bedevilment. It doesn't really surprise me that you have a hard time with it... I asked it because it is The Tough One, and if you had an actually good answer then such an answer would (probably) count as "non-trivial research progress".

Will you be approachable for incubating less experienced people (for example student interns), or do you not want to take that overhead right now?

(I will be running the Incubator at Conjecture)

The goal for the incubator is to foster new conceptual alignment research bets that could go on to become full-fledged research directions, either at Conjecture or at other places. We’re thus planning to select mostly on the qualities we expect of a very promising independent conceptual researcher, namely proactivity (see Paul Graham’s Relentlessly Resourceful post) and some interest or excitement about not fully tapped streams of evidence (see this recent post).

Although experience with alignment could help, it might also prove a problem if it comes with too strong an ontological commitment and limits exploration of unusual research directions and ideas. The start of the program will include a lot of discussion and sharing a map of alignment and mental moves that I (Adam) have been building over the last few months, so this should bring people up to speed to do productive research.

If you have any more questions about this, feel free to reach me either on LW or at my Conjecture email.

How large do you expect Conjecture to become? What percent of people do you expect to be working on the product and what percentage to be working on safety? 

Ideally, we would like Conjecture to scale quickly. Alignment-wise, in five years' time, we want to have the ability to take a billion dollars and turn it into many efficient, capable, aligned teams of 3-10 people working on parallel alignment research bets, and be able to do this reliably and repeatedly. We expect to be far more constrained by talent than anything else on that front, and are working hard on developing and scaling pipelines to hopefully alleviate such bottlenecks.

For the second question, we don't expect product to be a competing force (as in, having people who could be working on alignment work on product instead). See point two in this comment.


"This is why we will focus on SaaS products on top of our internal APIs that can be built by teams that are largely independent from the ML engineering. As such, this will not compete much with our alignment-relevant ML work. This is basically our thesis as a startup: We expect it to be EV+, as this earns much more money than we would have had otherwise."

Congratulations on your launch!

Like Michaël Trazzi in the other post, I'm interested in the kind of products you'll develop, but more specifically in how the for-profit part interacts with both the conceptual research part and the incubator part. Are you expecting the latter two to yield new products as they make progress? Do these activities have different enough near-term goals that they mostly just coexist within Conjecture?

(also, looking forward to the pluralism sequence, this sounds great)

See the reply to Michaël for answers as to what kind of products we will develop (TLDR we don’t know yet).

As for the conceptual research side, we do not do conceptual research with product in mind, but we expect useful corollaries to fall out by themselves for sufficiently good research. We think the best way of doing fundamental research like this is to just follow the most interesting, useful looking directions guided by the “research taste” of good researchers (with regular feedback from the rest of the team, of course). I for one at least genuinely expect product to be “easy”, in the sense that AI is advancing absurdly fast and the economic opportunities are falling from the sky like candy, so I don’t expect us to need to frantically dedicate our research to finding worthwhile fruit to pick.

The incubator has absolutely nothing to do with our for-profit work, and is truly meant to be a useful space for independent researchers to develop their own directions that will hopefully be maximally beneficial to the alignment community. We will not put any requirements or restrictions on what the independent researchers work on, as long as it is useful and interesting to the alignment community.

How do you differ from Redwood?

Redwood is doing great research, and we are fairly aligned with their approach. In particular, we agree that hands-on experience building alignment approaches could have high impact, even if AGI ends up having an architecture unlike modern neural networks (which we don’t believe will be the case). While Conjecture and Redwood both have a strong focus on prosaic alignment with modern ML models, our research agenda has higher variance, in that we additionally focus on conceptual and meta-level research. We’re also training our own (large) models, but (we believe) Redwood are just using pretrained, publicly available models. We do this for three reasons:

  1. Having total control over the models we use can give us more insights into the phenomena we study, such as training models at a range of sizes to study scaling properties of alignment techniques.
  2. Some properties we want to study may only appear in close-to-SOTA models - most of which are private.
  3. We are trying to make products, and close-to-SOTA models help us do that better. Though as we note in our post, we plan to avoid work that pushes the capabilities frontier.

We’re also for-profit, while Redwood is a nonprofit, and we’re located in London! Not everyone lives out in the Bay :)

Do you expect interpretability tools developed now to extend to interpreting more general (more multimodal, better at navigating the real world) decision-making systems? How?

Yes, we do expect this to be the case. Unfortunately, I think explaining in detail why we think this may be infohazardous. Or at least, I am sufficiently unsure about how infohazardous it is that I would first like to think about it for longer and run it through our internal infohazard review before sharing more. Sorry!

Are you planning to be in-person or have some folks working remotely? Other similar safety orgs don't seem that flexible with in-person requirements, so it'd be nice to have a place for alignment work for those outside of {SF, London}

We strongly encourage in-person work. We find it beneficial to be able to talk over or debate research proposals in person at any time, it’s great for the technical team to be able to pair program or rubber duck if they’re hitting a wall, and all being located in the same city has a big impact on team building.

That being said, we don’t mandate it. Some current staff want to spend a few months a year with their families abroad, and others aren’t able to move to London at all. While we preferentially accept applicants who can work in person, we’re flexible, and if you’re interested but can’t make it to London, it’s definitely still worth reaching out.

What guarantees that, in case you happen to be the first to build an interpretable aligned AGI, Conjecture, as an organization wielding a newly acquired immense power, stays aligned with the best interests of humanity?

For the record, having any person or organization in this position would be a tremendous win. Interpretable aligned AGI?! We are talking about a top 0.1% scenario here! Like, the difference between egoistical Connor and altruistic Connor with an aligned AGI in his hands is much, much smaller than the difference between Connor with an aligned AGI and anyone, any organization or any scenario with a misaligned AGI.

But let’s assume this.

Unfortunately, there is no actual functioning reliable mechanism by which humans can guarantee their alignment to each other. If there were something I could do that would irreversibly bind me to my commitment to the best interests of mankind in a publicly verifiable way, I would do it in a heartbeat. But there isn’t, and most attempts at such are security theater.

What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the longterm future (even if of course some people with different models of what is “best for the longterm future” will have legitimate disagreements with my choices of past actions), and pledge to remain in control of Conjecture and shape its goals and actions appropriately.

On a meta-level, I think the best guarantee I can give is simply that not acting in humanity’s best interest is, in my model, Stupid. And my personal guiding philosophy in life is “Don’t Be Stupid”. Human values are complex and fragile, and while many humans disagree about many details of how they think the world should be, there are many core values that we all share, and not fighting with everything we’ve got to protect these values (or dying with dignity in the process) is Stupid.

Thank you for your answer.

I have very high confidence that the *current* Connor Leahy will act towards the best interests of humanity. However, given the extraordinary amount of power an AGI can provide, my confidence that this behavior will stay the same for decades or centuries to come (directing some of the AGI's resources towards radical human life extension seems logical) is much lower.

Another question in case you have time - considering the same hypothetical situation of Conjecture being first to develop an aligned AGI, do you think that immediately applying its powers to ensure no other AGIs can be constructed is the correct behavior to maximize humanity's chances of survival?

"What I can do is point to my history of acting in ways that, I hope, show my consistent commitment to doing what is best for the longterm future (even if of course some people with different models of what is “best for the longterm future” will have legitimate disagreements with my choices of past actions), and pledge to remain in control of Conjecture and shape its goals and actions appropriately."

Sorry, do you mean that you are actually pledging to "remain in control of Conjecture"? Can some other founder(s) make that pledge too if it's necessary for maintaining >50% voting power?

Will you have the ability to transfer full control over the company to another individual of your choice in case it's necessary? (Larry Page and Sergey Brin, for example, are seemingly limited in their ability to transfer their 10x-voting-power Alphabet shares to others).

There are no guarantees in the affairs of sentient beings, I’m afraid.

I'm a donor interested in giving money to AI safety work, on the order of $100k-$1M right now and possibly more in the future. Are you looking for donations or do you know anyone who is looking for donations?

I think this is something better discussed in private. Could you DM me? Thanks!

How much time do you think you have before your investors expect/require you to turn a profit?

Congratulations on your new venture for a great cause!

Our current plan is to work on foundational infrastructure and models for Conjecture’s first few months, after which we will spin up prototypes of various products that can work with a SaaS model. After this, we plan to try them out and productize the most popular/useful ones.

More than profitability, our investors are looking for progress. Because of the current pace of progress, it would not be smart from their point of view to settle on a main product right now. That’s why we are mostly interested in creating a pipeline that lets us build and test out products flexibly.

Do you ever plan on collaborating with researchers in academia, like DeepMind and Google Brain often do? What would make you accept or seek such external collaboration?

We would love to collaborate with anyone (from academia or elsewhere) wherever it makes sense to do so, but we honestly just do not care very much about formal academic publication or citation metrics or whatever. If we see opportunities to collaborate with academia that we think will lead to interesting alignment work getting done, excellent!

Before someone points this out: Non-disclosure-by-default is a negative incentive for the academic side, if they care about publication metrics. 

It is not a negative incentive for Conjecture in such an arrangement, at least not in an obvious way.

What is the reasoning behind non-disclosure by default? It seems opposite to what EleutherAI does.

See a longer answer here.

TL;DR: For the record, EleutherAI never actually had a policy of always releasing everything to begin with and has always tried to consider each publication’s pros vs cons. But this is still a bit of a change from EleutherAI, mostly because we think it’s good to be more intentional about what should or should not be published, even if one does end up publishing many things. EleutherAI is unaffected and will continue working open source. Conjecture will not be publishing ML models by default, but may do so on a case-by-case basis.

Our decision to open-source and release the weights of large language models was not a haphazard one, but was something we thought very carefully about. You can read my short post here on our reasoning behind releasing some of our models. The short version is that we think that the danger of large language models comes from the knowledge that they’re possible, and that scaling laws are true. We think that by giving researchers access to the weights of LLMs, we will aid interpretability and alignment research more than we will negatively impact timelines. At Conjecture, we aren’t against publishing, but by making non-disclosure the default, we force ourselves to consider the long-term impact of each piece of research and have a better ability to decide not to publicize something rather than having to do retroactive damage control.

Your website says:

"We want to build tools and frameworks to make interpretability with neural nets more accessible, and to help reframe conceptual problems in concrete terms."

Will you make your tools and frameworks open source so that, in addition to helping advance the work of your own researchers, they can help independent interpretability researchers and those working in other groups as well?

Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes, for example by revealing obvious ways in which architectures could be trained more efficiently. As such, we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.

"We are located in London."

Great! Is there a co-working space or something? If so, where? Also, are you planning to attend EAG London as a team?

We currently have a (temporary) office in the Southwark area, and are open to visitors. We’ll be moving to a larger office soon, and we hope to become a hub for AGI Safety in Europe.

And yes! Most of our staff will be attending EAG London. See you there? 

Ya I'll be there so I'd be glad to see you, especially Adam!