Understanding Conjecture: Notes from Connor Leahy interview

Orpheus16

I recently listened to Michaël Trazzi interview Connor Leahy (co-founder & CEO of Conjecture, a new AI alignment organization) on a podcast called The Inside View. Youtube video here; full video & transcript here.

The interview helped me better understand Connor’s worldview and Conjecture’s theory of change.

I’m sharing my notes below. The “highlights” section includes the information I found most interesting/useful. The "full notes" section includes all of my notes.

Disclaimer #1: I didn’t take notes on the entire podcast. I selectively emphasized the stuff I found most interesting. Note also that these notes were mostly for my understanding, and I did not set out to perfectly or precisely capture Connor’s views.

Disclaimer #2: I’m always summarizing Connor (even when I write with “I” or “we”— the “I” refers to Connor). I do not necessarily endorse or agree with any of these views.

Highlights

Timelines

20-30% in the next 5 years. 50% by 2030. 99% by 2100. 1% we already have it (but don’t know this yet). Higher uncertainty than Eliezer but generally buys the same arguments. Is mostly like “Eliezer’s arguments seem right but how can anyone be so confident about things?”

Thoughts on MIRI Dialogues & Eliezer’s style

An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.
A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
What happened in the MIRI dialogues is that Eliezer was telling Paul “hey, I’m trying to communicate an antimeme to you, but I’m failing because it’s really really hard.”

Thoughts on Death with Dignity & optimizing for “dignity points” rather than utility

The Death with Dignity post is a perfect example of an antimeme. A great way to convey antimemes is through jokes and things outside the Overton Window.
The antimeme is that utilitarianism is hard, and no, it’s not actually a good idea to advocate for really stupid “pivotal acts” that sound ridiculous. Consequentialism is really hard. I have to reason about all of my possible choices and all of their possible consequences. If you have an infinitely big brain, this works. If not, it doesn’t.
It’s too computationally hard to be a perfect consequentialist. And being an imperfect consequentialist is really really bad. If you do one step of reasoning, you might be like “yeaaa let’s get rid of GPUs!” But you don’t realize how that would be super bad for the world, would make cooperation extremely difficult, would make everything become super secretive, etc.
The antimeme is that most people shouldn’t be thinking like consequentialists. Instead of thinking about how to maximize utility, they should be thinking about how to maximize dignity. This is easier. This is computationally tractable. This heuristic will make you do better.
I see so many people come into this arena with the anime protagonist “I’m going to save the world” complex, and then they burnout after 3 months and go do DMT.
I know two humans who can maybe reason better under the consequentialist frame. But for everyone else, if you’re going to do 5 years of soul-crushing difficult research without much support from the outside world, you should think under the dignity frame.

Thoughts on the importance of playing with large models

One mistake I see people make is that they underestimate the importance of getting actual hands-on experience with the thing you are studying. I think it’s important to actually have experience looking at & playing with large models. Throughout history, science has made a lot of progress from people just starting at a complicated thing until they figure it out.

Conjecture

We take short timelines seriously; we are an org that is committed to 5-year timelines. There’s a lot of stuff you do in 5-year timelines that no one else is doing. Also we think alignment is going to be hard. Also we are not optimistic about government stuff.
We think that in order for things to go well, there needs to be some sort of miracle. But miracles do happen. When Newton was thinking about stuff, what were the odds that motion on earth was governed by the same forces that governed motion in the stars? And what were the odds that all of this could be interpreted by humans? Then you see calculus and the laws of motion and you’re like “ah yes, that just makes sense.”

Refine (alignment incubator)

We started Refine, an incubator which will bring together alignment researchers who have weird, non-conventional research ideas. The goal is to incubate weird new ideas (they don’t have to work on anything relating to Conjecture).

Uncorrelated Bets

One problem with the current alignment community is that they’re very correlated. Anthropic, Open Phil, Redwood, and Paul are very correlated. I want people trying new and uncorrelated things.

What does Conjecture need right now?

We’re funding constrained!
We posted on LessWrong saying that we’re hiring, and we got so many high-quality applications. 1 in 3 applications were really good— that never happens! So we have some new people, and we have lots of projects, but we’re currently funding-constrained.

Full notes

AGI Timelines

20-30% in the next 5 years. 50% by 2030. 99% by 2100. 1% we already have it (but don’t know this yet). Higher uncertainty than Eliezer but generally buys the same arguments. Is mostly like “Eliezer’s arguments seem right but how can anyone be so confident about things?”
- Biggest update to timelines came from GPT-2. Connor was super impressed. Then GPT-3 came and Connor was like “oh wow, this happened just from scaling? Oh okay we’re really right around the corner to AGI.”
- Since then, I haven’t updated. Gato? Not surprising. Chain of thought stuff? Not surprising. Actually we figured out chain of thought stuff before everyone else, and we kept it secret.
- Heuristic: Imagine you were in a horror movie. At what point would the audience be like “why aren’t you screaming yet?” And how can you see GPT-3 and Dall-E (especially Dall-E) and not imagine the audience screaming at you?
- Stuart Russell: AI/ML professors don’t ask themselves “what happens if we succeed?” What happens if things just go as planned and we just AGI?
When talking about AGI, there is stuff that is technically correct and stuff that is correct in spirit. Sometimes people are like “grrr you said something that was not 100% technically correct!” This doesn’t matter. Sometimes it’s fine to say something that is technically a little off but captures the spirit or vibe of what you’re talking about.
- Ex: Eliezer had said AGI is “the thing that has the thing that separates humans from apes.”
Some people will be like “you will never have an agent that does biotech and builds Ais and solves NP hard problems”
- But that doesn’t matter—maybe the AI won’t be good at protein folding by itself. But if it can invent tools that fold proteins, it’s just as dangerous.

How we’ll get AGI

I expect that many capabilities will come just from scaling.
- In the past, when he thought about how to get to AGI, he’d have some boxes labeled “magic.” There are no longer any boxes labeled “magic” in his head.
Recursive self-improvement is sufficient but not necessary. Also there are lots of different ways for something to self-improve even if it doesn’t write its own code.
There’s no “hard stopping point of intelligence” at human-level intelligence. It’s weird to believe that AI can go from slug to bug to chimp to human and then all of a sudden we get it to human-level intelligence and it’s like “ah, I’ve gotten to the natural stopping point.”
Critics of AI safety seem overconfident. I can’t rule out the possibility that deep learning stops scaling. I don’t expect this to be true, but I can’t rule it out. Predicting the future is really really hard. So when ML experts are like “there’s no way”, I’m like how can you rule this out? You might not expect it, but how are you so confident?
If aliens were looking at earth, and they saw only 200 people working on this, they would be screaming at the screen! At least some more people should be working on this.

Thoughts on Ajeya’s bioanchors report

TAI definition is bad one to use
- GDP is a really bad measure. It’s sort of designed to measure things that don’t change. Wikipedia is one of the greatest products. It had 0 impact on GDP.
- If a whole country gets access to the internet, this (in itself) has zero impact on GDP.
- John Wentworth has a post about this.
Eliezer: Regulations might make “slow takeoffs” look like “fast takeoffs.” Maybe we’ll have AI gradually getting better. But it’ll be under regulation—the government will have it in testing, or the companies will be doing tests on it. Most people in society will not know that it’s happening. To them, once the model breaks out, it’ll look like it happened overnight. My Uber driver doesn’t know what GPT is or what Dall-E is. Even if slow takeoff happens, it’ll look like fast takeoff to nearly everyone in the world outside our tech bubble.

Thoughts on Paul-Eliezer-Others Dialogues

Something interesting was happening: “I was like, everyone else is being so much more reasonable than Eliezer, but I think Eliezer is right. Wait, what? I think Paul’s post is more reasonable. I think it’s more nuanced and thoughtful. Eliezer even admits that his AGI Ruin post was poorly written. So why did I disagree with Eliezer?”
Eliezer writes in fables and dialogues. Paul writes about technical things and tries to make specific predictions. One of the most frustrating parts of the dialogue is when Paul tries to get Eliezer to bet on anything—literally anything. Pick anything you want, and I’ll make a prediction! And Eliezer resists at every turn. And this caused people to turn on him. This is Mr. Bayes himself and he’s not betting? What??
I didn’t update as hard as others, because I think making these kinds of specific predictions is actually very hard to do. But I understand how people concluded that Paul was being more virtuous in the traditional rationalist sense.
Here’s my hot take: I think Paul and Eliezer were having two totally different conversations. Paul was trying to have a scientific conversation. Eliezer was trying to convey an antimeme.
An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.
A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
A lot of his frustration throughout the years has been him telling everyone that it’s really really hard to convey antimemes. Because it is.
If you read The Sequences, some of it is just factual explanations of things. But a lot of it is metaphor. It reads like a religious text. Not because it’s a text of worship, but because it’s about metaphors and stories that affect you more deeply than facts.
What happened in the MIRI dialogues is that Eliezer was telling Paul “hey, I’m trying to communicate an antimeme to you, but I’m failing because it’s really really hard.”

Thoughts on Death with Dignity

The Death with Dignity post is a perfect example of an antimeme. A great way to convey antimemes is through jokes and things outside the Overton Window.
My reaction to Death with Dignity was “oh, of course. Wasn’t that always what your point was?” I didn’t find it surprising.
The antimeme is in the third paragraph. What I love about the post is that he literally spells out the antimeme, but the top comment misses it.
The antimeme is that utilitarianism is hard, and no, it’s not actually a good idea to advocate for really stupid “pivotal acts” that sound ridiculous. Consequentialism is really hard. I have to reason about all of my possible choices and all of their possible consequences. If you have an infinitely big brain, this works. If not, it doesn’t.
It’s too computationally hard to be a perfect consequentialist. And being an imperfect consequentialist is really really bad. If you do one step of reasoning, you might be like “yeaaa let’s get rid of GPUs!” But you don’t realize how that would be super bad for the world, would make cooperation extremely difficult, would make everything become super secretive, etc.
The antimeme is that most people shouldn’t be thinking like consequentialists. Instead of thinking about how to maximize utility, they should be thinking about how to maximize dignity. This is easier. This is computationally tractable. This heuristic will make you do better.
And the top comment completely ignores the antimeme. It feels like it ignored 75% of the post. And this is what it feels like to be Eliezer Yudkowsky. You try for 20 years to convey your antimemes to the world, and people just don’t get them.
The actions generated by the “death with dignity” frame will be better than the actions generated by the “how do we save the world” frame. One problem is that the “how do we save the world” frame is that it gives 0 value to worlds where we all die. But Eliezer says hey, we should get dignity points proportional to how close we got to saving the world. This is a much better frame. It is much psychologically healthier. It generates better ideas.
- This is how I think. I expect that we’re going to die. But you know what, I’m going to die with some dignity, and work on hard problems, and hangout with great friends, and give it my all.
I see so many people come into this arena with the anime protagonist “I’m going to save the world” complex, and then they burnout after 3 months and go do DMT.
I know two humans who can maybe reason better under the consequentialist frame. But for everyone else, if you’re going to do 5 years of soul-crushing difficult research without much support from the outside world, you should think under the dignity frame.

Thoughts on the importance of playing with large models

One mistake I see people make is that they underestimate the importance of getting actual hands-on experience with the thing you are studying. I think it’s important to actually have experience looking at & playing with large models. Throughout history, science has made a lot of progress from people just starting at a complicated thing until they figure it out.
Sometimes rationalists would be like “oh why do you need GPT-3? Why can’t you just do stuff with GPT-2? It has a lot of the same features, you know. What are three specific experiments you want to run with GPT-3?” And I’m like “that’s missing the point. I could come up with three experiments, but the point is that I just want to stare at this complicated thing so I can understand it better.”
Most of my competitive advantage comes from me actually playing with models.

Was Eleuther AI net negative?

In 2020, the alignment community hadn’t updated on scaling.
Eluther AI was a bunch of friends dicking around on Discord. We were always concerned about AI alignment.
Nowadays, there are tons of posts about scaling. Back in 2020, it was just me and Gwern.
I think it’s good for large AI labs to do a lot of research and then never tell anyone about it. 99% of the damage from GPT-3 was them publishing the paper. The biggest secret about the atomic bomb was that it was actually possible.
People critique Eluether AI for accelerating timelines by releasing some stuff. Connor doesn’t buy these criticisms. One counter is that everyone in the capabilities world already knew about these models and the people who were most out-of-the-loop were people in the alignment community. Another counter is that these models would have been released anyways (by someone other than Eleuther AI).
We never released anything that was cutting edge or that was beyond what other people already had.
Eleuther AI created a space that was respectful to the ML community, interfaced with the ML community, and was very explicit about the importance of alignment.
Alignment is obviously the coolest problem to work on. But for various cultural reasons, ML people don’t like the alignment people. But Eluether AI got to play this middle ground—we were alignment researchers who had respect from ML researchers. They respected us, and they were like oh wow, these people think alignment is serious. Maybe alignment is serious! This led some people to become convinced about the importance of alignment.
Eleuther AI did not have a policy of sharing everything. This is a misconception. We have at least 2 examples in which we discovered something cutting-edge and we successfully kept it private from the rest of the world.
Yes, we were open source, but we didn’t release everything. We would release specific things for specific reasons. But we didn’t believe everything should be shared.
Eleuther AI is still a great place for people to talk about nitty-gritty technical stuff about alignment. We’re also happy to share compute with interpretability/alignment research.

Conjecture

Eleuther was cool but people didn’t want to do boring stuff like debug code. So hey, let’s start a company to tackle the alignment problem head-on!
OpenPhil had concerns. We don’t know you well. You’re asking for a lot of money. Your research is weird. You did things in past that were questionable. Then he went to DC and people were like “yeaaa you’re awesome!!! Take all this money!!! Do awesome things!”
I understand why OpenPhil didn’t fund me—they’re conservative and concerned about their reputation (especially relative to VCs).
Conjecture started in March 2022; founding team was like 8-9 people.
We take short timelines seriously; we are an org that is committed to 5-year timelines. There’s a lot of stuff you do in 5-year timelines that no one else is doing. Also we think alignment is going to be hard. Also we are not optimistic about government stuff.
When I talk to people working on AGI, they are very reasonable. I think they are wrong, but they are reasonable. They’re not insane evil people. They just have different models about how the world works. The fact that I can talk to the AGI people is really great. This is not the same as in many other communities.

Thoughts on government coordination

I haven’t met anyone who is actually evil. And those people exist. And maybe they will enter the space. But I haven’t met them yet.
Some people are like “oh maybe we can get the US and China to cooperate.” But there are some people in government who are actually evil. Politics selects for this. I think it’s possible to have Demis and Sam sit down and agree on some stuff. I don’t think that’s possible for the US and China. That’s why I’m pessimistic about government stuff.

Miracles

We think that in order for things to go well, there needs to be some sort of miracle. But miracles do happen. When Newton was thinking about stuff, what were the odds that motion on earth was governed by the same forces that governed motion in the stars? And what were the odds that all of this could be interpreted by humans? Then you see calculus and the laws of motion and you’re like “ah yes, that just makes sense.”
I think it’s really hard for a modern person to actually put themselves in the epistemic state of a pre-scientific person. We see things clearly and we forget how confusing it actually feels like before we understand things.

Uncorrelated bets

One problem with the current alignment community is that they’re very correlated. Anthropic, Open Phil, Redwood, and Paul are very correlated. I want people trying new and uncorrelated things.
More people should be trying things, and people should be trying more different things. Hot take!
I want Conjecture to be a place where people can do lots of different things.
- Example: We have an epistemologist who is trying to understand the history of science. How have questions like this been answered in the past? How can we do science better?

What partial solutions to alignment will look like

Economics is too complicated to reason about unless we put certain assumptions/constraints in place. We do things like “hey, let’s assume efficient markets, because if we do that, we can make all of these interesting insights about how things would work.”
Alignment work looks similar to this. We can do things like “hey, if the agent is myopic, and XYZ is never allowed to happen in training, we can make all of these interesting insights about how it would work.”
This is what I expect the first partial solutions to alignment to look like.

What Conjecture works on

Conjecture works on interpretability. Interpretability is useful because it gives us the tools to implement solutions to alignment. We can’t do proofs on these systems if we don’t understand what’s going on inside them.
Aside: Some people used to think that these models are black-boxes that we can’t understand. This is wrong! There’s so much structure. It’s hard, but we can actually understand a lot.
But interpretability is not enough. Even if we had perfect interpretability tools, these tools alone would not tell us how to build an aligned AI. So we also need conceptual progress.

Refine (alignment incubator)

That’s why we started Refine, an incubator which will bring together alignment researchers who have weird, non-conventional research ideas. The goal is to incubate weird new ideas (they don’t have to work on anything relating to Conjecture). We have no constraints about what they do. We just want them to come up with new ideas.
Every time I read the blogs of the people we chose, I’m like “fuck yeah! This is super crazy.” I don’t expect most of these ideas to work, but I want people thinking about new ideas and trying new things.

Thoughts on infohazards

The way we think about infohazards: There are some simple ideas out there that can speed up capabilities. Our infohazard policy is like a seatbelt. We are going to publish a lot of our stuff. But the default is to keep things to ourselves. We want to check before publishing stuff. We are going to release things, and we’re going to release alignment tools, but we’re going to be careful with what we release.
We’re not in the AGI race. We’re not going to do anything state-of-the-art. We also don’t even have enough compute, and we’re not planning on getting enough compute.

Why is Conjecture for-profit?

It’s good for hiring and for long-term sustainability. In EA, the default is non-profit. And it’s nice that there are a lot of funders who are investing into this space. But talent and compute is really expensive, so it is useful to make money. Relying on the whims of billionaires will not be sufficient.
If you want a long-term organization that moves billions of dollars and isn’t hurt by dips in the crypto market, you want to have a for-profit company.

How will Conjecture make money?

There will be a product team. Some of the stuff that we create for alignment will also be useful for developing consumer products.
We’re interested in hiring people who may have previously been doing earning-to-give. Product managers, web developers, etc. Instead of working for some FANG company and donating to alignment, you can work here. And then for every $1 you earn, we can leverage this to raise so much more money from VCs.
We’re going to develop useful tools that save companies money. There are so many tools that can be valuable for companies without **advancing capabilities.
To be clear, if the EA world wants to fund us sustainably (e.g., if Sam dropped $1B into my lap), we’d be thrilled. We wouldn’t need a product team. But I just don’t expect to get enough money from the EA world.

What does Conjecture need right now?

We’re funding constrained!
We posted on LessWrong saying that we’re hiring, and we got so many high-quality applications. 1 in 3 applications were really good— that never happens! So we have some new people, and we have lots of projects, but we’re currently funding-constrained.

Why invest in Conjecture instead of Redwood or Anthropic?

Well, don’t invest in us instead of those other orgs.
But we’re directly trying to tackle the alignment problem, and we’re doing so in an uncorrelated way. Redwood/Anthropic/ARC are taking correlated bets. We’re taking uncorrelated bets and trying to get new ideas to tackle the alignment problem.

Nice interview, liked it overall! One small question -

Heuristic: Imagine you were in a horror movie. At what point would the audience be like “why aren’t you screaming yet?” And how can you see GPT-3 and Dall-E (especially Dall-E) and not imagine the audience screaming at you?

I feel like I'm missing something; to me, this heuristic obviously seems like it'd track "what might freak people out" rather than "how close are we actually to AI". E.g. it feels like I could also imagine an audience at a horror movie starting to scream in the 1970s if they were shown the sample dialogue with SHRDLU starting from page 155 here. Is there something I'm not getting?

Jonathan Blow had a thread on Twitter about this, like Eroisko SHRDLU has no published code, no similar system showing the same behaviour after 40-50 years. Just the author’s word. I think the performance of both was wildly exaggerated.

But if we are the movie audience seeing just the publication of the paper in the 70s, we don't yet know that it will turn out to be a dead end with no meaningful follow-up after 40-50 years. We just see what looks to us like an impressive result at the time.

And we also don't yet know if GPT-3 and Dall-E will turn out to be dead ends with no significant progress for the next 40-50 years. (I will grant that it seems unlikely, but when the SHRDLU paper was published, it being a dead end must have seemed unlikely too.)

Millions have personally used GPT-3 in this movie.

If we start going to the exact specifics of what makes them different then yes, there are reasonable grounds for why GPT-3 would be expected to genuinely be more of an advance than SHRDLU was. But at least as described in the post, the heuristic under discussion wasn't "if we look at the details of GPT-3, we have good reasons to expect it to be a major milestone"; the heuristic was "the audience of a horror movie would start screaming when GPT-3 is introduced".

If the audience of a 1970s horror movie would have started screaming when SHRDLU was introduced, what we now know about why it was a dead end doesn't seem to matter, nor does it seem to matter that GPT-3 is different. Especially since why would a horror movie introduce something like that only for it to turn out to be a red herring?

I realize that I may be taking the "horror movie" heuristic too literally but I don't know how else to interpret it than "evaluate AI timelines based on what would make people watching a horror movie assume that something bad is about to happen".

Seems like he basically admits the thing was a fraud:

I found this a useful crystallization of what was going on with Death With Dignity (I'm curious if Eliezer thinks this was a good summary)

I appreciate the post and Connor for sharing his views, but the antimeme thing kind of bothers me.

Here’s my hot take: I think Paul and Eliezer were having two totally different conversations. Paul was trying to have a scientific conversation. Eliezer was trying to convey an antimeme.
An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.
A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
A lot of his frustration throughout the years has been him telling everyone that it’s really really hard to convey antimemes. Because it is.
If you read The Sequences, some of it is just factual explanations of things. But a lot of it is metaphor. It reads like a religious text. Not because it’s a text of worship, but because it’s about metaphors and stories that affect you more deeply than facts.
What happened in the MIRI dialogues is that Eliezer was telling Paul “hey, I’m trying to communicate an antimeme to you, but I’m failing because it’s really really hard.”

Does Connor ever say what antimeme Eliezer is trying to convey, or is it so antimemetic that no one can remember it long enough to write it down?

I understand that if this antimeme stuff is actually true, these ideas will be hard to convey. But it's really frustrating to hear Connor keep talking about antimemes while not actually mentioning what these antimemes are and what makes them antimemetic. Also, saying "There are all these antimemes out there but I can't convey them to you" is a frustratingly unfalsifiable statement.

I suppose if it’s an a antimeme, I may be not understanding. But this was my understanding:

Most humans are really bad at being strict consequentialists. In this case, they think of some crazy scheme to slow down capabilities that seems sufficiently hardcore to signal that they are TAKING SHIT SERIOUSLY and ignore second order effects that EY/Connor consider obvious. Anyone whose consequentialism has taken them to this place is not a competent one. EY proposes such people (which I think he takes to mean everyone, possibly even including himself) follow a deontological rule instead, attempt to die with dignity. Connor analogizes this to reward shaping - the practice of assigning partial credit to RL agents for actions likely to be useful in reaching the true goal.

I think that's the antimeme from the Dying with Dignity post. If I remember correctly, the MIRI dialogues between Paul and Eliezer were about takeoff speeds, so Connor is probably referring to something else in the section I quoted, no?

That is a very enlightening post! My favorite bits:

A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
I think Paul and Eliezer were having two totally different conversations. Paul was trying to have a scientific conversation. Eliezer was trying to convey an antimeme.

Yeah, I also really liked those bits.

I'm curious if Eliezer endorses this, especially the first paragraph.

We posted on LessWrong saying that we’re hiring, and we got so many high-quality applications. 1 in 3 applications were really good— that never happens! So we have some new people, and we have lots of projects, but we’re currently funding-constrained.

I want to flag I expect this to not just be funding constrained by network-constrained – onboarding a new employee doesn't just cost money but a massive amount of time, especially if you're trying to scale a nuanced company culture.

I'm starting to think that utilitarianism is the heart of the problem here. "Utilitarianism is intractable" is only an antimeme to utilitarians, in the same way that "Object-Oriented Programming is complex" is only an antimeme to people who are fans of Object-Oriented Programming.

I'd argue that a major part of the problem really is long-term consequentialism, but I'd argue that this is inevitable at least partially as soon as 2 conditions are met by default:

Trade offs exist, and the value of something cannot be infinity nor arbitrarily large values.
Full knowledge of the values of something isn't known to the agent.

It really doesn't matter whether consequentialism or morality is actually true, just whether it's more useful than other approaches (given that capabilities researchers are only focusing on how capable a model is.)

And for a lot of problems in the real world, this is pretty likely to occur.

For a link to a dentological AI idea, here it is:

https://www.lesswrong.com/posts/FSQ4RCJobu9pussjY/ideological-inference-engines-making-deontology

And for a myopic decision theory, LCDT:

https://www.lesswrong.com/posts/Y76durQHrfqwgwM5o/lcdt-a-myopic-decision-theory

This is a really interesting interview with lots of great ideas. Thanks for taking notes on this!

The only point I don't really agree with is the idea that Redwood Research, Anthropic, and ARC are correlated. Although they are all in the same geographic area, they seem to be working on fairly different projects to me:

Redwood Research: controlling the output of language models.
Anthropic: deep transformer interpretability work.
ARC: theoretical alignment research (e.g. ELK).

Lacks explanations of basics like who Conjecture are, and what an antimeme is.

Conjecture is a new AI alignment organization (https://www.conjecture.dev/). Edited the post to include the link.

Connor's explanation of an antimeme (as presented in the interview) is above:

An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them.

And some are too complicated, and some are too unusual, and some are too disturbing. Four very different things.

Yes, all those. And all of them occur in the AI safety discussions. The result is the same though: SCP-style disappearance from personal or public consciousness.

Since the causes are different, the cures are different.