These are my personal thoughts about this interview.

Epistemic status: I consider myself neither a machine-learning expert nor an alignment expert. My focus is on outreach: explaining AI safety to the general public and to professionals outside of the AI safety community. So an interview like this one is important material for me, both to understand the situation myself and to explain it to others. After watching it, I’m somewhat confused. There were bits in this talk that I liked and others that disturbed me. There seems to be a mix of humility and hubris, of openly acknowledging AI risks while downplaying some elements of them. I am unsure how open and honest Sam Altman really was. I don’t mean to criticize. I want to understand what OpenAI’s and Sam Altman’s stance towards AI safety really is.

Below I list transcriptions of the parts that seemed most relevant for AI safety and my thoughts/questions about them. Maybe you can help me better understand this by commenting.

[23:55] Altman: “Our degree of alignment increases faster than our rate of capability progress, and I think that will become more and more important over time.”

I don’t really understand what this is supposed to mean. What’s a “degree of alignment”? How can you meaningfully compare it with “rate of capability progress”? To me, this sounds a lot like marketing: “We know we are dealing with dangerous stuff, so we are extra careful.” Then again, it’s probably hard to explain this in concrete terms in an interview.

[24:40] Altman: “I do not think we have yet discovered a way to align a super powerful system. We have something that works for our current scale: RLHF.”

I find this very open and honest. Obviously, he not only knows about the alignment problem, but openly admits that RLHF is not the solution to aligning an AGI. Good!

[25:10] Altman: “It’s easy to talk about alignment and capability as of orthogonal vectors, they’re very close: better alignment techniques lead to better capabilities, and vice versa. There are cases that are different, important cases, but on the whole I think things that you could say like RLHF or interpretability that sound like alignment issues also help you make much more capable models and the division is just much fuzzier than people think.”

This, I think, contains two messages: “Capabilities research and alignment research are intertwined” and “criticizing us for advancing capabilities so much is misguided, because we need to do that in order to align AI”. I understand the first one, but I don’t subscribe to the second; see the discussion below.

[47:53] Fridman: “Do you think it’s possible that LLMs really is the way we build AGI?”
Altman: “I think it’s part of the way. I think we need other super important things … For me, a system that cannot significantly add to the sum total of scientific knowledge we have access to – kind of discover, invent, whatever you want to call it – new, fundamental science, is not a superintelligence. … To do that really well, I think we need to expand on the GPT paradigm in pretty important ways that we’re still missing ideas for. I don’t know what those ideas are. We’re trying to find them.”

This is pretty vague, which is understandable. However, it seems to indicate that the current, relatively safe, mostly myopic GPT approach will be augmented with elements that could make it much more dangerous, such as long-term memory and dynamic learning. This is highly speculative, of course.

[49:50]  Altman: “The thing that I’m so excited about is not that it’s a system that kind of goes off and does its own thing but that it’s this tool that humans are using in this feedback loop … I’m excited about a world where AI is an extension of human will and an amplifier of our abilities and this like most useful tool yet created, and that is certainly how people are using it … Maybe we never build AGI but we just make humans super great. Still a huge win.”

The last sentence is the most promising one in the whole interview from my point of view. It seems to indicate that Sam Altman and OpenAI are willing to stop short of creating an AGI if they can be convinced that alignment isn’t solved and creating an AGI would be suicidal. They may also be willing to agree on “red lines” if there is a consensus about them among leading developers.

[54:50]  Fridman refers to Eliezer Yudkowsky’s view that AI will likely kill all of humanity.
Altman: “I think there’s some chance of that and it’s really important to acknowledge it because if we don’t talk about it, if we don’t treat it as potentially real, we won’t put enough effort into solving it. And I think we do have to discover new techniques to be able to solve it … The only way I know how to solve a problem like this is iterating our way through it, learning early, and limiting the number of one-shot-to-get-it-right scenarios that we have.”

I give Sam Altman a lot of credit for taking Eliezer’s warnings seriously, at least verbally. However, he seems to rule out the approach of solving the alignment problem in theory (or acknowledging its theoretical unsolvability), relying on a trial-and-error approach instead. This, I think, is very dangerous. “Limiting the number of one-shot-to-get-it-right scenarios” doesn’t do it in my eyes if that number doesn’t go down to zero.

[59:46] Fridman asks about take-off speed. Altman: “If we imagine a two-by-two matrix of short timelines till AGI starts / long timelines till AGI starts [and] slow take-off/fast take-off … what do you think the safest quadrant will be? … Slow take-off/short timelines is the most likely good world and we optimized the company to have maximum impact in that world, to try to push for that kind of world, and the decisions we make are … weighted towards that. … I’m very afraid of the fast take-offs. I think in the long timelines it’s hard to have a slow take-off, there’s a bunch of other problems too.”

Here he seems to imply that the two axes aren’t independent: short timelines supposedly lead to a slow take-off, and vice versa. I don’t see why that should be the case: if an AI gets out of control, that’s it, regardless of when that happens and how fast. I understand the idea of an incremental approach to AI safety, but I don’t think that a high (not to say breakneck) deployment speed like the one OpenAI has demonstrated in the past helps in any way. He seems to use this argument to justify that speed on the grounds of improved safety, which I strongly feel is wrong.

[1:09:00] Fridman asks what could go wrong with an AI. Altman: “It would be crazy not to be a little bit afraid. And I empathize with people who are a lot afraid. … The current worries that I have are that there are going to be disinformation problems or economic shocks or something else at a level far beyond anything we’re prepared for. And that doesn’t require superintelligence, that doesn’t require a super deep alignment problem and the machine waking up trying to deceive us. And I don’t think it gets enough attention. … How would we know if the flow we have on twitter … like LLMs direct whatever’s flowing through that hive mind? … As on twitter, so everywhere else eventually … We wouldn’t [know]. And that’s a real danger. … It’s a certainty there are soon going to be a lot of capable open-sourced LLMs with very few to none safety controls on them … you can try regulatory approaches, you can try with more powerful AIs to detect this stuff happening, I’d like us to try a lot of things very soon.”

This is not really related to AGI safety and I’m not sure if I’m misinterpreting this. But it seems to imply something like “we need to develop our AGI fast because it is needed to combat bad actors and others are less safety-concerned than we are”. If I’m correct, this is another defense of fast deployment, if a more subtle one.

[1:11:19] Fridman asks how OpenAI is prioritizing safety in the face of competitive and other pressures. Altman: “You stick with what you believe and you stick to your mission. I’m sure people will get ahead of us in all sorts of ways and take shortcuts we’re not gonna take. … I think there are going to be many AGIs in the world so it’s not like outcompete everyone. We’re gonna contribute one, other people are gonna contribute some. I think multiple AGIs in the world with some differences in how they’re built and what they do what they’re focused on, I think that’s good. We have a very unusual structure though, we don’t have this incentive to capture unlimited value. I worry about the people who do, but, you know, hopefully it’s all gonna work out.”

I felt somewhat uneasy listening to this. It sounds a lot like “we’re the good guys, so don’t criticize us”. It also feels like downplaying the actual competitive pressure, which OpenAI itself has increased. Does Sam Altman really believe in a stable world where there are many AGIs competing with each other, some of them with only minimal safety, and all goes well? This is either very naïve or somewhat dishonest in my opinion.

[1:14:50] Altman (talking about the transformation from non-profit to “capped” for-profit): “We needed some of the benefits of capitalism, but not too much.”

[1:16:00] Altman (talking about competition): “Right now there’s like extremely fast and not super deliberate motion inside of some of these companies, but already I think people are, as they see the rate of progress, already people are grappling with what’s at stake here. And I think the better angels are going to win out. … The incentives of capitalism to create and capitalize on unlimited value, I’m a little afraid of, but again, no one wants to destroy the world. … We’ve got the Moloch problem, on the other hand we’ve got people who are very aware of that, and I think, a lot of healthy conversation about how can we collaborate to minimize some of these very scary downsides.”

Again, he depicts OpenAI as being ethically “better” than the competition because of the capped-profit rule (which, as far as I understand, has a very high ceiling). This in itself sounds very competitive. On the other hand, he seems open to collaboration, which is good.

[1:17:40] Fridman asks whether power might corrupt Altman/OpenAI. Altman: “For sure. I think you want decisions about this technology and certainly decisions about who is running this technology to become increasingly democratic over time. We haven’t figured out quite how to do this. But part of the reason for deploying like this is to get the world to have time to adapt and to reflect and to think about this, to pass regulations, for institutions to come up with new norms, for the people working out together. That is a huge part of why we deploy even though many of the AI Safety people you referenced earlier think it’s really bad. Even they acknowledge that this is like of some benefit. But I think any version of ‘one person is in control of this’ is really bad. … I don’t have and I don’t want like any super voting power, any special … control of the board or anything like that at OpenAI.”

Again, there seem to be good and bad messages here. I think it’s good that he acknowledges the enormous power OpenAI has and that it needs democratic regulation. But he again justifies the high deployment speed by arguing that this gives the world “time to adapt”. I think this is a contradiction. If he really wanted to give the world time to adapt, why didn’t they launch ChatGPT, then wait two or three years before launching Bing Chat/GPT-4? Sam Altman would probably argue “we couldn’t because the competition is less safety-concerned than we are, so we need to stay ahead”. This is, of course, speculative on my part, but I don’t like this kind of thinking at all.

[1:44:30] Fridman asks if an AGI could successfully manage a society based on centralized planning Soviet Union-style. Altman: “That’s perfect for a superintelligent AGI. … It might be better [than the human Soviet Union leaders], I expect it’d be better, but not better than a hundred, a thousand AGIs sort of in a liberal democratic system. … Also, how much of that could happen internally in one superintelligent AGI? Not so obvious. … Of course [competition] can happen with multiple AGIs talking to each other.”

Again, he points to a world with many competing AGIs in some kind of "libertarian utopia". I have no idea how anyone could think this would be a stable situation. Even we humans have great difficulty creating stable, balanced societies, and we all have more or less the same level of intelligence. How is this supposed to work if competing AGIs can self-improve and/or amass power? I can’t think of a stable world state which is not dominated by a single all-powerful AGI. But this may of course be due to my lack of imagination/knowledge.

[1:45:35] Fridman mentions Stuart Russell’s proposal that an AI should be uncertain about its goals. Altman: “That feels important.” Fridman asks if uncertainty about its goals and values can be hard-engineered into an AGI. Altman: “The details really matter, but as I understand them, yes I do [think it is possible].”

[1:46:08] Fridman: “What about the off-switch?” Altman: “I’m a fan. … We can absolutely take a model back off the internet. … We can turn an API off.”

These are minor points and I may be misunderstanding them, but they seem to point towards a somewhat naïve view on AI safety.

[1:46:40] Fridman asks if they worry about “terrible usecases” by millions of users. Altman: “We do worry about that a lot. We try to figure it out … with testing and red teaming ahead of time how to avoid a lot of those, but I can’t emphasize enough how much the collective intelligence and creativity of the world will beat OpenAI and all of the red-teamers we can hire. So we put it out, but we put it out in a way we can make changes.”

[2:05:58] Fridman asks about the Silicon Valley Bank. Altman: “It is an example of where I think you see the dangers of incentive misalignment, because as the Fed kept raising [the interest rate], I assume that the incentives on people working at SVB to not sell at a loss their ‘super safe’ bonds which are now down to 20% or whatever … that’s like a classic example of incentive misalignment … I think one takeaway from SVB is how fast the world changes and how little our experts and leaders understand it … that is a very tiny preview of the shifts that AGI will bring. … I am nervous about the speed of these changes and the speed with which our institutions can adapt, which is part of why we want to start deploying these systems really early while they’re really weak, so that people have as much time as possible to do this. I mean it’s really scary to have nothing, nothing, nothing and then drop a super powerful AGI all at once on the world.”

Again, he’s arguing for quick deployment in the name of safety. More and more, this feels like a justification of OpenAI’s approach rather than an open discussion of the arguments for and against it. But that’s probably to be expected from an interview like this.

All in all, I feel a bit uneasy about this interview. In parts, it sounds a lot like someone who wants to be seen as cautious and rational but actually just wants to stay ahead of the competition, whatever the cost, and who uses this talk to justify a breakneck-speed strategy. On the other hand, Sam Altman says a lot of things that show he actually understands his responsibility and is open to cooperation and regulation, which I am very grateful for. Also, most leaders in his position would probably be less open about the risks of their technology.

What’s your take? 


We’ve got the [molecular?] problem

I think he was saying "Moloch", as in the problem of avoiding a tragedy of the commons.

Ah, thanks!

I pasted the YouTube video link into AssemblyAI's Playground (which I think uses Conformer-1 for speech to text) and generated a transcript, available at this link. However, the transcript lacks labels for who is speaking.

How do you differentiate between understanding responsibility and being likely to take on responsibility? Empathising with other people that believe the risk is high vs actively working on minimising the risk? Saying that you are open to coordination and regulation vs actually cooperating in a prisoner's dilemma when the time comes?

As a data point, SBF was the most vocal about being pro-regulation in the crypto space, fooling even regulators and many EAs, but when Kelsey Piper confronted him in DMs on the issue, he admitted he had said this only for PR because "fuck regulations".

I wouldn't want to compare Sam Altman to SBF and I don't assume that he's dishonest. I'm just confused about his true stance. Having been a CEO myself, I know that it's not wise to say everything in public that is on your mind.

It does kinda make sense to plant the world thick with various AIs and counter-AIs, because that makes it harder for one AI to rise and take over everything. It's a flimsy defense but maybe better than none at all.

The elephant in the room though is that OpenAI's alignment efforts for now seem to be mostly about stopping the AI from saying nasty words, and even that in an inefficient way. It makes sense from a market perspective, but it sure doesn't inspire confidence.

It does kinda make sense to plant the world thick with various AIs and counter-AIs, because that makes it harder for one AI to rise and take over everything.

I'm not sure about that. It makes sense if the AIs stay more or less equal in intelligence and power, similar to humans. But it doesn't make sense if the strongest AI relates to the next most powerful one the way we relate to gorillas, or mice. The problem is that each of the AGIs will have the same instrumental goals of power-seeking and self-improvement, so there will be a race very similar to the race between Google and Microsoft, only much quicker and more fierce. It's extremely unlikely that they will all grow in power at about the same rate, so one will outpace the others pretty soon. In the end "the winner takes it all", as they say.

It may be that we'll find ways to contain AGIs, limit their power-seeking, etc., for a while. But I can't see how this will remain stable for long. It seems like trying to stop evolution.

I also wrote about this interview via a LinkedIn article: On AGI: Excerpts from Lex Fridman’s interview of Sam Altman with commentary. I appreciated reading your post, in part because you picked up on some topics I overlooked. My own assessment is that Altman's outlook derives from a mixture of utopianism and the favorable position of OpenAI. Utopianism can be good if tethered to realism about existing conditions, but realism seemed lacking in many of Altman's statements.

Altman’s vision would be more admirable if the likelihood of achieving it were higher. Present world conditions are likely to result in very different AGIs emerging from the western democracies and China, with no agreement on a fundamental set of shared values. At worst, this could cause an unmanageable escalation of tensions. And in a world where the leading AI powers are in conflict over values and political and economic supremacy, and where all recognize the pivotal significance of AI, it is hard to imagine the adoption of a verifiable and enforceable agreement to slow, manage, or coordinate AGI development.  In the western democracies this is likely to mean both managed and intensified competition: intensified as awareness of the global stakes grows, and managed because, increasingly, competition will have to be coordinated with national security needs and with efforts to preserve social cohesion and economic openness. AGI could confer unassailable first mover advantages that could lead to extremely broad economic, if not social and political, domination, something the western democracies must prevent if they want to sustain their values.

We’ve got the Moloch problem, on the other hand we’ve got people who are very aware of that, and I think, a lot of healthy conversation about how can we collaborate to minimize some of these very scary downsides.

I'm imagining a comic where someone says this while in line to sacrifice their baby to Moloch.

Does Sam Altman really believe in a stable world where there are many AGIs competing with each other, some of them with only minimal safety, and all goes well?

My weak guess is that it will be comparable to nuclear power: most AGIs will peacefully coexist, with the occasional Chernobyl-type disaster but few if any incidents on the scale of threatening all of humanity. Most applications for misuse will likely be disinformation, financial fraud, or other things we haven't yet imagined.

I don't think he is naive or dishonest, it's just that we have different mental models when we think of 'AGI.' My intuition is that most AGIs will still be quite narrow, like ChatGPT, until we're close to a robust self improving artificial intelligence.

I don't think he is naive or dishonest, it's just that we have different mental models when we think of 'AGI.' My intuition is that most AGIs will still be quite narrow, like ChatGPT, until we're close to a robust self improving artificial intelligence.

I think Sam Altman's remarks about GPT-4 not being an AGI point in a different direction. He even defines "superintelligence" as a system that is able to significantly advance science, in other words, self-improve and invent new technologies (maybe including self-replicating nanobots). It's his declared goal to develop this. It will be much harder to contain and control such a system.

This is the big issue, since writable memory, memory that can store arbitrarily long problems, and online learning procedures are too useful for the use case of advancing science significantly, especially doing so cheaply.

Now, I do think we will be able to align even superhuman AI, but this is the issue I have with the proposition of narrowness: Science usually requires way more generality than most tasks.

It depends on what form it takes. I don't see superintelligence emerging from large language models. There will be a lot of impressive, and some scary, technologies that develop from LLMs but they will still be quite narrow. Also, I'm not saying that there's no danger, Chernobyl was terrible after all. I'm opining that it will be isolated incidents, not any singular world ending incident. 

The difficulty of containment and control will depend on the form the superintelligence takes. It could, for example, take the form of a bunch of AGIs combining forces making them functionally a superintelligence but not very integrated. Or it could be one integrated superintelligent system that can self-improve but has little influence outside of virtual space.

One aspect that I think is underdiscussed is the possible physical attributes of a superintelligence, such as how much energy it consumes or what sort of substrate it runs on. Answering these questions could provide insight into alignment.

I don't see superintelligence emerging from large language models.

I agree, largely due to lack of writable memories that last beyond the interaction, as well as the fact that current LLMs can't solve arbitrarily long/complicated problems in their memory.

But if Nathan Helm-Burger is to be believed, that second thing might change fast, as I heard multiple research groups are closing in on a solution to the problem of AGI such that they can solve arbitrarily long/complicated problems in their memory.

And the reports said that only engineering obstacles stand in the way of that.

In this context does 'engineering obstacles' simply refer to iteration? Or do we, for example, need new hardware that can do writable memory, solve long/complicated problems, has the physical propensity for self improvement, etc.? If it's the latter it will take longer than we've imagined. My intuition is that if we achieve AGI without massive improvements to current hardware it will still be quite narrow.

Basically, the challenge is integrating it into new AI models, including some new hardware.

But the key thing to note is that while engineering can be hard, it's of a hardness we reliably know how to solve given time. In this case, the fact that we only need engineering to do it massively increases the chance that someone will be able to do it in say, 10 years.

In particular, it means that the solution to AGI is essentially in sight, and all we need to do is essentially do boring engineering work we know how to do rather than say, discovering new algorithms.

Even shorter, it means we're in the endgame for AGI, with the final pieces almost ready to launch.

I'm sure I don't fully understand what you mean by 'integrating it into new AI models' but in any case it seems we disagree on forecasting the level and degree of advancement. I think models and integration will only be as good as the physical hardware they run on, which, to me, is the biggest bottleneck. It doesn't seem practical that our current chips and circuit boards can house a superintelligence regardless of scale and modularity. So in 10 years I think we'll have a lot of strong AGIs that are able to do a lot of useful and interesting work, and we'll probably need new subcategories to usefully describe them and tell them apart (I'm just spitballing on this point).

However, true AI (or superintelligence) that can cognitively outperform all of humanity and can self improve will take longer and run on hardware that would be alien to us today. That's not to say that AGI won't be disruptive or dangerous, just not world ending levels of dangerous. You could say that the endgame for AGI is the opening game of true AI.

It doesn't seem practical that our current chips and circuit boards can house a superintelligence regardless of scale and modularity.

This is my disagreement point. I think that we will be able to build chips that can house a mild superintelligence, solely out of the energy used by the human brain, assuming the most energy efficient chips are used.

And if we allow ourselves to crank the energy up, then it's pretty obviously achievable even using current chips.

And I think this is plenty dangerous, even an existential danger, even without exotic architectures due to copying and coordination.

So this:

However, true AI (or superintelligence) that can cognitively outperform all of humanity and can self improve will take longer and run on hardware that would be alien to us today.

Is not necessary for AGI to happen, or to be a huge deal.

Don't get me wrong, exotic hardware exists, and if made practical it would be an even bigger deal: at the far end of exotic hardware, something like a quantum computer could simulate a brain basically perfectly accurately and do all sorts of very big things. But that isn't necessary in order for things to be a wild ride due to AGI this century.

I think that we will be able to build chips that can house a mild superintelligence, solely out of the energy used by the human brain, assuming the most energy efficient chips are used.

I agree with this statement, just not any time soon since hardware advancement is relatively slow. I also agree that this century will be a wild ride due to AGI and I imagine that AGI will play an important role in developing the exotic hardware and/or architecture that leads to superintelligence. 

The speed and order of these emerging technologies we disagree on. I think we'll have powerful AGIs this decade and they'll have a huge impact but will still be quite narrow compared to a true superintelligence. My prediction is that superintelligence will emerge from iteration and development over time and it will run on exotic hardware probably supplemented by AGI. My prediction is mostly informed by physical constraints and current rates of development. 

As for timing I'm going to guess between one and two hundred years. (I wouldn't be surprised if we have the technology to augment human intelligence before then but the implications of that are not obvious. If advanced enough maybe it leads to a world similar to some scifi stories where only analog tech is used and complex work or calculation is done by augmented humans.)

As for timing I'm going to guess between one and two hundred years.

Yep, that's the basic disagreement we have: I expect this in 10-30 years, not 100-200 years, because I see us as almost at the point where we can create such a mild superintelligence.

The speed and order of these emerging technologies we disagree on.

Yes, this is our most general disagreement here: how fast things will go.

Maybe it would be useful to define 'mild superintelligence.' Would this be human baseline? Or just a really strong AGI? Also, if AI fears spread to the general public as the tech improves, isn't it possible that it would take a lot longer to develop even a mild superintelligence, because there would be regulations/norms in place to prevent it?

I hope your predictions are right. It could turn out that it's relatively easy to build a 'mild superintelligence' but much more difficult to go all the way.

Roughly, I'm talking about something like 10x a human's intelligence, though in practice it's likely 2-4x, assuming it uses the same energy as a human brain.

But in this scenario, scaling up superintelligence is actually surprisingly easy by adding in more energy, and this would allow more intelligence at the cost of more energy.

Also, this is still a world which would have vast changes fast.

I don't believe we will go extinct or have a catastrophe, due to my beliefs around alignment, but this would still represent a catastrophic, potentially existential threat if the AGIs/mild ASIs wanted to cause one.

Remember, that would allow a personal phone or device to host a mild superintelligence, that is 2-10x more intelligent than humans.

That's a huge deal in itself!