Written quickly for the Inkhaven Residency. Less analytical and more eschatological than usual.
In the story of AI Safety, it is the beginning of the end, or at least, long after the end of the beginning. Ever more crazy events happen ever more quickly as the story barrels toward its inevitable (or perhaps evitable?) conclusion.
What were once considered cutting-edge research challenges – AIs that can talk in fluent English, or AIs that can solve basic arithmetic word problems, or even AIs that can generate passably realistic images – have become mundane, so much so that many consider these feats – once imagined as future marvels – to be mere “slop”. What were once bogeymen that lurked in the nebulous future – AIs capable of doing tasks that take human experts over a dozen hours to complete, AIs capable of spotting vulnerabilities in human-written code, and AIs with the potential to automate a large fraction of the work of human coders – have not only become manifest, but have begun to generate tens of billions in revenue for their creators. What were once conceptual problems – the possibility of deceptive alignment, the connection between values, beliefs, and action proclivities, and so forth – are now being studied empirically, though perhaps with slightly different and less confused framings.
At the same time, we see the results of plans set into motion long ago (by the standards of this tale, “a few years ago” is long ago). Trillion-dollar companies are being built by those who could see the faint outlines of the future and sought to gain the resources to shape it with their own hands. The political battle over AI is being fought, as the conspiracies of the past run into the currents of the present.
Even if the imminent approach of the end is not certain, it is at least shockingly plausible.
Today, I write not to adjudicate how much of the story is left, but perhaps merely to muse on my role in it.
In the possibly distant, possibly near future, after the story reaches its conclusion and after all the dust has settled (assuming the conclusion is a sufficiently happy one), there will be time for those at its end to look back at us who were once characters in this story. With both the benefit of hindsight and the assistance of superior intellects, we shall know, then (if we are still around), which of our actions were obviously foolish and which were shockingly wise and prescient. We shall also know, then, how much each of us has done, and (perhaps) be able to answer whether we have done enough.
But the story is not over, and we have neither hindsight nor superhuman intellect to guide us. Perhaps then, it is good that there are so many obvious things to do, so many surprising events to react to, and so many fires to put out. I’ve certainly spent my fair share of time just reacting to the circumstances in which I find myself, and trying to make the world a bit more sane and good outcomes a bit more likely, one project at a time.
Yet oftentimes, in my more pensive moments, when I find a pause between the insanity and urgency of the day-to-day, I find myself asking: what, exactly, are we doing here? And in the end, by which standard shall we be judged?
Today, I write not to offer a standard by which I shall judge others (and by which I shall in turn be judged), but perhaps merely to express some of my feelings on what I have done.
There’s a quote that I think about a lot these days, sometimes attributed to Henry Wadsworth Longfellow: “... we judge ourselves by what we feel capable of doing, while others judge us by what we have already done”.
The standard interpretation is that we tend to judge ourselves too harshly, and far more harshly than others would. Others look to our past accomplishments and compare them to a world in which those accomplishments do not exist; for those like myself, it is easy to compare ourselves instead to a potential role that we could play, in a narrative in our heads, if we were only to apply ourselves to our full potential (whatever that actually means).
By many standards, I have done something in this world for this story: I have been involved in AI Safety in one way or another for the majority of my adult life, and more than a decade in the field has drawn me into many subplots. I was involved in early work in mechanistic interpretability, as it grew from a handful of people into a major academic field. I was one of the first people to work on ARC Evals, before we even really knew what we were doing, back when we still affectionately referred to the work of dangerous capability evaluations as “model poking”. Throughout the years, I wrote a few other papers, did some grantmaking, and had other miscellaneous adventures. More recently, I co-led the project that became the METR Time Horizon.
I am certainly not a main character, but I have hopefully played a named supporting role, one that appears as a reference in a few chapters.
Yet I find that, by my own lights, and perhaps in the eyes of those in the future casting judgment upon me, I have fallen far short of what I wanted to be, and what I feel I should have been. If this is truly the end, then it seems plausible to me that this feeling of persistent disappointment will be all there is to my tale.
But as I write the above, another interpretation of this quote comes to mind: in the future, others shall look back and cast judgment on what, to them, we have already done. Things we feel we are capable of doing today may become things we actually do tomorrow, which in turn become the things by which others will judge us in the future. The quote can be re-interpreted, not as saying that people are wont to be too hard on themselves compared to others in the present, but as saying that the deeds by which we shall be judged are not yet done.
All of this also makes me think of Scott Alexander’s piece “The Parable of the Talents”.
I can only hope that, by the wiser standards of the future, I will be judged on what I actually could have done, and not just on what I think I could have done. If the future is to be wise, then it will not judge me for failing to be the most important person in the world, except insofar as I actually could have been. But for me, stuck in the present, what I shall have done in the future exists only as expectations of what I may be capable of doing.
Today, the story is not yet over. I still have things to do, before it ends.