Jackson Wagner

Engineer working on next-gen satellite navigation at Xona Space Systems. I write about effective-altruist and longtermist topics at nukazaria.substack.com, or you can read about puzzle videogames and other things at jacksonw.xyz

My thoughts about the story, perhaps interesting to any future reader trying to decipher the "mysterianism" here, a la these two analyses of sci-fi short stories, or my own attempts at exegeses of videogames like Braid or The Witness.  Consider the following also as a token of thanks for all the enjoyment I've received from reading Gwern's various analyses over the years.

  • "Clarity didn't work, trying mysterianism" is the title of a short story by Scott Alexander, which feels to me like an analogy that is arguably about AI, at least insofar as it is clearly about optimization and its dangers.
    • But that story of Scott Alexander's isn't actually very mysterious!  It's kind of a sci-fi / horror story (albeit in a fantasy setting) where it's pretty clear what's going on.  Conversely, this story I can hardly make heads or tails of.  It feels like there should be some kind of AI connection (what else is there to be mysterian about?  surely not just Wikipedia editing...), but if it's there, I cannot find it.
  • As other commenters mention, the story's title is a reference to a 1966 sci-fi novel, which I sadly haven't read.
    • This (and to a lesser extent the "Eternal September" joke) seems to pin down the date, meaning that there probably can't be much significance to the choice of September 30th in particular for this story -- the date is probably already "explained away" by Gwern presumably wanting to title the story the same as the 1966 novel.  By contrast, the selection of year, 1939, remains more of a free parameter that we need to explain.
  • One clear theme of the story (although one I myself am not much interested in) is a bundle of stuff related to the idea of curation, of writing and editing, summarization and commentary, et cetera.
    • The frame story, in which we're reading a translation of a review of a... book? I'm not sure exactly... by M. Trente, explaining the work of an Institute founded to archive materials related to the Thirtieth.
    • The story feels extremely Borgesian, recalling the impossible literary conceits of "Tlön, Uqbar, Orbis Tertius", "The Library of Babel", and "Pierre Menard, Author of the Quixote".
    • The bits about digitization, Xerox, and the internet seem like winking asides to the fact that this story exists somewhat anachronistically, and perhaps more closely reflects a Wikipedia-adjacent form of online scholar / nerd culture.  After all, what else but Wikipedia would literally have a page devoted to all the events of September 30th?  Where else but Wikipedia would I go to learn about the various events being referenced in the story (the WW2 events, the first televised football game, the birth of Jean-Marie Lehn, and so on)?
    • There are many amusing linguistic jokes, such as the parallelism between the history of the Institute and the history of the two world wars, where the "second war in the world of the Institute" is WW2-themed followed by a cold war of sorts ("...after which the conflict chilled") -- this in particular struck me as very similar to Gwern's meditations on second life sentences.  Or the bit about how the institute hopes that "understanding how the past understood the future can help understand the future of understanding the past".
    • The loving list of obscure real-world institutes that bear some resemblance to the Institute of the story: the Museum of Jurassic Technology, the "Pith" Rivers Museum (presumably the Pitt Rivers Museum), the Gravity Research Foundation.  (I'm not sure if the Labyrinth Sodality, Iberian Sundial Society, and the Rekord-Club Gründerzeit are also oblique references to real-world organizations, or are perhaps references to fiction.)
    • The bit about "hapax ephemeron".
    • Perhaps what I like most about this theme-cluster is neither the absurd scholarly self-reference and the bizarre concept of taking as an object of study the cross-section of a single day, nor the paradoxical idea of spending so many days focused on the study of just one day (a la In The Future, Everyone Will Be Famous to Fifteen People), but rather the more familiar impossibility of trying to somehow recapture the subjective nature of experience, the ephemeral freshness and vividness that once belonged to all the moments making up one fall day in 1939, using only the tools of writing and historiography and so forth.
      • That the story ends, after so much obscure academic writing and abstract discussion, with a vividly physical description of a baseball game -- "spitting chaw in the dust" -- "cold, crisp, and fair, a touch of looming winter" -- "the sun over the roofline half past five" -- speaks to this theme, IMO.  The impossibility of trying to conjure back up the vivid freshness of experience via thought and study and analysis, the contrast between the immediacy / immanence of the present versus the mind's tendency to always zip about with innumerable little thoughts of the past and future.
      • See also the preceding section which starts "The skeptical reader will soon be forced to agree".
        • Note also that the gravitational arc of the baseball that ends the story -- A slow lazy arc, every eye silently following it spell-gaped, past the sun, back down, toward no one but me as I casually hold up my glove for the ball, suddenly knowing it had been pitched to me by God Himself, landing with just the slightest thwack… -- is another analogy for the ephemeral passing of a single day.  It's fast-moving, it lasts only a few seconds, yet the ball is described as if suspended.  And that moment is indeed suspended, as an image in the memory of the (multiple layers of fiction and frame-story quotation deep) anonymous memoirist of 1969 (coincidentally the year of the Institute's founding).
        • Rosier was too young in 1939 to have been the batter in that game.  But he lived in Chicago at that time -- perhaps he was at the game and witnessed that moment, and that crystalline memory is what spurred him to found the Institute?
  • But is there a yet deeper theme to the story?  Or (like my disappointing experience with Unsong), is it for the most part just puns and references all the way down (plus of course the stuff I've described above)?
    • Revisit Gwern's analysis of "Suzanne Delage" for a sense of how deep the rabbit hole of obscured short-story themes might potentially go.  (If that's the case here, it goes much further than I will be able to plumb.)
  • First, some dangling loose ends:
    • Why is the story ostensibly translated from French, about an institute in New Orleans, with assorted joking references to "gallophobia" and so forth?
    • What's going on with the "dream diary results, showing anomalous ‘ψ’ predictions of October 1st"?
    • What's up with the adventurous biography of Vincent Rosier?  What kind of impression am I supposed to take from all the wacky details?  Reproducing below:
      • Birth in rural Ireland, emigration to the City of Wind, Chicago, hardscrabble childhood running with youth gangs,
      • the precocious admission to an anthropology major followed by a contretemps while kayaking Papua New Guinea,
      • on behalf of the OSS, which connections proved useful in post-war import-export ventures (making, and losing, his first fortune when his curious  ‘Rosier Cube’ oranges were popular, and then unpopular, in the West).
        • (this bit about making and losing tons of money running an import-export business selling goods during WW2, is also a plot point in Chapter 24 of the novel Catch-22, IIRC?  Rosier Cube oranges are presumably akin to Japan's "square melons".)
      • He was even trusted with back-channel Soviet negotiations over the (still-classified) Mukbang Incident,
        • (seems like a weird reference to a modern-day phenomenon in a story ostensibly set in 1990?)
      • followed by restless expansion into unrelated ventures including his publishing & chemical supply conglomerate (best known for its deodorant),
        • no idea what this is about
      • shrimp fishing (vertically integrated restaurant chain),
      • legendary commodities trades (famously breaking the Newcastle union through locally-delivered coal futures),
      • and an ill-fated Texas ostrich farm investment (now a minor tourist attraction).
      • Eventually he withdrew into Californian venture capital, a seersucker-clad gray eminence among Silicon Valley upstarts, where he amassed the Greco-Roman art collection he is chiefly remembered for.
        • Is there some kind of AI or early-internet connection here?  Are the statues some kind of reference to right-wing "statue in profile pic" culture??  Probably not... more likely it's a reference to the Getty Museum in California?
    • More excerpts:
      • If Trente’s exponential bibliometric projections are correct, by “152 AE”, no publication on the 20th century will fail to mention the 30th.
        • The idea of "exponential bibliometric projections" in this context is obviously absurd, and is funny.  The triumphalism of this section certaintly feels like a reference to the world-transformation project of "Tlon, Ubquar, Orbit Tertius", as one hackernews commenter notes.  But this (along with the idea that "a Palo Alto recluse has changed the earth" -- many such cases!) also feels like the most direct reference (if indeed it is one) to a potential AI theme, since the initial cross-post reference to "Clarity didn't work, trying mysterianism" -- insofar as it evokes/spoofs exponential projections by the likes of Moravec, Vinge, and Kurzweil.
        • But note that 152 AE is the year 2091 (presumably counting from 1939), which strikes me as a little late for scaling-pilled AGI timelines.
      • It makes little difference to us, as we go on revising, in our quiet countryside retirement, an encyclopedia of Casares we shall never publish.
        • Beyond the Borges reference, if we are trying to force an AI reading, this perhaps sounds like Gwern's description of his own life (writing mysterian short stories, etc) amidst a world so rapidly changing that our present era is perhaps comparable to none other than late September, 1939.
  • But perhaps more importantly:
    • Why is the Institute's motto "Lux in Tenebris Diei Unius"?  Google's AI translates it as "Light in the Darkness of a Single Day", and comments "It's a variation of the more common phrase 'Lux in tenebris lucet,' which is particularly significant in Christian theology."
    • Why 1939?
      • Why was the institute founded?  This question is never answered by the story.  Are the "military historian" or "pacifist" factions closer to the truth?
        • If Rosier was born in 1927 then he would've been 12 years old in 1939, which seems too early for him to have been deeply affected by WW2 (especially considering he was living in Chicago, not anywhere in Europe).
        • But, "Pacifist" arguments aside, the choice of year is a free parameter in Gwern's story and obviously significant in connection with World War 2.  So, I am inclined to think that the Pacifists are wrong, and the focus of the Institute (and Gwern's story) is somehow intimately tied up with WW2 in particular.
      • Other than the fact that the Thirtieth has an Institute and today does not, is there some further sense in which today is less "real" than Sept 30, 1939?  The story says "Here, at the end of history, mankind has been disillusioned of ideology and symmetry, and unable to look forward, looks back."  Today, as the story argues (echoing Zarathustra and Fukuyama), "is no-one and no-where and no-when; and men cannot live in a utopia."
        • 1939 is thus perhaps more sincere (with its many zealous true believers in ideologies like Communism, Fascism, and Democracy), less ironic or cynical or world-weary, and for all those reasons perhaps more real than today.
        • But they are also of course more naive, ignorant, and fundamentally confused (re: those same zealous true believers in Communism and Fascism, the eagerness to launch into patriotic wars, et cetera).
        • Perhaps this is the essential relationship that the past always has to the future?  Today, 2025, as we stand potentially on the cusp of transformative change driven by AI, and also seemingly in a world of steadily increasing international tensions and war (russia/ukraine, then israel/gaza/iran, india/pakistan, perhaps soon china/taiwan?) will probably seem similarly confused and naive and in-denial-about-what's-obviously-coming from the standpoint of 2076, as 1939 seemed from 1990.
  • And finally: why is October the First "too late"?
    • Insofar as it's connected to the 1966 novel: just skimming the Wikipedia page, it's about an Earth that's been jumbled up such that different regions are in different eras (eg, Mexico is in the far future, while Greece is in classical times, and France is in 1917).  This, combined with the theme of World War 2, puts me in a mind of alternate-history scenarios.  (But maybe it is just meant to point me towards the jumbled-up nature of extensively studying the Thirtieth in the modern day.)
    • Insofar as it isn't just a reference to the 1966 novel, then we must ask ourselves: October 1st 1939 is too late, for what?
      • Well, obviously it's too late to prevent the things that had already happened or were happening right around then: the surrender of Warsaw to Nazi forces (on the 28th), the capture of Modlin Fortress (on the 29th), Hitler's partition of Poland and the establishment of a Polish government-in-exile (on the 30th), the entry of Nazi troops into Warsaw (on Oct 1st, beginning a German occupation of the city that lasted until 1945), and this Oct 1st speech by Churchill further committing England to the war and calling up all men aged 20-22 for conscription.
      • So, by Sept 30th it's obviously already too late to prevent WW2, since it's already happening.  But perhaps it's not too late to prevent or greatly mitigate the Holocaust (about half of the Jews who died were Poles), whereas by Oct 1st it somehow would be?
      • Or conversely, perhaps on Sept 30 it's not too late to hope for a mitigated World War, smaller in scope than the actual World War we got?  (ie, perhaps Germany invades Poland and France, but Britain looks the other way and decides to make a "separate peace" with Germany, Hitler never makes the mistake of invading the Soviet Union, and America never gets involved?)  Here I'm just going off Churchill's Oct 1 speech.
      • Both of these scenarios seem pretty ridiculous -- the idea that on Sept 30 the holocaust or the severity of WW2 could totally have been prevented, but October the first is too late, is absurd.  Surely there's just not that much variance from day to day!
      • If we wanted to think more rigorously, we might say that the chances of the Holocaust happening vs not happening (or perhaps this should be graded on a sliding scale of total deaths, rather than being considered as a binary event) are initially pretty low -- if you're re-rolling history starting from the year 1900, perhaps something like the Holocaust would only happen 1 in 100 times, or 1% odds?  But of course, if you re-roll history starting in June 1941, then chances are almost 100%, since it's basically already happening.  What about an intermediate year, like 1935?  It must have had some intermediate probability.  So, looking back on history, you can imagine this phantom probability rising and falling each year -- indeed, each day -- almost like a stock price or prediction-market price.  Some days (like various days on the timeline of Hitler's rise to power) the odds would rise dramatically, other days the odds would fall.  What was the single most important day, when the probability changed the most?  Is that necessarily the same day as the "hingiest" day, the most influential day for that outcome?  (I don't think it necessarily is?  But I'm confused about this.)
      • And for a more emotional, subjective judgement -- at what point does one say that an event could still be prevented?  At 50/50, it could go either way.  At 80/20, the situation is looking grim, but there is still hope.  At 95/5, the outcome is nearly certain -- but it's not crazy to nevertheless hope that things will change, since after all outcomes with 5% probability do occur 5% of the time.  What about 99/1?  At 99.99/0.01, technically you are still somewhere on the logistic success curve, but from a psychological perspective surely one must round this off to zero, and say that all hope is lost, the outcome is virtually certain.  Perhaps in the world of the story, Sept 30 1939 is the moment where the last significant shred of hope disappears, and the probability (either of the Holocaust, or of a smaller-in-scope WW2, or whatever) tragically jumps from something like 98% to 99.95%.  (See the quick log-odds arithmetic at the end of this comment.)
      • All of the above is perhaps true, and yet the absurdity of trying to pin things down to the individual day remains.  The individual days would surely look pretty insignificant even if we knew the probabilities, and of course we don't know the probabilities.  And for any prospective rather than retrospective scenario (like "will there be a nuclear war in the next 50 years"), we don't even really know where to look, what sub-questions to ask, et cetera.
    • This gets back to my idea that "Today, 2025, as we stand potentially on the cusp of transformative change driven by AI, and also seemingly in a world of steadily increasing international tensions and war, will probably seem similarly confused and naive and in-denial-about-what's-obviously-coming from the standpoint of 2076, as 1939 seemed from 1990."
      • Maybe what really makes Oct 1 1939 "too late" isn't that the objective probabilities (in some omniscient god's-eye-view) mean it's too late to avert some actual event, but rather that Sept 30, 1939 was the last possible date one could be in denial about it -- that one could still /hope/ for the war to be short and largely harmless, rather than vast and world-wrecking.  By Oct 1, the terrible truth would be all too clear.  Of course in the real world, many would continue to hope against all hope, but perhaps Oct 1 was in some sense when it was finally over in the minds of all informed observers -- there is going to be a second world war.
      • So why would you establish an Institute to study that exact moment in time?  Surely because you worry that the present day is in a similar situation -- a situation where we don't know what's coming, even though we SHOULD know what's coming.  In retrospect it will seem so obvious, and we will seem to have our heads in the sand, blind men grasping the elephant, et cetera.  "Lux in Tenebris Diei Unius" -- light out of the darkness of a single day.  Maybe if we study September 1939, Rosier is saying, we can learn how to avoid their failure, and see the threat that's coming to us from a direction we don't presently even know how to look in, and thereby achieve some kind of brighter future.
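
On the "logistic success curve" point above: a quick back-of-the-envelope, which is just my own arithmetic rather than anything from the story.  Writing probabilities in log-odds form, logit(p) = ln(p / (1 - p)):

  logit(0.50)   =  0 nats
  logit(0.98)   ≈ +3.9 nats
  logit(0.9995) ≈ +7.6 nats

In probability terms the jump from 98% to 99.95% looks like rounding error, but in log-odds it is a move of roughly 3.7 nats -- about the same size as going from 50/50 to 97.5%.  That is one way to cash out the intuition that the disappearance of the "last significant shred of hope" can be a large event on the logistic curve even though the headline probability barely budges.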

I think that jumping straight to big-picture ideological questions is a mistake.  But for what it's worth, I tried to tally up some pros and cons of "ideologically who is more likely to implement a post-scarcity socialist utopia" here; I think it's more of a mixed bag than many assume.

I think when it comes to the question of "who's more likely to use AGI to build fully-automated luxury communism", there are actually a lot of competing considerations on both sides, and it's not nearly as clear-cut as you make it out to be.

Xi Jinping, the leader of the CCP, seems like kind of a mixed bag:

  • On the one hand, I agree with you that Xi does seem to be a true believer in some elements of the core socialist dream of equality, common dignity for everyone, and improved lives for ordinary people.  Hence his "Common Prosperity" campaign to reduce inequality, anti-corruption drives, bragging (in an exaggerated but still-commendable way) about having eliminated extreme poverty, etc.  Having a fundamentally humanist outlook and not being an obvious psychopath / destructive idiot / etc is of course very important, and always reflects well on people who meet that description.
  • On the other hand, as others have mentioned, the intense repression of Hong Kong, Tibet, and most of all Xinjiang does not bode super well if we are asking "who seems like a benevolent guy to whom to entrust the future of human civilization".  In terms of scale and intensity, the extent of the anti-Uyghur police state in Xinjiang seems beyond anything that the USA has done to its own citizens.
    • More broadly, China generally seems to have less respect for individual freedoms, and instead positions itself as governing for the benefit of the majority.  (Much harsher covid lockdowns are an example of this, as are its reduced freedom of speech, fewer regulations protecting the environment or private property, etc.  Arguably the benefits have included things like a faster pace of development, fewer covid deaths, etc.)  This effect could cut both ways -- respect for individual freedoms is pretty important, but governing for the benefit of the majority is by definition gonna benefit most ordinary people if you do it well.
  • Your comment kind of assumes that China = socialist and socialism = more willingness to "redistribute resources in an equitable manner".  But Xi has taken pains to explain that he is very opposed to what he calls "welfarism" -- in his view, socialism doesn't involve China handing out subsidized healthcare, retirement benefits, etc to a "lazy" population, like we do here in the decadent West.  This attitude might change in the future if AGI generates tons of wealth (right now they are probably afraid that Chinese versions of Social Security and Medicare might blow a hole in the government budget, just as those programs are currently blowing a hole in the US budget)...
    • ...But it also might not!  Xi generally seems weirdly unconcerned with the day-to-day suffering of his people, not just in a "human rights abuses against minorities" sense, but also in the sense that he is always banning "decadent" forms of entertainment like videogames, boy bands, etc, telling young people to suck it up and "eat bitterness" because hardship builds character, etc.
    • China has been very reluctant to do western-style consumer stimulus to revive their economy during recessions -- instead of helping consumers afford more household goods and luxuries, Xi usually wants to stimulate the economy by investing in instruments of national military/industrial might, subsidising strategic areas like nuclear power, aerospace, quantum computing, etc.

Meanwhile on the American side, I'd probably agree with you that the morality of America's current national leaders leaves much to be desired, to put it lightly.  Personally, I would give Trump maybe only 1 or 1.5 points out of three on my earlier criteria of "fundamentally humanist outlook + not a psychopath + not a destructive idiot".

  • But America has much more rule of law and more checks-and-balances than China (even as Trump is trying to degrade those things), so the future of AGI would perhaps not rest solely in the hands of the one guy at the top.
  • And also, more importantly IMO, America is a democracy, which means a lot can change every four years and the population will have more of an ability to give feedback to the government during the early years of AGI takeoff.
    • In particular, beyond just swapping out the current political party for leaders from the other political party, I think that if ordinary people's economic position changed very dramatically due to the introduction of AGI, American politics would probably also shift very rapidly.  Under those conditions, it actually seems pretty plausible that America could switch ideologies to some kind of Georgist / socialist UBI-state that just pays lip service to the idea of capitalism -- kinda like how China after Mao switched to a much more capitalistic system that just pays lip service ("socialism with Chinese characteristics") to many of the badly failed policies of Maoism.  So I think the odds of "the US stays staunchly capitalist" are lower than the odds of "China stays staunchly whatever-it-is-currently", just because America will get a couple of opportunities to radically change direction between now and whenever the long-term future of civilization gets locked in, whereas China might not.
  • In contrast to our current national leaders, some of the leaders of top US AI labs strike me as having pretty awesome politics, honestly.  Sam Altman, despite his numerous other flaws, is a Georgist and a longtime supporter of UBI, and explicitly wants to use AI to achieve a kind of socialist utopia.  Dario Amodei's vision for the future of AI is similarly utopian and benevolent, going into great detail about how he hopes AI will help cure or prevent most illness (including mental illness), help people be the best versions of themselves, assist the economic development of poor countries, help solve international coordination problems to lead to greater peace on earth, etc.  Demis Hassabis hasn't said as much (as far as I'm aware), but his team has the best track record of using AI to create real-world altruistic benefits for scientific and medical progress, such as by creating AlphaFold 3.  Maybe this is all mere posturing from cynical billionaires.  But if so, the posturing is quite detailed and nuanced, indicating that they've thought seriously about these views for a long time.  By contrast, there is nothing like this coming out of DeepSeek (which is literally a Wall Street-style hedge fund combined with an AI lab!) or other Chinese AI labs.

Finally, I would note that you are basically raising concerns about humanity's "gradual disempowerment" through misaligned economic and political processes, AI concentration-of-power risks where a small cadre of capricious national leaders and insiders gets to decide the fate of humanity, etc.  Per my other comment in this thread, these types of AI safety concerns seem to be discussed almost exclusively in the West right now, and not in China.  (This particular gradual-disempowerment stuff seems even MORE lopsided in favor of the West than superintelligence / existential risk concerns in general, which are already more lopsided in favor of the West than the entire category of AI safety overall.)  So... maybe give some weight to the idea that if you are worried about a big problem, the problem might be more likely to get solved in the country where people are talking about the problem!

@Tomás B.  There is also vastly less of an "AI safety community" in China -- probably much less AI safety research in general, and much less of it, in percentage terms, is aimed at thinking ahead about superintelligent AI.  (ie, more of China's "AI safety research" is probably focused on things like reducing LLM hallucinations, making sure models don't make politically incorrect statements, etc.)

  • Where are the Chinese equivalents of the American and British government AI Safety Institutes (AISIs)?  Where are the equivalents of organizations like METR, Epoch, Forethought, MIRI, et cetera?
  • Who are some notable Chinese intellectuals / academics / scientists (along the lines of Yoshua Bengio or Geoffrey Hinton) who have made any public statements about the danger of potential AI x-risks?
  • Have any Chinese labs published "responsible scaling plans" or tiers of "AI Safety Levels" as detailed as those from OpenAI, DeepMind, or Anthropic?  Or discussed how they're planning to approach the challenge of aligning superintelligence?
  • Have workers at any Chinese AI lab resigned in protest of poor AI safety policies (like the various people who've left OpenAI over the years), or resisted the militarization of AI technology (like Googlers protesting Project Maven, or Microsoft employees protesting the IVAS HMD program)?

When people ask this question about the relative value of "US" vs "Chinese" AI, they often go straight for big-picture political questions about whether the leadership of China or the US is more morally righteous, less likely to abuse human rights, et cetera.  Personally, in these debates, I do tend to favor the USA, although certainly both the US and China have many deep and extremely troubling flaws -- both seem very far from the kind of responsible, competent, benevolent entity to whom I would like to entrust humanity's future.

But before we even get to that question of "What would national leaders do with an aligned superintelligence, if they had one," we must answer the question "Do this nation's AI labs seem likely to produce an aligned superintelligence?"  Again, the USA leaves a lot to be desired here.  But oftentimes China seems to not even be thinking about the problem.  This is a huge issue from both a technical perspective (if you don't have any kind of plan for how you're going to align superintelligence, perhaps you are less likely to align superintelligence), AND from a governance perspective (if policymakers just think of AI as a tool for boosting economic / military progress and haven't thought about the many unique implications of superintelligence, then they will probably make worse decisions during an extremely important period in history).

Now, indeed -- has Trump thought about superintelligence?  Obviously not -- just trying to understand intelligent humans must be difficult for him.  But the USA in general seems much more full of people who "take AI seriously" in one way or another -- Silicon Valley CEOs, Pentagon advisers, billionaire philanthropists, et cetera.  Even in today's embarrassing administration, there are very high-ranking people (like Elon Musk and J. D. Vance) who seem at least aware of the transformative potential of AI.  China's government is more opaque, so maybe they're thinking about this stuff too.  But all public evidence suggests to me that they're kinda just blindly racing forward, trying to match and surpass the West on capabilities, without giving much thought as to where this technology might ultimately go.

Thanks for this informative review!  (May I suggest that The Witness is a much better candidate for "this generation's Myst"!)

That's a good point -- the kind of idealized personal life coach / advisor Dario describes in his post "Machines of Loving Grace" is definitely in a sense a personality upgrade over Claude 3.7.  But I feel like when you think about it more closely, most of the improvements from Claude to ideal-AI-life-coach are coming from non-personality improvements, like:

  • having a TON of context about my personal life, interests, all my ongoing projects and relationships, etc
  • having more intelligence (including reasoning ability, but also fuzzier skills like social / psychological modeling) to bring to bear on brainstorming solutions to problems, or identifying the root cause of various issues, etc.  (does the idea of "superpersuasion" load more heavily on superintelligence, or on "superpersonality"?  seems like a bit of both; IMO you would at least need considerable intelligence even if it's somehow mostly tied to personality)
  • even the gains that I'd definitely count as personality improvements might not all come primarily from more-tasteful RLHF creating a single, ideal super-personality (like what Claude currently aims for).  Instead, an ideal AI advisor product would probably be able to identify the best way of working with a given patient/customer, and tailor its personality to work well with that particular individual.  RLHF as practiced today can do this to a limited extent (ie, Claude can do things like sense whether a formal vs informal style of reply would be more appropriate, given the context), but I feel like new methods beyond centralized RLHF might be needed to fully customize an AI's personality to each individual.

Semi-related: if I'm reading OpenAI's recent post "How we think about safety and alignment" correctly, they seem to announce that they're planning on implementing some kind of AI Control agenda.  Under the heading "iterative development" in the section "Our Core Principles" they say:

In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself.

Given the surrounding context in the original post, I think most people would read those sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks.  So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."

But as written, I think OpenAI intends the sentences above to ALSO cover AI control scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us.  If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."

I don't have a take on the pros/cons of a control agenda, but I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.

Maybe this is obvious / already common knowledge, but I noticed that OpenAI's post seems to be embracing an AI control agenda for their future superalignment plans.  The heading "iterative development" in the section "Our Core Principles" says the following (emphasis mine):

It’s an advantage for safety that AI models have been growing in usefulness steadily over the years, making it possible for the world to experience incrementally better capabilities. This allows the world to engage with the technology as it evolves, helping society better understand and adapt to these systems while providing valuable feedback on their risks and benefits. Such iterative deployment helps us understand threats from real world use and guides the research for the next generation of safety measures, systems, and practices.

In the future, we may see scenarios where the model risks become unacceptable even relative to benefits. We’ll work hard to figure out how to mitigate those risks so that the benefits of the model can be realized. Along the way, we’ll likely test them in secure, controlled settings. We may deploy into constrained environments, limit to trusted users, or release tools, systems, or technologies developed by the AI rather than the AI itself. These approaches will require ongoing innovation to balance the need for empirical understanding with the imperative to manage risk. For example, making increasingly capable models widely available by sharing their weights should include considering a reasonable range of ways a malicious party could feasibly modify the model, including by finetuning (see our 2024 statement on open model weights). We continue to develop the Preparedness Framework to help us navigate and react to increasing risks.

I think most people would read those bolded sentences as saying something like: "In the future, we might develop AI with a lot of misuse risk, ie AI that can generate compelling propaganda or create cyberattacks.  So we reserve the right to restrict how we deploy our models (ie giving the biology tool only to cancer researchers, not to everyone on the internet)."

But as written, I think OpenAI intends the sentences above to ALSO cover scenarios like: "In the future, we might develop misaligned AIs that are actively scheming against us.  If that happens, we reserve the right to continue to use those models internally, even though we know they're misaligned, while using AI control techniques ('deploy into constrained environments, limit to trusted users', etc) to try and get useful superalignment work out of them anyways."

I don't have a take on the broader implications of this statement (trying to get useful work out of scheming AIs seems risky but also plausibly doable, and other approaches have their own risks, so idk). But I haven't seen anyone else note this seeming policy statement of OpenAI's, so I figured I'd write it up.

I enjoyed this post, which feels to me like part of a cluster of recent posts pointing out that the current LLM architecture is showing some limitations, that future AI capabilities will likely be quite jagged (thus more complementary to human labor, rather than perfectly substituting for labor as a "drop-in remote worker"), and that a variety of skills around memory, long-term planning, agenticness, etc, seem like important bottlenecks.

(Some other posts in this category include this one about Claude's abysmal Pokemon skills, and the section called "What I suspect AI labs will struggle with in the near term" in this post from Epoch).

Much of this stuff seems right to me.  The jaggedness of AI capabilities, in particular, seems like something that we should've spotted much sooner (indeed, it feels like we could've gotten most of the way just based on first-principles reasoning), but which has been obscured by the use of helpful abstractions like "AGI" / "human level AI", or even more rigorous formulations like "when X% of tasks in the economy have been automated".

I also agree that it's hard to envision AI transforming the world without a more coherent sense of agency / ability to play Pokemon / etc, although I'm agnostic about whether we'll be able to imbue LLMs with agency via tinkering with scaffolds and training with RL (as discussed elsewhere in this comment thread).  At least mild versions of agency seem pretty doable with RL -- just train on a bunch of videogames and web-browsing tasks, and I expect AI to get pretty good at completing videogames and web tasks.  But whether that'll scale all the way to being able to manage large software projects and do people's entire jobs autonomously, I dunno.

If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.

...there's an explicit carve-out for ??? consequences if math is solved

This got me curious, so I talked to Claude about it.  Unfortunately it seems like some of the biggest real-world impacts of "solving math" might come in the form of very significant AI algorithmic improvements, which might obviate some of your other points!  (Also: the state of cybersecurity might be thrown into chaos, quant trading would get much more powerful albeit not infinitely powerful, and assorted scientific tools could see big improvements.)  Here is my full conversation; for the most interesting bit, scroll down to Claude's final response (ctrl-f for "Category 1: Direct Mathematical Optimization").

Improved personality is indeed a real, important improvement in the models, but (compared to traditional pre-training scaling) it feels like more of a one-off "unhobbling" than something we should expect to continue driving improved performance in the future.  Going from pure next-token-predictors to chatbots with RLHF was a huge boost in usefulness.  Then, going from OpenAI's chatbot personality to Claude's chatbot personality was a noticeable (but much smaller) boost.  But where do we go from here?  I can't really imagine a way for Anthropic to improve Claude's personality by 10x or 100x (whatever that would even mean).  Versus I can imagine scaling RL to improve a reasoning model's math skills by 100x.
