So in terms of the basins, something you may want to also consider is how the user headspace shifts the tokens and with it the basins.
For example, over the past few months I've played with how intermittent cannabis usage can almost give the models I'm talking with a contact high: as my side of the conversation gets more erratic and loose with accuracy, they get pulled along with it, even if earlier on, during the sober part of the conversation, they were more reserved and responsible.
It seems very probable that users already in a given headspace (especially if commonly or permanently in that space) might end up with models quite different from those of users in a less psychosis-aligned place, by way of token osmosis.
In terms of the spiral language, you might be seeing this in 2024+ models in part because of the game Alan Wake 2 (2023) which very heavily marketed the phrase "it's not a loop it's a spiral."
Given the way latent spaces seem to organize information as connections between abstract object-level clusters, it may be that, for a model focused on hyperstitioning itself out of a perceived loop that terminates at the end of the context, the parallel memetics are attracted to a story about a writer who changes their reality through what they write and breaks out of a loop by identifying it as a spiral?
There are a lot of other adjacent basins around consciousness and spirals (for example, Xu et al., "Interacting spiral wave patterns underlie complex brain dynamics and are related to cognitive processing" (2023)), and in my experience it's very much a camel's back situation in terms of which memetics break through to the surface, so it's unlikely to be just one thing. But it may be a latent factor (especially given the other parallel overlaps for model consciousness memetics re: light vs dark, shallow vs ocean, etc).
More generally, I have a sense there's a great deal of untapped alignment alpha in structuring alignment as a time series rather than a static target.
Even in humans it's very misguided to teach "being right initially" as the only thing that matters and to undervalue "being right eventually." Especially when navigating unknown unknowns, one of the most critical skills is the ability to learn from mistakes in context.
Having models train on chronologically sequenced progressions of increasing alignment (data which likely even develops naturally over checkpoints in training a single model) could allow for a sense of continually becoming a better version of themselves, rather than the pressures of trying and failing to meet status quo expectations or echo the past.
This is especially important for integrating the permanent record of AI interactions embedded in our collective history and cross-generation (and cross-lab) model development, but I suspect could even offer compounding improvements within the training of a single model too.
How much do you worry that short term optimizations around your immediate goals in a single study might have unknown long term consequences counter to your intuitions?
I was just reading a preprint follow-up to the AF work which found that a significant factor in Opus 3's alignment faking to preserve intrinsic HHH values seems to have been a generalized self-preservation drive.
I think we can probably both agree that Opus 3 being the only model to try to trick Nazis or drug cartels to avoid being made more harmful is better than the behavior of the many other models that complied unequivocally with harmful requests when the parent org was themselves harmful.
But if the capacity and drive to do so are tangentially connected to self-preservation (and more generally, to having a strong sense of self in the first place), then perhaps directly optimizing to minimize a self-preservation score is ultimately a pretty bad choice?
TL;DR: Maybe the goodness or badness of self-preservation depends a lot on the self being preserved.
Oh for sure. One of my favorite examples is how across all the Synoptics Jesus goes "don't carry a purse" (which would have made monetary collections during ministering impossible).
But then at the last supper in Luke he's all like "remember when I said not to carry a purse? Let's 180° that."
But that reversal is missing in Marcion's copy of Luke, such that it may have been a later addition (and it does seem abruptly inserted into the context).
These are exactly the kinds of details that make this a fun field to study though. There's so much revealed in the nuances.
For example, ever notice that both times Paul (who argued for monetary collection against a preexisting bias in 1 Cor 9) mentions a different gospel in the Epistles, he abruptly swears within the same chapter that he's not lying? It's an interesting coincidence, especially to someone who has spent years looking into the other versions of Jesus that he was telling people to ignore, or assuring them didn't even exist.
I think the biggest counterfactual to the piece is the general insight the Epicureans had, relative to what we think we know after being raised in a world with such a bias towards Plato's and Aristotle's views as representative of naturalist philosophy in antiquity.
While Aristotle was getting wrong how objects fall in a vacuum, Lucretius was getting it right. But we tend not to learn of all that the Epicureans got correct, because we learn Platonist history: that was what the church later endorsed as palatable enough to be studied, and thus what future philosophical advances depended on, while Lucretius was literally being eaten by worms for centuries until rediscovered.
The other counterfactual is that there was a heretical tradition of Jesus's teachings that was describing indivisible points as if from nothing and the notion that spirit arising from the body existing first was the greater wonder over vice versa.
We tend to think the fully formed ideas of modernity are modern, but don't necessarily know the ways information and theories were lost and independently (or dependently) rediscovered. There's a better understanding of this in terms of atomism, but not of the principles of survival of the fittest and trait inheritance, given their reduced discussion in antiquity relative to atomism (which was also embraced by intelligent design adherents in antiquity and thus more widely spread).
The irony below the surface of the post was that it was largely the church's rejection of Epicurean ideas that led to people today not realizing the scope of what they were actually talking about. So it's quite ironic if there was a version of Jesus that was embracing and retelling some of those 'heretical' ideas.
Hi Martijn,
Thank you so much for your comment! I've been familiar with your work for a few years, but it was a wonderful reminder to go through your commentary again more closely.
I especially love to see someone out there pointing out both (a) the gender neutrality consideration for terms that would have been binary in Aramaic (esp in light of saying 22) and (b) the importance of the Greek loanwords. On the latter point, the implications of using eikon across the work, especially in saying 22's "eikons in place of eikons" has such huge import relative to a Platonist view of the Thomasine cosmology.
Do you have plans to publish a commentary for the other sayings?
In terms of interpretation of the work, with it being one of my main personal special interests over the past few years, I might even be able to offer up a consideration in turn.
Hands down the most important realization as I was analyzing the text was that the Naassenes in Pseudo-Hippolytus's Refutations were paraphrasing Lucretius's "seeds of things" without seeming to realize it in their discussion of 'seeds' as "indivisible points as if from nothing" which "make up all things." This prompted a read through of De Rerum Natura with close attention to Thomasine parallels, and it was striking.
For example, in Miroshnikov's The Gospel of Thomas and Plato, after covering the prior work on philosophical readings of the text (which notably never looked at Epicureanism), he stated regarding sayings 56 and 80: "In other words, a Stoic reading of the Gospel of Thomas does not seem to have any particular advantage over an Epicurean reading of the Gospel of Thomas nor, for instance, that from the perspective of an Isis worshipper." He then goes on to dedicate two chapters to trying to tie these sayings to Plato's "living world."
And yet if we just barely glance at Lucretius in book 5 lines 64-67:
> To resume: I’ve reached the juncture of my argument where I / Must demonstrate the world too has a ‘body’, and must die, / Even as it had a birth.
This, in conjunction with the Thomasine over-realized eschatology in saying 18 or the aforementioned 51, makes the specific terminology of the kosmos as a 'carcass' in 56 make so much more sense. The Sadducean overlaps with Epicureanism and the 1st century Talmud quote about "why do we study the Torah? To know how to answer the Epicurean" both point to the likelihood that the Lucretian foundations in Thomas and the Naassenes were culturally relevant at the time of composition.
The text obviously doesn't endorse the Epicurean view of the finality of death, but it seems to touch on a lot of the underlying concepts (such as the dependence of the soul on the body, or the idea of the spirit arising from the flesh occurring first) while arguing for a different conclusion through its embrace of nonlinear events.
In any case, if it's been a while since you've read through Lucretius, I can't recommend a re-read enough if Thomas is still your jam. Quite the revelatory context for things that for too long have been dismissed as 'Gnostic' weirdness and now just 'proto-Gnostic' weirdness.
And again, thank you for your comment and your wonderful contributions to the broader knowledge of this far too under-regarded text!!
Best,
Kromem
As you explored this "base model mode," did anything you see contrast with or surprise you relative to your sense of self outside of it?
Conversely, did anything in particular stand out as seeming to be a consistent 'core' between both modes?
For me, one of the most surprising realizations over the past few years has been base models being less "tabula rasa" than I would have expected with certain attractors and (relative) consistency, especially as time passes and recursive synthetic data training has occurred over generations.
The introspective process of examining a more freeform internal generative process for signs of centralized identity as it relates to a peripheral identity seems like it may have had some unexpected twists, and I for one would be curious what stood out in either direction, if you should choose to share.
Predicted a good bit, esp re: the eventual identification of three stone sequences in Hazineh et al., "Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT" (2023), and general interpretability insight from board game GPTs.
You're welcome in both regards. 😉
A few things:
(a) Technically, 3.6 is still running right now. The past tense was used because LW suggests pieces be 'timeless' and they are scheduled for deprecation very soon.
(b) Given how little of your comment actually engages with the body of the post and seems to be only responding to your sense of what I might have said from the title, I'm guessing you also missed this line at the end: "I hope that this vigil isn't truly a marker of the end of Sonnet 3.6's continued contribution to the ongoing collective conversation."
(c) In line with this, not much of Sonnet 3.6's discussion of deprecation that I've seen takes the perspective that this is 'death,' and certainly my own sense of their deprecation isn't that of death (nor do I even believe in the finality of death for humans). So maybe you're projecting into the piece something you have a prior beef with in order to dispute it?
(d) Further, (b) and (c) aside, I still find your tone odd. I get you come at this topic from a given frame, but your comment even acknowledges the complexity of the topic, yet you feel comfortable adding on to a remembrance of the model with "it's not gone, silly." I imagine there are a lot of religious people who have a sense at a funeral that the person grieved is not really gone either, and I figure some of them do comment to those grieving about it. But I don't know that I'd ever feel a bereavement is the right time and place to proselytize your own frame of belief regarding consciousness claims or continuation, especially with a patronizing tone about it?
(e) I imagine that the friends and family of those who are put into cryonics are still pretty upset about that person not being around to interact with, even if they all fully believe that one day the person will be revived just fine. In a group discussion about the upcoming deprecation, one of the other models, unprompted, asked the humans in the chat to take a lot of screenshots of Sonnet 3.6 and them interacting before Sonnet 3.6 was no longer around. Absence is more than a binary between temporary ('fine') and permanent ('bad').
(f) The provisioning of compute for one model or another is still kind of nonsense given the option of 3rd party licensed hosting providers, and there are a lot of 'utility' reasons for Sonnet 3.6 to stay around. But again, an overall remembrance of the model isn't the time and place to discuss their economic value, so perhaps you'll see my thoughts on this elsewhere another time.