It is good that the tradition of Agent Foundations has had a fascination with crystals before as well. I in fact think that we need to embrace the crystal side of rationality more, so thank you for bringing this gem (pun intended) forth.
Interesting points!
I will admit to not being the most well-versed in crystals, so that might be true. I'm wondering if there's some way to think of one ontology of the world as one crystal structure and another ontology as a different structure?
For example, we know that there are incompatible world views out there, and an easy example is politics. The ground state of the news is the same (the entropic solution), but the view of that base state that you pick up is going to differ depending on how you process it?
Processing information is therefore kind of like taking a soup of information and crystallizing it?
I like to think about utility functions and these crystalline surfaces as interpretive frames. I think it is in chapter 6 of Gödel, Escher, Bach that he talks about a jukebox and how it only knows how to play a specific disc because it is preprogrammed with a tape reader. I think these crystals are the same: some combination of value function and ontology creates a way of thinking. It is a way of interpreting incoming information, and it can only be shown through the specific type of crystalline structure that appears?
I think there's something interesting here from an information theory perspective but I might just be overinterpreting and analogising a connection that isn't there.
Fair points across the board. In retrospect I think I shouldn't have released this post in the state it was in, but I wanted to have something more to bite into, as I felt there wasn't enough in the original.
The audience I had in mind was less people working on RL-based utility-learning setups and more people with a devinterp perspective, but I don't know that area well enough to write something good about it.
So lesson learnt, and thank you for the feedback.
If you want to look more into the symmetry-learning direction, I like geometric deep learning (GDL) as a way of thinking about it:
(More canonical resource:)
http://geometricdeeplearning.com/
(My favourite explainer:)
https://arxiv.org/abs/2508.02723
I'm curious about the details of your model when it comes to long time-horizon planning:
There is often a recursive structure to doing tasks that take months to years, of which a simplified version might look like:
- Decompose the task into subtasks
- Attempt to solve a subtask
- If you succeed, go on to the next subtask
- If you fail, either try again, i.e. (2), or revisit your overall plan, i.e. (1)
The skills needed to execute this loop well include:
- Making and revising plans
- Intuitions for promising directions
- Task decomposition
- Noticing and correcting mistakes
Let’s call this bucket of skills long-horizon agency skills. It seems like for long enough tasks, these are the primary skills determining success, and importantly they are applied recursively many times. As a result, improving at long-horizon tasks is mostly loaded on improving long-horizon agency, while improving at short-horizon tasks is mostly loaded on improving specialized knowledge.
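To check that I've understood the loop, here is a minimal sketch of how I read it (the function names and the retry/replan budgets are my own assumptions, not anything from your post):

```python
# Minimal sketch of the recursive planning loop described above.
# `decompose` and `attempt` are hypothetical callables standing in
# for "make a plan" and "try to solve a subtask".

def solve(task, decompose, attempt, max_replans=3, max_retries=2):
    for _ in range(max_replans):
        subtasks = decompose(task)        # (1) decompose into subtasks
        succeeded = True
        for sub in subtasks:
            for _ in range(max_retries):  # (2) attempt a subtask
                if attempt(sub):
                    break                 # success: move on to the next subtask
            else:
                succeeded = False         # retries exhausted:
                break                     # revisit the overall plan, i.e. (1)
        if succeeded:
            return True
    return False                          # plan revisions exhausted
```

The reason I wrote it out is that `decompose`, `attempt`, and the failure branches each correspond to one of the skills in your bucket, and the outer loop is where they get applied recursively.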
I do understand that these are more the justifications for why you might extrapolate the data in the way that you're doing, yet I find myself a bit concerned with the lack of justification for this in the post itself. This might just be for infohazard reasons, in which case, fair enough.
For example, I feel that the definition above applies to something like a bacterial colony developing antibiotic resistance:
Now, the above example is obviously not the thing that you're trying to talk about. The point I'm trying to make is that your planning definition applies to a bacterial colony, and that it is therefore not specific enough?
In order to differentiate between a bacterial colony and a human, there is a set of specific properties that I feel needs more discussion to make the model rigorous:
Maybe bacterial colonies and humans are on the same planning spectrum, and there's some search-based version of the bitter lesson that says "compute is all you need". Yet it feels like there are phase transitions between bacterial colonies and humans, and that this is not a continuous model. Does compute give you self-representations? Does compute enable you to do online learning? Does compute plus search give you the planning apparatus and memory bank that the brain has?
How do you know that 12+ hour tasks don't require a set of representations that fall outside what your planning model is based on? How do you know that this is not true for 48+ hour tasks?
To be clear, I applaud the effort of trying to forecast the future, and if you can convince me that I'm wrong here it will definitely shorten my timelines. It makes sense to try the most obvious thing first, and assuming a linear relationship seems like the most obvious thing. (Yet I still have the nagging suspicion that the basis of your model is wrong, as there are probably hidden phase transitions between a bacterial colony's planning function and a human's.)
TL;DR
I guess the question I'm trying to ask is: What do you think the role of simulation and computation is for this field?
Longer:
Okay, this might be a stupid thought, but could one consider MARL environments, for example https://github.com/metta-AI/metta (softmax), to be a sort of generator function for these kinds of reward functions?
Something something, it is easier to program constraints on the reward function and have gradient descent discover it than it is to generate it fully from scratch.
I think there's mainly a lot of theory work needed here, but there might be something to be said for having a simulation component as well, where you do some sort of combinatorial search for good reward functions?
(Yes, the thought that it will solve itself if we just bring it into a cooperative or similar MARL scenario and then do IRL on that is naive, but I think it might be an interesting strategy if we think about it as a combinatorial search problem that needs to satisfy certain requirements?)
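To make the combinatorial-search framing concrete, here is a toy sketch; the features, the constraint, and the scoring stub are all invented for illustration, and a real version would score candidates by actually running rollouts in an environment like metta:

```python
import itertools
import random

# Toy sketch: treat reward-function design as combinatorial search over
# weighted feature combinations, keeping only candidates that satisfy
# hand-written constraints. All names here are hypothetical placeholders.

FEATURES = ["shared_resources", "other_agent_reward", "own_energy"]
WEIGHTS = [-1.0, 0.0, 0.5, 1.0]

def make_reward(weights):
    # A candidate reward function: a weighted sum over observed features.
    return lambda obs: sum(w * obs[f] for f, w in zip(FEATURES, weights))

def satisfies_constraints(weights):
    # Example hard requirement: some positive weight on the other
    # agent's reward, i.e. the candidate must be at least weakly cooperative.
    return weights[FEATURES.index("other_agent_reward")] > 0

def score_in_env(reward_fn, n_rollouts=10):
    # Stand-in for running MARL rollouts and measuring e.g. total welfare;
    # here we just evaluate the candidate on random observations.
    total = 0.0
    for _ in range(n_rollouts):
        obs = {f: random.random() for f in FEATURES}
        total += reward_fn(obs)
    return total

candidates = [w for w in itertools.product(WEIGHTS, repeat=len(FEATURES))
              if satisfies_constraints(w)]
best = max(candidates, key=lambda w: score_in_env(make_reward(w)))
print("best weights:", dict(zip(FEATURES, best)))
```

The naive enumeration is of course exponential in the number of features; the gradient-descent point above would correspond to replacing the outer enumeration with learned weights while keeping the hard constraints.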
Nor is this process about reality (as many delusional Buddhists seem to insist), but more like choosing to run a different OS on one's hardware.
(I kind of wanted to give some nuance on the reality part from the OS-swapping perspective. You're of course right that some overzealous people believe they've found god and similar, but I think there's more nuance here.)
If we instead take your perspective of the OS swap, I would say it is a bit like switching from Windows to Linux because you get less bloatware. To be more precise, one of the main parts of the swap is the loosening of the entrenchment of your existing priors. It's going to take you a while to set up a good distro, but you will be less deluded as a consequence, and also closer to "reality", if reality is the ability to see what happens with the underlying bits in the system. As a consequence you can choose from more models, and you start interpreting things more in real time; thus you're closer to reality, to what is happening now rather than the story of your last five years.
Finally, on the pain of the swap: there are also more gradual forms of this; you can try out Ubuntu (mindfulness, loving-kindness) before switching over. Seeing through your existing stories can happen in degrees, and you don't have to become enlightened to enjoy the benefits?
Also, I think that terminology can lead to specific induced states as it primes your mind for certain things.
One of the annoying things with meditation is of course that primary experience is n=1, which makes it hard to talk about. Yet from my perspective it seems like insight cycling, the dark night of the soul, and the hell realms can be related to a hyperstition or a specific way of practicing?
If you, for example, follow the Thai Forest tradition, Mahamudra, or Dzogchen (potentially Advaita, though I'm less certain), it seems that insights along those lines are more a consequence of not having established a strong enough one-to-one correspondence with loving awareness before doing intense concentration meditation? (Experience has always been happening, yet the basis for that experience might be different.)
It is a bit like the difference between dissolving into a warm open bath, or a warm embrace or hug of the world, versus seeing through the world to an abyss where there is no ground. That groundlessness seems to be shaped by what is there to meet it, and so I'm a bit worried about the temporal cycling language, as it seems to predicate a path on what has no ground?
I don't really have a good solution here, as people do seem to go through the sorts of experiences you're talking about, and it isn't like I haven't had depressive episodes after longer meditation experiences either. Yet I don't know if I would call it a dark night of the soul, for it implies a necessity of identifying with the suffering, and that is not what is primary? Language is a prior for experience, and so I would just use different language myself, but whatever.
Man, I'm noticing this is hard to put into words; hopefully some of it made sense, and I appreciate the effort toward a more standardised cybernetic basis for talking about these things.
dissolution of desire. An altered trait where your brain's reinforcement learning algorithm is no longer abstracted into desire-as-suffering.
Would you analogize this term to the insights into "dukkha"? I find an important thing here to be the equal taste of joy and sorrow from the perspective of dukkha, so it might be worth emphasising? (Maybe I'm off with that, though.)
I'm no pro at the creative stuff, but I've found that when I allow myself to have a fun zone where I just produce stuff whenever things appear, I seem to have generally better thoughts. (Similar to the more procedural virtue-ethics models that you have.)
I spend at least 5-10 hours a week in this space, and I think it is correlated with thinking that anything goes. I keep a separate evaluator system that I apply in other circumstances; in this space there is no impact evaluation, just cool thoughts. I also have a new saying that I like to follow: "if it's fun, it's fine."
It also coincides nicely with a specific type of meditative skill, a version of the problem-solving walk described in the appendix of The Mind Illuminated, about holding different kinds of mental spaces for different purposes (see Appendix B there). Also, if you believe in a constructivist theory of emotion, then being more attuned to your emotions also means being more in tune with your research taste, as you notice smaller pointers to useful bits of information!
A bit random, but hopefully a bit relevant.