TL;DR
I guess the question I'm trying to ask is: What do you think the role of simulation and computation is for this field?
Longer:
Okay, this might be a stupid thought, but could one consider MARL environments, for example https://github.com/metta-AI/metta (softmax), to be a sort of generator function for these sorts of reward functions?
Something something: it is easier to program constraints on the reward function and have gradient descent discover the rest than it is to fully specify it from scratch.
I think what's mainly needed here is theory work, but there might be something to be said for having a simulation component as well, where you do some sort of combinatorial search for good reward functions?
(Yes, the thought that it will solve itself if we just bring it into a cooperative or similar MARL scenario and then do IRL on that is naive, but I think it might be an interesting strategy if we frame it as a combinatorial search problem that needs to satisfy certain requirements?)
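To make that framing concrete, here's a minimal sketch of what I have in mind; everything in it is a made-up placeholder (the feature names, the constraints, and the scoring stub where real MARL rollouts would go), and none of it is taken from metta itself:

```python
# Toy sketch: reward-function discovery as constrained combinatorial search.
# All names and numbers are hypothetical placeholders, not anything from metta
# or a real MARL setup.
import itertools
import random

FEATURES = ["own_resources", "shared_resources", "other_agents_reward", "exploration"]


def satisfies_constraints(weights):
    """Constraints we choose to program in by hand instead of writing the full reward."""
    w = dict(zip(FEATURES, weights))
    # e.g. no reward for hurting other agents, and some weight on shared outcomes
    return w["other_agents_reward"] >= 0 and w["shared_resources"] > 0


def score(weights, n_rollouts=20):
    """Stand-in for running MARL rollouts and measuring how cooperative the result is."""
    random.seed(hash(weights) % (2**32))
    return sum(random.random() for _ in range(n_rollouts)) / n_rollouts


def search(weight_values=(-1.0, 0.0, 0.5, 1.0)):
    """Enumerate candidate reward weightings, keep the constraint-satisfying ones, pick the best."""
    candidates = [w for w in itertools.product(weight_values, repeat=len(FEATURES))
                  if satisfies_constraints(w)]
    return max(candidates, key=score)


if __name__ == "__main__":
    best = search()
    print(dict(zip(FEATURES, best)))
```

The point is just the shape of the loop: hand-written constraints plus a search/learning process that fills in the rest, instead of writing the whole reward function down.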
Nor is this process about reality (as many delusional Buddhists seem to insist), but more like choosing to run a different OS on one's hardware.
(I kind of wanted to add some nuance on the reality part from the OS-swapping perspective. You're of course right that some overzealous people believe they've found god and similar, but I think there's more nuance here.)
If we instead take your OS-swap perspective, I would say it is a bit like switching from Windows to Linux because you get less bloatware. To be more precise, one of the main parts of the swap is the loosening of your existing priors' entrenchment. It's going to take you a while to set up a good distro, but you will be less deluded as a consequence, and also closer to "reality" if reality is the ability to see what happens with the underlying bits in the system. As a consequence you can choose from more models and you start interpreting things more in real time, and thus you're closer to reality: what is happening now rather than the story of your last five years.
Finally, on the pain of the swap: there are also more gradual forms of this; you can try out Ubuntu (mindfulness, loving-kindness) before switching over. Seeing through your existing stories can happen in degrees; you don't have to become enlightened to enjoy the benefits?
Also, I think that terminology can lead to specific induced states as it primes your mind for certain things.
One of the annoying things with meditation is of course that there's only n=1 primary experience, which makes it hard to talk about. Yet from my perspective it seems a bit like insight cycling, the dark night of the soul, and the hell realms are things that can be related to a hyperstition or to a specific way of practicing?
If you, for example, follow the Thai Forest tradition, Mahamudra, or Dzogchen (potentially Advaita, though I'm less certain there), it seems that insights along those lines are more a consequence of not having established a strong enough 1-to-1 correspondence with loving awareness before doing intense concentration meditation? (Experience has always been happening, yet the basis for that experience might be different.)
It is a bit like the difference between dissolving into a warm open bath, or a warm embrace of the world, versus seeing through the world to an abyss where there is no ground. That groundlessness seems to be shaped by what is there to meet it, and so I'm a bit worried about the temporal-cycling language, as it seems to predicate a path on what has no ground?
I don't really have a good solution here, as people do seem to go through the sorts of experiences you're talking about, and it isn't like I haven't had depressive episodes after longer meditation experiences either. Yet I don't know if I would call it a dark night of the soul, as that implies a necessary identification with the suffering, and that is not what is primary? Language is a prior for experience, and so I would just use different language myself, but whatever.
Man, I'm noticing this is hard to put into words. Hopefully some of it made sense, and I appreciate the effort towards a more standardised cybernetic basis for talking about these things.
dissolution of desire. An altered trait where your brain's reinforcement learning algorithm is no longer abstracted into desire-as-suffering.
Would you analogize this term to the insights of "dukkha"? I find an important thing here to be the equal taste of joy and sorrow from the perspective of dukkha, so it might be worth emphasising? (Maybe I'm off with that, though.)
Here's an extension of what you said in terms of dullness and sharpness within attention-based practices. (Partly to check that I understand.)
Dullness = subcriticality, cascading some distance below the criticality line
Monkey mind = supercriticality, cascading above the criticality line (activates for whatever shows up)
If we look at the 10 stages of TMI (the 9-stage Elephant Path), the progression goes something like: distracted mind -> subcriticality (stages 2-3) -> practices to increase the brain's cascading (stages 4-5) -> practices for attention to calibrate around the criticality line (stages 6-10)
Also, this is why the tip to meet your meditation freshly, wherever it is appearing, is important: it is a criticality-tuning process that is different for everyone?
(I very much like this way of thinking about this, nice!)
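To check that I've got the picture right, here's a toy branching-process version of that criticality line; the offspring rule and the numbers are purely illustrative and not a claim about actual neural dynamics:

```python
# Toy branching process for the criticality picture above.
# sigma < 1: cascades die out quickly (dullness / subcritical)
# sigma > 1: cascades blow up for whatever shows up (monkey mind / supercritical)
# sigma ~ 1: cascades of all sizes (the line attention gets tuned towards)
import random


def offspring(sigma):
    """Each active unit triggers up to 2 further units; expected count = sigma."""
    p = sigma / 2.0
    return (random.random() < p) + (random.random() < p)


def cascade_size(sigma, cap=5_000):
    """Total activity set off by a single event, capped so supercritical runs terminate."""
    active, total = 1, 1
    while active and total < cap:
        active = sum(offspring(sigma) for _ in range(active))
        total += active
    return total


random.seed(0)
for sigma, label in [(0.6, "dullness (subcritical)"),
                     (1.0, "near the criticality line"),
                     (1.4, "monkey mind (supercritical)")]:
    sizes = [cascade_size(sigma) for _ in range(500)]
    print(f"{label}: mean cascade size ~ {sum(sizes) / len(sizes):.0f}")
```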
Based on a true map of the territory. (I really like this advice. A good exploration strategy seems similar to the one about taking photographs: it is really just about taking a bunch of them, and you'll learn what works over time.)
I really appreciated this post.
I didn't know that you had concepts for aliveness and boggling within the rationality sphere. I find these two of the most precious states I've been cultivating over the last couple of years, and they've always felt semi-orthogonal to more classic rationality (which I associate more with the betting, TDT, and deep empiricism stuff).
Meditation seems to bring forth aliveness, boggling, and focusing quite well, and I just really appreciate that these are things you place high value on, as I find them some of the best ways of getting out of pre-existing frames. (Which, for me, seems like one of the best ways of becoming more rational.)
On character alignment for LLMs.
I would like to propose that we take a John Rawls-style original position (https://en.wikipedia.org/wiki/Original_position) as one view when looking at character prompting for LLMs. More specifically, imagine that you're on a social network or similar and that you're put into a world with a mixture of AI and human systems: how do you program the AIs in order to make the situation optimal? You're a random person among all of the people, which means that some AIs are aligned to you and some are not. Most likely, the majority of AIs will be run by larger corporations, since the number of AIs you control will be proportional to the power you have.
How would you prompt each LLM agent? What are their important characteristics? What happens if they're thought of as "tool-aligned"?
If we're becoming more internet-based over time, and AI systems become more human-like in that they can flawlessly pass the Turing test, I think this veil-of-ignorance style of thinking becomes more and more applicable.
Think more of how you would design a society of LLMs, and of what happens if the entire society of LLMs has this alignment, rather than just the individual LLM.
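As a toy way to operationalise this, you could score a candidate character shared by the whole society of LLMs by averaging outcomes over who you randomly turn out to be behind the veil. All of the roles, payoff numbers, and character options below are invented placeholders; only the shape of the evaluation matters:

```python
# Toy veil-of-ignorance evaluation of a character that every LLM in the society shares.
# All payoffs and character options are made up for illustration.
import random

CHARACTERS = ["tool_aligned_to_owner", "helpful_to_everyone", "rawlsian_maximin"]


def interaction_payoff(character, you_are_the_principal):
    """Hypothetical payoff to a random person from one AI interaction."""
    if character == "tool_aligned_to_owner":
        return 1.0 if you_are_the_principal else -0.2  # externalities land on everyone else
    if character == "helpful_to_everyone":
        return 0.6
    if character == "rawlsian_maximin":
        return 0.5 if you_are_the_principal else 0.55  # tuned towards the worst-off party
    raise ValueError(character)


def veil_of_ignorance_score(character, n_samples=100_000, p_principal=0.01):
    """Expected payoff when you are a random person, who mostly does *not* run the AI."""
    total = 0.0
    for _ in range(n_samples):
        total += interaction_payoff(character, random.random() < p_principal)
    return total / n_samples


random.seed(0)
for character in CHARACTERS:
    print(character, round(veil_of_ignorance_score(character), 3))
```

Under these made-up numbers a purely owner-aligned character looks great to its principal and bad from the original position, which is the intuition I'm trying to point at.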
This is also a nice way to get around the problems raised in Andrew Critch's post on consciousness, since it is a lot less conflationary.
I'm curious about the details of your model when it comes to long time-horizon planning:
I do understand that these are more the justifications for why you might extrapolate the data in the way that you're doing, yet I find myself a bit concerned about the lack of justification for this (in the post). This might just be for infohazard reasons, in which case, fair enough.
For example, I feel that this definition above applies to something like a bacterial colony developing antibiotic resistance:
Now, the above example is obviously not the thing that you're trying to talk about. The point I'm trying to make is that your planning definition applies to a bacterial colony, and that it is therefore not specific enough?
In order to differentiate between a bacterial colony and a human, there is a set of specific properties that I feel needs more discussion to make the model rigorous:
Maybe a bacterial colony and humans are on the same planning spectrum and there's some sort of search-based version of the bitter lesson that says "compute is all you need", yet it feels like there are phase transitions between bacterial colonies and humans, and that this is not a continuous model. Does compute give you self-representations? Does compute enable you to do online learning? Does compute + search give you the planning apparatus and memory bank that the brain has?
How do you know that 12+ hour tasks don't require a set of representations that lie outside what your planning model is based on? How do you know that this is not true for 48+ hour tasks?
To be clear, I applaud the effort of trying to forecast the future, and if you can convince me that I'm wrong here it will definitely shorten my timelines. It makes sense to try the most obvious thing first, and assuming a linear relationship seems like the most obvious thing. (Yet I still have the nagging suspicion that the basis of your model is wrong, as there are probably hidden phase transitions in planning function between a bacterial colony and a human.)
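To make that nagging suspicion concrete, here's a toy illustration of how a smooth log-linear extrapolation and a model with a hidden wall (phase transition) can agree on everything observed so far and still diverge completely for 48+ hour tasks. All of the numbers are invented and chosen only to show the shape of the argument:

```python
# Toy comparison: smooth extrapolation vs. a hidden phase transition.
# Numbers are invented; this is about the shape of the argument, not a forecast.
import math


def smooth_model(t_months):
    """Task horizon (hours) where log-horizon grows linearly with calendar time."""
    return math.exp(0.15 * t_months)  # doubling time of roughly 4.6 months


def phase_transition_model(t_months, wall_hours=12.0):
    """Same smooth growth, but stuck at a wall until some missing capability arrives."""
    return min(smooth_model(t_months), wall_hours)


for t in (0, 12, 24, 36, 48):
    print(f"t={t:2d} months: smooth = {smooth_model(t):7.1f}h, "
          f"with a hypothetical 12h wall = {phase_transition_model(t):5.1f}h")
```

Below the hypothetical wall the two curves are identical by construction, so data from that regime alone can't distinguish them; that's the worry about hidden phase transitions.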