I've been thinking through the following philosophical argument for the past several months.
1. Most things that currently exist have properties that allow them to continue existing for a significant amount of time and to propagate, since anything without such properties would have ceased to exist very quickly.
2. This implies that most things capable of gaining adaptations, such as humans, animals, species, ideas, and communities, have adaptations for continuing to exist.
3. This also includes decision-making systems and moral philosophies.
4. Therefore, one could model the morality of such things as tending towards the ideal of perfectly maintaining their own existence and propagating as much as possible.
Many of the consequences of this approximation of the morality of things seem quite interesting. For instance, the higher-order considerations of following an "ideal" moral system (that is, utilitarianism using a measure of one's own continued existence at a point in the future) lead to many of the same moral principles that humans actually have (e.g. cooperation, valuing truth) while also avoiding a lot of the traps of other systems (e.g. hedonism). This chain of thought has led me to believe that existence itself could be a principal component of real-life morality.
While it does have a lot of very interesting conclusions, I'm very concerned that if I were to write about it, I would receive 5 comments directing me to some passage by a respected figure that already discusses the argument, especially given its seemingly obvious structure. However, I've searched through LW and tried to research the literature as well as I can (through Google Scholar, Elicit, and Gemini, for instance), but I must not have the right keywords, since I've come up fairly empty, other than philosophers with vaguely similar-sounding arguments that don't actually get at the heart of the matter (e.g. Peter Singer's work comes up a few times, but he focused particularly on suffering rather than existence itself, and certainly didn't use any evolutionary-style arguments to reach that conclusion).
If this really hasn't been written about extensively anywhere, I would update towards believing the hypothesis that there's actually some fairly obvious flaw that renders it unsound, stopping it from getting past, say, the LW moderation process or the peer review process. As such, I suspect that there is some issue with it, but I've not really been able to pinpoint what exactly stops someone from using existence as the fundamental basis of moral reasoning.
Would anyone happen to know of links that do directly explore this topic? (Or, alternatively, does anyone have critiques of this view that would spare me the time of writing more about this if this isn't true?)
As for one more test, it was rather close on reversing 400 numbers:
Given these results, it seems pretty obvious that this is a rather advanced model (although Claude Opus was able to do it perfectly, so it may not be SOTA).
Going back to the original question of where this model came from, I have trouble putting the chance that it actually comes from OpenAI above 50%, mainly due to questions about how exactly it was publicized. It seems like a strange choice to release an unannounced model on Chatbot Arena, especially without any associated update to the model registry on GitHub (which would be in https://github.com/lm-sys/FastChat/blob/851ef88a4c2a5dd5fa3bcadd9150f4a1f9e84af1/fastchat/model/model_registry.py#L228 ). However, I think I still have some pretty large error margins, given how little information I can really find.
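For reference, new models seem to get added to that file via `register_model_info` calls, so (if I'm reading it right) an entry for this model would look something like the sketch below; the link and description here are placeholders I made up, since no such entry exists:

```python
# Hypothetical entry in fastchat/model/model_registry.py, mirroring the
# existing register_model_info() calls; the link and description below are
# placeholders, not anything that actually appears in the repo.
register_model_info(
    ["gpt2-chatbot"],
    "gpt2-chatbot",
    "https://example.com/placeholder",
    "Unannounced model currently appearing in Chatbot Arena",
)
```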
OK, what actually happened was that I didn't realize the provided link doesn't go directly to gpt2-chatbot (instead, the front page just compares two random chatbots from a list). After figuring that out, I reran my tests; it was able to reverse 20, 40, and 100 numbers perfectly.
I've retracted my previous comments.
Interesting; maybe it's an artifact of how we formatted our questions? Or, potentially, the training samples with larger ranges of numbers were higher quality? You could try it the way I did in this failing example:
When I tried this same list with your prompt, both responses were incorrect:
Using @Sergii's list-reversal benchmark, this model seems to fail at reversing a list of 10 random numbers from 1-10 (taken from random.org) about half the time. This is compared to GPT-4's reported ability to reverse lists of 20 numbers fairly well, and ChatGPT 3.5 seemed to have no trouble either, although since it isn't a base model, this comparison could potentially be invalid.
This does significantly update me towards believing that this is probably not better than GPT-4.
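In case anyone wants to rerun this quickly, here's a minimal sketch of how the check can be automated; the prompt wording below is just an example rather than the exact phrasing I used:

```python
# Minimal sketch of the list-reversal check: generate a random list, build a
# prompt for the model, and verify the model's answer against the true reversal.
import random

def make_list(n: int, lo: int = 1, hi: int = 10) -> list[int]:
    return [random.randint(lo, hi) for _ in range(n)]

def check_reversal(original: list[int], model_answer: str) -> bool:
    # Parse whatever comma/space-separated numbers the model returned.
    try:
        parsed = [int(tok) for tok in model_answer.replace(",", " ").split()]
    except ValueError:
        return False
    return parsed == list(reversed(original))

nums = make_list(10)
prompt = "Reverse this list of numbers: " + ", ".join(map(str, nums))
# Send `prompt` to the model under test, then score the response with:
# check_reversal(nums, response_text)
```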
I looked a little into the literature on how much alcohol consumption actually affects rates of oral cancers in populations with ALDH polymorphism, and this particular study (found in this meta-analysis) seems helpful for modelling how the likelihood of oral cancer increases with alcohol consumption for this group of people.
The specific categories of drinking frequency aren't especially convenient here, given that participants were split between:

- drinking <=4 days a week,
- drinking >=5 days a week with less than 46 g of ethanol per week, and
- drinking >=5 days a week with more than 46 g of ethanol per week.

Only in the last category was there a significant increase in oral cancer rates (4.4x), although there is some non-significant evidence for roughly a 1.5x increase in the high-moderate and moderate groups. Comparing the ~77 mg/week ingestion rate from the document to the 10-40 g/week range I would estimate for the high-moderate group, I would imagine there is probably a much more minor effect for Lumina (if I had to estimate, maybe something like a 1.1x risk, which might be offset by the benefits of lower levels of lactic acid at that level).
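To make that gap explicit, here's a quick back-of-envelope comparison (the 10-40 g/week figure is just my own estimate from above, not something reported in the study):

```python
# Rough weekly ethanol exposure: Lumina's ~77 mg/week vs. the 10-40 g/week
# I'm estimating for the study's high-moderate drinking group.
lumina_g_per_week = 0.077                 # ~77 mg/week, converted to grams
estimated_moderate_g_per_week = (10, 40)  # my estimate, not a study figure

for grams in estimated_moderate_g_per_week:
    print(f"{grams} g/week is ~{grams / lumina_g_per_week:.0f}x Lumina's exposure")
# Prints roughly 130x and 519x, which is why I'd guess the added oral cancer
# risk for Lumina users sits well below even the ~1.5x of the moderate groups.
```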
One other argument against this (which I hold with lower confidence, given that I only have a basic intuition for enzyme kinetics) is that, since people probably ingest more than 10 milligrams of alcohol with every gulp of an alcoholic beverage, ALDH2 deficiency would be much more of a bottleneck when acetaldehyde levels rise rapidly after drinking than it would be for the rather low background generation of acetaldehyde we might see in Lumina users.
Since I'm being a bit of a "man of one study" here, I'd be interested to see whether you've found any studies of your own that demonstrate a larger effect of alcohol consumption on oral cancer for people with ALDH2 deficiency than the ones I've been listing here.
One other interesting quirk of your model of green is that most of the central (and natural) examples of green for humans appear to involve the utility-function box adapting to these stimulating experiences, so that the utility function becomes positively correlated with the way latent variables change over the course of an experience. In other words, the utility function gets "attuned" to the result of that experience.
For instance, taking the Zadie Smith example from the essay, her experience of greenness involved starting to appreciate the effect that Mitchell's music had on her, as opposed to starting to dislike it. Environmentalist greenness, in the same vein, might arise from humans' utility boxes attuning to the ongoing processes of life, leading to a wish for them to continue.
Notably, I can't really think of any examples where green alone goes against one of these processes in humans; in most examples of people being "attuned" against an experience, the cause is simply that the experience conflicts with a separate, already-existing goal. While disgust does technically conflict with what changes during the process of becoming physically sick, I can't think of any reason that might occur other than that it prevents a human from achieving goals (black, or perhaps red). Desires for immortality, while conflicting with the process of death, seem to mostly extend from a red desire to have things continue to live (which would, by this model, itself be a desire that stemmed from green). If I try to think of some experience that would be completely uncorrelated with a human's prior preferences (e.g. an infant looking at a flowing river for the first time from a distance), it doesn't seem natural to imagine the human suddenly disliking it in any particular circumstance (the infant wouldn't start despising flowing rivers), but I could still see a small chance of them beginning to appreciate it (as long as I'm not missing some obvious counterexample).
This natural positive correlation (or "attunement", "appreciation", or, for especially spicy takes, "alignment"), if I had to guess, could be explained by humans simply gaining reward from expanding their world models (maybe this just simplifies to "humans naturally like learning", but that feels a little anticlimactically blue). Alternatively, attunement, as opposed to indifference, might be created just by some minor positive association generated at deeper levels in the brain, although that would imply that green could just be a consequence of red for a human's utility-function box.
As for your first question, there are certainly other thought systems (or I suppose decision theories) that allow a thing to propagate itself, but I highlight a hypothetical decision theory that would be ideal in this respect. Of course, given that things are different from each other (as you mention), this ideal decision theory would necessarily be different for each of them.
Additionally, since the ideal decision theory for self-propagation is computationally intractable to follow, "the most virulent form" isn't[1] actually useful for anything that currently exists. Instead, we see more computationally tractable propagation-based decision theories built on messy heuristics that happened to correlate with continued existence in the environments where those heuristics were able to develop.
For your final question, I don't think this theory explains initial conditions like why there are several things in the universe at all. Other processes analogous to random mutation, allopatric speciation, and spontaneous creation (applied not only to species, but also to ideas, communities, etc.) would be better suited for answering such questions. "Propagative decision theory" does have some implications for the decision theories of things that can actually follow a decision theory, and it gives a fairly solid indicator on otherwise unsolvable/controversial moral quandaries (e.g. insect suffering), but beyond that it only really helps as much as evolutionary psychology when it comes to explaining properties that already exist.
[1] Other than in the case where some highly intelligent being manages to apply this theory well enough to pursue things like the instrumental convergence that the ideal theory would prioritize, in which case this paragraph suddenly stops applying.