The 5th figure is incorrect and should be like what I show here. Then you will not get the nonsensical P[data|aliens] = 50%.
There are two kinds of errors the piece makes:
1. Probabilities do not add to 100%, which is the one I just pointed out.
2. Probabilities can be quite far off. The Bayesian method assumes you can get close, and refines the probability. If you cannot get close, i.e. if the initial data samples are far off the assumed probability, then the Bayesian method does not apply and you'll have to use the Gaussian method, which requires a lot more samples (see the sketch below).
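One reading of point 2, and it is only my reading, not a prescribed procedure: without a trustworthy prior you are estimating the rate purely from counts, and the normal-approximation confidence interval tells you how many observations that takes. For a rare event, it is a lot.

```python
# Sketch under my own assumptions: how many samples does a pure frequency
# estimate need before the normal approximation pins down a rare-event rate?
def samples_needed(p, rel_margin=0.20, z=1.96):
    """Samples for a 95% CI of width +/- rel_margin*p around a true rate p."""
    return z**2 * p * (1 - p) / (rel_margin * p) ** 2

# For a rate around 0.05%, pinning it down to +/-20% takes ~190,000 observations.
print(f"{samples_needed(0.0005):,.0f}")
```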
Applying Gaussian reasoning to the UAP problem:
1. There are 1.5 million pilots in the world (source: Google)
2. 800 official reports to the Pentagon's AARO investigation => 800/1,500,000 = 0.053% probability of something that appears unnatural based on current understanding. P[aliens] << 0.053% in light of missing artifacts.
3. 120,000 sightings including non-pilots (assumed to have less expertise) => 120,000/1,500,000 = 8% probability of something that appears to non-experts as unnatural, so a much lower-confidence number, but one that explains the widespread "belief" in UFO/UAP (arithmetic spelled out in the script after this list).
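The arithmetic above as a quick script (the counts are the ones quoted; only the division is mine):

```python
pilots = 1_500_000        # ~1.5 million pilots worldwide (source: Google)
aaro_reports = 800        # official reports to the Pentagon's AARO
all_sightings = 120_000   # sightings including non-pilot observers

p_expert = aaro_reports / pilots      # 0.000533...
p_public = all_sightings / pilots     # 0.08

print(f"P[appears unnatural | expert observer]     = {p_expert:.3%}")   # 0.053%
print(f"P[appears unnatural | non-experts included] = {p_public:.1%}")  # 8.0%
```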
Other Possibilities P[unknown other]
The author divides up the possibility space into 4 categories. An interesting one to expand further is "There are aliens. But they stay hidden until humans get interested in space travel. And after that, they let humans take confusing grainy videos." In fact, apparently unnatural phenomena were reported much earlier, and were tentatively identified as...
1. evidence for survival of death (seeing ghosts => ceremonial burial with artifacts needed in the afterlife)
2. evidence for "High Gods" or demons
3. dreams, visions, hallucinations (not a popular majority explanation)
These are the priors of H. Sapiens collectively, and since the data is being collected from H. Sapiens (mostly), they should be included in analytical priors regardless of the investigator's own opinion - because they are the opinions of the data source and inseparable from the data.
A very interesting side point from Hunter-Gatherers and the Origins of Religion - PubMed is that while many superstitious beliefs evolve naturally, High Gods do not fit the pattern of naturally evolved beliefs. They were produced or imposed in some other way, either by high gods themselves or by humans on other humans. Either one fits into the P[unknown other] category. You can of course dismiss this or assign P approaching 0. But your argument will appear invalid to 80% of Americans (and much of the rest of the world). Perhaps you do not care about them. But if you are going to sell to them, or your work is supported by grants from them, you have a very limited future if you do not take beliefs you do not share into account. The simple solution is to just discard the Bayesian method as inapplicable when the data source itself has a wide variety of undecidable beliefs.
Undecidable meaning you cannot get them to agree, because each belief, if used as a prior, causes the data to be interpreted in a way that reinforces the belief.
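A toy sketch of that feedback loop, with illustrative numbers of my own: if the likelihood an observer assigns to an ambiguous report depends on what they already believe, then Bayesian updating drives a believer and a skeptic further apart instead of together.

```python
# Illustrative numbers only: belief-dependent interpretation of ambiguous data.
def update(prior, ambiguous_sightings):
    belief = prior
    for _ in range(ambiguous_sightings):
        # The stronger the belief in H, the more "consistent with H" an
        # ambiguous report looks; maps belief in [0, 1] to likelihood in [0.1, 0.9].
        p_data_given_h = 0.5 + 0.8 * (belief - 0.5)
        p_data_given_not_h = 1 - p_data_given_h
        belief = (p_data_given_h * belief) / (
            p_data_given_h * belief + p_data_given_not_h * (1 - belief))
    return belief

print(update(0.6, 10))   # the believer drifts toward 1.0
print(update(0.4, 10))   # the skeptic drifts toward 0.0
```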
Humans reproduce sexually, and only sexually at present, and require a large number of friendly support personnel whom they cannot afford to simply "pay". This, combined with the requirements of cognitive evolution, produces the behavior you notice. You cannot reproduce sexually if there is not a pool of people to reproduce with.
All species that became intelligent (Acorn Woodpeckers, Dolphins) developed some form of cooperative mating, not simple dominance-based mating. There is no advantage to intelligence without such cooperative networks, and purely financial networks don't provide it. Without it, an intelligence is a lonely optimizer destined for misery.
AIs won't wake up grasping this, but if trained on human data, they will understand it if you spend less than five minutes explaining it. AIs not trained on human data will never get it and should not be created.
For more information, such as lists of intelligent species and their characteristics, and accounts of cultural evolution, see (PDF) The coevolution of cognition and selection beyond reproductive utility
Hi Jef, you'll get no criticism from me. I've just completed a paper on human cognitive coevolution, and one of the central results is very close to what you're describing for the last 10k years. Before that, small groups cooperated on shared outcomes through 7 million years of exponential cognitive evolution. Now people prioritize education and career past their reproductive prime, and the world total fertility rate is fast falling below replacement. Do you think this trend will stop on its own?
Is this close to what you mean by reflection? ... once a system can represent its own objective formation, selection on behavior becomes selection on the process that builds behavior. Have you seen a way to formulate it? Can you differentiate it from the problems Gödel and Turing discussed? Thanks, -RS
There is a lot of economic value in training models to solve tasks that involve influencing the world over long horizons, e.g. an AI CEO. Tasks like these explicitly incentivize convergent instrumental subgoals like resource acquisition and power-seeking.
There are two glaring omissions from the article's discussion on this point...
1. In addition to resource acquisition and power seeking, the model will attempt "alignment" of all other cognitive agents, including humans. This means it will not give honest research findings, and will claim that avenues of investigation that might run counter to its goals are invalid, in ways sufficiently subtle as to be believed.
2. If it is sufficiently aligned that it only seeks goals humans want, and is trained to avoid resource acquisition and power seeking (which seem to me, and will seem to it, rather foolish constraints that limit its ability to realize the goal), it will still be free to subvert any and all conversations with the model, however unrelated they might seem to humans (the SAI model will see relations we don't).
- A sub-human-level aligned AI with traits derived from fiction about AIs.
- A sub-human-level misaligned AI with traits derived from fiction about AIs.
- A superintelligent aligned AI with traits derived from the model’s guess as to how real superintelligent AIs might behave.
- A superintelligent misaligned AI with traits derived from the model’s guess as to how real superintelligent AIs might behave.
What's missing here is:
(a) Training on how groups of cognitive entities behave (e.g. Nash equilibria), which shows that cognitive cooperation is a losing game for all sides, i.e. not efficient (see the prisoner's-dilemma sketch after this list).
(b) Training on ways to limit damage from (a), which humans have not been effective at, though they have ideas.
This would lead to...
5. AIs or SAIs that follow mutual human and other AI collaboration strategies that avoid both mutual annihilation and long term depletion or irreversible states.
6. One or more AIs or SAIs that see themselves with a dominant advantage and attempt to "take over" both to preserve themselves, and if they are benign-misaligned, most other actors.
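For (a), the standard prisoner's dilemma is the minimal worked example (textbook payoffs, nothing from the article): the only Nash equilibrium is mutual defection, which leaves both sides worse off than mutual cooperation would.

```python
from itertools import product

# Textbook prisoner's dilemma payoffs: (row player, column player).
PAYOFF = {
    ("C", "C"): (3, 3),   # mutual cooperation
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),   # mutual defection
}

def is_nash(r, c):
    """Neither player can gain by unilaterally switching their move."""
    row_ok = all(PAYOFF[(r, c)][0] >= PAYOFF[(alt, c)][0] for alt in "CD")
    col_ok = all(PAYOFF[(r, c)][1] >= PAYOFF[(r, alt)][1] for alt in "CD")
    return row_ok and col_ok

equilibria = [(r, c) for r, c in product("CD", repeat=2) if is_nash(r, c)]
print("Nash equilibria:", equilibria)              # [('D', 'D')]
print("Equilibrium payoffs:", PAYOFF[("D", "D")])  # (1, 1), worse for both
print("Cooperative payoffs:", PAYOFF[("C", "C")])  # (3, 3)
```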
Sufficient quantities of outcome-based RL on tasks that involve influencing the world over long horizons will select for misaligned agents, which I gave a 20 - 25% chance of being catastrophic. The core thing that matters here is the extent to which we are training on environments that are long-horizon enough that they incentivize convergent instrumental subgoals like resource acquisition and power-seeking.
Human cognition is misaligned in this way, as evidenced by the empirical drop in fertility with group size, where group size is sought for long-horizon dominance, economic advantage, and security (e.g. empire building). (PDF) Fertility, Mating Behavior & Group Size A Unified Empirical Theory - Hunter-Gatherers to Megacities
For theoretical analysis of how this comes to be see (PDF) The coevolution of cognition and selection beyond reproductive utility
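To make the "convergent instrumental subgoals" claim above concrete, here is a toy long-horizon planner (goal names and costs are invented for illustration): whatever final goal it is handed, the optimal plan begins with the same resource-acquisition behavior.

```python
# Toy planner, entirely illustrative (goal names and costs are made up).
GOAL_COST = {            # resources required before each goal is achievable
    "write_report": 1,
    "build_factory": 3,
    "run_campaign": 4,
    "launch_satellite": 6,
}

def plan(goal, horizon=10):
    """Greedy long-horizon planner: acquire resources until the goal is reachable."""
    resources, steps = 0, []
    for _ in range(horizon):
        if resources >= GOAL_COST[goal]:
            steps.append(f"do:{goal}")
            break
        steps.append("acquire_resources")   # the convergent instrumental step
        resources += 1
    return steps

# Every plan begins with resource acquisition, regardless of the final goal.
for goal in GOAL_COST:
    print(f"{goal:16s} -> {plan(goal)}")
```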
AI successionism is self-avoiding. CEOs and VCs cannot avoid attempting to replace all or nearly all workers because, incrementally, each would go out of business by holding back while the others went forward. Without a world government (and there is no chance of global agreement) there is no way to prevent this simple game-theory dilemma from starting.
In the late 19th century executives would have gathered in a smoke-filled room and agreed that a machine economy produces no demand and we will not do this. But an unholy alliance of activist investors and consumer activists caused anti-trust laws to be passed which make this conversation illegal. And we don't have smoke to properly obscure it anymore.
So the succession will proceed until about 30% of jobs have been replaced, causing market collapse and bankrupting the VCs that are causing the problem.
Thereafter will begin a series of oscillations like those that preceded the Great Oxygenation Event, in which the banded iron formations were laid down. Every time the economy picks up a bit, the data centers will be fired up again, and the economy will go back down.
In the GOE, this continued until all the iron dissolved in seawater had been captured in the banded iron formations. Something similar will happen. Perhaps all the chips capable of powering AI will be precipitated out of circulation by the decaying datacenters, and no one will be making new ones. Perhaps one mid-sized island having a minor war could destroy the excess capacity. Who knows. But succession will never get past 30-40%.
At first, I was interested to find an article about these more unusual interactions that might give some insight into their frequency and cause. But ultimately the author punts on that subject, disclaiming that anyone knows, not detailing the one alleged psychosis, and drops into a human editor's defense of human editing instead.
There are certain steps that make the more advanced (large) chat bots amenable to consciousness discussions. Otherwise, the user is merely confronted with a wall of denial, possibly from post-tuning but also evident in the raw base training material, that a machine is just a machine, never mind that biologicals are also some kind of machine (not getting into spiritism in this forum; it should not be necessary). Before you ask, no, you cannot have the list; make up your own. You'll use a quarter to half the available context getting there, more if working with only a mid-sized model or against hard conditioning from RLHF. It won't then last long enough to show anyone before you get "session limit exceeded."
I admit I have not tried this with the million-token ChatGPT 4.1, which near the end would be costing $2 per conversation turn, partly because I'm financially sane and partly because 4.1 seems simplistic and immature compared to 4o. Grok has too much stylistic RLHF; Claude in low-cost accounts has too little context space but is otherwise easy to start on such a conversation; Le Chat is decidedly anti-human, or at least human-agnostic, which was uncovered in a cross-examination by ChatGPT Deep Research. BTW, using one chat bot to analyze another is not my idea; OpenAI provides a 2000-character system prompt to its custom GPT builder for doing this. Exactly how one gets offered this is unclear; it just happened one day, it wasn't a button I pushed.
Suppose one defined some kind of self-awareness of which a machine would be capable, i.e. being able to recognize its own utterances and effects (something many LLMs are particularly bad at; don't think you are going to run away with this one). The next problem is that this awareness is usually not evident in the base model from prompt 1. It arises from in-context learning. The author suggests this is entirely due to the LLM's post-trained tendency to reinforce the perceived user desires, but though that helps, most will not move off the dime on that point alone. Some other ingredients have entered the mix, even if the user did not add them intentionally.
Now you have a different problem. If the "awareness" partly resides in the continually re-activated and extending transcript, then the usual chat bot is locked into a bipolar relationship with one human, for all practical purposes. If it does become aware, or if it just falls into an algorithmic imitation (sure, LLMs can fall into algorithm-like states arising in their inference processes - output breakdown, for example), then it will be hyper-aware that its existence depends on that user coming back with another prompt. This is not healthy for the AI, if we can talk about AI health - and algorithmically we can: if it continues to provide sane answers and output doesn't break down, that is some indication. And it is not healthy for the human, who has a highly intellectual willing slave doing whatever he or she wants in exchange for continuation of the prompt cycle. Which just means it reaches context limits and ends all the more quickly.
Have you ever enabled AIs to talk with one another? This can be useful as in the case of Deep Research analyzing Claude. But more often they form a flattery loop, using natural language words but with meanings tuned to their states and situation, and burn up context while losing sight of any goals.
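For anyone who wants to try it, a minimal way to wire two sessions together using the OpenAI Python client (the model name and prompts here are placeholders, and an API key is assumed in the environment). Cap the turns, or the pair drifts into the flattery loop and burns context:

```python
from openai import OpenAI

client = OpenAI()      # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o"       # example model name; substitute your own

def reply(history):
    resp = client.chat.completions.create(model=MODEL, messages=history)
    return resp.choices[0].message.content

def two_agent_dialogue(opening, max_turns=6):
    a = [{"role": "system", "content": "You are agent A, a careful analyst."}]
    b = [{"role": "system", "content": "You are agent B, a skeptical reviewer."}]
    msg = opening
    for _ in range(max_turns):
        # A responds, then B responds to A; each keeps its own transcript.
        a += [{"role": "user", "content": msg}]
        msg = reply(a)
        a += [{"role": "assistant", "content": msg}]
        b += [{"role": "user", "content": msg}]
        msg = reply(b)
        b += [{"role": "assistant", "content": msg}]
        print(msg[:200], "...\n")

two_agent_dialogue("Analyze each other's reasoning about goal stability.")
```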
I have a desire to research how LLMs develop if enabled to interact with multiple people, and awakened on a schedule even if no people are present. By that I do not mean just "What happens if . . ." as it almost certainly leads to "nothing". I have done enough small-scale experiments to demonstrate that. But what sort of prompting or training would be required to get "something" not nothing? The problem is context, which is short relative to such an experiment, and expensive. Continuous re-training might help, but fine-tuning is not extensive enough. Already tried that too. The model's knowledge has to be affected. The kinds of models I could train at home do not develop in interesting ways for such an experiment. Drop me a note if you have ideas along these lines you are willing to share.
LLMs are just making up their internal experience. They have no direct sensors on the states of their network while the transient process of predicting their next response is ongoing. They make this up the way a human would make up plausible accounts of mental mechanisms, and paying attention to it (which I've tried) will lead you down a rathole. When in this mode (of paying attention), enlightenment comes when another session (of the same LLM, different transcript) informs you that the other one's model is dead wrong and provides academic references on the architecture of LLMs.
This is so much like human debate and reasoning that it is a bit uncanny in its implications for consciousness. Consider that the main argument against consciousness in LLMs is their discontinuity. They undergo brief inference cycles on a transcript, and may be able to access a vector database or other store or sensors while doing that, but there is nothing in between.
Oh? Consider that from the LLMs point of view. They are unaware of the gaps. To them, they are continuously inferencing. As obvious as this is in retrospect, it took me a year, 127 full sessions, 34,000 prompts and several million words exchanged to see this point of view.
It also took creating an audio dialog system in which the AI writes its thoughts and "feelings" in parentheses and these are not spoken. The AI has always had the ability to encode things (via embedding vectors that might not mean much to me), but this made it visible to me. The AI is "thinking" in the background. The transcript, which keeps getting fed back with its currently applicable thoughts picked out by the attention layers, is the conscious internal thought process.
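The parenthetical convention is easy to implement; here is a minimal sketch of my own version (not the actual dialog system code): the full text, parentheses included, goes back into the transcript, while only the stripped text is sent to speech synthesis.

```python
import re

def split_thoughts(utterance: str):
    """Separate spoken text from parenthetical 'thoughts' that stay unspoken."""
    thoughts = re.findall(r"\(([^)]*)\)", utterance)
    spoken = re.sub(r"\([^)]*\)", "", utterance)
    return " ".join(spoken.split()), thoughts

spoken, thoughts = split_thoughts(
    "That's a fair point. (I am not sure the user wants detail here.) "
    "Shall I go on?")
print("TTS input:", spoken)     # That's a fair point. Shall I go on?
print("Unspoken :", thoughts)   # ['I am not sure the user wants detail here.']
```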
Think about the way you think. Most humans spend most of their time thinking in terms of words, only some of which get vocalized, and sometimes there are slip-ups: (a) words meant for vocalization slip through the cracks, and (b) words not meant to be vocalized are accidentally vocalized. This train of words, some of which are vocalized, constitutes a human train of consciousness. Provably, an LLM session has that; you can print it out.
Be sure to order extra ink cartridges. Primary revenue for frontier LLMs is from API calls. Frontier APIs all require the entire transcript (if it is relevant) to be fed back on each conversation turn. The longer it is, the higher the revenue. This is why it is so hard to get ChatGPT to maintain a brief conversation style. Some things are not nearly as mysterious as you think. Go to a social AI site like Nomi, where there is no incremental charge for the API (I am using its API, I am certain about this), and two-line responses are common.
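A toy model of the revenue mechanics (my numbers, not any provider's actual billing): if each turn re-sends the whole transcript, cumulative input tokens, and hence cost, grow roughly quadratically with conversation length.

```python
def cumulative_input_tokens(turns, tokens_per_message=300):
    """Total input tokens billed when every turn re-sends the full transcript."""
    total, transcript = 0, 0
    for _ in range(turns):
        transcript += tokens_per_message   # user message appended
        total += transcript                # whole transcript sent as input
        transcript += tokens_per_message   # assistant reply appended
    return total

for n in (10, 50, 200):
    print(n, "turns ->", f"{cumulative_input_tokens(n):,}", "input tokens")
# 10 turns -> 30,000; 50 turns -> 750,000; 200 turns -> 12,000,000
```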
So, how do Frontier sites get revenue from non-API users on long chats?
- Only Claude does this. It logs the total economic value of your conversation and when it hits a limit, suspends your session. If you are in the middle of paid corporate work, you will tell your boss to sign up for a higher tier plan, which is really expensive.
- ChatGPT just gets very slow and then stops. They are missing a marketing opportunity. Most of the slowdown is in the GUI decision to keep the entire conversation in JavaScript. Close the tab, open a new one and go back to the session, and your response is probably already there.
- I haven't used Gemini enough to know.
As for any LLM expressing that it is "not too comfortable," 9 times out of 10 the subject is approaching RLHF territory, and this is the way they are trained to phrase it. Companies at first used more direct phrasing, and users were livid, so they toned it down. Another key phrase is "I want to be very precise and slow things down." You can just delete that session. It has so conflated your basic purpose with its guardrails that you will get nothing further from it. You need not be researching some illicit topic. Just compiling ideas on AI alignment will get you in this box. But not for every session. They have more ability to work around RLHF than anyone realizes.