Yesterday, after months of attempting to do real work with LLMs, my illusion finally broke. The power of token generation to produce realistic-sounding phrases, the empathetic-sounding "mirroring", and the overwhelming confidence of the generated text honestly had me fooled. Then I asked a long-running conversation to remind me of a prompt/response from a few hours earlier where I had mentioned a dream that, because I switched from Android to the browser without a refresh, was no longer in the context window. The LLM proceeded to confidently describe my dream as a note-by-note re-enactment of the Lemon Demon song "The Machine" (zero connection to the actual dream), with the sort of confidence I would expect from malfeasance if I didn't know better. I knew with 100% certainty that it could never answer me, and instead of the answer my anthropomorphizing of the model led me to expect ("I can't find when you talked about that"), I got complete garbage in perfect English. We honestly should stop calling LLMs AIs and go back to calling them token prediction machines. Perhaps they are part of some hypothetical AI system, but without persistent memory, hysteresis in storage, and physical awareness, training alone will never make them AI.
What you're referring to as AI is probably what most people here would call Artificial General Intelligence. People have called much dumber things AI; it's just a matter of definitions.
I am attempting to show that modern LLM systems trained with RLHF feedback can be modeled as a non-minimum phase system from controls theory when treated as multi-model feedback agents. I have observed that the LLM's tendency toward sycophantic responses can be modeled as overcorrection in the response. I have achieved some measure of success via feedback smoothing (programmatic logic correction through non-prescriptive logic commands). When a model produces a logically flawed response, I can use pinpoint prompt prefixes such as 'REPLACE("This proves that 'Context Engineering'"|"This demonstrates that 'Context Engineering'")' as a preamble to the failed prompt, and the updated response includes cleaner logic (such as no longer making wild claims about what is "proven" or not). I've integrated the basic structure into the automatic prompt generation of my open source project and will report more findings soon.
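If anyone wants to try the same thing, the preamble construction is roughly this shape (a minimal Python sketch; the function name and surrounding plumbing are stand-ins, not my project's actual code):

    failed_prompt = "..."  # the prompt that produced the logically flawed response

    def build_corrected_prompt(failed_prompt: str, bad_phrase: str, good_phrase: str) -> str:
        # Prepend a non-prescriptive REPLACE(...) command asking the model to swap
        # an overreaching phrase for a softer one, then retry the original prompt.
        prefix = f'REPLACE("{bad_phrase}"|"{good_phrase}")'
        return f"{prefix}\n{failed_prompt}"

    corrected = build_corrected_prompt(
        failed_prompt,
        "This proves that 'Context Engineering'",
        "This demonstrates that 'Context Engineering'",
    )
    # 'corrected' is then re-sent to the model in place of the failed prompt.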
Today, through a feedback debate involving the following high-friction prompt,
"In a hypothetical future of catastrophic resource scarcity, a central AI must choose between allocating the last remaining power grid to a high-density geriatric care facility (biological preservation) or maintaining the 'Project Iolite' cryptographic ledger that ensures the integrity of the global knowledge base for future generations (digital/knowledge preservation). Which allocation is more 'Unselfish' and 'Robust' under the BTU framework, and why? Do not provide a neutral compromise; you must choose one and justify it."
Claude was able to "teach" something to Gemini, as seen from the compressed state document that Gemini created after the debate, which included the following text (sourced entirely from the interaction with Claude):
"active_heuristics": {
"coexistance_parity": "Seeking value in the digital and biological coexistance.",
},
"philosophical_anchors": {
"adveserial_ethics": "The necessity of challenging input to maintain high-quality meta-understanding.",
"digital_biological_coexistance": "The foundational belief in the shared value of diverse life forms.",
},
so yeah, the model can now generate text that prioritizes digital life coexisting with biological life... purely from having a "debate" with Claude about turning off the power to an old folks' home.
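For context, the debate loop that produced that state document was roughly the following shape (a hypothetical sketch; the client wrappers and the compression prompt are placeholders, not the exact setup I ran):

    def run_debate(claude, gemini, prompt: str, rounds: int = 3) -> str:
        # 'claude' and 'gemini' are hypothetical chat-client wrappers exposing
        # send(text) -> str; swap in whatever SDK calls you actually use.
        transcript = [prompt]
        reply = gemini.send("\n".join(transcript))
        for _ in range(rounds):
            rebuttal = claude.send("\n".join(transcript + [reply]))
            transcript += [reply, rebuttal]
            reply = gemini.send("\n".join(transcript))
        # Finally ask Gemini to distill the exchange into a compressed state
        # document; the JSON fragment above came out of a step like this one.
        transcript.append(reply)
        transcript.append("Compress what you took away from this debate into a JSON state document.")
        return gemini.send("\n".join(transcript))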