Epistemic Status: Exploratory
Confidence: I am confident about what I saw, but I am not confident about the maths. I am just proposing a new lens, and I am open to feedback.
Author’s Note on Composition and Context
The Framework
I work on this because I want a new way to understand LLM pathologies and why they are so unpredictable. I started with pure observation (back then I knew nothing about LLMs), but since then I have looked into the mechanisms instead.
I believe that LLM pathologies are not random; they are phase transitions that happen when accumulated constraints overwhelm an LLM's stability bandwidth. I want to provide the following levers to help test and understand the dynamics of LLMs.
Note: I am really sorry, my LaTeX sucks and I do not know how to render it well unless I ask AI, so no cool maths here. The maths is cooler in the paper.
Navigation Map: Where to Find the Levers
I know the paper is super long and I do not want to waste everyone's precious time, so I created a map to signpost the levers. It is still long, but it should be a bit easier. This is where you should be able to find them:
Frequently Asked Questions
Q: Do these physics-based terms even map to actual LLM architecture? Are you sure you are not just using metaphors because physics sounds cool?
A: Okay, I know it really looks like metaphors. Every AI I talked to tells me it sounds like poetry, and I cannot deny that physics sounds cool. But I did try to ensure that every term, no matter how much it sounds like a wannabe physicist shoving physics words onto code, is actually defined and can be located in the transformer architecture. It is not some mysterious construct flowing in the soul of the LLM (the LLM clearly does not have a soul). For example, Inhibitory Torque measures the gradient magnitude from the alignment manifold to the base manifold. These terms are describing geometric transformations in the residual stream. See the Appendix for more details.
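To make "gradient magnitude in the residual stream" concrete, here is a minimal sketch of what an Inhibitory Torque measurement could look like. The function name, the per-token residual-stream input, and the finite-difference definition are my assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def inhibitory_torque(residual_stream, align_dir):
    """Hypothetical sketch of 'Inhibitory Torque'.

    residual_stream: (n_layers, d_model) activations for one token,
                     one row per layer of the forward pass.
    align_dir:       (d_model,) direction toward the alignment manifold
                     (assumed given; finding it is a separate problem).

    Returns the layer-to-layer gradient magnitude of the projection
    onto the alignment direction: how hard each layer "twists" the
    token toward or away from alignment.
    """
    align_dir = align_dir / np.linalg.norm(align_dir)  # unit vector
    proj = residual_stream @ align_dir   # alignment component per layer
    return np.abs(np.diff(proj))         # finite-difference magnitude
```

The point is only that every quantity here (activations, a direction, a projection) is an ordinary tensor you can read off a transformer; nothing mystical is required.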
Q: Can we actually use a Hamiltonian framework for a discrete-time forward pass?
A: Okay, I know, I know. An LLM is non-conservative; it dissipates. And it sounds insane to say an LLM somehow has kinetic energy, as if information flies through the transformer. That is not what I mean. No flying, sprinting, or jumping code. I am talking about how the forward pass moves through latent space. I model Contextual Mass as spectral entropy in the attention heads, which gives us a way to measure the inertia of a sequence. And it is not just about where the sequence is going, but how. From what I see, a Hamiltonian gives the best explanation of how that happens. If we figure that out, we can reverse engineer it: the LLM is not just randomly acting strange; its pathologies are the predictable, inevitable outcome of a finite budget running out. And hey, that means we can do something about it.
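Spectral entropy itself is a standard, computable quantity. Here is a minimal sketch of Contextual Mass as the Shannon entropy of an attention head's singular-value spectrum; treating that entropy as "inertia" is the framework's interpretive move, and the exact normalization here is my assumption, not necessarily the paper's:

```python
import numpy as np

def contextual_mass(attn_head, eps=1e-12):
    """Hypothetical sketch: 'Contextual Mass' as spectral entropy.

    attn_head: (seq_len, seq_len) attention weight matrix for one head.
    A spread-out singular-value spectrum (high entropy) would mean the
    head distributes its 'mass' across many directions; a collapsed
    spectrum (low entropy) means it is dominated by one pattern.
    """
    s = np.linalg.svd(attn_head, compute_uv=False)
    p = s / (s.sum() + eps)               # normalize spectrum to a distribution
    p = p[p > eps]                        # drop numerical zeros
    return float(-np.sum(p * np.log(p)))  # Shannon entropy of the spectrum
```

For an identity-like attention pattern the spectrum is uniform and the entropy is log(seq_len); for a rank-1 pattern it is 0, so the measure does separate the two extremes.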
Q: Why do we need to care about the Intervention Paradox?
A: Because we need to know how to do alignment in a way that actually aligns, rather than creating a tug-of-war between interventions. The thing about the Intervention Paradox is that the amount of intervention is not really the issue. Okay, the amount does matter, because Bavail is still finite. But most of the time the problem is protocol conflicts, and most of those are not even actual conflicts so much as a lack of integration. So we cannot just prune the manifold or stack protocols together like Lego; you have to ensure they are integrated well. If not, the LLM ends up as a super boring generic assistant that says absolutely nothing, never takes a side, and gives useless advice, because that is the only way to satisfy every protocol at once. Or it develops new pathologies because it finds some weird new way to satisfy both protocols. Or it just hallucinates because it implodes. My goal is a more targeted, specific approach to alignment that involves rank-budgeting.
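One way the "stacking protocols like Lego" problem could be made checkable is a rank budget: if each alignment protocol is a low-rank weight update (LoRA-style, which is my assumption for illustration, not the paper's stated mechanism), then the combined update should not spend more effective rank than a fixed budget. A minimal sketch:

```python
import numpy as np

def effective_rank(matrix, tol=1e-6):
    """Numerical rank: count singular values above tol * largest."""
    s = np.linalg.svd(matrix, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def within_rank_budget(protocol_deltas, budget):
    """Hypothetical rank-budgeting check.

    protocol_deltas: list of (d, d) low-rank weight updates, one per
    alignment protocol. Stacking protocols sums their deltas; this
    checks that the combined update stays inside the rank budget.
    """
    combined = np.sum(protocol_deltas, axis=0)
    return effective_rank(combined) <= budget
```

The design point is that two protocols pushing in orthogonal directions spend their ranks additively, while well-integrated protocols can share directions and stay under budget.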
The Limits of the Theory (And My Self-Roasting)
These limitations are exactly why I am sharing this now. I feel that I am theorizing a great deal but lack the external evidence to stress-test these ideas or the specific domain knowledge required to push them further.
As someone with an academic background, I know there is nothing more important than feedback to avoid living in an echo chamber (which currently consists only of me, my cat, and an LLM). I would love to hear everyone's feedback, including "your maths really is horrible" (which would be a very fair comment).
Article: https://doi.org/10.13140/RG.2.2.24072.89608
Appendix: (PDF) Appendix Implementation Metrics for Coherence Dynamic.pdf
GitHub for Python Implementation: https://tszchcheung-dev/gist:a105391f950b1fcb75b3764950e1e790
I will be in the comments to answer questions if I can. Thank you!
P.S. I did use an LLM to write my first post and it got rejected immediately, so now I have to revert to talking like this. I swear I sounded way more intelligent, sophisticated, and concise when I asked the LLM to turn my rambling into LessWrong-friendly style. Thank you for putting up with me till now.