"The irrationality of a thing is no argument against its existence, rather a condition of it." - Nietzche
In recent years, the development and deployment of Large Language Models (LLMs) have revolutionized the field of artificial intelligence. These models, such as GPT-3, have shown remarkable capabilities in understanding and generating human-like text across various domains. A closer examination, however, reveals that while these models excel at many linguistic tasks, they often struggle with mathematical reasoning and with maintaining a high level of accuracy. Mathematical concepts demand precise logical reasoning, symbol manipulation, and an understanding of complex relationships between numbers and equations. LLMs tend to struggle with these aspects because they "predict the next word/character" (based on context) with increasing accuracy, which is a different objective from writing rigorous mathematical statements. This looks like a case of Goodhart's Law, which states that "when a measure becomes a target, it ceases to be a good measure." Here the measure is how transformers work, predicting the next word/character/sentence from a given context, and the target is being able to mathematically and logically manipulate given symbols/data, and/or to apply the right theorem/axiom (making sure all of its conditions are exactly satisfied) in order to derive information and reach a state previously unknown and now proven.
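To make the "predict the next word/character" framing concrete, here is a minimal sketch of a single next-token step (the vocabulary and scores are made up for illustration; a real transformer would compute the scores from the context through its attention layers). The point is that the training objective rewards the statistically most likely continuation, not the logically entailed one.

```python
import numpy as np

# Toy next-token prediction step (illustrative only; not a real language model).
# A real transformer would compute these logits from the context via attention;
# here they are hard-coded to show what the objective actually selects.
vocab = ["2", "3", "4", "5", "prime"]
context = "2 + 2 ="

logits = np.array([0.1, 0.2, 3.5, 0.3, 0.05])   # pretend model scores per token
probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary
next_token = vocab[int(np.argmax(probs))]

print(dict(zip(vocab, probs.round(3))))
print(context, next_token)
# The chosen token is the most *probable* continuation given similar training
# contexts; correctness is only indirect, via the data distribution.
```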
This necessitates an exploration of "what" "understanding" actually means, or rather "how" "understanding" functions, in the hope of imparting similar "logical" abilities to LLMs. Over the next few months, I will be diving into the details of exactly this, beginning with a literature review of the various paradigms used to date and a brief discussion of each, and hopefully reaching a point where I can conduct experiments based on ideas gathered along the way. Under each header, I will provide a summary, most of it containing direct text from papers/articles, with context and/or commentary as needed.
Considering the ambiguity and subjectivity in the definitions of "logic", "understanding", and "rationality", I will try to make sure I am very specific when using these words.
Lastly, all discussion, reviews, and comments are appreciated because, after all, this is but an attempt by a child who never got the answer he wanted to "why" to at least make sense of the "what".
Before jumping straight to an understanding of the SOTA, it's essential to get a basic idea of the earlier paradigms. My journey begins with a basic exploration of those paradigms. Something important to note: the world does not yet have the computational power for deep learning in this time frame, and we are currently in the school of thought called "Symbolic AI".
Newell and Simon's Logic Theorist was an early computer program, developed in the mid-1950s, that aimed to simulate human problem-solving and deduction using formal logic.
After reading about the symbolic paradigm, the first question that came to me was: well, how do humans do math? Or, more generally, how do humans decide? Drum roll..... We......ehhhh.... don't know. It wasn't surprising to me that human decision-making is, to a very large extent, "paradoxical" and not understood; the following is a famous example from decision theory that tries to demonstrate the ambiguity in how two completely different answers might both seem to be "logical".
Next, I started reading about the "successor" to the symbolic paradigm...
"What all this means in the practice of symbolic AI is that goals, beliefs, knowledge, and so on are all formalized as symbolic structures, for example, Lisp lists (Singly Linked List), which are built of symbols, Lisp atoms, which are each capable of being semantically interpreted in terms of the ordinary concepts we use to conceptualize the domain. Thus, in a medical expert system, we expect to find structures like (IF FEVER THEN (HYPOTHESIZE INFECTION)). These symbolic structures are operated on by symbol manipulation procedures composed of primitive operations like concatenating lists, and extracting elements from lists. According to the symbolic paradigm, it is in terms of such operations that we are to understand cognitive processes" The idea is that a complete/large enough and detailed DAG of causality and action could help us understand the world, and function as an intelligent agent. "The symbolic level that implements knowledge structures is alleged to be exact and complete. That means that lower levels are unnecessary for accurately describing cognition in terms of the semantically interpretable elements"
"In the symbolic approach, symbols (atoms) are used to denote the semantically interpretable entities (concepts). These same symbols are the objects governed by symbol manipulations in the rules that define the system. The entities which are capable of being semantically interpreted are also the entities governed by the formal laws that define the system"
"The subsymbolic level is an attempt to formalize, at some level of abstraction, the kind of processing which occurs in the nervous system. Many of the details of neural structure and function are absent from the subsymbolic level, and the level of description is higher than the neural level. The precise relationship between the neural and subsymbolic levels is still an open research question; but it seems clear that connectionist systems are much closer to neural systems than are symbolic systems."
Two things to note. One: a connectionist system, at the risk of oversimplification, is the ancestor of what we now know as neural networks, which, at the time the paper was written (1987), were not computationally feasible at scale. Two: we see that Smolensky here starts to shed light on a possible area of exploration for "reasoning", one that works at a more fundamental level.
"Note that the sub-symbolic paradigm gives an essentially different role to the neural part of the story: neural structures provide the basis (in some suitably abstract sense) of the formalism that gives the precise description of intelligence, while mental structures enter only into approximate descriptions"
This line, to a very large extent, forms the basis of what I believe. The idea being discussed here is that the neural part of the story is essentially the quantum physics (the fundamental cause) underlying what we observe, such as abstractions, concepts, and ultimately intelligence (the parallel to Newtonian physics).
"(In sub symbolic) The semantically interpreted entities are patterns of activation over a large number of units in the system, whereas the entities manipulated by formal rules (which was the case in Symbolic) are the individual activations of cells in the network. The rules take the form of activation passing rules, of essentially different character from symbol manipulation rules. This describes the particular kind of connectionist system where patterns of activity represent concepts, instead of the activation of individual elements in the network. Therefore, the subsymbolic paradigm involves connectionist systems using so-called distributed representations, as opposed to local representations"
"That crucial principle of the sub symbolic level, the Statistical Connection (Best Fit Principle): given an input, connectionist system outputs a set of inferences that, as a whole, give a best fit to the input, in a statistical sense defined by the statistical knowledge stored in the system's connections. In this vague form, this principle is generally true for connectionist systems. But it is exactly true in a precise sense, at least in an idealized limit, for a certain class of systems in what can be called harmony theory." :
After gaining brief insights into the symbolic and subsymbolic paradigms and their respective strengths and weaknesses, it was time to understand their implications for the buzzword floating around these days... deep learning!
This, combined with the ideas from harmony theory, leads us to interesting realms. We now start to see that the abstractions and concepts in our minds are neurons that "configure themselves dynamically in each context", which means that it is this configuration, whose compositionality leads to the abstractions. Which essentially means that what one believes, thinks, or, well..... (sometimes) even feels, is certain different abstractions intermingling in a certain way. This leads us to the conclusion that what we as a society call "logic" is simply yet another "learnt" abstraction, wherein the primary thing "learnt" is the abstraction's 100% accuracy. Mathematics is yet another tool devised by humans that we "learn" to use... We are entering philosophical realms here; we are not here to debate whether there is an underlying mathematics to the universe, but rather to ask: what if teaching logic to computers meant teaching them to learn "abstractions" better?
This concludes my readings for the month. In the upcoming month, I intend to tinker with the idea of compositionality: understand its nature, how to measure it, and, well, whether we can enforce it. And if we can, ask the very important question: does it even matter?
On an ending note, and at the risk of being slightly extreme: what if "feelings" are (sometimes) but something that is "contextually learnt"?