I like your "tralsity" terminology. I think I was thinking about this differently from you. I was thinking from a computer science or graph theory perspective, where the "solution" to an argument would b the set of families of state transitions for statements when the truth values are updated iteratively.
This is more related to arguments (systems of statements) being valid or invalid, tautological, or contradictory... although I think the ideas would need to be extended in some way to apply to arguments with cyclic form, as opposed to the more normal form of having premises and conclusions.
It seems like tralsity, which you are talking about, is more focused on assigning valid tralse values to statements regardless of whether they contain logical contradictions. I think this is an interesting thing to try to do. I think I prefer the argument-structure and state-transition approach because it would point out which states are valid and, when a state is invalid, what is causing it to be invalid.
So rather than "this statement is false" going to "x=-x" and being solved as 0, it would go to "x<--x" and the solution would be the sequence "...-1,1,-1,1,-1,1..." which is unrolled from a state transition that might look something like {( x=1 --> x=-1 ), ( x=-1 --> x=1)}, or something similar. That creates a single connected transition graph with two nodes, one representing the state "x=1" and one representing the state "x=-1" with each pointing at the other. And that is the only system of transitions this argument can take. But other arguments might have disconnected transition graphs. Only states in transition graphs that only point to themselves would be valid, but the different kinds of subgraphs with more than one node would tell you about the way they are invalid.
Haha... sorry, this isn't fully thought through so I apologize if it isn't clear or easy to follow.
I think there's probably a relationship between this idea of assigning tralsity to states vs looking at transition graphs for arguments, but I'm not sure exactly what that relationship would be.
It is reminding me of the book "Topoi: The Categorial Analysis of Logic", which I have not gotten far enough into to know whether it is related at all or not.
Alas, so much math, so little time.
At its core, it is an attempt to draw a low-dimensional map of meaning.
How I'm thinking about semantic manifolds in semantic spaces doesn't seem well represented by "attempting to draw a low-dimensional map of meaning".
I'm sorry, but I'm having trouble connecting with what you're saying. It seems you are talking about some group of people's attempt to understand neural networks. I think it would be helpful if you stated your assumptions about that group's assumptions, because I don't think I share them and don't know what they are.
In particular, "You cannot morph cat into dog through valid states", and "Make a table of ~14 sublayers of a transformer and note if the manifold is valid." seem like meaningless statements to me because it's unclear what "valid" would mean in this context.
I really like this. I think there are two different vibes here that I like: (1) evaluation of arguments should compare them to other possible arguments within some reference class rather than being based solely on the properties of the argument itself, and (2) more tangentially, the truth value of a system of statements need not just be "true" or "false", "consistent" or "inconsistent".
I don't really have much more to say about (1) other than that I liked your explanations of the loopy blue universe and tessellating beetles. Both are absurd, but have explanatory power. I would like to explore more about the relationship of arguments to their reference classes and the structure of possible argument reference classes.
I like thinking about (2) as digraphs where each node is a statement and each arrow is a reference to another statement. Each statement then has a set of possible truth values, and an assignment of truth values either makes the graph consistent or not. Some graphs may have a single consistent set of truth values, or multiple ones, but there is more to be said about inconsistent graphs than merely that they are inconsistent. If at each iteration you "updated" the truth value of the node corresponding to each inconsistent arrow, you would get patterns of propagation based on the starting state. Further, an obvious way to force an inconsistent graph is to have a cycle with an odd number of "bit flip" arrows, which assert that some referent is false, and so must themselves be false if the referent is true. Then you can ask about the other possible structures which introduce forced inconsistency.
( I'm sure people have focused on this, but I'm not sure what it would be called... )
One application for this might be that if you could throw out all inconsistency-forcing structures, you could (perhaps) find the subset of programs for which the halting problem doesn't apply.
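To pin down the digraph idea from (2), here's a toy version in code (the encoding is my own made-up one, not any standard formalism):

```python
from itertools import product

# Toy encoding: each statement has exactly one outgoing arrow (referent, flip).
# flip=True is a "bit flip" arrow: the statement asserts its referent is false.
graph = {
    "A": ("B", True),   # A says "B is false"
    "B": ("A", True),   # B says "A is false" (even number of flips around the cycle)
    "C": ("C", True),   # C says "C is false" (odd-flip cycle: the liar)
}

def consistent(values):
    # A statement's truth value must match what its arrow demands of its referent.
    return all(values[n] == (not values[t] if flip else values[t])
               for n, (t, flip) in graph.items())

# Enumerate all truth assignments; C's odd-flip self-loop rules out every one.
names = list(graph)
solutions = [dict(zip(names, vals))
             for vals in product([True, False], repeat=len(names))
             if consistent(dict(zip(names, vals)))]
print(solutions)  # [] -- drop C and you'd get the two consistent A/B assignments

def step(values):
    # One round of "updating" every node to whatever its arrow currently demands.
    return {n: (not values[t] if flip else values[t])
            for n, (t, flip) in graph.items()}

print(step({"A": True, "B": True, "C": True}))  # everything flips to False,
# and iterating keeps flipping it back and forth: there is no fixed point.
```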
Yeah! I think distance, direction, and position (not topology) are at least locally important in semantic spaces, if not globally important, but continuity and connectedness (yes topology) are probably important for understanding the different semantic regions, especially since so much of what neural nets seem to do is warping the spaces in a way that wouldn't change anything about them from a topological perspective!
subspaces of it where meaningful information might actually be stored
At least for vanilla networks, the input can be embedded into higher dimensions or projected into lower dimensions, so you're only ever really throwing away information, which I think is an interesting perspective when thinking about the idea that meaningful information would be stored in different subspaces. It feels to me more like specific kinds of data points (inputs), which had specific locations in the input distribution, would, if you projected their activations at some layer into some subspace, tell you something about that input. But whatever it tells you was already in the semantic topology of the input distribution; it just needed to be transformed geometrically before you could do a simple projection to a subspace to see it.
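As a rough illustration of what I mean by projecting activations into a subspace (everything here is hypothetical: the random matrix stands in for activations you would actually record from a layer):

```python
import numpy as np

# Hypothetical stand-in for recorded activations: shape (n_inputs, n_features).
rng = np.random.default_rng(0)
activations = rng.normal(size=(128, 512))

# Project onto a low-dimensional subspace; here, the top two principal directions.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
subspace = vt[:2]                  # a 2-D subspace of the 512-D activation space
projected = centered @ subspace.T  # where each input lands in that subspace

print(projected.shape)  # (128, 2)
# If inputs that differed in the input distribution separate along these
# directions, that's the "the information was in the input's semantic structure,
# the geometry just had to be unwarped first" picture I have in mind.
```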
Interesting. Some thoughts:
AI alignment has been getting so much bigger as a field! It's encouraging, but we still have a long way to go imo.
Did you see Shallow review of technical AI safety, 2025? I'd recommend looking through that post or their shallow review website, finding something that seems interesting, and starting there. Each sub-domain has its own set of jargon and assumptions, so I wouldn't worry too much about trying to learn the foundations, since we don't have a common set of foundations yet.
Just reading posts isn't bad, but since there isn't that common set of foundations, it could be confusing when you're just starting out (or even when you're quite experienced).
Good luck and glad to have you!
I sadly do not have the capacity to learn about the majority of it.
Sadly, it's a problem you share with me and most humans, I think, with possible rare exceptions like Paul Erdős.
I'll try to build up a quick sketch of what the residual stream is; forgive me if, for the sake of brevity, I say things that are basic, obtuse, or slightly wrong.
All neural networks (NNs) are built using linear transformations/maps, which in NN jargon are called "weights", and non-linear maps called "activation functions". The outputs of the activation functions are called "activations". There are also special kinds of maps and operations depending on the "architecture" of the NN (e.g. convNet, resNet, LSTM, transformer).
A vanilla NN is just a series of "layers" consisting of a linear map and then an activation function.
The activation functions are not complicated nonlinear maps; they are quite simple to understand. One of the most common, ReLU, can be understood as "for all vectors, leave positive components alone and set negative components to 0", or "project all negative orthants onto the 0 hyperplane". Since most of the complex behaviour of NNs comes from the interplay of the linear maps and these simple nonlinear maps, linear algebra is a very foundational tool for understanding them.
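In code, that whole description is tiny (a sketch with made-up shapes, not any particular library's API):

```python
import numpy as np

def relu(x):
    # "Leave positive components alone, set negative components to 0."
    return np.maximum(x, 0)

def layer(x, W, b):
    # One vanilla layer: a linear map (the "weights") followed by the activation.
    return relu(W @ x + b)

# A vanilla NN is just layers composed one after another.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # a 4-dimensional input
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
out = layer(layer(x, W1, b1), W2, b2)
print(out.shape)  # (3,)
```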
The transformer architecture is the fanciest new architecture; it forms the foundation of modern LLMs, which act as the "general pretrained network" for products such as ChatGPT. The architecture is set up as a series of "transformer blocks", each of which has a stack of "attention heads", which are still matrix transformations but set up in a special way, followed by a vanilla NN.
The output of each transformer block is summed with the input to use as the input for the next transformer block. The input is called a "residual" from the terminology of resNets. So the transformer block can be thought of as "reading from" and "writing to" a "stream" of residuals passed along from one transformer block to the next like widgets on a conveyor belt, each worker doing their one operation and then letting the widget pass to the next worker.
For a language model, the input to the first transformer block is a sequence of token embeddings representing some sequence of natural language text. The output of the last transformer block is a sequence of predictions for what the next token will be based on the previous ones. So I imagine the residual stream as a high dimensional semantic space, with each transformer block making linear transformations and limited nonlinear transformations to that space to take the semantics from "sequence of words" to "likely next word".
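Schematically, the residual stream looks something like this (a cartoon with random stand-in weights, leaving out layer norm and the actual attention math, so it only shows the read/add structure, not a faithful transformer):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, seq_len, vocab, n_blocks = 16, 5, 50, 4

W_embed = rng.normal(size=(vocab, d_model)) * 0.1
W_unembed = rng.normal(size=(d_model, vocab)) * 0.1
# One stand-in weight matrix per "attention" and per "MLP", per block.
attn_W = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_blocks)]
mlp_W = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_blocks)]

tokens = rng.integers(0, vocab, size=seq_len)
x = W_embed[tokens]                # the residual stream starts as token embeddings
for Wa, Wm in zip(attn_W, mlp_W):  # each transformer block...
    x = x + x @ Wa                 # ...adds its "attention" output to the stream
    x = x + np.maximum(x @ Wm, 0)  # ...then adds its "MLP" output
logits = x @ W_unembed             # per-position scores for the next token
print(logits.shape)                # (5, 50)
```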
I am interested in understanding those semantic spaces and think linear algebra, topology, and manifolds are probably good perspectives.
This is good context to have. If it is a Schelling point on LW, that's probs a good enough reason to choose it as the term to adopt, although some consideration might be warranted for its adoption in wider communities; I can't think of any other term that would work better for that, though.
For those who insist these aren't intelligent I like my extremely general term "Outcome Influencing System (OIS)" pronounced "oh-ee" and defined as "any system composed of capabilities and preferences which uses its capabilities informed by its preferences to influence reality towards outcomes it prefers". Then it becomes trivially true that these things have capabilities and the capabilities are improving.
The halting problem is the problem of looking at the structure of a program and using it to determine whether or not the program would halt if you ran it. There is a proof by contradiction that you can't have such a program (the version I remember is from the Wikipedia page for the halting problem).
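Sketching it from memory (so the details may differ slightly from Wikipedia's exact version), with halts as the hypothetical decider and g as the troublemaker built from it:

```python
def g():
    if halts(g):     # ask the hypothetical decider whether g halts
        while True:  # if it answers "halts", loop forever instead
            pass
    # if it answers "doesn't halt", g falls through, returns None, and halts
```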
This is a contradiction: if halts returns true then it shouldn't have, because g will loop forever precisely because halts returned true; but if halts returns false then g will return None and halt, so halts should have returned true.
But learning about this annoyed me, because it's obvious what's going on: halts is being forced into self-reference, and a contradiction is only forced if halts is forced to return true or false. If it instead returned "I am embedded in a program with a 2-state logical contradiction", that would be a better answer.
I think this is also Russell's paradox in set theory, and it might be related to Gödel's incompleteness theorems. Basically the theme seems to be "self-referential statements break our languages", and my feeling is just "create your languages so they can talk about the nature of self-reference! You're creating a problem where there does not need to be a problem"... but obviously I haven't fully understood the ideas or thought through all the implications, so it would be rash for me to make such a statement; still, it is the feeling I get.