Yeah! I think distance, direction, and position (not topology) are at least locally important in semantic spaces, if not globally important, but continuity and connectedness (yes topology) are probably important for understanding the different semantic regions, especially since so much of what neural nets seem to do is warp the space in ways that wouldn't change anything about it from a topological perspective!
subspaces of it where meaningful information might actually be stored
At least for vanilla networks, the input can be embedded into higher dimensions or projected into lower dimensions, so you're only ever really throwing away information, which I think is an interesting perspective when thinking about the idea that meaningful information would be stored in different subspaces. It feels to me more like specific kinds of data points (inputs) which had specific locations in the input distribution would, if you projected their activations for some layer into some subspace, tell you something about that input. But whatever it tells you was already in the semantic topology of the input distribution; it just needed to be transformed geometrically before you could see it with a simple projection to a subspace.
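To make "projecting activations into a subspace" concrete, here's a minimal numpy sketch; the layer, the activations, and the choice of subspace directions are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are the activations of some layer for 100 inputs, 512 dims each.
activations = rng.normal(size=(100, 512))

# A hypothetical 2D subspace, spanned by two direction vectors we happen to care about.
directions = rng.normal(size=(512, 2))
basis, _ = np.linalg.qr(directions)  # orthonormalize the spanning directions

coords = activations @ basis      # each input's coordinates within the subspace (100 x 2)
projected = coords @ basis.T      # those coordinates mapped back into the full space (100 x 512)
```

Whether those 2D coordinates tell you anything meaningful about the inputs depends entirely on how the earlier layers have already warped the input distribution, which is the point above.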
Interesting. Some thoughts:
AI alignment has been getting so much bigger as a field! It's encouraging, but we still have a long way to go imo.
Did you see Shallow review of technical AI safety, 2025? I'd recommend looking through that post or their shallow review website and finding something that seems interesting and starting there. Each sub-domain has its own set of jargon and assumptions, so I wouldn't worry too much about trying to learn the foundations, since we don't have a common set of foundations yet.
Just reading posts isn't bad, but since there isn't that common set of foundations, it could be confusing when you're just starting out (or even when you're quite experienced).
Good luck and glad to have you!
I sadly do not have the capacity to learn about the majority of it.
Sadly, it's a problem you share with me and most humans, I think, with possible rare exceptions like Paul Erdős.
I'll try to build up a quick sketch of what the residual stream is; forgive me if, for brevity, I say things that are basic, obtuse, or slightly wrong.
All neural networks (NNs) are built from linear transformations/maps, which in NN jargon are called "weights", and non-linear maps called "activation functions". The outputs of the activation functions are called "activations". There are also special kinds of maps and operations depending on the "architecture" of the NN (eg: convNet, resNet, LSTM, Transformer).
A vanilla NN is just a series of "layers" consisting of a linear map and then an activation function.
The activation functions are not complicated nonlinear maps; they're quite simple to understand. One of the most common, ReLU, can be understood as "for all vectors, leave positive components alone and set negative components to 0", or "project all negative orthants onto the 0 hyperplane". Since most of the complex behaviour of NNs comes from the interplay of the linear maps and these simple nonlinear maps, linear algebra is a very foundational tool for understanding them.
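As a toy sketch of a two-layer vanilla NN in that sense (all sizes and weights made up, no biases or training, just the shape of the computation):

```python
import numpy as np

def relu(x):
    # Leave positive components alone, set negative components to 0.
    return np.maximum(x, 0)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 8))   # "weights": a linear map from 8 dims to 16 dims
W2 = rng.normal(size=(4, 16))   # a second linear map from 16 dims to 4 dims

x = rng.normal(size=8)          # an input vector
h = relu(W1 @ x)                # the first layer's activations
y = W2 @ h                      # output of the final linear map
```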
The transformer architecture is the fanciest new architecture; it forms the foundation of modern LLMs, which act as the "general pretrained network" for products such as ChatGPT. The architecture is a series of "transformer blocks", each of which has a stack of "attention heads" (still matrix transformations, just set up in a special way) followed by a vanilla NN.
The output of each transformer block is summed with the input to use as the input for the next transformer block. The input is called a "residual" from the terminology of resNets. So the transformer block can be thought of as "reading from" and "writing to" a "stream" of residuals passed along from one transformer block to the next like widgets on a conveyor belt, each worker doing their one operation and then letting the widget pass to the next worker.
For a language model, the input to the first transformer block is a sequence of token embeddings representing some sequence of natural language text. The output of the last transformer block is a sequence of predictions for what the next token will be based on the previous ones. So I imagine the residual stream as a high dimensional semantic space, with each transformer block applying linear transformations and limited nonlinear transformations to that space to take the semantics from "sequence of words" to "likely next word".
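Here's a rough sketch of that conveyor-belt picture in numpy; the block internals are placeholders, so only the read-from / add-back-to-the-residual pattern is meant to be accurate:

```python
import numpy as np

seq_len, d_model, n_blocks = 10, 512, 4
rng = np.random.default_rng(0)
residual = rng.normal(size=(seq_len, d_model))  # the token embeddings start the stream

def attention_heads(x):
    return 0.1 * x                  # placeholder for the stack of attention heads

def vanilla_nn(x):
    return 0.1 * np.maximum(x, 0)   # placeholder for the block's vanilla NN

for _ in range(n_blocks):
    # Each transformer block "reads" the stream and "writes" its output back by addition.
    residual = residual + attention_heads(residual)
    residual = residual + vanilla_nn(residual)

# A final linear map ("unembedding") would turn each position's residual vector
# into scores over the vocabulary for the next token.
```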
I am interested in understanding those semantic spaces and think linear algebra, topology, and manifolds are probably good perspectives.
This is good context to have. If it is a Schelling point on LW that's probs a good enough reason to choose it as the term to adopt, although some consideration might be warranted for its adoption in wider communities, but I can't think of any other term that would work better for that.
For those who insist these aren't intelligent I like my extremely general term "Outcome Influencing System (OIS)" pronounced "oh-ee" and defined as "any system composed of capabilities and preferences which uses its capabilities informed by its preferences to influence reality towards outcomes it prefers". Then it becomes trivially true that these things have capabilities and the capabilities are improving.
What do you think of "Locked In AI (LIAI)" for when an AI becomes sufficiently capable that its preferences / utility function are "locked in" and can no longer be altered or avoided by other agents? This "locking in" is how I refer to the theoretical point during recursive self-improvement (RSI) when it becomes too late to stop or alter course.
Also, for what it's worth, I like "artificial general super intelligence" (AGSI), which then frees up "AGI" for AI that does general reasoning and language at any level of capability, and hilariously frees up "ASI" to refer to any AI that does what it does better than any human, so a pocket calculator is an ASI because it does arithmetic better than any human. Though more confusing, LLMs would be ASI and not AGI because they are superhuman at text prediction, while chatbots made from LLMs would be AGI and not ASI because they reason and talk with general intelligence, but in ways that seem more limited than human reasoning.
I like this practice of pointing out things that need names and attempting to name them. Good stuff!
To rephrase the definition pointed to by "neuro-scaffold" to see if I understood, it is "an integration of ML models and non-ML computer programs that creates nontrivial capabilities beyond those of the ML model or computer program"?
Naively I would refer to this as an "ML deployment" but the "... nontrivial capabilities beyond..." aspect is important and not implied by "ML deployment", so "ML integration" might be better, but both are clunky and "ML" can refer to many data science and AI techniques other than neural nets, so I think we're stuck with the "neuro" terminology. Although, I think I would prefer if people called them "multi-layer perceptrons" to disambiguate them from the biological neurons they were inspired by. "Artificial neural networks" would also be an improvement. "MLP" or "ANN".
I think I dislike "scaffold" because it implies a temporary structure used for building or repairing another structure, and I don't think that represents the programs the ANNs are integrated with well. The program might be temporary, but it might not be. So it could perhaps be called an "integrated ANN system" or "integrated MLP system", or, acronymized, an "IANN" or "IMLP". But these suggestions seem clunky. They don't seem as easy to say or understand as "neuro-scaffold", so "neuro-scaffold" is probably a better term despite the issues I have with the words "neuro" and "scaffold".
"Wuckles" means "I am surprised and confused, and, it's kinda interesting that I'm surprised and confused."
[...] if you were gonna say "wha?" because you were a bit confused. But, something about it felt silly enough to warrant adding an 'uckles' to the end of it.
[...] wanted to say "huh" but something felt interesting about the huh and I wanted to draw extra attention to it.
I like this style of language creation, but I'm a bit confused. It could be that:
Or maybe it's some combination or amalgamation of those meanings?
I really like this. I think there are two different vibes here that I like: (1) evaluation of arguments should compare them to other possible arguments within some reference class rather than being based solely on the properties of the argument itself, and (2) more tangentially, the truth value of a system of statements need not just be "true" or "false", "consistent" or "inconsistent".
I don't really have much more to say about (1) other than that I liked your explanations of the loopy blue universe and tessellating beetles. Both are absurd, but have explanatory power. I would like to explore more about the relationship of arguments to their reference classes and the structure of possible argument reference classes.
I like thinking about (2) as digraphs where each node is a statement and each arrow is a reference to another statement; an assignment of truth values to the statements then either makes the graph consistent or not. Some graphs may have a single consistent assignment, some may have multiple, but there is more to be said about inconsistent ones than merely that they are inconsistent. If at each iteration you "updated" the truth value for the node corresponding to each inconsistent arrow, you would get patterns of propagation based on the starting state. Further, an obvious way to force an inconsistent graph is to have cycles with an odd number of "bit flip" arrows, arrows which assert that their referent is false (and so the asserting statement must itself be false if the referent is true). Then you can ask about the other possible structures which introduce forced inconsistency.
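Here's a rough sketch of that update process in Python (the representation and update rule are just one choice among many; the example edges form a three-cycle with a single bit-flip arrow, i.e. a liar-paradox-like structure that forces inconsistency):

```python
from itertools import product

# Each statement is a node; an arrow (src, dst, flip) means "src asserts dst"
# (flip=False) or "src asserts NOT dst" (flip=True). These edges are made up:
# a three-cycle with one bit-flip arrow.
edges = [("A", "B", False), ("B", "C", False), ("C", "A", True)]
nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})

def satisfied(assignment, src, dst, flip):
    # An arrow is satisfied when the source's value matches what it asserts.
    return assignment[src] == (assignment[dst] ^ flip)

def consistent(assignment):
    return all(satisfied(assignment, *edge) for edge in edges)

# Enumerate every assignment of truth values and keep the consistent ones.
consistent_assignments = []
for values in product([True, False], repeat=len(nodes)):
    assignment = dict(zip(nodes, values))
    if consistent(assignment):
        consistent_assignments.append(assignment)
print(consistent_assignments)  # [] -- the odd number of flips forces inconsistency

def step(assignment):
    # One "update": every source node with a violated arrow re-derives its value
    # from its referent, all at once.
    new = dict(assignment)
    for src, dst, flip in edges:
        if not satisfied(assignment, src, dst, flip):
            new[src] = assignment[dst] ^ flip
    return new

state = {"A": True, "B": True, "C": True}
for _ in range(6):
    state = step(state)
    print(state)  # the violation propagates around the cycle instead of settling
```

Running it shows there is no consistent assignment, and the synchronous update never settles; the "false" value chases itself around the cycle, which is the kind of propagation pattern I mean.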
(I'm sure people have focused on this, but I'm not sure what it would be called...)
One application for this might be that if you could throw out all inconsistency-forcing structures, you could (perhaps) find a subset of programs for which the halting problem doesn't apply.