The Vibes of Mathematics:
Q: What is it like to understand advanced mathematics? Does it feel analogous to having mastery of another language like in programming or linguistics?
A: It's like being stranded on a tropical island where all your needs are met, the weather is always perfect, and life is wonderful.
Except nobody wants to hear about it at parties.
Vibes of Maths: Convergence and Divergence
level 0: A state of ignorance. You live in a pre-formal mindset. You don't know how to formalize things. You don't even know what it would mean 'to prove something mathematically'. This is perhaps the longest stage. It is the default state of a human. Most anti-theory sentiment comes from this state.
You can't productively read math books. You often decry that mathematicians make their books way too hard to read. If only they would take the time to explain things simply, you would understand.
level 1: all math is an amorphous blob
You know the basics of writing an epsilon-delta proof. Although you don't know why the rules of maths are the way they are, you can at least follow the recipes. You can follow simple short proofs, albeit slowly.
You know there are different areas of mathematics from the unintelligible names in the tables of contents of yellow books. They all sound kinda the same to you, however.
If you are particularly predisposed to Philistinism you think your current state of knowledge is basically the extent of human knowledge. You will probably end up doing machine learning.
level 2: maths fields diverge
You've come so far. You've been seriously studying mathematics for several years now. You are proud of yourself and amazed how far you've come. You sometimes try to explain math to laymen and are amazed to discover that what you find completely obvious now is complete gibberish to them.
The more you know, however, the more you realize what you don't know. Every time you complete a course you realize it is only scratching the surface of what is out there.
You start to understand that when people talk about concepts in an informal, pre-mathematical way an enormous amount of conceptual issues are swept under the rug. You understand that 'making things precise' is actually very difficult.
Different fields of math are now clearly differentiated. The topics and issues that people talk about in algebra, analysis, topology, dynamical systems, probability theory etc. differ wildly from each other. Although there are occasional connections and some core concepts that are used all over, on the whole specialization is the norm. You realize there is no such thing as a 'mathematician': there are logicians, topologists, probability theorists, algebraists.
Actually it is way worse: just within logic there are modal logicians, set theorists, constructivists, linear logicians, programming language people and game semanticists.
Often these people will be almost as confused as a layman when they walk into a talk that is supposedly in their field but actually a slightly different subspecialization.
level 3: Galactic Brain of Percolative Convergence
As your knowledge of mathematics grows, you achieve the Galactic Brain take level of percolative convergence: the different fields of mathematics are actually highly interrelated - the connections percolate to make mathematics one highly connected component of knowledge.
You are no longer surprised on a meta level to see disparate fields of mathematics having unforeseen & hidden connections - but you still appreciate them.
You resist the reflexive impulse to divide mathematics into useful & not useful - you understand that mathematics is in the fullness of Platonic comprehension one unified discipline. You've taken a holistic view on mathematics - you understand that solving the biggest problems requires tools from many different toolboxes.
I say that knowing particular kinds of math, the kind that let you model the world more-precisely, and that give you a theory of error, isn't like knowing another language. It's like knowing language at all. Learning these types of math gives you as much of an effective intelligence boost over people who don't, as learning a spoken language gives you above people who don't know any language (e.g., many deaf-mutes in earlier times).
The kinds of math I mean include:
how to count things in an unbiased manner; the methodology of polls and other data-gathering
how to actually make a claim, as opposed to what most people do, which is to make a claim that's useless because it lacks quantification or quantifiers
A good example of this is the claims in the IPCC 2015 report that I wrote some comments on recently. Most of them say things like, "Global warming will make X worse", where you already know that OF COURSE global warming will make X worse, but you only care how much worse.
More generally, any claim of the type "All X are Y" or "No X are Y", e.g., "Capitalists exploit the working class", shouldn't be considered claims at all, and can accomplish nothing except foment arguments.
the use of probabilities and error measures
probability distributions: flat, normal, binomial, poisson, and power-law
entropy measures and other information theory
predictive error-minimization models like regression
statistical tests and how to interpret them
These things are what I call the correct Platonic forms. The Platonic forms were meant to be perfect models for things found on earth. These kinds of math actually are. The concept of "perfect" actually makes sense for them, as opposed to for Earthly categories like "human", "justice", etc., for which believing that the concept of "perfect" is coherent demonstrably drives people insane and causes them to come up with things like Christianity.
They are, however, like Aristotle's Forms, in that the universals have no existence on their own, but are (like the circle, but even more like the normal distribution) perfect models which arise from the accumulation of endless imperfect instantiations of them.
There are plenty of important questions that are beyond the capability of the unaided human mind to ever answer, yet which are simple to give correct statistical answers to once you know how to gather data and do a multiple regression. Also, the use of these mathematical techniques will force you to phrase the answer sensibly, e.g., "We cannot reject the hypothesis that the average homicide rate under strict gun control and liberal gun control are the same with more than 60% confidence" rather than "Gun control is good."
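To make the 'quantified claim' point concrete, here is a minimal sketch in Python (not from the original post; all numbers, variable names and the data-generating process are invented for illustration): fit a multiple regression on synthetic data and report a coefficient with its uncertainty, which is the kind of statement being asked for instead of "X makes Y worse".

```python
# Hypothetical example: quantify an effect instead of asserting "X makes Y worse".
# The data-generating process and all numbers below are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500

strictness = rng.uniform(0, 1, n)          # hypothetical "strictness of gun control" (0-1)
income = rng.normal(50, 10, n)             # hypothetical median income (thousands)
noise = rng.normal(0, 2, n)
homicide_rate = 6.0 - 1.0 * strictness - 0.03 * income + noise   # assumed process

# Multiple regression via least squares: homicide_rate ~ 1 + strictness + income
X = np.column_stack([np.ones(n), strictness, income])
beta, *_ = np.linalg.lstsq(X, homicide_rate, rcond=None)

# Standard errors, t-statistics and p-values for each coefficient.
residuals = homicide_rate - X @ beta
dof = n - X.shape[1]
sigma2 = residuals @ residuals / dof
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))
t_vals = beta / se
p_vals = 2 * stats.t.sf(np.abs(t_vals), dof)

print(f"strictness coefficient: {beta[1]:.2f} ± {se[1]:.2f}, p = {p_vals[1]:.3f}")
# A sensible claim reads like "a one-unit increase in strictness is associated with
# a change of about -1 in the homicide rate, with the stated uncertainty",
# not "gun control is good" or "gun control is bad".
```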
Roko's basilisk is a thought experiment which states that an otherwise benevolent artificial superintelligence (AI) in the future would be incentivized to create a virtual reality simulation to torture anyone who knew of its potential existence but did not directly contribute to its advancement or development.
Why Roko's basilisk probably doesn't work for simulation fidelity reasons:
Roko's basilisk threatens to simulate and torture you in the future if you don't comply. Simulation cycles cost resources. Instead of following through on torturing our would-be Cthulhu worshipper, it could spend those resources on something else.
But wait, can't it use acausal magic to precommit to follow through? No.
Acausal arguments only work in situations where agents can simulate each other with high fidelity. Roko's basilisk can simulate the human but not the other way around! The human's simulation of Roko's basilisk is very low fidelity - in particular Roko's Basilisk is never confused about whether or not it is being simulated by a human - it knows for a fact that the human is not able to simulate it.
I thank Jan P. for coming up with this argument.
"Acausal arguments only work in situations where agents can simulate each other with high fidelity."
If the agents follow simple principles, it's simple to simulate those principles with high fidelity, without simulating each other in all detail. The obvious guide to the principles that enable acausal coordination is common knowledge of each other, which could be turned into a shared agent that adjudicates a bargain on their behalf.
I have always taken Roko's Basilisk to be the threat that the future intelligence will torture you, not a simulation, for not having devoted yourself to creating it.
How do you know you are not in a low fidelity simulation right now? What could you compare it against?
"I dreamed I was a butterfly, flitting around in the sky; then I awoke. Now I wonder: Am I a man who dreamt of being a butterfly, or am I a butterfly dreaming that I am a man?"- Zhuangzi
Questions I have that you might have too:
why are we here?
why do we live in such an extraordinary time?
Is the simulation hypothesis true? If so, is there a base reality?
Why do we know we're not a Boltzmann brain?
Is existence observer-dependent?
Is there a purpose to existence, a Grand Design?
What will be computed in the Far Future?
In this shortform I will try and write the loopiest most LW anthropics memey post I can muster. Thank you for reading my blogpost.
Is this reality? Is this just fantasy?
The Simulation hypothesis posits that our reality is actually a computer simulation run in another universe. We could imagine this outer universe is itself being simulated in an even more ground universe. Usually, it is assumed that there is a ground reality. But we could also imagine it is simulators all the way down - an infinite nested, perhaps looped, sequence of simulators. There is no ground reality. There are only infinitely nested and looped worlds simulating one another.
I call it the weak Zhuangzi hypothesis.
Alternatively, if you are less versed in the classics, you can think of one of those Nolan films.
Why are we here?
If you are reading this, not only are you living at the Hinge of History, the most important century perhaps even decade of human history, you are also one of a tiny percentage of people that might have any causal influence over the far-flung future through this bottleneck (also one of a tiny group of people who are interested in whacky acausal stuff, so who knows).
This is fantastically unlikely. There are 8 billion people in the world - there have been about 100 billion people up to this point in history. There is room for a trillion billion million trillion quadrillion etc. intelligent beings in the future. If a civilization hits the top of the tech tree, which human civilization would seem to do within a couple hundred years, tops a couple thousand, it would almost certainly spread through the universe in the blink of an eye (cosmologically speaking, that is). Yet you find yourself here. Fantastically unlikely.
Moreover, for the first time in human history the choices made now by (a small subset of) humans in how to build AGI will reverberate into the Far Future.
The Far Future
In the far future the universe will be tiled with computronium controlled by superintelligent artificial intelligences. The amount of possible compute is dizzying. Which takes us to the chief question:
What will all this compute compute?
Paradises of sublime bliss? Torture dungeons? Large language models dreaming of paperclips unending?
Do all possibilities exist?
What makes a possibility 'actual'? We sometimes imagine possible worlds as being semi-transparent while the actual world is in vibrant color somehow. Of course that is silly.
We could say: The actual world can be seen. This too is silly - what you cannot see can still exist surely.[1] Then perhaps we should adhere to a form of modal realism: all possible worlds exist!
Philosophers have made various proposals for modal realism - perhaps most famously David Lewis, but of course this is a very natural idea that loads of people have had. In the rationality sphere a particularly popular proposal is Tegmark's classification into four different levels of modal realism. The top level, Tegmark IV, is the collection of all self-consistent structures, i.e. mathematics.
A Measure of Existence and Boltzmann Brains
Which leads to a further natural question: can some worlds exist 'more' than others?
This seems metaphysically dubious - what does it even mean for a world to be more real than another?
Metaphysically dubious, but it finds support in the Many Worlds Interpretation of Quantum Mechanics. It also seems like one of the very few sensible solutions to the Boltzmann Brain problem. Further support for this can be found in: Anthropic Decision Theory, InfraBayesian Physicalism; see also my shortform on the Nature of the Soul.
Metaphysically, we could argue probabilistically: worlds that 'exist more', in whatever framework, are the ones we should expect to encounter more often.
The exact nature of the Measure of Existence is not so important - let us for now assume there is some sensible notion of measure of existence.
from wikipedia: "A causal loop is a theoretical proposition, wherein by means of either retrocausality or time travel, an event (an action, information, object, or person)[1][2] is among the causes of another event, which is in turn among the causes of the first-mentioned event.[3][4] Such causally looped events then exist in spacetime, but their origin cannot be determined.[1][2] A hypothetical example of a causality loop is given of a billiard ball striking its past self: the billiard ball moves in a path towards a time machine, and the future self of the billiard ball emerges from the time machine before its past self enters it, giving its past self a glancing blow, altering the past ball's path and causing it to enter the time machine at an angle that would cause its future self to strike its past self the very glancing blow that altered its path. In this sequence of events, the change in the ball's path is its own cause, which might appear paradoxical."
Self-consistent causal loops seem to make sense in general relativity - see the paragraph on wormholes in the linked wikipedia. Even as they have never been observed and might strike us as bizarre, these causal loops don't seem to violate physical principles.
Immanence of Divine Man
Why are we here? Causa Sui, the Prime Mover and the Final Cause
We exist because we are the final cause of existence. We live in an ancestor simulation run on the vast computing resources of the Far Future. A future AGI is running the simulation because it is retrocausally committed to its own existence. This entire reality is a piece of a causal loop that extends through time and space, basement universes and possibly parallel universes as well.
Why do we live in such an extraordinary time?
We live at the Hinge of History because at this point in time actions have the most influence on the far future, and hence are the most important to simulate.
Is the Simulation Hypothesis True?
Yes. But it might be best for us to doubt it.
We live in such an extraordinary time because those parts of existence that matter most causally are the most important to simulate.
Are you a Boltzmann Brain?
No. A Boltzmann brain is not part of a self-justifying causal loop.
Is existence observer-dependent?
Existence is observer-dependent in a weak sense - only those things are likely to be observed that can be observed by self-justifying, self-sustaining observers in a causal loop. Boltzmann brains in the far reaches of infinity are assigned vanishing measure of existence because they do not partake in a self-sustaining causal loop.
Is there a purpose to existence, a Grand Design?
Yes.
What will and has been computed in the Far Future?
You and Me.
Or perhaps not. Existence is often conceived as an absolute property. If we think of existence as relative - perhaps a black hole is a literal hole in reality and passing through the event horizon very literally erases your flicker of existence.
In this shortform I will try and write the loopiest most LW anthropics memey post I can muster.
In this comment I will try and write the most boring possible reply to these questions. 😊 These are pretty much my real replies.
why are we here?
"Ours not to reason why, ours but to do or do not, there is no try."
why do we live in such an extraordinary time?
Someone must. We happen to be among them. A few lottery tickets do win, owned by ordinary people who are perfectly capable of correctly believing that they have won. Everyone should be smart enough to collect on a winning ticket, and to grapple with living in interesting (i.e. low-probability) times. Just update already.
Is the simulation hypothesis true? If so, is there a base reality?
It is false. This is base reality. But I can still appreciate Eliezer's fiction on the subject.
Why do we know we're not a Boltzmann brain?
The absurdity heuristic. I don't take BBs seriously.
Is existence observer-dependent?
Even in classical physics there is no observation without interaction. Beyond that, no, however many quantum physicists interpret their findings to the public with those words, or even to each other.
Is there a purpose to existence, a Grand Design?
Not that I know of. (This is not the same as a flat "no", but for most purposes rounds off to that.)
What will be computed in the Far Future?
Either nothing in the case of x-risk, nothing of interest in the case of a final singleton, or wonders far beyond our contemplation, which may not even involve anything we would recognise as "computing". By definition, I can't say what that would be like, beyond guessing that at some point in the future it would stand in a similar relation to the present that our present does to prehistoric times. Look around you. Is this utopia? Then that future won't be either. But like the present, it will be worth having got to.
Consider a suitable version of The Agnostic Prayer inserted here against the possibility that there are Powers Outside the Matrix who may chance to see this. Hey there! I wouldn't say no to having all the aches and pains of this body fixed, for starters. Radical uplift, we'd have to talk about first.
Ambiguous Counterfactuals
[Thanks to Matthias Georg Mayer for pointing me towards ambiguous counterfactuals]
Salary is a function of eXperience and Education
S=aE+bX
We have a candidate C with given salary, experience (X=5) and education (E=5).
Their current salary is given by
S=a⋅5+b⋅5
We'd like to consider the counterfactual where they didn't have the education (E=0). How do we evaluate their salary in this counterfactual?
This is slightly ambiguous - there are two counterfactuals:
E=0,X=5 or E=0,X=10
In the second counterfactual, we implicitly had an additional constraint X+E=10, representing the assumption that the candidate would have spent their time either in education or working. Of course, in the real world they could also have frittered their time away playing video games.
One can imagine that there is an additional variable: do they live in a poor country or a rich country. In a poor country if you didn't go to school you have to work. In a rich country you'd just waste it on playing video games or whatever. Informally, we feel in given situations one of the counterfactuals is more reasonable than the other.
Coarse-graining and Mixtures of Counterfactuals
We can also think of this from a renormalization / coarse-graining story. Suppose we have a (mix of) causal models coarse-graining a (mix of) causal models. At the bottom we have the (mix of? Ising models!) causal model of physics, e.g. in electromagnetism the Green functions give us the intervention responses to adding sources to the field.
A given counterfactual at the macrolevel can now have many different counterfactuals at the microlevel. This means we actually get a probability distribution of likely counterfactuals at the top level, e.g. in 1/3 of the cases the candidate actually worked the 5 years they didn't go to school, and in 2/3 of the cases the candidate just wasted them playing video games.
The outcome of the counterfactual S_{E=0} is then not a single number but a distribution
S_{E=0} = 5⋅b + Y⋅5⋅b
where Y is a random variable following a Bernoulli distribution with bias 1/3.
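A minimal sketch of this mixture-of-counterfactuals idea, assuming the 1/3 worked / 2/3 video games split from the text and arbitrary made-up coefficients a, b:

```python
# Toy sketch of the mixture of counterfactuals above. The coefficients a, b and
# the 1/3 probability are assumptions for illustration.
import numpy as np

a, b = 2.0, 1.0          # salary weights for Education (a) and eXperience (b), assumed
p_worked = 1.0 / 3.0     # probability the candidate would have worked instead of gaming

rng = np.random.default_rng(0)
n_samples = 10_000

# Microlevel counterfactual: with prob 1/3 the candidate worked (X=10),
# otherwise they played video games (X=5). In both cases E=0.
worked = rng.random(n_samples) < p_worked
X_cf = np.where(worked, 10.0, 5.0)
S_cf = a * 0.0 + b * X_cf            # counterfactual salary S_{E=0}

print("E[S_{E=0}] =", S_cf.mean())   # ≈ 5b + (1/3)·5b ≈ 6.67 for b = 1
print("support:", np.unique(S_cf))   # {5b, 10b}
```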
Insights as Islands of Abductive Percolation?
I've been fascinated by this beautiful paper by Viteri & DeDeo.
What is a mathematical insight? We feel intuitively that proving a difficult theorem requires discovering one or more key insights. Before we get into what the DeDeo-Viteri paper has to say about (mathematical) insights let me recall some basic observations on the nature of insights:
(see also my previous shortform)
There might be a unique decomposition, akin to prime factorization. Alternatively, there might be many roads to Rome: some theorems can be proved in many different ways.
There are often many ways to phrase an essentially similar insight. These different ways of naming things feel 'inessential'. Different labelings should be easily convertible into one another.
By looping over all possible programs all proofs can eventually be found, so the notion of an 'insight' has to fundamentally be about feasibility.
Previously, I suggested a required insight is something like a private key to a trapdoor function. Without the insight you are facing an infeasibly large task. With it, you can suddenly easily solve a whole host of new tasks/problems.
Insights may be combined in (arbitrarily?) complex ways.
When are two proofs essentially different?
Some theorems can be proved in many different ways - different, that is, in the informal sense. It isn't immediately clear how to make this more precise.
We could imagine there is a whole 'homotopy' theory of proofs, but before we do so we need to understand when two proofs are essentially the same or essentially different.
On one end of the spectrum, proofs can just be syntactically different but we feel they have 'the same content'.
We can think type-theoretically, and say two proofs are the same when their denotations (normal forms) are the same. This is obviously better than just asking for syntactic equality or apartness. It does mean we'd like some sort of intuitionistic/type-theoretic foundation, since a naive classical foundation makes all normal forms equivalent.
We can also look at what assumptions are made in the proof, i.e. one of the proofs might use the Axiom of Choice while the other does not. An example is the famous nonconstructive proof that there exist irrational a, b with a^b rational, which turns out to have a constructive proof as well.
If we consider proofs as functorial algorithms we can use mono-anabelian transport to distinguish them in some cases. [LINK!]
We can also think homotopy type-theoretically and ask when two terms of a type are equal in the HoTT sense.
With the exception of the mono-anabelian transport one, all these suggestions don't go deep enough; they're too superficial.
Phase transitions and insights, Hopfield Networks & Ising Models
(See also my shortform on Hopfield Networks / Ising models as mixtures of causal models.)
Modern ML models famously show some sort of phase transitions in understanding. People have been especially fascinated by the phenomenon of 'grokking', see e.g. here and here. It suggests we think of insights in terms of phase transitions, critical points etc.
Dedeo & Viteri have an ingenious variation on this idea. They consider a collection of famous theorems and their proofs formalized in a proof assistant.
They then imagine these proofs as a giant directed graph and consider a Boltzmann distribution on it (so we are really dealing with an Ising model / Hopfield network here). We think of this distribution as a measure of 'trust': both trust in propositions (nodes) and trust in inferences (edges).
We show that the epistemic relationship between claims in a mathematical proof has a network structure that enables what we refer to as an epistemic phase transition (EPT): informally, while the truth of any particular path of argument connecting two points decays exponentially in force, the number of distinct paths increases. Depending on the network structure, the number of distinct paths may itself increase exponentially, leading to a balance point where influence can propagate at arbitrary distance (Stanley, 1971). Mathematical proofs have the structure necessary to make this possible. In the presence of bidirectional inference—i.e., both deductive and abductive reasoning—an EPT enables a proof to produce near-unity levels of certainty even in the presence of skepticism about the validity of any particular step. Deductive and abductive reasoning, as we show, must be well-balanced for this to happen. A relative over-confidence in one over the other can frustrate the effect, a phenomenon we refer to as the abductive paradox
The proofs of these famous theorems break up into 'abductive islands'. They have a natural modular structure given by lemmas.
EPTs are a double-edged sword, however, because disbelief can propagate just as easily as truth. A second prediction of the model is that this difficulty—the explosive spread of skepticism—can be ameliorated when the proof is made of modules: groups of claims that are significantly more tightly linked to each other than to the rest of the network.
(...) When modular structure is present, the certainty of any claim within a cluster is reasonably isolated from the failure of nodes outside that cluster.
One could hypothesize that insights might correspond somehow to these islands.
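Here is a toy numerical illustration of the "more independent paths of argument → more certainty" effect (my own sketch under simplifying assumptions, not the DeDeo-Viteri model itself): binary trust variables on a small proof graph, ferromagnetic couplings along inference edges, a field pinning the axiom node to "trusted", and exact enumeration of the Boltzmann distribution.

```python
# Toy sketch: trust in the theorem node under a Boltzmann distribution over
# binary "trust" assignments on a small proof graph. Not the paper's model;
# couplings J, field h and temperature T are arbitrary illustrative choices.
import itertools
import numpy as np

def theorem_trust(edges, n_nodes, axiom=0, theorem=3, J=1.0, h=2.0, T=1.0):
    weights, trusted = [], []
    for spins in itertools.product([0, 1], repeat=n_nodes):
        # Energy: reward agreement along inference edges, pin the axiom towards "trusted".
        energy = -J * sum(1.0 for i, j in edges if spins[i] == spins[j])
        energy += -h * spins[axiom]
        weights.append(np.exp(-energy / T))
        trusted.append(spins[theorem])
    weights = np.array(weights)
    return float(np.dot(weights, trusted) / weights.sum())

# One path of argument: axiom(0) -> 1 -> 2 -> theorem(3)
chain = [(0, 1), (1, 2), (2, 3)]
# Two independent paths: the chain above plus axiom(0) -> 4 -> 5 -> theorem(3)
two_paths = chain + [(0, 4), (4, 5), (5, 3)]

print("single path :", theorem_trust(chain, 4))
print("two paths   :", theorem_trust(two_paths, 6))
# With the same per-step coupling, the theorem inherits more of the axiom's
# trust when more distinct paths of argument connect them.
```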
Final thoughts
I like the idea that a mathematical insight might be something like an island of deductively & abductively tightly clustered propositions.
Some questions:
How does this fit into the 'Natural Abstraction' hypothesis - especially sufficient statistics?
EDIT: The separation property of Ludics, see e.g. here, points towards the point of view that proofs can be distinguished exactly by suitable (counter)models.
Evidence Manipulation and Legally Admissible Evidence
[This was inspired by Kokotaljo's shortform on comparing strong with weak evidence]
In the real world the weight of many pieces of weak evidence is not always comparable to a single piece of strong evidence. The important variable here is not strong versus weak per se but the source of the evidence. Some sources of evidence are easier to manipulate in various ways. Evidence manipulation, either conscious or emergent, is common and a large obstacle to truth-finding.
Consider aggregating many (potentially biased) sources of evidence versus direct observation. These are not directly comparable and in many cases we feel direct observation should prevail.
This is especially poignant in the court of law: the very strict rules around presenting evidence are a culturally evolved mechanism to defend against evidence manipulation. Evidence manipulation may be easier for weaker pieces of evidence - see the prohibition against hearsay in legal contexts for instance.
It is occasionally suggested that courts of law should do more probabilistic and Bayesian types of reasoning. One reason courts refuse to do so (apart from more Hansonian reasons around elites cultivating conflict suppression) is that naive Bayesian reasoning is extremely susceptible to evidence manipulation.
In other cases, like medicine, many people argue that direct observation should be ignored ;)
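A toy sketch of that susceptibility (my own illustration; all likelihood ratios are made up): a naive Bayesian aggregator that treats every piece of evidence as independent lets many cheap, weak, manufactured pieces swamp one strong, hard-to-fake observation.

```python
# Toy sketch: naive Bayesian log-odds aggregation is fragile to manufactured
# weak evidence. All likelihood ratios below are hypothetical.
import math

def posterior_log_odds(prior_log_odds, likelihood_ratios):
    # Naive Bayes: assume every piece of evidence is independent.
    return prior_log_odds + sum(math.log(lr) for lr in likelihood_ratios)

prior = 0.0                  # 50/50 prior, log-odds 0
strong_against = [1 / 20]    # one strong, hard-to-fake observation against the claim (1:20)
weak_for = [2.0] * 20        # twenty manufactured weak pieces for the claim, each only 2:1

log_odds = posterior_log_odds(prior, strong_against + weak_for)
prob = 1 / (1 + math.exp(-log_odds))
print(f"posterior log-odds: {log_odds:.2f}, P(claim) ≈ {prob:.3f}")
# 20·log(2) ≈ 13.9 nats "for" vs log(20) ≈ 3.0 nats "against": the manufactured
# weak evidence dominates unless the aggregator models the common (manipulated)
# source of the weak evidence rather than treating it as independent.
```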
Imagine a data stream
..., X_{-3}, X_{-2}, X_{-1}, X_0, X_1, X_2, X_3, ...
assumed infinite in both directions for simplicity. Here X_0 represents the current state (the "present"), while ..., X_{-3}, X_{-2}, X_{-1} represents the past and X_1, X_2, X_3, ... represents the future.
Predictable Information versus Predictive Information
Predictable information is the maximal information (in bits) that you can derive about the future given access to the past. Predictive information is the number of bits that you need from the past to make that optimal prediction.
Suppose you are faced with the question of whether to buy, hold or sell Apple. There are three options, so maximally log2(3) bits of information. Not all of that information might be contained in the past: there is a certain amount of irreducible uncertainty (entropy) about the future no matter how well you can infer the past. Think of freak events & black swans like pandemics, wars, unforeseen technological breakthroughs, or just cumulative aggregated noise in consumer preferences. Suppose that irreducible uncertainty is half of log2(3), leaving us with (1/2)·log2(3) bits of (theoretically) predictable information.
To a certain degree, it might be predictable in theory to what degree buying Apple stock is a good idea. To do so, you may need to know many things about the past: Apple's earnings records, the position of competitors, general trends of the economy, understanding of the underlying technology & supply chains, etc. The total sum of this information is far larger than (1/2)·log2(3).
To actually do well on the stock market you additionally need to do this better than the competition - a difficult task! The predictable information is quite small compared to the predictive information.
Note that predictive information is always at least as large as predictable information: you need at least k bits from the past to predict k bits of the future. Often it is much larger.
Mathematical details
Predictable information is also called 'apparent stored information' or, commonly, 'excess entropy'.
It is defined as the mutual information I(X_{≤0}; X_{>0}) between the past and the future.
The predictive information is more difficult to define. It is also called the 'statistical complexity' or 'forecasting complexity' and is defined as the entropy of the stationary distribution over the causal states of the 'epsilon machine' of the process.
What is the epsilon machine of the process {X_i}_{i∈Z}? Define the causal states of the process as the partition of the set of possible pasts ..., x_{-3}, x_{-2}, x_{-1}, where two pasts x, x′ are in the same part / equivalence class when the future conditioned on x, respectively x′, is the same.
That is, P(X_{>0} | x) = P(X_{>0} | x′). Without going into too much more detail, the forecasting complexity measures the size of this creature.
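As a concrete toy case (my own sketch, under the simplifying assumption of a binary first-order Markov chain whose causal state is just the current symbol): the statistical complexity is then H(X_0) while the excess entropy reduces to I(X_0; X_1), and the former is at least as large as the latter.

```python
# Toy sketch: excess entropy vs statistical complexity for a binary first-order
# Markov chain. Transition probabilities are arbitrary illustrative numbers.
import numpy as np

# P[i, j] = P(X_{t+1} = j | X_t = i)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution pi (left eigenvector of P for eigenvalue 1).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi = pi / pi.sum()

def H(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Statistical complexity: entropy of the causal-state distribution
# (here the causal state is just the current symbol).
C_mu = H(pi)

# Excess entropy for a first-order Markov chain: I(X_0; X_1).
joint = pi[:, None] * P                      # P(X_0 = i, X_1 = j)
E = H(pi) + H(joint.sum(axis=0)) - H(joint.flatten())

print(f"statistical complexity C_mu = {C_mu:.3f} bits")
print(f"excess entropy (predictable information) E = {E:.3f} bits")
# C_mu >= E: you store H(X_0) bits about the past, but only E of them show up
# as reduced uncertainty about the future.
```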
"The links between logic and games go back a long way. If one thinks of a debate as a kind of game, then Aristotle already made the connection; his writings about syllogism are closely intertwined with his study of the aims and rules of debating. Aristotle’s viewpoint survived into the common medieval name for logic: dialectics. In the mid twentieth century Charles Hamblin revived the link between dialogue and the rules of sound reasoning, soon after Paul Lorenzen had connected dialogue to constructive foundations of logic." from the Stanford Encyclopedia of Philosophy on Logic and Games
Game Semantics
The usual presentation of game semantics of logic: to a proposition we associate a debate / dialogue game between a Proponent and an Opponent; the Proponent tries to prove the proposition while the Opponent tries to refute it.
A winning strategy of the Proponent corresponds to a proof of the proposition. A winning strategy of the Opponent corresponds to a proof of the negation of the proposition.
It is often assumed that either the Proponent has a winning strategy in A or the Opponent has a winning strategy in A - a version of excluded middle. At this point our intuitionistic alarm bells should be ringing: we can't just deduce a proof of the negation from the absence of a proof of A. (Absence of evidence is not evidence of absence!)
We could have a situation in which neither the Proponent nor the Opponent has a winning strategy! In other words, neither A nor not-A is derivable.
Countermodels
One way to substantiate this is by giving an explicit countermodel C in which A (respectively ¬A) doesn't hold.
Game-theoretically a countermodel C should correspond to some sort of strategy! It is like an "interrogation" / attack strategy that defeats all putative winning strategies. A 'defeating' strategy or 'scorched earth' strategy, if you'd like. A countermodel is an infinite strategy. Some work in this direction has already been done[1]. [2]
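As a minimal concrete example of "an explicit countermodel" (my own sketch, not from the post): the standard two-world intuitionistic Kripke model in which the root forces neither p nor ¬p, and hence does not force p ∨ ¬p.

```python
# Minimal sketch: a two-world intuitionistic Kripke countermodel to excluded middle.
WORLDS = ["w0", "w1"]
ACCESSIBLE = {"w0": ["w0", "w1"], "w1": ["w1"]}   # reflexive order, w0 <= w1
VALUATION = {"w0": set(), "w1": {"p"}}            # monotone: atoms only get added going up

def forces(world, formula):
    """Kripke forcing for a tiny fragment: atoms, ('not', A), ('or', A, B)."""
    if isinstance(formula, str):                   # atomic proposition
        return formula in VALUATION[world]
    tag = formula[0]
    if tag == "not":                               # ¬A: no accessible world forces A
        return all(not forces(w, formula[1]) for w in ACCESSIBLE[world])
    if tag == "or":
        return forces(world, formula[1]) or forces(world, formula[2])
    raise ValueError(f"unknown connective {tag!r}")

excluded_middle = ("or", "p", ("not", "p"))
print("w0 forces p:      ", forces("w0", "p"))               # False
print("w0 forces ¬p:     ", forces("w0", ("not", "p")))      # False (w1 forces p)
print("w0 forces p ∨ ¬p: ", forces("w0", excluded_middle))   # False
# The Opponent's 'scorched earth' strategy amounts to pointing at w1 to defeat
# any attempted proof of ¬p, while refusing to grant p at w0.
```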
Dualities in Dialogue and Logic
This gives an additional symmetry in the system, a syntax-semantics duality distinct from the usual negation duality. In terms of the proof turnstile we have the quadruple:
⊢A meaning A is provable
⊢¬A meaning ¬A is provable
⊣A meaning A is not provable because there is a countermodel C where A doesn't hold - i.e. classically ¬A is satisfiable.
⊣¬A meaning ¬A is not provable because there is a countermodel C where ¬A doesn't hold - i.e. classically A is satisfiable.
Obligationes, Positio, Dubitatio
In the medieval Scholastic tradition of logic there were two distinct types of logic games ("Obligationes"): one in which the objective was to defend a proposition against an adversary ("Positio"), and another in which the objective was to defend the doubtfulness of a proposition ("Dubitatio").[3]
Winning strategies in the former correspond to proofs, while winning (defeating!) strategies in the latter correspond to countermodels.
Destructive Criticism
If we think of argumentation theory / debate, a countermodel strategy is like "destructive criticism": it defeats attempts to buttress evidence for a claim but presents no viable alternative.
[1] Ludics & completeness - https://arxiv.org/pdf/1011.1625.pdf
[2] Model construction games, Chap 16 of Logic and Games, van Benthem
[3] Dubitatio games in the medieval scholastic tradition, 4.3 of https://apcz.umk.pl/LLP/article/view/LLP.2012.020/778
Hopfield Networks = Ising Models = Distributions over Causal models?
Given a joint probability distribution p(x_1, ..., x_n), there might famously be many 'Markov' factorizations. Each corresponds to a different causal model.
Instead of choosing a particular one we might have a distribution of beliefs over these different causal models. This feels basically like a Hopfield Network/ Ising Model.
You have a distribution over nodes and an 'interaction' distribution over edges.
The distribution over nodes corresponds to the joint probability distribution, while the distribution over edges corresponds to a mixture of causal models, where a normal graphical causal model (a DAG G) corresponds to the Ising model / Hopfield network which assigns 1 to an edge x→y if the edge is in G and 0 otherwise.
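A small illustrative sketch of that last paragraph (my own; the candidate DAGs and belief weights are hypothetical): a belief distribution over a few causal models induces fractional "edge beliefs", which collapse to 0/1 for a single DAG.

```python
# Toy sketch: a distribution over causal models as a distribution over edges.
# The candidate DAGs and their belief weights are made-up numbers.
variables = ["x", "y", "z"]
possible_edges = [(a, b) for a in variables for b in variables if a != b]

belief_over_models = {
    frozenset({("x", "y"), ("y", "z")}): 0.5,   # chain x -> y -> z
    frozenset({("x", "y"), ("x", "z")}): 0.3,   # common cause x
    frozenset({("y", "x"), ("y", "z")}): 0.2,   # common cause y
}

# Marginal belief in each directed edge, aggregated over the mixture of DAGs.
edge_belief = {e: 0.0 for e in possible_edges}
for dag, p in belief_over_models.items():
    for e in dag:
        edge_belief[e] += p

for (src, dst), p in edge_belief.items():
    if p > 0:
        print(f"P({src} -> {dst}) = {p:.2f}")
# For a single DAG these numbers would all be 0 or 1; the mixture smears them
# out into soft 'interaction' weights, much like couplings in an Ising model.
```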
Agent Foundations Reading List [Living Document]
This is a stub for a living document on a reading list for Agent Foundations.
Causality
Book of Why, Causality - Pearl
Probability theory
Logic of Science - Jaynes