This is a special post for short-form writing by Alexander Gietelink Oldenziel. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.


Alexander Gietelink Oldenziel's Shortform


Pockets of Deep Expertise

Why am I so bullish on academic outreach? Why do I keep hammering on 'getting the adults in the room'?

It's not that I think academics are all Super Smart.

I think rationalists/alignment people correctly ascertain that most professors don't have much useful to say about alignment & deep learning and often say silly things. They correctly see that much of AI progress is fueled by labs and scale, not ML academia. Still, I am bullish on non-ML academia, especially mathematics and physics, and to a lesser extent theoretical CS, neuroscience, and some parts of ML/AI academia. This is because,

while I think 95% of academia is bad and/or useless, there are Pockets of Deep Expertise. Most questions in alignment are close to existing work in academia in some sense - but we have to make the connection! A good example is 'sparse coding' and 'compressed sensing'. Lots of mech. interp has been rediscovering some of the basic ideas of sparse coding. But there is vast expertise in academia about these topics. We should leverage it!

Other examples are singular learning theory, computational mechanics, etc
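To make the sparse-coding connection concrete, here is a minimal sketch of ISTA (iterative soft-thresholding), a textbook sparse-coding solver from that literature. The dictionary, dimensions, and penalty below are my own toy choices, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: recover a sparse code z from x ~ D z, given a known dictionary D.
n_features, n_atoms, sparsity = 20, 50, 3
D = rng.normal(size=(n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
z_true = np.zeros(n_atoms)
z_true[rng.choice(n_atoms, sparsity, replace=False)] = rng.normal(size=sparsity)
x = D @ z_true

def ista(x, D, lam=0.01, steps=1000):
    """Minimise 0.5*||x - D z||^2 + lam*||z||_1 by proximal gradient descent."""
    L = np.linalg.norm(D, 2) ** 2       # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        z = z - D.T @ (D @ z - x) / L               # gradient step
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return z

z_hat = ista(x, D)
print(np.count_nonzero(np.abs(z_hat) > 1e-3))  # recovered code is sparse
```

This is exactly the kind of machinery (L1-regularised reconstruction, dictionary learning) that the compressed-sensing literature has studied in depth.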

Abnormalised sampling?

Probability theory talks about sampling from probability distributions, i.e.

normalized measures. However, non-normalized measures abound: weighted automata, infra-stuff, uniform priors on noncompact spaces, wealth in logical-inductor-esque math, quantum stuff, etc. Most constructions of probability theory go through for arbitrary measures; they don't need the normalization assumption. Except, crucially, sampling.

What does it even mean to sample from a non-normalized measure? What is ~~unnormalized~~ abnormal sampling?

I don't know...

Why don't animals have guns?

Or why didn't evolution evolve the Hydralisk?

Evolution has found (sometimes multiple times) the camera, general intelligence, nanotech, electronavigation, aerial endurance better than any drone, robots more flexible than any human-made robot, highly efficient photosynthesis, etc.

First of all let's answer another question: why didn't evolution evolve the wheel like the alien wheeled elephants in His Dark Materials?

Is it biologically impossible to evolve?

Well, technically, the flagellum of various bacteria is a proper wheel.

No, the likely answer is that wheels are great when you have roads and suck when you don't. Roads are built by ants to some degree, but on the whole probably don't make sense for a species with animal-level intelligence.

Aren't there animals that use projectiles?

Hold up. Is it actually true that there is not a single animal with a gun, harpoon or other projectile weapon?

Porcupines have quills, some snakes spit venom, and the archerfish spits jets of water to knock insects off leaves and then eats them. Bombardier beetles can produce an explosive chemical mixture. Skunks use other chemicals. Some snails shoot harpoons from very c...

Reasonable interpretations of Recursive Self-Improvement are either trivial, tautological, or false?

- (Trivial) AIs will do RSI by using more hardware - a trivial form of RSI.
- (Tautological) Humans engage in a form of (R)SI when they engage in meta-cognition; i.e. therapy is plausibly a form of metacognition. Meta-cognition is plausibly one of the remaining hallmarks of true general intelligence. See Vanessa Kosoy's "Meta-Cognitive Agents".

In this view, AGIs will naturally engage in meta-cognition because they're generally intelligent. The...

SLT and phase transitions

The morphogenetic SLT story says that during training the Bayesian posterior concentrates around a series of subspaces W_0^{(1)} ⇝ ... ⇝ W_0^{(n)} with RLCTs λ_1 < ... < λ_n and losses L_1 = L(w_1), ..., L_n = L(w_n), where w_i ∈ W_0^{(i)}. As the size N of the data sample is scaled, the Bayesian posterior makes transitions W_0^{(i)} ⇝ W_0^{(i+1)}, trading off higher complexity (higher λ_{i+1} > λ_i) for better accuracy (lower loss L_{i+1} < L_i).

This is the radical new framework of SLT: phase transitions happen i...
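The tradeoff driving these transitions can be written down directly, using the standard SLT free-energy asymptotics (a general fact, not specific to this post): the free energy of phase i at sample size N behaves as

```latex
F_N\!\left(W_0^{(i)}\right) \;\approx\; N L_i + \lambda_i \log N,
```

so the posterior moves from phase i to phase i+1 roughly when N(L_i - L_{i+1}) > (\lambda_{i+1} - \lambda_i)\log N: the accuracy gain grows linearly in N, while the complexity penalty grows only logarithmically, so each more accurate, more complex phase eventually wins.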

Alignment by Simulation?

I've heard this alignment plan that is a variation of 'simulate top alignment researchers' with an LLM. Usually the poor alignment researcher in question is Paul.

This strikes me as deeply unserious, and I am confused why it has gotten so much traction.

That AI-assisted alignment is coming (indeed, is already here!) is undeniable. But even somewhat accurately simulating a human from text data is a crazy sci-fi ability, probably not even physically possible. It seems to ascribe nearly magical abilities to LLMs.

Predicting...

Fractal Fuzz: making up for size

GPT-3 recognizes 50k possible tokens. For a 1000-token context window, that means there are (5·10^4)^(10^3) ≈ 10^4700 possible prompts. Astronomically large. If we assume the output of a single run of GPT is 200 tokens, then for each possible prompt there are ≈ 10^940 possible continuations.

GPT-3 is probabilistic, defining for each possible prompt x (of which there are ≈ 10^4700) a distribution q(x) on a set of size ≈ 10^940, in other words a point in a (10^940 − 1)-dimensional space.

Mind-bogglingly large. Compared to these numbers, the amount of data (40 trillion tokens??) and the size of the model (175 billion parameters) seem absolutely puny.
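As a sanity check, exponents of this size are easiest to handle in log-space; with a 50,000-token vocabulary, a 1000-token context, and 200-token outputs:

```python
import math

vocab, ctx, out_len = 50_000, 1_000, 200

# log10 of vocab**ctx and vocab**out_len, without forming the huge integers.
log10_prompts = ctx * math.log10(vocab)
log10_conts = out_len * math.log10(vocab)

print(round(log10_prompts))  # exponent for the number of possible prompts
print(round(log10_conts))    # exponent for the number of possible continuations
```

Either way the qualitative point stands: the space of prompts and continuations dwarfs both the dataset and the parameter count by thousands of orders of magnitude.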

I won't be talking about the data, or 'overparameterization', in this short; that is well-explained by Singular Learning Theory. Instead, I will be talking about nonrealizability.

Nonrealizability & the structure of natural data

Recall the setup of (parametric) Bayesian learning: there is a sample space Ω, a true distribution q(x) on Ω, and a parameterized family of probability distributions p(x|w), w ∈ W ⊂ R^d.

It is often assumed that the true distrib...

Trivial but important

Aumann agreement can fail for purely epistemic reasons, because real-world minds do not do Bayesian updating. Bayesian updating is intractable, so realistic minds instead sample from the prior and refine locally. This is how e.g. gradient descent (random initialization plus local search) works, and also how human minds work.

In this situation two minds can end up in two different basins with similar loss on the data, purely because of computational limitations. These minds can have genuinely different expectations for generalization.

(Of course this does not contradict the statement of the theorem, which is correct.)
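A minimal picture of the two-basins point (my toy example, not the author's): gradient descent on a double-well loss from two different initialisations reaches two different minima with identical training loss.

```python
def loss(w):
    return (w ** 2 - 1.0) ** 2        # two minima, at w = -1 and w = +1

def grad(w):
    return 4.0 * w * (w ** 2 - 1.0)

def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent from initialisation w."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_a, w_b = descend(-0.5), descend(+0.5)
print(w_a, w_b)              # two different "basins"
print(loss(w_a), loss(w_b))  # the same (near-zero) training loss
```

If the two minima encode different off-distribution behaviour, the two "minds" agree on all the training data yet generalize differently.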

Optimal Forward-chaining versus Backward-chaining

In general, this is going to depend on the domain. In environments for which we have many expert samples and many existing techniques, backward-chaining is key (i.e. deploying resources & applying best practices in business & industrial contexts).

In open-ended environments, such as those arising in science, especially pre-paradigmatic fields, backward-chaining and explicit plans break down quickly.

Incremental vs Cumulative

Incremental: 90% forward chaining, 10% backward chaining f...

Corrupting influences

The EA AI safety strategy has had a large focus on placing EA-aligned people in A(G)I labs. The thinking was that having enough aligned insiders would make a difference on crucial deployment decisions & longer-term alignment strategy. We could say that the strategy is an attempt to

corrupt the goal of pure capability advancement & money-making towards the goal of alignment. This fits into a larger theme: that EA needs to get close to power to have real influence. [See also the large donations EA has made to OpenAI & Anthropic.]

Whether this strategy has paid off... it's too early to tell.

What has become apparent is that the large AI labs & being close to power have had a strong corrupting influence on EA epistemics and culture.

- Many people in EA now think nothing of being paid Bay Area programmer salaries for research or nonprofit jobs.
- There has been a huge influx of MBA blabber being thrown around. Bizarrely, EA funds are often giving huge grants to for-profit organizations for which it is very unclear whether they're really EA-aligned in the long term or just paying lip service. It is highly questionable whether EA should be trying to do venture

Thin versus Thick Thinking

Thick: aggregate many noisy sources to make a sequential series of actions in mildly related environments; model-free RL. Cardinal sins: failure of prioritization / not throwing away enough information, nerdsnipes, insufficient aggregation, trusting too much in any particular model, indecisiveness, overfitting on noise, ignoring the consensus of experts / social reality.

default of the ancestral environment

CEOs, generals, doctors, economists, police detectives in the real world, traders.

Thin: precise, systematic analysis, preferably ...

[Thanks to Vlad Firoiu for helping me]

An Attempted Derivation of the Lindy Effect

Wikipedia:

Laplace Rule of Succession

What is the probability that the Sun will rise tomorrow, given that it has risen every day for 5000 years?

Let p denote the probability that the Sun will rise tomorrow. A priori we have no information on the value of ...
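The standard endpoint of this derivation, and the Lindy-flavoured consequence, can be stated exactly: with a uniform prior on p, the probability of success tomorrow after n successes is (n+1)/(n+2), and the probability of k further successes is (n+1)/(n+k+1), so after surviving n periods you are roughly even-money to survive another n. A sketch of both (the example numbers are mine):

```python
from fractions import Fraction

def laplace_next(n):
    """P(success tomorrow | n consecutive successes), uniform prior on p."""
    return Fraction(n + 1, n + 2)

def survive_k_more(n, k):
    """P(k further successes | n successes) = (n+1)/(n+k+1), same prior."""
    return Fraction(n + 1, n + k + 1)

days = 5000 * 365
print(laplace_next(days))        # extremely close to 1 for the Sun
print(survive_k_more(100, 100))  # just over 1/2: Lindy-style survival
```

The second function is the Lindy effect in miniature: the median remaining lifetime scales with the age observed so far.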

Imprecise Information Theory

We would like a notion of entropy for credal sets. Diffractor suggests the following:

Let C ⊂ Credal(Ω) be a credal set.

Then the entropy of C is defined as

H_{Diffractor}(C) = sup_{p ∈ C} H(p)

where H(p) denotes the usual Shannon entropy.

I don't like this since it doesn't satisfy the natural desiderata below.

Instead, I suggest the following. Let me_C ∈ C denote the (absolute) maximum-entropy distribution, i.e. H(me_C) = max_{p ∈ C} H(p), and let H(C) = H_{new}(C) = H(me_C).

Desideratum 1: H({p}...
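For concreteness, here is a tiny numerical sketch of the maximum-entropy member me_C of a credal set; the credal set (all binary distributions with p(1) in [0.1, 0.3]) and the grid search are my own toy choices:

```python
import math

def H(p):
    """Shannon entropy in bits of a finite distribution p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical credal set on a binary outcome: all p with p(1) in [0.1, 0.3].
# Grid search for the maximum-entropy member me_C.
grid = [a / 10000 for a in range(1000, 3001)]
me_p1 = max(grid, key=lambda a: H((1 - a, a)))

print(me_p1)                             # the endpoint closest to uniform
print(round(H((1 - me_p1, me_p1)), 4))   # H_new(C) = H(me_C)
```

Since binary entropy is maximised at 1/2 and the credal set here is an interval, the maximum-entropy member sits at the endpoint nearest the uniform distribution.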

Generalized Jeffrey Prior for singular models?

For singular models the Jeffrey Prior is not well-behaved, for the simple reason that it will be zero at minima of the loss function.

Does this mean the Jeffrey prior is only of interest in regular models? I beg to differ.

Usually the Jeffrey prior is derived as the parameterization-invariant prior. There is another way of thinking about the Jeffrey prior: as arising from an 'indistinguishability prior'.

The argument is delightfully simple: given two weights w_1, w_2 ∈ W, if they encode the same distributi...
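For reference, the two readings connect in one pair of standard formulas (general facts about the Fisher information, stated without proof): nearby weights are statistically distinguishable to the extent that

```latex
D_{\mathrm{KL}}\big(p(\cdot\,|\,w)\,\|\,p(\cdot\,|\,w+dw)\big)
  \;\approx\; \tfrac{1}{2}\, dw^{\top} I(w)\, dw,
\qquad
\pi_J(w) \;\propto\; \sqrt{\det I(w)},
```

so the Jeffrey prior is the volume form of the Fisher metric: it assigns equal mass to equally distinguishable regions. At a singular point det I(w) = 0, nearby weights are (to second order) indistinguishable, which is exactly why the naive Jeffrey prior vanishes there.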

Latent Abstractions, Bootlegged

Let X_1, ..., X_n be random variables distributed according to a probability distribution p on a sample space Ω.

Defn. A (weak) natural latent of X_1, ..., X_n is a random variable Λ such that

(i) the X_i are independent conditional on Λ;

(ii) [reconstructability] p(Λ=λ | X_1, ..., X̂_i, ..., X_n) = p(Λ=λ | X_1, ..., X_n) for all i = 1, ..., n, where X̂_i denotes omitting X_i.

[This is not really reconstructability, more like a stability property. The information is contained in many parts of the system... I might al...
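Condition (i) is easy to check numerically on a toy model. In this sketch (my illustrative example, not the author's) Λ is a coin bias in {0.2, 0.8} and X_1, X_2 are i.i.d. flips given Λ:

```python
from itertools import product

p_lam = {0.2: 0.5, 0.8: 0.5}            # prior over the latent coin bias

def p_joint(lam, x1, x2):
    """Joint probability p(Λ=lam, X1=x1, X2=x2) for the toy model."""
    def bern(b, x):
        return b if x == 1 else 1 - b
    return p_lam[lam] * bern(lam, x1) * bern(lam, x2)

# Verify condition (i): X1 and X2 are independent conditional on Λ.
for lam in p_lam:
    z = sum(p_joint(lam, a, b) for a, b in product((0, 1), repeat=2))
    for x1, x2 in product((0, 1), repeat=2):
        joint = p_joint(lam, x1, x2) / z
        m1 = sum(p_joint(lam, x1, b) for b in (0, 1)) / z
        m2 = sum(p_joint(lam, a, x2) for a in (0, 1)) / z
        assert abs(joint - m1 * m2) < 1e-12
print("conditional independence holds")
```

Note that unconditionally X_1 and X_2 are correlated (both lean the same way); it is conditioning on Λ that screens them off, which is the heart of the natural-latent idea.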

Inspired by this Shalizi paper defining local causal states. The idea is so simple and elegant I'm surprised I had never seen it before.

Basically, starting with a factored probability distribution X_t = (X_1(t), ..., X_{k_t}(t)) over a dynamical DAG D_t, we can apply Crutchfield's causal state construction locally to obtain a derived causal model, factored over the dynamical DAG, X'_t. Here X'_t is defined by considering the past and future lightcones L^-(X_t), L^+(X_t) of X_t: all those points/variables Y_{t'} which influence X_t, respectively are influenced by X_t (in a causal, interventional sense). Now define the equivalence relation a_t ∼ b_t on realizations of L^-(X_t) (which includes X_t by definition)

whenever the conditional probability distributions on the future lightcones are equal: p(L^+(X_t) | a_t) = p(L^+(X_t) | b_t). These factored probability distributions over dynamical DAGs are called 'fields' by physicists. Given any field F(x,t) we can define a derived local causal state field ε(F(x,t)) in the above way. Woah!

...

Reasons to think Lobian Cooperation is important

Usually modal Lobian cooperation is dismissed as irrelevant for real situations, but it is plausible that Lobian cooperation extends far more broadly than what is currently proved.

It is plausible that much of the cooperation we see in the real world is actually approximate Lobian cooperation, rather than cooperation purely given by traditional game-theoretic incentives.

Lobian cooperation is far stronger in cases where the players resemble each other and/or have access to one another's blueprint. This is ...

Evidence Manipulation and Legally Admissible Evidence

[This was inspired by Kokotajlo's shortform on comparing strong with weak evidence]

In the real world the weight of many pieces of weak evidence is not always comparable to a single piece of strong evidence. The important variable here is not strong versus weak per se but the source of the evidence. Some sources of evidence are easier to manipulate in various ways. Evidence manipulation, either conscious or emergent, is common and a large obstacle to truth-finding.

Consider aggregating many ...

Roko's basilisk is a thought experiment which states that an otherwise benevolent artificial superintelligence (AI) in the future would be incentivized to create a virtual-reality simulation to torture anyone who knew of its potential existence but did not directly contribute to its advancement or development.

Why Roko's basilisk probably doesn't work, for simulation-fidelity reasons: Roko's basilisk threatens to simulate and torture you in the future if you don't comply. Simulation cycles cost resources. Instead of following through on torturing our wo...

Imagine a data stream

..., X_{-3}, X_{-2}, X_{-1}, X_0, X_1, X_2, X_3, ...

assumed infinite in both directions for simplicity. Here X_0 represents the current state (the "present"), while ..., X_{-3}, X_{-2}, X_{-1} represent the past and X_1, X_2, X_3, ... represent the future.

Predictable Information versus Predictive Information

Predictable information is the maximal information (in bits) that you can derive about the future given access to the past. Predictive information is the number of bits from the past that you need to make that optimal prediction.

Suppose you are...
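A worked example of the distinction (my illustration, not from the post): a noisy period-2 stream X_t = t mod 2, flipped with probability eps. One bit of the past, the phase, already achieves the optimal prediction, so the predictive information is at most 1 bit, while the predictable information about the next symbol is 1 − H(eps) bits:

```python
import math

def h_bin(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

eps = 0.1  # flip probability of the noisy period-2 stream

# Marginally the next symbol is uniform (1 bit); given the phase it has
# entropy h_bin(eps), so the past reveals 1 - h_bin(eps) bits about it.
predictable_per_symbol = 1 - h_bin(eps)
predictive = 1.0  # bits of past needed for optimal prediction: just the phase

print(round(predictable_per_symbol, 3), predictive)
```

The two quantities come apart in general: processes can require arbitrarily many bits of memory (predictive information) to extract a fixed, even small, amount of predictable information.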

"The links between logic and games go back a long way. If one thinks of a debate as a kind of game, then Aristotle already made the connection; his writings about syllogism are closely intertwined with his study of the aims and rules of debating. Aristotle's viewpoint survived into the common medieval name for logic: dialectics. In the mid twentieth century Charles Hamblin revived the link between dialogue and the rules of sound reasoning, soon after Paul Lorenzen had connected dialogue to constructive foundations of logic." - from the Stanford Encyclopedia ...

"I dreamed I was a butterfly, flitting around in the sky; then I awoke. Now I wonder: Am I a man who dreamt of being a butterfly, or am I a butterfly dreaming that I am a man?" - Zhuangzi

Questions I have that you might have too:

In this shortform I will try and...

The Vibes of Mathematics

Q: What is it like to understand advanced mathematics? Does it feel analogous to having mastery of another language, like in programming or linguistics?

A: Except nobody wants to hear about it at parties.

Vibes of Maths: Convergence and Divergence

Level 0: a state of ignorance. You live in a pre-formal mindset. You don't know how to formalize things. You don't even know what it would even mean 'to prove something mathematically'. This level is perhaps the longest; it is the default state of a human. Most anti-theory sentiment comes from this state. Since you've never... You can't productively read math books. You often decry that these mathematicians make books way too hard to read: if only they would take the time to explain things simply, you would understand.

Level 1: all math is an amorphous blob. You know the basics of writing an epsilon-delta proof. Although you don't know why the rules of maths are the way they are, you can at least follow the recipes. You can follow simple short proofs, albeit slowly.

You know there are differen...

Agent Foundations Reading List [Living Document]

This is a stub for a living document: a reading list for Agent Foundations.

Causality

The Book of Why, Causality - Pearl

Probability theory

Logic of Science - Jaynes

Ambiguous Counterfactuals

[Thanks to Matthias Georg Mayer for pointing me towards ambiguous counterfactuals]

Salary is a function of eXperience and Education

S=aE+bX

We have a candidate C with given salary, experience (X=5) and education (E=5).

Their current salary is given by

S=a⋅5+b⋅5

We'd like to consider the counterfactual where they didn't have the education (E=0). How do we evaluate their salary in this counterfactual?

This is slightly ambiguous - there are two counterfactuals:

E=0,X=5 or E=0,X=10

In the second c...
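The two counterfactuals can be computed side by side. The coefficients a = b = 1 below are my own assumption, chosen only so the observed salary is concrete:

```python
a, b = 1.0, 1.0          # assumed coefficients of S = a*E + b*X
E_obs, X_obs = 5.0, 5.0  # the candidate's observed education and experience
S_obs = a * E_obs + b * X_obs

# Counterfactual 1: remove the education, hold experience fixed (E=0, X=5).
S_cf1 = a * 0 + b * X_obs

# Counterfactual 2: remove the education, but assume the years spent in
# school would instead have been spent gaining experience (E=0, X=10).
S_cf2 = a * 0 + b * (X_obs + E_obs)

print(S_obs, S_cf1, S_cf2)
```

With these assumed coefficients the two readings give different answers (5 versus 10), which is exactly the ambiguity: the structural equation alone does not tell you which background variables to hold fixed.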

Hopfield Networks = Ising Models = Distributions over Causal Models?

Given a joint probability distribution p(x1,...,xn), famously there might be many 'Markov' factorizations. Each corresponds to a different causal model.

Instead of choosing a particular one, we might have a distribution of beliefs over these different causal models. This feels basically like a Hopfield network / Ising model.

You have a distribution over nodes and an 'interaction' distribution over edges.

The distribution over nodes corresponds to the joint probability di...
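For concreteness, here is a tiny Ising/Hopfield-style model with node biases and edge couplings; the numbers are my own illustrative choices, and the Boltzmann distribution plays the role of the "distribution over nodes and edges" sketched above:

```python
import math
from itertools import product

h = [0.2, -0.1, 0.0]                 # node biases
J = {(0, 1): 0.5, (1, 2): -0.3}      # edge couplings

def energy(s):
    """Ising energy of spin configuration s, s_i in {-1, +1}."""
    e = -sum(h[i] * s[i] for i in range(len(s)))
    e -= sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e

states = list(product((-1, 1), repeat=3))
Z = sum(math.exp(-energy(s)) for s in states)   # partition function
probs = {s: math.exp(-energy(s)) / Z for s in states}

print(max(probs, key=probs.get))     # the most likely joint configuration
```

A positive coupling pulls a pair of spins to agree and a negative one to disagree, which is the rough analogue of weighting factorizations by how well their edges fit together.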

Insights as Islands of Abductive Percolation?

I've been fascinated by this beautiful paper by Viteri & DeDeo.

What is a mathematical insight? We feel intuitively that proving a difficult theorem requires discovering one or more key insights. Before we get into what the Viteri-DeDeo paper has to say about (mathematical) insights, let me recall some basic observations on the nature of insights:

(see also my previous shortform)

- There might be a unique decomposition, akin to prime factorization. Alternatively, there might be many roads to Rome: some theorems

...