This is a special post for short-form writing by Alexander Gietelink Oldenziel. Only they can create top-level comments. Comments here also appear on the Shortform Page and All Posts page.

Alexander Gietelink Oldenziel's Shortform


Why don't animals have guns? Or why didn't evolution evolve the Hydralisk?

Evolution has found (sometimes multiple times) the camera, general intelligence, nanotech, electronavigation, aerial endurance better than any drone, robots more flexible than any human-made robot, highly efficient photosynthesis, etc.

First of all let's answer another question: why didn't evolution evolve the wheel like the alien wheeled elephants in His Dark Materials?

Is it biologically impossible to evolve?

Well, technically, the flagellum of various bacteria is a proper wheel.

No, the likely answer is that wheels are great when you have roads and suck when you don't. Roads are built by ants to some degree, but on the whole they probably don't make sense for an animal-intelligence species.

Aren't there animals that use projectiles?

Hold up. Is it actually true that there is not a single animal with a gun, harpoon or other projectile weapon?

Porcupines have quills, some snakes spit venom, and the archerfish spits jets of water to knock insects off leaves, then eats them. Bombardier beetles can produce an explosive chemical mixture. Skunks use other chemicals. Some snails shoot harpoons from very c... (read more)

Corrupting influences

The EA AI safety strategy has had a large focus on placing EA-aligned people in A(G)I labs. The thinking was that having enough aligned insiders would make a difference on crucial deployment decisions & longer-term alignment strategy. We could say that the strategy is an attempt to corrupt the goal of pure capability advance & making money towards the goal of alignment. This fits into a larger theme that EA needs to get close to power to have real influence. [See also the large donations EA has made to OpenAI & Anthropic.]

Whether this strategy paid off... too early to tell.

What has become apparent is that the large AI labs & being close to power have had a strong corrupting influence on EA epistemics and culture.

- Many people in EA now think nothing of being paid Bay Area programmer salaries for research or nonprofit jobs.
- There has been a huge influx of MBA blabber being thrown around. Bizarrely, EA funds often give huge grants to for-profit organizations for which it is very unclear whether they're really EA-aligned in the long term or just paying lip service. Highly questionable that EA should be trying to do venture

... (read more)

Fractal Fuzz: making up for size

GPT-3 recognizes 50k possible tokens. For a 1000-token context window that means there are (5⋅10^4)^(10^3) ≈ 10^4700 possible prompts. Astronomically large. If we assume the output of a single run of GPT is 200 tokens, then for each possible prompt there are (5⋅10^4)^200 ≈ 10^940 possible continuations.

GPT-3 is probabilistic, defining for each of the ≈10^4700 possible prompts x a distribution q(x) on a set of size ≈10^940 - in other words, a point in a (10^940 − 1)-dimensional space.

Mind-bogglingly large. Compared to these numbers, the amount of data (40 trillion tokens??) and the size of the model (175 billion parameters) seem absolutely puny.
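These exponents are easy to sanity-check; a quick back-of-the-envelope in code, using the stated vocabulary size and lengths, puts them near 10^4700 and 10^940:

```python
import math

# Back-of-the-envelope for the counts above: a vocabulary of 50k tokens,
# prompts of 1000 tokens, continuations of 200 tokens.
vocab = 50_000
context = 1_000
output = 200

log10_prompts = context * math.log10(vocab)        # ~4699, i.e. ~10^4700 prompts
log10_continuations = output * math.log10(vocab)   # ~940, i.e. ~10^940 continuations
```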

I won't be talking about the data or 'overparameterization' in this shortform; that is well explained by Singular Learning Theory. Instead, I will be talking about nonrealizability.

Nonrealizability & the structure of natural data

Recall the setup of (parametric) Bayesian learning: there is a sample space Ω, a true distribution q(x) on Ω, and a parameterized family of probability distributions p(x|w), w ∈ W ⊂ R^d.

It is often assumed that the true distrib... (read more)

The Vibes of Mathematics

Q: What is it like to understand advanced mathematics? Does it feel analogous to having mastery of another language, like in programming or linguistics?

A: Except nobody wants to hear about it at parties.

Vibes of Maths: Convergence and Divergence

Level 0: a state of ignorance. You live in a pre-formal mindset. You don't know how to formalize things. You don't even know what it would mean 'to prove something mathematically'. This is perhaps the longest stage, and the default state of a human. Most anti-theory sentiment comes from this state. You can't productively read math books. You often decry that these mathematicians make books way too hard to read - if only they would take the time to explain things simply, you would understand.

Level 1: all math is an amorphous blob. You know the basics of writing an epsilon-delta proof. Although you don't know why the rules of maths are the way they are, you can at least follow the recipes. You can follow simple short proofs, albeit slowly.

You know there are differen... (read more)

Pockets of Deep Expertise

Why am I so bullish on academic outreach? Why do I keep hammering on 'getting the adults in the room'?

It's not that I think academics are all Super Smart.

I think rationalists/alignment people correctly ascertain that most professors don't have much of use to say about alignment & deep learning, and often say silly things. They correctly see that much of AI progress is fueled by labs and scale, not ML academia. I am bullish on non-ML academia, especially mathematics, physics and to a lesser extent theoretical CS, neuroscience, and some parts of ML/AI academia. This is because

while I think 95% of academia is bad and/or useless, there are Pockets of Deep Expertise. Most questions in alignment are close to existing work in academia in some sense - but we have to make the connection! A good example is 'sparse coding' and 'compressed sensing'. Lots of mech interp has been rediscovering some of the basic ideas of sparse coding. But there is vast expertise in academia on these topics. We should leverage it!

Other examples are singular learning theory, computational mechanics, etc

[see also Hanson on rot, generalizations of the second law to nonequilibrium systems (Baez-Pollard, Crutchfield et al.) ]

Imperfect Persistence of Metabolically Active Engines

All things rot. Individual organisms, societies at large, businesses, churches, empires and maritime republics, man-made artifacts of glass and steel, creatures of flesh and blood.

Conjecture #1: There is a lower bound on the amount of dissipation / rot that any metabolically active engine creates.

Conjecture #2: Metabolic rot of an engine is proportional to (1) size and complexity o... (read more)

Idle thoughts about UDASSA I: the Simulation hypothesis

I was talking to my neighbor about UDASSA the other day. He mentioned a book I keep getting recommended but never read, in which characters get simulated and then the simulating machine is progressively slowed down.

One would expect that, from inside the simulation, one wouldn't be able to notice that the simulating machine is being slowed down.

This presents a conundrum for simulation style hypotheses: if the simulation can be slowed down 100x without the insiders noticing, why not 1000x or 10^100x or ... (read more)

[This is joint thinking with Sam Eisenstat. Also thanks to Caspar Oesterheld for his thoughtful comments. Thanks to Steve Byrnes for pushing me to write this out.]

The Hyena problem in long-term planning

Logical induction is a nice framework for thinking about bounded reasoning. Very soon after the discovery of logical induction, people tried to make logical inductor decision makers work. This is difficult: one of the two obstacles is

Obstacle 1: Untaken Actions are not Observable

Caspar Oesterheld brilliantly solved this problem by using auction ma... (read more)

Latent Abstractions, Bootlegged

Let X1,...,Xn be random variables distributed according to a probability distribution p on a sample space Ω.

Defn. A (weak) natural latent of X1,...,Xn is a random variable Λ such that

(i) Xi are independent conditional on Λ

(ii) [reconstructability] p(Λ=λ | X1,...,^Xi,...,Xn) = p(Λ=λ | X1,...,Xn) for all i = 1,...,n, where ^Xi denotes that Xi is omitted

[This is not really reconstructability, more like a stability property. The information is contained in many parts of the system... I might al... (read more)
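A minimal numerical sketch of condition (i), using a hypothetical common-cause model (Λ a fair coin, each Xi a noisy copy of Λ; the noise level is made up):

```python
# Toy common-cause model: Λ is a fair coin and each X_i independently copies
# Λ with probability 0.9. Marginally the X_i are correlated, but conditional
# on Λ they are independent, which is condition (i) above.
eps = 0.1  # flip probability (an arbitrary choice)

def p_joint(lam, x1, x2):
    # P(Λ=lam, X1=x1, X2=x2)
    e = lambda x: (1 - eps) if x == lam else eps
    return 0.5 * e(x1) * e(x2)

# Marginally: P(X1=1, X2=1) != P(X1=1) * P(X2=1)
p11 = sum(p_joint(l, 1, 1) for l in (0, 1))                  # 0.41
p1 = sum(p_joint(l, 1, x) for l in (0, 1) for x in (0, 1))   # 0.5
marginally_independent = abs(p11 - p1 * p1) < 1e-12          # False

# Conditional on Λ: P(x1, x2 | Λ) == P(x1 | Λ) * P(x2 | Λ)
cond_independent = True
for l in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            p12 = p_joint(l, x1, x2) / 0.5
            pa = sum(p_joint(l, x1, y) for y in (0, 1)) / 0.5
            pb = sum(p_joint(l, y, x2) for y in (0, 1)) / 0.5
            if abs(p12 - pa * pb) > 1e-12:
                cond_independent = False
```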

Inspired by this Shalizi paper defining local causal states. The idea is so simple and elegant I'm surprised I had never seen it before.

Basically, starting with a factored probability distribution Xt = (X1(t),...,Xkt(t)) over a dynamical DAG Dt, we can use Crutchfield's causal state construction locally to construct a derived causal model factored over the dynamical DAG as X′t, where X′t is defined by considering the past and forward light cones L−(Xt), L+(Xt) of Xt: all those points/variables Yt2 which influence Xt, respectively are influenced by Xt (in a causal, interventional sense). Now define the equivalence relation at ∼ bt on realizations of L−(Xt) (which includes Xt by definition)

whenever the conditional probability distributions p(L+(Xt)|at) = p(L+(Xt)|bt) on the future light cones are equal. These factored probability distributions over dynamical DAGs are called 'fields' by physicists. Given any field F(x,t) we define a derived local causal state field ϵ(F(x,t)) in the above way. Woah!

... (read more)

Reasons to think Lobian cooperation is important

Usually modal Lobian cooperation is dismissed as not relevant for real situations, but it is plausible that Lobian cooperation extends far more broadly than what has been proved so far.

It is plausible that much of the cooperation we see in the real world is actually approximate Lobian cooperation rather than purely given by traditional game-theoretic incentives.

Lobian cooperation is far stronger in cases where the players resemble each other and/or have access to one another's blueprint. This is ... (read more)

"I dreamed I was a butterfly, flitting around in the sky; then I awoke. Now I wonder: Am I a man who dreamt of being a butterfly, or am I a butterfly dreaming that I am a man?" - Zhuangzi

Questions I have that you might have too:

In this shortform I will try and... (read more)

(conversation with Scott Garrabrant)

Destructive Criticism

Sometimes you can say something isn't quite right but you can't provide an alternative.

Difference between 'generation of ideas' and 'filtration of ideas' - i.e. babble and prune.

ScottG: Bayesian learning assumes we are in a babble-rich environment and only does pr... (read more)

Reasonable interpretations of Recursive Self-Improvement are either trivial, tautological or false?

- (Trivial) AIs will do RSI by using more hardware - a trivial form of RSI
- (Tautological) Humans engage in a form of (R)SI when they engage in meta-cognition, i.e. therapy is plausibly a form of metacognition. Meta-cognition is plausibly one of the remaining hallmarks of true general intelligence. See Vanessa Kosoy's "Meta-Cognitive Agents".

... (read more)

In this view, AGIs will naturally engage in meta-cognition because they're generally intelligent. The

Trivial but important

Aumann agreement can fail for purely epistemic reasons because real-world minds do not do Bayesian updating. Bayesian updating is intractable so realistic minds sample from the prior. This is how e.g. gradient descent works and also how human minds work.

In this situation two minds can end up in two different basins with similar loss on the data, because of computational limitations. These minds can have genuinely different expectations for generalization.

(Of course this does not contradict the statement of the theorem which is correct.)
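A toy picture of the basin phenomenon (the loss function and learning rate are made up): two gradient-descent 'minds' on the same double-well loss land in different minima with identical loss.

```python
# Two "minds" minimize the same double-well loss f(w) = (w^2 - 1)^2 by
# gradient descent from different initializations; both reach zero loss
# but in different basins (w = -1 vs w = +1).
def grad(w):
    return 4 * w * (w * w - 1)   # derivative of (w^2 - 1)^2

def descend(w, steps=1000, lr=0.01):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

mind_a = descend(-0.5)   # converges near w = -1
mind_b = descend(+0.5)   # converges near w = +1
```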

Imprecise Information Theory

We would like a notion of entropy for credal sets. Diffractor suggests the following:

let C⊂Credal(Ω) be a credal set.

Then the entropy of C is defined as

H_Diffractor(C) = sup_{p ∈ C} H(p)

where H(p) denotes the usual Shannon entropy.

I don't like this since it doesn't satisfy the natural desiderata below.

Instead, I suggest the following. Let me_C ∈ C denote the (absolute) maximum-entropy distribution, i.e. H(me_C) = max_{p ∈ C} H(p), and let H(C) = H_new(C) = H(me_C).

Desideratum 1: H({p}... (read more)
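As a toy sketch, for a finite credal set the two definitions coincide: the sup of the entropies is attained by the maximum-entropy member. They come apart on richer credal sets, which is where the desiderata do their work. (The candidate distributions below are made up.)

```python
import math

def shannon_entropy(p):
    # Shannon entropy in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

# A toy credal set on a 3-outcome space: three candidate distributions.
credal = [
    (0.5, 0.3, 0.2),
    (1/3, 1/3, 1/3),   # the uniform distribution: maximum-entropy member
    (0.7, 0.2, 0.1),
]

H_diffractor = max(shannon_entropy(p) for p in credal)  # sup of entropies
me = max(credal, key=shannon_entropy)                   # the member me_C attaining it
```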

Roko's basilisk is a thought experiment which states that an otherwise benevolent artificial superintelligence (AI) in the future would be incentivized to create a virtual reality simulation to torture anyone who knew of its potential existence but did not directly contribute to its advancement or development.

Why Roko's basilisk probably doesn't work for simulation fidelity reasons: Roko's basilisk threatens to simulate and torture you in the future if you don't comply. Simulation cycles cost resources. Instead of following through on torturing our wo... (read more)

All concepts can be learnt. All things worth knowing may be grasped. Eventually.

All can be understood - given enough time and effort.

For a Turing-complete organism, there is no qualitative gap between knowledge and ignorance.

No qualitative gap but one. The true qualitative difference: quantity.

Often we simply miss a piece of data. The gap is too large - we jump and never reach the other side. A friendly hominid who has trodden the path before can share their journey. Once we know the road, there is no mystery. Only effort and time. Some hominids choose not to share their journey. We keep a special name for these singular hominids: genius.

Abnormalised sampling?

Probability theory talks about sampling for probability distributions, i.e. normalized measures. However, non-normalized measures abound: weighted automata, infra-stuff, uniform priors on noncompact spaces, wealth in logical-inductor-esque math, quantum stuff?? etc. Most probability-theory constructions go through for arbitrary measures and don't need the normalization assumption. Except, crucially, sampling.

What does it even mean to sample from a non-normalized measure? What is ~~unnormalized~~ abnormal sampling? I don't know... (read more)

SLT and phase transitions

The morphogenetic SLT story says that during training the Bayesian posterior concentrates around a series of subspaces W0(1) ⇝ ... ⇝ W0(n) with RLCTs λ1 < ... < λn and losses L1 = L(w1) > ... > Ln = L(wn), wi ∈ W0(i). As the size N of the data sample is scaled, the Bayesian posterior makes transitions W0(i) ⇝ W0(i+1), trading off higher complexity (higher λ(i+1) > λi) for better accuracy (lower loss L(i+1) < Li).

This is the radical new framework of SLT: phase transitions happen i... (read more)
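The tradeoff can be illustrated with the leading-order free energy F_n ≈ nL + λ log n (the losses and RLCTs below are made-up numbers): at small n the posterior prefers the simple, less accurate phase; at large n it switches to the complex, more accurate one.

```python
import math

# Two candidate phases as (loss L_i, RLCT lambda_i); toy numbers.
phases = [(0.10, 1.0),   # simple but less accurate
          (0.05, 4.0)]   # more accurate but more complex

def free_energy(n, L, lam):
    # Leading-order asymptotic free energy: F_n ~ n*L + lambda*log(n)
    return n * L + lam * math.log(n)

def preferred_phase(n):
    return min(range(len(phases)), key=lambda i: free_energy(n, *phases[i]))
```

For small n (say n = 10) phase 0 has lower free energy; by n = 10,000 the posterior has transitioned to phase 1.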

Alignment by Simulation?

I've heard of an alignment plan that is a variation of 'simulate top alignment researchers' with an LLM. Usually the poor alignment researcher in question is Paul.

This strikes me as deeply unserious and I am confused why it is having so much traction.

That AI-assisted alignment is coming (indeed, is already here!) is undeniable. But even somewhat accurately simulating a human from text data is a crazy sci-fi ability, probably not even physically possible. The plan seems to ascribe nearly magical abilities to LLMs.

Predicting... (read more)

Optimal Forward-chaining versus backward-chaining

In general, this is going to depend on the domain. In environments for which we have many expert samples and there are many existing techniques, backward-chaining is key (i.e. deploying resources & applying best practices in business & industrial contexts).

In open-ended environments, such as those arising in science (especially pre-paradigmatic fields), backward-chaining and explicit plans break down quickly.

Incremental vs Cumulative

Incremental: 90% forward chaining, 10% backward chaining f... (read more)

Thin versus Thick Thinking

Thick: aggregate many noisy sources to make a sequential series of actions in mildly related environments; model-free RL. Carnal sins: failure of prioritization / not throwing away enough information, nerdsnipes, insufficient aggregation, trusting too much in any particular model, indecisiveness, overfitting on noise, ignoring the consensus of experts / social reality.

default of the ancestral environment

CEOs, generals, doctors, economists, police detectives in the real world, traders

Thin: precise, systematic analysis, preferably ... (read more)

[Thanks to Vlad Firoiu for helping me]

An Attempted Derivation of the Lindy Effect

Wikipedia:

Laplace's Rule of Succession: What is the probability that the Sun will rise tomorrow, given that it has risen every day for 5000 years?

Let p denote the probability that the Sun will rise tomorrow. A priori we have no information on the value of ... (read more)
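A hedged sketch of the computation (the day count is a rough assumption): with a uniform prior on p, after n successes in n trials the posterior predictive probability of one more success is (n + 1) / (n + 2).

```python
from fractions import Fraction

# Laplace's rule of succession: with a uniform prior on p, after n successes
# in n trials the posterior predictive for another success is (n+1)/(n+2).
n = 5000 * 365  # ~5000 years of daily sunrises (rough day count)
p_rise = Fraction(n + 1, n + 2)
```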

Generalized Jeffreys prior for singular models?

For singular models the Jeffreys prior is not well behaved, for the simple reason that it will be zero at minima of the loss function.

Does this mean the Jeffreys prior is only of interest in regular models? I beg to differ.

Usually the Jeffreys prior is derived as the parameterization-invariant prior. There is another way of thinking about the Jeffreys prior: as arising from an 'indistinguishability prior'.

The argument is delightfully simple: given two weights w1,w2∈W if they encode the same distributi... (read more)

"The links between logic and games go back a long way. If one thinks of a debate as a kind of game, then Aristotle already made the connection; his writings about syllogism are closely intertwined with his study of the aims and rules of debating. Aristotle's viewpoint survived into the common medieval name for logic: dialectics. In the mid twentieth century Charles Hamblin revived the link between dialogue and the rules of sound reasoning, soon after Paul Lorenzen had connected dialogue to constructive foundations of logic." - from the Stanford Encyclopedia ... (read more)

Ambiguous Counterfactuals

[Thanks to Matthias Georg Mayer for pointing me towards ambiguous counterfactuals]

Salary is a function of eXperience and Education

S=aE+bX

We have a candidate C with given salary, experience (X=5) and education (E=5).

Their current salary is given by

S=a⋅5+b⋅5

We'd like to consider the counterfactual where they didn't have the education (E=0). How do we evaluate their salary in this counterfactual?

This is slightly ambiguous - there are two counterfactuals:

E=0,X=5 or E=0,X=10

In the second c... (read more)
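The ambiguity is easy to see with toy numbers (the coefficients below are hypothetical; only the structure matters):

```python
# Salary model S = a*E + b*X with made-up coefficients.
a, b = 2.0, 3.0

def salary(E, X):
    return a * E + b * X

factual = salary(E=5, X=5)   # the candidate's actual salary
# Counterfactual 1: remove the education, hold experience fixed.
cf1 = salary(E=0, X=5)
# Counterfactual 2: remove the education, but suppose the years spent in
# school would have been spent gaining experience instead.
cf2 = salary(E=0, X=10)
```

The two counterfactuals give different answers, so 'their salary without the education' is underdetermined until we say which variables are held fixed.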

Insights as Islands of Abductive Percolation?

I've been fascinated by this beautiful paper by Viteri & DeDeo.

What is a mathematical insight? We feel intuitively that proving a difficult theorem requires discovering one or more key insights. Before we get into what the DeDeo-Viteri paper has to say about (mathematical) insights, let me recall some basic observations on the nature of insights:

(see also my previous shortform)

- There might be a unique decomposition, akin to prime factorization. Alternatively, there might be many roads to Rome: some theorems

... (read more)

Evidence Manipulation and Legally Admissible Evidence

[This was inspired by Kokotajlo's shortform on comparing strong with weak evidence]

In the real world the weight of many pieces of weak evidence is not always comparable to a single piece of strong evidence. The important variable here is not strong versus weak per se but the source of the evidence. Some sources of evidence are easier to manipulate in various ways. Evidence manipulation, whether conscious or emergent, is common and a large obstacle to truth-finding.

Consider aggregating many ... (read more)

Imagine a data stream

...X−3,X−2,X−1,X0,X1,X2,X3...

assumed infinite in both directions for simplicity. Here X0 represents the current state (the "present"), while ...,X−3,X−2,X−1 represents the past and X1,X2,X3,... represents the future.

Predictable Information versus Predictive Information

Predictable information is the maximal information (in bits) that you can derive about the future given access to the past. Predictive information is the amount of bits that you need from the past to make that optimal prediction. Suppose you are... (read more)
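One way to make the distinction concrete, in a toy binary Markov chain (the flip probability is a made-up number):

```python
import math

# Toy binary Markov chain: each symbol repeats the previous one except with
# flip probability q; the stationary distribution is uniform on {0, 1}.
q = 0.1

def H2(p):
    # binary entropy in bits
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Information the past gives about the next symbol: I(X0; X1) = 1 - H2(q)
predictable_bits = 1 - H2(q)   # ~0.53 bits can be predicted
# The current symbol screens off the rest of the past, so one bit of
# memory from the past suffices for optimal prediction.
predictive_bits = 1.0
```

Here the predictable information (~0.53 bits) and the predictive information (1 bit) differ, and nothing forces them to coincide in general.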

Agent Foundations Reading List [Living Document]

This is a stub for a living document on a reading list for Agent Foundations.

Causality: Book of Why, Causality - Pearl

Probability theory: Logic of Science - Jaynes

Hopfield Networks = Ising Models = Distributions over Causal Models?

Given a joint probability distribution p(x1,...,xn), famously there might be many 'Markov' factorizations. Each corresponds to a different causal model.

Instead of choosing a particular one we might have a distribution of beliefs over these different causal models. This feels basically like a Hopfield Network/ Ising Model.

You have a distribution over nodes and an 'interaction' distribution over edges.

The distribution over nodes corresponds to the joint probability di... (read more)
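A minimal sketch of the kind of object being gestured at (all numbers hypothetical): node terms, edge terms, and the resulting Boltzmann distribution over configurations.

```python
import itertools, math

# Toy 3-node Ising/Hopfield-style model: biases on nodes, couplings on edges.
h = {0: 0.5, 1: -0.2, 2: 0.1}                 # node ("field") terms
J = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.3}  # edge ("interaction") terms

def energy(s):
    e = -sum(h[i] * s[i] for i in h)
    e -= sum(J[i, j] * s[i] * s[j] for (i, j) in J)
    return e

# Boltzmann distribution over spin configurations s_i in {-1, +1}.
states = list(itertools.product([-1, 1], repeat=3))
Z = sum(math.exp(-energy(s)) for s in states)
probs = {s: math.exp(-energy(s)) / Z for s in states}
```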