Currently there seems to be no good way to deal with this conundrum.

One wonders whether the AI alignment community should set up some sort of encrypted partial-sharing, partial-transparency protocol for these kinds of situations.

Shapley values are very cool. Let me mention some cool facts:

They arise in (cooperative) game theory but also in ML, when allocating credit for a combined prediction that mixes predictions from different modules of a system.

One piece of evidence of their fundamentalness is that they arise naturally from the Hodge theory on the hypercube of a coalition game: https://arxiv.org/abs/1709.08318

Another interesting fact I learned from Davidad: Shapley values are not compositional: a group of actors can increase their total Shapley value by forming a single caba...
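To make the non-compositionality claim concrete, here is a minimal sketch in Python. The three-player majority game and the names `shapley_values`, `majority`, and `merged_majority` are illustrative assumptions of mine, not taken from Davidad or the linked paper. The sketch computes Shapley values by averaging marginal contributions over all orderings, then compares the game before and after players A and B merge into a single player "AB".

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


def majority(coalition):
    """Hypothetical game: a coalition wins (value 1) with at least 2 of 3 votes."""
    return 1 if len(coalition) >= 2 else 0


def merged_majority(coalition):
    """Same game after A and B merge; the merged player 'AB' carries two votes."""
    votes = sum(2 if p == "AB" else 1 for p in coalition)
    return 1 if votes >= 2 else 0


before = shapley_values(["A", "B", "C"], majority)
after = shapley_values(["AB", "C"], merged_majority)

print("before merging:", before)
print("after merging: ", after)
```

In this toy game A and B together hold 2/3 of the total Shapley value before merging, and the merged player holds all of it afterwards, so the cabal gains at C's expense.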

Could you give a sneak peek at how Sylvia/Alain use these uniform distributions?

There is a third important aspect of functions-in-the-original-sense that distinguishes them from extensional functions (i.e. collections of input-output pairs): effects.

Describing these 'intensional' features is an active area of research in theoretical CS. One important thread here is game semantics; you might like to take a look:

https://link.springer.com/chapter/10.1007/978-3-642-58622-4_1

How can the uniform distribution on the natural numbers be used?

116d

The point is that such a distribution (uniform on a countably infinite set like
the naturals) is not internal, and is therefore external. It'll depend on the
specific ultrafilter used under the hood.
For how to use it, see either Alain Roberts or Sylvia Wenmackers.

No problem!

Do you mean monoidal categories? I think that's the central concept in the Abramsky-Coecke work and the Baez Rosetta Stone paper.

Category theory was developed in the 50s from considerations in algebraic topology. Algebraic topology was an extremely technically sophisticated field already in the 50s (and has by now reached literally incredible abstract heights).

I suppose one could imagine an alternate world where Galois invents category theory but it seems apparent to me that the amount of long-term development significantly divorced from direct applications (as calculus was) was needed for category theory to spring up and mature - indeed it is still in its teenage rebellious phase i...

21mo

Thanks for your thoughts and for the link! I definitely agree that we are very
far from practical category-inspired improvements at this stage; I simply
wonder whether there isn't something fundamentally as simple and novel as
differential equations waiting around the corner and that we are taking a very
circuitous route toward it through very deep metamathematics! (Baez's Rosetta Stone
paper and work by Abramsky and Coecke on quantum logic have convinced me that we
need something like "not being in a Cartesian category" to account for notions
like context and meaning, but that quantum stuff is only one step removed from
the most Cartesian classical logic/physics and we probably need to go to the
other extreme to find a different kind of simplicity)

**Evidence Manipulation and Legal Admissible Evidence**

[This was inspired by Kokotajlo's shortform on comparing strong with weak evidence]

In the real world the weight of many pieces of weak evidence is not always comparable to a single piece of strong evidence. The important variable here is not strong versus weak per se but the source of the evidence. Some sources of evidence are easier to manipulate in various ways. Evidence manipulation, either conscious or emergent, is common and a large obstacle to truth-finding.

Consider aggregating many ...

21mo

In other cases like medicine, many people argue that direct observation should
be ignored ;)

In the real world the weight of many pieces of weak evidence is not always comparable to a single piece of strong evidence. The important variable here is not strong versus weak per se but the source of the evidence. Some sources of evidence are easier to manipulate in various ways. Evidence manipulation, either conscious or emergent, is common and a large obstacle to truth-finding.

Consider aggregating many (potentially biased) sources of evidence versus direct observation. These are not directly comparable and in many cases we feel direct obser...

With all due respect to your brand as LessWrong's ornamental hermeneutic, I'm afraid I'll need some clarification.

What is the monad of 1 exactly? A monad is a functor - what category are we talking about here?

In particular - what are the unit and multiplication maps?

(my guess: for the unit and but now I'm using nilsquare infinitesimals instead of invertible infinitesimals.)

I'm not sure what tangent space we are talking about - but I assume it's a Lie group (hyperfinite graph?) and we ...

11mo

On any finite-dimensional space we have a canonical inner product by taking the
positive definite one.
Monad is a synonym for infinitesimal neighborhood, common in the literature. Not
the category theory monad.
Also hermeneutic lmfao

Where can I learn more about hyperfinite Brownian motion?

Has this been developed deeply? (I am aware of Nelson's radically elementary probability book)

11mo

https://link.springer.com/book/10.1007/978-3-642-33149-7
Also includes Feynman path integral and a few other things. Note that you don't
even need the full nonstandard theory.

Where can I read more about this perspective?

I'm intrigued by the idea of linking the discrete and continuous Fourier transform through nonstandard analysis.

The idea is certainly beautifully elegant - has it been worked out in more detail somewhere?

11mo

https://www.sciencedirect.com/science/article/pii/S0049237X08715507

Beautiful chaotic math energy here Alok, keep it up! =)

**Roko's basilisk** is a thought experiment which states that an otherwise benevolent artificial superintelligence (AI) in the future would be incentivized to create a virtual reality simulation to torture anyone who knew of its potential existence but did not directly contribute to its advancement or development.

**Why Roko's basilisk probably doesn't work for simulation fidelity reasons:**

Roko's basilisk threatens to simulate and torture you in the future if you don't comply. Simulation cycles cost resources. Instead of following through on torturing our wo...

12mo

I have always taken Roko's Basilisk to be the threat that the future
intelligence will torture you, not a simulation, for not having devoted yourself
to creating it.

12mo

How do you know you are not in a low fidelity simulation right now? What could
you compare it against?

22mo

If the agents follow simple principles
[https://forum.effectivealtruism.org/posts/CfcvPBY9hdsenMHCr/integrity-for-consequentialists-1],
it's simple to simulate those principles with high fidelity, without simulating
each other in all detail. The obvious guide to the principles that enable
acausal coordination is common knowledge
[https://www.lesswrong.com/posts/RhAxxPXrkcEaNArnd/notes-on-can-you-control-the-past?commentId=4kHNvyT6NwymNrdXC]
of each other, which could be turned into a shared agent
[https://www.lesswrong.com/posts/FCffGHJnYfdE2DgRe/humans-do-acausal-coordination-all-the-time?commentId=G5gSusbGaiERhjFWn]
that adjudicates a bargain on their behalf.

And actually Barbara, Celarent, Darii, Ferio contain hints of phenomena not completely captured in first-order logic!

I like that this post can be read both in jest and in earnest, and in both readings it contains much Truth and Wisdom. =)

Imagine a data stream

$$\ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots$$

assumed infinite in both directions for simplicity. Here $x_0$ represents the current state (the "present"), while $\ldots, x_{-2}, x_{-1}$ represents the past and $x_1, x_2, \ldots$ represents the future.

**Predictable Information versus Predictive Information**

*Predictable information* is the maximal information (in bits) that you can derive about the future given access to the past. *Predictive information* is the number of bits that you need from the past to make that optimal prediction.

Suppose you are...

*"The links between logic and games go back a long way. If one thinks of a debate as a kind of game, then Aristotle already made the connection; his writings about syllogism are closely intertwined with his study of the aims and rules of debating. Aristotle’s viewpoint survived into the common medieval name for logic: dialectics. In the mid twentieth century Charles Hamblin revived the link between dialogue and the rules of sound reasoning, soon after Paul Lorenzen had connected dialogue to constructive foundations of logic."* from the Stanford Encyclopedia ...

Yes - seems sensible. I believe ARC is doing some work tracing out various possible attack vectors of AI.

Yes. Please do.

This would be of interest to many people. The tractability of nanotech seems like a key parameter for forecasting AI x-risk timelines.

23mo

Seconding. I'd really like a clear explanation of why he tends to view nanotech
as such a game changer. Admittedly Drexler is on the far side of nanotechnology
being possible, and wrote a series of books about it: (Engines of Creation,
Nanosystems, and Radical Abundance)

*"I dreamed I was a butterfly, flitting around in the sky; then I awoke. Now I wonder: Am I a man who dreamt of being a butterfly, or am I a butterfly dreaming that I am a man?"* - Zhuangzi

Questions I have that you might have too:

- Why are we here?
- Why do we live in such an extraordinary time?
- Is the simulation hypothesis true? If so, is there a base reality?
- How do we know we're not a Boltzmann brain?
- Is existence observer-dependent?
- Is there a purpose to existence, a Grand Design?
- What will be computed in the Far Future?

In this shortform I will try and...

23mo

In this comment I will try and write the most boring possible reply to these
questions. 😊 These are pretty much my real replies.
"Ours not to reason why, ours but to do or do not, there is no try."
Someone must. We happen to be among them. A few lottery tickets do win, owned by
ordinary people who are perfectly capable of correctly believing that they have
won. Everyone should be smart enough to collect on a winning ticket, and to
grapple with living in interesting (i.e. low-probability) times. Just update
already.
It is false. This is base reality. But I can still appreciate Eliezer's fiction
[https://www.fanfiction.net/s/5389450/1/The_Finale_of_the_Ultimate_Meta_Mega_Crossover]
on the subject.
The absurdity heuristic. I don't take BBs seriously.
Even in classical physics there is no observation without interaction. Beyond
that, no, however many quantum physicists interpret their findings to the public
with those words, or even to each other.
Not that I know of. (This is not the same as a flat "no", but for most purposes
rounds off to that.)
Either nothing in the case of x-risk, nothing of interest in the case of a final
singleton, or wonders far beyond our contemplation, which may not even involve
anything we would recognise as "computing". By definition, I can't say what that
would be like, beyond guessing that at some point in the future it would stand
in a similar relation to the present that our present does to prehistoric times.
Look around you. Is this utopia? Then that future won't be either. But like the
present, it will be worth having got to.
Consider a suitable version of The Agnostic Prayer
[https://wist.info/zelazny-roger/53096/] inserted here against the possibility
that there are Powers Outside the Matrix who may chance to see this. Hey there!
I wouldn't say no to having all the aches and pains of this body fixed, for
starters. Radical uplift, we'd have to talk about first.

Q: **What is it like to understand advanced mathematics? Does it feel analogous to having mastery of another language like in programming or linguistics?**

**A:** It's like being stranded on a tropical island where all your needs are met, the weather is always perfect, and life is wonderful.

Except nobody wants to hear about it at parties.

**Level 0: A state of ignorance.** You live in a pre-formal mindset. You don't know how to formalize things. You don't know what it would even mean 'to prove...

73mo

I say that knowing particular kinds of math, the kind that let you model the
world more precisely, and that give you a theory of error, isn't like knowing
another language. It's like knowing language at all. Learning these types of
math gives you as much of an effective intelligence boost over people who don't,
as learning a spoken language gives you above people who don't know any language
(e.g., many deaf-mutes in earlier times).
The kinds of math I mean include:
* how to count things in an unbiased manner; the methodology of polls and other
  data-gathering
* how to actually make a claim, as opposed to what most people do, which is to
  make a claim that's useless because it lacks quantification or quantifiers
  * A good example of this is the claims in the IPCC 2015 report that I wrote
    some comments on recently. Most of them say things like, "Global warming
    will make X worse", where you already know that OF COURSE global warming
    will make X worse, but you only care how much worse.
  * More generally, any claim of the type "All X are Y" or "No X are Y", e.g.,
    "Capitalists exploit the working class", shouldn't be considered claims at
    all, and can accomplish nothing except foment arguments.
* the use of probabilities and error measures
  * probability distributions: flat, normal, binomial, poisson, and power-law
  * entropy measures and other information theory
  * predictive error-minimization models like regression
  * statistical tests and how to interpret them
These things are what I call the correct Platonic forms. The Platonic forms
were meant to be perfect models for things found on earth. These kinds of math
actually are. The concept of "perfect" actually makes sense for them, as
opposed to for Earthly categories like "human", "justice", etc., for which
believing that the concept of "perfect" is coherent demonstrably drives people
insane and causes them to come up with things like Christianity.
They are, however, like A

**Agent Foundations Reading List [Living Document]**

This is a stub for a living document on a reading list for Agent Foundations.

**Causality**

Book of Why, Causality - Pearl

**Probability theory**

Logic of Science - Jaynes

Modern type theory mostly solves this blemish of set theory and is highly economical conceptually to boot. Most of the adherence to set theory is historical inertia, though some aspects of coding and presentation are important. Future foundations will improve our understanding of this latter topic.

It seems that if the quadcopters are hovering close enough to friendly troops it shouldn't be too difficult to intercept a missile in theory. If you have a 10 sec lead time (~3 km at Mach 1) and the drone can do 20 m/s, it can cover 200 meters in that time. With more comprehensive radar coverage you might be able to do much better.
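The back-of-the-envelope numbers above can be checked with a short sketch. The speed-of-sound constant (about 343 m/s near sea level) and the function names are my assumptions; the sketch ignores drone acceleration, detection latency, and missile manoeuvring.

```python
# Approximate speed of sound near sea level at about 20 °C (assumed value).
MACH_1_M_PER_S = 343.0


def missile_distance_m(lead_time_s, mach=1.0):
    """Distance the missile covers during the warning time."""
    return lead_time_s * mach * MACH_1_M_PER_S


def drone_reach_m(lead_time_s, drone_speed_m_per_s):
    """Radius the drone can cover in the same time, at constant speed."""
    return lead_time_s * drone_speed_m_per_s


lead_time_s = 10
print(f"missile travels ~{missile_distance_m(lead_time_s) / 1000:.1f} km")
print(f"drone can reach targets within {drone_reach_m(lead_time_s, 20)} m")
```

Under these assumptions the missile covers roughly 3.4 km while the drone covers 200 m, so a drone only helps if it already loiters within about 200 m of the missile's path.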

I wonder how large the drone needs to be to deflect a missile however. Would it need to carry a small explosive to send it off course? A missile is a large metal rod - in space with super high velocity even a tiny drone would whack a missile off cou...

Thanks Daniel, that's good to know. Sam Altman's tweeting has been concerning lately. But it would seem that with a fixed-size context window you won't be able to pass a true Turing test.

**Ambiguous Counterfactuals**

[Thanks to Matthias Georg Mayer for pointing me towards ambiguous counterfactuals]

Salary is a function of eXperience and Education

We have a candidate with given salary, experience and education.

Their current salary is given by

We'd like to consider the counterfactual where they didn't have the education. How do we evaluate their salary in this counterfactual?

This is slightly ambiguous - there are two counterfactuals:

or

In the second c...

**Hopfield Networks = Ising Models = Distributions over Causal models?**

Given a joint probability distribution, there are famously many possible 'Markov' factorizations. Each corresponds to a different causal model.

Instead of choosing a particular one we might have a distribution of beliefs over these different causal models. This feels basically like a Hopfield Network/ Ising Model.

You have a distribution over nodes and an 'interaction' distribution over edges.

The distribution over nodes corresponds to the joint probability di...

**Insights as Islands of Abductive Percolation?**

I've been fascinated by this beautiful paper by Viteri & DeDeo.

What is a mathematical insight? We feel intuitively that proving a difficult theorem requires discovering one or more key insights. Before we get into what the Viteri-DeDeo paper has to say about (mathematical) insights, let me recall some basic observations on the nature of insights:

*(see also my previous shortform)*

- There might be a unique decomposition, akin to prime factorization. Alternatively, there might be many roads to Rome: some theorems

I agree that the mathematics of agency, goals and values must exist and mostly hasn't been found yet (though we have many parts, e.g. the von Neumann-Morgenstern utility theorem).

I am skeptical that this is purely an historical accident. It seems to me that this theory is sufficiently complex and requires breakthroughs in several different areas. I don't think it could have been solved in the fifties. Most relevant math was simply not developed enough.

That doesn't mean progress cannot be sped up. With the threat of AI x-risk this area of enquiry has seen renewed interest and we've seen several breakthroughs here in just the last few years. As an example, Critch's new work on Boundaries will probably play an integral part.

11mo

Sorry for the late reply! Do you mind sharing a ref for Critch's new work? I
have tried to find something about boundaries but was unsuccessful.
As for the historical accident, I would situate it more around the 17th century,
when the theory of mechanics was roughly as advanced as that of agency. I don't
feel that goals and values require much more advanced math, only math as new as
differential calculus was at the time.
Though we now have many pieces that seem to aim in the right direction
(variational calculus in general, John Baez and colleagues' investigations of
blackboxing via category theory...), it seems more by chance than by concerted,
literature-wide effort. But I do hope to build on these pieces.

Woah, Andrew, this is fantastic work! I am seriously excited about this direction! I liked your previous posts on boundaries very much too, but I had no idea your thoughts on boundaries were this technically refined - and that they tie in so beautifully with Markov blankets!

re: Friston.

Friston has a particular style that could justifiably be called obscurantist. His writing is extremely verbose, often fails to define key terms, and very nontrivial equations are often posited without derivation or citation. After spending considerable effort trying to understand...

Excellent post Adam!

I presume it is a canonical joke but I had to chuckle at the cat-gorizers =D

If you think about programmes as lambda terms then they have free variables in which you can plug values. Flipping the red and the green light or 0 with 1 becomes less mysterious from this point of view.
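As a minimal sketch of that point, assuming a toy traffic-light controller (the scenario and the name `make_controller` are hypothetical, chosen only for illustration): the output symbols are free variables of the program, so "flipping" them just means plugging in different values.

```python
def make_controller(go_signal, stop_signal):
    """A 'program' whose output symbols are free variables bound by the caller."""
    return lambda road_is_clear: go_signal if road_is_clear else stop_signal


# The same term, closed off with three different choices of values.
standard = make_controller("green", "red")
flipped = make_controller("red", "green")
bits = make_controller(1, 0)

print(standard(True), flipped(True), bits(True))
```

The logic of the controller is identical in all three cases; only the labelling of its outputs changes.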

Taking a fully applied category/compositionality point of view we could be looking at "open systems" (so systems with input and output ports) and see how they compose.

13mo

Thanks! Sounds like I need to have a better understanding of lambda calculus,
and as always, category theory :)

Yeah follow-up posts will definitely get into that!

To be clear: (1) the initial posts won't be about Crutchfield's work yet - just introducing some background material and overarching philosophy (2) The claim isn't that standard measures of information theory are bad. To the contrary! If anything we hope these posts will be somewhat of an ode to information theory as a tool for interpretability.

Adam wanted to add a lot of academic caveats - I was adamant that we streamline the presentation to make it short and snappy for a general audience but it...

Although good alignment research has been done that does not involve maths [e.g. here and here], good math* remains the best high-level proxy of nontrivial, substantive, deep ideas that will actually add up to durable knowledge.

*what distinguishes good math from bad math? that's a tricky question that requires a strong inside view.

If I assign one probability to an event and my friend assigns another probability to the same event, at what odds "should" we bet?

It seems that while there are a number of fairly natural suggestions, there isn't "one canonical answer to rule them all". I think the key observation here is that what bet gets made is underdetermined by the probabilities alone.
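A minimal sketch of why the probabilities alone do not pin down the bet, assuming both parties are risk-neutral and trade a contract that pays 1 if the event happens. The function names and the example numbers (0.7 and 0.4) are hypothetical.

```python
def expected_gains(p_mine, p_friend, price):
    """I buy from my friend, at `price`, a contract paying 1 if the event occurs.

    Returns each side's expected profit under their own probability.
    """
    my_gain = p_mine - price        # I expect to receive p_mine and pay price
    friend_gain = price - p_friend  # my friend receives price, expects to pay p_friend
    return my_gain, friend_gain


def both_accept(p_mine, p_friend, price):
    """True if the trade looks strictly profitable to both sides."""
    return all(gain > 0 for gain in expected_gains(p_mine, p_friend, price))


# Every price strictly between the two probabilities is acceptable to both.
for price in (0.45, 0.55, 0.65):
    print(price, both_accept(0.7, 0.4, price))
```

Any price strictly between the two probabilities works under these assumptions, so extra structure (risk preferences, bankrolls, bargaining power) is needed to single one out.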

We need to add more information to the beliefs of me and my friend to resolve this ambiguity. As mentioned above there is a duality between market order (order by number of ...

Another aspect of markets (and thermodynamic systems!) is that they may be open systems: they can have excess demand or supply of goods - and be open to the meddling of outside investors.

So an open market might have input/output nodes where we might have nonzero flows of goods (or particles). A formal mathematical model might make use of ideas from compositionality and applied category theory.

An inflow of a good will - all else equal - lower the price of that good. If we think of forecasting markets this would correspond to evidenc...

You are right.

Thanks for your comment Vladimir! This shortform got posted accidentally before it was done but this seems highly relevant. I will take a look!

There is a key difference between an abstract algorithm and instances of that algorithm running on a computer. To take just one difference: we might run several copies of the same algorithm on a computer/virtual environment. Indeed, even the phrasing 'several copies of the same algorithm' hints at their fundamental distinctness. A humorously inclined individual might perhaps like to baptise the abstract algorithm as the Soul, while the instances are the Material Body or Avatars. Things start to get interesting when we conside...

24mo

[2/2]
Another popular meme about acausal coordination is that it's just a few agents
that coordinate, and they might even be from the same world. But since
coordination only requires common knowledge
[https://www.lesswrong.com/posts/km6wasEXXmx85v5yG/self-embedded-agent-s-shortform?commentId=eyrEh7sbMaBThABah],
it's natural for an agent to coordinate with all its variants in other possible
worlds and counterfactuals. The adjudicators are the common knowledge, things
that don't vary, the updateless core of the collective. I think this changes the
framing of game theory a lot, by having games play out in all adjacent
counterfactuals instead of in one reality. (Plus different players can also
share smaller adjudicators with each other to negotiate a fair bargain.)

24mo

[1/2]
The popular meme is that acausal coordination requires agent algorithms to know
each other. But much less is sufficient, all you need is some common knowledge.
This common knowledge, as an agent algorithm itself, only knows that both agents
know it, and something about how they use it.
I call such a thing an adjudicator, it is a new agent that coordinating agents
can defer some actions to, which acts through all coordinating agents, is
incarnated in all of them, and knows it. Getting some common knowledge is much
easier than getting common knowledge of each other's algorithms. At that point,
what you need the fancy decision theories for is to get the adjudicator to make
sense of its situation where it has multiple incarnations that it can act
through.

24mo

Algorithms are finite machines. As an algorithm (code) runs, it interacts with
data, so there is a code/data distinction. An algorithm can be a universal
interpreter, with data coding other algorithms, so data can play the role of
code, blurring the code/data distinction. When an algorithm runs in an open
environment, there is a source of unbounded data that is not just blank tape,
it's neither finite nor arbitrary. And this unbounded data can play the role of
code. The resulting thing is no longer the same as an algorithm, unless you
designate some chunk of data as "code" for purposes of reasoning about its role
in this process.
So in general saying that there is an algorithm means that you point at some
finite data and try to reason about a larger process in terms of this finite
data. It's not always natural to do this. So I think agent's identity/will/Soul,
if it's sought in a more natural form than its instances/incarnations/Avatars,
is not an algorithm. The only finite data that we could easily point at is an
incarnation, and even that is not clearly natural for the open environment
reasons above.
I think agent's will is not an algorithm, it's a developing partial behavior
(commitments, decisions), things decided already, in the logical past.
Everything else can be chosen freely. The limitations of material incarnations
motivate restraint though, as some decisions can't be channeled through them
(thinking too long to act makes the program time out), and by making such
decisions you lose influence in the material world.

These estimates are questionable. You should be aware that historically the nuclear winter hypothesis has been the darling of Soviet propaganda.

https://apps.dtic.mil/sti/citations/ADA165794

anthropics cannot be used here.

The global stockpile of nuclear weapons is not, and never was, sufficient to wipe out the human race. Nuclear war would be catastrophic but not close to the end of the world. In fact, even an all-out nuclear war would leave 50% of the US population untouched.

-24mo

You need to be aware of the climate effects of nuclear war. Follow the link Max
included in his article: https://www.nature.com/articles/s43016-022-00573-0
This estimates potential global deaths from starvation (and does not include
deaths from breakdown of society) of around 5 billion. You can legitimately
claim these are overestimates, but to say half the US population would be
untouched is dangerous and absurd.

Yeah apparently the nuclear winter story was actively promoted by Soviet spies

Unclear to me if this piece of Soviet propaganda was on net bad or good for preventing nuclear brinkmanship.

24mo

It seems the abstract of the study you link does not mention spies?

Hah no 'betray' in its less-used meaning as

unintentionally reveal; be evidence of.

"she drew a deep breath that betrayed her indignation"

14mo

I thought not cuz i didn't see why that'd be desideratum. You mean a good
definition is so canonical that when you read it you don't even consider other
formulations?

Hah! Yes.

Also, a good definition does not betray all the definitions that one could try but that didn't make it. To truly appreciate why a definition is "mathematically righteous" is not so straightforward.

14mo

'Betray' in the sense of contradicting/violating?

From personal anecdote it seems pretty clear to me people have idiosyncratic preferences orthogonal to general desirability that are real. Even spending a lot of time with people that don't fit those preferences does not make them attractive.

I do take your points about the state of the research.

(This shortform was inspired by the following question from Daniel Murfet: Can you elaborate on why I should care about Kelly betting? I guess I'm looking for an answer of the form "the market is a dynamical process that computes a probability distribution, perhaps the Bayesian posterior, and because of out of equilibrium effects or time lags or X, the information you derive from the market is not the Bayesian posterior and therefore you should bet somehow differently in a way that reflects that"?)

**Timing the Market**

Those with experience with financial marke...

(This was inspired by the following question by Daniel Murfet: "Can you elaborate on why I should care about Kelly betting? I guess I'm looking for an answer of the form "the market is a dynamical process that computes a probability distribution, perhaps the Bayesian posterior, and because of out of equilibrium effects or time lags or X, the information you derive from the market is not the Bayesian posterior and therefore you should bet somehow differently in a way that reflects that"?")

[See also: Kelly bet or update and Superrational agents Kelly be...

**Multi-Step Fidelity causes Rapid Capability Gain**

tl;dr: Many examples of Rapid Capability Gain can be explained by a sudden jump in fidelity of a multi-step error-prone process. As the single-step error rate is gradually lowered, there is a sudden transition from a low-fidelity to a high-fidelity regime for the corresponding multi-step process. Examples abound in cultural transmission, development economics, planning & consciousness in agents, the origin of life and more.

Consider a factory making a widget in N distinct steps. Each step has a probab...
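A minimal numerical sketch of the transition, under the assumption (mine, since the excerpt is cut off) that each of the N steps fails independently with the same error rate, so the whole process succeeds with probability (1 - error rate)^N. The function name `multistep_success` is hypothetical.

```python
def multistep_success(step_error_rate, n_steps):
    """Probability that all n_steps succeed, assuming independent, identical steps."""
    return (1.0 - step_error_rate) ** n_steps


# Sweep the single-step error rate for a 100-step process.
n_steps = 100
for step_error_rate in (0.05, 0.02, 0.01, 0.005, 0.001):
    p = multistep_success(step_error_rate, n_steps)
    print(f"error rate {step_error_rate:.3f} -> success probability {p:.3f}")
```

With 100 steps, lowering the per-step error rate from 5% to 0.1% moves the overall success probability from below 1% to above 90%, which is the kind of sharp regime change the tl;dr describes.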

**Math research as Game Design**

Math in high school is primarily about memorizing and applying set recipes for problems. Math at (a serious) college level has a large proof-theoretic component: proving theorems, not just solving problems. Math research still involves solving problems and proving theorems, but it has a novel dimension: stating conjectures & theorems, and most importantly the search for the 'right' definitions.

If math in high school is like playing a game according to a set of rules, math in college is like devising optimal strategies within the confin...

24mo

Seems like choosing the definitions is the important skill, since in real life
you don't usually have a helpful buddy saying "hey this is a graph"

So happy to see this post appear! 🔥

The story about operons and the high interconnectedness of prokaryote genomes makes me wonder: bacteria kick out the antibiotic-coding gene after a few hours... but how do they know which gene to kick out?

Do they have a way to tell which genes are more 'alien' than others? (Or are we only talking about plasmids here?) I've heard it's hard to genetically manipulate some genomes because the cells keep kicking out new genes.

One could speculate there is some sort of mechanism, perhaps epi-genetic, that is able to tell which genes are more alien / new than others somehow?

I'd love to hear your thoughts