LESSWRONG

Daniel C

Master's student in applied mathematics, funded by the Center on Long-Term Risk to investigate the cheating problem in safe Pareto improvements. Former Dovetail fellow with @Alex_Altair.

Comments

Alignment as uploading with more steps
Daniel C · 19h

Yes, I agree.

I think it's similar to CIRL, except less reliant on the reward function & more reliant on the things we get to do once we solve ontology identification.

Alignment as uploading with more steps
Daniel C · 1d

An alternative to pure imitation learning is to let the AI predict observations and build its world model as usual (in an environment containing humans), then develop a procedure to extract the model of a human from that world model.

This is definitely harder than imitation learning (it probably requires solving ontology identification + inventing new continual learning algorithms) but should yield stronger guarantees & be useful in many ways:

  • It's basically "biometric feature conditioning" on steroids: (with the right algorithms) the AI will leverage whatever it knows about physics, psychology, and neuroscience to form its model of the human, and will continue to improve that model as it learns more about the world (this will require ontology identification).
  • We can continue to extract the model of the current human from the current world model & therefore keep track of current preferences. With pure imitation learning it's hard to reliably sync up the human model with the actual human's current mental state (e.g. the actual human is entangled with the environment in a way that the human model isn't, unless the human wears sensors at all times). If we had perfect upload tech this wouldn't be much of an issue, but it seems significant, especially at early stages of pure imitation learning.
    • In particular, if we're collecting data on human actions under different circumstances, then both the circumstances and the human's brain state will be changing, & the latter is presumably not observable. It's unclear how much more data is needed to compensate for that.
  • We often want to run the upload/human model on counterfactual scenarios: suppose there is a part of the world that the AI infers but doesn't directly observe. If we want to use the upload/human model to optimize/evaluate that part of the world, we'd need to answer questions like "How would the upload influence or evaluate that part of the world if she had accurate beliefs about it?" It seems more natural to achieve that when the human model was originally already entangled with the rest of the world model than when it resulted from imitation learning.
johnswentworth's Shortform
Daniel C · 7d (edited)

(Was in the middle of writing a proof before noticing you did it already)

 

I believe the end result is that if we have Y=(Y1,Y2), X=(X1,X2,X3) with P(Y|X)=P(Y1|X1,X3)P(Y2|X2,X3) (X1 upstream of Y1, X2 upstream of Y2, X3 upstream of both),

then maximizing I(X;Y) is equivalent to maximizing I(Y1;X1,X3)+I(Y2;X2,X3)−I(Y1;Y2).

 

& for the proof we can basically replicate the proof of additivity, except substituting the factorization P(X1,X2,X3)=P(X3)P(X1|X3)P(X2|X3) as the assumption in place of independence; then both directions of the inequality will result in I(Y1;X1,X3)+I(Y2;X2,X3)−I(Y1;Y2).

[EDIT: Forgot −I(Y1;Y2) term due to marginal dependence P(Y1,Y2)≠P(Y1)P(Y2)]

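For reference, here is one way to see the identity directly (my own sketch, using only the factorization of P(Y|X) above):

$$
\begin{aligned}
I(X;Y) &= H(Y) - H(Y\mid X)\\
&= \big[H(Y_1) + H(Y_2) - I(Y_1;Y_2)\big] - \big[H(Y_1\mid X_1,X_3) + H(Y_2\mid X_2,X_3)\big]\\
&= I(Y_1;X_1,X_3) + I(Y_2;X_2,X_3) - I(Y_1;Y_2),
\end{aligned}
$$

where the second line uses the factorization: given X, Y1 and Y2 are conditionally independent, Y1 depends only on (X1,X3), and Y2 only on (X2,X3), so H(Y|X) = H(Y1|X1,X3) + H(Y2|X2,X3).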
Natural Latents: Latent Variables Stable Across Ontologies
Daniel C · 10d

> ∀x: P[X=x | M_A] = P[X=x | M_B]
 

 

I think a subtle point is that this is saying we merely have to assume predictive agreement of the distributions marginalized over the latent variables Λ_A/Λ_B, but once we assume that & the naturality conditions, then even as each agent receives more information about X to update their distribution & latent variable Λ_i, the deterministic constraints between the latents will continue to hold.

Or: if a human and an AI start out with predictive agreement over some future observables, & the AI's latent satisfies mediation while the human's latent satisfies redundancy, then we could send the AI out to update on information about those future observables, and the human can (in principle) estimate the redundant latent variable they care about from the AI's latent without observing the observables themselves. The remaining challenge is that humans often care about things that are not approximately deterministic w.r.t. observables from typical sensors.

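A toy numerical illustration of that last point (my own construction, not from the post): let X2 be a noisy copy of X1, take Λ_A = X1 (which mediates between X1 and X2) and Λ_B = X2 (approximately recoverable from either observable alone, i.e. approximately redundant). Then H(Λ_B | Λ_A) is small, so an agent holding Λ_A can estimate Λ_B without observing X itself:

import numpy as np

eps = 0.05  # X2 is a copy of X1, flipped with probability eps

# Joint distribution P[x1, x2] over two binary observables.
P = np.array([[0.5 * (1 - eps), 0.5 * eps],
              [0.5 * eps, 0.5 * (1 - eps)]])

def H(p):
    """Shannon entropy (in bits) of a probability table."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Latents (toy choices): Lambda_A = X1 (mediates), Lambda_B = X2 (approx. redundant).
H_B_given_A = H(P) - H(P.sum(axis=1))   # H(X2 | X1) = H(X1, X2) - H(X1)
print(f"H(Lambda_B | Lambda_A) ≈ {H_B_given_A:.3f} bits")  # small: just the eps-noise entropy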
Agent foundations: not really math, not really science
Daniel C · 22d

> With normal science, there's a phenomenon that we observe, and what we want is to figure out the underlying laws. With AI systems, it's more accurate to say that we know the underlying laws (such as the mathematics of computation, and the "initial conditions" of learning algorithms) and we're trying to figure out what phenomena will occur (e.g. what fraction of them will undergo instrumental convergence).

 

I’d say part of agent foundations is the reverse: we know what phenomena will probably occur (extreme optimization by a powerful agent) and what phenomena we want to cause (alignment), and we’re trying to understand the underlying laws that could produce those phenomena (the algorithms behind general intelligence, which have not been invented yet) so that we can steer them towards the outcomes we want.

(∃ Stochastic Natural Latent) Implies (∃ Deterministic Natural Latent)
Daniel C · 22d (edited)

Congrats!

Some interesting directions I think this opens up: intuitively, given a set of variables X, we want natural latents to be approximately deterministic across a wide variety of (collections of) variables, and if a natural latent Y is approximately deterministic w.r.t. a subset of variables S⊆X, then we want S to be as small as possible (e.g. strong redundancy is better than weak redundancy when the former is attainable).

The redundancy lattice seems natural for representing this: given an element of the redundancy lattice α⊂P(X), we say Y is a redund over α if it’s approximately deterministic w.r.t. each subset in α. E.g. Λ is weakly redundant over X if it’s a redund over {{X̄_i} | X_i∈X} (an approximately deterministic function of each X̄_i), and strongly redundant if it’s a redund over {{X_i} | X_i∈X}. If Y is a redund over α⊂P(X), our intuitive desiderata for natural latents correspond to α containing more subsets (more redundancy), and each subset A_i∈α being small (less “synergy”). Combining this with the mediation condition can probably give us a notion of Pareto-optimality for natural latents.

Another thing we could do: when we construct Pareto-optimal natural latents Y over X, we add them to the original set of variables to augment the redundancy lattice, so that new natural latents can be approximately deterministic functions of (collections of) existing natural latents. This naturally allows us to represent the “hierarchical nature of abstractions”, where lower-level abstractions make it easier to compute higher-level ones.

A concrete setting where this can be useful: a bunch of agents receive different but partially overlapping sets of observations and aim to predict partially overlapping domains. Having a fine-grained collection of natural latents, redundant across different elements of the redundancy lattice, means we get to easily zoom in on the smaller subset of latent variables that’s (maximally) redundantly represented by all of the agents (& be able to tell which domains of predictions these latents actually mediate).

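To make the "redund over α" condition concrete, here is a minimal sketch (the function names, tolerance, and toy example are my own, not from the post): Y counts as an ε-approximate redund over α if H(Y | X_A) ≤ ε for every subset A ∈ α.

import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a probability table."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def cond_entropy(joint, A):
    """H(Y | X_A), where axis 0 of `joint` is Y and axes 1..n are X_1..X_n;
    A is the set of X-axes we condition on."""
    drop = tuple(ax for ax in range(1, joint.ndim) if ax not in A)
    p_y_xa = joint.sum(axis=drop) if drop else joint
    p_xa = p_y_xa.sum(axis=0)
    return entropy(p_y_xa) - entropy(p_xa)   # H(Y, X_A) - H(X_A)

def is_redund(joint, alpha, eps=0.05):
    """Y is an eps-approximate redund over alpha iff it's approximately
    determined by X_A for every A in alpha."""
    return all(cond_entropy(joint, A) <= eps for A in alpha)

# Toy check: Y is an exact copy of two perfectly correlated observables X1, X2,
# so it's strongly redundant, i.e. a redund over {{X1}, {X2}}.
joint = np.zeros((2, 2, 2))
joint[0, 0, 0] = joint[1, 1, 1] = 0.5
print(is_redund(joint, alpha=[{1}, {2}]))   # True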
Futarchy's fundamental flaw
Daniel C · 3mo

> There's a chicken-and-egg problem here [...] and then using that assumption to prove that markets are causal.

That argument was more about accommodating "different traders with different beliefs", but here's an independent argument for the market being causal:

When I cause a particular effect/outcome, that means I mediate the influence between the cause of my action and the effect/outcome of my action: the cause of my action is conditionally independent of the effect of my action given me.

Futarchy is a similar case: there may be many causes that influence market prices, which in turn determine the decision chosen, & market prices mediate the influence between the causes of market prices (e.g. different traders' beliefs) and the decision chosen. Any information can only influence which decision will be chosen by influencing the market prices. This seems like what it means for the market to be causal (in a Bayes net, the decision chosen will literally only have market prices as its parent, assuming we commit to using futarchy to choose decisions).

Futarchy's fundamental flaw
Daniel C · 3mo

> The first expectation needs to be conditioned on the market activating. (That is not conditionally independent of u given d1 in general.)

 

If we commit to using futarchy to choose the decision, then market 1 activating will have exactly the same truth conditions as executing d1, so "market activating and d1" would be the exact same thing as "d1" itself (committing to use futarchy to choose the decision means we assign 0 probability to "first market activating & execute d2" or "second market activating & execute d1").

> Different people have different beliefs, so the expectations are different for different traders. You can't write "E" without specifying for which trader.

Yes, we can replace E with E_i, and then argue that traders with accurate beliefs will accumulate more money over time, making market estimates more accurate in the limit.
Futarchy's fundamental flaw
Daniel C · 3mo

> My main objection to this logic is that there doesn't seem to be any reflection of the idea that different traders will have different beliefs. [...] All my logic is based on a setup where different traders have different beliefs.


Over time, traders who have more accurate beliefs (& act rationally according to those beliefs) will accumulate more money in expectation (& vice versa), so in the limit we can think of futarchy as aggregating the beliefs of different traders, weighted by how accurate their beliefs were in the past.

> So I don't think the condition "p1>E[u|d1]" really makes sense? [...] and this makes it unlikely that the market will converge to E[u|d1].

If I pay p1 for a contract in market 1, my expected payoff is:

(E[u|d1]−p1)P(d1)+0×P(d2) (since I get my money back if d2/market 2 is activated)

This is negative iff p1>E[u|d1] and positive iff p1<E[u|d1].

And if we commit to using futarchy to choose the decision, then d1 is chosen iff market 1 activates, so E_i[u | d1, market 1 activates] should equal E_i[u | d1].
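A quick Monte Carlo sanity check of that payoff formula (the numbers and the toy payoff distribution are my own choices, not from the comment):

import numpy as np

rng = np.random.default_rng(0)

def avg_profit(p1, E_u_d1=0.6, P_d1=0.5, n=200_000):
    """Average profit of buying one market-1 contract at price p1:
    if d1 is executed you receive u (toy distribution with mean E[u|d1]),
    otherwise the trade is refunded (profit 0)."""
    d1_executed = rng.random(n) < P_d1
    u = rng.normal(E_u_d1, 0.2, size=n)
    return np.where(d1_executed, u - p1, 0.0).mean()

print(avg_profit(p1=0.5))   # positive: p1 < E[u|d1]
print(avg_profit(p1=0.7))   # negative: p1 > E[u|d1]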
 

Futarchy's fundamental flaw
Daniel C · 3mo

> We want you to pay more for a contract for coin A, since that’s the coin you think is more likely to be heads (60% vs 59%). But if you like money, you’ll pay more for a contract on coin B. You’ll do that because other people might figure out if it’s an always-heads coin or an always-tails coin. If it’s always heads, great, they’ll bid up the market, it will activate, and you’ll make money. If it’s always tails, they’ll bid down the market, and you’ll get your money back.

 

Let's call "bidding on B, hoping that other people will figure out if B is an always-heads or always-tails coin" strategy X, and call "figure out if B is always-heads or always-tails myself & bid accordingly, or if I can't, bid on A because it's better in expectation" strategy Y.

 

If I believe that a sufficient number of people in the market are using strategy Y, then it's beneficial for me to use strategy X, and insofar as my beliefs about the market are accurate, this is okay, because a sufficient number of people using strategy Y means the market will actually figure out if B is always-heads or always-tails, then bid accordingly. So the market selects the right decision, insofar as my beliefs about the market are correct (note that I'm never incentivized to place a bid on B so large that it causes B to activate, since I don't actually know if B is always-heads).

 

On the other hand, if I believe that the vast majority of people in the market are using strategy X instead of strategy Y, then it's no longer beneficial for me to use strategy X myself; I should instead use strategy Y, because the market doesn't actually do the work of finding out whether coin B is always-heads for me. Other traders who have accurate beliefs about the market will switch to strategy Y as well, until there are enough traders to push the market towards the right decision.

So insofar as people have accurate beliefs about the market, the market will end up selecting the right decision (either a sufficient number of people use strategy Y, in which case it's robust for me to use strategy X, or not enough people are using strategy Y, in which case people are incentivized to switch to Y).

> More generally, what's the argument that the market will always select the decision that leads to the higher expected payout?

"Always" might be too strong, but very informally:

Suppose that we have decisions d1 and d2, with outcome/payoff u & conditional market prices p1 (corresponding to d1) and p2 (corresponding to d2).
 

If p1>E[u|d1], then traders are incentivized to sell & drive down p1. Similarly, they will be incentivized to bid up p1 if p1<E[u|d1]. So p1 will tend toward E[u|d1]. We can argue similarly for p2 tending towards E[u|d2].

Since we choose the decision with the higher price, and prices tend towards the expected payoff given that decision, the market ends up choosing the decision that leads to the higher expected payoff.
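A toy simulation of that adjustment story (entirely my own construction, not a claim about real markets): traders with noisy but unbiased estimates of E[u|d1] buy when the price is below their estimate and sell when it's above, and the price drifts towards E[u|d1]:

import numpy as np

rng = np.random.default_rng(1)

E_u_d1 = 0.6   # "true" expected payoff under d1
p1 = 0.2       # conditional market starts well below it
lr = 0.01      # how strongly net demand moves the price

for _ in range(2000):
    estimates = E_u_d1 + rng.normal(0, 0.1, size=100)   # traders' noisy beliefs about E[u|d1]
    net_demand = np.mean(np.sign(estimates - p1))        # buy if estimate > p1, sell otherwise
    p1 += lr * net_demand

print(round(p1, 2))   # hovers near 0.6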

Posts

Sleeping Experts in the (reflective) Solomonoff Prior (Ω, 16 karma, 15d, 0 comments)
Towards building blocks of ontologies (29 karma, 7mo, 0 comments)
Can subjunctive dependence emerge from a simplicity prior? [Question] (11 karma, 1y, 0 comments)
Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps (19 karma, 1y, 18 comments)
What program structures enable efficient induction? (23 karma, 1y, 5 comments)
My decomposition of the alignment problem (22 karma, 1y, 22 comments)