Gurkenglas's Comments

[Personal Experiment] Training YouTube's Algorithm

That seems silly, given the money on the line and that you can have your ML architecture take this into account.

Causal Abstraction Intro

decided to invest in a high-end studio

I didn't catch that this was a lie until I clicked the link. The linked post is hard to understand - it seems to rely on the reader being similar enough to the author to guess at context. Rest assured that you are confusing someone.

Counterfactual Induction

So the valuation of any propositional consequence of A is going to be at least 1, with equality reached when it does as much of the work of proving bottom as it is possible to do in propositional calculus. Letting valuations go above 1 doesn't seem like what you want?

Counterfactual Induction

Then that minimum does not make a good denominator, because it's always extremely small: the minimization will pick ϕ to be as powerful as possible in order to make L small, i.e., set ϕ to ⊥. (If the denominator in the earlier version is defined at all, then ⊥ is a propositional tautology given A.)

Counterfactual Induction
a magma [with] some distinguished element

A monoid?

min_ϕ L(A,ϕ ⊢ ⊥), where ϕ is a propositional tautology given A

Propositional tautology given A means A⊢ϕ, right? So ϕ=⊥ would make L small.
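Spelling the objection out (assuming "propositional tautology given A" means A ⊢ ϕ in propositional calculus, and that L(Γ ⊢ ⊥) is the length of the shortest propositional proof of ⊥ from Γ):

```latex
\exists \phi :\; A \vdash \phi \;\wedge\; A, \phi \vdash \bot
\;\Longrightarrow\; A \vdash \bot
\;\Longrightarrow\; \min_{\phi \,:\, A \vdash \phi} L(A, \phi \vdash \bot)
\;\le\; L(A, \bot \vdash \bot).
```

The first implication says the minimum is defined at all only if A already proves ⊥, in which case ⊥ itself is an eligible ϕ; and a proof of ⊥ from ⊥ is a single step, so the minimum collapses to (nearly) the smallest value a proof length can take.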

When would an agent do something different as a result of believing the many worlds theory?

An agent might care about (and acausally cooperate with) all versions of himself that "exist". MWI posits more versions of himself. Imagine that he wants there to exist an artist like he could be, and a scientist like he could be - but the first 50% of universes that contain each are more important than the second 50%. Then in MWI, he could throw a quantum coin to decide what to dedicate himself to, while in CI this would sacrifice one of his dreams.
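A toy calculation makes the asymmetry concrete. The weighting function and all numbers here are hypothetical, just instantiating "the first 50% of universes that contain each are more important than the second 50%":

```python
def value(measure: float) -> float:
    """Utility from one dream (artist or scientist) being realized in a
    given measure of universes. The first half of measure counts double
    (hypothetical diminishing weight)."""
    return 2.0 * min(measure, 0.5) + 1.0 * max(measure - 0.5, 0.0)

# Many worlds: a quantum coin puts the artist-self in half the universes
# and the scientist-self in the other half -- both dreams are realized.
mwi_coin = value(0.5) + value(0.5)

# Copenhagen: the coin has a single outcome, so one career exists in all
# universes and the other in none -- one dream is sacrificed.
ci_coin = value(1.0) + value(0.0)

print(mwi_coin, ci_coin)
```

With this concave weighting, the MWI coin flip scores 2.0 against 1.5 for the single-outcome case, so only the many-worlds believer gains by randomizing.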

Moloch feeds on opportunity

"I have trouble getting myself to do the right thing; focusing on what selfish reasons I have to do it helps." sounds entirely socially reasonable to me. Maybe that's just because we here believe that picking and choosing which x=selfish arguments to listen to is not aligned with x=selfishness.

Towards a New Impact Measure

[The agent] is penalized whenever the action you choose changes the agent's ability to attain other utilities. One thing an agent might do to leave that penalty at zero is to spawn a subagent, tell it to take over the world, and program it such that if the agent ever tells the subagent it has been counterfactually switched to another reward function, the subagent is to give the agent as much of that reward function as the agent might have been able to get for itself, had it not originally spawned a subagent.

This modification of my approach came not because there is no surgery, but because the penalty is |Q(a)-Q(Ø)| instead of |Q(a)-Q(destroy itself)|. [Q] is learned to be the answer to "How much utility could I attain if my utility function were surgically replaced with [another one]?", but it is only by accident that such a surgery might change the world's future, because the agent didn't refactor the interface away. If optimization pressure is put on this, it goes away.
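A minimal sketch of the loophole. The penalty form |Q(a)-Q(Ø)| is from the comment above; the two auxiliary utilities and all Q-values are made up for illustration:

```python
def penalty(q_action: dict, q_noop: dict) -> float:
    """Attainable-utility penalty: sum over auxiliary utilities u of
    |Q_u(action) - Q_u(no-op)|."""
    return sum(abs(q_action[u] - q_noop[u]) for u in q_noop)

# Hypothetical attainable-utility estimates after the no-op.
q_noop = {"u1": 3.0, "u2": 5.0}

# Taking over the world directly changes what the agent could attain
# for the auxiliary utilities, so it is penalized.
q_takeover = {"u1": 10.0, "u2": 0.0}

# Spawning a subagent that takes over the world, but is programmed to
# hand back exactly Q_u(no-op) for whichever utility u the agent reports
# being counterfactually switched to, leaves every measured Q unchanged.
q_subagent = {"u1": 3.0, "u2": 5.0}

print(penalty(q_takeover, q_noop))  # 12.0: large penalty
print(penalty(q_subagent, q_noop))  # 0.0: the surgery measures nothing
```

The point is that the penalty only sees the Q interface: a subagent tuned to reproduce the no-op Q-values makes the counterfactual surgery observationally inert while the world is nonetheless taken over.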

If I'm missing the point too hard, feel free to command me to wait till the end of Reframing Impact so I don't spend all my street cred keeping you talking :).

Towards a New Impact Measure

Assessing its ability to attain various utilities after an action requires that you surgically replace its utility function with a different one in a world it has impacted. How do you stop it from messing with the interface, such as by passing its power to a subagent to make your surgery do nothing?

Towards a New Impact Measure

If it is capable of becoming more able to maximize its utility function, does it then not already have that ability to maximize its utility function? Do you propose that we reward it only for those plans that pay off after only one "action"?
