Deleted comments archive?

I think downvoting things to minus infinity is almost always better than deleting them. (One exception is purging all content posted by new users banned for spam/nonsense.) Where a comment would normally be deleted to discourage further engagement in some current drama, a warning, followed by a temporary suspension of posting privileges if it's not heeded, should be equally effective.

Mati_Roy's Shortform

The arguments for instrumental convergence don't apply to the smaller processes that take place within a world fully controlled by an all-powerful agent, because the agent can break Moloch's back. If the agent doesn't want undue resource acquisition to be useful for you, it won't be, and so on.

The expectation that humans would value preservation of values is shaky: it's mostly based on the instrumental convergence argument, which doesn't apply in this setting. So it might actually turn out that human preference says that value preservation is not good for individual people, that value drift in people is desirable. Absence of value drift is still an instrumental goal for the agent in charge of the world, which works for the human preference that doesn't drift. This agent can then ensure that the overall shape of value drift in the people who live in the world is as it should be, that it doesn't descend into madness.

Value drift only makes sense where the abstraction of values makes sense. Does my apartment building have a data integrity problem, does it fail some hash checks? This doesn't make sense: the apartment building is not a digital data structure. I think it's plausible that some AGIs of the non-world-eating variety lack anything that counts as their preference; they are not agents. In a world dominated by such AGIs some people would still set up smaller agents merely for the purpose of their own preference management (this is the overhead I alluded to in the previous comment). But for those who don't and end up undergoing unchecked value drift (with no agents to keep it in line with what values-on-reflection approve of), the concept of values is not necessarily important either. This too might be the superior alternative: more emphasis on living long reflection than on being manipulated into following its conclusions.

Has anyone done decision theories using turing machines?

A deterministic system still has parts, and we can talk about how its behavior depends on those parts, as we vary the parts and thus the system (or just replace the parts with alternatives that might or might not be semantically the same as the original parts; you can replace "5+7" with "2+10" without changing its meaning). The trouble with decision making is that there are inexplicable vague desiderata for how to carve up the system into parts to represent the way an agent controls it. If this is done in a dumb straightforward way, you get orthodox CDT. The setting for doing this in a vague sensible way is algorithmic (as opposed to physical) decision making, but it's unclear how to actually operationalize it. Various proof search schemes give OK results, but don't solve any clear formalizations of this. The (informal) idea of "spurious proofs" that are an issue for the proof search decision algorithms is related.

Maybe the agent should always two-box?

Considerations that make this sound sensible are the content of the ASP problem linked above. I wrote my comment for the case where it should've said "one-box" there, for otherwise the subsequent claim that "Clearly no two such Turing machines exist" wants to be false (when this is appropriately formalized).

Mati_Roy's Shortform

Learning distills memories into models that can be more reasonably bounded even for experience on astronomical timescales. It's not absolutely necessary to keep the exact record of everything. What it takes to avoid value drift is another issue though; this might incur serious overhead.

Value drift in people is not necessarily important, and might even be desirable; it's only clear for the agent in charge of the world that there should be no value drift. But even that doesn't necessarily make sense if there are no values to work with as a natural ingredient of this framing.

Has anyone done decision theories using turing machines?

In the example from the post, the agent acts like the counterexample program in the proof of unsolvability of the halting problem. It looks at a purported Oracle, and acts in a way that contradicts the Oracle's prediction, whatever that is. This is not always possible, for example if the Oracle's answer can be "don't know" (which can't be contradicted) or if the agent has a time limit that the Oracle doesn't. And generally, programs don't have to literally simulate each other; they can reason about each other instead, which often allows both to understand how the other behaves.
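
The diagonal construction can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `contrarian_agent` and `oracle_is_refuted` are hypothetical, and the Oracle is modeled as a zero-argument function returning a fixed answer, whereas a real Oracle would inspect the agent's source.

```python
from typing import Callable

# A prediction is "one-box", "two-box", or "don't know" (assumed encoding).
Prediction = str


def contrarian_agent(oracle: Callable[[], Prediction]) -> str:
    """Ask the oracle for its prediction, then do the opposite if possible."""
    prediction = oracle()
    if prediction == "one-box":
        return "two-box"
    if prediction == "two-box":
        return "one-box"
    # "don't know" names no action, so there is nothing to contradict;
    # fall back to an arbitrary default.
    return "two-box"


def oracle_is_refuted(oracle: Callable[[], Prediction]) -> bool:
    """True if the oracle committed to an action and the agent did otherwise."""
    prediction = oracle()
    if prediction == "don't know":
        return False  # an abstention cannot be wrong
    return contrarian_agent(oracle) != prediction


if __name__ == "__main__":
    for answer in ("one-box", "two-box", "don't know"):
        print(answer, "->", oracle_is_refuted(lambda a=answer: a))
```

Any Oracle that commits to a definite action is refuted by this agent, while one that is allowed to abstain is not, which is the escape hatch mentioned above.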

In Newcomb's problem, Omega might just leave the box empty if it can't predict the agent, so this behavior will result in a loss. A related issue is Agent Simulates Predictor (ASP problem).

If M2 outputs "no $1M", output "two-box".

You probably meant "one-box" here. Otherwise the agent just always two-boxes, in both cases (and sometimes diverges).

Has anyone done decision theories using turing machines?

Formalization of decision making with a program that reasons about its environment that is also a program:

Using provability logic to sidestep the issues with proof search bounds or need for oracles:

Cooperation in PD with two bounded-runtime programs that reason about each other:

Feature idea: Notification when a parent comment is modified

It's not a good feature without settling the notification pollution objection. I sometimes edit comments like 10 times for typos and wording. This would be fine if there were an opt-in flag to intentionally push the update notifications when I judge my own edit as substantial.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

This comment mostly makes good points in their own right, but I feel it's highly misleading to imply that those points are at all relevant to what Unreal's comment discussed. A policy doesn't need to be crucial to be good. A working doesn't need to be worse than terrible to get attention to its remaining flaws. Inaccuracy of a bug report should provoke a search for its better form, not nullify its salience.

My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage)

outsiders as "normies"

I've seen the term used a few times on LW. Despite the denotational usefulness, it's very hard to keep it from connotationally being a slur, not without something like there being an existing slur and the new term getting defined to be its denotational non-slur counterpart (how it actually sounds also doesn't help).

So it's a good principle to not give it power by using it (at least in public).

How to deal with unknown probability distributions?

It's unclear from the initial term-seeking and this response, so just to make sure: it's a standard term, the usual reference is

  • M Li, P Vitányi, An Introduction to Kolmogorov Complexity and Its Applications.