Cleo Nardo - LessWrong

if a lab has 100 million AI employees and 1000 human employees then you only need one human employee to spend 1% of their allotted AI headcount on your pet project and you’ll have 1000 AI employees
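A quick check of the arithmetic in this comment, as a minimal sketch. It assumes the AI headcount is split evenly across human employees, which the comment implies but does not state; the variable names are illustrative.

```python
# Assumption: AI headcount is allotted evenly across human employees.
ai_employees = 100_000_000
human_employees = 1_000

# Each human employee's allotted AI headcount.
per_human = ai_employees // human_employees

# One human spends 1% of their allotment on the pet project.
pet_project_share_percent = 1
pet_project_ais = per_human * pet_project_share_percent // 100

print(per_human, pet_project_ais)
```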

Shortform

Cleo Nardo3mo30

seems correct, thanks!

Shortform

Cleo Nardo3mo150

Why do decision-theorists say "pre-commitment" rather than "commitment"?

e.g. "The agent pre-commits to 1 boxing" vs "The agent commits to 1 boxing".

Is this just a lesswrong thing?

https://www.lesswrong.com/tag/pre-commitment

Will quantum randomness affect the 2028 election?

Cleo Nardo4mo30

Steve Byrnes's argument seems convincing.

If there’s 10% chance that the election depends on an event which is 1% quantum-random (e.g. the weather) then the overall event is 0.1% random.
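The 0.1% figure is just the product of the two probabilities. A minimal sketch using exact fractions, where the 10% and 1% inputs are the comment's own illustrative numbers rather than estimates:

```python
from fractions import Fraction

# Chance that the election outcome hinges on the event (e.g. the weather).
p_election_depends_on_event = Fraction(10, 100)

# Share of that event's outcome attributable to quantum randomness.
quantum_share_of_event = Fraction(1, 100)

# Overall quantum-random share of the election outcome.
overall = p_election_depends_on_event * quantum_share_of_event

print(f"{float(overall):.1%}")
```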

How far back do you think an omniscient-modulo-quantum agent could've predicted the 2024 result?

2020? 2017? 1980?

A Shutdown Problem Proposal

Cleo Nardo4mo80

The natural generalization is then to have one subagent for each time at which the button could first be pressed (including one for “button is never pressed”, i.e. the button is first pressed at $T = \infty$). So subagent $\infty$ maximizes $E[u_{1} \mid do(\forall t : \text{button}_{t} = \text{unpressed}), \text{observations}]$, and for all other times subagent $T$ maximizes $E[u_{2} \mid do(\forall t < T : \text{button}_{t} = \text{unpressed}, \text{button}_{T} = \text{pressed}), \text{observations}]$. The same arguments from above then carry over, as do the shortcomings (discussed in the next section).

Can you explain how this relates to Elliott Thornley's proposal? It's pattern matching in my brain but I don't know the technical details.

Uncertainty in all its flavours

Cleo Nardo4mo20

For the sake of potential readers, a (full) distribution over $X$ is some $γ : X \to [0, 1]$ with finite support and $\sum_{x \in X} γ(x) = 1$, whereas a subdistribution over $X$ is some $γ : X \to [0, 1]$ with finite support and $\sum_{x \in X} γ(x) \leq 1$. Note that a subdistribution $γ$ over $X$ is equivalent to a full distribution over $X + 1$, where $X + 1$ is the disjoint union of $X$ with some additional element, so the subdistribution monad can be written $Δ(- + 1)$.
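The equivalence between subdistributions over $X$ and full distributions over $X + 1$ can be sketched concretely. This is a minimal illustration, assuming finite-support distributions are represented as dictionaries of exact fractions; `BOTTOM`, `to_full`, and `to_sub` are hypothetical names, not taken from the post.

```python
from fractions import Fraction

# Hypothetical sentinel for the extra element of X + 1.
# A fresh object guarantees it is disjoint from every element of X.
BOTTOM = object()

def to_full(sub):
    """Send a subdistribution over X to a full distribution over X + 1
    by assigning the missing mass to BOTTOM."""
    total = sum(sub.values())
    if any(p < 0 for p in sub.values()) or total > 1:
        raise ValueError("not a subdistribution")
    full = dict(sub)
    full[BOTTOM] = 1 - total
    return full

def to_sub(full):
    """Inverse direction: drop BOTTOM to recover the subdistribution over X."""
    return {x: p for x, p in full.items() if x is not BOTTOM}

sub = {"a": Fraction(1, 2), "b": Fraction(1, 4)}
full = to_full(sub)
```

Here `full` carries the leftover mass of 1/4 on `BOTTOM`, and `to_sub(full)` returns the original `sub`, so the two maps are mutually inverse on this example.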

I am not at all convinced by the interpretation of $(- + 2)$ here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element $⊥$ in $(- + 1)$ is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations.

Doesn't the Nirvana Trick basically say that these two interpretations are equivalent?

Let $(- + 2)$ be $X \mapsto X + \{0, 1\}$ and let $(- + 1)$ be $X \mapsto X + \{0\}$. We can interpret $\lor$ as possibility, $0$ as a hypothesis consistent with no observations, and $1$ as a hypothesis consistent with all observations.

Alternatively, we can interpret $\lor$ as the free choice made by an adversary, $0$ as "the game terminates and our agent receives minimal disutility", and $1$ as "the game terminates and our agent receives maximal disutility". These two interpretations are algebraically equivalent, i.e. $(\lor, 0, 1)$ is a topped and bottomed semilattice.
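To make the "topped and bottomed semilattice" claim concrete, here is a toy sketch under the first interpretation. It assumes a hypothesis is modelled as the set of observations it is consistent with, so that $\lor$ is set union, $0$ is the empty set, and $1$ is the full set. The observation set and all function names are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical toy observation set.
OBSERVATIONS = frozenset({"a", "b", "c"})
ZERO = frozenset()      # hypothesis consistent with no observations
ONE = OBSERVATIONS      # hypothesis consistent with all observations

def join(h1, h2):
    """'Possibility': consistent with whatever either hypothesis allows."""
    return h1 | h2

def all_hypotheses():
    """Every subset of OBSERVATIONS, i.e. every toy hypothesis."""
    items = sorted(OBSERVATIONS)
    return [
        frozenset(x for x, keep in zip(items, mask) if keep)
        for mask in product([False, True], repeat=len(items))
    ]

def is_bounded_semilattice(elems):
    """Check idempotence, commutativity, associativity, and that
    ZERO is the identity (bottom) and ONE is absorbing (top)."""
    for h in elems:
        if join(h, h) != h or join(h, ZERO) != h or join(h, ONE) != ONE:
            return False
    for a, b in product(elems, repeat=2):
        if join(a, b) != join(b, a):
            return False
    for a, b, c in product(elems, repeat=3):
        if join(join(a, b), c) != join(a, join(b, c)):
            return False
    return True

print(is_bounded_semilattice(all_hypotheses()))
```

The same laws are what the adversarial reading needs: the adversary's choice between two options is idempotent, commutative, and associative, with one terminal outcome acting as the identity and the other as absorbing.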

Unless I'm mistaken, both $P_{f}^{+} \circ Δ \circ (- + 2)$ and $P_{f}^{+} \circ Δ \circ (- + 1)$ demand that the agent may have the hypothesis "I am certain that I will receive minimal disutility", which is necessary for the Nirvana Trick. But $P_{f}^{+} \circ Δ \circ (- + 2)$ also demands that the agent may have the hypothesis "I am certain that I will receive maximal disutility". The first gives the bounded infrabayesian monad and the second gives the unbounded infrabayesian monad. Note that Diffractor uses $P_{f}^{+} \circ Δ \circ (- + 2)$ in Infra-Miscellanea Section 2.

AI Safety Chatbot

Cleo Nardo5mo110

cool!

What LLM is this? GPT-3?
Have you considered turning this into a custom GPT?

Don't Share Information Exfohazardous on Others' AI-Risk Models

Cleo Nardo5mo40

Okay, mea culpa. You can state the policy clearly like this:

"Suppose that, if you hadn't been told by someone who thinks $X$ is exfohazardous, then you wouldn't have known $X$ before time $t$ . Then you are obligated to not tell anyone $X$ before time $t$ ."

Don't Share Information Exfohazardous on Others' AI-Risk Models

Cleo Nardo5mo64

yep, if that's OP's suggestion then I endorse the policy. (But I think it'd be covered by the more general policy of "Don't share information someone tells you if they wouldn't want you to".) But my impression is that OP is suggesting the stronger policy I described?

Don't Share Information Exfohazardous on Others' AI-Risk Models

Cleo Nardo5mo20

“Don't share information that’s exfohazardous on others' models, even if you disagree with those models, except if your knowledge of it isn’t exclusively caused by other alignment researchers telling you of it.”

So if Alice tells me about her alignment research, and Bob thinks that Alice’s alignment research is exfohazardous, then I can’t tell people about Alice’s alignment research?

Unless I’ve misunderstood you, that’s a terrible policy.

Why am I deferring to Bob, who is completely unrelated? Why should I not use my best judgement, which includes the consideration that Bob is worried? What does this look like in practice, given that some people think everything under the sun is exfohazardous?

Of course, if someone tells me some information and asks me not to share it then I won’t — but that’s not a special property of AI xrisk.
