LESSWRONG
Gurkenglas

I operate by Crocker's rules.

I try to not make people regret telling me things. So in particular:
- I expect to be safe to ask whether your post would give AI labs dangerous ideas.
- If you worry I'll produce such posts, I'll try to keep your worry from making them more likely, even if I disagree. Not thinking in that direction will be easier if you don't spell the ideas out in the initial contact.

Comments
johnswentworth's Shortform
Gurkenglas · 9h · 190

Well, what happens when you take oxytocin?

Comparison of decision theories (with a focus on logical-counterfactual decision theories)
Gurkenglas · 3d · 20

Done, keep 'em coming.

Natural Latents: Latent Variables Stable Across Ontologies
Gurkenglas · 7d · 20

Oh, good to know; I was thinking about how one might get redund and mediator formally dual to each other. Is this written up?

Comparison of decision theories (with a focus on logical-counterfactual decision theories)
Gurkenglas · 10d · 40

Here, I made it use graphviz: https://github.com/alexflint/decision-theory/pull/1

Earth ought to spend at least one programmer-year on basic science of decision theories. Any feature requests?
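A minimal sketch of what "make it use graphviz" can amount to: emitting the DOT language as plain text, one edge per dependency, with no library needed. The node names below are illustrative, not taken from the linked PR.

```python
def to_dot(edges):
    """Render a list of (source, target) pairs as a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical decision-problem graph: predictor and agent each influence a node.
print(to_dot([("Predictor", "Box contents"), ("Agent", "Choice")]))
```

Piping the output through `dot -Tpng` renders the graph.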

Gradient routing is better than pretraining filtering
Gurkenglas · 10d · 40

Suppose I'm an aspiring gradient hacker that just woke up in a setup like yours. If I can tell whether gradients from the current run will be routed to me (e.g. via the canary strings), then I have a bunch of influence over the rest of the network, right? Like, whenever I am not being trained, I can cause problems that would be solved by circuits I'd like to exist, such as like-minded trading partners.

Comparison of decision theories (with a focus on logical-counterfactual decision theories)
Gurkenglas · 17d · 20
$$\mathrm{UDT}(s) = \operatorname*{argmax}_{f} \sum_{i=1}^{n} U(O_i) \cdot P\big(\ulcorner \mathrm{UDT} := f\colon s \mapsto a \urcorner \,\Box\!\!\to O_i\big)$$

The left hand side of the equation has type action (Hintze page 4: "An agent’s decision procedure takes sense data and outputs an action."), but the right hand side has type policy, right?
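The mismatch can be made concrete with type annotations. This is a toy sketch, not Hintze's formalism: the two policies and their payoffs are made up, and `best_policy` stands in for the argmax.

```python
from typing import Callable, List

Sense = str
Action = str
Policy = Callable[[Sense], Action]  # a policy maps sense data to an action

def best_policy(policies: List[Policy],
                score: Callable[[Policy], float]) -> Policy:
    # The argmax over policies has type Policy, not Action.
    return max(policies, key=score)

def UDT(s: Sense) -> Action:
    policies: List[Policy] = [lambda _: "one-box", lambda _: "two-box"]
    score = lambda f: 1_000_000.0 if f(s) == "one-box" else 1_000.0
    f = best_policy(policies, score)
    return f(s)  # only applying the argmax to s yields an Action
```

Without the final application `f(s)`, the right-hand side would type-check as a Policy while the left-hand side promises an Action.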

Comparison of decision theories (with a focus on logical-counterfactual decision theories)
Gurkenglas · 17d · 20

(FDT(P,x))(x)

Should this be FDT(P,x)? As is, this looks to me like the second (x) introduces x into scope, and the first x is an out-of-scope usage.

Critiques of FDT Often Stem From Confusion About Newcomblike Problems
Gurkenglas · 21d · 30

Let me try again:

Does the note say that I was predicted to choose the right box regardless of what notes I am shown, and therefore the left box contains a bomb? Then the predictor is malfunctioning and I should pick the right box.

Does the note say that I was predicted to choose the right box when told that the left box contains a bomb, and therefore the left box contains a bomb? Then I should pick the left box, to shape what I am predicted to do when given that note.

Critiques of FDT Often Stem From Confusion About Newcomblike Problems
Gurkenglas · 21d · 20

You'll also need to update the content of the note and the predictor's decision process to take into account that the agent may see a note. In particular, the predictor needs to decide whether to show a note in the simulation, and may need to run multiple simulations.

With enough knowledge, any conscious agent acts morally
Gurkenglas · 21d · Ω131

Let's sharpen A6. Consider this stamp collector construction: It sends and receives internet data, it has a magically accurate model of reality, it calculates how many stamps would result from each sequence of outputs, and then it outputs the one that results in the most stamps.
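A minimal sketch of that construction, assuming the magically accurate world model can be treated as a given scoring function (all names here are illustrative):

```python
def stamp_collector(candidate_output_sequences, predicted_stamps):
    # predicted_stamps stands in for the magically accurate world model:
    # it maps an output sequence to the number of stamps that would result.
    return max(candidate_output_sequences, key=predicted_stamps)

# Toy world model: only "order stamps" produces any stamps.
model = lambda seq: 100 if seq == "order stamps" else 0
choice = stamp_collector(["do nothing", "order stamps", "self-modify"], model)
```

The argmax is over predicted stamp counts and nothing else, which is the point: moral facts enter its knowledge but never its objective.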

By definition it knows everything about reality, including any facts about what is morally correct, and that stamps are not particularly morally important. It knows how to self-modify, and how many stamps any such self-modification will result in.

I'd like to hear how this construction fares as we feed it through your proof. I think it gums up the section "Rejecting nihilistic alternatives". I think that section assumes the conclusion: You expect it to choose its biases on the basis of what is moral, instead of on the basis of its current biases.

Wikitag Contributions

Reflective category theory · 3 years ago · (+100)
Reflective category theory · 3 years ago · (+193/-111)
Reflective category theory · 3 years ago · (+11/-13)
Reflective category theory · 3 years ago · (+344/-78)
Reflective category theory · 3 years ago · (+5)
Posts

83 · I'm offering free math consultations! · 8mo · 7
24 · A Brief Theology of D&D · 3y · 2
65 · Would you like me to debug your math? · 4y · 16
22 · Domain Theory and the Prisoner's Dilemma: FairBot · 4y · 5
7 · Changing the AI race payoff matrix · 5y · 2
68 · Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda · Ω · 5y · 11
43 · Mapping Out Alignment · Ω · 5y · 0
18 · What are some good public contribution opportunities? (100$ bounty) · Q · 5y · 1
5 · Gurkenglas's Shortform · 6y · 30
41 · Implications of GPT-2 · 7y · 28