
quetzal_rainbow

Posts (sorted by new)

1 · quetzal_rainbow's Shortform (3y, 163 comments)

Comments (sorted by newest)
Cole Wyeth's Shortform
quetzal_rainbow · 6d

There is a difference between adoption as in "people are using it" and adoption as in "people are using it in an economically productive way". I think the supermajority of the productivity from LLMs is currently realized as pure consumer surplus.

Help me understand: how do multiverse acausal trades work?
quetzal_rainbow · 7d

We can send a spaceship beyond an event horizon and still care about what is going to happen to it after it crosses the horizon, despite this being utterly irrelevant to our genetic fitness in any causal sense. If we are capable of developing such preferences, I don't see any strong reason for agents to develop a strictly single-universe decision theory.

Multiversal acausal trading is just a logical consequence of LDT, and I expect the majority of powerful agents to have LDT-style decision theories, not LDT-but-without-the-multiverse decision theories.

Help me understand: how do multiverse acausal trades work?
quetzal_rainbow · 7d

This is a really weird line of reasoning, because "multiversal trading" doesn't mean "trading with the entire multiverse"; it means "finding a suitable trading partner somewhere in the multiverse".

First of all, there is a very broad but well-defined class of agents which humans belong to: the class of agents with indexical preferences. Indexical preferences are likely relatively weird in the multiverse, but they are simple enough to be included in any sufficiently broad list of preferences, as a certain sort of curiosity for multiversal decision theorists.

For all we know, our universe is going to end one way or another (heat death, cyclic collapse, Big Rip, or something else). Because we have indexical preferences, we would like to escape the universe in subjective continuity. Because, ceteris paribus, we can be provided with very small shares of reality and still have subjective continuity, this creates large gains from trade for any entities that don't care about indexical facts.

(And if our universe is not going to end, that means we have effectively infinite compute, and therefore we actually can perform a lot of acausal trading.)
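
A toy numerical sketch of the "small shares of reality, large gains from trade" point (the utility function and all numbers below are my own illustrative assumptions, not anything from the original comment):

```python
import math

# Toy assumption: an indexical agent's utility saturates quickly in resources --
# most of the value is in continuing to exist at all, not in how much substrate
# the continuation runs on. The trading partner's opportunity cost is ~linear.
def indexical_gain(share, saturation_rate=1e6):
    return 1 - math.exp(-saturation_rate * share)

def partner_cost(share):
    return share  # linear opportunity cost of the resources handed over

for share in (1e-9, 1e-6, 1e-3):
    print(f"share={share:.0e}  indexical gain={indexical_gain(share):.4f}  "
          f"partner cost={partner_cost(share):.0e}")
```

With these made-up numbers, a 10^-6 share already delivers about 63% of everything the indexical side cares about, at negligible cost to the partner; that asymmetry is where the room for trade comes from.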

Next, there are large restrictions on the search space. As you said, we both should be able to consider each other. Considering, say, a physics in which analogs of quantum computers can solve NP-complete problems in polynomial time seems quite feasible to me: we have a rich theory of approximation and we are going to discover even more of it.

Another restriction is around preferences. If their preference is for something we can actually produce, like molecular squiggles, then we should restrict ourselves to physics sufficiently similar to our own.

We can go further and restrict attention to sufficiently concave preferences, so that we consider a broad class of agents, each of which may have some very specific, hard-to-specify peak of its utility function (like very precise molecular squiggles), but which share a common broad basin of good-enough states (they would like to have precise molecular squiggles, but they would consider it sufficient payment if we just produce a lot of granite spheres).

Given all these restrictions, I don't find it plausible that future human-aligned superintelligences with galaxies of computronium won't find any way to execute a trade, given the incentives.

Cole Wyeth's Shortform
quetzal_rainbow · 7d

I don't think it's reasonable to expect such evidence to appear after such a short period of time. There was no hard evidence that electricity was useful, in the sense you are talking about, until the 1920s. Current LLMs are clearly not AGIs in the sense of being able to integrate into the economy the way migrant labor does; therefore, productivity gains from LLMs are bottlenecked on users.

Help me understand: how do multiverse acausal trades work?
quetzal_rainbow · 8d

Problem 1 is the wrong objection.

CDT agents are not capable of cooperating in the Prisoner's Dilemma, therefore they are selected out. EDT agents are not capable of refusing to pay in XOR blackmail (or, symmetrically, of paying in Parfit's hitchhiker), therefore they are selected out too.

I think you will be interested in this paper.
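
A minimal sketch of the Prisoner's Dilemma point (standard payoffs; the "both players run the same algorithm" framing is my gloss on LDT-style reasoning, not something taken from the linked paper):

```python
# Symmetric Prisoner's Dilemma with the usual payoffs:
# (my action, their action) -> my payoff.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def cdt_action():
    # "D" strictly dominates "C": it pays more against either fixed opponent
    # action, so a causal best-responder defects no matter what it predicts.
    assert all(PAYOFF[("D", o)] > PAYOFF[("C", o)] for o in ("C", "D"))
    return "D"

def same_algorithm_action():
    # If both players provably output the same action, the choice is between
    # (C, C) and (D, D), and mutual cooperation pays more.
    return max(("C", "D"), key=lambda a: PAYOFF[(a, a)])

print(cdt_action(), same_algorithm_action())  # -> D C
```

So a population of CDT agents ends up at (D, D) with payoff 1 each, while agents that reason at the policy level end up at (C, C) with payoff 3 each, which is the selection pressure described above.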

The Problem
quetzal_rainbow · 1mo

Yudkowsky wrote in his letter for Time Magazine:

To visualize a hostile superhuman AI, don’t imagine a lifeless book-smart thinker dwelling inside the internet and sending ill-intentioned emails. Visualize an entire alien civilization, thinking at millions of times human speeds, initially confined to computers—in a world of creatures that are, from its perspective, very stupid and very slow.

And, if anything, That Alien Message was even earlier.

My Empathy Is Rarely Kind
quetzal_rainbow · 1mo

I think the proper guide for an alignment researcher here is to:

  1. Understand other people as made-of-gears cognitive engines, i.e., instead of "they don't bother to apply effort for some reason", think "they don't bother to apply effort because they learned over the course of their lives that extra effort is not rewarded", or something like that. You don't even need to build a comprehensive model; you can just list more than two hypotheses about the possible gears and not assume "no gears, just howling abyss".
  2. Realize that it would require supernatural intervention for them to have your priorities and approaches.
Thane Ruthenis's Shortform
quetzal_rainbow · 2mo

Systematically avoiding all situations where you're risking someone's life in exchange for a low-importance experience would assemble into a high-importance life-ruining experience for you (starving to death in your apartment, I guess?).

We could easily ban speeds above 15 km/h for any vehicle except ambulances. Nobody starves to death in this scenario; it's just very inconvenient. We value the convenience lost in that scenario more than the lives lost in our actual reality, so we don't ban high-speed vehicles.

Ordinal preferences are bad and insane and they are to be avoided.

What's really wrong with utilitarianism is that you can't actually sum utilities: it's a type error. Utilities are invariant up to positive affine transformation, so what would their sum even mean?
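
A minimal sketch of why such a sum is scale-dependent (the agents and numbers are made up for illustration):

```python
# Utilities are only defined up to a positive affine transformation, so a
# "sum of utilities" can change which outcome wins under an arbitrary rescaling
# that leaves everyone's actual preferences untouched.
alice = {"A": 1.0, "B": 0.0}   # Alice prefers outcome A
bob   = {"A": 0.0, "B": 1.0}   # Bob prefers outcome B

def total(u1, u2, outcome):
    return u1[outcome] + u2[outcome]

print(total(alice, bob, "A"), total(alice, bob, "B"))          # 1.0 1.0 -- a tie

# Rescale Bob's utility by 10: same preferences for Bob, different "social" answer.
bob_x10 = {k: 10 * v for k, v in bob.items()}
print(total(alice, bob_x10, "A"), total(alice, bob_x10, "B"))  # 1.0 10.0
```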

The problem, I think, is that humans naturally conflate two types of altruism. The first type is caring about other entities' mental states. The second type is "game-theoretic" or "alignment-theoretic" altruism: a generalized notion of what it means to care about someone else's values. Roughly, I think the good version of the second type requires you to bargain fairly on behalf of the entity you are being altruistic towards.

Let's take the "World Z" thought experiment. The problem, from the second-type-altruism perspective, is that the total utilitarian gets very large utility from this world, while every inhabitant of this world, by premise, gets very small utility per person, which is an unfair division of gains.

One may object: why not create entities who think that a very small share of the gains is fair? My answer is that if an entity can be satisfied with an infinitesimal share of the gains, it can also be satisfied with an infinitesimal share of anthropic measure, i.e., non-existence, and it's more altruistic to look for more demanding entities to fill the universe with.

My general problem with animal welfare, from the bargaining perspective, is that most animals probably don't have sufficient agency to have any sort of representative in the bargaining. We can imagine a CEV of shrimp which is negative-utilitarian and wants to kill all shrimp, or a positive-utilitarian one which thinks that even a very painful existence is worth it, or a CEV that prefers shrimp swimming in heroin, or something human-like, or something totally alien, and the sum of these guesses probably comes out to "do not torture, and otherwise do as you please".

Just Make a New Rule!
quetzal_rainbow · 2mo

Nobody likes rules. Rules are costly: first, they constrict the space of available actions or force you to expend resources. Second, rules are costly to follow: you need to pay attention, remember all the relevant rules, and work out all the ways they interact. Third, in real life, rules aren't simple! Once you leave the territory of "don't kill", every rule has ambiguities and grey areas and a strict dependence on the judgement of the enforcing authority.

If everybody were good and smart, we wouldn't need rules. We would just publish "hey, lead is toxic, don't put it into dishes" and everybody would stop using lead. Even if somebody continued using lead, everybody else would ask around, run their own analyses, and stop buying lead-tainted commodities, and everybody still using lead would go bankrupt.

Conversely, if everybody were good and smart, we wouldn't need authorities! Everybody would just do what's best.

You don't need to be a utility minimizer to do damage through rules. You just need to be the sort of person who likes to argue over rules in order to paralyze the functioning of almost any group. Like, 95% of invocations of authority outside the legal sphere can be described as "I decided to stop this argument about rules, so I'm stopping it". Heck, even the Supreme Court functions mostly in this way.

There are different kinds of societies. In broad society, whose goal is just "live and let live", sure, you can go for simple, universally enforceable rules. The same holds for the inclusive parts of society, like public libraries and parks and streets. It doesn't work for everything else. For example, there can't be comprehensive rules about when you can or can't be fired from a startup. There are CEOs and HR people, and they make judgements about how productive you are, et cetera, and if their judgement is unfavorable, you get fired. Sure, there is labor law, but your expenses (including reputational ones) from trying to stay are probably going to be much higher than whatever you can hope to get. There are some countries where it's very hard to get fired, but such countries also don't have a rich startup culture.

Thane Ruthenis's Shortform
quetzal_rainbow · 2mo

The concept of a weird machine is the closest thing to being useful here, and an important question is "how do we check that our system doesn't form any weird machine?"

More posts

10 · Linkpost: Predicting Empirical AI Research Outcomes with Language Models (3mo, 1 comment)
19 · Definition of alignment science I like (8mo, 0 comments)
15 · How do you shut down an escaped model? [Question] (1y, 8 comments)
15 · Training of superintelligence is secretly adversarial (2y, 2 comments)
8 · There is no sharp boundary between deontology and consequentialism (2y, 2 comments)
17 · Where Does Adversarial Pressure Come From? (2y, 1 comment)
7 · Predictable Defect-Cooperate? (2y, 1 comment)
55 · They are made of repeating patterns (2y, 4 comments)
10 · How to model uncertainty about preferences? [Question] (2y, 2 comments)
3 · What literature on the neuroscience of decision making can you recommend? [Question] (2y, 0 comments)