All of Shiroe's Comments + Replies

Exactly. I wish the economic alignment issue was brought up more often.

You're right. I'm updating towards illusionism being orthogonal to anthropics in terms of betting behavior, though the upshot is still obscure to me.

I agree realism is underrated. Or at least the term is underrated. It's the best way to frame ideas about sentientism (in the sense of hedonic utilitarianism). On the other hand, you seem to be talking more about rhetorical benefits of normative realism about laws.

Most people seem to think phenomenal valence is subjective, but that conflates two senses of the word "subjective", which can mean either arbitrary or bound to a first-person subject. All observations (including valenced states like suffering) are subjective in the second sense, but not in th... (read more)

it is easy to cooperate on the shared goal of not dying

Were you here for Petrov Day? /snark

But I'm confused about what you mean regarding a Pivotal Act being unnecessary. Although both you and a megacorp want to survive, you each have very different priors about what is risky. Even if the megacorp believes your alignment program will work as advertised, that only compels them to cooperate with you if they (1) are genuinely concerned about risk in the first place, (2) believe alignment is so hard that they will need your solution, and (3) actually possess the institutional coordination abilities needed.

And this is just for one org.

World B has a probability of 1, maybe minus epsilon, of solving alignment, since the solution is already there.

That is totum pro parte. It's not World B which has a solution at hand. It's you who have a solution at hand, and a world that you have to convince to come to a screeching halt. Meanwhile people are raising millions of dollars to build AGI and don't believe it's a risk in the first place. The solution you have in hand has no significance for them. In fact, you are a threat to them, since there's very little chance that your utopian vision will match up wit... (read more)

I do not think a pivotal act is necessary, primarily because it's much easier to coordinate around negative goals, like preventing deaths, than positive goals. That's why I'm so optimistic: it is easy to cooperate on the shared goal of not dying even if value differences after that are large.

Okay, let's operationalize this.

Button A: The state of alignment technology is unchanged, but all the world's governments develop a strong commitment to coordinate on AGI. Solving the alignment problem becomes the number one focus of human civilization, and everyone just groks how important it is and sets aside their differences to work together.

Button B: The minds and norms of humans are unchanged, but you are given a program by an alien that, if combined with an AGI, will align that AGI in some kind of way that you would ultimately find satisfying.

World ... (read more)

I actually think either A or B is a large improvement compared to the world as it exists today, but B wins due to the stakes and the fact that the solution is already in hand, while world A doesn't have the solution pre-loaded; with extremely important decisions, B wins over A. World A is much better than today, to the point that a civilizational-scale effort would probably succeed about 95-99.9% of the time, primarily because they understand deceptive alignment. World B has a probability of 1, maybe minus epsilon, of solving alignment, since the solution is already there. Both, of course, are far better than our world.

I agree that the political problem of globally coordinating non-abuse is more ominous than solving technical alignment. If I had the option to solve one magically, I would definitely choose the political problem.

What it looks like right now is that we're scrambling to build alignment tech that corporations will simply ignore, because it will conflict with optimizing for (short-term) profits. In a word: Moloch.

I would choose the opposite, because I think the consequences of the first drastically outweigh the second.

It's happened before though. Despite being one of those 2 friends, I've already been forced to change my habits and regard videocalls as a valid form of communication.

none of this requires a separate privileged existence different from the environment around us; it is our access consciousness that makes us special, not our hard consciousness.

That sounds like a plausible theory. But if we reject that there is a separate 1st-person perspective, doesn't that entail that we should be Halfers in the Sleeping Beauty problem (SBP)? Not saying it's wrong. But it does seem to me like illusionism/eliminativism has anthropic consequences.

the gears to ascension (8mo):
hmm. it seems to me that the sleeping mechanism problem is missing a perspective - there are more types of question you could ask the sleeping mechanism that are of interest. I'd say the measure increased by waking is not able to make predictions about what universe it is; but that, given waking, the mechanism should estimate the average of the two universes' wake counts, and assume the mechanism has 1.5 wakings of causal impact on the environment around the awoken mechanism.

In other words, it seems to me that the decision-relevant anthropic question is how many places a symmetric process exists; inferring the properties of the universe around you, it is invalid to update about likely causal processes based on the fact that you exist; but on finding out you exist, you can update about where your actions are likely to impact - a different measure that does not allow making inferences about, eg, universal constants.

if, for example, the sleeping beauty problem is run ten times, and each time the being wakes, it is written to a log, then after the experiment there will be on average 1.5x as many logs as there are samples. but the agent should still predict 50%, because the predictive accuracy score is a question of whether the bet the agent makes can be beaten by other knowledge. when the mechanism wakes, it should know it has more action weight in one world than the other, but that doesn't allow it to update about what bet most accurately predicts the most recent sample. two thirds of the mechanism's actions occur in one world, one third in the other, but the mechanism can't use that knowledge to infer about the past.

I get the sense that I might be missing something here. the thirder position makes intuitive sense on some level. but my intuition is that it's conflating things. I've encountered the sleeping beauty problem before and something about it unsettles me - it feels like a confused question, and I might be wrong about this attempted deconfusion. but this expla

I can see how a computer could simulate any anthropic reasoner's thought process. But if you ran the sleeping beauty problem as a computer simulation (i.e. implemented the illusionist paradigm) aren't the Halfers going to be winning on average?

Imagine the problem as a genetic algorithm with one parameter, the credence. Wouldn't the whole population converge to 0.5?

I think the solution to the sleeping beauty problem depends on how exactly the bets are evaluated. The entire idea is that in one branch you make a bet once, but on the other branch you make a bet twice. Does it mean that if you make a correct guess in the latter branch, you win twice as much money? Or despite making (the same) bet twice, you only get the money once? Depending on the answer, the optimal bet probability is either 1/2 or 1/3.
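This dependence on scoring can be made concrete with a short sketch (my own illustration, not from the comment above): compute the expected reward of reporting each credence under a Brier-style scoring rule, counting the tails-branch bet either once per experiment or once per awakening.

```python
def expected_score(credence, per_awakening):
    """Expected Brier-style reward (1 minus squared error) for reporting
    P(heads)=credence in the Sleeping Beauty setup: a fair coin, one
    awakening (and one bet) on heads, two on tails."""
    heads_reward = 1 - (credence - 1) ** 2
    tails_reward = 1 - credence ** 2
    tails_weight = 2 if per_awakening else 1  # paid per awakening or per run
    return 0.5 * heads_reward + 0.5 * tails_weight * tails_reward

grid = [c / 100 for c in range(101)]
best_per_run = max(grid, key=lambda c: expected_score(c, per_awakening=False))
best_per_wake = max(grid, key=lambda c: expected_score(c, per_awakening=True))
# best_per_run == 0.5; best_per_wake == 0.33 (the grid point nearest 1/3)
```

If each correct tails-guess is paid twice, the optimal report shifts from 1/2 to 1/3, which is exactly the Halfer/Thirder split as a question about payout rules rather than about credence itself.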

Can you explain what you mean by "underdetermined" in this context? How is there any ambiguity in resolving the payouts if the game is run as a third person simulation?

If I program a simulation of the SBP and run it under illusionist principles, aren't the simulated Halfers going to inevitably win on average? After all, it's a fair coin.

It depends upon how you score it, which is why both the original problem and various decision-problem variants are underdetermined.

So you'd say that it's coherent to be an illusionist who rejects the Halfer position in the SBP?

Sure. Also coherent to be an illusionist who accepts the Halfer position in the SBP. It's an underdetermined problem.

I'm fine with everything on LW ultimately being tied to alignment. Hardcore materialism being used as a working assumption seems like a good pragmatic measure as well. But ideally there should also be room for foundational discussions like "how do we know our utility function?" and "what does it mean for something to be aligned?" Having trapped priors on foundational issues seems dangerous to me.

Thanks. That solved my issue.

What would it be conscious of, though? Could it feel a headache when you gave it a difficult riddle? I don't think a look-up table can be conscious of anything except for matching bytes to bytes. Perhaps that corresponds to our experience of recognizing that two geometric forms are identical.

We're not conscious of internal computational processes at that level of abstraction (like matching bits). We're conscious of outside inputs, and of the transformations of the state-machine-which-is-us from one state to the next. Recognizing two geometric forms are identical would correspond to giving whatever output we'd give in reaction to that.

Does anyone know of work dealing with the interaction between anthropic reasoning and illusionism/eliminativism?

What about a large look-up table that mapped conversation so far -> what to say next and was able to pass the Turing test? This program would have all the external signs of consciousness, but would you really describe it as a conscious being in the same way that you are?

That wouldn't fit into our universe (by about 2 metaorders of magnitude). But yes, that simple software would indeed have an equivalent consciousness, with the complexity almost completely moved from the algorithm to the data. There is no other option.
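For what it's worth, the size claim can be sanity-checked with back-of-envelope counting. The conversation length and alphabet below are my own illustrative assumptions, not figures from the comment:

```python
import math

# Count the lookup-table entries needed for every possible
# "conversation so far" and compare to atoms in the observable universe.
alphabet = 27      # letters plus space (assumed)
max_chars = 1000   # a short conversation prefix (assumed)

log10_entries = max_chars * math.log10(alphabet)   # ~1431
log10_atoms = 80                                   # ~10^80 atoms, standard estimate

# The table needs ~10^1431 entries; the universe has ~10^80 atoms.
shortfall_orders = log10_entries - log10_atoms     # ~1351 orders of magnitude
```

Whatever the exact meta-order count, even very short conversations overwhelm any physically realizable table, consistent with the point above.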

Unless the conscious algorithm in question will experience states that are not valence-neutral, I see no issue with creating or destroying instances of it. The same applies to any other type of consciousness. It seems implausible to me that any of our known AI architectures could instantiate such non-neutral valences, even if they do seem plausibly able to instantiate other kinds of experiences (e.g. geometric impressions).

derek shiller (8mo):
I'm not particularly worried that we may harm AIs that do not have valenced states, at least in the near term. The issue is more over precedent and expectations going forward. I would worry about a future in which we create and destroy conscious systems willy-nilly because of how it might affect our understanding of our relationship to them, and ultimately to how we act toward AIs that do have morally relevant states. These worries are nebulous, and I very well might be wrong to be so concerned, but it feels risky to rush into things.

I'd love to hear about why anthropic reasoning made such a big difference for your prediction-market prediction. EDIT: Nevermind. Well played.

Quick note on the Ponzo illusion: In my view, seeing the top bar as longer is actually a more primitive, fundamental observation. The idea that the bars ought to appear as the same length is an additional interpretative layer thrown on top of this, justified by geometric principles and theories about human visual perception. The direct (or "raw") observation, however, is that the top bar appears longer.

Question: anyone know of some work on the connection between anthropic paradoxes and illusionism? (I couldn't figure out how to make a "Question" type post.)

In the top-right corner of LessWrong, you should be able to see your username. On my screen it is third from the right, alongside a star for the karma summary and a bell for alerts. If you click your username, a drop-down menu should appear. On mine, the very first item is "New Question", which opens a draft Question.

What does "no indication" mean in this context? Can you translate that into probability speak?

No indication in this context means that:

1. Our current paradigm is almost depleted. We are hitting the wall with both data (PaLM uses 780B tokens; there are 3T tokens publicly available, and additional trillions can be found in closed systems, but that's it) and compute (we will soon hit Landauer's limit, so no more exponentially cheaper computation; current technology is only about three orders of magnitude above this limit).
2. What we currently have is very similar to what we will ultimately be able to achieve with the current paradigm. And it is nowhere near AGI. We need to solve either the data problem or the compute problem.
3. There is no practical possibility of solving the data problem => we need a new AI paradigm that does not depend on existing big data.
4. I assume that we are using existing resources nearly optimally, and that no significantly more powerful AI paradigm will be created until we have significantly more powerful computers. To get those, we need to sidestep Landauer's limit, e.g. via reversible computing or some other completely different hardware architecture.
5. There is no indication that such an architecture is currently in development and ready to use. It will probably take decades to materialize, and it is not even clear whether we can build such a computer with our current technologies. We will need several technological revolutions before we can increase our compute significantly, which will hamper the development of AI, perhaps indefinitely. We may need significant advances in materials science, quantum science, etc., before we are theoretically able to build computers significantly better than what we have today. Then we will need to develop the AI algorithms to run on them and hope that this is finally enough to reach AGI-levels of compute. Even then, it might take additional decades to actually develop the algorithms.
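The "three orders of magnitude" figure in point 1 can be sanity-checked against the textbook Landauer bound. The per-bit energy for current hardware below is a rough assumed figure for illustration, not a measured one:

```python
import math

# Landauer's principle: erasing one bit dissipates at least k_B * T * ln(2).
K_B = 1.380649e-23   # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0            # room temperature, K

landauer_joules_per_bit = K_B * T * math.log(2)   # ~2.9e-21 J

# Rough assumed figure for current switching energy per bit operation:
assumed_cmos_joules_per_bit = 1e-18
headroom = assumed_cmos_joules_per_bit / landauer_joules_per_bit
# headroom is a few hundred, i.e. roughly three orders of magnitude
```

Under that assumption the remaining headroom is a factor of a few hundred, roughly consistent with the comment's claim.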

Yes. Rogue AGI is scary, but I'm far more concerned about human misuse of AGI. Though in the end, there may not be that much of a distinction.

There's a big difference between teleology ... and teleonomy

I disagree. Any "purposes" are limited to the mind of a beholder. Otherwise, you'll be joining the camp of the child who thinks that a teddy bear falls to the ground because it wants to.

Work to offer the solutions and let them make their own, informed choice.

The problem is that the bureaucrats who make the decision of whether gene drives are allowed aren't the same people as the ones who are dying from malaria. Every day that you postpone the eradication of malaria by trying to convince bureaucrats, over a thousand people will die from the disease in question. Most of them, many of whom are infants, had no ability to meaningfully affect their political situation.

I guess it is logically coherent that a bean sprout could have values of its own. But what would it mean for a bean sprout to value something?

You might say, its evolutionary teleology is what it values. But it's only in your human mind that there is such a thing as that teleology, which was an idea your mind created to help it understand the world. By adopting such a non-sentientist view, your brain hasn't stepped down from its old tyranny, but only replaced one of its notions with a more egalitarian sounding one. This pleases your brain, but the bean sprout had no say.

There's a big difference between teleology (humans projecting purposiveness onto inanimate matter) and teleonomy (humans recognizing evolutionary adaptations that emerged to embody convergent instrumental goals that promote the final goals of survival and reproduction). The latter is what I'm talking about with this essay. The biological purposes are not just in the mind of the beholder.

It may help to link to this for context.

Also, what is your impression of Stop Gene Drives? Do their arguments about risks to humans seem in good faith, or is "humans don't deserve to play god!" more like their real motive?

That's true that it could set a bad precedent. But it also could set a bad precedent to normalize letting millions of people die horribly just to avoid setting a bad precedent. It's not immediately clear to me which is worse in the very-long-run.

I think there is a rather large gap between saying it's wrong to force your solution on others to save them from themselves and normalizing letting millions of people die (for whatever reasons). Work to offer the solutions and let them make their own, informed choice. As has been noted by some already, it is not even clear that such forced actions are even required. Rushing to act without even bothering to try working with those being helped. That type of heavy-handed help seems completely uncalled for at this stage.

Something I didn't see mentioned: is there any concern that a sudden elimination of malaria could cause a population surge, with cascading food shortage effects? I have no idea how population dynamics work, so it's non-obvious to me whether there's a potential problem there. Even if so, though, that still wouldn't be an argument to not do the gene drive, but just to make the appropriate preparations beforehand.

Timothy Underwood (8mo):
It wouldn't. First, the time it takes for population changes to happen is very slow compared to the business cycles that drive adaptations to economic changes. Second, eliminating malaria is considerably more likely to reduce population growth than to increase it.

First I thought it was a computer chip, then I thought it was Factorio.

How should this affect one's decision to specialize in UI design versus other areas of software engineering? Will there be fewer GUIs in the future, or will the "audience" simply cease to be humans?

Daniel Kokotajlo (8mo):
IMO, one probably shouldn't be specializing in UI design at the moment. Then again, other areas of software engineering might not be any better. That said, most of what's driving my advice here comes from my background views on AI timelines and not from Adept specifically.

Personhood is a separate concept. Animals that may lack a personal identity conception may still have first person experiences, like pain and fear. Boltzmann brains supposedly can instantiate brief moments of first person experience, but they lack personhood.

The phrase "first person" is a metaphor borrowed from the grammatical "first person" in language.

“We ran the experiment of email being a truly open P2P protocol… That experiment failed” (@patio11)

I must be missing something here. How does this fit in with the rest of the tweets in that list?

It's just a digest of my Twitter. I mostly prune it to progress-relevant stuff, but sometimes I include stuff that's adjacent.

People also talk about a slow takeoff being risky. See the "Why Does This Matter" section from here.

I don't doubt that slow take-off is risky. I rather meant that foom is not guaranteed, and risk due to a not-immediately-omnipotent AI may be more like a catastrophic, painful war.

I'm not a negative utilitarian, for the reason you mention. If a future version of myself was convinced that it didn't deserve to be happy, I'd also prefer that its ("my") values be frustrated rather than satisfied in that case, too.

Are you an illusionist about first person experience? Your concept of suffering doesn't seem to have any experiential qualities to it at all.

the gears to ascension (9mo):
no, I consider any group of particles that have any interaction with each other to contain the nonplanning preferences of the laws of physics, and agency can arise any time a group of particles can predict another group of particles and seek to spread their intent into the receiving particles. not quite panpsychist - inert matter does not contain agency. but I do view agency as a continuous value, not a discrete one.

the information defining a self preserving agent must not be lost into entropy, and any attempt to reduce suffering by ending a life when that life would have continued to try to survive is fundamentally a violation that any safe ai system would try to prevent.

Very strongly disagree. If a future version of myself was convinced that it deserved to be tortured forever, I would infinitely prefer that my future self be terminated than have its ("my") new values satisfied.

Tensor White (9mo):
That's symmetrical with: if a future version of yourself was convinced that it deserved to not exist forever, you would infinitely prefer that your future self be unsatisfied than have its ("your") new existence terminated. Minimizing suffering (NegUtilism) is an arbitrary moral imperative. A moral imperative to maximize happiness (PosUtilism) is at least as valid.

Can you elaborate what such a process would be? Under illusionism, there is no first person perspective in which values can be disclosed (namely, for hedonic utilitarianism).

Illusionism denies the reality of qualia, not personhood.

While it's true that AI alignment raises difficult ethical questions, there's still a lot of low-hanging fruit to keep us busy. Nobody wants an AI that tortures everyone to death.

Shiroe -- my worry is that if we focus only on the 'low-hanging fruit' (e.g. AI aligned with individuals, or with all of humanity), we'll overlook the really dangerous misalignments among human individuals, families, groups, companies, nation-states, religions, etc. that could be exacerbated by access to powerful AI systems. Also, while it's true that very few individuals or groups want to torture everyone to death, there are plenty of human groups (eg anti-natalists, eco-extremists, etc) that advocate for human extinction, and that would consider 'aligned AI' to be any AI aligned with their pro-extinction mission.

It will be interesting to see if EA succumbs to rot, or whether its principles are strong enough to scale.

Do you believe that the pleasure/pain balance is an invalid reason for violently intervening in an alien civilization's affairs? Is this true by principle, or is it simply the case that such interventions will make the world worse off in the long run?

Lone Pine (9mo):
I would take it on a case by case basis. If we know for sure that an alien civilization is creating an enormous amount of suffering for no good reason (eg for sadistic pleasure), then intervening is warranted. But we should acknowledge this is equivalent to declaring war on the civ, even if the state of war lasts only a short period of time (due to a massive power differential). We should not go to war if there is a possibility of negotiation.

Consider the following thought experiment. It's the far future and physics has settled on a consensus that black holes contain baby universes and that our universe is inside a black hole in a larger universe, which we'll call the superverse. Also, we have the technology to destroy black holes. Some people argue that the black holes in our universe contain universes with massive amounts of suffering. We cannot know for sure what the pleasure/pain balance is in these baby universes, but we can guess, and many have come to the conclusion that a typical universe has massively more pain than pleasure. So we should destroy any and all black holes and their baby universes, to prevent suffering. (To simplify the moral calculus, we'll assume that destroying black holes doesn't give us valuable matter and energy. The thought experiment gets a lot more interesting if we relax this assumption, but the principles remain.)

The problem here is that there is no room to live in this moral system. It's an argument for the extinction of all life (except for life that is provably net-positive). The aliens that live in the superverse could just as well kill us, since they have no way of knowing what the pleasure/pain balance is here in our universe. And I'm not just making an argument from acausal trade with the superverse. I do think it is in principle wrong to destroy a life on an unprovable assumption that most life is net-negative. I also don't think that pleasure and pain alone should be the moral calculus. In my view, all life has a fundamental beauty and

Criticism of one of your links:

those can all be ruled out with a simple device: if any of these things were the case, could that causate onto whether such an intuition fires? for all of them, the answer is no: because they are immaterial claims, the fact of them being true or false cannot have causated my thoughts about them. therefore, these intuitions must be discarded when reasoning about them.

Causation, which cannot be observed, can never overrule data. The attempted comparison involves incompatible types. Causation is not evidence, but a type of inter... (read more)

Because, so the argument goes, if the AI is powerful enough to pose any threat at all, then it is surely powerful enough to improve itself (in the slowest case, by coercing or bribing human researchers until it is able to self-modify). Unlike humans, the AI has no skill ceiling, so the recursive feedback loop of improvement will go FOOM in a relatively short amount of time, though how long that takes is an open question.

Isn't there a certain amount of disagreement about whether FOOM is the necessary thing to happen?

The space of possible minds/algorithms is so vast, and that problem is so open-ended, that it would be a remarkable coincidence if such an AGI had a consciousness that was anything like ours. Most details of our experience are just accidents of evolution and history.

Does an airplane have a consciousness like a bird? "Design an airplane" sounds like a more specific goal, but in the space of all possible minds/algorithms that goal's solutions are quite undetermined, just like flight.

Steven Byrnes (9mo):
My airplane comment above was a sincere question, not a gotcha or argument or anything. I was a bit confused about what you were saying and was trying to suss it out. :) Thanks.

I do disagree with you though. Hmm, here's an argument. Humans invented TD learning, and then it was discovered that human brains (and other animals) incorporate TD learning too. Similarly, self-supervised learning is widely used in both AI and human brains, as are distributed representations and numerous other things. If our expectation is "The space of possible minds/algorithms is so vast…" then it would be a remarkable coincidence for TD learning to show up independently in brains & AI, right? How would you explain that?

I would propose instead an alternative picture, in which there are a small number of practical methods which can build intelligent systems. In that picture (which I subscribe to, more or less), we shouldn't be too surprised if future AGI has a similar architecture to the human brain. Or in the most extreme version of that picture, we should be surprised if it doesn't! (At least, they'd be similar in terms of how they use RL and other types of learning / inference algorithms; I don't expect the innate drives a.k.a. reward functions to be remotely the same, at least not by default.)
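For readers unfamiliar with it, the TD learning mentioned above can be sketched in a few lines: a minimal TD(0) value-prediction update on a toy chain (my own illustrative setup, not anything from the thread):

```python
def td0_chain(episodes=5000, alpha=0.1, gamma=1.0):
    """Minimal TD(0) value prediction on the chain A -> B -> terminal.
    Reward 1 on the final transition, 0 otherwise, so the true values
    are V(A) = V(B) = 1. (Toy setup for illustration only.)"""
    V = {"A": 0.0, "B": 0.0, "T": 0.0}
    for _ in range(episodes):
        for s, s_next, r in [("A", "B", 0.0), ("B", "T", 1.0)]:
            # Core TD(0) update: move V(s) toward r + gamma * V(s')
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

values = td0_chain()
# values["A"] and values["B"] both converge toward 1.0
```

The "temporal difference" is the bracketed error term: each state's value estimate is nudged toward the reward plus the next state's estimate, which is the update rule dopamine signals are often modeled as implementing.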

Utilitarianism seems to demand such a theory of qualitative experience, but this requires affirming the reality of first-person experience. Apparently, some people here would rather stick their hand on a hot stove than be accused of "dualism" (whatever that means) and will assure you that their sensation of burning is an illusion. Their solution is to change the evidence to fit the theory.

It does if you're one of the Cool People like me who wants to optimize their qualitative experience, but you can build systems that optimize some other utility target. So this isn't really quite true. This is true.

I'm not quite convinced that illusionism is decision-irrelevant in the way you propose. If it's true that there is no such thing as 1st-person experience, then such experience cannot disclose your own values to you. Instead, you must infer your values indirectly through some strictly 3rd-person process. But all external probing of this sort, because it is not 1st-person, will include some non-zero degree of uncertainty.

One paradox that this leads to is the willingness to endure vast amounts of (purportedly illusory) suffering in the hope of winning, in exc... (read more)

Or some other first person process.

Creating or preventing conscious experiences from happening has a moral valence equivalent to how that conscious experience feels. I expect most "artificial" conscious experiences created by machines to be neutral with respect to the pain-pleasure axis, for the same reason that randomly generated bitmaps rarely depict anything.

Steven Byrnes (9mo):
What if the machine is an AGI algorithm, and right now it’s autonomously inventing a new better airplane design? Would you still expect that?

Great work! I hope more people take your direction, with concrete experiments and monitoring real systems as they evolve. The concern that doing this will backfire somehow simply must be dismissed as untimely perfectionism. It's too late at this point to shun iteration. We simply don't have time left for a Long Reflection about AI alignment, even if we did have the coordination to pull that off.
