# All of Dach's Comments + Replies

But, sure, if you're somehow magically unhackable and very good at keeping the paperclipper boxed until you fully understand it, then there's a chance you can trade, and you have the privilege of facing the next host of obstacles.

Now's your chance to figure out what the next few obstacles are without my giving you spoilers first. Feel free to post your list under spoiler tags in the comment section.

Ideas:

1. Someone else definitely builds and deploys a UFAI before you finish studying Clippy. (This would almost always happen?)
2. Clippy figures out that it's in a
...

Is Humbali right that generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one's probability distribution over AGI, thereby moving out its median further away in time?

One thought is that this argument fails to give accurate updates to other people. Almost 100% of people would give AGI medians much further away than what I think is reasonable, and if this method wants to be a generally useful method for getting better guesses by recognizing you

...

If I try to answer that question as written, I'd say that any time I see a probability estimate with on-the-order-of-hundreds of zeroes, when I know that event actually happened (at least) once in Earth's past light cone, I'm going to assume there is an error in the model that generated the estimate, whether I know what it is or not.

I would agree for pretty much any other topic. But this is an event that's required for there to be anyone around to observe it. Imagine a universe in which abiogenesis events really were absurdly rare- unlikely to ever occur in a given observa...

I'm not aware of an argument that there was only one abiogenesis event on Earth, just the observation that all known surviving lineages come from a universal common ancestor fairly early on. In principle that would be compatible with any number of initial events. It's just that once a given lineage evolved enough adaptations/improvements, it would spread and take over, and then no new lineage would be able to compete/get started.

Your observation is an argument for only one abiogenesis event, and your claim that one would spread and take over and no new lineag...

2AnthonyC2y
That's fair, and I genuinely wasn't trying to nitpick, it is a very good question. If I try to answer that question as written, I'd say that any time I see a probability estimate with on-the-order-of-hundreds of zeroes, when I know that event actually happened (at least) once in Earth's past light cone, I'm going to assume there is an error in the model that generated the estimate, whether I know what it is or not. So what I was trying to point to is that if a catalytic cycle of many (much smaller) RNA strands was sufficient for an abiogenesis event, that could lower the probability estimate enough to make such events more likely by enough that there could have been multiple even just on Earth without straining credulity, and the world today would likely look basically the same either way since the more-competitive biochemistry would have long since reached fixation (and/or the lineages could have merged in some analog of later endosymbiosis events).

While cool, I didn't expect indefinite self-replication to be hard under these circumstances. The enzymes work by combining two halves of the other enzyme- i.e. they are not self-replicating using materials we would expect to ever naturally occur, they are self-replicating using bisected versions of themselves.

I've slightly downgraded my estimate for the minimum viable genome size for self-replicating RNA because I wasn't thinking about complicated groups of cross-catalyzing RNA.

Scott: if you believe that people have auras that can implant demons into your mind then you're clearly insane and you should seek medical help.

Also Scott: beware this charismatic Vassar guy, he can give you psychosis!

These so obviously aren't the same thing- what's your point here? If just general nonsense snark, I would be more inclined to appreciate it if it weren't masquerading as an actual argument.

People do not have auras that implant demons into your mind, and alleging so is... I wish I could be more measured somehow. But it's insane and you should ...

Why is this limit unique? Why can't we be working on "distribution inefficiencies and similar" for the next 100 years?

In the case of real GDP per capita per hour worked, this limit is exactly unique- "distribution inefficiencies and similar" doesn't apply. Indeed, this is tautologically true as you say. Think about what it would look like for an increase in real GDP per capita per hour worked to not have the form of "Something allowed for more work to be done per hour per person". It wouldn't look like anything- that doesn't make any sense.

I would complete...

Self-driving technology is advancing and will soon(ish) allow us to move cars without humans being directly involved, except in terms of maintenance and management. This will be a major boon because it will partially remove humans from the equation- the bottleneck is partially removed. This has no real bearing on the title statement- I even remark about this in my post.

The "universality" here is trivial- here is a copy-paste of part of my response to a similar comment:

For everyone to become richer without working harder, we must develop technologies that a

...

For everyone to become richer without working harder, we must develop technologies that allow more work to be done per man-hour. Aside from working out distribution inefficiencies and similar, this is the unique limit on prosperity. This is what I mean by "humans are the universal bottleneck"- we only have so many man-hours, and any growth is going to be of the form "With the same amount of hours, we do more".

Some segments of the economy have not had as much growth in the above department. For example, houses are assembled manually- all major parts must be...

1[comment deleted]2y

Interesting, thank you.

Is the quadrupling of drag and octupling of rolling resistance related to the assumption that drag is proportional to the surface area of the side on which the drag is produced, and that rolling resistance is proportional to weight? Either way, cost would still decrease due to larger and more complex engines, as rolling resistance per kg would not change.

Of course, railway sizes are fixed, so there is little to be done. I was just speculating where the relative efficiency of cargo ships comes from. I made an edit at the end of the post which contains a very rough approximation of how large savings on wages are in the case of cargo container ships.

Yes. It's a little more complex than this since rolling resistance is irrespective of speed, whereas drag increases with speed. But if you're aiming for efficiency you'll go at low speeds, so we can hold speed fixed and see what happens as we scale. Even if the engine is 100% efficient, you still lose energy to drag and rolling resistance, so sooner or later increased engine efficiency doesn't buy you very much. I think it's the fact they're so much larger, and so drag/capacity is very low.

I would expect fuel efficiency to be related to the size and complexity of the engine. Producing some amount of force is going to require the same amount of fuel assuming energy loss due to resistance/friction is the same, and the engine is the same.

If true, we could e.g. have absurdly large trains on lots of rails? I would expect energy loss due to rubbing on rails and changing elevation to be similar to energy loss due to rubbing on water.

As you double each dimension, capacity octuples, drag quadruples, but rolling resistance octuples. Ships only have drag from the water, but trains also have rolling resistance from the tracks. This means trains don't get significantly more efficient as they grow larger, but ships do.
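The scaling claim above can be checked with a toy square-cube calculation. The proportionality assumptions here- drag scales with cross-sectional area, rolling resistance with weight, capacity with volume- are standard idealizations, not figures from either comment:

```python
# Toy square-cube scaling: double every linear dimension L of a vehicle.
# Idealized assumptions (not measurements):
#   capacity           ~ volume ~ L^3
#   drag               ~ area   ~ L^2
#   rolling resistance ~ weight ~ L^3

def scale(L):
    capacity = L ** 3
    drag = L ** 2
    rolling = L ** 3
    return capacity, drag, rolling

cap1, drag1, roll1 = scale(1)
cap2, drag2, roll2 = scale(2)

# Drag per unit of capacity halves when dimensions double...
print(drag2 / cap2)   # 0.5
# ...but rolling resistance per unit of capacity stays the same.
print(roll2 / cap2)   # 1.0
```

Under these assumptions a ship (drag only) gets cheaper per ton as it grows, while a train (dominated by rolling resistance) doesn't, which matches the comment's conclusion.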

This seems wrong. Imagine some country doesn't have unobtainium, a mineral which is rare and also not particularly useful. You can't get it at any price. Then it finds some, and soon enough many citizens have unobtainium paper holders. Does this mean GDP has grown by a factor of infinity? Hell no, most people would gladly exchange their paper holders for something more useful but also previously obtainable.

Think about it this way. Suppose we have some device that was moderately valuable which everyone needed to own exactly one of, and it costs \$100 per yea...

Is that your real disagreement with the experience machine?

I think if you accept the premise that the machine somehow magically truly simulates perfectly and indistinguishably from actual reality, in such a way that there is absolutely no way of knowing the difference between the simulation and the outside universe, then the simulated universe is essentially isomorphic to reality, and we should be fully indifferent. I'm not sure it even makes sense to say either universe is more "real", since they're literally identical in every way that matters (for the d

...

I can confirm that this still works. Sum of the price of all Nos is \$14.77, payoff is \$15.
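For anyone checking the arithmetic: buying one No share on every outcome of a mutually exclusive market locks in the spread between the basket's payoff and its cost. The \$14.77 and \$15 figures are from the comment above; the 16-outcome structure (each No pays \$1, and exactly one outcome resolves Yes) is my assumption, inferred from the \$15 payoff:

```python
# Basket arbitrage on a mutually exclusive prediction market.
# Assumption (not stated in the comment): 16 outcomes, exactly one
# resolves Yes, and each No contract pays $1 if its outcome fails.
total_cost = 14.77          # sum of all No prices, from the comment
n_outcomes = 16             # assumed, to match the $15 payoff
payoff = n_outcomes - 1     # every No pays $1 except the one that resolves Yes

profit = payoff - total_cost
print(f"guaranteed profit per basket: ${profit:.2f}")  # $0.23
```

The profit is guaranteed regardless of which outcome wins, since exactly one No loses no matter what happens.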

So, I guess the question boils down to, how seriously should I consider switching into the field of AI Alignment, and if not, what else should I do instead?

I think you should at least take the question seriously. You should consider becoming involved in AI Alignment to the extent that you think doing so will be the highest-value strategy, accounting for opportunity costs. An estimate for this could be derived using the interplay between your answers to the following basic considerations:

• What are the most promising methods for pursuin
...

You were welcome to write an actual response, and I definitely would have read it. I was merely announcing in advance my intent not to respond in detail to any following comments, and explaining why in brief, conservative terms. This is seemingly strictly better- it gives you new information which you can use to decide whether or not you want to respond. If I was being intentionally mean, I would have allowed you to write a detailed comment and never responded, potentially wasting your time.

If your idea of rudeness is constructed in this (admittedly inconvenient) way, I apologize.

2Said Achmiz3y
I read this comment with interest, and with the intent of responding to your points—it seemed to me that there was much confusion to be resolved here, to the benefit of all. Then I got to your last line. It is severely rude to post a detailed fisking of an interlocutor’s post/comment, and to then walk away. If you wish to bow out of the discussion, that is your right, but it is both self-indulgent and disrespectful to first get in a last word (much less a last several hundred words). Strongly downvoted.

Why, exactly, is this our only job (or, indeed, our job at all)? Surely it’s possible to value present-day things, people, etc.?

The space that you can affect is your light cone, and your goals can be "simplified" to "applying your values over the space that you can affect", therefore your goal is to apply your values over your light cone. It's your "only job".

There is, of course, a specific notion that I intended to evoke by using this rephrasing: the idea that your values apply strongly over humanity's vast future. It's possible to value present-day thi...

2Said Achmiz3y
Whether the future “matters more than today” is not a question of impersonal fact. Things, as you no doubt know, do not ‘matter’ intransitively; they matter to someone. So the question is, does “the future” (however construed) matter to me more than “today” (likewise, however construed) does? Does “the future” matter to my hypothetical friend Alice more than today does, or to her neighbor Bob? Etc. And any of these people are fully within their right to answer in the negative. Note that you’re making a non-trivial claim here. In past discussions, on Less Wrong and in adjacent spaces, it has been pointed out that our ability to predict future consequences of our actions drops off rapidly as our time horizon recedes into the distance. It is not obvious to me that I am in any particularly favorable position to affect the course of the distant future in any but the most general ways (such as contributing to, or helping to avert, human extinction—and even there, many actions I might feasibly take could plausibly affect the likelihood of my desired outcome in either the one direction or the other). I would need to (a) have different values than those I currently have, and (b) gain (implausibly, given my current understanding of the world) the ability to predict the future consequences of my actions with an accuracy vastly greater than that which is currently possible (for me or for anyone else). Sorry, no. There is a categorical difference between bringing a person into existence and affecting a person’s future life, contingent on them being brought into existence. It of course makes sense to speak of doing the latter sort of thing “for” the person-to-be, but such isn’t the case for the former sort of thing. To the contrary: your point hinges on this. You may of course discuss or not discuss what you like, but by avoiding this topic, you avoid one of the critical considerations in your whole edifice of reasoning. Your conclusion is unsupportable without committing to

This doesn't require faster-than-light signaling. If you and the copy are sent away with identical letters that you open after crossing each other's event horizons, you learn what was packed with your clone when you open your letter. Which lets you predict what your clone will find.

Nothing here would require the event of your clone seeing the letter to affect you. You are affected by the initial set up.

Another example would be if you learn a star that has crossed your cosmic event horizon was 100 solar masses, it's fair to infer that it will become a black

...

I don't understand why you're calling a prior "inference". Priors come prior to inferences, that's the point.

SIA is not isomorphic to "Assign priors based on Kolmogorov Complexity". If what you mean by SIA is something more along the lines of "Constantly update on all computable hypotheses ranked by Kolmogorov Complexity", then our definitions have desynced.

Also, remember: you need to select your priors based on inferences in real life. You're a neural network that developed from scattered particles- your priors need to have actually entered into your brain ...

1ike3y

That's surprisingly close, but I don't think that counts. That page explains that the current dynamics behind phosphate recycling are bad as a result of phosphate being cheap- if phosphate was scarce, recycling (and potentially the location of new phosphate reserves, etc.) would become more economical.

My formulation of those assumptions, as I've said, is entirely a prior claim.

You can't gain non-local information using any method, regardless of the words or models you want to use to contain that information.

If you agree with those priors and Bayes, you get those assumptions.

You cannot reason as if you were selected randomly from the set of all possible observers. This allows you to infer information about what the set of all possible observers looks like, despite provably not having access to that information. There are practical impli...

1ike3y
>This allows you to infer information about what the set of all possible observers looks like I don't understand why you're calling a prior "inference". Priors come prior to inferences, that's the point. Anyway, there are arguments for particular universal priors, e.g. the Solomonoff universal prior. This is ultimately grounded in Occam's razor, and Occam can be justified on grounds of usefulness.  >This is a real world example that demonstrates the flaws with these methods of reasoning. The complexity is not unnecessary. It clearly is unnecessary - nothing in your examples requires there to be tiling, you should give an example with a single clone being produced, complete with the priors SIA gives as well as your theory, along with posteriors after Bayesian updating.  >SIA has additional physically incoherent implications I don't see any such implications. You need to simplify and more fully specify your model and example.

The version of the post I responded to said that all probes eventually turn on simulations.

The probes which run the simulations of you without the pop-up run exactly one. The simulation is run "on the probe."

Let me know when you have an SIA version, please.

I'm not going to write a new post for SIA specifically- I already demonstrated a generalized problem with these assumptions.

The up until now part of this is nonsense - priors come before time. Other than that, I see no reason to place such a limitation on priors, and if you formalize this I can pro

...
1ike3y
>The fact that you use some set of priors is a physical phenomenon. Sure, but irrelevant. My prior is exactly the same in all scenarios - I am chosen randomly from the set of observers according to the Solomonoff universal prior. I condition based on my experiences, updating this prior to a posterior, which is Solomonoff induction. This process reproduces all the predictions of SIA. No part of this process requires information that I can't physically get access to, except the part that requires actually computing Solomonoff as it's uncomputable. In practice, we approximate the result of Solomonoff as best we can, just like we can never actually put pure Bayesianism into effect.  Just claiming that you've disproven some theory with an unnecessarily complex example that's not targeted towards the theory in question and refusing to elaborate isn't going to convince many.  You should also stop talking as if your paradoxes prove anything. At best, they present a bullet that various anthropic theories need to bite, and which some people may find counter-intuitive. I don't find it counter-intuitive, but I might not be understanding the core of your theory yet.  >SIA is asserting more than events A, B, and C are equal prior probability. Like what?  I'm going to put together a simplified version of your scenario and model it out carefully with priors and posteriors to explain where you're going wrong.

If you reject both the SIA and SSA priors (in my example, SIA giving 1/3 to each of A, B, and C, and SSA giving 1/2 to A and 1/4 to B and C), then what prior do you give?

I reject these assumptions, not their priors. The actual assumptions and the methodology behind them have physically incoherent implications- the priors they assign may still be valid, especially in scenarios where it seems like there are exactly two reasonable priors, and they both choose one of them.

Whatever prior you give you will still end up updating as you learn information. There's

...
0ike3y
My formulation of those assumptions, as I've said, is entirely a prior claim.  If you agree with those priors and Bayes, you get those assumptions.  You can't say that you accept the prior, accept Bayes, but reject the assumption without explaining what part of the process you reject. I think you're just rejecting Bayes, but the unnecessary complexity of your example is complicating the analysis. Just do Sleeping Beauty with the copies in different light cones.  I'm asking for your prior in the specific scenario I gave.

Can you formulate this as a challenge to SIA in particular? You claim that it affects SIA, but your issue is with reference classes, and SIA doesn't care about your reference class.

The point is that SIA similarly overextends its reach- it claims to make predictions about phenomena that could not yet have had any effect on your brain's operation, for reasons demonstrated with SSA in the example in the post.

Your probability estimates can only be affected by a pretty narrow range of stuff, in practice, and because SIA does not deliberately draw the line...

1ike3y
If you reject both the SIA and SSA priors (in my example, SIA giving 1/3 to each of A, B, and C, and SSA giving 1/2 to A and 1/4 to B and C), then what prior do you give? Whatever prior you give you will still end up updating as you learn information. There's no way around that unless you reject Bayes or you assert a prior that places 0 probability on the clones, which seems sillier than any consequences you're drawing out here.
1ike3y
The version of the post I responded to said that all probes eventually turn on simulations. Let me know when you have an SIA version, please. The up until now part of this is nonsense - priors come before time. Other than that, I see no reason to place such a limitation on priors, and if you formalize this I can probably find a simple counterexample. What does it even mean for a prior to correspond to a phenomena? All SIA is doing is asserting events A, B, and C are equal prior probability. (A is living in universe 1 which has 1 observer, B and C are living in universe 2 with 2 observers and being the first and second observer respectively. B and C can be non-local.) If you knew for a fact that something couldn't have had an impact, this might be valid. But in your scenarios, these could have had an impact, yet didn't. It's a perfectly valid update. You should simplify to having exactly one clone created. In fact, I suspect you can state your "paradox" in terms of Sleeping Beauty - this seems similar to some arguments people give against SIA there, claiming one does not acquire new evidence upon waking. I think this is incorrect - one learns that one has woken in the SB scenario, which on SIA's priors leads one to update to the thirder position.

It's possible, but very improbable. We have vastly more probable concerns (misaligned AGI, etc.) than resource depletion sufficient to cripple the entire human project.

What critical resources is Humanity at serious risk of depleting? Remember that most resources have substitutes- food is food.

3CronoDAS3y
Phosphate rock? https://en.wikipedia.org/wiki/Peak_phosphorus
3TAG3y
The resources required to get off the planet and access other resources are huge.

Why do you seem to imply that burning fossil fuels would help at all the odds of the long term human project?

I don't imply that. For clarification:

I would waste any number of resources if that was what was best for the long-term prospects of Humanity. In practice, that means that I'm willing to sacrifice really really large amounts of resources that we won't be able to use until after we develop AGI or similar, in exchange for very very small increases to our probability of developing aligned AGI or similar.

Because I think we won't be able to u

...
3Emiya3y
Understood, I apologise for misunderstanding your position on fossil fuels. I feel there was a specific attempt from my side to interpret it with that meaning, even if the example used didn't necessarily imply it was something you endorse, and that it was due to a negative gut reaction I had while reading what you wrote. We seem to agree on the general principles that humanity's technological level will not stay the same for the next hundred years, and that some level of the changes we are producing on the environment are to be avoided to improve mankind's future condition. I do feel that allowing the actions of humanity to destroy every part of the environment that hasn't been proved useful is an engagement in an extremely reckless form of optimism, though. It's certainly part of the attitude that got us to the point where being careful with our effect on current temperature levels and avoiding losing most of our water resources has become a pretty difficult global challenge. From what I read on industrial regulations so far, in most nations pollutants functionally have to be proven harmful before forbidding their release into the environment can be considered, and I'm 100% sure it's at least the current approach in the country most users of this site are from. All in all, our species is nowhere near the point of being immune from the feedback our environment can throw at us. By our actions, one third of current animal and vegetable species are currently going extinct. That is one huge Chesterton's Fence we're tearing down. We simply don't know in how many ways such a change to the system we're living in can go wrong for us. I'd agree that the greatest "currently existing risks to my survival" are natural causes. I intend this category as "risks that are actively killing people who are living in similar conditions to my own now". However, if we talk about the main "future risks to my survival", as in "risks that currently are killing a low number of
2jasoncrawford3y
Indeed, there is an active “degrowth” movement. cf. Giorgos Kallis: https://greattransition.org/publication/the-degrowth-alternative

I suspect that if people really understood the cost to future people of the contortions we go through to support this many simultaneous humans in this level of luxury, we'd have to admit that we don't actually care about them very much.  I sympathize with those who are saying "go back to the good old days" in terms of cutting the population back to a sustainable level (1850 was about 1.2B, and it's not clear even that was sparse/spartan enough to last more than a few millennia).

There's enough matter in our light cone to support each individual existin...

1Said Achmiz3y
Why, exactly, is this our only job (or, indeed, our job at all)? Surely it’s possible to value present-day things, people, etc.? Seeing as how future humanity (with capital letters or otherwise) does not, in fact, currently exist, it makes very little sense to say that ensuring their existence is something that we would be doing “for” them.
8TAG3y
It's entirely possible to burn through the resources on this planet without getting off this planet. That's a very dicey pinch point.
8Emiya3y
Why do you seem to imply that burning fossil fuels would help the odds of the long-term human project at all? Even ignoring the current deaths due to the large-scale desertification that Climate Change is causing, it's putting our current society at a very real risk of collapse. Food and water supplies are at risk in the medium term, since we are losing water reserves and cultivations are expected to suffer greatly from the abrupt change in temperature and the increased extreme meteorological events. At the current rate of fishing, all fish species could be practically extinct by 2050, and for the same date estimates range from 100 million to 1 billion climate refugees. Given how badly our societies reacted to numbers of refugees that weren't even close to that scale, I really don't want to see what will happen. Not to mention that currently one out of three animal and vegetable species is going extinct and could be gone by the same date. That is a scale of damage to the ecosystem that could easily feed back into who knows what. We are causing the sixth mass extinction on our planet. I feel pretty confident some humans will survive and that technological progress could continue past that, eventually. But I feel a lot more confident about humanity reaching the stars in a universe where we manage to not make scorched earth of our first planet before we have a way to do that, and I personally don't want to see my personal odds of survival diminishing because I'll have to deal with riots, food shortages, totalitarian fascist governments or... who knows? A dying ecosystem is the kind of thing that could rush us into botching nanotechnology while looking for a way to fix our mess. Lastly, I really don't see how switching away from fossils would in any way harm our chances to develop as a species. Every economic estimate I saw said that the costs would be a lot less than the economic damage from climate change alone, many estimates agree that it w

The existence of places like LessWrong, philosophy departments, etc., indicates that people do have some sort of goal to understand things in general, aside from any nitpicking about what is a true terminal value.

I agree- lots of people (including me, of course) are learning because they want to- not as part of some instrumental plan to achieve their other goals. I think this is significant evidence that we do terminally value learning. However, the way that I personally have the most fun learning is not the way that is best for cultivating a perfect underst...

E.g. "maybe you're in an asylum" assumes that it's possible for an asylum to "exist" and for someone to be in it, both of which are meaningless under my worldview.

What do you mean by "reality"? You keep using words that are meaningless under my worldview without bothering to define them.

You're implementing a feature into your model which doesn't change what it predicts but makes it less computationally efficient.

The fact you're saying "both of which are meaningless under my worldview" is damning evidence that your model (or at least your curren...

1ike3y
>The fact you're saying "both of which are meaningless under my worldview" is damning evidence that your model (or at least your current implementation of your model) sucks, because that message transmits useful information to someone using my model but apparently has no meaning in your model.  I don't think that message conveys useful information in the context of this argument, to anyone. I can model regular delusions just fine - what I can't model is a delusion that gives one an appearance of having experiences while no experiences were in fact had. Saying "delusion" doesn't clear up what you mean.  Saying "(True ^ True) = False" also doesn't convey information. I don't know what is meant by a world in which that holds, and I don't think you know either. Being able to say the words doesn't make it coherent.  You went to some severe edge cases here - not just simulation, but simulation that also somehow affects logical truths or creates a false appearance of experience. Those don't seem like powers even an omnipotent being would possess, so I'm skeptical that those are meaningful, even if I was wrong about verificationism in general.  For more ordinary delusions or simulations, I can interpret that language in terms of expected experiences.  >What does it mean for your model to be "true"? Nothing, and this is precisely my point. Verificationism is a criterion of meaning, not part of my model. The meaning of "verificationism is true" is just that all statements that verificationism says are incoherent are in fact incoherent.  >There are infinitely many unique models which will predict all evidence you will ever receive- I established this earlier and you never responded.  I didn't respond because I agree. All models are wrong, some models are useful. Use Solomonoff to weight various models to predict the future, without asserting that any of those models are "reality". Solomonoff doesn't even have a way to mark a model as "real", that's just completely out

This is false. I actually have no idea what it would mean for an experience to be a delusion - I don't think that's even a meaningful statement.

I'm comfortable with the Cartesian argument that allows me to know that I am experiencing things.

Everything you're thinking is compatible with a situation in which you're actually in a simulation hosted in some entirely alien reality (2 + 2 = 3, experience is meaningless, causes follow after effects, (True ^ True) = False, etc, which is being manipulated in extremely contrived ways which produce your exact current ...

1ike3y
>Everything you're thinking is compatible with a situation in which you're actually in a simulation hosted in some entirely alien reality (2 + 2 = 3, experience is meaningless, causes follow after effects, (True ^ True) = False, etc, which is being manipulated in extremely contrived ways which produce your exact current thought processes. I disagree, and see no reason to agree. You have not fully specified this situation, and have offered no argument for why this situation is coherent. Being as this is obviously self-contradictory (at least the part about logic), why should I accept this?  >If you have an argument against this problem, I am especially interested in hearing it The problem is that you're assuming that verificationism is false in arguing against it, which is impermissible. E.g. "maybe you're in an asylum" assumes that it's possible for an asylum to "exist" and for someone to be in it, both of which are meaningless under my worldview.  Same for any other way to cash out "it's all a delusion" - you need to stipulate unverifiable entities in order to even define delusion.  Now, this is distinct from the question of whether I should have 100% credence in claims such as 2+2=4 or "I am currently having an experience". I can have uncertainty as to such claims without allowing for them to be meaningfully false. I'm not 100% certain that verificationism is valid.  >It seems like the fact you can't tell between this situation and reality What do you mean by "reality"? You keep using words that are meaningless under my worldview without bothering to define them.  >The real question of importance is, does operating on a framework which takes specific regular notice of the idea that naïve realism is technically a floating belief increase your productivity in the real world? This isn't relevant to the truth of verificationism, though. My argument against realism is that it's not even coherent. If it makes your model prettier, go ahead and use it. You'll jus

Refer to my disclaimer for the validity of the idea of humans having terminal values. In the context of human values, I think of "terminal values" as the ones directly formed by evolution and hardwired into our brains, and thus broadly shared. The apparent exceptions are rarish and highly associated with childhood neglect and brain damage.

"Broadly shared" is not a significant additional constraint on what I mean by "terminal value", it's a passing acknowledgement of the rare counterexamples.

If that's your argument then we somewhat agree. I'm saying that th...

1 TAG 3y
The existence of places like LessWrong, philosophy departments, etc, indicates that people do have some sort of goal of understanding things in general, aside from any nitpicking about what is a true terminal value. Well, if my goal is the truth, I am going to want the model that corresponds the best, not the model that predicts most efficiently. I've already stated that I am not talking about confirming specific models.

It's also true for "I terminally value understanding the world, whatever the correct model is".

I said e.g., not i.e., and "I terminally value understanding the world, whatever the correct model is" is also a case of trivial values.

First, a disclaimer: It's unclear how well the idea of terminal/instrumental values maps to human values. Humans seem pretty prone to value drift- whenever we decide we like some idea and implement it, we're not exactly "discovering" some new strategy and then instrumentally implementing it. We're more incorporating the new s...

1 TAG 3y
I never claimed it was a broadly shared terminal value. My argument is that you can't make a one-size-fits-all recommendation of realism or anti realism, because individual values vary.
1 Simon Kohuch 3y
Looks like an issue of utility vs truth to me. Time to get deontological :) (joke)

This is only true for trivial values, e.g. "I terminally value having this specific world model".

For most utility schemes (Including, critically, that of humans), the supermajority of the purpose of models and beliefs is instrumental. For example, making better predictions, using less computing power, etc.

In fact, humans who do not recognize this fact and stick to beliefs or models because they like them are profoundly irrational. If the sky is blue, I wish to believe the sky is blue, and so on. So, assuming that only prediction is valuable is not question...

1 TAG 3y
It's also true for "I terminally value understanding the world, whatever the correct model is".

It's a well known tragedy that (unless Humanity gains a perspective on reality far surpassing my wildest expectations) there are arbitrarily many nontrivially unique theories which correspond to any finite set of observations.

The practical consequence of this (A small leap, but valid) is that we can remove any idea you have and make exactly the same predictions about sensory experiences by reformulating our model. Yes, any idea. Models are not even slightly unique- the idea of anything "really existing" is "unnecessary", but literally every belief is "unne...

1 ike 3y
This is false. I actually have no idea what it would mean for an experience to be a delusion - I don't think that's even a meaningful statement. I'm comfortable with the Cartesian argument that allows me to know that I am experiencing things. On the contrary, it's the naive realist model that doesn't pay rent by not making any predictions at all different from my simpler model. I don't really care if one includes realist claims in their model. It's basically inert. It just makes the model more complicated for no gain.
1 TAG 3y
You can't define things as (un)necessary without knowing what you value or what goal you are trying to achieve. Assuming that only prediction is valuable is pretty question begging.

Right, that isn't an exhaustive list. I included the candidates which seemed most likely.

So, I think superintelligence is unlikely in general- but so is current civilization. I think superintelligences have a high occurrence rate given current civilization (for lots of reasons), which also means that current civilization isn't that much more likely than superintelligence. It's more justified to say "Superintelligences which make human minds" have a super low occurrence rate relative to natural examples of me and my environment, but that still seems to be a...

(2020-10-03) EDIT: I have found the solution: the way I was thinking about identity turns out to be silly.

In general, if you update your probability estimates of non-local phenomena based on anthropic arguments, you're (probably? I'm sure someone has come up with smart counterexamples) doing something that includes the sneaky implication that you're conducting FTL communication. I consider this to be a reductio ad absurdum on the whole idea of updating your probability estimates of non-local phenomena based on anthropic arguments, regardless of the va...

1 avturchin 3y
Future superintelligences could steal minds to cure "past sufferings", to prevent s-risks, and to resurrect all the dead. This is actually a good thing, but for the resurrection of the dead they would have to run the whole world simulation once again for the last few thousand years. In that case it will look almost like the normal world.
2 TAG 3y
You didn't list "superintelligence is unlikely" among the list of possible explanations.

Amusing anecdote: I once tried to give my mother intuition for the Monty Hall problem with a process similar to this. She didn't quite get it, so I played the game with her a few times. Unfortunately, she won more often when she stayed than when she switched (n ~= 10), and decided that I was misremembering. A lesson was learned, but not by the person I had intended.
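The underlying asymmetry is easy to check in simulation, which also shows why n ≈ 10 is far too few games to see it: the 2/3 vs 1/3 gap is real but easily swamped by noise at that sample size. A minimal sketch (the function name and trial count are my own, not from the thread):

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Simulate Monty Hall games; return the win rate for the given strategy."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)          # door hiding the car
        pick = rng.randrange(3)         # contestant's first pick
        # Host opens a door that is neither the contestant's pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # ~0.667
print(monty_hall(switch=False))  # ~0.333
```

Switching loses exactly when the first pick was the car (probability 1/3), so switching wins 2/3 of the time; a ten-game sample, though, will fairly often come out the "wrong" way.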

Scientific and industrial progress is an essential part of modern life. The opening of a new extremely long suspension bridge would be entirely unsurprising- If it was twice the length of the previous longest, I might bother to read a short article about it. I would assume there would be some local celebration (Though not too much- if it was too well received, why did we not do it before?), but it would not be a turning point in technology or a grand symbol of man's triumph over nature. We've been building huge awe inspiring structures for quite some time ...

So it definitely seems plausible for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

I didn't mean to imply that a signflipped AGI would not instrumentally explore.

I'm saying that, well... modern machine learning systems often get specific bonus utility for exploring, because it's hard to explore the proper amount as an instrumental goal due to the difficulties of fully modelling the situation, and because systems which don't have this bonus will often get stuck in local maximu...

2 Anirandis 3y
I see what you're saying here, but the GPT-2 incident seems to downplay it somewhat IMO. I'll wait until you're able to write down your thoughts on this at length; this is something that I'd like to see elaborated on (as well as everything else regarding hyperexistential risk.)

Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped with the reward function/model.

Sorry, I meant instrumentally value. Typo. Modern machine learning systems often require a specific incentive in order to explore new strategies and escape local maxima. We may see this b...

3 Anirandis 3y
My thinking was that an AI system that *only* takes values between 0 and +∞ (or some arbitrary positive number) would identify that killing humans would result in 0 human value, which is its minimum utility.

How come? It doesn't seem *too* hard to create an AI that only expends a small amount of its energy on preventing the garbage thing from happening.

Please do! I'd love to see a longer discussion on this type of thing.

EDIT: just thought some more about this and want to clear something up: I'm a little unsure on this one after further reflection. When this happened with GPT-2, the bug managed to flip the reward & the system still pursued instrumental goals like exploring new strategies: So it definitely seems *plausible* for a reward to be flipped without resulting in the system failing/neglecting to adopt new strategies/doing something weird, etc.

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be no human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.
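The distinction being drawn here can be made concrete with a toy calculation. To be clear, every outcome label and utility number below is invented purely for illustration (nothing here is a claim about any real system): give torture a sliver of positive "human utility", clamp the bounded ("strict positive utilitarian") scheme at zero, and see where a maximizer lands once the sign of its utility is flipped.

```python
# Toy sketch: where does a maximizer land if its utility U is replaced by -U?
# All outcomes and numbers are hypothetical, chosen only to illustrate the argument.
unbounded = {
    "flourishing humans": 10.0,
    "paperclips (no humans)": 0.0,
    "torture": -10.0,   # an aligned, unbounded utility can go very negative
}
# A "strict positive utilitarian" utility is bounded below at 0, but torture
# might still produce a negligible positive amount of human utility.
bounded = {
    "flourishing humans": 10.0,
    "paperclips (no humans)": 0.0,
    "torture": 0.1,
}

def favourite(utility):
    """The outcome a maximizer of this utility function would pick."""
    return max(utility, key=utility.get)

def sign_flip(utility):
    """Model a sign-flip bug: every utility value is negated."""
    return {outcome: -value for outcome, value in utility.items()}

print(favourite(sign_flip(bounded)))    # paperclips (no humans)
print(favourite(sign_flip(unbounded)))  # torture
```

Under the bounded scheme, the flipped optimizer's best remaining option is the zero-utility world (paperclips); under the unbounded scheme, the flipped optimizer heads straight for the original minimum, which is exactly the "I have no mouth, and I must scream" worry about where a sign flip lands.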

It's hard to talk in specifics because my knowledge on the details of what future AGI architecture might look like is, of course, extremely limited.

As an almost entirely inapplicabl...

4 Anirandis 3y
Interesting analogy. I can see what you're saying, and I guess it depends on what specifically gets flipped. I'm unsure about the second example; something like exploring new strategies doesn't seem like something an AGI would terminally value. It's instrumental to optimising the reward function/model, but I can't see it getting flipped *with* the reward function/model.

My thinking was that a signflipped AGI designed as a positive utilitarian (i.e. with a minimum at 0 human utility) would prefer paperclipping to torture because the former provides 0 human utility (as there aren't any humans), whereas the latter may produce a negligible amount. I'm not really sure if it makes sense tbh.

Even if we engineered it carefully, that doesn't rule out screw-ups. We need robust failsafe measures *just in case*, imo.

I wonder if you could feasibly make it a part of the reward model. Perhaps you could train the reward model itself to disvalue something arbitrary (like paperclips) even more than torture, which would hopefully mitigate it. You'd still need to balance it in a way such that the system won't spend all of its resources preventing this thing from happening at the neglect of actual human values, but that doesn't seem too difficult. Although, once again, we can't really have high confidence (>90%) that the AGI developers are going to think to implement something like this.

There was also an interesting idea I found in a Facebook post about this type of thing that got linked somewhere (can't remember where). Stuart Armstrong suggested that a utility function could be designed as such:

Even if we solve any issues with these (and actually bother to implement them), there's still the risk of an error like this happening in a localised part of the reward function such that *only* the part specifying something bad gets flipped, although I'm a little confused about this one. It could very well be the case that the system's complex enough that there isn't just one bit indi

You can't really be accidentally slightly wrong. We're not going to develop Mostly Friendly AI, which is Friendly AI but with the slight caveat that it has a slightly higher value on the welfare of shrimp than desired, with no other negative consequences. The molecular sorts of precision needed to get anywhere near the zone of loosely trying to maximize or minimize for anything resembling human values will probably only follow from a method that is converging towards the exact spot we want it to be at, such as some clever flawless version of reward modelli...

4 Anirandis 3y
Thanks for the detailed response. A bit of nitpicking (from someone who doesn't really know what they're talking about):

I'm slightly confused by this one. If we were to design the AI as a strict positive utilitarian (or something similar), I could see how the worst possible thing to happen to it would be *no* human utility (i.e. paperclips). But most attempts at an aligned AI would have a minimum at "I have no mouth, and I must scream". So any sign-flipping error would be expected to land there.

In the example, the AGI was using online machine learning, which, as I understand it, would probably require the system to be hooked up to a database that humans have access to in order for it to learn properly. And I'm unsure as to how easy it'd be for things like checksums to pick up an issue like this (a boolean flag getting flipped) in a database.

Perhaps there'll be a reward function/model intentionally designed to disvalue some arbitrary "surrogate" thing in an attempt to separate it from hyperexistential risk. So "pessimizing the target metric" would look more like paperclipping than torture. But I'm unsure as to (1) whether the AGI's developers would actually bother to implement it, and (2) whether it'd actually work in this sort of scenario.

Also worth noting is that an AGI based on reward modelling is going to have to be linked to another neural network, which is going to have constant input from humans. If that reward model isn't designed to be separated in design space from AM, someone could screw up with the model somehow. If we were to, say, have U = V + W (where V is the reward given by the reward model and W is some arbitrary thing that the AGI disvalues, as is the case in Eliezer's Arbital post that I linked,) a sign flip-type error in V (rather than a sign flip in U) would lead to a hyperexistential catastrophe.

I think this is somewhat likely to be the case, but I'm not sure that I'm confident enough about it. Flipping the direction of updates to the

If you're having significant anxiety from imagining some horrific I-have-no-mouth-and-I-must-scream scenario, I recommend that you multiply that dread by a very, very small number, so as to incorporate the low probability of such a scenario. You're privileging this supposedly very low probability specific outcome over the rather horrifically wide selection of ways AGI could be a cosmic disaster.

This is, of course, not intended to dissuade you from pursuing solutions to such a disaster.

6 Anirandis 4y
I don't really know what the probability is. It seems somewhat low, but I'm not confident that it's *that* low. I wrote a shortform about it last night (tl;dr it seems like this type of error could occur in a disjunction of ways and we need a good way of separating the AI in design space.) I think I'd stop worrying about it if I were convinced that its probability is extremely low. But I'm not yet convinced of that. Something like the example Gwern provided elsewhere in this thread seems more worrying than the more frequently discussed cosmic ray scenarios to me.

In this specific example, the error becomes clear very early on in the training process. The standard control problem issues with advanced AI systems don't apply in that situation.

As for the arms race example, building an AI system of that sophistication to fight in your conflict is like building a Dyson Sphere to power your refrigerator. Friendly AI isn't the sort of thing major factions are going to want to fight with each other over. If there's an arms race, either something delightfully improbable and horrible has happened, or it's an extremely lopsid...

1 Anirandis 4y
Can we be sure that we'd pick it up during the training process, though? And would it be possible for it to happen after the training process?