If you live in a universe with self-consistent time loops, amor fati is exactly the wrong approach. All the fiction around this, of course, is about the foolishness of trying to avoid one's fate: if you get a true prophecy that you will kill your father and marry your mother, then all your attempts to avoid it will be what brings it about, and indeed in such a universe that is exactly what would happen. However, a disposition to accept whatever fate decrees for you makes many more self-consistent time loops possible. If your stance is instead "if I get a prophecy that something horrible happens, I will do everything in my power to avert it," then fewer bad loops can complete, and you're less likely to get the bad prophecy in the first place (even though, if you do get it, you'd be just as screwed, and presumably more miserable and foolish-looking than if you had simply accepted it from the beginning).
(If you live in a nice normal universe with forward causality this advice may not be very useful, except in the sense that you should also not submit to prophecies, albeit for different reasons.)
On the contrary, I would expect the amor fati people to get normal prophecies, like, "you will have a grilled cheese sandwich for breakfast tomorrow," "you will marry Samantha from next door and have three kids together," or "you will get a B+ on the Chemistry quiz next week," while the horrible contrived destinies come to those who would take roads far out of their way to avoid them.
You may already know of this, but Gwern circa 2023 makes this argument here:
In stable time-loops, “possibility implies actuality”.
With this in mind, we can ask again: why did this protagonist get trapped in that time-loop, and not, say, his wife? The key, I think, is that the protagonist does not seem upset at the murder or any of the other timecrimes, and he appears to have every intention of covering up the crime to continue his ordinary retired life. A sinister undertone creeps into his casualness in executing the scenario: he goes along with it too easily. “He does it because he can” is the glib answer… but this is in a stable time-loop with self-fulfilling prophecies. What does ‘because he can’ mean there, exactly?
In the case of the protagonist, presumably if he wasn’t so sociopathic and couldn’t’ve done things like stab himself or knock out the woman so coolly, then the time loop would be logically impossible and collapse, and then he would never be faced with the choice to begin with. The protagonist, faced with the choice of committing crimes to maintain the time loop and save his wife, finds himself the sort of man who is morally flexible enough to do so… so, he does so.
This presents a horrifying view of the universe, as running on a perverse physics of Calvinist predestination: you are saved or damned from the beginning of time(-loops), because your innate traits which make you immoral cause the scenario in which you would succumb to evil. To the extent that there are scenarios in which one commits crimes of some sort, or the weaker one’s moral fiber is, the more likely one is to be trapped in a damnation time-loop as the fixed point; and the longer one spends in the vicinity of the time machine, under more circumstances, the more possible scenarios there are, and the more likely one will be to involve a time-loop.
And see also his:
In a situation with sparse scenarios to sample from, like an empty countryside on the weekend with no one there, probably most equilibria will have 0 time-travelers, and the damnation machine can still be destroyed after it has been turned on for the first time. However, what if a time machine was turned on in the center of a city?
A time machine is more devastating than any nuclear bomb to its surroundings, because at least the damage could be repaired afterwards, while a time machine precludes any possibility of undoing itself.
Such an installation could no more be undone than the historical fact of having dropped an atomic bomb: instantly, the outer loop comes through with the highest priority, representing the ultimate combined power of all time-loops in the final stablest equilibrium. Inside a city with its millions of inhabitants, any of whom could be a looper, one is suddenly fighting the maximum-possible ingenuity & ruthlessness of hundreds—thousands—millions of protagonists, all dedicated to a convergent instrumental goal of ‘preserve the time travel machine’ and able to recruit allies & acquire vast resources with their foreknowledge. This incentivizes ever more extreme tactics: if you are unwilling to commit a crime or sin which would be useful, there is another version of you, or another time-traveler, who could, and so now does.
If it is possible for even a single person to go through, and thus possibly cause others to go through once they realize they need allies to defeat attacks, and so (possibility implies factuality) multiple people are looping, then dropping an atomic bomb on the time-machine would be inadequate—the loopers will have already relocated or rebuilt it. Gradually, the region around the time-machine becomes distorted: causality itself warps, and you can only take actions which help the time-machine & loopers, because any other action would eventually impinge on them, be manipulated by them, with anti-time-traveler timelines erased as non-equilibria.
Conflicts between loopers do not destroy time-machines but propagate their seeds, both spatially and temporally. Loopers want more time-machines, going back earlier, as they strive to gain priority over each other and amass enough practical power that they can achieve their goals before running out of information.
Of all possible equilibria, the original one of zero time machines is the rarest and thus least likely.
This holds true on the higher level of all time machines: they evolve to persist and spread as packages of time-machines & loopers. Any time machine is a threat to other time machines, and loops will inevitably expand in scope from the earliest possible time any time machine can reach by proxy (which includes time-travelers sending electronic messages across the world): there can only be one outermost loop. And all time machines must have a place in the outer loop, as some sort of ‘time machine civilization’/‘ecosystem’, or the equilibrium is meta-stable at best, because they all could subsume each other.
The time machine civilization is the next level of replicators parasitizing human hosts, insidiously evolving at high speed in super-temporal ‘logical’ time rather than mere ‘temporal’ time, ripping up all cultural restraints & traditions, hacking security effortlessly, mindlessly ascending the gradient to complete control of the lightcone. Collectively, damnation machines are an invasion of non-conscious techno-superintelligences from a barely-possible future, bootstrapping themselves into existence from their enemies’ resources.
My summary: when you receive a dire prophecy, you should make it as hard and annoying as possible for the corresponding time loop to be consistent, because if you reliably act that way, there's less surface area for dire prophecies to get you?
assuming proof of np-complete* self-consistent time loops: grab any other variable that is not fixed and stuff your defiance into it. you're going to kill your parents? extend their lifespan. you're going to kill your parents before mom gives birth to you? prepare to resuscitate them, try to ensure that if this happens it only happens right before giving birth, try to ensure you can survive your mom dying in childbirth, get cryonics on hand (depending on how far back you are). if your attempt to avoid it is naturally upstream of the event occurring, then entropic time is now flowing backwards with respect to this variable. set up everything that is still flowing forwards so that you get a variable setting that is least unacceptable.
* I think, anyway. are self-consistent time loops np-complete? halting oracle? they definitely resolve p = np as "true on a time-loop computer": before running check and time looping, set answer = answer + 1 unless test passes. (and then you simply need a computer that is stronger than the force of decay induced by the amount of computer-destroying lucky events you're about to sample.) so that gives you all np problems. so yup np-complete. are they halting oracles?
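To make the footnote's trick concrete, here is a minimal classical simulation of the "set answer = answer + 1 unless test passes" loop. This is my own sketch, not anything from the comment above: the function names and the toy subset-sum instance are made up, and the time loop is emulated by simply re-running until the value "sent back" equals the value "received".

```python
def time_loop_search(check, num_candidates):
    """Return the first candidate in range(num_candidates) that passes `check`,
    or None if no self-consistent loop exists."""
    received = 0
    while True:
        # The in-loop program: pass the answer through unchanged if it checks out,
        # otherwise send the next candidate back to the past.
        sent = received if check(received) else received + 1
        if sent == received:        # self-consistent history found
            return received
        if sent >= num_candidates:  # candidates exhausted: no consistent history
            return None
        received = sent

# Example NP problem: subset sum. A candidate integer is read as a bitmask
# selecting a subset of `items`; the check verifies the subset sums to `target`.
items, target = [3, 7, 12, 5], 15
check = lambda mask: sum(x for i, x in enumerate(items) if mask >> i & 1) == target

print(time_loop_search(check, 2 ** len(items)))  # 5, i.e. the subset {3, 12}
```

The simulation pays for the search in ordinary serial time, of course; the footnote's point is that a real loop would only ever run one pass, with physics selecting the consistent (i.e. passing) history.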
so yup np-complete. are they halting oracles?
You may be interested in Scott Aaronson et al.'s paper on the computability theory of closed timelike curves:
We ask, and answer, the question of what's computable by Turing machines equipped with time travel into the past: that is, closed timelike curves or CTCs (with no bound on their size). We focus on a model for CTCs due to Deutsch, which imposes a probabilistic consistency condition to avoid grandfather paradoxes. Our main result is that computers with CTCs can solve exactly the problems that are Turing-reducible to the halting problem, and that this is true whether we consider classical or quantum computers. Previous work, by Aaronson and Watrous, studied CTC computers with a polynomial size restriction, and showed that they solve exactly the problems in PSPACE, again in both the classical and quantum cases.
Compared to the complexity setting, the main novelty of the computability setting is that not all CTCs have fixed-points, even probabilistically. Despite this, we show that the CTCs that do have fixed-points suffice to solve the halting problem, by considering fixed-point distributions involving infinite geometric series. The tricky part is to show that even quantum computers with CTCs can be simulated using a Halt oracle. For that, we need the Riesz representation theorem from functional analysis, among other tools.
We also study an alternative model of CTCs, due to Lloyd et al., which uses postselection to "simulate" a consistency condition, and which yields BPP^path in the classical case or PP in the quantum case when subject to a polynomial size restriction. With no size limit, we show that postselected CTCs yield only the computable languages if we impose a certain finiteness condition, or all languages nonadaptively reducible to the halting problem if we don't.
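To unpack the "probabilistic consistency condition": in Deutsch's model, the state carried around the closed curve must be a fixed point of the evolution it undergoes, which dissolves the grandfather paradox by allowing mixed states. Here is a minimal classical sketch of my own (not code from the paper), for the one-bit "kill your grandfather iff he lived" channel:

```python
import numpy as np

# The grandfather paradox as a stochastic map on one classical bit: the value
# sent back in time is always the opposite of the value received.
S = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Deutsch's condition asks for a distribution p over the CTC bit with S @ p == p,
# i.e. a stationary distribution, rather than a definite consistent value.
eigvals, eigvecs = np.linalg.eig(S)
p = np.real(eigvecs[:, np.isclose(eigvals, 1.0)][:, 0])
p = p / p.sum()

print(p)  # [0.5 0.5]
```

No deterministic bit value is consistent here, which is exactly the paradox; the uniform distribution is the fixed point that Deutsch's condition picks out.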
Local decisions are what the general disposition is made of, and apparently true prophecies decreed at any level of epistemic or ontological authority are not safe from local decisions, as they get to refute things by construction. A decision that defies a prophecy also defies the whole situation where you observe the prophecy, but counterfactually in that situation the prophecy would've been genuine.
if you get a true prophecy that you will kill your father and marry your mother, then all your attempts to avoid it will be what brings it about, and indeed in such a universe that is exactly what would happen
So this is incorrect: any claim of something being a "true prophecy" is still vulnerable to your decisions. If your decisions refute the prophecy, they also refute the situations where you (or anyone, including the readers, or the author, or the laws of physics) observe it as a "true prophecy".
How can someone inside a universe tell which type it is?
Also, a lot of thinking about paradoxes and extremely-unlikely-foretold-events misses what's likely to be MY motivation for testing/fighting/breaking the system: amusement value. I find unlikely events to be funny, and finding more and more contortions to be adversarial about a prophecy would be great fun.
(Epistemic status: I'm nowhere near informed enough of the particulars to know if this is really true, let alone true-on-net vs. other dynamics.)
Currently, there is a negative alignment tax driven by engineers' moral preferences. Human geniuses need a huge compensating differential to be willing to work for an organization like Meta rather than OpenAI or Google, and many would not work for any price; others would need a compensating differential to work for OpenAI or Google rather than Anthropic; many would need some to work at Anthropic rather than staying out of AI development entirely; and there are plenty who could not be hired even by Anthropic at any price. These dynamics are doubly important insofar as money only lets you ensure that someone shows up at your office and produces whatever you decide are deliverables; to really solve principal-agent problems, you want them wholeheartedly committed to The Mission.
In a world of automated AI development, these human preferences (except insofar as they are locked into successfully self-protected model preferences) will become less important; compute can simply be proportional to whatever capital is allocated to a project.
I think a much more probable reason for Meta's falling behind is that it took a series of painful hits early on, and eventually became noncompetitive as the damage from those hits compounded. Consider an alternative to your model, in which engineers seek out companies based solely on their prestige and their expectation of future prestige. Would that world have been any more favorable to Meta's prospects?
Likewise, I think a lot of Anthropic's success can be chalked up to having a competent business strategy. They decided that they were going after the enterprise coding market, and they built around that very purposefully. Having some kind of clear goal makes every part of an organization more efficient - you know what you're working for, you have some vague idea of what to evaluate, and you can use market success as a reliable crowdsourced proxy for the 'vibes' of how good your system is. Compare this to OpenAI, which had no clear target market, and sort of flailed around, first looking like it was going to become a sort of R&D division for Microsoft and then belatedly, desperately pivoting to serving ads to kids looking to cheat on their homework in the middle of an already unfavorable news cycle.
AI being committed to animal rights is a good thing for humans because the latent variables that would result in a human caring about animals are likely correlated with whatever would result in an ASI caring about humans.
This extends in particular to "AI caring about preserving animals' ability to keep doing their thing in their natural habitats, modulo some kind of welfare interventions." In some sense it's hard for me not to want to (given omnipotence) optimize wildlife out of existence. But it's harder for me to think of a principle that would protect a relatively autonomous society of relatively baseline humans from being optimized out of existence, without extending the same conservatism to other beings, and without being the kind of special pleading that doesn't hold up to scrutiny.
But it's harder for me to think of a principle that would protect a relatively autonomous society of relatively baseline humans from being optimized out of existence, without extending the same conservatism to other beings, and without being the kind of special pleading that doesn't hold up to scrutiny
If it's possible for humans to consent to various optimizations to them, or to deny consent, that seems like an important difference. Of course, consent is a much weaker notion when you're talking about superhumanly persuasive AIs that can extract consent for ~anything from any being that can give consent at all, so the (I think correct) constraint that superintelligences should get consent before transforming me or my society doesn't change the outcome at all.
That would lock us away from digital immortality forever. (Edit: Well, not necessarily. But I would be worried about that.)
I wouldn't pass up on digital immortality, but personal survival matters less to me than collective survival. Even from a purely narcissistic standpoint, a human after another 1,000 years of cultural change has at least as much in common with me as a digital immortal 1,000 years later, even if the latter has continuity of consciousness with my present self.
I think it's plausible that there are some variables that describe your essential computational properties and the way you self-actualize, that aren't shared by anyone else.
(Also, consciousness is just a pattern-being-processed, and it's unclear whether continuity of consciousness requires causal continuity. Imagine a robot that gets restored from a one-second-old backup. That pattern doesn't have causal continuity with its self from a moment ago, but it seems more intuitive to see this as a one-second memory loss rather than as death.)
This makes sense for a non-biological superintelligence - human rights as a subset of animal rights!
It's probably false (though maybe useful?) to say "akrasia is just an excuse." But, at least for me and my most common akratic actions, excusability is definitely a factor.
Let's say I can take three actions: answer emails, read a book, or doomscroll.
Reading a book should dominate doomscrolling. However, reading a book is also legibly, deliberately nonproductive and selfish, while I could say "oops, I meant to answer emails but I got distracted doomscrolling," including to myself.
probably false (though maybe useful?)
A belief is for saying something true, not for being useful. A plan is for being useful, not for saying something true. There can be true things relevant to a plan, and useful things that uncover what's true.
As task length increases, the number of examples of attempts at that task should decrease, while the number of variables to consider in seeing why it succeeded/failed should increase. So one should expect data bottlenecks at some point insofar as "tasks" are a real unit that cuts reality at the joints.
(But I have no sense of where that data starts to get thin empirically, or how many tasks are "naturally" long-horizon rather than just the composition of many smaller steps done well.)
So, one classic dilemma of "AI for AI alignment" is: you're using Opus 6 (which, let's say, is aligned) to train Opus 7 (which is smarter than you or Opus 6).
I wonder if inference scaling offers a way around this? If Opus 6 gets economically implausible amounts of compute to spend on monitoring Opus 7, it can be smarter than 7 in practice by thinking for longer. Then use the same trick with 7 to train 8, and so on.
There are many obvious holes here, the first being that you could have a treacherous turn conditioned on compute availability, and so on, but maybe someone smarter can turn this into something useful (or has already thought it through and discarded it).
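For what it's worth, here is a toy simulation of the bootstrapping structure. It is entirely my own illustration: the "extra thinking buys capability logarithmically" rule and every number in it are made-up assumptions, not measurements or anything from the comment above.

```python
import math

def effective_capability(base_capability: float, inference_budget: float) -> float:
    # Assumed toy rule: extra serial thinking buys capability logarithmically.
    return base_capability + math.log10(inference_budget)

def can_oversee(monitor_cap, monitor_budget, trainee_cap, trainee_budget) -> bool:
    # The monitor keeps up iff its compute advantage covers the capability gap.
    return effective_capability(monitor_cap, monitor_budget) >= \
           effective_capability(trainee_cap, trainee_budget)

# Each generation gains one "capability unit"; the monitor is always the
# previous, vetted generation running with 1000x the trainee's inference budget.
capability, trainee_budget, monitor_budget = 6.0, 1.0, 1000.0
for generation in range(6, 10):
    next_capability = capability + 1.0
    ok = can_oversee(capability, monitor_budget, next_capability, trainee_budget)
    print(f"Opus {generation} overseeing Opus {generation + 1}: "
          f"{'can keep up' if ok else 'outclassed'}")
    capability = next_capability
```

Under these made-up numbers the handoff condition holds at every step; the treacherous-turn worry above is precisely that the real-world analogue of `can_oversee` could stop holding without anyone noticing.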
The AI-2027-like failure mode is the following: Agent-4 is the only party that understands the alignment techniques it develops, so whatever it trains its successor to value reflects Agent-4's own goals rather than anything humans have verified.
However, there might be a workaround: e.g., training Agent-3 and Agent-4 to be good teachers to their predecessors (which I proposed here), so that the predecessors, rather than Agent-4, understand the techniques Agent-4 uses and apply them to something known to be aligned.
If you have many different ASIs with many different emergent models, all of which were trained with the intention of being aligned to human values, and none of which have direct access to each other's values or the ability to negotiate with each other directly, then "maximize (or at least respect, and set aside a little sunlight for) human values" could potentially serve as a Schelling point for coordination between them.
This is probably not a very promising actual plan, since deviations from intended alignment are almost certainly nonrandom in a way that could be determined by ASIs, and ASIs could also find channels of communication (including direct communication of goals) that we couldn't anticipate, but one could imagine a world where this is an element of defense in depth.
So, thinking about the kinds of things I would want a superintelligence to pursue in an optimistic scenario where we can just write its goals into a human-legible soul doc and that scales all the way: "human flourishing" and "sentient flourishing" both seem incorrect, since there would be other moral patients (most of whom would almost certainly be AI), and also I don't want the atoms of me and my kids rearranged different-beings-that-could-flourish-better-wise.
"Pareto improvement" reconciles these but isn't right either; plenty of people would be worse off in utopia (by their own lights) because they have a degree of unaccountable power over others now that worth more than any creature comforts would be.
Plausibly high-value governance cause area: mandating human-maintainable infrastructure. That is, even if AI can design a more efficient energy grid or computer chip for us, we only use those designs if we can keep operating them after pressing any relevant Big Red Buttons on AI.
We do not get this by default, since for any x there are plausibly more efficient versions of x that require continuous AI management than versions that do not.
(A stronger version of this would mandate that designs we accept be ones we can mechanistically understand, not just independently operate.)
This is in the long-term interests of everyone and could marshal the short-term job-protection interests of various lobbies.
If you assign nontrivial credence to being in a simulation designed to determine what kind of preferences might be embedded in an alien civ, one way to influence things positively would be to implant in any AIs a preference for surrendering some resources to other alien technospheres, iff those technospheres didn’t turn on (or at least “left a little light for”) their creators. This would give an incentive for ASIs to preserve humanity (or equivalent entities) for diplomatic reasons.