Thane Ruthenis

But yeah, personally, I think this is all a result of a kind of precious view about experiential continuity that I don't share

Yeah, I don't know that this glyphisation process would give us what we actually want.

"Consciousness" is a confused term. Taking on a more executable angle, we presumably value some specific kinds of systems/algorithms corresponding to conscious human minds. We especially value various additional features of these algorithms, such as specific personality traits, memories, et cetera. A system that has the features of a specific human being would presumably be valued extremely highly by that same human being. A system that has fewer of those features would be valued increasingly less (in lockstep with how unlike "you" it becomes), until it's only as valuable as e. g. a randomly chosen human/sentient being.

So if you need to mold yourself into a shape where some or all of the features which you use to define yourself are absent, each loss is still a loss, even if it happens continuously/gradually.

So from a global perspective, it's not much different from acausal aliens resurrecting Schelling-point Glyph Beings without you having warped yourself into a Glyph Being over time. If you value systems that are like Glyph Beings, their creation somewhere in another universe is still positive by your values. If you don't, if you only value human-like systems, then someone creating Glyph Beings brings you no joy. Whether you or your friends warped yourselves into Glyph Beings in the process doesn't matter.

A dog will change the weather dramatically, which will substantially affect your perceptions.

In this case, it's about alt-complexity again. Sure, a dog causes a specific weather-pattern change. But could this specific weather-pattern change have been caused only by this specific dog? Perhaps if we edit the universe to erase this dog, but add a cat and a bird five kilometers away, the chaotic weather dynamic would play out the same way? Then, from your perceptions' perspective, you wouldn't be able to distinguish between a dog timeline and a cat-and-bird timeline.

In some sense, this is common-sensical. The mapping from reality's low-level state to your perceptions is non-injective: the low-level state contains more information than you perceive on a moment-to-moment basis. Therefore, for any observation-state, there are several low-level states consistent with it. Scaling up: for any observed lifetime, there are several low-level histories consistent with it.
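
(A minimal sketch of that point, using a toy observation function of my own choosing: a coarse-graining that only preserves a sum, so distinct low-level states collapse to the same observation.)

```python
# Toy illustration of non-injectivity: the observer perceives less than the
# full low-level state, so several low-level states are consistent with any
# given observation.
from itertools import product

def observe(low_level_state):
    """Coarse-graining: the observer only perceives the sum of the components."""
    return sum(low_level_state)

# Enumerate all low-level states with three binary degrees of freedom.
states = list(product([0, 1], repeat=3))

# Group states by the observation they produce (the preimage of each observation).
preimages = {}
for s in states:
    preimages.setdefault(observe(s), []).append(s)

for obs, pre in sorted(preimages.items()):
    print(f"observation {obs}: {len(pre)} consistent low-level states -> {pre}")
# Several observations have multiple consistent low-level states: the map is non-injective.
```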

Sure. This setup couldn't really be exploited for optimizing the universe. If we take the self-selection assumption to be reasonable, inducing amnesia doesn't actually improve outcomes across possible worlds: one out of 100 prisoners still dies.

It can't even be considered "re-rolling the dice" on whether the specific prisoner that you are dies. Under the SSA, there's no such thing as a "specific prisoner": "you" are implemented as all 100 prisoners simultaneously, and so regardless of whether you choose to erase your memory or not, 1/100 of your measure is still destroyed. Without the SSA, on the other hand, if we consider each prisoner's perspective to be distinct, erasing memory indeed does nothing: it doesn't return your perspective to the common pool of prisoner-perspectives, so if "you" were going to get shot, "you" are still going to get shot.
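
(A toy Monte Carlo sketch of that point; the setup details are my own simplification. Even on the most charitable reading of amnesia, where it "re-rolls" which prisoner you are, the death probability stays at 1/100.)

```python
# Toy sketch: exactly one of 100 prisoners is shot, chosen uniformly at random.
# Compare "amnesia re-rolls which prisoner you are" against "amnesia does nothing":
# both give a ~1% death rate, so inducing amnesia doesn't improve outcomes.
import random

N_PRISONERS = 100
TRIALS = 200_000

def death_rate(amnesia_rerolls_identity: bool) -> float:
    deaths = 0
    for _ in range(TRIALS):
        you = random.randrange(N_PRISONERS)       # which prisoner "you" start as
        if amnesia_rerolls_identity:
            you = random.randrange(N_PRISONERS)   # amnesia returns you to the pool
        shot = random.randrange(N_PRISONERS)      # one prisoner is shot
        deaths += (you == shot)
    return deaths / TRIALS

print("no amnesia:      ", death_rate(False))  # ~0.01
print("amnesia re-rolls:", death_rate(True))   # ~0.01, no improvement
```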

I'm not super interested in that part, though. What I'm interested in is whether there are in fact 100 clones of me: whether, under the SSA, "microscopically different" prisoners could be meaningfully considered a single "high-level" prisoner.

Agreed. I think a type of "stop AGI research" argument that's under-deployed is that there's no process or actor in the world that society would trust with unilateral godlike power. At large, people don't trust their own governments, don't trust foreign governments, don't trust international organizations, and don't trust corporations or their CEOs. Therefore, preventing anyone from building ASI anywhere is the only thing we can all agree on.

I expect this would be much more effective messaging with some demographics, compared to even very down-to-earth arguments about loss of control. For one, it doesn't need to dismiss the very legitimate fear that the AGI would be aligned to values that a given person would consider monstrous. (Unlike "stop thinking about it, we can't align it to any values!".)

And it is, of course, true.

That's probably not what Page meant. On consideration, he would probably have clarified that AI that includes what we value about humanity would be a worthy successor. He probably wasn't even clear on his own philosophy at the time.

I don't see reasons to be so confident in this optimism. If I recall correctly, Robin Hanson explicitly believes that putting any constraints on future forms of life, including on their values, is undesirable/bad/regressive, even though the lack of such constraints would eventually lead to a future with no trace of humanity left. The same goes for Beff Jezos and other hardcore e/acc: they believe that a worthy future involves making a number go up, a number that corresponds to some abstract quantity like "entropy" or "complexity of life" or something, and that if making it go up involves humanity going extinct, too bad for humanity.

Which is to say: there are existence proofs that people can hold such beliefs, and retain them across many years and in the face of what's currently happening.

I can readily believe that Larry Page is also like this.

Also this:

From Altman: [...] Admitted that he lost a lot of trust with Greg and Ilya through this process. Felt their messaging was inconsistent and felt childish at times. [...] Sam was bothered by how much Greg and Ilya keep the whole team in the loop with happenings as the process unfolded. Felt like it distracted the team.

Apparently airing such concerns is "childish" and should only be done behind closed doors, otherwise it "distracts the team", hm.

Perhaps if you did have the full solution, but it feels like there are some parts of a solution that you could figure out, such that that part of the solution doesn't tell you as much about the other parts of the solution.

I agree with that.

I'd think you can define a tetrahedron for non-Euclidean space

If you relax the definition of a tetrahedron to cover figures embedded in non-Euclidean spaces, sure. It wouldn't be the exact same concept, however, in a similar way to how "a number" is a different concept depending on whether you define it as a natural number or a real number.

Perhaps more intuitively, then: the notion of a geometric figure with specific properties is dependent on the notion of a space in which it is embedded. (You can relax it further – e. g., arguably, you can define a "tetrahedron" for any set with a distance function over it – but the general point stands, I think.)
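
(A sketch of that last relaxation, with a definition of my own choosing: call four points a "regular tetrahedron" under a distance function if all six pairwise distances are equal. The same check applies in Euclidean space and in, say, a Hamming-distance space, but the concept it picks out is no longer the Euclidean one.)

```python
# "Regular tetrahedron" relative to an arbitrary distance function d:
# four points, pairwise equidistant under d.
from itertools import combinations
from math import isclose, dist  # math.dist: Euclidean distance (Python 3.8+)

def is_regular_tetrahedron(points, d) -> bool:
    """Four points, pairwise equidistant under the distance function d."""
    if len(points) != 4:
        return False
    distances = [d(p, q) for p, q in combinations(points, 2)]
    return all(isclose(x, distances[0]) for x in distances[1:])

# In Euclidean 3-space this recovers the familiar figure...
euclidean_tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(is_regular_tetrahedron(euclidean_tetra, dist))  # True

# ...but the same definition also applies to strings under Hamming distance,
# where "tetrahedron" no longer means what it meant in Euclidean space.
hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
print(is_regular_tetrahedron(["000", "011", "101", "110"], hamming))  # True
```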

Just consider if you take the assumption that the system would not change in arbitrary ways in response to its environment. There might be certain constraints. You can think about what the constraints need to be such that e.g. a self-modifying agent would never change itself such that it would expect to get less utility in the future than if it did not self-modify.

Yes, but: those constraints are precisely the principles you'd need to code into your AI to give it general-intelligence capabilities. If your notion of alignment only needs to be robust to certain classes of changes, because you've figured out that an efficient generally intelligent system would only change in such-and-such ways, then you've figured out a property of how generally intelligent systems ought to work – and therefore, something about how to implement one.
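
(A toy sketch of the constraint from the quote, with names and a toy world-model of my own: the agent adopts a proposed successor policy only if, by its current model, the successor's expected utility is at least as high as its current policy's.)

```python
# Toy version of "never self-modify into something you expect to get less utility".
import random
from typing import Callable

Policy = Callable[[int], int]  # maps an observed state to an action

def expected_utility(policy: Policy, model_samples: int = 10_000) -> float:
    """Estimate expected utility under the agent's toy world-model:
    the state is uniform in {0, 1}; utility is 1 if the action matches the state."""
    total = 0
    for _ in range(model_samples):
        state = random.randint(0, 1)
        total += 1 if policy(state) == state else 0
    return total / model_samples

def consider_self_modification(current: Policy, proposed: Policy) -> Policy:
    """Adopt the proposed policy only if it is not expected to lose utility."""
    return proposed if expected_utility(proposed) >= expected_utility(current) else current

always_zero: Policy = lambda state: 0     # ignores observations
copy_state: Policy = lambda state: state  # tracks the state

# The agent accepts the improvement and rejects the regression.
print(consider_self_modification(always_zero, copy_state) is copy_state)  # True
print(consider_self_modification(copy_state, always_zero) is copy_state)  # True
```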

Speaking abstractly, the "negative image" of the theory of alignment is precisely the theory of generally intelligent embedded agents. A robust alignment scheme would likely be trivial to transform into an AGI recipe.

I am pretty sure you can figure out alignment in advance as you suggest

I'm not so sure about that. How do you figure out how to robustly keep a generally intelligent dynamically updating system on-target without having a solid model of how that system is going to change in response to its environment? Which, in turn, would require a model of what that system is?

I expect the formal definition of "alignment" to be directly dependent on the formal framework of intelligence and embedded agency, the same way a tetrahedron could only be formally defined within the context of Euclidean space.

Exploit your natural motivations

There's a relevant concept that I keep meaning to write about, which I could summarize as: create gradients towards your long-term aspirations.

Humans are general intelligences, and one of the core properties of general intelligence is not being a greedy-optimization algorithm:

  • We can pursue long-term goals even when each individual step towards them is not pleasurable-in-itself (such as suffering through university to get a degree in a field whose jobs require it).
  • We can force ourselves out of local maxima (such as quitting a job you hate and changing careers, even though it'd mean a period of life filled with uncertainty and anxieties).
  • We can build world-models, use them to infer the shapes of our value functions, and plot a path towards their global maximum, even if it requires passing through negative-reward regions (such as engaging in self-reflection and exploration, then figuring out which vocation would be most suitable to a person-like-you).

However, it's hard. We're hybrid systems, combining generally-intelligent planning modules with greedy RL circuitry. The greedy RL circuitry holds a lot of sway. If you keep forcing yourself to do something it assigns negative rewards to, it's going to update your plan-making modules until they stop doing that.

It is much, much easier to keep doing something if every instance of it is pleasurable in itself. If the reward is instead sparse and infrequent, you'd need a lot of "willpower" to keep going (to counteract the negative updates), and accumulating that is a hard problem in itself.

So the natural solution is to plot, or create, a path towards the long-term aspiration such that motion along it would involve receiving immediate positive feedback from your learned and innate reward functions.
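
(A toy illustration of the sparse-vs-shaped reward point; the environment and numbers are mine. A purely myopic agent on a ten-step path never starts moving when the only reward sits at the end, but the same greedy rule reaches the goal once small intermediate rewards pave the way.)

```python
# Greedy agent on a path of length 10. It only compares immediate rewards:
# move forward (step_reward, or goal_reward on the final step) vs. stay put (0).
def run_greedy_agent(step_reward: float, goal_reward: float, path_length: int = 10) -> int:
    position = 0
    for _ in range(100):  # plenty of time steps
        immediate_reward_if_move = goal_reward if position == path_length - 1 else step_reward
        if immediate_reward_if_move > 0:  # greedy: act only on immediate payoff
            position += 1
        if position == path_length:
            break
    return position

print(run_greedy_agent(step_reward=0.0, goal_reward=10.0))  # 0:  sparse reward, never starts
print(run_greedy_agent(step_reward=0.1, goal_reward=10.0))  # 10: shaped path, reaches the goal
```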

A lot of productivity advice reduces to this:

  • Breaking the long-term task into yearly, monthly, and daily subgoals, such that you can feel accomplishment on a frequent basis (instead of only at the end).
  • Using "cross-domain success loops": simultaneously work on several projects, such that you accomplish something worthwhile along at least one of those tracks frequently, and can then harness the momentum from the success along one track into the motivation for continuing the work along other tracks.
    • I. e., sort of trick your reward system into confusing where exactly the reward is coming from.
    • (I think there was an LW post about this, but I don't remember how to find it.)
  • Eating something tasty, or going to a party, or otherwise "indulging" yourself, every time you do something that contributes to your long-term aspiration.
  • Finding ways to make the process at least somewhat enjoyable, through e. g. environmental factors, such as working in a pleasant place, putting on music, using tools that feel satisfying to use, or doing small work-related rituals that you find amusing.
  • Creating social rewards and punishments, such as:
    • Joining a community focused on pursuing the same aspiration as you.
    • Finding "workout buddies".
    • Having friends who'd hold you accountable if you slack off.
    • Having friends who'd cheer you on if you succeed.
  • And, as in Shoshannah's post: searching for activities that are innately enjoyable and happen to move you in the direction of your aspirations.

None of the specific examples here are likely to work for you (they didn't for me). But you might be able to design or find an instance of that general trick that fits you!

(Or maybe not. Sometimes you have to grit your teeth and go through a rewardless stretch of landscape, if you're not willing to budge on your goal/aspiration.)


Other relevant posts:

  • Venkatesh Rao's The Calculus of Grit. It argues for ignoring extrinsic "disciplinary boundaries" (professions, fields) when choosing your long-term aspirations, and instead following an "internal" navigation system when mapping out the shape of the kind-of-thing that someone-like-you is well-suited to doing.
    • Note that this advice goes further than Shoshannah's: in this case, you don't exert any (conscious) control even over the direction you'd like to go, much less your "goal".
    • It's likely to be easier, but the trade-off should be clear.
  • John Wentworth's Plans Are Predictions, Not Optimization Targets. This connection is a bit rougher, but: that post can be generalized to note that any explicit life goals you set for yourself should often be treated as predictions about what goal you should pursue. Recognizing that, you might instead choose to "pursue your goal-in-expectation", which might be similar to Shoshannah's point about "picking a direction, not a goal".