Interesting. The main thing that pops out for me is that your story feels descriptive while we try to be normative? That is, it's not clear to me from what you say whether you would recommend that humans act in this cooperative way towards distant aliens, but you seem to expect that they will do/are doing so. Meanwhile, I would claim that we should act cooperatively in this way, but I make no claims about whether humans actually do so.

Does that seem right to you or am I misunderstanding your point?

I'm not sure I understand exactly what you're saying, so I'm just gonna write some things vaguely related to classic acausal trade + ECL:


I'm actually really confused about the exact relationship between "classic" prediction-based acausal trade and ECL, and I think I tend to think of them as less crisply different than others do. I spent a few hours some months ago trying to unconfuse myself about this and just ended up with a mess of a document. One intuitive way to differentiate them:

  • ECL leverages the correlation between you and the other agent "directly."
  • "Classic" prediction-based acausal trade leverages the correlation between you and the other agent's prediction of you. (Which, intuitively, they are less in control of than their own decision-making.)

--> This doesn't look like a fundamental difference between the mechanisms (and maybe there are in-betweeners? But I don't know of any set-ups), but it does seem to make a difference in practice or something?


On the recursion question:

I agree that ECL has this whole "I cooperate if I think that makes it more likely that they cooperate" thing, so there's definitely also something prediction-flavoured going on. And often, the deliberation about whether they'll be more likely to cooperate when you do will include "they think that I'm more likely to cooperate if they cooperate". So it's kind of recursive.

Note that ECL at least doesn't strictly require that recursion. You can in principle do ECL with rocks: "My world model says that, conditioning on me taking action X, the likelihood of this rock falling down is higher than if I condition on taking action Y." To be clear, if action X isn't "throw the rock" or something similar, that's a pretty weird world model. You probably can't do "classic" acausal trade with rocks, though?


Some more random, unordered, not-thought-out, somewhat incoherent thinking-out-loud thoughts and intuitions:

More random and less coherent: Something something about how, when you think of an agent using some meta-policy to answer the question "What object-level policy should I follow?", there's some intuitive sense in which ECL is recursive in the meta-policy while "classic" acausal trade is recursive in the object-level policy. I'm highly skeptical of this meta-policy/object-level-policy distinction making sense, though, and I'm also not confident in what I said about which type of trade is recursive in what.

Another intuitive difference is that with classic acausal trade, you usually want to verify whether the other agent is cooperating; in ECL you don't. Also, something something about how learning a lot about your trade partner is great for classic acausal trade but bad for ECL? (I suspect that there's nothing actually weird going on here and that this is because it's about learning different kinds of things. But I haven't thought about it enough to articulate the difference confidently and clearly.)

The concept of a commitment race doesn't seem to make much sense when thinking just about ECL, and maybe nailing down where that difference comes from would be interesting?

Thanks! I actually agree with a lot of what you say. Lack of excitement about existing intervention ideas is part of the reason why I'm not all in on this agenda at the moment, although in part I'm just bottlenecked by a lack of technical expertise (and it's not like people had great ideas for how to align AIs at the beginning of the field...), so I don't want people to overupdate from "Chi doesn't have great ideas."

With that out of the way, here are some of my thoughts:

  • We can try to prevent silly path-dependencies in (controlled, or uncontrolled, i.e. misaligned) AIs. As a start, we can use DT benchmarks to study how DT endorsements and behaviour change under different conditions, and how DT competence scales with size compared to other capabilities (a toy sketch of what one benchmark item could look like follows this list). I think humanity is unlikely to care a ton about AIs' DT views and there might be path-dependencies. So, I guess I'm saying I agree with "let's try to make the AI philosophically competent."
    • This depends a lot on whether you think there are any path-dependencies conditional on ~solving alignment, or whether humanity will, over time, just be wise enough to figure everything out regardless of the starting point.
    • One source of silly path-dependencies is if AIs' native DT depends on the training process and we want to de-bias against that. (See for example this or this for some research on what different training processes should incentivise.) Honestly, I have no idea how much things like that matter. Humans aren't all CDT even though my very limited understanding of evolution is that it should, in the limit, incentivise CDT.
    • I think that, depending on what you think about the default way AIs/AI-powered earth-originating civilisation will arrive at conclusions about ECL, you might think some nudging towards the DT views you favour is more or less justified. Maybe we can also find properties of DTs that we are more confident in (e.g. "does this or that in decision problem X") than whole specified DTs, about which, yeah, I have no clue. Other than "probably not CDT."
  • If the AI is uncontrolled/misaligned, there are things we can do to make it more likely that it is interested in ECL, which I expect to be net good for the agents I try to acausally cooperate with. For example, maybe we can make a misaligned AI's utility function more likely to have diminishing returns, or do something else that would make its values more porous. (I'm using the term in a somewhat broader way than Bostrom.)
    • This depends a lot on whether you think we have any influence over AIs we don't fully control.
  • It might be important and mutable that future AIs don't take any actions that decorrelate them from other agents (i.e. actions that decrease the AI's acausal influence) before they discover and implement ECL. So, we might try to just make them aware of that early.
    • You might think that's just not how correlation or updatelessness works, such that there's no rush, or that this is a potential source of value loss but a pretty negligible one.
  • Things that aren't about making AIs more likely to do ECL: something not mentioned, but there might be some trades that we have to do now. For example, maybe ECL makes it super important to be nice to AIs we're training. (I mostly lean no on this question (at least for "super important"), but it's confusing.) I also find it plausible that we want to do ECL with other pre-ASI civilisations who might or might not succeed at alignment and, if we succeed and they fail, partly optimise for their values. It's unclear to me whether this requires us to get people to spiritually commit to this now, before we know whether we'll succeed at alignment, or whether updatelessness somehow sorts this out, because if we (or the other civ) were to succeed at alignment, we would have seen that this is the right policy, and done this retroactively.

Yeah, you're right that we assume that you care about what's going on outside the lightcone! If that's not the case (or only a little bit the case), that would limit the action-relevance of ECL.

(That said, there might be some weird simulation shenanigans or cooperation with a future earth-AI that would still make you care about ECL to some extent, although my best guess is that they shouldn't move you too much. This is not really my focus though, and I haven't properly thought through ECL for people with indexical values.)

Whoa, I didn't know about this survey, pretty cool! Interesting results overall.

It's notable that 6% of people also report they'd prefer absolute certainty of hell over not existing, which seems totally insane from the point of view of my preferences. The 11% who prefer a trillion miserable sentient beings over a million happy sentient beings also seem wild to me. (Answers to those two questions also correlate with each other more than the other questions do.)

Thanks, I hadn't actually heard of this one before!

edit: Any takes on addictiveness/other potential side effects so far?

First of all: Thanks for asking. I was being lazy with this, and your questions forced me to come up with a response, which in turn forced me to actually think about my plan.

Concrete changes

1) I'm currently doing daily in-person Pomodoro co-working with a friend on weekdays, but I had planned that before this post IIRC, and I've definitely known for a while that that's a huge boost for me.

In-person co-working and the type of work I do sometimes seem somewhat situational/hard to sustain/hard to change quickly. For some reason (perhaps because I feel a bit meh about virtual co-working), I've never tried Focusmate, and this made me more likely to try it in the future if and when my in-person co-working fizzles out.

2) The things that both strongly resonated with me and were new to me were "Identifying as hard-working" and "Finding ways of reframing work as non-work". (I was previously aware that things would often be fun if I didn't think of them as work and become "ugh" as soon as they are work, but just knowing that there is another person who is successfully managing this property of theirs is really encouraging and helpful for thinking about solutions to this.)

Over the last few months, I've introduced the habit of checking in with myself at various times during the day, and especially when I'm struggling with something (kind of like spontaneous mini meditations). I'm hoping that I can piggy-back on that to try out the identity and reframing things. (Although this comment just prompted me to actually go and write those down on post-its and hang them where I can see them, so I don't forget. So thanks for asking!)

3) I am currently testing out having a productive hobby for my weekends. (This ties into not reframing work things as "not work". Also, I am often strict with my weekends in a way that I wanna experiment with relaxing, given one of the responses I got. It was also prompted by the concept of doing something enjoyable and rewarding to regenerate instead of resting.) I'll monitor the effects of that on my mental health quite closely, because I think it could end up quite badly, but it has been fun this weekend.

3.5) I often refrain from doing work things I feel energy and motivation for because it's too late in the day or otherwise "not work-time". I think this overall serves me well in various ways, but as a result of this post, I am more likely to try relaxing this a bit in the future. I am already tracking my work and sleep hours, so hopefully that will give me some basis to check how it affects my productivity. (And 4 will hopefully help too.)

4) Not directly as a consequence of this post, but related: I started thinking about how to set work targets for different time intervals, and how to consistently set and review them. (It was kind of crazy to realise that I don't already do this! Plans ≠ Targets.) This is a priority for me at the moment, and I am interviewing people about it. I expect this to feed into this whole hard-working topic, and maybe some of the responses about working hard will influence how I go about it.

Other minor updates or things that I won't try immediately but that I'm more likely to try in the future now:

  • Decided not to prioritise improving diet, exercise, and sleep for the sake of becoming more hard-working.
  • Not being frustrated that there is no magical link: general growth as a person --> more hard-working
  • Maybe: Using the Freedom App (I've had good experiences with Cold Turkey, but it's not on my phone.)
  • Maybe: Doing more on paper
  • Maybe: Kanban boards
  • Maybe: Meetings with myself
  • Maybe: Experiment with stimulants (I can get them prescribed but dropped them for various reasons)

Some overall remarks

My biggest update was just learning that people can permanently become more hard-working at all well into their 20s, through means that aren't only either meds or changing roles, meaning there is a point to me trying more non-med things that might increase how hard-working I am in the short term. Previously, I was really unsure to what degree hard-workingness might just be a very stable trait across a lifetime, at least if you don't drastically change the kind of work you do or your work environment in ways that are difficult to actually pull off. To be fair, I'm still not sure, but I am more hopeful than previously.

From that point of view, I found it really encouraging that some people mentioned having a concrete, time-constrained period where they were, for some reason, much more hard-working than previously, and then kept this up going forward even when ~everything about their work situation changed.

For context: I tracked my work hours for roughly a year. My week-to-week tends to be very heterogeneous, and through the tracking I realised that none of the things I tracked during that year seemed to have any relationship to how much I work week-to-week, other than having hard "real" deadlines. The overall trend was also very flat, which felt a bit discouraging.

Thank you. This answer was both insightful and felt like a warm hug somehow.

Thanks for posting this! I really enjoyed the read.


Feedback on the accompanying poll: I was going to fill it out. Then I saw that I have to look up and list the titles I can (not) relate to, instead of just being able to click "(strongly) relate/don't relate" on a long list of titles. (I think the relevant function for this in forms is "Matrix" or something.) And my reaction was "ugh, work". I think I might still fill it in, but I'm much less likely to. If others feel the same, maybe you wanna change the poll?
