All of metachirality's Comments + Replies

I don't think the specific part of decision theory where people argue over Newcomb's problem is large enough as a field to be subject to the EMH. I don't think the incentives are awfully huge either. I'd compare it to ordinal analysis, a field which does have PhDs but very few experts in general and not many strong incentives. One significant recent result (if the proof works, then the ordinal notation in question would be the most powerful one proven well-founded) was done entirely by an amateur building off of work by other amateurs (see the section on the Bashicu Matrix System):

Yes. Well, almost. Schwarz brings up disposition-based decision theory, which appears similar though might not be identical to FDT, and every paper I've seen on it appears to defend it as an alternative to CDT. There are some looser predecessors to FDT as well, such as Hofstadter's superrationality, but that's too different imo.

Given Schwarz' lack of reference to any paper describing any decision theory even resembling FDT, I'd wager that FDT's obviousness is only apparent in retrospect.

Whenever I ask questions like "Is this bullshit or not", I'm not expecting a simple binary yes/no answer; the phrase is shorthand for a question which is similar but longer and harder to word, and which asks for a more complex, more specific answer.

Right now, I'm mostly taking a look at the thing I linked, and when and if (big if) I get far enough, I'll try to get Russian acquaintances to help me look into it further.

Curious about other personal development paradigms/psychotechnologies. So far I've mostly been trying to follow the book The Mind Illuminated and dabbling in feedbackloop-first rationality and tuning cognitive strategies.

Is there anything about those cases that suggests it should generalize to every decision theorist, or that this is as good a proxy for how well FDT works as the beliefs of earth scientists are for whether the Earth is flat or not?

For instance, your samples consist of a philosopher not specialized in decision theory, one unaccountable PhD, and one single person who is both accountable and specializes in decision theory. Somehow, I feel as if there is a difference between generalizing from that and generalizing from every credentialed expert that one could po... (read more)

What's your explanation of why virtually no published papers defend it and no published decision theorists defend it? Do you really think none of them have thought of it or anything in the vicinity?

My claim is that there are not yet people who know what they are talking about, or, more precisely, that everyone knows roughly as much about what they are talking about as everyone else.

Again, I'd like to know who these decision theorists you talked to were, or at least what their arguments were.

The most important thing here is how you are evaluating the field of decision theory as a whole, how you are evaluating who counts as an expert or not, and what arguments they make, in enough detail that one can conclude that FDT doesn't work without having to rely on your word.

If you look at philosophers with Ph.D.s who study decision theory for a living, and have a huge incentive to produce original work, none of them endorse FDT.
I mean, like, I can give you some names. My friend Ethan, who's getting a Ph.D., was one person. Schwarz knows a lot about decision theory and finds the view crazy--MacAskill doesn't like it either.

So it's crazy to believe things that aren't supported by published academic papers? If your standard for "crazy" is believing something that a couple of people disagree with, in a field too underdeveloped to be subject to the EMH, where there are merely no papers defending it (not any actively rejecting it), then probably you, and roughly every person on this website ever, count as "crazy".

Actually, I think an important thing here is that decision theory is too underdeveloped and small to be subject to the EMH, so you can't just go "if this crazy hypoth... (read more)

I wouldn't call a view crazy for just being disbelieved by many people.  But if a view is both rejected by all relevant experts and extremely implausible, then I think it's worth being called crazy!   I didn't call people crazy, instead I called the view crazy.  I think it's crazy for the reasons I've explained, at length, both in my original article and over the course of the debate.  It's not about my particular decision theory friends--it's that the fact that virtually no relevant experts agree with an idea is relevant to an assessment of it.   I'm sure Soares is a smart guy!  As are a lot of defenders of FDT.  Lesswrong selects disproportionately for smart, curious, interesting people.  But smart people can believe crazy things--I'm sure I have some crazy beliefs; crazy in the sense of being unreasonable such that pretty much all rational people would give them up upon sufficient ideal reflection and discussion with people who know what they're talking about. 

The thing I disagree with here most is the claim that FDT is crazy. I do not think it is, in fact, crazy to think it is a good idea to adopt a decision theory whose users generically end up winning in decision problems compared to other decision theories.

I also find it suspicious that that part is predicated on the opinions of experts we know nothing about, who presumably learned about FDT primarily from you, who think FDT is bad, and also sort of assumes that MacAskill is any more of an expert on decision theory than Soares. Do not take this as a personal at... (read more)


(Epistemic status: Plausible position that I don't actually believe in.) The correct answer to the leg-cutting dilemma is that you shouldn't cut it, because actually you will end up existing no matter what because Omega has to simulate you to predict your actions, and it's always possible that you're in the simulation. The fact that you always have to be simulated to be predicted makes up for every apparent decision theory paradox, such as not cutting your leg off even when doing so precludes your existence.

The simulation might be a zombie, though. I.e., you don't consciously exist. Arguments against p-zombies don't apply here, because computer simulations aren't physical duplicates.

I actually initially wrote off psychonetics because of its woo-sounding name. However, upon skimming the thing which I linked, I noticed two things that made me suspect that psychonetics might be worth taking seriously:

First, the safety rules: in particular, discouraging religious or spiritual interpretations, and encouraging you to stop the practice and seek out a psychiatrist if you start feeling "an otherworldly presence" or feeling as though you have magical powers. Second, the purported benefits are limited in scope. It does not claim to solve all of your... (read more)

Do you have a link to it or something?

The manual? No; it was similar to what you linked to anyway.

I agree. Right now though, I'm mostly unsure how to act on my knowledge of psychonetics' existence (and the existence of cognitive tuning, for that matter). At least for psychonetics, the sensible first step is probably to just read something on it and maybe distill it in a book review or something. Not sure about cognitive tuning in general since it's not really a thing yet.

If you (or anyone else) are interested in diving into psychonetics or cognitive tuning with me then feel free to contact me, though I can't guarantee anything will come of it because of the turbulent chaos of life (or more realistically my laziness).

I know nothing about naturalism, but cognitive tuning (beyond just cognitive strategies) seems like it's begging to be expanded upon.

It definitely looks like something that, just on its own, could eventually morph into a written instruction manual for the human brain (i.e. one sufficiently advanced to enable people to save the world).

I find it funny that GPT-4 feels the need to account for the possibility that the densities of uranium or gold might have changed as of September 2021.

It's hedging for the possibility that the isotope ratios are changing over time due to the behaviors of intelligent agents like humans. Or at least that's my headcanon.

A lot of your arguments boil down to "This ignores ML and prosaic alignment" so I think it would be helpful if you explained why ML and prosaic alignment are important.

The obvious reply would be that ML now seems likely to produce AGI, perhaps alongside minor new discoveries, in a fairly short time. (That at least is what EY now seems to assert.) Now, the grandparent goes far beyond that, and I don't think I agree with most of the additions. However, the importance of ML sadly seems well-supported.

How did you figure these things out if they were never published on Be Well Tuned?

I didn't; I'm naming some similar things based on the writing of theirs that I went through.

I think the prior for aliens having visited Earth should be lower, since a priori it seems unlikely to me that aliens would interact with Earth but not to an extent which makes it clear to us that they have. My intuition is that it's probably rare to get to other planets with sapient life before building a superintelligence (which would almost certainly be obvious to us if it did arrive), and even if you do manage to get to other planets with sapient life, I don't think aliens would refrain from trying to contact us if they're anything like humans.

I have tried meditation a little bit although not very seriously. Everything I've heard about it makes me think it would be a good idea to do it more seriously.

Not sure how to be weird without being unuseful. What does a weird but useful background look like?

Also, I've already been trying to read a lot, but I'm still somewhat dissatisfied with my pace. You mentioned you could read at 3x your previous speed. How did you do that?

Jonas Hallgren (4 months ago):
I actually read fewer books than I used to; the 3x thing was that I listen to audiobooks at 3x speed, so I read less non-fiction but at a faster pace. Also, weird but useful in my head is, for example, looking into population dynamics to understand alignment failures. When does ecology predict that mode collapse will happen inside of large language models? Understanding these areas and writing about them is weird, but it could also be a useful bet for at least someone to take. However, this also depends on how much doing the normal stuff is saturated. I would recommend trying to understand the problems and current approaches really well and then coming up with ways of tackling them. To get the bits of information on how to tackle them, you might want to check out weirder fields, since those bits aren't already in the common pool of "alignment information", if that makes sense?

I am pretty anxious about posting this, since this is my first post on LessWrong and it's also about a pretty confusing topic, but I'm probably not well calibrated on this front, so, oh well. Also thanks to NicholasKross for taking a look at my drafts.

What other advice/readings do you have for optimizing your life/winning/whatever?

Jonas Hallgren (4 months ago):
Since you asked so nicely, I can give you two other models.

1. Meditation is like slow recursive self-improvement and reprogramming of your mind. It gives focus & mental health benefits that are hard to get from other places. If you want to accelerate your growth, I think it's really good. A mechanistic model of meditation & then doing the stages in the book The Mind Illuminated will give you this. (At least, this is how it has been for me.)
2. Try to be weird and useful. If you have a weird background, you will catch ideas that might fall through the cracks for other people. Yet to make those ideas worth something, you have to be able to actually take action on them, meaning you need to know how to, for example, communicate. So try to find the Pareto-optimal point between weird and useful by following & doing stuff you find interesting, but also valuable and challenging. (Read a fuckton of non-fiction books as well if that isn't obvious. Just look up 30 different top 10 lists and you will have a good range to choose from.)

I think this depends on whether you use SIA or SSA or some other theory of anthropics.

Pardon my ignorance; I don't actually know what SIA and SSA stand for.
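For context on the acronyms: SIA is the Self-Indication Assumption and SSA is the Self-Sampling Assumption, two standard rules for anthropic reasoning. A minimal sketch of how they diverge, using the toy "God's coin toss" setup (heads creates one observer, tails creates two; the specific numbers are purely illustrative):

```python
from fractions import Fraction

# Toy "God's coin toss": heads -> 1 observer created, tails -> 2 observers.
p_heads = Fraction(1, 2)
observers = {"heads": 1, "tails": 2}

# SSA: reason as if you are a random observer *within* each possible world;
# the prior over worlds is untouched, so P(heads | I exist) stays 1/2.
ssa_p_heads = p_heads

# SIA: weight each world's prior by how many observers it contains,
# then renormalize.
weights = {world: p_heads * n for world, n in observers.items()}
sia_p_heads = weights["heads"] / (weights["heads"] + weights["tails"])

print(ssa_p_heads)  # 1/2
print(sia_p_heads)  # 1/3
```

The two assumptions give different answers from the same prior, which is why conclusions that "depend on whether you use SIA or SSA" can genuinely come apart.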

I have a strong inside view of the alignment problem and what a solution would look like. The main reason why I don't have as concrete an inside-view AI timeline is that I don't know enough about ML, so I have to defer to get a specific decade. The biggest gap in my model of the alignment problem is what a solution to inner misalignment would look like, although I think it would be something like trying to find a way to avoid wireheading.

My bad. I'm glad to hear you do have an inside view of the alignment problem. If knowing enough about ML is your bottleneck, perhaps that's something you can directly focus on? I don't expect it to be hard for you -- perhaps only about six months -- to get to a point where you have coherent inside models about timelines.

I've checked out John Wentworth's study guide before, mostly doing CS50.

Part of the reason I'm considering getting a degree is so I can get a job if I want and not have to bet on living rent-free with other rationalists or something.

The people I've talked to the most have timelines centering around 2030. However, I don't have a detailed picture of why because their reasons are capabilities exfohazards. From what I can tell, their reasons are tricks you can implement to get RSI even on hardware that exists right now, but I think most good-sounding tricks do... (read more)

Yeah, that's a hard problem. You seem smart: have you considered finding rationalists or rationalist-adjacent people who want to hire you part-time? I expect that the EA community in particular may have people willing to do so, and that would give you experience (to show future employers / clients), connections (to find more part-time / full-time jobs), and money.

You just updated towards shortening your timelines by a decade due to what would be between 5 minutes and half an hour of tree-of-thought-style reflection. Your reasoning seems entirely social (that is, dependent on other people's signalled beliefs) too, which is not something I would recommend if you want to do useful alignment research. The problem with relying on social evidence for your beliefs about scientific problems is that you both end up with bad epistemics and end up taking negative expected value actions.

First: if other people update their beliefs due to social evidence the same way you do, you are vulnerable to a cascade of belief changes (mundane examples: tulip craze, cryptocurrency hype, NFT hype, cult beliefs) in your social community. This is even worse for the alignment problem because of the significant amount of disagreement in the alignment research community itself about details of the problem. Relying on social reasoning in such an epistemic environment will leave you constantly uncertain due to how uncertain you perceive the community to be about core parts of the problem.

Next: if you do not have inside models of the alignment problem, you shall fail to update accurately given evidence about the difficulty of the problem. Even if you rely on other researchers who have inside / object-level models and update accurately, there is bound to be disagreement between them. Who do you decide to believe?

The first thing I recommend you do is to figure out your beliefs and model of the alignment problem using reasoning at the object level, without relying on what anyone else thinks

The difference between an expected utility maximizer using updateless decision theory and an entity who likes the number 1 more than the number 2, or who cannot count past 1, or who has a completely wrong model of the world which nonetheless makes it one-box, is that the expected utility maximizer using updateless decision theory wins in scenarios outside of Newcomb's problem, where you may have to choose $2 instead of $1, or have to count amounts of objects larger than 1, or have to believe true things. Similarly, an entity that "acts like it has a choice" generalizes well to other scenarios, whereas these other possible entities don't.

Yes, agents whose inner model is counting possible worlds, assigning probabilities and calculating expected utility can be successful in a wider variety of situations than someone who always picks 1. No, thinking like "an entity that "acts like they have a choice"" does not generalize well, since "acting like you have choice" leads you to CDT and two-boxing.
  1. I think getting an extra person to do alignment research can give massive amounts of marginal utility considering how few people are doing it and how it will determine the fate of humanity. We're still in the stage where adding an extra person removes a scarily large amount from p(doom), like up to 10% for an especially good individual person, which probably averages to something much smaller but still scarily large when looking at your average new alignment researcher. This is especially true for agent foundations.
  2. I think it's very possible to solve the
... (read more)
2050? That's quite far off, and it makes sense that you are considering university given you expect to have about two decades. Given such a scenario, I would recommend trying to do a computer science/math major, specifically focusing on the subjects listed in John Wentworth's Study Guide that you find interesting. I expect that three years of such optimized undergrad-level study will easily make someone at least SERI MATS scholar level (assuming they start out a high school student). Since you are interested in agent foundations, I expect you shall find John Wentworth's recommendations more useful since his work seems close to (but not quite) agent foundations.

Given your timelines, I expect doing an undergrad (that is, a bachelor's degree) would also give you traditional credentials, which are useful to survive in case you need a job to fund yourself.

Honestly, I recommend you simply dive right in if possible. One neglected but extremely useful resource I've found is Stampy. The AGI Safety Fundamentals technical course won't happen until September, it seems, but perhaps you can register your interest for it. You can begin reading the curriculum -- at least the stuff you aren't yet familiar with -- almost immediately. Dive deep into the stuff that interests you.

Well, I assume you have already done this, or something close to this, and if that is the case, you can ignore the previous paragraph.

If possible, could you go into some detail as to why you expect we will get a superintelligence at around 2050? It seems awfully far to me, and I'm curious as to the reasoning behind your belief.

One-boxers win because they reasoned in their head that one-boxers win because of updateless decision theory or something so they "should" be a one-boxer. The decision is predetermined but the reasoning acts like it has a choice in the matter (and people who act like they have a choice in the matter win.) What carado is saying is that people who act like they can move around the realityfluid tend to win more, just like how people who act like they have a choice in Newcomb's problem and one-box in Newcomb's problem win even though they don't have a choice in the matter.

None of this is relevant. I don't like the "realityfluid" metaphor, either. You win because you like the number 1 more than number 2, or because you cannot count past 1, or because you have a fancy updateless model of the world, or because you have a completely wrong model of the world which nonetheless makes you one-box. You don't need to "act like you have a choice" at all. 

I don't think this matters all that much. In Newcomb's problem, even though your decision is predetermined, you should still want to act as if you can affect the past, specifically Omega's prediction.

There is no "ought" or "should" in a deterministic world of perfect predictors. There is only "is". You are an algorithm and Omega knows how you will act. Your inner world is an artifact that gives you an illusion of decision making. The division is simple: one-boxers win, two-boxers lose, the thought process that leads to the action is irrelevant.
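The payoff comparison this thread keeps circling can be sketched as a quick simulation. This is a minimal sketch using the conventional hypothetical amounts ($1,000,000 in the opaque box, $1,000 in the transparent one), assuming a perfect predictor whose prediction always matches the agent's actual policy:

```python
# Newcomb's problem with a perfect predictor.
# Hypothetical payoffs: $1,000,000 in the opaque box, $1,000 in the
# transparent box.
def payoff(policy):
    # Perfect prediction: Omega's prediction equals the agent's actual policy,
    # so the opaque box's contents are already fixed by the policy itself.
    prediction = policy
    opaque = 1_000_000 if prediction == "one-box" else 0
    transparent = 1_000

    if policy == "one-box":
        return opaque
    else:  # "two-box": take both boxes
        return opaque + transparent

print(payoff("one-box"))  # 1000000
print(payoff("two-box"))  # 1000
```

Under this assumption the "which policy wins" question is settled by arithmetic, regardless of what inner story (fancy updateless model, inability to count past 1, or anything else) produces the one-boxing policy.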

I don't believe something can persuade generals to go to war in a short period of time, just because it's very intelligent.

A few things I've seen give pretty worrying lower bounds for how persuasive a superintelligence would be:

Remember that a superintelligence will be at least several orders of magnitude more persuasive than or Stuart Armstrong.

Believing this seems central to believing high P(doom). But I think it's not a coherent enough concept to justify believing it. Yes, some people are far more persuasive than others. But how can you extrapolate that far beyond the distribution we observe in humans? I do think AI will prove to be better than humans at this, and likely much better. But "much" better isn't the same as "better enough to be effectively treated as magic".

Formal alignment proposals avoid this problem by doing metaethics, mostly something like determining what a person would want if they were perfectly rational (so no cognitive biases or logical errors), otherwise basically omniscient, and had an unlimited amount of time to think about it. This is called reflective equilibrium. I think this approach would work for most people, even pretty terrible people. If you extrapolated a terrorist who commits acts of violence for some supposed greater good, for example, they'd realize that the reasoning they used to de... (read more)

Thank you. If I understand your explanation correctly, you are saying that there are alignment solutions that are rooted in more general avoidance of harm to currently living humans. If these turn out to be the only feasible solutions to the not-killing-all-humans problem, then they will produce not-killing-most-humans as a side-effect.

Nuke analogy: if we cannot build/test a bomb without igniting the whole atmosphere, we'll pass on bombs altogether and stick to peaceful nuclear energy generation.

It seems clear that such limiting approaches would be avoided by rational actors under winner-take-all dynamics, so long as other approaches remain that have not yet been falsified.

Follow-up question: does the "any meaningfully useful AI is also potentially lethal to its operator" assertion hold under the significantly different usefulness requirements of a much smaller human population? I'm imagining limited AI that can only just "get the (hard) job done" of killing most people under the direction of its operators, and then support a "good enough" future for the remaining population, which isn't the hard part because the Earth itself is pretty good at supporting small human populations.

To the first one, they aren't actually suffering that much or experiencing anything they'd rather not experience because they're continuous with you and you aren't suffering.

I don't actually think a simulated human would be continuous in spacetime with the AI because the computation wouldn't be happening inside of the qualia-having parts of the AI.

I think what defines a thing as a specific qualia-haver is not what information it actually holds but how continuous it is with other qualia-having instances in different positions of spacetime. I think that mental models are mostly continuous with the modeler so you can't actually kill them or anything. In general, I think you're discounting the importance that the substrate of a mental model/identity/whatever has. To make an analogy, you're saying the prompt is where the potential qualia-stuff is happening, and isn't merely a filter on the underlying language model.

Nox ML (5 months ago):
One of my difficulties with this is that it seems to contradict one of my core moral intuitions, that suffering is bad. It seems to contradict it because I can inflict truly heinous experiences onto my mental models without personally suffering for it, but your point of view seems to imply that I should be able to write that off just because the mental model happens to be continuous in space-time to me. Or am I misunderstanding your point of view? To give an analogy and question of my own, what would you think about an alien unaligned AI simulating a human directly inside its own reasoning center? Such a simulated human would be continuous in spacetime with the AI, so would you consider the human to be part of the AI and not have moral value of their own?

My immediate thought is that the cat is already out of the bag and whatever risk there was of AI safety people accelerating capabilities is nowadays far outweighed by capabilities hype and in general, much larger incentives, and that the most we can do is to continue to build awareness of AI risk. Something about this line of reasoning strikes me as uncritical though.

Probably not the best person on this forum when it comes to either PR or alignment, but I'm interested enough, if only in knowing your plan, that I want to talk to you about it anyway.

PMs are always open, my guy

Will the karma thing affect users who've joined before a certain period of time? Asking this because I joined quite a while ago but have only 4 karma right now.

It's likely to apply starting roughly 4 months ago (i.e. when ChatGPT was released). But, this is just a guess and we may make changes to the policy.

That's not really specific enough. I would describe it as someone being really angry about something, contingent on a certain belief being true, but when you ask them why they hold that belief, it's very weak evidence, or something that is the opposite of an open-and-shut case, or something that could vary depending on context, and so on and so forth.